[Blacklight-development] where to get catalog records to index

Jonathan Rochkind rochkind at jhu.edu
Fri May 9 12:07:21 EDT 2008


That's a great plan.

For an actual getting-started set, I'd suggest that someone try to make 
a ~10k record set. 19M is still way too big for just getting started and 
testing and tuning your indexing; it takes too long to re-index.
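
As a rough sketch of how such a subset could be cut from one of the LC 
files: a MARC 21 record starts with a leader whose first five bytes give 
the total record length in ASCII digits, so a file can be split without 
fully parsing it. The function names and the 10,000 default below are 
illustrative assumptions, not part of Blacklight or any existing tool.

```python
from typing import BinaryIO, Iterator


def iter_marc_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw MARC 21 records one at a time.

    Relies only on leader bytes 0-4, which hold the total record
    length as five ASCII digits.
    """
    while True:
        length_field = stream.read(5)
        if len(length_field) < 5:
            return  # end of input
        length = int(length_field.decode("ascii"))
        rest = stream.read(length - 5)
        if len(rest) < length - 5:
            return  # truncated final record; skip it
        yield length_field + rest


def write_sample(src: BinaryIO, dst: BinaryIO, limit: int = 10_000) -> int:
    """Copy at most `limit` records from src to dst; return the count copied."""
    count = 0
    for record in iter_marc_records(src):
        if count >= limit:
            break
        dst.write(record)
        count += 1
    return count
```

Opening part29.dat in "rb" mode and a new output file in "wb" mode and 
passing both to write_sample would give a small file to re-index against. 
Taking the first N records is the simplest choice; it is not a 
representative sample of the whole set.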

Ideally that 10k sample set would have musical records with data in MARC 
that could be taken advantage of by the Blacklight music library 
interface, so that could be tested/tuned too, since some of us are 
interested in that specifically.
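
One hedged way to pick out candidate music records for that sample: in 
MARC 21, leader position 06 (type of record) is "c" for notated music, 
"d" for manuscript notated music, and "j" for musical sound recordings. 
The helper name below is hypothetical, and whether these three codes 
cover everything the music interface cares about is an assumption.

```python
# Leader/06 codes for music materials in MARC 21:
# c = notated music, d = manuscript notated music, j = musical sound recording
MUSIC_RECORD_TYPES = (b"c", b"d", b"j")


def is_music_record(record: bytes) -> bool:
    """Return True if a raw MARC 21 record's leader/06 marks it as music."""
    # A valid record has at least the 24-byte leader.
    return len(record) >= 24 and record[6:7] in MUSIC_RECORD_TYPES
```

This only checks the leader; it does not look at any of the music-specific 
data fields the interface might use.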

And actually, there's another big problem with that as a sample set. It 
doesn't include any item-level data. Holdings data. We need a sample set 
with that too. I suppose we could add "fake" holdings data, in the same 
data fields that UVa uses, _to_ an LC sample set. But we need it, to put 
that stuff through its paces too.

So now I've made this more complicated than you hoped. :) But I do think 
we need all those things to have a useful sample set.

Jonathan

Bess Sadler wrote:
> Since the legal status of our UVA catalog records is somewhat 
> questionable, we need to come up with a test set that we can 
> distribute for indexing. I propose we use the Library of Congress set 
> that the incomparable Casey Bisson purchased and provided to the 
> community for precisely this purpose.
>
> They are downloadable here:
>
> http://www.archive.org/details/marc_records_scriblio_net
>
> I suggest we start running indexing tests with these. And there's even 
> one (part29.dat) that's only 19M... perfect for getting started with.
>
> Bess
>
>
> Elizabeth (Bess) Sadler
> Research and Development Librarian
> Digital Scholarship Services
> Box 400129
> Alderman Library
> University of Virginia
> Charlottesville, VA 22904
>
> bess at virginia.edu
> (434) 243-2305
>
>
> _______________________________________________
> Blacklight-development mailing list
> Blacklight-development at rubyforge.org
> http://rubyforge.org/mailman/listinfo/blacklight-development

-- 
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886 
rochkind (at) jhu.edu


