[Blacklight-development] [NGC4LIB] Whose elephant is it, anyway? (the OLE project)
rochkind at jhu.edu
Fri Mar 13 10:34:14 EDT 2009
The work Mark has done in Australia to implement a 'browse' list based
on Solr with VuFind seems potentially useful to Blacklight, especially
since he says he did most of it in a custom Solr handler.
Mark Triggs wrote:
> [Apologies if this comes through twice. I sent this about 14 hours ago and
> haven't seen it arrive yet, so...]
> Hi all,
> I've been watching this discussion with some interest because I'm the
> guy who implemented the browse functionality in the NLA's catalogue. I
> just thought I'd jump in and confirm/deny a few things here.
> Our current title and uniform title browses were among the first browses
> we attempted to implement, so they're currently a bit of a legacy
> feature. We implemented these using a combination of Solr range queries
> and sorting and they mostly sort of work, but perhaps not quite as
> smoothly as the other browses (as evidenced by the 'internal server
> error' that Owen managed to produce ;o). I'm on holidays at the moment,
> but this will probably be revisited when I'm back at work.
> Our other browses (names, subjects, callnumbers and series) make use of
> a combination of SQLite databases and Lucene indexes. Each browse
> consists of an SQLite database with a single table of two columns: a
> sort key and the text of the browse heading. When we receive a request
> to browse from a certain point we can get back the pageful of headings
> to display by using a simple SQL SELECT statement. For each heading
> listed we determine the number of titles matched and any
> cross-references by performing Lucene term queries (fast) on indexes of
> our bib data and authority data respectively. All of this is handled by
> a Solr browse handler I've written, so all our VuFind code needs to know
> is to hit the browse handler and style the XML it gets back.
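For anyone wanting to try this in Blacklight, here is a minimal sketch of the two-column table and the paging SELECT Mark describes. The table name, column names, and the make_sort_key normalisation are my assumptions for illustration, not the NLA's actual schema or code, and the hit counts and cross-references from Lucene are left out.

```python
import sqlite3


def make_sort_key(heading):
    # Hypothetical normalisation: lowercase, keep only letters, digits and spaces.
    return "".join(c for c in heading.lower() if c.isalnum() or c.isspace()).strip()


def build_browse_db(path, headings):
    """Create a single table of two columns: a sort key and the heading text."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS headings (sort_key TEXT, heading TEXT)")
    # The index lets SQLite seek to the starting point instead of scanning.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sort_key ON headings (sort_key)")
    conn.executemany(
        "INSERT INTO headings (sort_key, heading) VALUES (?, ?)",
        [(make_sort_key(h), h) for h in headings],
    )
    conn.commit()
    return conn


def browse_page(conn, start_from, page_size=20):
    """Return one pageful of headings starting at or after start_from."""
    rows = conn.execute(
        "SELECT heading FROM headings WHERE sort_key >= ? "
        "ORDER BY sort_key LIMIT ?",
        (make_sort_key(start_from), page_size),
    )
    return [row[0] for row in rows]
```

Calling browse_page(conn, "C", 20) would return the first twenty headings whose sort key is at or after "c"; the next page can start from the last sort key returned.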
> Regarding scalability, our largest browse is the callnumber browse,
> which consists of about 3 million entries (for 4 million bib records).
> I've tested this SQLite approach up to 20 million entries and it
> continued to perform well, so I'm not terribly worried for now. Finding
> the point to browse from is effectively just searching a big sorted text
> file, so I would expect O(log N) growth here anyway. Plus, our largest
> SQLite database still fits entirely in memory, so that's nice too.
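The "big sorted text file" comparison can be illustrated with a plain binary search. This is only an analogy under my own assumptions (an in-memory sorted list of sort keys and a hypothetical browse_from helper); SQLite itself would use a B-tree index, which gives similar logarithmic behaviour.

```python
import bisect


def browse_from(sorted_keys, target, page_size=5):
    """Find the browse starting point in O(log N) and return one page."""
    # bisect_left returns the index of the first key >= target.
    start = bisect.bisect_left(sorted_keys, target)
    return sorted_keys[start:start + page_size]
```

Doubling the number of entries adds roughly one more comparison to the lookup, which fits Mark's expectation that growing from 3 million to 20 million entries makes little difference.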
> In terms of indexing performance, the SQLite databases take about 5-10
> minutes to build in total, and they're built from scratch every time.
> For each type of browse we pull all the browse headings from our bib and
> authority data, remove any duplicates then load them all into an SQLite
> database. My browse handler notices when these databases have been
> updated and automatically reopens them, so the update is transparent.
> Currently we just do these updates once per night, as this is how often
> we update our main bib indexes and it makes sense to keep the updates
> synchronised, but I don't see any problem with doing this more often if
> it made sense.
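The rebuild-and-reopen cycle could look something like the sketch below. Everything here is my assumption rather than Mark's handler: the function and class names are hypothetical, the sort key is just the lowercased heading, and the change detection uses the file's modification time, inode and size. The atomic swap with os.replace while a reader holds the old file open assumes a POSIX filesystem.

```python
import os
import sqlite3
import tempfile


def rebuild_browse_db(path, headings):
    """Build a fresh database from scratch, then swap it into place."""
    unique = sorted(set(headings))  # remove duplicate headings
    target_dir = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    os.close(fd)  # SQLite treats the empty file as a new database
    conn = sqlite3.connect(tmp_path)
    conn.execute("CREATE TABLE headings (sort_key TEXT, heading TEXT)")
    conn.executemany(
        "INSERT INTO headings VALUES (?, ?)", [(h.lower(), h) for h in unique]
    )
    conn.execute("CREATE INDEX idx_sort_key ON headings (sort_key)")
    conn.commit()
    conn.close()
    os.replace(tmp_path, path)  # readers see either the old or the new file


class BrowseReader:
    """Reopens the database whenever the file on disk has been replaced."""

    def __init__(self, path):
        self.path = path
        self._conn = None
        self._stamp = None

    def _connection(self):
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_ino, st.st_size)
        if stamp != self._stamp:
            if self._conn is not None:
                self._conn.close()
            self._conn = sqlite3.connect(self.path)
            self._stamp = stamp
        return self._conn

    def count(self):
        row = self._connection().execute("SELECT COUNT(*) FROM headings").fetchone()
        return row[0]
```

Because the new database is built in a temporary file and only then renamed over the old one, a nightly (or more frequent) rebuild never exposes a half-loaded table to the reader.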
> I'm happy to answer any questions about our implementation either on or
> off list.
> "Stephens, Owen" <o.stephens at IMPERIAL.AC.UK> writes:
>> Just to understand what you are looking for in terms of Browse. The
>> NLA implementation of VuFind has what I would regard as a Browse
>> function - you can Browse the following:
>> Names at http://catalogue.nla.gov.au/Browse/Names?browse=names&from=
>> Subjects at
>> Callnumbers at
>> Series at
>> All these options are available in the user interface at
>> http://catalogue.nla.gov.au/Browse/Home ('Browse' is an option in the
>> horizontal menu under the main 'catalogue' banner)
>> This page also offers Title and Uniform Title browsing, but these seem
>> not to work in the same way at the moment (I've sent feedback about
>> Is this browsing as you mean it? If not, what would you require
>> (also you question the scalability - what scale are you thinking of?
>> I'd guess that NLA is reasonably large - but I can't easily find a
>> figure for the number of bib records - but obviously it may not be as
>> large as other national libraries or consortium collections)