[Blacklight-development] more little importer tricks - a fix?

Naomi Dushay ndushay at stanford.edu
Fri May 2 14:11:16 EDT 2008


The java marc importer uses a different version of lucene than  
the .tar.gz of blacklight, and the index format is incompatible.

unpacking the solr.war, the lucene jar files are all named
lucene-blah-2007-05-20

the lucene jars in the java importer are lucene version 2.3.1, which  
seem to date from 2008-02-22
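
If you want to check which lucene jars a given solr.war carries without unpacking it by hand, here is a minimal sketch. It relies only on the fact that a .war is a zip archive; the function name lucene_jars is mine, not part of blacklight or solr.

```python
import zipfile

def lucene_jars(archive_path):
    """Return the sorted entry names of lucene-*.jar files inside a .war (zip) archive."""
    with zipfile.ZipFile(archive_path) as war:
        return sorted(
            name for name in war.namelist()
            # Compare against the file name only, ignoring the directory part.
            if name.rsplit("/", 1)[-1].startswith("lucene-") and name.endswith(".jar")
        )
```

Calling lucene_jars("solr.war") and comparing the names against the jars in the importer's lib directory should show whether the two sides agree.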

I swapped out the old lucene jar files for the new ones and made a new  
solr.war.  Putting my new solr.war into jetty, then firing up solr ...  
makes everything happy!
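
For anyone who would rather script the swap, here is a rough sketch of the same idea, not the exact steps I ran. It assumes the lucene jars sit under WEB-INF/lib inside the war (the standard place for webapp libraries); swap_lucene_jars and its parameter names are made up for illustration.

```python
import os
import zipfile

def swap_lucene_jars(old_war, new_war, new_jar_paths):
    """Write new_war as a copy of old_war with WEB-INF/lib/lucene-*.jar replaced."""
    with zipfile.ZipFile(old_war) as src, \
         zipfile.ZipFile(new_war, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            base = item.filename.rsplit("/", 1)[-1]
            is_old_lucene = (
                item.filename.startswith("WEB-INF/lib/")  # assumed jar location
                and base.startswith("lucene-")
                and base.endswith(".jar")
            )
            if not is_old_lucene:
                # Copy every other entry through unchanged.
                dst.writestr(item, src.read(item.filename))
        for jar in new_jar_paths:
            # Add the replacement jars under the same lib directory.
            dst.write(jar, "WEB-INF/lib/" + os.path.basename(jar))
```

Write the result to a new file name first, then move it into jetty/webapps as solr.war once it looks right, so the original war is still around if something goes wrong.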

So, to summarize:

1.  java importer:  index.sh line endings are not in Unix/Linux format.
(thanks Jonathan!)
2.  java importer:  sample file name in GettingStarted.txt  has a typo  
(thanks Jonathan!)
3.  java importer:  sample data has a record with multiple 020
subfield a values.  To get the data to index cleanly, you must change
a line in the blacklight.properties file:

from

field_list_25a = isbn_display, all, 020a
to
field_list_25a = isbn_display, first, 020a

4. blacklight itself:  solr.war in the jetty/webapps directory has an  
older version of lucene jars which is incompatible with the lucene  
jars in the marc importer.  The jars need to be the same.  I changed  
the jars in the solr.war by unpacking the war, substituting the new  
lucene jar files, then repacking the war and putting the new solr.war  
in jetty/webapps.  It might also work to just put the lucene jars from  
the solr.war into the java importer lib, replacing the ones that are  
there.
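
For item 1, a minimal sketch of normalizing the line endings so the script runs on Linux. I am assuming the problem is Windows-style CRLF endings, which I have not confirmed for index.sh; to_unix_line_endings is a hypothetical helper name.

```python
def to_unix_line_endings(path):
    """Rewrite the file in place with LF line endings; return True if it changed."""
    with open(path, "rb") as f:
        data = f.read()
    # Assumes CRLF (Windows) endings; lone CR endings are not handled here.
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != data
```

Running to_unix_line_endings("index.sh") once should be enough; a second call returns False because nothing is left to convert.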

Jonathan -- would you like to test these fixes on your system?

hope this helps!
- Naomi


On May 2, 2008, at 9:20 AM, Naomi Dushay wrote:

> Hi Matt,
>
> I got my blacklight code from the .tar.gz download file, not  
> directly from svn.   Do I need to pull some updates from the trunk?
>
> - Naomi
>
>
> On May 1, 2008, at 6:04 PM, Matt Mitchell wrote:
>> Hi Naomi,
>>
>> Thanks for the detailed report! The _display fields should be  
>> multiValued and the current schema is here:
>>
>> http://blacklight.rubyforge.org/svn/trunk/rails/solr/conf/schema.xml
>>
>> Are you using the trunk version or the last release?
>>
>> Matt
>>
>> On Thu, May 1, 2008 at 8:50 PM, Naomi Dushay <ndushay at stanford.edu>  
>> wrote:
>> I tried the importer just now (just pulled it from svn, too.) and  
>> hit a few bumps also.  I concur with the index.sh problems ... I  
>> just ended up executing the java command directly from the command  
>> line.
>>
>> I believe the sample data has a record with two 020 subfield a  
>> values.  From the output of the importer on the sample file:
>>
>> Adding record 8: u89
>> Error indexing
>> org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued field isbn_display: first='0877663637' second='0877663343 (pbk.)'
>> 	at org.apache.solr.update.DocumentBuilder.addSingleField(DocumentBuilder.java:67)
>> 	at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:88)
>> 	at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:118)
>> 	at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:101)
>> 	at SolrIndexer.addField(Unknown Source)
>> 	at SolrIndexer.addFields(Unknown Source)
>> 	at SolrIndexer.indexRecord(Unknown Source)
>> 	at MarcImporter.addToIndex(Unknown Source)
>> 	at MarcImporter.importRecords(Unknown Source)
>> 	at MarcImporter.main(Unknown Source)
>> Adding record 9: u144
>>
>> The SOLR schema.xml file in the blacklight/solr-home/conf directory
>> says that all *_display fields are NOT multiValued.
>>
>> To get the sample data to index without an error, it's just a  
>> matter of changing one line in the blacklight.properties file:
>>
>> from:
>> field_list_25a = isbn_display, all, 020a
>>
>> to
>> field_list_25a = isbn_display, first, 020a
>>
>> HOWEVER, I am still getting a
>>
>> java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -4
>>
>> when trying to view the index with solr (http://yer.path:8983/solr/admin).
>>
>> But I can look at the same index with luke without a problem.
>>
>>
>> - Naomi Dushay
>> Stanford University Libraries
>> ndushay at stanford.edu
>>
>>
>>
>>
>> _______________________________________________
>> Blacklight-development mailing list
>> Blacklight-development at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/blacklight-development
>>
>
> Naomi Dushay
> ndushay at stanford.edu
>

Naomi Dushay
ndushay at stanford.edu


