[Blacklight-development] solr schema; facetting vs searching

Erik Hatcher erikhatcher at mac.com
Thu Mar 13 21:24:27 EDT 2008


On Mar 13, 2008, at 11:31 AM, Jonathan Rochkind wrote:
> I expected Luke would give me a way to look through all the indexes
> and tell me the characteristics/settings of each one, but not quite.
>
> It does let me look at each document, and how it's indexed. I  
> expected the index called "text" would have all searchable words  
> for each document? But when I look at a document in luke, 'text' is  
> empty for all of them. "<not present or not stored>".

Just a terminology adjustment here.  In Lucene parlance, "text" is
considered a _field_.  An index is the entire directory created by
Lucene (Solr's data/index directory).

Empty is not quite true.  The text field is indexed, but not stored.   
This is because it is a purely aggregate field of all other stored  
fields.  In Luke's document tab view, click the "Reconstruct & Edit"  
button, then look at the text field and the inner "Tokenized" tab to  
see the terms that were indexed for that particular field.
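To make the "indexed but not stored" distinction concrete, here is a toy Python model. It is not Lucene's real data structure. It only shows how a field can put terms into the inverted index while keeping no retrievable value, which is why a document viewer reports nothing for it. The field name title_text and the lowercase/whitespace tokenization are hypothetical simplifications.

```python
# Toy model, not Lucene's actual data structures: a field can contribute
# terms to the inverted index without its original value being stored.

def index_document(doc_id, fields, stored_fields, index, store):
    """Add one document to a toy inverted index and stored-value table.

    fields:        {field_name: value}
    stored_fields: names whose original values are kept for display
    index:         {(field, term): set(doc_ids)}, updated in place
    store:         {doc_id: {field: value}}, updated in place
    """
    store[doc_id] = {}
    for name, value in fields.items():
        # Every field is indexed: lowercase whitespace tokens become terms.
        for term in value.lower().split():
            index.setdefault((name, term), set()).add(doc_id)
        # Only stored fields keep their original value for retrieval.
        if name in stored_fields:
            store[doc_id][name] = value


index, store = {}, {}
index_document(
    0,
    # "title_text" is a hypothetical stored field; "text" is the aggregate.
    {"title_text": "Digital Media Primer", "text": "Digital Media Primer"},
    stored_fields={"title_text"},
    index=index,
    store=store,
)

print(("text", "digital") in index)  # True: the term is searchable
print(store[0].get("text"))          # None: no stored value to display
```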

Or, from Solr, hit http://localhost:8983/solr/admin/luke?fl=text to  
get stats on that field and top terms, etc.
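If you'd rather script that than use a browser, a minimal Python sketch of building and fetching the Luke handler URL follows. It assumes the default localhost:8983 base URL from above, that your Solr version accepts the numTerms and wt=json parameters on /admin/luke, and that a server is actually running when you call fetch_luke_report; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_field_url(base_url, field, num_terms=10):
    """Build a Luke request handler URL limited to one field.

    fl restricts the report to `field`; numTerms sets how many top terms
    are listed; wt=json asks for a JSON response (assumed supported).
    """
    params = {"fl": field, "numTerms": num_terms, "wt": "json"}
    return base_url.rstrip("/") + "/admin/luke?" + urlencode(params)


def fetch_luke_report(base_url, field, num_terms=10):
    """Fetch and parse the report; requires a running Solr instance."""
    with urlopen(luke_field_url(base_url, field, num_terms)) as resp:
        return json.load(resp)


print(luke_field_url("http://localhost:8983/solr", "text"))
# http://localhost:8983/solr/admin/luke?fl=text&numTerms=10&wt=json
```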

> In general, what I want to figure out next is where those "_text"  
> fields are coming from.

Look at scripts/virgo_marc_map.rb, search for _text in there.  That's
where the fields come from.

> I could probably hack the indexer and/or the solr schema so that  
> every single *_facet field automatically got a corresponding  
> searchable *_index field, yes?

To be clear, the *_facet fields are _searchable_; they just aren't
tokenized.  So you can search for format_facet:"Digital Media" as an
exact term match.  But you wouldn't find it by searching for just  
"digital" or "media" (on that exact field).  *_facet fields are,  
however, copied into "text", which is of course tokenized and  
searchable on individual words.
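A toy Python comparison of the two behaviours: an untokenized field turns the whole value into one term, while a tokenized field produces one term per word. Real Solr analysis chains are configurable; the lowercase-and-split tokenizer here is a simplifying assumption, and query terms are assumed to be analyzed the same way.

```python
# Toy comparison of how an untokenized field (like *_facet) and a
# tokenized field (like "text") turn the same value into indexed terms.

def facet_terms(value):
    # Untokenized: the entire value is a single, case-sensitive term.
    return {value}


def text_terms(value):
    # Tokenized (simplified): lowercase, then split on whitespace.
    return set(value.lower().split())


value = "Digital Media"
print("Digital Media" in facet_terms(value))  # True: exact term match
print("digital" in facet_terms(value))        # False: no single-word term
print("digital" in text_terms(value))         # True: word-level match
```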

But you could <copyField source="*_facet" dest="*_text"/> if you'd like
to have facet fields also individually searchable by words using the  
*_text version.
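Roughly, that copyField rule does the following at index time for each document; it is also what an indexer-side hack would have to do by hand. The sketch below is a toy Python model, not Solr code. It assumes a matching *_text dynamic field is declared in the schema so the copied fields are accepted, handles only leading-'*' patterns, and overwrites rather than appends when the destination already exists (Solr would add another value). The function name and sample fields are hypothetical.

```python
def apply_glob_copy_field(doc, source="*_facet", dest="*_text"):
    """Toy model of a glob-to-glob copy rule such as
    <copyField source="*_facet" dest="*_text"/>.

    Every field whose name matches `source` has its value copied into the
    field formed by substituting the same '*' part into `dest`.
    Simplifications: only leading-'*' patterns are handled, and an existing
    dest value is overwritten rather than added as an extra value.
    """
    if not (source.startswith("*") and dest.startswith("*") and len(source) > 1):
        raise ValueError("this sketch only handles '*suffix' patterns")
    src_suffix, dst_suffix = source[1:], dest[1:]
    copies = {}
    for name, value in doc.items():
        if name.endswith(src_suffix):
            stem = name[: -len(src_suffix)]  # the part matched by '*'
            copies[stem + dst_suffix] = value
    # Return a new dict; the input document is left unchanged.
    return {**doc, **copies}


doc = {"format_facet": "Digital Media", "title_text": "Primer"}
print(apply_glob_copy_field(doc))
# {'format_facet': 'Digital Media', 'title_text': 'Primer', 'format_text': 'Digital Media'}
```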

> Any hints as to where I'd do this? (Not sure if I'd do it in the  
> indexer code, the solr schema file, or both).

Either would work, but Solr's <copyField> handles this scenario.

	Erik



More information about the Blacklight-development mailing list