[Blacklight-development] solr schema; facetting vs searching
Erik Hatcher
erikhatcher at mac.com
Thu Mar 13 21:24:27 EDT 2008
On Mar 13, 2008, at 11:31 AM, Jonathan Rochkind wrote:
> I expected luke would give me a way to look through all the indexes
> and tell me the characteristics/settings of each one, but not quite.
>
> It does let me look at each document, and how it's indexed. I
> expected the index called "text" would have all searchable words
> for each document? But when I look at a document in luke, 'text' is
> empty for all of them. "<not present or not stored>".
Just a terminology adjustment here. In Lucene parlance, "text" is
considered a _field_. An index is the entire directory created by
Lucene (Solr's data/index directory).
Empty is not quite true. The text field is indexed, but not stored.
This is because it is a purely aggregate field of all other stored
fields. In Luke's document tab view, click the "Reconstruct & Edit"
button, then look at the text field and the inner "Tokenized" tab to
see the terms that were indexed for that particular field.
Or, from Solr, hit http://localhost:8983/solr/admin/luke?fl=text to
get stats on that field and top terms, etc.
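For reference, an indexed-but-not-stored field is typically declared along these lines in Solr's schema.xml. This is only a sketch: the type name "text" and the multiValued setting are assumptions, and the actual Blacklight schema may differ.

```xml
<!-- Sketch only; attribute values are assumed, not taken from the Blacklight schema. -->
<!-- indexed="true": terms are written to the index, so the field is searchable.     -->
<!-- stored="false": the original value is not kept, so Luke's document view shows    -->
<!-- "<not present or not stored>" even though terms exist for the field.             -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
```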
> In general, what I want to figure out next is where those "_text"
> fields are coming from.
Look at scripts/virgo_marc_map.rb, search for _text in there. That's
where the fields come from.
> I could probably hack the indexer and/or the solr schema so that
> every single *_facet field automatically got a corresponding
> searchable *_index field, yes?
To be clear, the *_facet fields are _searchable_; they just aren't
tokenized. So you can search for format_facet:"Digital Media" as an
exact term match. But you wouldn't find it by searching for just
"digital" or "media" (on that exact field). *_facet fields are,
however, copied into "text", which is of course tokenized and
searchable on individual words.
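The difference comes down to the field type. As a rough illustration, assuming type names "string" and "text" and a deliberately simple analyzer chain (the real schema likely uses a richer one):

```xml
<!-- Sketch only; type names and the analyzer chain are assumptions. -->

<!-- Untokenized: the whole value "Digital Media" is indexed as a single term, -->
<!-- so only an exact match on the full value will hit.                        -->
<fieldType name="string" class="solr.StrField"/>

<!-- Tokenized: the value is split on whitespace and lowercased, so a query -->
<!-- for "digital" or "media" alone can match.                              -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```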
But you could add <copyField source="*_facet" dest="*_text"/> if you'd
like to have facet fields also individually searchable by words using
the *_text version.
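In schema.xml that might look roughly like the following. The dynamicField declarations are assumptions about how *_facet and *_text are defined (check the existing schema before adding anything), and a wildcard dest needs a matching dynamicField to land in.

```xml
<!-- Sketch only; the dynamicField attributes below are assumed. -->
<fields>
  <!-- Untokenized facet fields, matched as exact terms. -->
  <dynamicField name="*_facet" type="string" indexed="true" stored="true" multiValued="true"/>
  <!-- Tokenized counterparts, searchable word by word. -->
  <dynamicField name="*_text" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- Copies e.g. format_facet into format_text at index time. -->
<copyField source="*_facet" dest="*_text"/>
```

With something like this in place, a query such as format_text:digital should match documents whose format_facet is "Digital Media", after a reindex.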
> Any hints as to where I'd do this? (Not sure if I'd do it in the
> indexer code, the solr schema file, or both).
Either would work, but Solr's <copyField> handles this scenario.
Erik