[Brug-talk] Ferret - sphinx - Solr
Peter Vandenabeele
peter at vandenabeele.com
Wed Jan 23 19:28:25 EST 2008
On Jan 23, 2008 11:19 PM, Peter Vandenabeele <peter at vandenabeele.com> wrote:
> 2008/1/21 Peter De Berdt (10-forward) <peter at 10-forward.be>:
> > Sadly, sphinx is out of the question, it doesn't support wildcard searches
> > as our customers are used to apparently, searches for "pet*" don't return
> > "Peter" records for example. I may have missed something, but this is a
> > feature we can't live without.
Test set: 6500 records with 3 fields, approx. 800 bytes/record (4.8 MBytes in total),
using a few bash and Linux manuals as source text.
Indexing without a prefix takes 0.4 seconds.
Indexing with a prefix of e.g. 5 characters takes 1.0 seconds
(the prefix is used to index "partial" words efficiently).
No stemming is used (so only "exact" matches will work).
I am still looking for a stemmer for the Dutch language.
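To illustrate why indexing with prefixes takes longer, here is a minimal Python sketch of an index that stores, next to each full word, every prefix of at least a given length. This is only a model of the idea, not how Sphinx implements it; the function name `build_prefix_index`, the `min_prefix_len` parameter and the sample documents are made up for the illustration.

```python
from collections import defaultdict


def build_prefix_index(docs, min_prefix_len=0):
    """Map each word (and, optionally, each of its prefixes) to document ids.

    min_prefix_len=0 indexes whole words only; a positive value also
    indexes every proper prefix of at least that many characters.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
            if min_prefix_len > 0:
                # Proper prefixes only: the full word was added above.
                for end in range(min_prefix_len, len(word)):
                    index[word[:end]].add(doc_id)
    return index


# Hypothetical sample data, not the 6500-record test set.
docs = {1: "bash scripting manual", 2: "shell scripts and scripting"}

plain = build_prefix_index(docs)
prefixed = build_prefix_index(docs, min_prefix_len=4)

# The prefixed index holds more keys, hence more work at indexing time.
print(len(plain), len(prefixed))
```

The extra keys are what make partial-word lookups cheap at search time, at the cost of a larger index and a slower indexing run.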
Search results with and without a trailing star:
sphinx-0.9.8/test$ /usr/local/bin/search "scripting" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'el_index1': query 'scripting ': returned 82 matches of 82 total in 0.000 sec
displaying matches:
1. document=2, weight=2
id=2
sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scriptin" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'el_index1': query 'scriptin ': returned 0 matches of 0 total in 0.000 sec
words:
1. 'scriptin': 0 documents, 0 hits
==> zero matches, since the last letter was removed and no star was added (only exact words match)
peterv@debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scriptin*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'el_index1': query 'scriptin* ': returned 82 matches of 82 total in 0.000 sec
displaying matches:
1. document=2, weight=2
id=2
==> with the star, exactly the same matches are found again
peterv@debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scripti*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'el_index1': query 'scripti* ': returned 82 matches of 82 total in 0.000 sec
displaying matches:
1. document=2, weight=2
id=2
peterv@debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "script*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'el_index1': query 'script* ': returned 1000 matches of 1123 total in 0.000 sec
displaying matches:
1. document=49, weight=3
id=49
peterv@debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scri*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'el_index1': query 'scri* ': returned 1000 matches of 1123 total in 0.000 sec
displaying matches:
1. document=49, weight=3
id=49
peterv@debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scr*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'el_index1': query 'scr* ': returned 0 matches of 0 total in 0.000 sec
words:
1. 'scr*': 0 documents, 0 hits
==> if the number of letters before the star gets too small, there are no matches.
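The results above fit a simple model: a query without a star is matched as a whole word, a query with a trailing star is matched as a prefix, and prefixes below some minimum length match nothing. The Python sketch below models that behavior. It is an illustration only, not Sphinx code; the `search` function and the sample data are hypothetical, and the minimum length of 4 is an assumption picked to agree with the output shown ("scri*" matches, "scr*" does not).

```python
def search(words_by_doc, query, min_prefix_len=4):
    """Return ids of documents matching a single-term query.

    A trailing '*' requests a prefix match; otherwise the whole word
    must match exactly. Prefixes shorter than min_prefix_len match
    nothing, mimicking an index that never stored them.
    """
    term = query.strip().lower()
    if term.endswith("*"):
        prefix = term[:-1]
        if len(prefix) < min_prefix_len:
            return set()
        return {doc_id for doc_id, words in words_by_doc.items()
                if any(w.startswith(prefix) for w in words)}
    return {doc_id for doc_id, words in words_by_doc.items() if term in words}


# Hypothetical sample data, not the 6500-record test set.
words_by_doc = {
    2: {"bash", "scripting"},
    49: {"shell", "script", "scripts"},
}

for q in ["scripting", "scriptin", "scriptin*", "script*", "scr*"]:
    print(q, sorted(search(words_by_doc, q)))
```

In this model "scriptin" finds nothing while "scriptin*" finds the same document as "scripting", which is the pattern seen in the transcripts.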
HTH,
Peter