From toastkid.williams at gmail.com  Thu Apr  9 07:45:37 2009
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 9 Apr 2009 13:45:37 +0200
Subject: [Ferret-talk] Weird analyzer issue with the word 'fly'
Message-ID: 

Hi all

I'm using acts_as_ferret (a_a_f) in rails with a StemmingAnalyzer, in the
index and in my search. I got the idea from this topic:
http://www.ruby-forum.com/topic/80178

I'm having a problem with some search terms - i narrowed one of them
down to the inclusion of the word 'fly'. Can anyone give me any clues
as to what might be happening, or even how i can investigate?

My index is set up like this:

  acts_as_ferret({ :store_class_name => true,
                   :analyzer => Ferret::Analysis::StemmingAnalyzer.new,
                   :fields => {:name => { :boost => 2.0 },
                               ...
                   }})

And this analyzer is defined in a module thus:

  module Ferret::Analysis
    class StemmingAnalyzer
      def token_stream(field, text)
        StemFilter.new(StandardTokenizer.new(text))
      end
    end
  end

Now, here's a search without using the analyzer:

  >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000).size
  => 14

And with the analyzer:

  >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000,
       :analyzer => Ferret::Analysis::StemmingAnalyzer.new).size
  => 0

Now, for other searches, the analyzer seems to be doing its job nicely.
E.g. i have lots of resources with the word 'brass'. With the analyzer, a
search for 'brasses' brings all these resources back, while without the
analyzer i don't get any of them: that's all fine, it's working out that
'brasses' and 'brass' are equivalent searches.

So what's going on with the word 'fly'? It's definitely this word,
because if i change one of the "flea fly" resources to be called
"flea walk" then a search for 'flea walk' brings it back, as does a
search for 'flea walks'.

I'm guessing that the analyzer takes a word and converts it into other
terms, or some symbols or something, and searches with that combined set,
and during this process the original word 'fly' gets lost somewhere.
But i don't know where to look to monitor this process.

Any help/advice/clues very welcome...
thanks
max
-- 
Posted via http://www.ruby-forum.com/.

From toastkid.williams at gmail.com  Thu Apr  9 08:13:28 2009
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 9 Apr 2009 14:13:28 +0200
Subject: [Ferret-talk] Weird analyzer issue with the word 'fly'
In-Reply-To: 
References: 
Message-ID: 

Just a bit more info - i started to look at what's going on in the
analyzer by putting a bit of logging in:

  module Ferret::Analysis
    class StemmingAnalyzer
      def token_stream(field, text)
        RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field}, text = #{text}"
        StemFilter.new(StandardTokenizer.new(text))
      end
    end
  end

And i see these results for a single search on "flea fly":

  SEARCHING, field = property_ancestor_names, text = flea
  SEARCHING, field = description, text = flea
  SEARCHING, field = name, text = flea
  SEARCHING, field = keyword_string, text = flea
  SEARCHING, field = property_ids_string, text = flea
  SEARCHING, field = property_names, text = flea
  SEARCHING, field = unaccented_name, text = flea
  SEARCHING, field = property_titles, text = flea
  SEARCHING, field = resource_id, text = flea

One call to token_stream for each of my indexed methods, but each one
only gets the first word of the search! Now i'm even more confused...
-- 
Posted via http://www.ruby-forum.com/.

From jk at jkraemer.net  Thu Apr  9 08:40:21 2009
From: jk at jkraemer.net (Jens Kraemer)
Date: Thu, 9 Apr 2009 14:40:21 +0200
Subject: [Ferret-talk] Weird analyzer issue with the word 'fly'
In-Reply-To: 
References: 
Message-ID: <355732A6-75A4-4889-A3F3-C7BE37FE05D8@jkraemer.net>

Hi Max!

On 09.04.2009, at 13:45, Max Williams wrote:
>
> I'm having a problem with some search terms - i narrowed one of them
> down to the inclusion of the word 'fly'. Can anyone give me any clues
> as to what might be happening, or even how i can investigate?

First of all I'd have a look at what the analyzer does to your query
terms:

  ts = StemmingAnalyzer.new.token_stream nil, 'flea fly'
  while token = ts.next
    puts token
  end

For some reason the word 'fly' is turned into 'fli' by the analyzer. But
that's ok, as long as it works the same way at indexing time. Next, use
the ferret_browser tool to inspect your index and check whether the term
'fli' really appears in your index.

I doubt it does, because if that were the case everything would work as
expected. So I guess we have a problem with the analysis at indexing
time.

> My index is set up like this:
>
> acts_as_ferret({ :store_class_name => true,
>                  :analyzer => Ferret::Analysis::StemmingAnalyzer.new,
>                  :fields => {:name => { :boost => 2.0 },
>                              ...
>                  }})

Now that I look at this a second time, the problem seems quite
obvious :-)

The analyzer option needs to be given as part of a separate ferret
options hash, like this:

  acts_as_ferret :store_class_name => true,
                 :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                 :fields => { ... }

Rebuild your index and everything should be working as expected.

Cheers,
Jens

-- 
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 194 bytes
Desc: This is a digitally signed message part
URL: 

From toastkid.williams at gmail.com  Thu Apr  9 10:34:31 2009
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 9 Apr 2009 16:34:31 +0200
Subject: [Ferret-talk] ferret finds 'tests' but not 'test'
In-Reply-To: 
References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com>
 <490ff92ace22fc678e620105f75bc5b3@ruby-forum.com>
 <6EABC590-396E-4CB6-A289-56E7D4CB970B@gmx.net>
 <99502401-110C-40D0-8B23-918A040EA6E3@gmx.net>
Message-ID: 

This is just a postscript correction for this thread, in case anyone else
browses to it (like i did) and gets sent down the slightly wrong track.

If you're going to include the :analyzer option in your call to
acts_as_ferret, then it needs to live inside another option hash called
:ferret. E.g., some of the examples above say to do this:

  acts_as_ferret :fields=> ['short_description'],
                 :analyzer => Ferret::Analysis::MyAnalyzer.new

This won't work - it needs to be like this:

  acts_as_ferret :fields=> ['short_description'],
                 :ferret => {:analyzer => Ferret::Analysis::MyAnalyzer.new}

Thanks to Jens for setting me straight on this :)
-- 
Posted via http://www.ruby-forum.com/.

From toastkid.williams at gmail.com  Thu Apr  9 10:29:18 2009
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 9 Apr 2009 15:29:18 +0100
Subject: [Ferret-talk] Weird analyzer issue with the word 'fly'
In-Reply-To: <355732A6-75A4-4889-A3F3-C7BE37FE05D8@jkraemer.net>
References: <355732A6-75A4-4889-A3F3-C7BE37FE05D8@jkraemer.net>
Message-ID: 

2009/4/9 Jens Kraemer

> Hi Max!

Hi Jens, thanks for responding so quickly.

> For some reason the word 'fly' is turned into 'fli' by the analyzer.

Indeed it is:

  >> ts = Ferret::Analysis::StemmingAnalyzer.new.token_stream nil, 'flea fly'
  => #
  >> while token = ts.next
  >>   puts token
  >> end
  token["flea":0:4:1]
  token["fli":5:8:1]

> But that's ok, as long as it works the same way at indexing time.
> Next use the ferret_browser tool to inspect your index and check whether
> the term 'fli' really appears in your index

I've not seen this tool before, it sounds useful - would you mind
pointing me at some docs for it? I can find the class in the ferret rdoc
but there's no explanation for it as far as i can see.

> acts_as_ferret :store_class_name => true,
>                :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
>                :fields => { ... }
>
> rebuild your index and everything should be working as expected.

It is indeed! Thanks very much Jens, i really appreciate the support.

Hope you have a great easter weekend!

cheers
max

From jk at jkraemer.net  Thu Apr  9 15:20:23 2009
From: jk at jkraemer.net (Jens Kraemer)
Date: Thu, 9 Apr 2009 21:20:23 +0200
Subject: [Ferret-talk] Weird analyzer issue with the word 'fly'
In-Reply-To: 
References: <355732A6-75A4-4889-A3F3-C7BE37FE05D8@jkraemer.net>
Message-ID: <406478FA-4636-4B4A-A9A5-0D711904210E@jkraemer.net>

Hi!

On 09.04.2009, at 16:29, Max Williams wrote:
[..]
>
> I've not seen this tool before, it sounds useful - would you mind
> pointing me at some docs for it? I can find the class in the
> ferret rdoc but there's no explanation for it as far as i can see.

ferret_browser is a standalone web application that gets installed along
with ferret. Just run it with

  ferret_browser path/to/index

and point your browser to the URL shown in the output. It should be
pretty self-explanatory from there.

> > acts_as_ferret :store_class_name => true,
> >                :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
> >                :fields => { ... }
> >
> > rebuild your index and everything should be working as expected.
>
> It is indeed! Thanks very much Jens, i really appreciate the
> support.
>
> Hope you have a great easter weekend!

Thank you, and the same to you!

Cheers,
Jens
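
Editorial aside: the failure mode discussed in this thread (query terms
are stemmed, but the terms stored in the index are not) can be reproduced
without Ferret at all. What follows is only a toy sketch in plain Ruby. It
assumes a made-up stemming rule that rewrites a trailing 'y' to 'i', chosen
to mimic the 'fly' -> 'fli' result shown above. The helpers
toy_stem_tokens, toy_plain_tokens, build_index and search are hypothetical
names for illustration and are not part of Ferret or acts_as_ferret.

```ruby
# Toy illustration only: nothing here is Ferret or acts_as_ferret API.

# Hypothetical stand-in for a stemming analyzer: lowercases, splits on
# whitespace, and rewrites a trailing "y" to "i" ('fly' -> 'fli').
def toy_stem_tokens(text)
  text.downcase.split.map { |word| word.sub(/y\z/, "i") }
end

# Hypothetical stand-in for analysis without stemming.
def toy_plain_tokens(text)
  text.downcase.split
end

# Build a tiny inverted index: term => list of document ids.
def build_index(docs, analyzer)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    send(analyzer, text).uniq.each { |term| index[term] << id }
  end
  index
end

# Return the ids of documents that contain every analyzed query term.
def search(index, query, analyzer)
  terms = send(analyzer, query)
  terms.map { |term| index.fetch(term, []) }.reduce { |a, b| a & b } || []
end

docs = { 1 => "Flea Fly", 2 => "Flea Walk" }

# Index built WITHOUT stemming, as if the analyzer option were ignored.
plain_index = build_index(docs, :toy_plain_tokens)
# Index built WITH the same stemming that is applied to queries.
stemmed_index = build_index(docs, :toy_stem_tokens)

mismatch = search(plain_index, "flea fly", :toy_stem_tokens)
matched  = search(stemmed_index, "flea fly", :toy_stem_tokens)

puts "plain index, stemmed query:   #{mismatch.inspect}"
puts "stemmed index, stemmed query: #{matched.inspect}"
```

In this toy, the mismatched pair finds nothing for "flea fly" because the
query asks for 'fli' while the index only holds 'fly', whereas "flea walk"
still matches since the toy rule leaves both words unchanged. That has the
same shape as the symptoms reported above, and it is consistent with Jens's
guess that the analyzer was not being applied at indexing time until it was
moved into the :ferret options hash.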