From dbalmain.ml at gmail.com Sat Jul 1 03:49:25 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 1 Jul 2006 16:49:25 +0900 Subject: [Ferret-talk] Substantial problems with write locking (and other flux) In-Reply-To: <2057f6a26dd78ef9355774349d1cf8af@ruby-forum.com> References: <8603da99552206db2a0720a2261abf55@ruby-forum.com> <2057f6a26dd78ef9355774349d1cf8af@ruby-forum.com> Message-ID: On 7/1/06, Julik wrote: > Julik wrote: > > > No, it throws within a unit test which runs inside a single process. I > > ain't even got to any real concurrency yet :-) > > And to be fair - the problem seems to be removed altogether by using > auto_flush. So now the get_field_names thing is the only one and then I > can hack further and switch an app to Ferret. I've released the Gem. I'm not sure how much has changed since the last release as it's been so long since I did any work on that branch, but it does have the get_field_names method and it raises a StandardError instead of Exception. But if I were you, I'd wait around for the next release of Ferret and possibly even join in the discussions on the API. There is going to be a 0.10.x string of releases which will break backwards compatibility with 0.9.x and earlier. The goal is to get out a 1.0 release with a stable API. I'll then come back and fix the bugs in 0.9.x for people who aren't able to change over, but I won't be supporting it for very long. Cheers, Dave From yossarian1 at gmail.com Sat Jul 1 21:58:20 2006 From: yossarian1 at gmail.com (Mike Michelson) Date: Sun, 2 Jul 2006 03:58:20 +0200 Subject: [Ferret-talk] Use of "&" kills rails -- redux Message-ID: <1bf347cbc427a6984f8abe5b0251094a@ruby-forum.com> I see someone else mentioned this below, but the answer was just "use &&", which isn't what I'm looking for. I'm using acts_as_ferret in my web app, but if the user inputs certain symbols, like &, it kills his/her browser. I'm running everything locally on OS X.
I had to reboot OS X because the crash was so bad. Is there a way to screen out all these user queries that will crash the system? -- Posted via http://www.ruby-forum.com/. From wborgon at gmail.com Sun Jul 2 16:29:05 2006 From: wborgon at gmail.com (Wolfgang Borgon) Date: Sun, 2 Jul 2006 22:29:05 +0200 Subject: [Ferret-talk] Multiple terms accross multipl fields and associated tables Message-ID: I'm looking for a good way to search a few fields across multiple associated tables (i.e. find 'friends and family' across Photo.name, Photo.description, and Tags.name where Photo has_many tags). And, ideally there's a competent query analyzer/parser. I've experimented with constructing my own SQL using ... LIKE %term1% ... etc, but the performance is poor -- queries take seconds or more depending on the number of terms. Will acts_as_ferret work for me? Any recommendations? Thanks! -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon Jul 3 02:55:00 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 3 Jul 2006 08:55:00 +0200 Subject: [Ferret-talk] acts_as_ferret rdoc In-Reply-To: <363e75235d2229acb19f84afb2b1dddf@ruby-forum.com> References: <363e75235d2229acb19f84afb2b1dddf@ruby-forum.com> Message-ID: <20060703065500.GW15787@cordoba.webit.de> On Sat, Jul 01, 2006 at 01:21:49AM +0200, ryan king wrote: > The wiki for acts_as_ferret claims that the rdoc is available at > http://projects.jkraemer.net/acts_as_ferret/rdoc, but that page 404s. Is > the rdoc up somewhere? d'uh, I'll fix that asap. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From JanPrill at blauton.de Mon Jul 3 03:50:42 2006 From: JanPrill at blauton.de (Jan Prill) Date: Mon, 3 Jul 2006 09:50:42 +0200 Subject: [Ferret-talk] Multiple terms accross multipl fields and associated tables In-Reply-To: References: Message-ID: <562a35c10607030050u5e0cb5b4mf63b09030214c392@mail.gmail.com> Hi Wolfgang, at least ferret will work perfectly well for you. Let your photo-model 'acts_as_ferret' and make acts_as_ferret index a method named e.g. 'all_tags' which returns a comma (or tab or whatever) separated string of your tags on this photo. Now you'll be able to construct a query as described in the rdocs of the QueryParser of ferret that includes your tags and should be very performant. Cheers, Jan On 7/2/06, Wolfgang Borgon wrote: > > I'm looking for a good way to search a few fields across multiple > associated tables (i.e. find 'friends and family' across Photo.name, > Photo.description, and Tags.name where Photo has_many tags). And, > ideally there's a competent query analyzer/parser. > > I've experimented with constructing my own SQL using ... LIKE %term1% > ... etc, but the performance is poor -- queries take seconds or more > depending on the number of terms. > > Will acts_as_ferret work for me? Any recommendations? > Thanks! > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060703/70069850/attachment.html From guest at guest.com Mon Jul 3 10:19:40 2006 From: guest at guest.com (Guest) Date: Mon, 3 Jul 2006 16:19:40 +0200 Subject: [Ferret-talk] Pagination with acts_as_ferret Message-ID: Hey there, Previous to finding ferret I had a test query setup to do pagination like: @listing_pages, @listings = paginate(:listing, :per_page => 10, :conditions => [" name LIKE ? ", "%" + search_criteria + "%"], :order_by => "name ASC") Can I do similar pagination with @results array returned from a ferret query? Thanks. :) -- Posted via http://www.ruby-forum.com/. From guest at guest.com Mon Jul 3 10:42:30 2006 From: guest at guest.com (Guest) Date: Mon, 3 Jul 2006 16:42:30 +0200 Subject: [Ferret-talk] Ferret not returning the right results Message-ID: I have ferret setup in my model with multiple fields, but when I do a search on the value that might be stored in two fields, I get no results. Here's an example: "Jim 12333" Where Jim is a name field, and 12333 is a zip code. I have this in my model: acts_as_ferret :fields => [ 'name', 'zip' ] I'm not sure what's up. Any help is appreciated. If I just search for Jim, or just 12333 it comes up. Is there a way to search like this? Thanks. -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon Jul 3 10:53:10 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 3 Jul 2006 16:53:10 +0200 Subject: [Ferret-talk] Ferret not returning the right results In-Reply-To: References: Message-ID: <20060703145310.GD15787@cordoba.webit.de> On Mon, Jul 03, 2006 at 04:42:30PM +0200, Guest wrote: > I have ferret setup in my model with multiple fields, but when I do a > search on the value that might be stored in two fields, I get no > results. > > Here's an example: > > "Jim 12333" > > Where Jim is a name field, and 12333 is a zip code. I have this in my > model: > > acts_as_ferret :fields => [ 'name', 'zip' ] > > I'm not sure what's up.
Any help is appreciated. > > If I just search for Jim, or just 12333 it comes up. Is there a way to > search like this? query terms are ANDed by default in acts_as_ferret. Does "Jim OR 12333" work? Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From guest at guest.com Mon Jul 3 11:33:15 2006 From: guest at guest.com (Guest) Date: Mon, 3 Jul 2006 17:33:15 +0200 Subject: [Ferret-talk] Ferret not returning the right results In-Reply-To: <20060703145310.GD15787@cordoba.webit.de> References: <20060703145310.GD15787@cordoba.webit.de> Message-ID: Cool, how do I change it to use OR? Thanks Jens. acts_as_ferret is awesome btw. Great work! Jens Kraemer wrote: > On Mon, Jul 03, 2006 at 04:42:30PM +0200, Guest wrote: >> >> acts_as_ferret :fields => [ 'name', 'zip' ] >> >> I'm not sure what's up. Any help is appreciated. >> >> If I just search for Jim, or just 12333 it comes up. Is there a way to >> search like this? > > > query terms are ANDed by default in acts_as_ferret. Does "Jim OR 12333" > work? > > Jens > > > -- > webit! Gesellschaft für neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de > Schnorrstraße 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 -- Posted via http://www.ruby-forum.com/. From mike78703 at gmail.com Mon Jul 3 14:24:21 2006 From: mike78703 at gmail.com (Mike b) Date: Mon, 3 Jul 2006 20:24:21 +0200 Subject: [Ferret-talk] Ferret on 64 bit Red Hat Message-ID: Can anyone verify that Ferret should work correctly on 64 bit Red Hat ES 4? The test suite gets the following segfault: ........................../unit/../unit/analysis/../../unit/index/../../unit/store/../../unit/search/tc_filter.rb:20: [BUG] Segmentation fault ruby 1.8.4 (2005-12-24) [x86_64-linux] This is with version "ferret-0.9.4" on an AMD Dual Opteron 270.
-- Posted via http://www.ruby-forum.com/. From ian-rubyforum at petersens.ca Mon Jul 3 23:05:06 2006 From: ian-rubyforum at petersens.ca (Ian) Date: Tue, 4 Jul 2006 05:05:06 +0200 Subject: [Ferret-talk] Ferret on 64 bit Red Hat In-Reply-To: References: Message-ID: I have never used Ferret, so this advice may be meaningless, but I got a similar segfault with Selenium on Rails combined with Rails 1.1.2 and Ruby 1.8.4 running on a Gentoo AMD64 box. For me, the fix was to upgrade Ruby from the 2005-12-24 release to the 2006-05-29 one. See http://bugs.gentoo.org/show_bug.cgi?id=136880 if you're interested in my original bug report. Ian -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon Jul 3 23:22:23 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 4 Jul 2006 12:22:23 +0900 Subject: [Ferret-talk] Ferret on 64 bit Red Hat In-Reply-To: References: Message-ID: On 7/4/06, Mike b wrote: > Can anyone verify that Ferret should work correctly on 64 bit Red Hat ES > 4? > > The test suite gets the following segfault: > > ........................../unit/../unit/analysis/../../unit/index/../../unit/store/../../unit/search/tc_filter.rb:20: > [BUG] Segmentation fault > ruby 1.8.4 (2005-12-24) [x86_64-linux] > > > This is with version "ferret-0.9.4" on an AMD Dual Opteron 270. Hi Mike, There are problems with the current version of Ferret on 64-bit machines. I've done a lot of work towards fixing this so hopefully the next version will help you. I still have quite a lot of work to do before I can release it (i.e. writing the binding code) but if you could possibly try out the C code, that would be a great help. If you have the time, try: svn co svn://www.davebalmain.com/exp ferret_experimental Then cd into the ferret_experimental/c directory and try running make. If everything is working correctly you should get it all to compile without any warnings. If there are any warnings or problems compiling, I'd love to know about them.
They'll go a long way in helping me get Ferret working on 64-bit systems. Cheers, Dave PS: Same goes for anyone who wants to try it out under VC6. From wborgon at gmail.com Tue Jul 4 01:08:21 2006 From: wborgon at gmail.com (Wolfgang Borgon) Date: Tue, 4 Jul 2006 07:08:21 +0200 Subject: [Ferret-talk] Ferret on 64 bit Red Hat In-Reply-To: References: Message-ID: FWIW, I had no trouble installing the Ferret GEM or the acts_as_ferret plugin. I haven't tested extensively, but the few queries I did try seemed to work -- I just have to figure out the query parser so I can get singular and plural forms, recognition of private data, etc. I'm running this on Fedora Core 4 x64 SMP. David Balmain wrote: > On 7/4/06, Mike b wrote: >> This is with version "ferret-0.9.4" on an AMD Dual Opteron 270. > Hi Mike, > > There are problems with the current version of Ferret on 64-bit > machines. I've done a lot of work towards fixing this so hopefully the > next version will help you. I still have quite a lot of work to do > before I can release it (i.e. writing the binding code) but if you could > possibly try out the C code, that would be a great help. If you have > the time, try: > > svn co svn://www.davebalmain.com/exp ferret_experimental > > Then cd into the ferret_experimental/c directory and try running make. > If everything is working correctly you should get it all to compile > without any warnings. If there are any warnings or problems compiling, > I'd love to know about them. They'll go a long way in helping me get > Ferret working on 64-bit systems. > > Cheers, > Dave > > PS: Same goes for anyone who wants to try it out under VC6. -- Posted via http://www.ruby-forum.com/.
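Jan's suggestion in the "Multiple terms" thread (index a method such as all_tags that returns a photo's tag names as one delimited string) can be sketched in plain Ruby. This is a minimal sketch under stated assumptions: Photo and Tag are plain stand-ins for the ActiveRecord models (in Rails, tags would come from has_many :tags), and the acts_as_ferret field list shown in the comment is assumed for illustration.

```ruby
# Plain-Ruby stand-ins for the Photo and Tag models (hypothetical).
# In a Rails app the model would instead declare something like
#   has_many :tags
#   acts_as_ferret :fields => ['name', 'description', 'all_tags']
# (field list assumed for illustration, following Jan's suggestion).
Tag = Struct.new(:name)

class Photo
  attr_reader :name, :description, :tags

  def initialize(name, description, tags)
    @name = name
    @description = description
    @tags = tags
  end

  # One delimited string of tag names, so the tags can be indexed
  # as a single ordinary field alongside name and description.
  def all_tags
    tags.map { |tag| tag.name }.join(", ")
  end
end

photo = Photo.new("Beach day", "Summer trip",
                  [Tag.new("friends"), Tag.new("family")])
puts photo.all_tags
```

With the tags flattened into one indexed field, a single Ferret query can match text in the name, description, or tags, instead of the LIKE %term% joins Wolfgang found slow.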
From wborgon at gmail.com Tue Jul 4 01:12:39 2006 From: wborgon at gmail.com (Wolfgang Borgon) Date: Tue, 4 Jul 2006 07:12:39 +0200 Subject: [Ferret-talk] Multiple terms accross multipl fields and associated tab In-Reply-To: <562a35c10607030050k41733701q3cbac306ad148ee8@mail.gmail.com> References: <562a35c10607030050k41733701q3cbac306ad148ee8@mail.gmail.com> Message-ID: <8625d1c55adac3aa45dd5a0eb0a624ee@ruby-forum.com> Jan, Thanks a lot... I saw a blog posting recommending the same thing after I posted, and it works like a charm. I like it: it's far faster and more confidence-inspiring than MySQL LIKE. For anyone else who sees this, the blog post I saw is: http://olivier.liquid-concept.ch/articles/2006/04/17/add-tags-in-a-ferret-index W Jan Prill wrote: > Hi Wolfgang, > > at least ferret will work perfectly well for you. > > Let your photo-model 'acts_as_ferret' and make acts_as_ferret index a > method > named e.g. 'all_tags' which returns a comma (or tab or whatever) separated > string of your tags on this photo. Now you'll be able to construct a > query > as described in the rdocs of the QueryParser of ferret that includes > your > tags and should be very performant. > > Cheers, > Jan -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue Jul 4 06:15:20 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 4 Jul 2006 12:15:20 +0200 Subject: [Ferret-talk] Ferret not returning the right results In-Reply-To: References: <20060703145310.GD15787@cordoba.webit.de> Message-ID: <20060704101520.GG15787@cordoba.webit.de> On Mon, Jul 03, 2006 at 05:33:15PM +0200, Guest wrote: > > Cool, how do I change it to use OR? acts_as_ferret( :fields => [ 'name', 'zip' ], :occur_default => Ferret::Search::BooleanClause::Occur::SHOULD ) btw: the API docs are online again, see http://projects.jkraemer.net/acts_as_ferret/rdoc/ ;-) > Thanks Jens. acts_as_ferret is awesome btw. Great work! thanks :-) Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From bk at benjaminkrause.com Tue Jul 4 09:19:34 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Tue, 4 Jul 2006 15:19:34 +0200 Subject: [Ferret-talk] Recalculating the score Message-ID: <078eb4f56a626fd541f98d842b7c388f@ruby-forum.com> Hey .. I'm using ferret to index various objects and I create a Ferret::Document for each of these objects. Indexing and searching is working fine. Each of these Ferret::Documents has a 'relevance' field, storing an integer that indicates how relevant this object is for the search. The 'relevance' is in the range of 1..10 Now I would like to multiply the relevance of the document with the score, and sort the results by that. e.g.: A document with a score of 0.82 and a relevance of 3 should have a final score of 2.46 I couldn't figure out how to do this .. I've read the 'Balancing relevancy and recentness' thread.. > score = yield( doc, score ) if block_given? > > This allows a block attached to a search call to adjust > document scores before documents are sorted, based on > some (possibly dynamic) numerical factors associated > with the document, e.g. the number and importance I guess this works for the pure Ruby implementation but won't work for the C implementation? > As long as Ferret does what Lucene does with boosts, you could scale > document boosts at indexing time by some factor related to age and > that will factor into scoring. Boost won't help me here; I've even set the boost value for relevance to 0.0, as it should not be part of the query.. Is there any way to recalculate the score? Thanks, Ben -- Posted via http://www.ruby-forum.com/.
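Since David Balmain confirms later in the thread that the C implementation does not support adjusting scores in a search block, one workaround is to post-process the hits in Ruby. This is a minimal sketch, assuming the hits have already been collected as [doc_id, score] pairs (for example from search_each) and that each document's stored relevance value is available in a Hash; both data sources are hypothetical stand-ins here.

```ruby
# Re-ranks search hits by score * relevance (relevance assumed in 1..10).
# `hits` stands in for [doc_id, score] pairs collected from a search;
# `relevance_of` stands in for the stored 'relevance' field values.
Hit = Struct.new(:doc_id, :score, :final_score)

def rerank(hits, relevance_of)
  rescored = hits.map do |doc_id, score|
    # Documents without a stored relevance keep their original score.
    Hit.new(doc_id, score, score * relevance_of.fetch(doc_id, 1))
  end
  # Highest combined score first.
  rescored.sort_by { |hit| -hit.final_score }
end

hits = [[10, 0.82], [11, 0.95], [12, 0.40]]
relevance = { 10 => 3, 11 => 1, 12 => 10 }

rerank(hits, relevance).each do |hit|
  printf("doc %d: %.2f\n", hit.doc_id, hit.final_score)
end
```

Document 10 ends up with 0.82 x 3 = 2.46, as in Ben's example, and ranks below document 12 (0.40 x 10). Because only the fetched hits are re-ranked, a document just outside the fetched window can never move up, so it is worth fetching more hits than will be displayed. David's alternative, boosting the whole document at index time, would instead act on scoring across all matches.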
From joost at swx.nl Wed Jul 5 02:59:53 2006 From: joost at swx.nl (Joost) Date: Wed, 5 Jul 2006 08:59:53 +0200 Subject: [Ferret-talk] Sorting results by column In-Reply-To: <1ab7c1cb86a668451c85963fd3d25211@ruby-forum.com> References: <1ab7c1cb86a668451c85963fd3d25211@ruby-forum.com> Message-ID: I installed acts_as_ferret just 2 seconds ago and I must say: It works splendidly!! The following query works (might help you towards a solution): YourModel.find_by_contents('id:"2"') # Returns entries from yourmodels table with id=2. YourModel.find_by_contents('name:"somename"') # Only searches the name field. etc..etc.. -- Posted via http://www.ruby-forum.com/. From joost at swx.nl Wed Jul 5 03:12:33 2006 From: joost at swx.nl (Joost) Date: Wed, 5 Jul 2006 09:12:33 +0200 Subject: [Ferret-talk] Sorting results by column In-Reply-To: References: <1ab7c1cb86a668451c85963fd3d25211@ruby-forum.com> Message-ID: <427bb537430bdb8e88db67763d540ea9@ruby-forum.com> This also works: ModelName.find_by_contents('id:( > 2) id:( < 5)') Also see the Ferret RDOC: http://gemjack.com/gems/ferret-0.9.3/index.html -- Posted via http://www.ruby-forum.com/. From chris.smoak at gmail.com Wed Jul 5 08:51:17 2006 From: chris.smoak at gmail.com (Chris) Date: Wed, 5 Jul 2006 14:51:17 +0200 Subject: [Ferret-talk] search speed eclipsed by retrieval speed Message-ID: <62ed587f57d89d91eb53be9a1ddbb92b@ruby-forum.com> Hi all, I've recently started working with Ferret and I'm getting what seems to be slow searches. I have about 10000 documents in the index, with several fields per document, with some fields having an array of several values that are indexed. I am using a RAMDirectory to store the index for searching. When doing testing, I find that searches are reasonable at around .2 to .5 seconds per search (for simple single word searches).
However, retrieving the documents from the index ends up taking well over 2 to 3 seconds, totally eclipsing the search time, and making the whole thing quite slow. Am I missing anything here? Will reducing the document size greatly affect the retrieval time of the documents? Any suggestions for general speed improvement? Thanks! Below, I have detailed the process I am using to create and search the index, in case that's useful: I have created an index that is stored on disk. I'd like to read it back into memory and use a RAMDirectory to see what speed improvements I can get by using that. Here's what I'm doing to create the index: ram_dir = Ferret::Store::RAMDirectory.new in_mem_index = Ferret::Index::IndexWriter.new(ram_dir, :create => true) # ... add stuff to the index in_mem_index.optimize in_mem_index.close index = Ferret::Index::Index.new(:dir => ram_dir) index.persist('path/to/index', true) index.close I use a RAMDirectory when initially writing to the index because I am writing a lot to the index and I assume writing directly to an FSDirectory will be slower. Later, I am trying to load this index back into memory as a RAMDirectory. I am not actually sure how to do this, so I am guessing here: ram_dir = Ferret::Store::RAMDirectory.new index = Ferret::Index::Index.new(:dir => ram_dir, :create => true) index.add_indexes(Ferret::Store::FSDirectory.new('path/to/index')) results = [] num_results = index.search_each('search word(s)', { :first_doc => 0, :num_docs => 50 }) do | doc, score | results << index[doc] end Any help would be awesome. Thanks! - chris -- Posted via http://www.ruby-forum.com/.
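David Balmain's reply later in this thread suggests storing only the record id in the index and loading everything else from the database. The following is a minimal sketch of the bookkeeping that approach needs. A Hash stands in for the database and load_records stands in for one batched query such as Model.find(ids); both are hypothetical. The ranking is restored explicitly because a batched WHERE id IN (...) lookup may return rows in its own order.

```ruby
# Hypothetical stand-ins: DB plays the database table, load_records plays a
# single batched query (e.g. "WHERE id IN (...)"), which may return rows
# in an order unrelated to the search ranking.
Record = Struct.new(:id, :title)

DB = {
  1 => Record.new(1, "First"),
  2 => Record.new(2, "Second"),
  3 => Record.new(3, "Third")
}

def load_records(ids)
  # Simulates database order by sorting on the primary key.
  DB.values_at(*ids).compact.sort_by { |record| record.id }
end

# ranked_ids: ids read from the small Ferret documents, best match first.
def records_in_rank_order(ranked_ids)
  by_id = {}
  load_records(ranked_ids).each { |record| by_id[record.id] = record }
  # Ids with no matching row (e.g. deleted since indexing) are dropped.
  ranked_ids.map { |id| by_id[id] }.compact
end

# Stored field values are assumed to come back as strings, hence to_i.
ranked_ids = ["3", "1", "2"].map { |s| s.to_i }
results = records_in_rank_order(ranked_ids)
puts results.map { |record| record.title }.join(", ")
```

Each Ferret document read then carries only one short string, and the database fetch is a single query rather than one per hit.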
From reverri at gmail.com Wed Jul 5 09:19:51 2006 From: reverri at gmail.com (Dan) Date: Wed, 5 Jul 2006 15:19:51 +0200 Subject: [Ferret-talk] Sorting results by column In-Reply-To: <1ab7c1cb86a668451c85963fd3d25211@ruby-forum.com> References: <1ab7c1cb86a668451c85963fd3d25211@ruby-forum.com> Message-ID: You should be able to sort the ferret results like this: @posts = Post.find_by_contents(params[:query], :sort => ["end_date"]) This post talked a little bit more about using ferret to sort results: http://lists.rubyonrails.org/pipermail/rails/2006-June/044513.html *Note: Whenever I tried to do this, ferret would only sort the first page of results. So if ferret returned 100 results and each page had 10 items, only the first 10 items would be sorted by the desired field. Of course I was probably doing something wrong. Let me know how this works out for you. -- Posted via http://www.ruby-forum.com/. From guest at guest.com Wed Jul 5 12:35:07 2006 From: guest at guest.com (guest) Date: Wed, 5 Jul 2006 18:35:07 +0200 Subject: [Ferret-talk] pagination in acts_as_ferret In-Reply-To: References: <20060503163044.GS29289@cordoba.webit.de> Message-ID: <94397f209e879c9d1ccbc43d99705d79@ruby-forum.com> Nice! This is very useful! What code did you snip out of the search method? Is that required? Thanks. Tom Davies wrote: > To add to what Jens said, you may find this code useful: > > In your model: > > def self.search(q, options = {}) > return nil if q.nil? > default_options = {:limit => 10, :page => 1} > options = default_options.merge options > options[:offset] = options[:limit] * (options[:page].to_i-1) > ... snip ... > num = INDEX.search_each(query, {:num_docs => options[:limit], > :first_doc => options[:offset]}) do |doc, score| > ... snip ... > [num, results] > end > > Notice that I return the total matches as num, plus the results. The > total matches is necessary to generate a paginator across all the > items.
> > For the pagination, I created this simple method in my application > controller (note it assumes a params[:page] being passed around): > > def pages_for(size, options = {}) > default_options = {:per_page => 10} > options = default_options.merge options > pages = Paginator.new self, size, options[:per_page], > (params[:page]||1) > pages > end > > And lastly, to use it in a controller: > @total, @results = YourModel.search(@query, :page => > (params[:page]||1)) > @result_pages = pages_for(@total) > > Tom > > On 5/3/06, Jens Kraemer wrote: >> > unless @query.blank? >> >> Ferret-talk mailing list >> Ferret-talk at rubyforge.org >> http://rubyforge.org/mailman/listinfo/ferret-talk >> > > > -- > Tom Davies > > http://blog.atomgiant.com > http://gifthat.com -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Jul 5 23:23:24 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 6 Jul 2006 12:23:24 +0900 Subject: [Ferret-talk] search speed eclipsed by retrieval speed In-Reply-To: <62ed587f57d89d91eb53be9a1ddbb92b@ruby-forum.com> References: <62ed587f57d89d91eb53be9a1ddbb92b@ruby-forum.com> Message-ID: On 7/5/06, Chris wrote: > Hi all, > > I've recently started working with Ferret and I'm getting what seems to > be slow searches. I have about 10000 documents in the index, with > several fields per document, with some fields having an array of several > values that are indexed. > > I am using a RAMDirectory to store the index for searching. When doing > testing, I find that searches are reasonable at around .2 to .5 seconds > per search (for simple single word searches). However, retrieving the > documents from the index ends up > taking well over 2 to 3 seconds, totally eclipsing the search time, and > making the whole thing quite slow. Am I missing anything here? Will > reducing the document size greatly affect the retrieval time of the > documents? Any suggestions for general speed improvement?
Thanks! > > Below, I have detailed the process I am using to create and search the > index, in case that's useful: > > I have created an index that is stored on disk. I'd like to read it back > into memory and use a RAMDirectory to see what speed improvements I can > get by using that. > > Here's what I'm doing to create the index: > > ram_dir = Ferret::Store::RAMDirectory.new > in_mem_index = Ferret::Index::IndexWriter.new(ram_dir, :create => > true) > > # ... add stuff to the index > > in_mem_index.optimize > in_mem_index.close > > index = Ferret::Index::Index.new(:dir => ram_dir) > index.persist('path/to/index', true) > index.close Hi Chris, This is currently the fastest way to create small indexes. In the next version of Ferret it won't make any difference though. Ferret will automatically try to create as much of the index in memory as possible. It's up to you to set the amount of memory that you want to use to create the index. But forget about that for now. I'll try and answer your question. > I use a RAMDirectory when initially writing to the index because I am > writing a lot to the index and I assume writing directly to an > FSDirectory will be slower. Yes, but not by a lot. > Later, I am trying to load this index back into memory as a > RAMDirectory. I am not actually sure how to do this, so I am guessing > here: > > ram_dir = Ferret::Store::RAMDirectory.new > index = Ferret::Index::Index.new(:dir => ram_dir, :create => true) > index.add_indexes(Ferret::Store::FSDirectory.new('path/to/index')) Better to do it like this: ram_dir = Ferret::Store::RAMDirectory.new(Ferret::Store::FSDirectory.new("path/to/index"), true) That reads an FSDirectory directly into a RAMDirectory. > results = [] > num_results = index.search_each('search word(s)', { :first_doc => 0, > :num_docs => 50 }) do | doc, score | > results << index[doc] > end > > > Any help would be awesome. Thanks! This all looks fine.
It depends on your exact situation, but if you are indexing data from a database it is usually a better idea to only store the id in the index. That way, when you load the document from the index, you are only loading one short string. You can then get any other data you need from the database. If your documents are large, Ferret needs to read the whole document into memory. I've added a lazy-loading document to Ferret which will speed things up a lot in the next version. It still seems very surprising to me that your queries are taking so long. Are you working on Windows? That would explain things a little. Cheers, Dave From dbalmain.ml at gmail.com Wed Jul 5 23:53:27 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 6 Jul 2006 12:53:27 +0900 Subject: [Ferret-talk] Recalculating the score In-Reply-To: <078eb4f56a626fd541f98d842b7c388f@ruby-forum.com> References: <078eb4f56a626fd541f98d842b7c388f@ruby-forum.com> Message-ID: On 7/4/06, Benjamin Krause wrote: > Hey .. > > I'm using ferret to index various objects and I create a > Ferret::Document for each of these objects. Indexing and searching is > working fine. > > Each of these Ferret::Documents has a 'relevance' field, storing an > integer that indicates how relevant this object is for the search. The 'relevance' is > in the range of 1..10 > > Now I would like to multiply the relevance of the document with the > score, and sort the results by that. > > e.g.: > A document with a score of 0.82 and a relevance of 3 should have a final > score of 2.46 > > I couldn't figure out how to do this .. > > I've read the 'Balancing relevancy and recentness' thread.. > > > score = yield( doc, score ) if block_given? > > > > This allows a block attached to a search call to adjust > > document scores before documents are sorted, based on > > some (possibly dynamic) numerical factors associated > > with the document, e.g.
the number and importance > > I guess this works for the pure Ruby implementation but won't work for > the C implementation? Hi Ben, You are right; this is only possible in the pure Ruby version. A more flexible framework for sorting will be coming in the future, but currently you can only sort by integer, float, string, doc_id, and relevance. > > As long as Ferret does what Lucene does with boosts, you could scale > > document boosts at indexing time by some factor related to age and > > that will factor into scoring. > > Boost won't help me here; I've even set the boost value for relevance to > 0.0, as it should not be part of the query.. > > Is there any way to recalculate the score? How about setting the boost for the whole document rather than just the :relevance field? Or do you sometimes want to sort by relevance without taking the :relevance field into account? Cheers, Dave PS: While we are on the topic, how would you like the sort API to look? Many have complained that the sort API is too Java-like but no-one has suggested any improvements yet. I'd love to see some ideas. From yossarian1 at gmail.com Thu Jul 6 07:05:56 2006 From: yossarian1 at gmail.com (Mike Michelson) Date: Thu, 6 Jul 2006 13:05:56 +0200 Subject: [Ferret-talk] pagination in acts_as_ferret In-Reply-To: References: <20060503163044.GS29289@cordoba.webit.de> Message-ID: <852ac175466ea686f046a2ed90d4316b@ruby-forum.com> I think the simplest way to attack this problem is just to write a paginator for an array of objects. Then you can just do your usual find_by_contents, with num_docs set to :all, and then just paginate the array that is returned. Tom Davies wrote: > To add to what Jens said, you may find this code useful: > > In your model: > > def self.search(q, options = {}) > return nil if q.nil? > default_options = {:limit => 10, :page => 1} > options = default_options.merge options > options[:offset] = options[:limit] * (options[:page].to_i-1) > ... snip ...
> num = INDEX.search_each(query, {:num_docs => options[:limit], > :first_doc => options[:offset]}) do |doc, score| > ... snip ... > [num, results] > end > > Notice that I return the total matches as num, plus the results. The > total matches is necessary to generate a paginator across all the > items. > > For the pagination, I created this simple method in my application > controller (note it assumes a params[:page] being passed around): > > def pages_for(size, options = {}) > default_options = {:per_page => 10} > options = default_options.merge options > pages = Paginator.new self, size, options[:per_page], > (params[:page]||1) > pages > end > > And lastly, to use it in a controller: > @total, @results = YourModel.search(@query, :page => > (params[:page]||1)) > @result_pages = pages_for(@total) > > Tom > > On 5/3/06, Jens Kraemer wrote: >> > unless @query.blank? >> >> Ferret-talk mailing list >> Ferret-talk at rubyforge.org >> http://rubyforge.org/mailman/listinfo/ferret-talk >> > > > -- > Tom Davies > > http://blog.atomgiant.com > http://gifthat.com -- Posted via http://www.ruby-forum.com/. From yossarian1 at gmail.com Thu Jul 6 07:19:19 2006 From: yossarian1 at gmail.com (Mike Michelson) Date: Thu, 6 Jul 2006 13:19:19 +0200 Subject: [Ferret-talk] pagination in acts_as_ferret In-Reply-To: <852ac175466ea686f046a2ed90d4316b@ruby-forum.com> References: <20060503163044.GS29289@cordoba.webit.de> <852ac175466ea686f046a2ed90d4316b@ruby-forum.com> Message-ID: <5c7da249d6c32c6288add10f824611b5@ruby-forum.com> I guess that's sort of what you did, but I don't think you need to define the self.search method. find_by_contents works just fine. Mike Michelson wrote: > I think the simplest way to attack this problem is just to write a > paginator for an array of objects. Then you can just do your usual > find_by_contents, with num_docs set to :all, and then just paginate the > array that is returned. > > > -- Posted via http://www.ruby-forum.com/.
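Mike's idea of paginating the array returned by find_by_contents needs only a small helper. This is a minimal sketch: ArrayPage is hypothetical (not part of Rails or acts_as_ferret), and the page number is taken as it would arrive in params[:page], which may be a string or nil.

```ruby
# Minimal array paginator (hypothetical helper, not a Rails/acts_as_ferret API).
class ArrayPage
  attr_reader :number, :per_page, :total

  def initialize(items, number, per_page = 10)
    @items = items
    @per_page = per_page
    @total = items.size
    # Clamp the requested page (string, integer or nil) into 1..page_count.
    @number = [[number.to_i, 1].max, page_count].min
  end

  # At least one page, even when there are no results.
  def page_count
    [(total + per_page - 1) / per_page, 1].max
  end

  # The slice of results belonging to the current page.
  def items
    @items[(number - 1) * per_page, per_page] || []
  end

  def previous?
    number > 1
  end

  def next?
    number < page_count
  end
end

results = (1..23).to_a              # stands in for find_by_contents results
page = ArrayPage.new(results, "3")  # page number as it arrives in params
p page.items                        # => [21, 22, 23]
p page.page_count                   # => 3
```

The trade-off compared with Tom's search_each approach is that every matching record is loaded on each request even though only one page is shown; passing :first_doc and :num_docs fetches just the current page from the index.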
From waspfactory at gmail.com Thu Jul 6 08:41:11 2006 From: waspfactory at gmail.com (Caspar) Date: Thu, 6 Jul 2006 14:41:11 +0200 Subject: [Ferret-talk] querying returned results Message-ID: <18edcdca808ffa2951cda2c90c00b506@ruby-forum.com> Hi, I'm using acts_as_ferret to index one table in my database. The table contains information about listed items, and each of these items belongs to a section and a category. On the results page I want to have two drop-down boxes whose contents are populated depending on the returned results. For example, the section drop-down should only show sections that exist in the result set. I would also like to display the number of results returned in each section in the drop-down, e.g. sections[cars:5]. Does anyone know the best way to go about this? I'm a total Ferret/Lucene beginner... thanks for any help cheers caspar -- Posted via http://www.ruby-forum.com/. From contact at ezabel.com Thu Jul 6 17:58:57 2006 From: contact at ezabel.com (Ian Zabel) Date: Thu, 6 Jul 2006 23:58:57 +0200 Subject: [Ferret-talk] acts_as_ferret Locale issue Message-ID: <227c8b013db10fc48ecebfaaef5e9a74@ruby-forum.com> I've just installed acts_as_ferret, and am trying to build my index, but I'm getting the following error: >> r = Topic.find_by_contents('testing') StandardError: : Error occured at :704 Error: exception 2 not handled: Error decoding input string.
Check that you have the locale set correctly from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227:in `<<' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227:in `rebuild_index' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227:in `rebuild_index' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:247:in `create_index_instance' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:240:in `ferret_index' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:325:in `find_id_by_contents' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:262:in `find_by_contents' from (irb):1 I'm using the current version of ferret (gem install ferret) and acts_as_ferret (script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/plugin/stable/acts_as_ferret) as of today, 7/6/06. I have tried setting my locale in environment.rb as mentioned here http://projects.jkraemer.net/acts_as_ferret/wiki/TypoWithFerret (note: i'm not using typo, but the locale note at the bottom seems to apply). So in the Rails::Initializer.run block, I've put this line: ENV['LANG'] = 'en_US.UTF-8' Didn't make a difference. Any other ideas? Thanks, Ian. -- Posted via http://www.ruby-forum.com/. 
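Because the error says Ferret could not decode an input string, one way to narrow the problem down is to check which field values are not valid UTF-8 and re-encode them. This is a sketch for Ruby 1.9 or later (String#force_encoding, #valid_encoding? and #encode do not exist in the Ruby 1.8 used in this thread), and it assumes that any value which is not valid UTF-8 is ISO-8859-1. The helper name to_utf8 is hypothetical.

```ruby
# Hypothetical helper (Ruby 1.9+): return the value as UTF-8, re-encoding it
# from ISO-8859-1 when its bytes are not valid UTF-8.
def to_utf8(value)
  utf8 = value.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?   # already valid UTF-8, keep as is

  # Assumption: anything that is not valid UTF-8 is Latin-1 (ISO-8859-1).
  # Every byte is a valid Latin-1 character, so this encode cannot fail.
  value.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1 = "caf\xE9".b                    # "café" stored as Latin-1 bytes
puts latin1.dup.force_encoding(Encoding::UTF_8).valid_encoding?   # => false
fixed = to_utf8(latin1)
puts fixed.valid_encoding?              # => true
puts fixed.bytesize                     # => 5 (é now takes two bytes)
```

Running a check like this over the records of the failing model would show whether the data, rather than the locale setting, is the problem.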
From kraemer at webit.de Thu Jul 6 18:16:25 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 7 Jul 2006 00:16:25 +0200 Subject: [Ferret-talk] acts_as_ferret Locale issue In-Reply-To: <227c8b013db10fc48ecebfaaef5e9a74@ruby-forum.com> References: <227c8b013db10fc48ecebfaaef5e9a74@ruby-forum.com> Message-ID: <20060706221625.GA17700@cordoba.webit.de> Hi Ian, On Thu, Jul 06, 2006 at 11:58:57PM +0200, Ian Zabel wrote: > I've just installed acts_as_ferret, and am trying to build my index, but > I'm getting the following error: > > >> r = Topic.find_by_contents('testing') > StandardError: : Error occured at :704 > Error: exception 2 not handled: Error decoding input string. Check that > you have the locale set correctly [..] > I have tried setting my locale in environment.rb as mentioned here > http://projects.jkraemer.net/acts_as_ferret/wiki/TypoWithFerret (note: > i'm not using typo, but the locale note at the bottom seems to apply). > > So in the Rails::Initializer.run block, I've put this line: ENV['LANG'] > = 'en_US.UTF-8' > > Didn't make a difference. Any other ideas? I put this statement at the very top of the file, outside of the block. Maybe that will do the trick. You also should make sure the locale exists on your system. On a Debian-based system, you could do dpkg-reconfigure locales and make sure the box before "en_US.UTF-8" is ticked. Hope this helps, Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From yingfeng.zhang at gmail.com Thu Jul 6 22:45:52 2006 From: yingfeng.zhang at gmail.com (Charlie) Date: Fri, 7 Jul 2006 04:45:52 +0200 Subject: [Ferret-talk] How to add Asia token analyzer to ferret simply? Message-ID: <1c6ae83de30b2cb8edd128c7a5652680@ruby-forum.com> Hi David, Can you give me an example of how to add an analyzer for Asian languages to Ferret?
My web application will have to support multi-language search, which means, for example, that both Chinese and English will be searched through the form. Currently, I have decided to use a simple tokenization rule in which every Chinese character is a token. Although this does not work well in some cases, the database columns to be full-text searched contain at most tens of UTF-8 characters, so I think it will work well enough. Thanks a lot! David Balmain wrote: > On 7/5/06, Charlie wrote: >> Is there any schema of full-text search that supports UTF-8, especially >> for Asian languages such as Chinese, Japanese, etc.? >> Ferret/acts_as_ferret does not work when keywords in these languages are >> searched, and also, it is difficult to implement pagination, which needs >> both the count of search results and the offset. >> Very grateful! > > Hi Charlie, > > Ferret will work fine on Asian Languages. You just need to write your > own Analyzer which matches tokens correctly for the language you are > interested in. Have a look at the RegExpAnalyzer in Ferret. You can > look at test/unit/analysis/ctc_analyzer.rb to see how it works. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/. From yingfeng.zhang at gmail.com Thu Jul 6 22:59:11 2006 From: yingfeng.zhang at gmail.com (Charlie) Date: Fri, 7 Jul 2006 04:59:11 +0200 Subject: [Ferret-talk] How to add Asia token analyzer to ferret simply? In-Reply-To: <1c6ae83de30b2cb8edd128c7a5652680@ruby-forum.com> References: <1c6ae83de30b2cb8edd128c7a5652680@ruby-forum.com> Message-ID: <467b7a2720b7860ac16626227d1820e5@ruby-forum.com> I also need the new Chinese analyzer to work together with the original standard analyzer. -- Posted via http://www.ruby-forum.com/.
From contact at ezabel.com Thu Jul 6 23:29:50 2006 From: contact at ezabel.com (Ian Zabel) Date: Fri, 7 Jul 2006 05:29:50 +0200 Subject: [Ferret-talk] acts_as_ferret Locale issue In-Reply-To: <20060706221625.GA17700@cordoba.webit.de> References: <227c8b013db10fc48ecebfaaef5e9a74@ruby-forum.com> <20060706221625.GA17700@cordoba.webit.de> Message-ID: Thanks for the response! > I put this statement at the very top of the file, outside of the block. > Maybe that will do the trick. > > You also should make sure the locale exists on your system. On > a Debian-based system, you could do > dpkg-reconfigure locales > and make sure the box before "en_US.UTF-8" is ticked. I determined with `locale -a` that the locale on the box is called "en_US.utf8", so I added "ENV['LANG'] = 'en_US.utf8'" at the top of my environment.rb (right after "ENV['RAILS_ENV'] ||= 'production'". Still getting the same error: "Error decoding input string. Check that you have the locale set correctly" :( It may be worth noting that it seems to only be a problem with this particular model. I am able to index a different model without any issues. So it's gotta be something with the data. I noticed that my InnoDB topics table was set to latin1 charset, so I changed it to utf8. I still get the same error. Not sure where to go next. Ian. -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Jul 7 03:48:05 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 7 Jul 2006 09:48:05 +0200 Subject: [Ferret-talk] acts_as_ferret Locale issue In-Reply-To: References: <227c8b013db10fc48ecebfaaef5e9a74@ruby-forum.com> <20060706221625.GA17700@cordoba.webit.de> Message-ID: <20060707074805.GA17139@cordoba.webit.de> On Fri, Jul 07, 2006 at 05:29:50AM +0200, Ian Zabel wrote: > Thanks for the response! > > > I put this statement at the very top of the file, outside of the block. > > Maybe that will do the trick. > > > > You also should make sure the locale exists on your system. 
On > > a Debian-based system, you could do > > dpkg-reconfigure locales > > and make sure the box before "en_US.UTF-8" is ticked. > > I determined with `locale -a` that the locale on the box is called > "en_US.utf8", so I added "ENV['LANG'] = 'en_US.utf8'" at the top of my > environment.rb (right after "ENV['RAILS_ENV'] ||= 'production'". > > Still getting the same error: "Error decoding input string. Check that > you have the locale set correctly" > > :( > > It may be worth noting that it seems to only be a problem with this > particular model. I am able to index a different model without any > issues. So it's gotta be something with the data. > > I noticed that my InnoDB topics table was set to latin1 charset, so I > changed it to utf8. I still get the same error. Imho changing the default charset of a table doesn't change the encoding of the data stored in it. So that's still latin1 what you get from your DB. > Not sure where to go next. The ENV['LANG'] value has to correspond to the encoding of the data you want to index, so if your data is latin1, Ferret needs to run with such a locale, i.e. ISO-8859-1. In such cases I dump the data as text, convert to utf8 (usually with vim :set fileencoding=utf8), re-create the table with DEFAULT CHARSET UTF-8 and re-import the data. With large data sets other solutions might be more efficient, though. Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From dbalmain.ml at gmail.com Fri Jul 7 07:25:41 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 7 Jul 2006 20:25:41 +0900 Subject: [Ferret-talk] acts_as_ferret Locale issue In-Reply-To: <20060707074805.GA17139@cordoba.webit.de> References: <227c8b013db10fc48ecebfaaef5e9a74@ruby-forum.com> <20060706221625.GA17700@cordoba.webit.de> <20060707074805.GA17139@cordoba.webit.de> Message-ID: On 7/7/06, Jens Kraemer wrote: > On Fri, Jul 07, 2006 at 05:29:50AM +0200, Ian Zabel wrote: > > Thanks for the response! > > > > > I put this statement at the very top of the file, outside of the block. > > > Maybe that will do the trick. > > > > > > You also should make sure the locale exists on your system. On > > > a Debian-based system, you could do > > > dpkg-reconfigure locales > > > and make sure the box before "en_US.UTF-8" is ticked. > > > > I determined with `locale -a` that the locale on the box is called > > "en_US.utf8", so I added "ENV['LANG'] = 'en_US.utf8'" at the top of my > > environment.rb (right after "ENV['RAILS_ENV'] ||= 'production'". > > > > Still getting the same error: "Error decoding input string. Check that > > you have the locale set correctly" > > > > :( > > > > It may be worth noting that it seems to only be a problem with this > > particular model. I am able to index a different model without any > > issues. So it's gotta be something with the data. > > > > I noticed that my InnoDB topics table was set to latin1 charset, so I > > changed it to utf8. I still get the same error. > > Imho changing the default charset of a table doesn't change the encoding > of the data stored in it. So that's still latin1 what you get from your > DB. > > > Not sure where to go next.
> > The ENV['LANG'] value has to correspond to the encoding of the data you > want to index, so if your data is latin1, Ferret needs to run with such > a locale, i.e. ISO-8859-1. > > In such cases I dump the data as text, convert to utf8 (usually > with vim :set fileencoding=utf8), re-create the table with DEFAULT > CHARSET UTF-8 and re-import the data. > > With large data sets other solutions might be more efficient, though. Here's one way you can convert ISO-8859-1 to UTF-8: str = str.unpack("C*").map {|c| if c < 0x80 next c.chr elsif c < 0xC0 next "\xC2" + c.chr else next "\xC3" + (c - 64).chr end }.join("") That may help. Cheers, Dave From dbalmain.ml at gmail.com Fri Jul 7 08:15:24 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 7 Jul 2006 21:15:24 +0900 Subject: [Ferret-talk] How to add Asia token analyzer to ferret simply? In-Reply-To: <467b7a2720b7860ac16626227d1820e5@ruby-forum.com> References: <1c6ae83de30b2cb8edd128c7a5652680@ruby-forum.com> <467b7a2720b7860ac16626227d1820e5@ruby-forum.com> Message-ID: On 7/7/06, Charlie wrote: > And also it is needed to make the new Chinese analyzer work together > with the original standard analyzer I answered this on the rails list but just in case: # Create a PerFieldAnalyzer (AKA PerFieldAnalyzerWrapper) which # defaults to Standard analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new) # Add a special character analyzer for the chinese field or # whatever field it is that has chinese characters. This splits the # data into single characters. analyzer["chinese"] = RegExpAnalyzer.new(/./, false) There you have it. Pretty simple.
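To make the per-field idea concrete, here is a plain-Ruby sketch of what the setup above does. It does not use Ferret's analyzer classes: text in a field named "chinese" is split into single characters without lowercasing, mirroring RegExpAnalyzer.new(/./, false), while every other field falls back to a default rule. The /\w+/ pattern is only an assumed, rough stand-in for StandardAnalyzer, and the tokenize helper is hypothetical. In this sketch /./ would also turn spaces and punctuation into tokens, so the example uses a field value without spaces.

```ruby
# field name => [token pattern, lowercase?]; fields not listed use DEFAULT_RULE
FIELD_RULES  = { "chinese" => [/./, false] }  # one token per character, case kept
DEFAULT_RULE = [/\w+/, true]                  # rough word tokenizer, lowercased

# Hypothetical helper: split a field's text into index terms.
def tokenize(field, text)
  pattern, lowercase = FIELD_RULES.fetch(field, DEFAULT_RULE)
  tokens = text.scan(pattern)
  lowercase ? tokens.map(&:downcase) : tokens
end

p tokenize("title", "Full-Text Search")   # => ["full", "text", "search"]
p tokenize("chinese", "全文检索").size     # => 4
```

Keeping the rules in one table keyed by field name is the same design as a per-field analyzer: the default handles ordinary text, and only the fields that need special treatment are listed.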
Cheers, Dave From dbalmain.ml at gmail.com Fri Jul 7 08:43:21 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 7 Jul 2006 21:43:21 +0900 Subject: [Ferret-talk] querying returned results In-Reply-To: <18edcdca808ffa2951cda2c90c00b506@ruby-forum.com> References: <18edcdca808ffa2951cda2c90c00b506@ruby-forum.com> Message-ID: On 7/6/06, Caspar wrote: > Hi I'm using the acts_as_ferret to index one table in my database. The > table contains information about listed items and each of these items > belongs to a section and a category. On the results page I want to have > two drop down boxes whose contents are populated depending on the > returned results. So for example the section drop down should only show > sections that exist in the result set, also i would like to display a > count of the number of results returned in that section in the drop down > e.g. sections[cars:5]. Does anyone know the best way to go about this? > I'm a total ferret/lucene beginner... > thanks for any help > cheers > caspar Hi Caspar, I'll try and answer this from a Ferret point of view (I don't know acts_as_ferret well enough yet). I think the best way to do this is to do the search for all documents and do a running count. So something like this: sections = {} models = {} index.search_each(query_str) do |doc_id, score| doc = index[doc_id] sections[doc[:section]] = (sections[doc[:section]] || 0) + 1 models[doc[:model]] = (models[doc[:model]] || 0) + 1 end You might run into performance problems doing this if your queries are returning a large number of documents or if the documents are quite large. The biggest performance hit will come from loading so many documents from the index. You'll be able to make this quite fast by caching the fields that you want to count. Let me know if you do run into performance problems and I'll show you how to build a cache of the field values.
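The running count can be tried outside Ferret with a few stand-in documents. In this sketch the hashes play the role of the stored documents that index[doc_id] returns in Dave's loop, the :section and :model field names follow his example, and Hash.new(0) replaces the explicit nil check. The "name:count" labels are an assumed format matching the sections[cars:5] example Caspar gave.

```ruby
# Stand-ins for the stored documents loaded inside search_each.
hits = [
  { :section => "cars",  :model => "sedan" },
  { :section => "cars",  :model => "coupe" },
  { :section => "boats", :model => "yacht" }
]

sections = Hash.new(0)   # keys not seen yet start at 0
models   = Hash.new(0)
hits.each do |doc|
  sections[doc[:section]] += 1
  models[doc[:model]]     += 1
end

# Labels for a drop-down, sorted by section name, e.g. "cars:2"
labels = sections.sort.map { |name, count| "#{name}:#{count}" }
p labels   # => ["boats:1", "cars:2"]
```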
Cheers, Dave From atomgiant at gmail.com Fri Jul 7 08:47:55 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 7 Jul 2006 08:47:55 -0400 Subject: [Ferret-talk] pagination in acts_as_ferret In-Reply-To: <94397f209e879c9d1ccbc43d99705d79@ruby-forum.com> References: <20060503163044.GS29289@cordoba.webit.de> <94397f209e879c9d1ccbc43d99705d79@ruby-forum.com> Message-ID: The snipped out portion is just where you process the results. In my case, I am searching for gifts so here is how I search and collect the Gift objects: num = INDEX.search_each(query, {:num_docs => options[:limit], :first_doc => options[:offset]}) do |doc, score| logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}, score: #{score}") gifts << Gift.find(INDEX[doc]['id']) end Also, the reason I defined my own search method as opposed to using find_by_contents is because I am not using the acts_as_ferret plugin. From looking at the acts_as_ferret code, it looks like you can just use the SearchResults object returned by the find_by_contents since it also includes the total hits needed to create the Paginator pages. Tom On 7/5/06, guest wrote: > > Nice! > > This is very useful! > > What code did you snip out of the search method? Is that required? > > Thanks. > > > Tom Davies wrote: > > To add to what Jens said, you may find this code useful: > > > > In your model: > > > > def self.search(q, options = {}) > > return nil if q.nil? > > default_options = {:limit => 10, :page => 1} > > options = default_options.merge options > > options[:offset] = options[:limit] * (options[:page].to_i-1) > > ... snip ... > > num = INDEX.search_each(query, {:num_docs => options[:limit], > > :first_doc => options[:offset]}) do |doc, score| > > ... snip ... > > [num, results] > > end > > > > Notice that I return the total matches as num, plus the results. The > > total matches is necessary to generate a paginator across all the > > items.
> > > > For the pagination, I created this simple method in my application > > controller (note it assumes a params[:page] being passed around): > > > > def pages_for(size, options = {}) > > default_options = {:per_page => 10} > > options = default_options.merge options > > pages = Paginator.new self, size, options[:per_page], > > (params[:page]||1) > > pages > > end > > > > And lastly, to use it in a controller: > > @total, @results = YourModel.search(@query, :page => > > (params[:page]||1) > > @result_pages = pages_for(@total) > > > > Tom > > > > On 5/3/06, Jens Kraemer wrote: > >> > unless @query.blank? > >> > >> Ferret-talk mailing list > >> Ferret-talk at rubyforge.org > >> http://rubyforge.org/mailman/listinfo/ferret-talk > >> > > > > > > -- > > Tom Davies > > > > http://blog.atomgiant.com > > http://gifthat.com > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Tom Davies http://atomgiant.com http://gifthat.com From atomgiant at gmail.com Fri Jul 7 08:59:46 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 7 Jul 2006 08:59:46 -0400 Subject: [Ferret-talk] find_by_contents not returning SearchResults? In-Reply-To: References: <20060629141119.GU15787@cordoba.webit.de> <1151592281.12185.1.camel@localhost.localdomain> <20060629170405.GV15787@cordoba.webit.de> <1151601589.12185.6.camel@localhost.localdomain> <20060629210558.GA30043@cordoba.webit.de> Message-ID: Hi Dave, I currently only use a small subset of the Ferret API so this may not work universally for all areas you support sorting. One simple sorting solution would be to just support SQL style string expressions for sorting. It should be relatively easy to implement. 
So this code: sort_fields = [] sort_fields << Ferret::Search::SortField.new('created_at') sort_fields << Ferret::Search::SortField.new('url') num = INDEX.search_each(query, :sort => sort_fields) do |doc, score| Could be rewritten as: num = INDEX.search_each(query, :sort => 'created_at, url') do |doc, score| And perhaps it could also support ASC/DESC syntax. In general I think anywhere the Ferret API requires us to create internal Ferret classes and pass them in it would be nice if there was a way to abstract this from the caller as much as possible using simple data structures such as symbols, strings and arrays. Tom On 6/30/06, David Balmain wrote: > On 6/30/06, Jens Kraemer wrote: > > On Thu, Jun 29, 2006 at 06:19:48PM +0100, Pedro Côrte-Real wrote: > > [..] > > > > this will be fixed in the soon-to-be-released next version. > > > > > > Cool. I hate having non-standard patches to stuff. It would also be cool > > > to have a cleaner API to do sorting than the ferret one. One that uses > > > the field names passed to acts_as_ferret. Ferret is great but its API > > > seems to be too much like Java and not like most Ruby APIs. I ended up > > > building a small class to encapsulate searching for my rails model to > > > hide all that away. > > > > Good point, but I'd rather wait for ferret's upcoming API changes before > > doing such changes in acts_as_ferret. > > > > Jens > > That's a very good plan. While we are on the subject, how do you think > the sort API should look? Once we get to a 1.0 release we are going to > be stuck with that API for a while so I want to get it right before > then and the sooner the better. Also, what other areas of the API do > you feel need work? For starters, I'll be getting rid of the Parameter > class. Instead of Field::Index::TOKENIZED it'll just be :index => :yes > or :index => :untokenized etc. Anyway, I'd love to hear any feedback > on any part of the API. Let's start with the Sort API.
> > Cheers, > Dave > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Tom Davies http://atomgiant.com http://gifthat.com From atomgiant at gmail.com Fri Jul 7 09:01:49 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 7 Jul 2006 09:01:49 -0400 Subject: [Ferret-talk] pagination in acts_as_ferret In-Reply-To: References: <20060503163044.GS29289@cordoba.webit.de> <94397f209e879c9d1ccbc43d99705d79@ruby-forum.com> Message-ID: Oh, one more thing, I just looked and the find_by_contents didn't support returning the total results when I wrote my original pagination post :) Also, I don't think the current acts_as_ferret release supports it either (but the trunk does), so you may have to grab the trunk if you want to paginate in pure acts_as_ferret land. Tom On 7/7/06, Tom Davies wrote: > The snipped out portion is just where you process the results. In my > case, I am searching for gifts so here is how I search and collect the > Gift objects: > > num = INDEX.search_each(query, {:num_docs => options[:limit], > :first_doc => options[:offset]}) do |doc, score| > logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}, score: > #{score}") > gifts << Gift.find(INDEX[doc]['id']) > end > > Also, the reason I defined my own search method as opposed to using > find_by_contents is because I am not using the acts_as_ferret plugin. > From looking at the acts_as_ferret code, it looks like you can just > use the SearchResults object returned by the find_by_contents since it > also includes the total hits needed to create the Paginator pages. > > Tom > > On 7/5/06, guest wrote: > > > > Nice! > > > > This is very useful! > > > > What code did you snip out of the serach method? Is that required? > > > > Thanks. 
> > > > > > Tom Davies wrote: > > > To add to what Jens said, you may find this code useful: > > > > > > In your model: > > > > > > def self.search(q, options = {}) > > > return nil if q.nil? > > > default_options = {:limit => 10, :page => 1} > > > options = default_options.merge options > > > options[:offset] = options[:limit] * (options[:page].to_i-1) > > > ... snip ... > > > num = INDEX.search_each(query, {:num_docs => options[:limit], > > > :first_doc => options[:offset]}) do |doc, score| > > > ... snip ... > > > [num, results] > > > end > > > > > > Notice that I return the total matches as num, plus the results. The > > > total matches is necessary to generate a paginator across all the > > > items. > > > > > > For the pagination, I created this simple method in my application > > > controller (note it assumes a params[:page] being passed around): > > > > > > def pages_for(size, options = {}) > > > default_options = {:per_page => 10} > > > options = default_options.merge options > > > pages = Paginator.new self, size, options[:per_page], > > > (params[:page]||1) > > > pages > > > end > > > > > > And lastly, to use it in a controller: > > > @total, @results = YourModel.search(@query, :page => > > > (params[:page]||1) > > > @result_pages = pages_for(@total) > > > > > > Tom > > > > > > On 5/3/06, Jens Kraemer wrote: > > >> > unless @query.blank? > > >> > > >> Ferret-talk mailing list > > >> Ferret-talk at rubyforge.org > > >> http://rubyforge.org/mailman/listinfo/ferret-talk > > >> > > > > > > > > > -- > > > Tom Davies > > > > > > http://blog.atomgiant.com > > > http://gifthat.com > > > > > > -- > > Posted via http://www.ruby-forum.com/. 
> > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > > > > -- > Tom Davies > > http://atomgiant.com > http://gifthat.com > -- Tom Davies http://atomgiant.com http://gifthat.com From dbalmain.ml at gmail.com Fri Jul 7 09:52:20 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 7 Jul 2006 22:52:20 +0900 Subject: [Ferret-talk] find_by_contents not returning SearchResults? In-Reply-To: References: <20060629141119.GU15787@cordoba.webit.de> <1151592281.12185.1.camel@localhost.localdomain> <20060629170405.GV15787@cordoba.webit.de> <1151601589.12185.6.camel@localhost.localdomain> <20060629210558.GA30043@cordoba.webit.de> Message-ID: On 7/7/06, Tom Davies wrote: > Hi Dave, > > I currently only use a small subset of the Ferret API so this may not > work universally for all areas you support sorting. One simple > sorting solution would be to just support SQL style string expressions > for sorting. It should be relatively easy to implement. So this > code: > > sort_fields = [] > sort_fields << Ferret::Search::SortField.new('created_at') > sort_fields << Ferret::Search::SortField.new('url') > num = INDEX.search_each(query, :sort => sort_fields) do |doc, score| > > Could be rewritten as: > num = INDEX.search_each(query, :sort => 'created_at, url') do |doc, score| > > And perhaps it could also support ASC DESC syntax as well. > > In general I think anywhere the Ferret API requires us to create > internal Ferret classes and pass them in it would be nice if there was > a way to abstract this from the caller as much as possible using > simple datastructures such as symbols, strings and arrays. Thanks for your feedback Tom. I couldn't agree more. In the version I'm working on now for example, Documents are going to basically be Hashes with a boost attribute. So everything you normally do with a Hash, you'll be able to do with a Document. 
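Tom's proposal, quoted above, of SQL-style sort strings such as 'created_at, url' with optional ASC/DESC is straightforward to parse. The following is a hypothetical sketch, not Ferret code: parse_sort_string only turns the string into [field, reverse] pairs, each of which could then be mapped onto a SortField such as the SortField.new('url', :reverse => true) form used in this thread. An unknown direction raises an error rather than being silently ignored.

```ruby
# Hypothetical parser for SQL-style sort strings such as
# "name ASC, created_at DESC". Returns [field, reverse?] pairs.
def parse_sort_string(spec)
  spec.split(",").map do |clause|
    field, direction = clause.strip.split(/\s+/, 2)
    raise ArgumentError, "empty sort clause in #{spec.inspect}" if field.nil?

    reverse =
      case direction.to_s.upcase
      when "", "ASC" then false   # ascending is the default, as in SQL
      when "DESC"    then true
      else
        raise ArgumentError, "unknown direction #{direction.inspect}"
      end
    [field, reverse]
  end
end

p parse_sort_string("name ASC, created_at DESC")
# => [["name", false], ["created_at", true]]
p parse_sort_string("created_at, url")
# => [["created_at", false], ["url", false]]
```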
By the way, did you already know you could do this in the current version of Ferret?: num = INDEX.search_each(query, :sort => ['created_at', 'url']) do |doc, score| The only problem with this is that it doesn't allow you to reverse the sort. You have to do this; url_sorter = Ferret::Search::SortField.new('url', :reverse => true) num = INDEX.search_each(query, :sort => ['created_at', url_sorter]) do |doc, score| Maybe your sql sort string idea is a good one. Which do you think is better? num = INDEX.search_each(query, :sort => 'created_at, url_sorter DESC') do |doc, score| or num = INDEX.search_each(query, :sort => ['created_at', 'url_sorter', DESC]) do |doc, score| From kraemer at webit.de Fri Jul 7 10:07:49 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 7 Jul 2006 16:07:49 +0200 Subject: [Ferret-talk] find_by_contents not returning SearchResults? In-Reply-To: References: <20060629141119.GU15787@cordoba.webit.de> <1151592281.12185.1.camel@localhost.localdomain> <20060629170405.GV15787@cordoba.webit.de> <1151601589.12185.6.camel@localhost.localdomain> <20060629210558.GA30043@cordoba.webit.de> Message-ID: <20060707140749.GD17139@cordoba.webit.de> On Fri, Jul 07, 2006 at 10:52:20PM +0900, David Balmain wrote: > On 7/7/06, Tom Davies wrote: [..] > > By the way, did you already know you could do this in the current > version of Ferret?: > > num = INDEX.search_each(query, :sort => ['created_at', 'url']) do > |doc, score| > > The only problem with this is that it doesn't allow you to reverse the > sort. You have to do this; > > url_sorter = Ferret::Search::SortField.new('url', :reverse => true) > num = INDEX.search_each(query, :sort => ['created_at', > url_sorter]) do |doc, score| > > Maybe your sql sort string idea is a good one. Which do you think is better? > > num = INDEX.search_each(query, :sort => 'created_at, url_sorter > DESC') do |doc, score| +1 using :order instead of :sort would make it even more Rails-like. Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From atomgiant at gmail.com Fri Jul 7 10:19:26 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 7 Jul 2006 10:19:26 -0400 Subject: [Ferret-talk] find_by_contents not returning SearchResults? In-Reply-To: <20060707140749.GD17139@cordoba.webit.de> References: <20060629141119.GU15787@cordoba.webit.de> <1151592281.12185.1.camel@localhost.localdomain> <20060629170405.GV15787@cordoba.webit.de> <1151601589.12185.6.camel@localhost.localdomain> <20060629210558.GA30043@cordoba.webit.de> <20060707140749.GD17139@cordoba.webit.de> Message-ID: I am leaning towards the first version, so all in one string. That way you can have mixed sortings too such as "name ASC, created_at DESC". I also agree with Jens's comment that calling it :order would make it more Rails-like. While we are at it, it would also be nice to use :limit for :num_docs, and :offset for :first_doc. Tom On 7/7/06, Jens Kraemer wrote: > On Fri, Jul 07, 2006 at 10:52:20PM +0900, David Balmain wrote: > > On 7/7/06, Tom Davies wrote: > [..] > > > > By the way, did you already know you could do this in the current > > version of Ferret?: > > > > num = INDEX.search_each(query, :sort => ['created_at', 'url']) do > > |doc, score| > > > > The only problem with this is that it doesn't allow you to reverse the > > sort. You have to do this; > > > > url_sorter = Ferret::Search::SortField.new('url', :reverse => true) > > num = INDEX.search_each(query, :sort => ['created_at', > > url_sorter]) do |doc, score| > > > > Maybe your sql sort string idea is a good one. Which do you think is better? > > > > num = INDEX.search_each(query, :sort => 'created_at, url_sorter > > DESC') do |doc, score| > > +1 > > using :order instead of :sort would make it even more Rails-like. > > Jens > > > -- > webit!
Gesellschaft für neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de > Schnorrstraße 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Tom Davies http://atomgiant.com http://gifthat.com From dbalmain.ml at gmail.com Fri Jul 7 11:57:33 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 8 Jul 2006 00:57:33 +0900 Subject: [Ferret-talk] find_by_contents not returning SearchResults? In-Reply-To: References: <1151592281.12185.1.camel@localhost.localdomain> <20060629170405.GV15787@cordoba.webit.de> <1151601589.12185.6.camel@localhost.localdomain> <20060629210558.GA30043@cordoba.webit.de> <20060707140749.GD17139@cordoba.webit.de> Message-ID: On 7/7/06, Tom Davies wrote: > I am leaning towards the first version, so all in one string. That > way you can have mixed sortings too such as "name ASC, created_at > DESC". I also agree with Jens's comment that calling it :order would > make it more Rails-like. Ok, string it is. And :order I can do too, although it won't hurt to leave :sort as an option. > While we are at it, it would also be nice to use :limit for :num_docs, > and :offset for :first_doc. +1 This sounds better to me too. Anyone disagree? > Tom > > On 7/7/06, Jens Kraemer wrote: > > On Fri, Jul 07, 2006 at 10:52:20PM +0900, David Balmain wrote: > > > On 7/7/06, Tom Davies wrote: > > [..] > > > > > > By the way, did you already know you could do this in the current > > > version of Ferret?: > > > > > > num = INDEX.search_each(query, :sort => ['created_at', 'url']) do > > > |doc, score| > > > > > > The only problem with this is that it doesn't allow you to reverse the > > > sort.
You have to do this: > > > > > > url_sorter = Ferret::Search::SortField.new('url', :reverse => true) > > > num = INDEX.search_each(query, :sort => ['created_at', > > > url_sorter]) do |doc, score| > > > > > > Maybe your SQL sort string idea is a good one. Which do you think is better? > > > > > > num = INDEX.search_each(query, :sort => 'created_at, url_sorter > > > DESC') do |doc, score| > > > > +1 > > > > using :order instead of :sort would make it even more Rails-like. > > > > Jens From quinn at qutek.net Fri Jul 7 12:52:39 2006 From: quinn at qutek.net (Quinn) Date: Fri, 7 Jul 2006 18:52:39 +0200 Subject: [Ferret-talk] acts_as_ferret transactions Message-ID: acts_as_ferret does not handle transaction aborts properly. If a model is modified but something later causes the transaction it was wrapped in to abort, the Ferret index will not revert to the original record data. I have included a clunky patch that defers modifying the Ferret index until after the current transaction commits. It would seem prudent to resolve this issue, though I don't think I have an ideal solution. I have an acts_as_ferret model with indexed properties that are derived from other models it is associated with.
As a side effect, this patch ensures that if I wrap modifications to the model and the associated models in a transaction, all the modifications to the associated models make it into the Ferret index. Thoughts? Index: lib/acts_as_ferret.rb =================================================================== --- lib/acts_as_ferret.rb (revision 59) +++ lib/acts_as_ferret.rb (working copy) @@ -496,7 +496,8 @@ module InstanceMethods attr_reader :reindex - @ferret_reindex = true + @ferret_reindex = false + @defer_for_transaction = false def ferret_before_update @ferret_reindex = true @@ -505,9 +506,13 @@ # add to index def ferret_create - logger.debug "ferret_create/update: #{self.class.name} : #{self.id}" - self.class.ferret_index << self.to_doc if @ferret_reindex - @ferret_reindex = true + unless @defer_for_transaction + logger.debug "ferret_create/update: #{self.class.name} : #{self.id}" + self.class.ferret_index << self.to_doc if @ferret_reindex + @ferret_reindex = true + else + logger.debug "deferred ferret_create/update: #{self.class.name} : #{self.id}" + end true end alias :ferret_update :ferret_create @@ -522,6 +527,21 @@ end true end + + def at_start_transaction(name = nil) + @defer_for_transaction = true + end + + def at_abort_transaction(name = nil) + @defer_for_transaction = false + @ferret_reindex = false + end + + def at_commit_transaction(name = nil) + @defer_for_transaction = false + ferret_create + @ferret_reindex = false + end # convert instance to ferret document def to_doc @@ -786,4 +806,28 @@ end end +module Transaction + module Simple + alias :start_transaction_object :start_transaction + alias :abort_transaction_object :abort_transaction + alias :commit_transaction_object :commit_transaction + + def start_transaction(name = nil) + at_start_transaction(name) if respond_to?(:at_start_transaction) + start_transaction_object(name) + end + + def abort_transaction(name = nil) + at_abort_transaction(name) if respond_to?(:at_abort_transaction) +
abort_transaction_object(name) + end + + def commit_transaction(name = nil) + at_commit_transaction(name) if respond_to?(:at_commit_transaction) + commit_transaction_object(name) + end + end +end + + # END acts_as_ferret.rb From waspfactory at gmail.com Fri Jul 7 13:01:24 2006 From: waspfactory at gmail.com (caspar) Date: Fri, 7 Jul 2006 19:01:24 +0200 Subject: [Ferret-talk] Can't run WEBRick with Plugin In-Reply-To: <20060628082722.GN15787@cordoba.webit.de> References: <4594716165d5e9ae8b72f042dfb98393@ruby-forum.com> <20060628082722.GN15787@cordoba.webit.de> Message-ID: When I copied my Rails directory from work to home and tried to run WEBrick, it died on the first line, "Booting webrick..." The problem was quickly sorted by installing the ferret gem. Jens Kraemer wrote: > On Tue, Jun 27, 2006 at 08:43:16PM +0200, guest wrote: >> After copying the acts_as_ferret plugin to my Rails folder, I can't boot >> WEBrick. Has anyone run into this before? > > No, but if you gave some more info (what WEBrick prints out when it > refuses to start, for example), we could probably help. > > regards, > Jens From bk at benjaminkrause.com Fri Jul 7 13:23:27 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Fri, 7 Jul 2006 19:23:27 +0200 Subject: [Ferret-talk] Recalculating the score In-Reply-To: References: <078eb4f56a626fd541f98d842b7c388f@ruby-forum.com> Message-ID: Hey David, thanks for the answer.. > How about setting the boost for the whole document rather than just > the :relevance field? Or do you sometimes want to sort by relevance > without taking the :relevance field into account? ah.. you mean I should boost each field of the document?
or is there a way to set a boost level for the document as a whole? If so, I've missed it.. > PS: While we are on the topic, how would you like the sort API to > look? Many have complained that the sort API is too Java-like but > no-one has suggested any improvements yet. I'd love to see some ideas. I like the idea of giving a short block with a sort algorithm.. I would like to see something like this: index.search ( :query => my_query, :sort => Proc.new( |doc| # some calculation; return new_score ), :reverse => false, :filter => false, :start => 0, :limit => 10 ) Alternatively, you should be able to give the sort param the name of a field, like ':sort => :score', or an array of fields, like ':sort => [ :score, :title ]', and sort by the first element and then by the 2nd if two or more docs share the same value for the 1st element. I guess something like ":sort => :score" is enough for most people.. I think the other options are almost like they are implemented right now.. I don't think you need the SortField class. btw.. I do find the filter API not really intuitive; actually I didn't understand it at all ;) I know what you want to do with filters and how you want to get there, but I haven't found any understandable documentation on how to build one.. maybe you should write a short tutorial on how to write a filter.. I would find it very intuitive to have something like a base_query.. like having one query to filter/limit results, and another query to do the real search.. and btw.. one feature I would definitely like to see is limiting the search to a number of fields.. I know I can write something like field_one:"search string" || field_two:"search string"||field_three:"search string"||field_four:"search string" but I would like to be able to write something like (field_one|field_two|field_three|field_four):"search string" Furthermore, you should be able to say something like.. search in all fields, except field_one..
like (*|!field_one):"search string" Ben From kraemer at webit.de Fri Jul 7 14:53:18 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 7 Jul 2006 20:53:18 +0200 Subject: [Ferret-talk] find_by_contents not returning SearchResults? In-Reply-To: References: <1151592281.12185.1.camel@localhost.localdomain> <20060629170405.GV15787@cordoba.webit.de> <1151601589.12185.6.camel@localhost.localdomain> <20060629210558.GA30043@cordoba.webit.de> <20060707140749.GD17139@cordoba.webit.de> Message-ID: <20060707185318.GA25692@cordoba.webit.de> On Sat, Jul 08, 2006 at 12:57:33AM +0900, David Balmain wrote: > On 7/7/06, Tom Davies wrote: > > I am leaning towards the first version, so all in one string. That > > way you can have mixed sort orders too, such as "name ASC, created_at > > DESC". I also agree with Jens' comment that calling it :order would > > make it more Rails-like. > > OK, string it is. And :order I can do too, although it won't hurt to > leave :sort as an option. > > > While we are at it, it would also be nice to use :limit for :num_docs, > > and :offset for :first_doc. > > +1 > This sounds better to me too. Anyone disagree? Nope, sounds perfect that way :-) Jens From mxcurioni at yahoo.com Fri Jul 7 18:13:53 2006 From: mxcurioni at yahoo.com (Maxime Curioni) Date: Sat, 8 Jul 2006 00:13:53 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to Message-ID: I am using Ferret and acts_as_ferret as my search back-end for my Rails project. I have a question about using acts_as_ferret on a main table that is linked to other tables by foreign keys. Is there a way to include the information linked by the belongs_to keyword in the search results?
As an example, let's say I have a main table 'posts': ============================= | posts ============================= id title content category_id where category_id is a foreign key pointing to a row of the table 'categories': ============================= | categories ============================= id name My Rails models are then: class Post < ActiveRecord::Base belongs_to :category acts_as_ferret :fields => [ 'title', 'content' ] end class Category < ActiveRecord::Base has_many :posts end I would like to be able to change the following line acts_as_ferret :fields => [ 'title', 'content' ] to acts_as_ferret :fields => [ 'title', 'content', 'category.name' ] and have Ferret search for records specified by title, content and category name. Is there a known solution to my problem? In Ferret, how can I specify a search across many tables for results returning only a main class (in the example, search all posts that have a certain category name by just typing that category name as the query)? Do I have to create a new model class PostIndex that gathers all the information just for the indexing, even though the information is not stored that way in the database? Thanks a lot for your help, Maxime Curioni From kraemer at webit.de Fri Jul 7 18:22:31 2006 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 8 Jul 2006 00:22:31 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to In-Reply-To: References: Message-ID: <20060707222231.GA31705@cordoba.webit.de> Hi Maxime, On Sat, Jul 08, 2006 at 12:13:53AM +0200, Maxime Curioni wrote: > I am using Ferret and acts_as_ferret as my search back-end for my Rails > project. I have a question about using acts_as_ferret on a main table > that is linked to other tables by foreign keys. Is there a way to > include the information linked by the belongs_to keyword in the search > results? > [..]
> My Rails models are then: > > class Post < ActiveRecord::Base > belongs_to :category > acts_as_ferret :fields => [ 'title', 'content' ] > end > > class Category < ActiveRecord::Base > has_many :posts > end > > > I would like to be able to change the following line > acts_as_ferret :fields => [ 'title', 'content' ] > to > acts_as_ferret :fields => [ 'title', 'content', 'category.name' > ] > > and have Ferret search for records specified by title, content and > category name. You could use acts_as_ferret :fields => [ 'title', 'content', :category_name] def category_name category.name end to achieve this. Jens From dbalmain.ml at gmail.com Fri Jul 7 19:02:20 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 8 Jul 2006 08:02:20 +0900 Subject: [Ferret-talk] Recalculating the score In-Reply-To: References: <078eb4f56a626fd541f98d842b7c388f@ruby-forum.com> Message-ID: On 7/8/06, Benjamin Krause wrote: > Hey David, > > thanks for the answer.. > > > How about setting the boost for the whole document rather than just > > the :relevance field? Or do you sometimes want to sort by relevance > > without taking the :relevance field into account? > > ah.. you mean I should boost each field of the document? or is there a > way to set a boost level for the document as a whole? If so, I've missed > it.. doc = Ferret::Document::Document.new() doc.boost = 100.0 > > PS: While we are on the topic, how would you like the sort API to > > look? Many have complained that the sort API is too Java-like but > > no-one has suggested any improvements yet. I'd love to see some ideas. > > I like the idea of giving a short block with a sort algorithm..
I would > like to see something like this: > > index.search ( :query => my_query, > :sort => Proc.new( |doc| # some calculation; return > new_score ), > :reverse => false, > :filter => false, > :start => 0, > :limit => 10 ) The way sort works at the moment is that it caches all fields that are sorted on. If you start sorting like this, you have to load every document in the result set, which would have a huge performance hit. I guess I could make this feature available though. In the pure Ruby version of Ferret you can do this: st_length = SortField::SortType.new("length", lambda{|str| str.length}) sf = SortField.new("content", {:sort_type => st_length, :reverse => true, :comparator => lambda{|i,j| j <=> i}}) The sort type lambda allows you to create the sort cache. Then the comparator lets you compare two of those cached values. This is flexible while remaining performant, although I still think I can make it more intuitive. > Alternatively, you should be able to give the sort param the name of a > field, like ':sort => :score', or an array of fields, like ':sort => [ > :score, :title ]', and sort by the first element and then by the 2nd if > two or more docs share the same value for the 1st element. > I guess something like ":sort => :score" is enough for most people.. Actually, you can already do this. Have you tried it? The only catch is that :score would be treated as a field name, so you'd have to do this: index.search_each(query, :sort => [SortField::RELEVANCE, :title, :price]) > I think the other options are almost like they are implemented right now.. > I don't think you need the SortField class. > > btw.. I do find the filter API not really intuitive; actually I didn't > understand it at all ;) > > I know what you want to do with filters and how you want to get there, > but I haven't found any understandable documentation on how to build > one.. > > maybe you should write a short tutorial on how to write a filter.. I > would find it very intuitive to have something like a base_query..
like > having one query to filter/limit results, and another query to do > the real search.. I will. The TermEnum and TermDocEnum are essential for using filters, and they've undergone major changes, so I'll hold off on this until I get the next release out. > and btw.. one feature I would definitely like to see is limiting > the search to a number of fields.. > > I know I can write something like > > field_one:"search string" || field_two:"search > string"||field_three:"search string"||field_four:"search string" > > but I would like to be able to write something like > > (field_one|field_two|field_three|field_four):"search string" You can do this already; just get rid of the brackets: field_one|field_two|field_three|field_four:"search string" > Furthermore, you should be able to say something like.. search in all > fields, except field_one.. like > > (*|!field_one):"search string" You can't do this, but it is a nice idea. I'll think about it. I might also add the brackets to the syntax. Anyway, thanks for your feedback, Ben. I will definitely use it. Cheers, Dave From waspfactory at gmail.com Fri Jul 7 19:10:18 2006 From: waspfactory at gmail.com (caspar) Date: Sat, 8 Jul 2006 01:10:18 +0200 Subject: [Ferret-talk] querying the SearchResults instance Message-ID: <2872073863b1658f03721e7410429541@ruby-forum.com> Hi, how do you search against the results returned by find_by_contents using Ferret? i.e. how do you "search within these results"? This is an acts_as_ferret question again... thanks in advance.. cheers caspar
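Dave's explanation above of the pure-Ruby SortField example (a sort-type lambda that builds a cache holding one key per document, then a comparator that only ever compares cached keys) can be illustrated without Ferret at all. This is a minimal plain-Ruby sketch of that idea under assumed data, not Ferret's implementation; the document list and the names `key_for`, `cache` and `comparator` are hypothetical:

```ruby
# Hypothetical "content" field values for four matching documents (doc numbers 0..3).
docs = ["a fox", "the quick brown fox", "lazy dog", "jumped"]

# Plays the role of the sort-type lambda: turn a field value into a sort key.
key_for = lambda { |str| str.length }

# Build the sort cache once, so each document's key is computed a single time.
cache = docs.map { |content| key_for.call(content) }

# Plays the role of the :comparator lambda: compare two cached keys in reverse.
comparator = lambda { |i, j| j <=> i }

# Sort document numbers by their cached keys without touching the documents again.
order = (0...docs.size).sort { |a, b| comparator.call(cache[a], cache[b]) }

order.each { |doc_num| puts "#{doc_num}: #{docs[doc_num]}" }
```

Because every key is computed once up front, the comparator stays cheap however often the sort calls it, which is the performance point Dave makes about not loading each document during the sort.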
From contact at ezabel.com Fri Jul 7 22:21:15 2006 From: contact at ezabel.com (Ian Zabel) Date: Sat, 8 Jul 2006 04:21:15 +0200 Subject: [Ferret-talk] acts_as_ferret Locale issue In-Reply-To: <20060707074805.GA17139@cordoba.webit.de> References: <227c8b013db10fc48ecebfaaef5e9a74@ruby-forum.com> <20060706221625.GA17700@cordoba.webit.de> <20060707074805.GA17139@cordoba.webit.de> Message-ID: <8e008f649bac6f6e576fa6d43df56d4f@ruby-forum.com> > The ENV['LANG'] value has to correspond to the encoding of the data you > want to index, so if your data is latin1, Ferret needs to run with such > a locale, i.e. ISO-8859-1. > > In such cases I dump the data as text, convert to utf8 (usually > with vim :set fileencoding=utf8), re-create the table with DEFAULT > CHARSET UTF-8 and re-import the data. So I tried to get this to work in MANY different ways. I converted the encoding with vim, iconv, and mysqldump/import. I changed the table types. I tried this: http://textsnippets.com/posts/show/84 and I tried this: http://climbtothestars.org/archives/2004/07/18/converting-mysql-database-contents-to-utf-8/ No matter what I try, I get the same error when I change my environment.rb to en_US.utf8. If I set it to en_US.iso88591, everything works fine. If I could successfully convert my database to utf8 AND get it to work with Ferret, I would love to. But I just can't get it. So... I think I'm going to stick with latin1 for now. :( Thanks for your help, guys! Ian. From yingfeng.zhang at gmail.com Sat Jul 8 03:05:52 2006 From: yingfeng.zhang at gmail.com (Charlie) Date: Sat, 8 Jul 2006 09:05:52 +0200 Subject: [Ferret-talk] How to add Asia token analyzer to ferret simply?
In-Reply-To: References: <1c6ae83de30b2cb8edd128c7a5652680@ruby-forum.com> <467b7a2720b7860ac16626227d1820e5@ruby-forum.com> Message-ID: <6af30af0a5ff9641e0eb919cff3e7778@ruby-forum.com> David Balmain wrote: > On 7/7/06, Charlie wrote: >> It is also necessary to make the new Chinese analyzer work together >> with the original standard analyzer > > I answered this on the rails list but just in case: > > # Create a PerFieldAnalyzer (AKA PerFieldAnalyzerWrapper) which > # defaults to Standard > analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new) > > # Add a special character analyzer for the chinese field or > # whatever field it is that has chinese characters. This splits the > # data into single characters. > analyzer["chinese"] = RegExpAnalyzer.new(/./, false) Thank you Dave. I looked up the API and found that PerFieldAnalyzerWrapper is useful for per-field analysis, especially for the corresponding SQL: select * from students where title like '%Charlie%' and location_id = 1, where the location_id = 1 part of the query can be handled through PerFieldAnalyzerWrapper. I have just downloaded and read the book Lucene in Action, and Chapter 4 says that the StandardAnalyzer will also split CJK text into tokens even though there are no spaces between the characters; for example, '????' will be split into the tokens '?' '?' '?' '?', which is just what I want. But I still can't get any search results from Ferret. I use MySQL as the database with UTF-8 encoding throughout, and all of my Rails sources are saved as UTF-8, yet when I enter the above characters '????' in the search box, I get zero results. Can you please help with this situation? Very grateful! Best regards, Charlie From dbalmain.ml at gmail.com Sat Jul 8 05:43:51 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 8 Jul 2006 18:43:51 +0900 Subject: [Ferret-talk] How to add Asia token analyzer to ferret simply?
In-Reply-To: <6af30af0a5ff9641e0eb919cff3e7778@ruby-forum.com> References: <1c6ae83de30b2cb8edd128c7a5652680@ruby-forum.com> <467b7a2720b7860ac16626227d1820e5@ruby-forum.com> <6af30af0a5ff9641e0eb919cff3e7778@ruby-forum.com> Message-ID: On 7/8/06, Charlie wrote: > David Balmain wrote: > > On 7/7/06, Charlie wrote: > >> It is also necessary to make the new Chinese analyzer work together > >> with the original standard analyzer > > > > I answered this on the rails list but just in case: > > > > # Create a PerFieldAnalyzer (AKA PerFieldAnalyzerWrapper) which > > # defaults to Standard > > analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new) > > > > # Add a special character analyzer for the chinese field or > > # whatever field it is that has chinese characters. This splits the > > # data into single characters. > > analyzer["chinese"] = RegExpAnalyzer.new(/./, false) > > Thank you Dave. I looked up the API and found that > PerFieldAnalyzerWrapper is useful for per-field analysis, especially for the > corresponding SQL: select * from students where title like '%Charlie%' > and location_id = 1, where the location_id = 1 part of the query can be handled through > PerFieldAnalyzerWrapper. > > I have just downloaded and read the book Lucene in Action, and > Chapter 4 says that the StandardAnalyzer will also split CJK > text into tokens even though there are no spaces between the characters; for > example, '????' will be split into the tokens '?' '?' '?' '?', which is > just what I want. But I still can't get any search results from Ferret. I > use MySQL as the database with UTF-8 encoding throughout, and > all of my Rails sources are saved as UTF-8, yet when I > enter the above characters > '????' in the search box, I get zero results. Can you please help with this > situation? Very grateful! Hi Charlie, The StandardAnalyzer in Ferret works a little differently from the StandardAnalyzer in Lucene. That's why you need to use the RegExpAnalyzer I gave you.
analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new) analyzer["chinese"] = RegExpAnalyzer.new(/./, false) You also need to make sure that this is the analyzer being used by the query parser. If you are using the Index::Index class, it will handle this for you. Try this in irb: $ irb -KU irb(main):001:0> require 'rubygems' => true irb(main):002:0> require 'ferret' => true irb(main):003:0> include Ferret::Index => Object irb(main):004:0> include Ferret::Analysis => Object irb(main):005:0> analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new) => # irb(main):006:0> analyzer["chinese"] = RegExpAnalyzer.new(/./, false) => # irb(main):007:0> index = Index.new(:analyzer => analyzer) => # irb(main):008:0> index << {:english => "the quick brown fox jumped over the lazy dog", :chinese => '????'} => # irb(main):009:0> index << {:chinese => "the quick brown fox jumped over the lazy dog", :english => '????'} => # irb(main):010:0> index.search_each("chinese:?") {|doc, score| puts "found in #{doc}"} found in 0 => 1 irb(main):011:0> index.search_each("english:?") {|doc, score| puts "found in #{doc}"} => 0 From mxcurioni at yahoo.com Sat Jul 8 18:08:01 2006 From: mxcurioni at yahoo.com (Maxime Curioni) Date: Sun, 9 Jul 2006 00:08:01 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to In-Reply-To: <20060707222231.GA31705@cordoba.webit.de> References: <20060707222231.GA31705@cordoba.webit.de> Message-ID: Thanks Jens, I didn't know it was that simple! > You could use > > acts_as_ferret :fields => [ 'title', 'content', :category_name] > def category_name > category.name > end > > to achieve this. > > Jens From waspfactory at gmail.com Sun Jul 9 11:43:41 2006 From: waspfactory at gmail.com (Caspar) Date: Sun, 9 Jul 2006 17:43:41 +0200 Subject: [Ferret-talk] acts_as_ferret.. what does it actually do?
Message-ID: <5433872b7b21329f49c3ff93f2087394@ruby-forum.com> Okay, in this plea for help I'm going to repeat some of what I posted before, but with more background info, in the hope that I can get a decent grip on Ferret before it wriggles away.. Firstly, what does installing the acts_as_ferret plugin actually do? I install it and add it to my model, and then the index is automatically generated, a few methods are added to the model, and as far as I can see all the index CRUD is being handled too. The Ferret gem is installed and the index is generated and searchable, so I assumed that I would be able to use Ferret methods (other than the ones in acts_as_ferret) in my application. When I try the code at the bottom of this post I get the following error (also in full at the bottom of the post): NameError: uninitialized constant BooleanQuery So this means Rails/Ruby is not seeing Ferret, right? So how do I get Rails to see Ferret? I have tried many variations on include/require/ferret_config.rb but am obviously not getting it. Do I need to do something extra if I want the simplicity of acts_as_ferret and the power of Ferret? If anyone out there knows what the problem is, please get in touch. Also, does anyone use acts_as_ferret successfully in an application doing more than just using find_by_contents? Thanks in advance for any replies/help, cheers caspar #####extract from model require 'ferret' class VoObject < ActiveRecord::Base acts_as_ferret :fields=> ['short_description','section','sale_category','sale_type','outcode'] def VoObject.refine_search(search_input) bq = BooleanQuery.new bq.add_query(TermQuery.new(Term.new("section", search_input), BooleanClause::Occur::Should)) filter = QueryFilter.new(bq) @vobjects = Item.find_by_contents(search_text,:filter => filter, :sort => ["section", "sale_category"]) redirect_to :results end ############ I get this ############## ruby script/console Loading development environment.
>> VoObject.refine_search('housing') NameError: uninitialized constant BooleanQuery from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing' from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:131:in `const_missing' from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:133:in `const_missing' from script/../config/../config/../app/models/vo_object.rb:22:in `refine_search' from (irb):1 From chris.lowis at gmail.com Sun Jul 9 11:51:27 2006 From: chris.lowis at gmail.com (Chris Lowis) Date: Sun, 9 Jul 2006 17:51:27 +0200 Subject: [Ferret-talk] Search terms in URL Message-ID: I have set up a Ferret search using the examples in the demo app. Is it possible to have the search terms included in the URL so that, for example, users can bookmark a search results page? Chris From dbalmain.ml at gmail.com Sun Jul 9 12:10:10 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 10 Jul 2006 01:10:10 +0900 Subject: [Ferret-talk] acts_as_ferret.. what does it actually do? In-Reply-To: <5433872b7b21329f49c3ff93f2087394@ruby-forum.com> References: <5433872b7b21329f49c3ff93f2087394@ruby-forum.com> Message-ID: On 7/10/06, Caspar wrote: > Okay, in this plea for help I'm going to repeat some of what I posted > before, but with more background info, in the hope that I > can get a decent grip on Ferret before it wriggles away.. > > Firstly, what does installing the acts_as_ferret plugin actually do? I > install it and add it to my model, and then the index is automatically > generated, a few methods are added to the model, and as far as I can see all > the index CRUD is being handled too.
The Ferret gem is installed and > the index is generated and searchable, so I assumed that I would be able > to use Ferret methods (other than the ones in acts_as_ferret) in my > application. When I try the code at the bottom of this post I get the > following error (also in full at the bottom of the post): > > NameError: uninitialized constant BooleanQuery Basically, you just need to include the Search module in your code, i.e. include Ferret::Search This could be dangerous, so it is often better to use the full class path, as I demonstrate below. > So this means Rails/Ruby is not seeing Ferret, right? So how do I > get Rails to see Ferret? I have tried many variations on > include/require/ferret_config.rb but am obviously not getting it. Do I > need to do something extra if I want the simplicity of acts_as_ferret and > the power of Ferret? > If anyone out there knows what the problem is, please get in touch. > Also, does anyone use acts_as_ferret successfully in an application doing > more than just using find_by_contents?
Thanks in advance for any > replies/help, > cheers > caspar > > > #####extract from model > > require 'ferret' > > class VoObject < ActiveRecord::Base > acts_as_ferret :fields=> > ['short_description','section','sale_category','sale_type','outcode'] > > def VoObject.refine_search(search_input) > bq = BooleanQuery.new > bq.add_query(TermQuery.new(Term.new("section", search_input), > BooleanClause::Occur::Should)) > filter = QueryFilter.new(bq) > @vobjects = Item.find_by_contents(search_text,:filter => filter, > :sort => ["section", "sale_category"]) > redirect_to :results > end > def VoObject.refine_search(search_input) bq = Ferret::Search::BooleanQuery.new bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section", search_input), Ferret::Search::BooleanClause::Occur::Should)) filter = Ferret::Search::QueryFilter.new(bq) @vobjects = Item.find_by_contents(search_text,:filter => filter, :sort => ["section", "sale_category"]) redirect_to :results end Please let me know if you are still having problems. Cheers, Dave From waspfactory at gmail.com Sun Jul 9 12:41:24 2006 From: waspfactory at gmail.com (Caspar) Date: Sun, 9 Jul 2006 18:41:24 +0200 Subject: [Ferret-talk] acts_as_ferret.. what does it actually do? In-Reply-To: References: <5433872b7b21329f49c3ff93f2087394@ruby-forum.com> Message-ID: <89cf9021677761abd02cefc02f75c3ee@ruby-forum.com> Hi, okay, thank you very much for replying. I'm very impressed with Ferret and more impressed by you answering support questions, excellent work! Anyway, an update: wrong number of arguments (2 for 1) RAILS_ROOT: script/../config/..
Application Trace | Framework Trace | Full Trace #{RAILS_ROOT}/app/models/vo_object.rb:26:in `initialize' #{RAILS_ROOT}/app/models/vo_object.rb:26:in `refine_search' #{RAILS_ROOT}/app/controllers/search_controller.rb:34:in `refine' line 26: bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section",search_input), Ferret::Search::BooleanClause::Occur::SHOULD)) Sorry, I'm just rushing off for the commute home and have only just tried this; I haven't had enough time to play around with it yet, so if it's a stupid mistake by me, I'm sorry for wasting your time! Regards, caspar David Balmain wrote: > On 7/10/06, Caspar wrote: >> application. When I try the code at the bottom of this post I get the >> following error (also in full at the bottom of the post): >> >> NameError: uninitialized constant BooleanQuery > > Basically, you just need to include the Search module in your code, i.e. > > include Ferret::Search > > This could be dangerous, so it is often better to use the full > class path, as I demonstrate below. > >> caspar >> def VoObject.refine_search(search_input) >> bq = BooleanQuery.new >> bq.add_query(TermQuery.new(Term.new("section", search_input), >> BooleanClause::Occur::Should)) >> filter = QueryFilter.new(bq) >> @vobjects = Item.find_by_contents(search_text,:filter => filter, >> :sort => ["section", "sale_category"]) >> redirect_to :results >> end >> > > def VoObject.refine_search(search_input) > bq = Ferret::Search::BooleanQuery.new > bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section", > search_input), > Ferret::Search::BooleanClause::Occur::Should)) > filter = Ferret::Search::QueryFilter.new(bq) > @vobjects = Item.find_by_contents(search_text,:filter => filter, > :sort => ["section", "sale_category"]) > redirect_to :results > end > > Please let me know if you are still having problems. > > Cheers, > Dave
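The "wrong number of arguments (2 for 1)" error above is raised in `initialize`, which points at the placement of the closing parenthesis: as written, Occur::SHOULD becomes a second argument to the TermQuery constructor instead of the second argument to add_query. A minimal plain-Ruby sketch reproduces the same mistake; FakeTermQuery and FakeBooleanQuery are hypothetical stand-ins, not Ferret classes, and only assume a one-argument query constructor and an add_query that takes a query plus an occurrence flag, as the code in this thread does:

```ruby
# Hypothetical stand-in for a query class whose constructor takes one argument.
class FakeTermQuery
  attr_reader :term

  def initialize(term)
    @term = term
  end
end

# Hypothetical stand-in for a boolean query that collects [query, occur] pairs.
class FakeBooleanQuery
  attr_reader :clauses

  def initialize
    @clauses = []
  end

  def add_query(query, occur)
    @clauses << [query, occur]
  end
end

bq = FakeBooleanQuery.new
error = nil

# Misplaced parenthesis: :should is passed to FakeTermQuery.new, which raises
# ArgumentError before add_query is even called.
begin
  bq.add_query(FakeTermQuery.new("housing", :should))
rescue ArgumentError => e
  error = e
  puts "ArgumentError: #{e.message}"
end

# Closing the constructor call first hands :should to add_query instead.
bq.add_query(FakeTermQuery.new("housing"), :should)
puts "clauses: #{bq.clauses.size}"
```

Old Rubies such as 1.8 word the message as "(2 for 1)"; current ones say "(given 2, expected 1)".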
From dbalmain.ml at gmail.com Sun Jul 9 13:06:35 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 10 Jul 2006 02:06:35 +0900 Subject: [Ferret-talk] acts_as_ferret.. what does it actually do? In-Reply-To: <89cf9021677761abd02cefc02f75c3ee@ruby-forum.com> References: <5433872b7b21329f49c3ff93f2087394@ruby-forum.com> <89cf9021677761abd02cefc02f75c3ee@ruby-forum.com> Message-ID: On 7/10/06, Caspar wrote: > Hi okay thank you very much for replying, I'm very impressed with ferret > and more impressed by you answering support questions, excellent work! > > anyway update > > > > > wrong number of arguments (2 for 1) > > RAILS_ROOT: script/../config/.. > #{RAILS_ROOT}/app/models/vo_object.rb:26:in `initialize' > #{RAILS_ROOT}/app/models/vo_object.rb:26:in `refine_search' > #{RAILS_ROOT}/app/controllers/search_controller.rb:34:in `refine' > > > > line 26: > bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section",search_input), > Ferret::Search::BooleanClause::Occur::SHOULD)) Looks like you've got the brackets in the wrong place. I have to take partial responsibility since I didn't pick it up the first time either. Try this: bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section",search_input)), Ferret::Search::BooleanClause::Occur::SHOULD) > sorry I'm just rushing off for the commute home and have only just tried > this, have not had enough time to play around with it yet so if it's a > stupid mistake by me I'm sorry for time wasting! No worries. :D > regards > caspar > > > > David Balmain wrote: > > On 7/10/06, Caspar wrote: > >> application. when i try the code at the bottom of this post i get the > >> following errors (also in full at the bottom of post) > >> > >> NameError: uninitialized constant BooleanQuery > > > > Basically you just need to include the Search module in your code. i.e.
> > > > include Ferret::Search > > > > This could be dangerous so it is often better to include the full > > class path as I demonstrate below. > > > >> caspar > >> def VoObject.refine_search(search_input) > >> bq = BooleanQuery.new > >> bq.add_query(TermQuery.new(Term.new("section", search_input), > >> BooleanClause::Occur::Should)) > >> filter = QueryFilter.new(bq) > >> @vobjects = Item.find_by_contents(search_text,:filter => filter, > >> :sort => ["section", "sale_category"]) > >> redirect_to :results > >> end > >> > > > > def VoObject.refine_search(search_input) > > bq = Ferret::Search::BooleanQuery.new > > bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section", > > search_input), > > Ferret::Search::BooleanClause::Occur::Should)) > > filter = Ferret::Search::QueryFilter.new(bq) > > @vobjects = Item.find_by_contents(search_text,:filter => filter, > > :sort => ["section", "sale_category"]) > > redirect_to :results > > end > > > > Please let me know if you are still having problems. > > > > Cheers, > > Dave > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From kraemer at webit.de Mon Jul 10 03:47:45 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 10 Jul 2006 09:47:45 +0200 Subject: [Ferret-talk] Search terms in URL In-Reply-To: References: Message-ID: <20060710074745.GE17139@cordoba.webit.de> On Sun, Jul 09, 2006 at 05:51:27PM +0200, Chris Lowis wrote: > I have set up a ferret search using the examples in the demo app. Is it > possible to have the search terms included in the URL, so that, for > example, users can bookmark a search results page ? Sure, setting the search form's method to 'get' should do the trick. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon Jul 10 03:57:46 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 10 Jul 2006 09:57:46 +0200 Subject: [Ferret-talk] querying the SearchResults instance In-Reply-To: <2872073863b1658f03721e7410429541@ruby-forum.com> References: <2872073863b1658f03721e7410429541@ruby-forum.com> Message-ID: <20060710075746.GG17139@cordoba.webit.de> On Sat, Jul 08, 2006 at 01:10:18AM +0200, caspar wrote: > Hi how do you search against the results returned by find_by_contents > using ferret? > i.e. how do you "search within these results"? > This is an acts_as_ferret question again... As acts_as_ferret defaults to using AND as the operator between query terms, the easiest solution is to simply append the new query to the original one. If you configured aaf to use OR as the default operator, you'll have to AND the original and the new query together, i.e. "(#{old_query}) AND (#{new_query})" Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From clare.cavanagh at argoent.co.uk Mon Jul 10 09:36:37 2006 From: clare.cavanagh at argoent.co.uk (BlueJay) Date: Mon, 10 Jul 2006 15:36:37 +0200 Subject: [Ferret-talk] Category Number Results returned Message-ID: I am looking to have a number of categories populated from my results of a search. For example, searching on "sport" would display all results for sport. I want to also have a number of categories to refine the documents down. So by clicking on the "Fishing" category or the "Shooting" category, I would only see the results on sport around that category. Now for the fun. I want to determine the total number of results in each category for a given search.
So in the above, for a search on sport I want to display the results but in the Fishing item I want to say how many results there are in total before the user clicks on the item. For example in the pull down I want to display "Fishing (10001), Shooting (2003)". I was going to do this in Ruby by doing a simple count for each category item on the returned result set, but I believe that this would mean returning all the results of a given query to Ruby in order to do this count and I am concerned that this would cause performance issues for large result sets. If I put pagination into the mix and only display the first 50 results on the screen, would this add an additional complexity or would this just be called through Ruby? Thanks for your assistance with this... -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon Jul 10 09:52:33 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 10 Jul 2006 22:52:33 +0900 Subject: [Ferret-talk] Category Number Results returned In-Reply-To: References: Message-ID: On 7/10/06, BlueJay wrote: > I am looking to have a number of categories populated from my results of > a search. For example, searching on "sport" would display all results > for sport. I want to also have a number of categories to refine the > documents down. So by clicking on the "Fishing" category or the > "Shooting" category, I would only see the results on sport around that > category. > > Now for the fun. I want to determine the total number of results in each > category for a given search. So in the above, for a search on sport I > want to display the results but in the Fishing item I want to say how > many results there are in total before the user clicks on the item. For > example in the pull down I want to display "Fishing (10001), Shooting > (2003)". Hi Clare, The fastest way to do this would be to run the query multiple times.
So for your "sport" example you'd do something like this; fishing_count = index.search_each("sport AND fishing", :num_docs => 1) {} shooting_count = index.search_each("sport AND shooting", :num_docs => 1) {} # etc. Then go ahead and paginate your query as you usually would. > I was going to do this in Ruby by doing a simple count for each category > item on the returned result set, but I believe that this would mean > returning all the results of a given query to Ruby in order to do this > count and I am concerned that this would cause performance issues for > large result sets. Quite possibly. But running the query multiple times should be fine in terms of performance. You could use filters instead of the code I demonstrated above to further improve performance. > If I put pagination into the mix and only display the first 50 results > on the screen, would this add an additional complexity or would this > just be called through Ruby? > > Thanks for your assistance with this... I'm not exactly sure what you mean here when you say "would this be called through ruby". I hope I've already answered your question. Let me know if I didn't. Cheers, Dave From clare.cavanagh at argoent.co.uk Mon Jul 10 10:24:18 2006 From: clare.cavanagh at argoent.co.uk (BlueJay) Date: Mon, 10 Jul 2006 16:24:18 +0200 Subject: [Ferret-talk] Category Number Results returned In-Reply-To: References: Message-ID: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> David Balmain wrote: > On 7/10/06, BlueJay wrote: >> many results there are in total before the user clicks on the item. For >> example in the pull down I want to display "Fishing (10001), Shooting >> (2003)". > > Hi Clare, > The fastest way to do this would be to run the query multiple times. > So for your "sport" example you'd do something like this; > > fishing_count = index.search_each("sport AND fishing", :num_docs => > 1) {} > shooting_count = index.search_each("sport AND shooting", :num_docs > => 1) {} > # etc. 
> > Then go ahead and paginate your query as you usually would. > Thank you very much for your quick response. I have several subcategories (a taxonomy, really) and what I was thinking of doing was this in 2 queries. Index the data as normal so that you can do the full text search, but also index the structure of the taxonomy and have each branch contain the records that belong to it. Run one big search over the full text to get the list of hits and then use this list as a query against the second index to get all the category bits. This would be a big query, although it should be quick, but I would need to re-index the category bits every time a document was added. Does this make sense, and would it make sense in Ferret? I have done this before in another search engine that required special category manipulation, but never with Ferret, and I'm not sure how to go about doing this in Ferret. I am not sure about your idea of filtering the results. >> I was going to do this in Ruby by doing a simple count for each category >> item on the returned result set, but I believe that this would mean >> returning all the results of a given query to Ruby in order to do this >> count and I am concerned that this would cause performance issues for >> large result sets. > > Quite possibly. But running the query multiple times should be fine in > terms of performance. You could use filters instead of the code I > demonstrated above to further improve performance. > >> If I put pagination into the mix and only display the first 50 results >> on the screen, would this add an additional complexity or would this >> just be called through Ruby? >> >> Thanks for your assistance with this... > > I'm not exactly sure what you mean here when you say "would this be > called through ruby". I hope I've already answered your question. Let me > know if I didn't. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/.
From dbalmain.ml at gmail.com Mon Jul 10 10:46:46 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 10 Jul 2006 23:46:46 +0900 Subject: [Ferret-talk] Category Number Results returned In-Reply-To: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> Message-ID: On 7/10/06, BlueJay wrote: > David Balmain wrote: > > On 7/10/06, BlueJay wrote: > >> many results there are in total before the user clicks on the item. For > >> example in the pull down I want to display "Fishing (10001), Shooting > >> (2003)". > > > > Hi Clare, > > The fastest way to do this would be to run the query multiple times. > > So for your "sport" example you'd do something like this; > > > > fishing_count = index.search_each("sport AND fishing", :num_docs => > > 1) {} > > shooting_count = index.search_each("sport AND shooting", :num_docs > > => 1) {} > > # etc. > > > > Then go ahead and paginate your query as you usually would. > > > > Thank you very much for your quick response. > > I have several sub categories (taxonomy really) and what I was thinking > of doing was this in 2 queries. Index the data as per normal so that you > can do the full text search but also index the structure of the taxonomy > and have each branch contain the records that contain it. > Run one big search over the fulltext to get the list of hits and then > use this list as a query against the second index to get all the > category bits. I'm not sure what you mean by "category bits". Could you possibly implement the categories like this: sport/ sport/shooting/ sport/fishing/ sport/fishing/fly sport/fishing/deep_sea etc. Then, let's say you have a query in query_str. You can get all results in the sport category like this: index.search_each(query_str + " AND category:sport/*") { # ... } You can get all results in the fishing category like this: index.search_each(query_str + " AND category:sport/fishing/*") { # ... } Am I making sense?
> This would be a big query though - although it should be quick but I > would need to re-index the category bits everytime a document was added. You've lost me. Could you give some example code? > Does this make sense and/or would it make sense in Ferret. I have done > this before in another search engine that required special category > manipulation but never with Ferret and not sure how to go about doing > this in Ferret. > > I am not sure about your idea around filtering the results I'll explain filtering once I understand better what it is you are trying to do. Cheers, Dave From chris.lowis at gmail.com Mon Jul 10 15:39:45 2006 From: chris.lowis at gmail.com (Chris Lowis) Date: Mon, 10 Jul 2006 21:39:45 +0200 Subject: [Ferret-talk] Search terms in URL In-Reply-To: <20060710074745.GE17139@cordoba.webit.de> References: <20060710074745.GE17139@cordoba.webit.de> Message-ID: > sure, setting the search form's method to 'get' should do the trick. Thank you for taking the time to respond. I now have <%= form_tag :action => 'search', :method => 'get'%>
Search by street name, or zip code
<%= submit_tag 'search' %> <%= end_form_tag %> <% if @results -%> ... <% end -%> in my view. However, the URL generated after the search looks like: http://localhost:3000/residence/search?method=get and doesn't contain the search terms. Any suggestions? Kind regards, Chris -- Posted via http://www.ruby-forum.com/. From robinluckey at gmail.com Mon Jul 10 15:47:12 2006 From: robinluckey at gmail.com (Robin Luckey) Date: Mon, 10 Jul 2006 21:47:12 +0200 Subject: [Ferret-talk] Ferret on 64 bit Red Hat In-Reply-To: References: Message-ID: <95467bb5d6c296a2363295ec0bd50690@ruby-forum.com> FYI, inspired by some confusing and problematic search results on our web server, I tried enlisting in the ferret_experimental branch and building on an Ubuntu Dapper Drake server with an AMD 64 dual-core Opteron. Here are the warnings from the ferret_experimental make: src/global.c: In function 'estrdup': src/global.c:170: warning: format '%d' expects type 'int', but argument 3 has type 'size_t' src/global.c: In function 'emalloc': src/global.c:183: warning: format '%d' expects type 'int', but argument 3 has type 'size_t' src/global.c: In function 'ecalloc': src/global.c:195: warning: format '%d' expects type 'int', but argument 3 has type 'size_t' src/global.c: In function 'erealloc': src/global.c:207: warning: format '%d' expects type 'int', but argument 3 has type 'size_t' ... src/search.c: In function 'w_create': src/search.c:285: warning: format '%d' expects type 'int', but argument 3 has type 'size_t' src/search.c:285: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int' src/search.c: In function 'q_create': src/search.c:468: warning: format '%d' expects type 'int', but argument 3 has type 'size_t' src/search.c:468: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int' src/search.c: In function 'scorer_create': src/search.c:498: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
src/search.c:498: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int' src/search.c: In function 'msea_create_weight': src/search.c:1416: warning: cast to pointer from integer of different size ... test/test_index.c: In function 'test_ir_norms': test/test_index.c:1234: warning: cast from pointer to integer of different size test/test_index.c: In function 'test_ir_delete': test/test_index.c:1313: warning: cast from pointer to integer of different size Ferret problems are high on my task list right now, so I'm happy to offer more info as required. Robin -- Posted via http://www.ruby-forum.com/. From clare.cavanagh at argoent.co.uk Mon Jul 10 16:35:04 2006 From: clare.cavanagh at argoent.co.uk (BlueJay) Date: Mon, 10 Jul 2006 22:35:04 +0200 Subject: [Ferret-talk] Category Number Results returned In-Reply-To: References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> Message-ID: <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> David Balmain wrote: David, Thanks for your continued help and assistance. I don't have code at this stage because I started writing it one way and realised that the way I was writing it through counts in Ruby would not work because of pagination. A little more background is in order. The user will be presented with a pull-down menu with 5 selections in a main category. Doing 6 queries (one main query and 5 count queries) in this instance is not a problem. The problem arises when they select one of these categories. They will then be presented with up to 5 other category structures. One would be new or old, another would be type (up to 5 nodes), another would be, for example, book type (such as fiction, non-fiction, autobiography, etc.; up to 20 categories), and another could have up to 40 categories. The user is free to select any of these category nodes because they may be interested in old books and fiction. I will therefore have to populate all of the nodes with the number of documents in each node.
This could leave me spawning 60-odd queries to count the number of documents in each node. Subsequent selections of nodes would refine the result set down further. What I would really like to do is 2 or 3 queries: one which does the normal search over the document set (collection), and a second to populate each node in the classification structure with the number of documents that match each node. It is pretty easy in 2 queries to tell if there are any documents in each node, but doing a count over all the nodes is trickier. I was originally going to have another table which had a row for each node with the name of the node (and structure) in one field and the document_ids in another field. For example, [Fishing, "doc1 doc2 doc3 doc4"], [Fishing/Fiction, "doc2 doc3"], [Fishing/Non Fiction, "doc1"] etc. I would then get a result set that provided all the categories that had hits against a given query. However, it does not provide the number of documents against each node, so I could not populate the pull-down categories with Fishing (2), Fiction (1), Non Fiction (1) etc. Therefore, what I really need is a function that will return the number of documents in each node of a given classification structure. An addition to the Num_Docs capability already available, perhaps. I could easily produce a result set like this: Fishing doc1 Fishing doc2 Fishing/Fiction doc3 Fishing/Fiction doc1 Fishing/Non Fiction doc4 etc... Num_Docs would provide 5 in this instance, but what I really want is: Fishing 2 Fishing/Fiction 2 Fishing/Non Fiction 1 etc... All that, and done in 1 or 2 queries over and above the original search. Simple, eh! I hope that I have not confused you too much, but this is something that I desperately need or my project is kaput! I found this: http://www.mail-archive.com/ferret-talk at rubyforge.org/msg00343.html and http://www.ruby-forum.com/topic/56232#40931 Do you think that this is the way to go? Thanks very much.
> On 7/10/06, BlueJay wrote: >> > fishing_count = index.search_each("sport AND fishing", :num_docs => >> I have several sub categories (taxonomy really) and what I was thinking >> of doing was this in 2 queries. Index the data as per normal so that you >> can do the full text search but also index the structure of the taxonomy >> and have each branch contain the records that contain it. >> Run one big search over the fulltext to get the list of hits and then >> use this list as a query against the second index to get all the >> category bits. > > I'm not sure what you mean by "category bits". Could you possibly > implement the categories like this: > > sport/ > sport/shooting/ > sport/fishing/ > sport/fishing/fly > sport/fishing/deep_sea > etc. > > Then, let's say you have a query in query_str. You can get all results > in the sport category like this: > > index.search_each(query_str + " AND category:sport/*") { > # ... > } > > You can get all results in the fishing category like this: > > index.search_each(query_str + " AND category:sport/fishing/*") { > # ... > } > > Am I making sense? > >> This would be a big query though - although it should be quick but I >> would need to re-index the category bits every time a document was added. > > You've lost me. Could you give some example code? > >> Does this make sense and/or would it make sense in Ferret. I have done >> this before in another search engine that required special category >> manipulation but never with Ferret and not sure how to go about doing >> this in Ferret. >> >> I am not sure about your idea around filtering the results > > I'll explain filtering once I understand better what it is you are > trying to do. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/.
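The hierarchical category scheme in Dave's quoted reply comes down to building one query string per category node: the user's query ANDed with a prefix match on the category path. A minimal sketch, assuming the category field name and path layout from his example; category_query and ancestor_paths are hypothetical helpers, not part of Ferret or acts_as_ferret, and whether the query parser keeps the "/" inside a prefix term depends on the analyzer used for that field. The parentheses follow Jens's "(#{old_query}) AND (#{new_query})" advice so that an OR inside the user's query cannot escape the AND:

```ruby
# Hypothetical helper: AND the user's query with a prefix match on the
# category field, following the path layout from Dave's example.
def category_query(query_str, category_path)
  path = category_path.end_with?("/") ? category_path : "#{category_path}/"
  # Parentheses keep an OR inside the user's query from escaping the AND,
  # and the spaces around AND avoid gluing it onto the last query term.
  "(#{query_str}) AND category:#{path}*"
end

# Hypothetical helper: every ancestor node of a category path, e.g. the
# entries a drop-down would need counts for.
def ancestor_paths(category_path)
  parts = category_path.split("/")
  (1..parts.size).map { |n| parts.first(n).join("/") + "/" }
end

ancestor_paths("sport/fishing/fly").each do |path|
  puts category_query("trout OR salmon", path)
end
# (trout OR salmon) AND category:sport/*
# (trout OR salmon) AND category:sport/fishing/*
# (trout OR salmon) AND category:sport/fishing/fly/*
```

Each of these strings would still be one search per node, which is the cost BlueJay is worried about with 60-odd nodes.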
From kraemer at webit.de Mon Jul 10 17:22:54 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 10 Jul 2006 23:22:54 +0200 Subject: [Ferret-talk] Search terms in URL In-Reply-To: References: <20060710074745.GE17139@cordoba.webit.de> Message-ID: <20060710212254.GB10427@cordoba.webit.de> On Mon, Jul 10, 2006 at 09:39:45PM +0200, Chris Lowis wrote: > > Sure, setting the search form's method to 'get' should do the trick. > > Thank you for taking the time to respond. I now have > > <%= form_tag :action => 'search', :method => 'get'%> Try <%= form_tag({ :action => 'search' }, :method => 'get') %> instead. This explicitly tells Ruby where the first hash argument (url_for_options) to form_tag ends; all following arguments will then be aggregated into the second optional argument to form_tag, options. The parentheses matter: without them, Ruby would parse the braces as a block rather than a hash. The signature is: form_tag(url_for_options = {}, options = {}, *parameters_for_url) Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon Jul 10 17:42:02 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 10 Jul 2006 23:42:02 +0200 Subject: [Ferret-talk] [ANN] acts_as_ferret 0.2.2 Message-ID: <20060710214202.GC10427@cordoba.webit.de> Hi all, I just tagged acts_as_ferret 0.2.2 as the current stable version, so get it while it's hot ;-) new features: - added support for the multiple models/single index approach. - find out the total number of search results by calling total_hits on the array returned by find_by_contents. fixes: - trac tickets #20 (find_by_contents breaks ferret sorting) and #24 (rebuild_index wastes a huge amount of memory). as always, thanks for reporting issues and uploading patches :-) More info can be found on my blog: http://www.jkraemer.net/articles/2006/07/10/rails-full-text-search-version-0-2-2 Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From clare.cavanagh at argoent.co.uk Mon Jul 10 18:09:22 2006 From: clare.cavanagh at argoent.co.uk (BlueJay) Date: Tue, 11 Jul 2006 00:09:22 +0200 Subject: [Ferret-talk] Plurals and synonym lists Message-ID: <986bc769d7920a7a655158a94421aab3@ruby-forum.com> I want to correct spelling errors automatically. I have used search engines in the past where I could pass an argument through the standard search to correct a word with up to 2 spelling errors, for example, or do the more Google-like "Did ya mean?". In this case I just want to change it automatically and search. I am not too interested in specifying the number of characters it is out by. What is the easiest way of doing something similar in Ferret? Would I use fuzzy search to correct misspellings? I am guessing so, but would this also handle plurals and perhaps stemming? For example, tax would search for taxes and taxing, or ball would search for balls... Also, has anyone done any work with synonym lists? When I do a search on chair, I want to also search on stool, seat, bench, etc. I saw something about this in a Lucene book using WordNet but have no idea how to implement it in Ferret. As you can probably tell, I am pretty new to both Lucene and Ferret. Thanks again in advance. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon Jul 10 20:40:25 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 11 Jul 2006 09:40:25 +0900 Subject: [Ferret-talk] [ANN] acts_as_ferret 0.2.2 In-Reply-To: <20060710214202.GC10427@cordoba.webit.de> References: <20060710214202.GC10427@cordoba.webit.de> Message-ID: On 7/11/06, Jens Kraemer wrote: > Hi all, > > I just tagged acts_as_ferret 0.2.2 as the current stable version, so get > it while it's hot ;-) > > new features: > - added support for the multiple models/single index approach.
> - find out the total number of search results by calling total_hits on > the array returned by find_by_contents. > > fixes: > - trac tickets #20 (find_by_contents breaks ferret sorting) and #24 > (rebuild_index wastes a huge amount of memory). > > as always, thanks for reporting issues and uploading patches :-) > > More info can be found on my blog: > http://www.jkraemer.net/articles/2006/07/10/rails-full-text-search-version-0-2-2 > > Jens Great stuff, Jens. Keep up the good work. Dave From dbalmain.ml at gmail.com Mon Jul 10 21:05:23 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 11 Jul 2006 10:05:23 +0900 Subject: [Ferret-talk] Category Number Results returned In-Reply-To: <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> Message-ID: On 7/11/06, BlueJay wrote: > David Balmain wrote: > > David > > Thanks for your continued help and assistance. > > I don't have code at this stage because I started writing it one way and > realised that the way I was writing it through counts in Ruby would not > work because of pagination. > > A little more background is in order. The user will be presented with a > pull down menu with 5 selections in a main category. Doing 6 queries > (one main query) and 5 count queries in this instance is not a problem. > The problem arises when they select one of these categories. > > They will then be presented with up to 5 other category structures. One > would be new or old, another would be type (up to 5 nodes), another > would be, for example, book type (such as fiction, non-fiction, > autobiography) etc. (up to 20 categories), another could have up to 40 > categories. The user is free to select any of these category nodes > because they may be interested in old books and fiction. I will > therefore have to populate all of the nodes with the number of documents > in each node.
This could leave me with spawning 60 odd queries to count > the number of documents in each node. Subsequent selections of nodes > would refine the result set down further. > > What I really would like to do is 2 or 3 queries. One which does the > normal search over the document set (collection) and the second to > populate each node in the classification structure with the number of > documents that match each node. > > It is pretty easy in 2 queries to tell if there are any documents in > each node but doing a count over all the nodes is more tricky. I was > originally going to have another table which had a row for each node > with the name of the node (and structure) in one field and the > document_id's in another field. For example, [Fishing, "doc1 doc2 doc3 > doc4"], [Fishing/Fiction, "doc2, doc3"], [Fishing/Non Fiction, "doc1"] > etc. I would then get a result set that provided all the categories that > had hits against a given query. However, it does not provide the number > of documents against each node. So I could not populate the pull down > categories with Fishing (2), Fiction (1), Non Fiction (1) etc. > > Therefore, what I really need is a function that will return the number > of documents in each node of a given classification structure. An > addition to the Num_Docs capability already available perhaps. > > I could easily produce a results set that would be like this.... > > Fishing doc1 > Fishing doc2 > Fishing/Fiction doc3 > Fishing/Fiction doc1 > Fishing/Non Fiction doc4 > etc... > > Num_Docs would provide 5 in this instance but what I really want is: > Fishing 2 > Fishing/Fiction 2 > Fishing/Non Fiction 1 > etc... > > All that, and done in 1 or 2 queries over and above the original > search.... Simple eh! > > I hope that I have not confused you too much, but this is something that > I desperately need or my project is kaput!
> > I found this: > http://www.mail-archive.com/ferret-talk at rubyforge.org/msg00343.html and > > http://www.ruby-forum.com/topic/56232#40931 > > Do you think that this is the way to go? I think I finally understand what you want now, and I do think this is the way to go. What you will need to do is build BitVectors for each of your categories and sub-categories using the examples in those threads. Or you could just use a QueryFilter: filter = QueryFilter.new(PrefixQuery.new(:category, "fishing")) fishing_bits = filter.bits(index_reader) filter = QueryFilter.new(PrefixQuery.new(:category, "fishing/fiction")) fishing_fiction_bits = filter.bits(index_reader) filter = QueryFilter.new(PrefixQuery.new(:category, "fishing/nonfiction")) fishing_nonfiction_bits = filter.bits(index_reader) This assumes that everything in fishing/fiction is also in fishing/. In your example that doesn't seem to be the case, so you should use a TermQuery instead of a PrefixQuery. Now you just need to run your search the same way. Something like this: query = query_parser.parse(query_str) query_bits = QueryFilter.new(query).bits(index_reader) And now you can get your counts like this: fishing_count = (fishing_bits & query_bits).count fishing_fiction_count = (fishing_fiction_bits & query_bits).count fishing_nonfiction_count = (fishing_nonfiction_bits & query_bits).count Sadly, this code only works in theory, since I haven't released the code that &s bit vectors yet, and I used the new-style PrefixQuery declarations, so they won't work either. But if this solution seems like it will work for you and you can wait a week, you'll be set.
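Since the & on BitVectors is not released yet, here is a pure-Ruby sketch of the same counting idea, using plain Integers as bit sets (bit n is set when document n matches). In Ferret the per-category sets would come from QueryFilter#bits as described above, and since they depend only on the index they could presumably be cached until the index changes; the document numbers below are made up for illustration. Each search then costs one query for query_bits plus one cheap AND and bit count per category node:

```ruby
# Pure-Ruby stand-in for Ferret BitVectors: an Integer used as a bit set,
# where bit n is 1 when document number n is in the set.
def bits_for(doc_ids)
  doc_ids.reduce(0) { |bits, id| bits | (1 << id) }
end

# Number of set bits, i.e. how many documents are in the set.
def count_bits(bits)
  bits.to_s(2).count("1")
end

# Hypothetical category membership; with Ferret these would come from
# QueryFilter.new(category_query).bits(index_reader).
category_bits = {
  "Fishing"             => bits_for([1, 2, 3, 4]),
  "Fishing/Fiction"     => bits_for([1, 3]),
  "Fishing/Non Fiction" => bits_for([4])
}

# Hypothetical documents matching the user's full-text query.
query_bits = bits_for([1, 3, 4, 7])

# One AND plus one bit count per category, instead of one search each.
counts = {}
category_bits.each do |name, bits|
  counts[name] = count_bits(bits & query_bits)
end

counts.each { |name, n| puts "#{name} (#{n})" }
# prints Fishing (3), Fishing/Fiction (2), Fishing/Non Fiction (1), one per line
```

The printed labels match the "Fishing (n)" format BlueJay wants for the drop-down entries.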
Cheers, Dave From dbalmain.ml at gmail.com Mon Jul 10 21:23:12 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 11 Jul 2006 10:23:12 +0900 Subject: [Ferret-talk] Ferret on 64 bit Red Hat In-Reply-To: <95467bb5d6c296a2363295ec0bd50690@ruby-forum.com> References: <95467bb5d6c296a2363295ec0bd50690@ruby-forum.com> Message-ID: On 7/11/06, Robin Luckey wrote: > FYI, inspired by some confusing and problematic search results on our > web server, I tried enlisting in the ferret_experimental branch and > building on an Ubuntu Dapper Drake server with an AMD 64 dual-core > Opteron. > > Here are the warnings from the ferret_experimental make: > > src/global.c: In function 'estrdup': > src/global.c:170: warning: format '%d' expects type 'int', but argument > 3 has type 'size_t' > src/global.c: In function 'emalloc': > src/global.c:183: warning: format '%d' expects type 'int', but argument > 3 has type 'size_t' > src/global.c: In function 'ecalloc': > src/global.c:195: warning: format '%d' expects type 'int', but argument > 3 has type 'size_t' > src/global.c: In function 'erealloc': > src/global.c:207: warning: format '%d' expects type 'int', but argument > 3 has type 'size_t' > ... 
> src/search.c: In function 'w_create':
> src/search.c:285: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
> src/search.c:285: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
> src/search.c: In function 'q_create':
> src/search.c:468: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
> src/search.c:468: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
> src/search.c: In function 'scorer_create':
> src/search.c:498: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
> src/search.c:498: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
> src/search.c: In function 'msea_create_weight':
> src/search.c:1416: warning: cast to pointer from integer of different size
> ...
> test/test_index.c: In function 'test_ir_norms':
> test/test_index.c:1234: warning: cast from pointer to integer of different size
> test/test_index.c: In function 'test_ir_delete':
> test/test_index.c:1313: warning: cast from pointer to integer of different size
>
> Ferret problems are high on my task list right now, so I'm happy to
> offer more info as required,
> Robin

Thanks a bunch, Robin. I've added some fixes. Could you try it again? Also, did all the tests pass?

Thanks,
Dave

From dbalmain.ml at gmail.com Mon Jul 10 21:36:25 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 11 Jul 2006 10:36:25 +0900
Subject: [Ferret-talk] Plurals and synonym lists
In-Reply-To: <986bc769d7920a7a655158a94421aab3@ruby-forum.com>
References: <986bc769d7920a7a655158a94421aab3@ruby-forum.com>
Message-ID:

On 7/11/06, BlueJay wrote:
> I want to correct spelling errors automatically. I have used search in
> the past where I can pass an argument through standard search to correct
> a word with up to 2 spelling errors for example or do the more Google
> like "Did ya mean?".
> In this case I just want to change it automatically
> and search. I am not too interested in specifying the number of
> characters it is out by.
>
> What is the easiest way of doing something similar in Ferret? Would I
> use fuzzy search to correct misspellings? I am guessing so, but would
> this also do plurals and perhaps stemming. For example, tax would search
> for taxes and taxing or ball would search for balls...
>

You can use FuzzyQuery to do spell checking, but it's not ideal. I'm thinking of extracting the vim spell checker and making it available in Ruby. It is awesome: lightning fast, and there are heaps of dictionaries available in multiple languages.

This you can do with a FuzzyQuery, but it's still a bit difficult. What I'm thinking of doing is adding a did_you_mean class method to FuzzyQuery:

    suggestions = FuzzyQuery.did_you_mean(term, index_reader)

This would return an ordered list of term/frequency pairs for all terms that are similar to, but more common in the index than, the original term.

> Also - anyone done any work with synonym lists. I want to be able when I
> do a search on chair to also do a search on stool, seat, bench etc. I
> saw something around this in a Lucene book around WordNet but have no
> idea how to implement in Ferret.

You can do the same thing in Ferret. You basically need to write your own analyzer. I'll have a book out on how to do all of this eventually.

> As you can probably tell I am pretty new to both Lucene and Ferret.

A warm welcome to you then.

From marciorf at gmail.com Mon Jul 10 22:45:48 2006
From: marciorf at gmail.com (Marcio)
Date: Tue, 11 Jul 2006 04:45:48 +0200
Subject: [Ferret-talk] Installing ferret on windows
In-Reply-To: <562a35c10606290210l33e70789t59d180f6f350a9dc@mail.gmail.com>
References: <58f5ef49ecc9010691f6c569aecba978@ruby-forum.com> <562a35c10606290210l33e70789t59d180f6f350a9dc@mail.gmail.com>
Message-ID:

I tried it, but it didn't work either.
I was going to try again, but my friend installed it from a Mac box. Then I got it from the repo... thanks

Jan Prill wrote:
> Hi you two,
>
> I've ran into the some issues with the script/plugin script on windows the
> last few days myself. Which have nothing to do with acts_as_ferret. You
> don't rely on script/plugin for installing acts_as_ferret. With a subversion
> client like tortoise on windows or whatever you like you might check out the
> acts_as_ferret repository and simply copy the checked out version to
> RAILS_ROOT/vendor/plugins and your ready to go...
>
> Cheers,
> Jan

-- Posted via http://www.ruby-forum.com/.

From clare.cavanagh at argoent.co.uk Mon Jul 10 23:22:16 2006
From: clare.cavanagh at argoent.co.uk (BlueJay)
Date: Tue, 11 Jul 2006 05:22:16 +0200
Subject: [Ferret-talk] Category Number Results returned
In-Reply-To:
References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com>
Message-ID: <0d4d499e28cb9b6c8d07a2b70d3b120e@ruby-forum.com>

David Balmain wrote:
>
> I think I finally understand what you want now and I do think this is
> the way to go. What you will need to do is build BitVectors for each
> of your categories and sub-categories using the examples in
> those threads. Or you could just use a QueryFilter.
>
> filter = QueryFilter.new(PrefixQuery.new(:category, "fishing"))
> fishing_bits = filter.bits(index_reader)
>
> filter = QueryFilter.new(PrefixQuery.new(:category,
> "fishing/fiction"))
> fishing_fiction_bits = filter.bits(index_reader)
>
> filter = QueryFilter.new(PrefixQuery.new(:category,
> "fishing/nonfiction"))
> fishing_nonfiction_bits = filter.bits(index_reader)
>
> This assumes that everything in fishing/fiction is also in fishing/.
> In your example, it doesn't seem to be the case, so you should use a
> TermQuery instead of a PrefixQuery.
>
> Now you just need to run your search the same way.
> Something like this;
>
> query = query_parser.parse(query_str)
> query_bits = QueryFilter.new(query).bits(index_reader)
>
> And now you can get your counts like this;
>
> fishing_count = (fishing_bits & query_bits).count
> fishing_fiction_count = (fishing_fiction_bits & query_bits).count
> fishing_nonfiction_count = (fishing_nonfiction_bits &
> query_bits).count
>
> Sadly, this code only works in theory since I haven't release the code
> that &s bit vectors yet and I used the new style PrefixQuery
> declarations so they won't work either. But if this solution seems
> like it will work for you and you can wait a week, you'll be set.
>
> Cheers,
> Dave

Dave,

Thanks very much for this, and I can wait a week for it to be released. I am sorry if I was not clear about this, but everything in the subcategories will have to be in the category above, as this is the way the system is designed:

Fishing contains documents A B C D E F G H I J
Fishing_Fiction contains A B C
Fishing_Non_Fiction contains D and E
Fishing_Fiction_New contains A B
Fishing_Fiction_Old contains C
etc.

I am assuming that I still need to wait in this case? I will try to understand this in more detail in the meantime. Thanks once again for all your assistance.

-- Posted via http://www.ruby-forum.com/.

From clare.cavanagh at argoent.co.uk Mon Jul 10 23:47:58 2006
From: clare.cavanagh at argoent.co.uk (BlueJay)
Date: Tue, 11 Jul 2006 05:47:58 +0200
Subject: [Ferret-talk] Plurals and synonym lists
In-Reply-To:
References: <986bc769d7920a7a655158a94421aab3@ruby-forum.com>
Message-ID:

David Balmain wrote:
> On 7/11/06, BlueJay wrote:
>> for taxes and taxing or ball would search for balls...
>>
>
> You can use Fuzzy query to do SpellChecking but it's not ideal. I'm
> thinking of extracting the vim spell check and making it available in
> Ruby. It is awesome. Lightning fast and there are heaps of
> dictionaries available in multiple languages.
>
> This you can do with a Fuzzy Query.
> But still it's a bit difficult.
> What I'm thinking of doing is adding a did_you_mean class method to
> FuzzyQuery.
>
> suggestions = FuzzyQuery.did_you_mean(term, index_reader)
>
> What this would do is return a ordered list of term/frequency pairs of
> all terms that are similar to but more common in the index than the
> original term
>
>> Also - anyone done any work with synonym lists. I want to be able when I
>> do a search on chair to also do a search on stool, seat, bench etc. I
>> saw something around this in a Lucene book around WordNet but have no
>> idea how to implement in Ferret.
>
> You can do the same thing in Ferret. You basically need to write your
> own analyzer. I'll have a book out on how to do all of this
> eventually.
>
>> As you can probably tell I am pretty new to both Lucene and Ferret.
>
> A warm welcome to you then.

Great - a book would be of great help, and I am more than willing to proofread it for you, especially the bits around category search, spell checking and synonym lists. Perhaps you could even use my site as an example in the book, because I am sure there are hundreds or thousands of people wanting to do the same things as me.

-- Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Mon Jul 10 23:58:42 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 11 Jul 2006 12:58:42 +0900
Subject: [Ferret-talk] Category Number Results returned
In-Reply-To: <0d4d499e28cb9b6c8d07a2b70d3b120e@ruby-forum.com>
References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> <0d4d499e28cb9b6c8d07a2b70d3b120e@ruby-forum.com>
Message-ID:

On 7/11/06, BlueJay wrote:
> David Balmain wrote:
> >
> > I think I finally understand what you want now and I do think this is
> > the way to go. What you will need to do is build BitVectors for each
> > of your categories and sub-categories using the examples in
> > those threads. Or you could just use a QueryFilter.
> >
> > filter = QueryFilter.new(PrefixQuery.new(:category, "fishing"))
> > fishing_bits = filter.bits(index_reader)
> >
> > filter = QueryFilter.new(PrefixQuery.new(:category,
> > "fishing/fiction"))
> > fishing_fiction_bits = filter.bits(index_reader)
> >
> > filter = QueryFilter.new(PrefixQuery.new(:category,
> > "fishing/nonfiction"))
> > fishing_nonfiction_bits = filter.bits(index_reader)
> >
> > This assumes that everything in fishing/fiction is also in fishing/.
> > In your example, it doesn't seem to be the case, so you should use a
> > TermQuery instead of a PrefixQuery.
> >
> > Now you just need to run your search the same way. Something like this;
> >
> > query = query_parser.parse(query_str)
> > query_bits = QueryFilter.new(query).bits(index_reader)
> >
> > And now you can get your counts like this;
> >
> > fishing_count = (fishing_bits & query_bits).count
> > fishing_fiction_count = (fishing_fiction_bits & query_bits).count
> > fishing_nonfiction_count = (fishing_nonfiction_bits &
> > query_bits).count
> >
> > Sadly, this code only works in theory since I haven't release the code
> > that &s bit vectors yet and I used the new style PrefixQuery
> > declarations so they won't work either. But if this solution seems
> > like it will work for you and you can wait a week, you'll be set.
> >
> > Cheers,
> > Dave
>
> Dave
>
> Thanks very much for this and I can wait a week for this to be released.
>

Great. A word of warning, though: it's all new code and you'll be riding on the bleeding edge. But hopefully it will stabilize quickly. I'm working on this full time at the moment (when I'm not answering emails ;-)).
Cheers,
Dave

From dbalmain.ml at gmail.com Tue Jul 11 00:05:49 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 11 Jul 2006 13:05:49 +0900
Subject: [Ferret-talk] Plurals and synonym lists
In-Reply-To:
References: <986bc769d7920a7a655158a94421aab3@ruby-forum.com>
Message-ID:

On 7/11/06, BlueJay wrote:
> David Balmain wrote:
> > On 7/11/06, BlueJay wrote:
> >> for taxes and taxing or ball would search for balls...
> >>
> >
> > You can use Fuzzy query to do SpellChecking but it's not ideal. I'm
> > thinking of extracting the vim spell check and making it available in
> > Ruby. It is awesome. Lightning fast and there are heaps of
> > dictionaries available in multiple languages.
> >
> > This you can do with a Fuzzy Query. But still it's a bit difficult.
> > What I'm thinking of doing is adding a did_you_mean class method to
> > FuzzyQuery.
> >
> > suggestions = FuzzyQuery.did_you_mean(term, index_reader)
> >
> > What this would do is return a ordered list of term/frequency pairs of
> > all terms that are similar to but more common in the index than the
> > original term
> >
> >> Also - anyone done any work with synonym lists. I want to be able when I
> >> do a search on chair to also do a search on stool, seat, bench etc. I
> >> saw something around this in a Lucene book around WordNet but have no
> >> idea how to implement in Ferret.
> >
> > You can do the same thing in Ferret. You basically need to write your
> > own analyzer. I'll have a book out on how to do all of this
> > eventually.
> >
> >> As you can probably tell I am pretty new to both Lucene and Ferret.
> >
> > A warm welcome to you then.
> >
>
> Great - a book would be of great help and I am more than willing to
> proof read it for you... especially the bits around category search,
> spell checking and synonym lists. Perhaps you could even use my site as
> an example site in the book! because I am sure there are hundreds or
> thousands of people wanting to do the same things as me.
I'm looking forward to seeing it in action. As for examples in my book, we'll have to wait and see. But you can definitely mention it here or on the Ferret Wiki; just hold back on the wiki for the moment, because it's being spammed like crazy.

From dbalmain.ml at gmail.com Tue Jul 11 00:48:58 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 11 Jul 2006 13:48:58 +0900
Subject: [Ferret-talk] Ferret Wiki Spam - Solutions Anyone?
Message-ID:

Hi All,

As some of you may have noticed, the Ferret Wiki has been getting spammed like crazy, and I haven't been able to do anything about it because I just don't have the time. I'm getting pretty close to releasing 0.10.0, which has been the major draw on my time for the last couple of months, so I'm going to have some time to look into this soon.
I was wondering if any of you web app experts out there could > give me some advice. The two options I'm considering are sticking with > TRAC and forcing registration to add/edit pages or tickets. The other > option I'm considering is moving bug tracking to RubyForge and using a > different Wiki for the website. Hopefully Ruse will be released soon. > > Any thoughts anyone? > > Dave > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Julio C. Ody http://rootshell.be/~julioody From guest at guest.com Tue Jul 11 13:28:23 2006 From: guest at guest.com (Guest) Date: Tue, 11 Jul 2006 19:28:23 +0200 Subject: [Ferret-talk] Multiple Models w/ acts_as_ferret In-Reply-To: <7968d7490603021523s38e2eea0y9a088c0d896ec21d@mail.gmail.com> References: <3810d363d9653e8ed930f83ff448bb26@ruby-forum.com> <20060301221335.GB26099@cordoba.webit.de> <0e87693024511ad94389be18eec826f6@ruby-forum.com> <7968d7490603021523s38e2eea0y9a088c0d896ec21d@mail.gmail.com> Message-ID: <16900bf3b6104ecb1190a9028348d02a@ruby-forum.com> So is it correct then that you have to use inheritance to get it to work across multiple models? Lee Marlow wrote: > Here is our rake task which uses a slightly different version of > acts_as_ferret. It will try to load up all models in app/models and > call ferret_update on each instance. -- Posted via http://www.ruby-forum.com/. From chris.lowis at gmail.com Tue Jul 11 15:26:28 2006 From: chris.lowis at gmail.com (Chris Lowis) Date: Tue, 11 Jul 2006 21:26:28 +0200 Subject: [Ferret-talk] Search terms in URL In-Reply-To: <20060710212254.GB10427@cordoba.webit.de> References: <20060710074745.GE17139@cordoba.webit.de> <20060710212254.GB10427@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > try > > <%= form_tag { :action => 'search' }, :method => 'get' %> > > instead. 
> this explicitly tells ruby where the first hash argument
> (url_for_options) to form_tag ends, all following arguments then
> will be aggregated into the second optional argument to form_tag,
> options:

Thank you, Jens. In the end I had to call form_tag with parentheses, like so:

<%= form_tag({ :action => 'search' }, :method => 'get') %>

but it now works.

Chris

-- Posted via http://www.ruby-forum.com/.

From clare.cavanagh at argoent.co.uk Tue Jul 11 16:01:13 2006
From: clare.cavanagh at argoent.co.uk (BlueJay)
Date: Tue, 11 Jul 2006 22:01:13 +0200
Subject: [Ferret-talk] Category Number Results returned
In-Reply-To:
References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> <0d4d499e28cb9b6c8d07a2b70d3b120e@ruby-forum.com>
Message-ID: <08915611ec377074f1e96bd201daf450@ruby-forum.com>

David Balmain wrote:
> On 7/11/06, BlueJay wrote:
>> > "fishing/fiction"))
>> > Now you just need to run your search the same way. Something like this;
>> >
>> Thanks very much for this and I can wait a week for this to be released.
>>
>
> Great. A word or warning though, it's all new code and you'll be
> riding on the bleeding edge. But hopefully it will stabilize quickly.
> I'm working on this full time at the moment (when I'm not answering
> emails ;-)).
>
> Cheers,
> Dave

Dave,

One last thought on this, because this will be new code: originally I was going to write the count as a client-side piece of code that counts the documents in each category. I realised that I would have to return the full result set in order to do this, which would cause performance problems.

If I were to write this as a server-side script, outside of Ferret, I believe that I could achieve the same result as in your example. Can you think of any gotchas that would make this a stupid idea?

Thanks (and sorry in advance for taking this outside Ferret!)

-- Posted via http://www.ruby-forum.com/.
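Chris Lowis's fix in the "Search terms in URL" thread above comes down to a Ruby parsing rule. When a method is called without parentheses, a `{` right after its name opens a block rather than a Hash argument. The sketch below shows both forms using a hypothetical `form_tag_stub`, which only mimics the two-hash signature of Rails' `form_tag` and is not Rails code.

```ruby
# Hypothetical helper with the same argument shape as Rails' form_tag:
# a url_for-style hash first, an options hash second.
def form_tag_stub(url_for_options = {}, options = {})
  [url_for_options, options]
end

# With parentheses, the braces form a Hash literal for the first argument,
# and the trailing :method => 'get' pair is collected into the second hash.
url_opts, opts = form_tag_stub({ :action => 'search' }, :method => 'get')

# Without parentheses, `form_tag_stub { ... }` starts a block, so the same
# text is not valid Ruby and raises SyntaxError when it is parsed.
source = "form_tag_stub { :action => 'search' }, :method => 'get'"
parses_without_parens =
  begin
    eval(source)
    true
  rescue SyntaxError
    false
  end
```

The same parsing rule applies to Ruby code inside an ERB `<%= ... %>` tag, which is consistent with the parenthesised call being the one that worked.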
From aaron.hundley at extension.org Tue Jul 11 16:01:26 2006
From: aaron.hundley at extension.org (Aaron Hundley)
Date: Tue, 11 Jul 2006 22:01:26 +0200
Subject: [Ferret-talk] Ferret 0.9.4 C extensions and Macbook Pro
Message-ID: <5b1398a8a5f396116d4e836ef5667cae@ruby-forum.com>

Hello,

I am developing on a MacBook Pro. I installed make and the gcc compiler in order to take advantage of the C extensions, and then installed the ferret 0.9.4 gem. The gem compiled the C extensions as part of its installation, and I received this output:

Attempting remote installation of 'ferret'
Building native extensions. This could take a while...
dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
Referenced from: /usr/bin/gcc
Reason: image not found
make: *** [analysis.o] Trace/BPT trap
dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
Referenced from: /usr/bin/gcc
Reason: image not found
make: *** [analysis.o] Trace/BPT trap
ruby extconf.rb install --remote ferret
creating Makefile

make
gcc -fno-common -O -pipe -I/Users/ryan/Desktop/building/min/framework/include -fno-common -pipe -fno-common -fno-common -I. -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1 -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1 -I. -O -pipe -I/Users/ryan/Desktop/building/min/framework/include -c analysis.c

make install
gcc -fno-common -O -pipe -I/Users/ryan/Desktop/building/min/framework/include -fno-common -pipe -fno-common -fno-common -I. -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1 -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1 -I. -O -pipe -I/Users/ryan/Desktop/building/min/framework/include -c analysis.c
Successfully installed ferret-0.9.4
Installing RDoc documentation for ferret-0.9.4...
It appears that it cannot find the dynamic library for the gcc compiler.

I have read about others online who had similar problems compiling C programs, and they simply cleared the value of the DYLD_FALLBACK_LIBRARY_PATH environment variable:

DYLD_FALLBACK_LIBRARY_PATH=

Before, it was set to /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib

But I can't do this before I install the gem, because I also need the dynamic libraries in the /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib directory.

I have even copied the libgcc_s.1.dylib file to the /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib directory; it said that it found the file but could not map it when it tried to compile the C extensions.

I have noticed that this dynamic library file is in the /usr/lib directory instead of the /usr/i686-apple-darwin8/lib directory.

Are there any ideas on how I can work around this or fix it?

Thanks,

Aaron

-- Posted via http://www.ruby-forum.com/.

From atomgiant at gmail.com Tue Jul 11 18:50:40 2006
From: atomgiant at gmail.com (Tom Davies)
Date: Tue, 11 Jul 2006 18:50:40 -0400
Subject: [Ferret-talk] Ferret Wiki Spam - Solutions Anyone?
In-Reply-To:
References:
Message-ID:

My vote is to stick with Trac. I haven't used the SpamFilter plugin, but it looks promising.

Tom

On 7/11/06, Julio Cesar Ody wrote:
> Yes. http://trac.edgewall.org/wiki/SpamFilter.
>
> I never tried it. But I'd be happy to help with more than just the
> suggestion if needed.
>
>
> On 7/11/06, David Balmain wrote:
> > Hi All,
> >
> > As some of you may have noticed, the Ferret Wiki has been getting
> > spammed like crazy. And I haven't been able to do anything about it
> > because I just don't have the time. I'm getting pretty close to
> > releasing 0.10.0 which has been the major draw on my time for the last
> > couple of months so I'm going to have some time to look into this
> > soon.
> > I was wondering if any of you web app experts out there could
> > give me some advice. The two options I'm considering are sticking with
> > TRAC and forcing registration to add/edit pages or tickets. The other
> > option I'm considering is moving bug tracking to RubyForge and using a
> > different Wiki for the website. Hopefully Ruse will be released soon.
> >
> > Any thoughts anyone?
> >
> > Dave
>
> --
> Julio C. Ody
> http://rootshell.be/~julioody

--
Tom Davies
http://atomgiant.com
http://gifthat.com

From atomgiant at gmail.com Tue Jul 11 18:44:25 2006
From: atomgiant at gmail.com (Tom Davies)
Date: Tue, 11 Jul 2006 18:44:25 -0400
Subject: [Ferret-talk] [ANN] acts_as_ferret 0.2.2
In-Reply-To:
References: <20060710214202.GC10427@cordoba.webit.de>
Message-ID:

Thanks, Jens. acts_as_ferret has come a long way in a very short time. I am going to convert my own Ferret implementation over to acts_as_ferret soon, since it now seems to do everything I need, and probably better than my own implementation :)

Thanks for your efforts.

Tom

On 7/10/06, David Balmain wrote:
> On 7/11/06, Jens Kraemer wrote:
> > Hi all,
> >
> > I just tagged acts_as_ferret 0.2.2 as the current stable version, so get
> > it while it's hot ;-)
> >
> > new features:
> > - added support for the multiple models/single index approach.
> > - find out the total number of search results by calling total_hits on
> > the array returned by find_by_contents.
> >
> > fixes:
> > - trac tickets #20 (find_by_contents breaks ferret sorting) and #24
> > (rebuild_index wastes a huge amount of memory).
> >
> > as always, thanks for reporting issues and uploading patches :-)
> >
> > More info can be found on my blog:
> > http://www.jkraemer.net/articles/2006/07/10/rails-full-text-search-version-0-2-2
> >
> > Jens
>
> Great stuff Jens. Keep up the good work.
>
> Dave
>

--
Tom Davies
http://atomgiant.com
http://gifthat.com

From dbalmain.ml at gmail.com Tue Jul 11 20:24:51 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 12 Jul 2006 09:24:51 +0900
Subject: [Ferret-talk] Ferret 0.9.4 C extensions and Macbook Pro
In-Reply-To: <5b1398a8a5f396116d4e836ef5667cae@ruby-forum.com>
References: <5b1398a8a5f396116d4e836ef5667cae@ruby-forum.com>
Message-ID:

On 7/12/06, Aaron Hundley wrote:
> Hello,
>
> I am developing on a MacBook Pro.
> I had installed make and the gcc compiler in order to
> take advantage of the C extensions, and I installed the ferret 0.9.4
> gem.
> When I installed the gem, it compiled the C extensions as part of the
> installation
> process for the gem, and I received this output:
>
>
> Attempting remote installation of 'ferret'
> Building native extensions. This could take a while...
> dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
> Referenced from: /usr/bin/gcc
> Reason: image not found
> make: *** [analysis.o] Trace/BPT trap
> dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
> Referenced from: /usr/bin/gcc
> Reason: image not found
> make: *** [analysis.o] Trace/BPT trap
> ruby extconf.rb install --remote ferret
> creating Makefile
>
> make
> gcc -fno-common -O -pipe
> -I/Users/ryan/Desktop/building/min/framework/include -fno-common -pipe
> -fno-common -fno-common -I.
> -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1
> -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1
> -I. -O -pipe -I/Users/ryan/Desktop/building/min/framework/include -c
> analysis.c
>
> make install
> gcc -fno-common -O -pipe
> -I/Users/ryan/Desktop/building/min/framework/include -fno-common -pipe
> -fno-common -fno-common -I.
> -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1
> -I/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/i686-darwin8.6.1
> -I. -O -pipe -I/Users/ryan/Desktop/building/min/framework/include -c
> analysis.c
> Successfully installed ferret-0.9.4
> Installing RDoc documentation for ferret-0.9.4...
>
>
> It appears that it cannot find the dynamic library for the gcc compiler.
>
> I have read about others online that had similar problems with
> compiling C programs, and they just get
> rid of the value for the DYLD_FALLBACK_LIBRARY_PATH environment
> variable:
>
> DYLD_FALLBACK_LIBRARY_PATH=
>
> Before it was set to
> /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib
>
> But, I can't do this before I install the gem b/c I also need the
> dynamic
> libraries in the
> /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib
> directory.
>
> I have even copied the libgcc_s.1.dylib file to the
> /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib
> directory, and
> it said that it found it but could not map to it when it tries to
> compile the C extensions.
>
> I have noticed that this dynamic library file is in the /usr/lib
> directory
> instead of the /usr/i686-apple-darwin8/lib directory.
>
> Are there any ideas on how I can work around this or fix it?
>
> Thanks,
>
> Aaron

Hi Aaron,

I (sadly) don't have a Mac, so I don't know how the "fallback" path works, but could you please check what your DYLD_LIBRARY_PATH is set to?
Maybe you need to add /usr/lib there, although I would have thought it would be checked by default.

Cheers,
Dave

From dbalmain.ml at gmail.com Tue Jul 11 20:29:51 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 12 Jul 2006 09:29:51 +0900
Subject: [Ferret-talk] Category Number Results returned
In-Reply-To: <08915611ec377074f1e96bd201daf450@ruby-forum.com>
References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> <0d4d499e28cb9b6c8d07a2b70d3b120e@ruby-forum.com> <08915611ec377074f1e96bd201daf450@ruby-forum.com>
Message-ID:

On 7/12/06, BlueJay wrote:
> David Balmain wrote:
> > On 7/11/06, BlueJay wrote:
> >> > "fishing/fiction"))
> >> > Now you just need to run your search the same way. Something like this;
> >> >
> >> Thanks very much for this and I can wait a week for this to be released.
> >>
> >
> > Great. A word or warning though, it's all new code and you'll be
> > riding on the bleeding edge. But hopefully it will stabilize quickly.
> > I'm working on this full time at the moment (when I'm not answering
> > emails ;-)).
> >
> > Cheers,
> > Dave
>
> Dave
>
> One last thought on this... because this will be new code.... originally
> I was going to write the count as a client side piece of code to count
> the documents in each category. I realised that I would have to return
> the full result set in order to do this which would cause problems with
> performance.
>
> If I were to write this as a server side script, outside of ferret, I
> believe that I could achieve the same result as in your example. Can
> you think of any gotchas that would make this a stupid idea?

If you mean grabbing the whole result set and looping through every result, keeping a running count, then yes, this should work fine. I'd expect my example to be a lot faster, but you never know without trying it.
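The "grab the whole result set and keep a running count" alternative discussed above can be sketched in plain Ruby. This sketch rests on two assumptions. First, each hit is a Hash with a slash-separated `:category` path, a hypothetical shape that real Ferret or acts_as_ferret results would have to be mapped to. Second, every ancestor category is credited, since BlueJay notes that documents in a subcategory always belong to the parent category too.

```ruby
# Count hits per category over a full result set, crediting every ancestor
# category: a hit in "fishing/fiction/new" also counts toward
# "fishing/fiction" and "fishing".
def category_counts(results)
  results.each_with_object(Hash.new(0)) do |doc, counts|
    parts = doc[:category].split("/")
    # Credit each prefix path: "fishing", "fishing/fiction", ...
    1.upto(parts.size) { |n| counts[parts.first(n).join("/")] += 1 }
  end
end

# Hypothetical hits, loosely following BlueJay's A..J example.
results = [
  { id: "A", category: "fishing/fiction/new" },
  { id: "C", category: "fishing/fiction/old" },
  { id: "D", category: "fishing/nonfiction" },
  { id: "H", category: "fishing" },
]

counts = category_counts(results)
```

Every hit has to be fetched and walked, so the cost of this version grows with the size of the result set. The bitset intersection does its work per category regardless of how many documents match, which is why it is likely to pull ahead on large result sets, as Dave suggests.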
Dave

From clare.cavanagh at argoent.co.uk Wed Jul 12 00:05:12 2006
From: clare.cavanagh at argoent.co.uk (Guest)
Date: Wed, 12 Jul 2006 06:05:12 +0200
Subject: [Ferret-talk] Category Number Results returned
In-Reply-To:
References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> <0d4d499e28cb9b6c8d07a2b70d3b120e@ruby-forum.com> <08915611ec377074f1e96bd201daf450@ruby-forum.com>
Message-ID:

David Balmain wrote:
> On 7/12/06, BlueJay wrote:
>> > I'm working on this full time at the moment (when I'm not answering
>> the full result set in order to do this which would cause problems with
>> performance.
>>
>> If I were to write this as a server side script, outside of ferret, I
>> believe that I could achieve the same result as in your example. Can
>> you think of any gotchas that would make this a stupid idea?
>
> If you mean grab the whole result set and loop through every result
> taking a running count then yes, this should work fine. I'd say my
> example would be a lot faster but you never know without trying it.
>
> Dave

Again, many thanks for replying to my queries. I may go ahead and implement it this way just to see it working, and then switch to your approach when your code is available. That will give us the opportunity to compare, but my suspicion is that the larger the dataset, the bigger your approach's advantage will be.

Would it be possible to ping me when your code is available?

Thanks

-- Posted via http://www.ruby-forum.com/.

From julioody at gmail.com Wed Jul 12 01:33:25 2006
From: julioody at gmail.com (Julio Cesar Ody)
Date: Wed, 12 Jul 2006 15:33:25 +1000
Subject: [Ferret-talk] ferret using UTF-8
Message-ID:

Hey all,

I went through the docs on Ferret's page, plus a quick search through the mailing list (thread titles), and I couldn't find any info on how to have Ferret store its data using UTF-8.

In the scenario I would use it in, nothing is being stored outside Ferret (e.g. in an external database).
So it's just how Ferret would do it that I'm interested in knowing. The reason I ask is that I'm deploying a search engine for an application that will probably be searching for text content in Japanese/Chinese *apart* from English. I'm mentioning it in case someone has done it before and knows any pitfalls. Thanks in advance. -- Julio C. Ody http://rootshell.be/~julioody From dbalmain.ml at gmail.com Wed Jul 12 03:34:04 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 12 Jul 2006 16:34:04 +0900 Subject: [Ferret-talk] Category Number Results returned In-Reply-To: References: <1f3cd32808cb80e76f73b4b3ed71cb37@ruby-forum.com> <67fd465b37133995fc2e24e240b0a0b6@ruby-forum.com> <0d4d499e28cb9b6c8d07a2b70d3b120e@ruby-forum.com> <08915611ec377074f1e96bd201daf450@ruby-forum.com> Message-ID: On 7/12/06, Guest wrote: > David Balmain wrote: > > On 7/12/06, BlueJay wrote: > >> > I'm working on this full time at the moment (when I'm not answering > >> the full result set in order to do this which would cause problems with > >> performance. > >> > >> If I were to write this as a server side script, outside of ferret, I > >> believe that I could achieve the same result as in your example. Can > >> you think of any gotchas that would make this a stupid idea? > > > > If you mean grab the whole result set and loop through every result > > taking a running count then yes, this should work fine. I'd say my > > example would be a lot faster but you never know without trying it. > > > > Dave > > Again, many thanks for replying to my queries. I may go ahead and > implement it this way just to see it working and then when your code is > available implement it that way. It will give us the opportunity to > compare but my suspicion is that the larger the dataset the faster your > approach will be.... > > Would it be possible to ping me when your code is available? Sure. There will be an announcement on this mailing list as well as the ruby and rails lists.
Dave From dbalmain.ml at gmail.com Wed Jul 12 03:40:04 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 12 Jul 2006 16:40:04 +0900 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: References: Message-ID: On 7/12/06, Julio Cesar Ody wrote: > Hey all, > > I went through the docs on Ferret's page, plus a quick search through > the email list (thread titles), and I couldn't find any info on how to > have Ferret store its data using UTF-8. > > In the scenario I would use it, nothing's being stored outside (like > external databases). So it's just how Ferret would do it that I'm > interested in knowing. > > The reason I ask is that I'm deploying a search engine for an > application that will probably be searching for text content in > Japanese/Chinese *apart* from English. I'm mentioning it in case someone > has done it before and knows any pitfalls. > > Thanks in advance. The core of ferret is character encoding agnostic. It treats all strings as an array of bytes so it doesn't matter what you put in. You could store JPEGs in the index if you wanted to. The analysis section of Ferret is another matter. There are two sets of analyzers: the ASCII analyzers (AsciiWhiteSpaceAnalyzer, AsciiStandardAnalyzer), which are the most robust (no encoding errors raised), and the other analyzers (WhiteSpaceAnalyzer, StandardAnalyzer), which are based on whichever locale you have set. So if your operating system's locale is set to UTF-8 then that will be how the analyzer treats any strings you pass through it. From shingler at gmail.com Wed Jul 12 10:56:17 2006 From: shingler at gmail.com (steven shingler) Date: Wed, 12 Jul 2006 16:56:17 +0200 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: References: Message-ID: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> David Balmain wrote: > The core of ferret is character encoding agnostic. It treats all > strings as an array of bytes so it doesn't matter what you put in.
You > could store JPEGs in the index if you wanted to. On which subject, I happen to have chucked some bmp files into my index, and was really quite amazed to see them being returned in search results. Not only that, but the results were accurate. For example, if I have a bmp which contains the word "Sheep" (when viewed as an image) and I search the index for "Sheep" - the bmp is returned. I am adding documents using the standard analyser and file.readlines to add the contents. If I open the bmp in a text editor and search for "Sheep" - that word is not contained within the file. So how come ferret can read the bmp? Cheers, Steven -- Posted via http://www.ruby-forum.com/. From shingler at gmail.com Wed Jul 12 11:23:40 2006 From: shingler at gmail.com (steven shingler) Date: Wed, 12 Jul 2006 17:23:40 +0200 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> References: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> Message-ID: > So how come ferret can read the bmp? OK please ignore what must rank as the stupidest question for some time. "Sheep" was in the file path, and the path is one of the Ferret document fields. For a minute there, I was excited. :) Cheers, Steven -- Posted via http://www.ruby-forum.com/. From Pedro.CorteReal at iantt.pt Wed Jul 12 11:29:42 2006 From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real) Date: Wed, 12 Jul 2006 16:29:42 +0100 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: References: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> Message-ID: <1152718183.5378.0.camel@localhost.localdomain> On Wed, 2006-07-12 at 17:23 +0200, steven shingler wrote: > > So how come ferret can read the bmp? > > OK please ignore what must rank as the stupidest question for some time. > > "Sheep" was in the file path, and the path is one of the Ferret document > fields. > > For a minute there, I was excited. :) And David was probably scared that ferret had become conscious.
:) Pedro. From jan.prill at gmail.com Wed Jul 12 11:30:47 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 12 Jul 2006 17:30:47 +0200 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: References: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> Message-ID: <562a35c10607120830t164d0598n62e0cd1789d2b90c@mail.gmail.com> Cool. For a minute I wondered whether I should ask if the file is maybe named 'sheep', but then decided that this might offend you ;-) Great one! Nonetheless I've got a question on this subject too. Does anyone have experience with a task like this: a search engine that doesn't use words as query objects but an uploaded image? Is there something like this already available on the net? A little Google research of mine didn't yield any results. It should also be able to find resized copies of the same image. Background: images that aren't authorized by the copyright owner but won't be found by Google Images or the like because they were renamed. Cheers, Jan On 7/12/06, steven shingler wrote: > > > So how come ferret can read the bmp? > > OK please ignore what must rank as the stupidest question for some time. > > "Sheep" was in the file path, and the path is one of the Ferret document > fields. > > For a minute there, I was excited. :) > > Cheers, > Steven > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From Pedro.CorteReal at iantt.pt Wed Jul 12 11:50:49 2006 From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real) Date: Wed, 12 Jul 2006 16:50:49 +0100 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: <562a35c10607120830t164d0598n62e0cd1789d2b90c@mail.gmail.com> References: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> <562a35c10607120830t164d0598n62e0cd1789d2b90c@mail.gmail.com> Message-ID: <1152719449.5378.2.camel@localhost.localdomain> On Wed, 2006-07-12 at 17:30 +0200, Jan Prill wrote: > Cool. For a minute I wondered whether I should ask if the file is maybe > named 'sheep', but then decided that this might offend you ;-) > > Great one! > > Nonetheless I've got a question on this subject too. Does anyone have > experience with a task like this: a search engine that doesn't use > words as query objects but an uploaded image? Is there something like > this already available on the net? A little Google research of mine > didn't yield any results. It should also be able to find resized > copies of the same image. Background: images that aren't authorized by > the copyright owner but won't be found by Google Images or the like > because they were renamed. See this: > Cheers, > Jan > > On 7/12/06, steven shingler wrote: > > So how come ferret can read the bmp? > > OK please ignore what must rank as the stupidest question for > some time. > > "Sheep" was in the file path, and the path is one of the > Ferret document > fields. > > For a minute there, I was excited. :) > > Cheers, > Steven > > -- > Posted via http://www.ruby-forum.com/. From Pedro.CorteReal at iantt.pt Wed Jul 12 11:52:26 2006 From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real) Date: Wed, 12 Jul 2006 16:52:26 +0100 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: <562a35c10607120830t164d0598n62e0cd1789d2b90c@mail.gmail.com> References: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> <562a35c10607120830t164d0598n62e0cd1789d2b90c@mail.gmail.com> Message-ID: <1152719546.5378.5.camel@localhost.localdomain> On Wed, 2006-07-12 at 17:30 +0200, Jan Prill wrote: > Nonetheless I've got a question on this subject too. Does anyone have > experience with a task like this: a search engine that doesn't use > words as query objects but an uploaded image? Is there something like > this already available on the net? A little Google research of mine > didn't yield any results. It should also be able to find resized > copies of the same image. Background: images that aren't authorized by > the copyright owner but won't be found by Google Images or the like > because they were renamed. See this http://www.imgseek.net/ Never tried it myself but looks like what you meant. It's a desktop app though. Pedro. PS: Sorry for the other empty email. I pressed send by mistake before I was done.
From jan.prill at gmail.com Wed Jul 12 11:58:17 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 12 Jul 2006 17:58:17 +0200 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: <1152719546.5378.5.camel@localhost.localdomain> References: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> <562a35c10607120830t164d0598n62e0cd1789d2b90c@mail.gmail.com> <1152719546.5378.5.camel@localhost.localdomain> Message-ID: <562a35c10607120858w3a491128wc315004026905ebf@mail.gmail.com> Something like this but on the net is what I'm searching for. Thanks for the pointer! Jan From dbalmain.ml at gmail.com Wed Jul 12 12:13:31 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 13 Jul 2006 01:13:31 +0900 Subject: [Ferret-talk] ferret using UTF-8 In-Reply-To: References: <67c29fb10d6d671774a5017ce7544034@ruby-forum.com> Message-ID: On 7/13/06, steven shingler wrote: > > So how come ferret can read the bmp? > > OK please ignore what must rank as the stupidest question for some time. > > "Sheep" was in the file path, and the path is one of the Ferret document > fields. > > For a minute there, I was excited. :) This functionality isn't due until version Ferret-4.0. From Floyd_Morgan at intuit.com Wed Jul 12 18:40:51 2006 From: Floyd_Morgan at intuit.com (Floyd Morgan) Date: Thu, 13 Jul 2006 00:40:51 +0200 Subject: [Ferret-talk] Reverse sorting Message-ID: <859dd1a3afe75d24435fc22e7c9dbbb3@ruby-forum.com> I am getting strange results when I reverse sort a query. I am sorting by date, but it doesn't seem to be related to dates (I have tried just integers). I also paginate the results. Items in the result set are sometimes duplicated and not ordered at all. When I try a non-reverse sort I don't see duplicates and the ordering is correct. Any ideas what is going on?
Thanks -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Jul 12 19:57:11 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 13 Jul 2006 08:57:11 +0900 Subject: [Ferret-talk] Reverse sorting In-Reply-To: <859dd1a3afe75d24435fc22e7c9dbbb3@ruby-forum.com> References: <859dd1a3afe75d24435fc22e7c9dbbb3@ruby-forum.com> Message-ID: On 7/13/06, Floyd Morgan wrote: > I am getting strange results when I reverse sort a query. I am sorting > by date, but it doesn't seem to be related to dates (I have tried just > integers). I also paginate the results. Items in the result set are > sometimes duplicated and not ordered at all. When I try a > non-reverse sort I don't see duplicates and the ordering is correct. Any > ideas what is going on? Thanks No idea. Could you show us some example code. Preferably with a short test case. Cheers, Dave From Floyd_Morgan at intuit.com Wed Jul 12 20:22:15 2006 From: Floyd_Morgan at intuit.com (Floyd Morgan) Date: Wed, 12 Jul 2006 17:22:15 -0700 Subject: [Ferret-talk] Reverse sorting In-Reply-To: References: <859dd1a3afe75d24435fc22e7c9dbbb3@ruby-forum.com> Message-ID: Here is the index snippet:

  doc = Ferret::Document::Document.new
  # insert the id
  doc << Ferret::Document::Field.new( "id", post.id,
      Ferret::Document::Field::Store::YES,
      Ferret::Document::Field::Index::UNTOKENIZED )
  # insert the date
  doc << Ferret::Document::Field.new( "created_at", post.created_at,
      Ferret::Document::Field::Store::NO,
      Ferret::Document::Field::Index::UNTOKENIZED )
  # add some other stuff ...
  # write to the index
  index << doc

Here is the query snippet:

  sort_fields = []
  sort_fields << Ferret::Search::SortField.new( "created_at",
      :sort_type => Ferret::Search::SortField::SortType::INTEGER,
      :reverse => true )
  # search the index
  top_docs = index.search( query, { :first_doc => first_doc,
      :num_docs => 5, :sort => sort_fields } )

On Jul 12, 2006, at 4:57 PM, David Balmain wrote: > On 7/13/06, Floyd Morgan wrote: >> I am getting strange results when I reverse sort a query. I am >> sorting >> by date, but it doesn't seem to be related to dates (I have tried >> just >> integers). I also paginate the results. Items in the result set are >> sometimes duplicated and not ordered at all. When I try a >> non-reverse sort I don't see duplicates and the ordering is >> correct. Any >> ideas what is going on? Thanks > > No idea. Could you show us some example code. Preferably with a > short test case. > > Cheers, > Dave > From dbalmain.ml at gmail.com Wed Jul 12 22:13:10 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 13 Jul 2006 11:13:10 +0900 Subject: [Ferret-talk] Reverse sorting In-Reply-To: References: <859dd1a3afe75d24435fc22e7c9dbbb3@ruby-forum.com> Message-ID: On 7/13/06, Floyd Morgan wrote: > Here is the index snippet: > > doc = Ferret::Document::Document.new > # insert the id > doc << Ferret::Document::Field.new( "id", post.id, > Ferret::Document::Field::Store::YES, > Ferret::Document::Field::Index::UNTOKENIZED ) > # insert the date > doc << Ferret::Document::Field.new( "created_at", post.created_at, > Ferret::Document::Field::Store::NO, > Ferret::Document::Field::Index::UNTOKENIZED ) > # add some other stuff ...
> # write to the index > index << doc > > Here is the query snippet: > > sort_fields = [] > sort_fields << Ferret::Search::SortField.new > ( "created_at", :sort_type => > Ferret::Search::SortField::SortType::INTEGER, :reverse => true ) > # search the index > top_docs = index.search( query, { :first_doc => > first_doc , :num_docs => 5, :sort => sort_fields } ) I'm not exactly sure what post.created_at is, but if it's a Time object then you need to convert it to a string that will sort correctly as a string, i.e. use strftime("%Y%m%d") (use whatever precision you need). Here is an example which adds 100 documents with 100 random dates in the last 100 days:

  require 'rubygems'
  require 'ferret'
  include Ferret::Index
  include Ferret::Search

  index = Index.new
  t = Time.now

  100.times do
    index << {:id => "x",
              :date => (t-24*60*60*rand(100)).strftime("%Y%m%d")}
  end

  sort_fields = [SortField.new(:date,
                               :sort_type => SortField::SortType::INTEGER,
                               :reverse => true)]

  10.times do |start|
    index.search_each("x",
                      :first_doc => start*10,
                      :num_docs => 10,
                      :sort => sort_fields) do |doc_id, score|
      puts index[doc_id][:date]
    end
  end

From Floyd_Morgan at intuit.com Wed Jul 12 22:59:26 2006 From: Floyd_Morgan at intuit.com (Floyd Morgan) Date: Wed, 12 Jul 2006 19:59:26 -0700 Subject: [Ferret-talk] Reverse sorting In-Reply-To: References: <859dd1a3afe75d24435fc22e7c9dbbb3@ruby-forum.com> Message-ID: The field is a DateTime. So I tried what you suggested and no luck. I noticed that when I remove the first_doc and num_docs options it appears to work correctly (getting all of the docs in the right order).
On Jul 12, 2006, at 7:13 PM, David Balmain wrote: > On 7/13/06, Floyd Morgan wrote: >> Here is the index snippet: >> >> doc = Ferret::Document::Document.new >> # insert the id >> doc << Ferret::Document::Field.new( "id", post.id, >> Ferret::Document::Field::Store::YES, >> Ferret::Document::Field::Index::UNTOKENIZED ) >> # insert the date >> doc << Ferret::Document::Field.new( "created_at", post.created_at, >> Ferret::Document::Field::Store::NO, >> Ferret::Document::Field::Index::UNTOKENIZED ) >> # add some other stuff ... >> # write to the index >> index << doc >> >> Here is the query snippet: >> >> sort_fields = [] >> sort_fields << Ferret::Search::SortField.new >> ( "created_at", :sort_type => >> Ferret::Search::SortField::SortType::INTEGER, :reverse => >> true ) >> # search the index >> top_docs = index.search( query, { :first_doc => >> first_doc , :num_docs => 5, :sort => sort_fields } ) > > > I'm not exactly sure what post.created_at is, but if it's a Time object > then you need to convert it to a string that will sort correctly as a > string, i.e. use strftime("%Y%m%d") (use whatever precision you need).
> > Here is an example which adds 100 documents with 100 random dates in > the last 100 days: > > require 'rubygems' > require 'ferret' > include Ferret::Index > include Ferret::Search > > index = Index.new > t = Time.now > > 100.times do > index << {:id => "x", > :date => (t-24*60*60*rand(100)).strftime("%Y%m%d")} > end > > sort_fields = [SortField.new(:date, > :sort_type => > SortField::SortType::INTEGER, > :reverse => true)] > > 10.times do |start| > index.search_each("x", > :first_doc => start*10, > :num_docs => 10, > :sort => sort_fields) do |doc_id, score| > puts index[doc_id][:date] > end > end > From dbalmain.ml at gmail.com Thu Jul 13 01:23:48 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 13 Jul 2006 14:23:48 +0900 Subject: [Ferret-talk] Reverse sorting In-Reply-To: References: <859dd1a3afe75d24435fc22e7c9dbbb3@ruby-forum.com> Message-ID: On 7/13/06, Floyd Morgan wrote: > The field is a DateTime. > > So I tried what you suggested and no luck. I noticed that when I > remove the first_doc and num_docs options it appears to work correctly > (getting all of the docs in the right order). I'm sorry, I can't really help unless I can see an example that isn't working. Try modifying the code that I posted previously to match more closely what you are doing. Then send back the broken example snippet and I'll be able to tell you what is wrong with it or fix the bug if it exists. As long as the strings going into the index are the same (i.e., in the format "%Y%m%d"), I can't see how your results could be any different.
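Dave's advice boils down to indexing `created_at` as a fixed-width, zero-padded string such as `strftime("%Y%m%d")`, so that string order, integer order and date order all agree. The plain-Ruby sketch below makes no Ferret calls. It checks that property and mimics paging through one reverse-sorted list with `first_doc`/`num_docs`-style offsets. The dates, the page size of 2 and the helpers `date_key` and `page` are illustrative assumptions. It does not reproduce Floyd's duplicates, which the thread leaves unresolved.

```ruby
require 'date'

# Hypothetical helper: fixed-width YYYYMMDD key, as in Dave's strftime call.
def date_key(d)
  d.strftime("%Y%m%d")
end

# Hypothetical helper: mimic :first_doc / :num_docs by slicing a sorted list.
def page(sorted, first_doc, num_docs)
  sorted[first_doc, num_docs] || []
end

# Made-up test dates spanning a year boundary and single-digit months/days.
dates = [Date.new(2006, 1, 5), Date.new(2006, 12, 1), Date.new(2006, 7, 12),
         Date.new(2005, 11, 30), Date.new(2006, 2, 28)]
keys = dates.map { |d| date_key(d) }

# Every key has the same width, so string order, integer order and
# chronological order all agree.
by_string  = keys.sort
by_integer = keys.sort_by(&:to_i)
by_date    = dates.sort.map { |d| date_key(d) }

# Without zero padding the agreement breaks: "200615" (Jan 5) sorts after
# "2006121" (Dec 1) when compared as strings.
unpadded_sorted = dates.map { |d| "#{d.year}#{d.month}#{d.day}" }.sort
unpadded_chrono = dates.sort.map { |d| "#{d.year}#{d.month}#{d.day}" }
puts unpadded_sorted == unpadded_chrono  # prints false

# Reverse-sort once, then page through the same list two entries at a time.
newest_first = keys.sort.reverse
pages = (0...newest_first.size).step(2).map { |first| page(newest_first, first, 2) }

puts pages.inspect
# prints [["20061201", "20060712"], ["20060228", "20060105"], ["20051130"]]
```

Because every page is cut from the same sorted list, concatenating the pages gives back that list exactly once, with no duplicates.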
Cheers, Dave From aaron.hundley at extension.org Thu Jul 13 09:31:39 2006 From: aaron.hundley at extension.org (Aaron Hundley) Date: Thu, 13 Jul 2006 15:31:39 +0200 Subject: [Ferret-talk] Ferret 0.9.4 C extensions and Macbook Pro In-Reply-To: References: <5b1398a8a5f396116d4e836ef5667cae@ruby-forum.com> Message-ID: <2676ca2293573f7348a8f174f8a6c5fb@ruby-forum.com> > > Hi Aaron, > > I (sadly) don't have a mac so I don't know how the "fallback" path > works but could you please check what your DYLD_LIBRARY_PATH is set > to. Maybe you need to add /usr/lib there although I would have thought > it would be checked by default. > > Cheers, > Dave Hi Dave, Thanks for the tip. What I ended up doing was setting the DYLD_FALLBACK_LIBRARY_PATH environment variable like this:

  export DYLD_FALLBACK_LIBRARY_PATH=$DYLD_FALLBACK_LIBRARY_PATH:/usr/lib

and everything compiled fine. The C extensions provided me with a tremendous performance improvement over the all-Ruby version. Thanks Again, Aaron -- Posted via http://www.ruby-forum.com/. From guest at guest.com Thu Jul 13 10:59:02 2006 From: guest at guest.com (Guest) Date: Thu, 13 Jul 2006 16:59:02 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to In-Reply-To: References: <20060707222231.GA31705@cordoba.webit.de> Message-ID: <309d586b2921ecc66a4d2ef8048217da@ruby-forum.com> Hmm. I tried doing it with a few levels of nesting like:

  acts_as_ferret :fields => { :menu_item_name, 'city', 'state', 'zip' }

  def menu_item_name
    menus.menu_categories.menu_items.name
  end

And I don't get any results. The tables in the DB are like:

  menus
  menu_categories
    menu_id
  menu_items
    menu_category_id

Am I missing something??? Maxime Curioni wrote: > Thanks Jens, I didn't know it was that simple ! > > >> you could use >> >> acts_as_ferret :fields => [ 'title', 'content', :category_name] >> def category_name >> category.name >> end >> >> to achieve this. >> >> Jens -- Posted via http://www.ruby-forum.com/.
From kraemer at webit.de Thu Jul 13 11:33:03 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 13 Jul 2006 17:33:03 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to In-Reply-To: <309d586b2921ecc66a4d2ef8048217da@ruby-forum.com> References: <20060707222231.GA31705@cordoba.webit.de> <309d586b2921ecc66a4d2ef8048217da@ruby-forum.com> Message-ID: <20060713153303.GQ17139@cordoba.webit.de> On Thu, Jul 13, 2006 at 04:59:02PM +0200, Guest wrote: > > Hmm. I tried doing it with a few levels of nesting like: > > acts_as_ferret :fields => { :menu_item_name, 'city', 'state', 'zip' } > > def menu_item_name > menus.menu_categories.menu_items.name > end > > And I don't get any results. that can have many reasons ;-) Basically, what you do in the method is irrelevant to ferret, it just indexes what is returned by the method. You should see what fields acts_as_ferret adds to the index in your development log when you save a record. please check there if it adds what you want. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From guest at guest.com Thu Jul 13 13:39:28 2006 From: guest at guest.com (Guest) Date: Thu, 13 Jul 2006 19:39:28 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to In-Reply-To: <20060713153303.GQ17139@cordoba.webit.de> References: <20060707222231.GA31705@cordoba.webit.de> <309d586b2921ecc66a4d2ef8048217da@ruby-forum.com> <20060713153303.GQ17139@cordoba.webit.de> Message-ID: <0a3cf7b23d2e37aa16591c543a1ef941@ruby-forum.com> OK, when I create the top level item, it indexes the fields, but since I'm creating the menus and menu_items separately it doesn't look like those are getting indexed according to the log. Do I have to set something else up? Thanks Jens! Jens Kraemer wrote: > On Thu, Jul 13, 2006 at 04:59:02PM +0200, Guest wrote: >> >> Hmm.
I tried doing it with a few levels of nesting like: >> >> acts_as_ferret :fields => { :menu_item_name, 'city', 'state', 'zip' } >> >> def menu_item_name >> menus.menu_categories.menu_items.name >> end >> >> And I don't get any results. > > that can have many reasons ;-) > Basically, what you do in the method is irrelevant to ferret, it just > indexes what is returned by the method. > > You should see what fields acts_as_ferret adds to the index > in your development log when you save a record. > please check there if it adds what you want. > > Jens -- Posted via http://www.ruby-forum.com/. From guest at guest.com Thu Jul 13 13:53:36 2006 From: guest at guest.com (Guest) Date: Thu, 13 Jul 2006 19:53:36 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to In-Reply-To: <0a3cf7b23d2e37aa16591c543a1ef941@ruby-forum.com> References: <20060707222231.GA31705@cordoba.webit.de> <309d586b2921ecc66a4d2ef8048217da@ruby-forum.com> <20060713153303.GQ17139@cordoba.webit.de> <0a3cf7b23d2e37aa16591c543a1ef941@ruby-forum.com> Message-ID: <191bb6c04a50e78ceb67d2d060d5cdc0@ruby-forum.com> Wait so, now I added acts_as_ferret in the menu_item model, and it looks as if it's being indexed in the log, but I still don't get the results. Hmm... Guest wrote: > > > OK, when I create the top level item, it indexes the fields, but since > I'm creating the menus and menu_items separately it doesn't look like > those are getting indexed according to the log. Do I have to set > something else up? > > Thanks Jens! > > Jens Kraemer wrote: >> On Thu, Jul 13, 2006 at 04:59:02PM +0200, Guest wrote: >>> >>> Hmm.
>>> I tried doing it with a few levels of nesting like: >>> >>> acts_as_ferret :fields => { :menu_item_name, 'city', 'state', 'zip' } >>> >>> def menu_item_name >>> menus.menu_categories.menu_items.name >>> end >>> >>> And I don't get any results. >> >> that can have many reasons ;-) >> Basically, what you do in the method is irrelevant to ferret, it just >> indexes what is returned by the method. >> >> You should see what fields acts_as_ferret adds to the index >> in your development log when you save a record. >> please check there if it adds what you want. >> >> Jens -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Jul 14 04:23:20 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 14 Jul 2006 10:23:20 +0200 Subject: [Ferret-talk] Search on data accross many tables, linked by belongs_to In-Reply-To: <191bb6c04a50e78ceb67d2d060d5cdc0@ruby-forum.com> References: <20060707222231.GA31705@cordoba.webit.de> <309d586b2921ecc66a4d2ef8048217da@ruby-forum.com> <20060713153303.GQ17139@cordoba.webit.de> <0a3cf7b23d2e37aa16591c543a1ef941@ruby-forum.com> <191bb6c04a50e78ceb67d2d060d5cdc0@ruby-forum.com> Message-ID: <20060714082320.GR17139@cordoba.webit.de> Hi! On Thu, Jul 13, 2006 at 07:53:36PM +0200, Guest wrote: > > Wait so, now I added acts_as_ferret in the menu_item model, and it looks > as if it's being indexed in the log, but I still don't get the > results. Calling acts_as_ferret in the MenuItem class will create a separate index for MenuItems; this is probably not what you want. The problem is that Ferret can only index what your menu_item_name method delivers when the object is saved. If you save the object before creating the menus and menu_items, there's nothing to index.
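Since `menus` and `menu_items` look like has_many collections, a method such as `menu_item_name` has to walk each level and join the item names into a single string before Ferret can index them. The sketch below uses plain Ruby Structs as stand-ins for the ActiveRecord models so it runs on its own. The `Restaurant` class name and the nesting (menus, then menu_categories, then menu_items) are assumptions drawn from the table list in the question, not code from the thread.

```ruby
# Stand-ins for the ActiveRecord models; in a Rails app these would be
# has_many associations rather than plain arrays.
MenuItem     = Struct.new(:name)
MenuCategory = Struct.new(:menu_items)
Menu         = Struct.new(:menu_categories)

class Restaurant  # hypothetical name for the model that calls acts_as_ferret
  attr_reader :menus

  def initialize(menus)
    @menus = menus
  end

  # Walk every menu, category and item, and join all item names into one
  # space-separated string that could be indexed as a single field.
  def menu_item_name
    menus.flat_map(&:menu_categories)
         .flat_map(&:menu_items)
         .map(&:name)
         .join(" ")
  end
end

lunch  = Menu.new([MenuCategory.new([MenuItem.new("soup"), MenuItem.new("salad")])])
dinner = Menu.new([MenuCategory.new([MenuItem.new("steak")]),
                   MenuCategory.new([MenuItem.new("pie")])])

puts Restaurant.new([lunch, dinner]).menu_item_name  # prints "soup salad steak pie"
```

As Jens notes, the string is only captured when the parent record is saved, so the associated rows must exist (or the record must be saved again) before it contains anything.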
You could try to create the menus and menu_items first, then add them to the object to be indexed, and then call save. Or just call save again after you have created the menus. By the way, 'menuitems' sounds like a collection of menu items, so what should 'menuitems.name' deliver? Maybe you want something like menuitems.inject('') { |value, item| value << " #{item.name}" }, concatenating the names of all menu items into one string, which would then be indexed? The same goes for 'menus', which sounds like a has_many relationship, too. I'm sure we'll get this right somehow ;-) Jens > Hmm... > > Guest wrote: > > > > > > OK, when I create the top level item, it indexes the fields, but since > > I'm creating the menus and menu_items separately it doesn't look like > > those are getting indexed according to the log. Do I have to set > > something else up? > > > > Thanks Jens! > > > > Jens Kraemer wrote: > >> On Thu, Jul 13, 2006 at 04:59:02PM +0200, Guest wrote: > >>> > >>> Hmm. I tried doing it with a few levels of nesting like: > >>> > >>> acts_as_ferret :fields => { :menu_item_name, 'city', 'state', 'zip' } > >>> > >>> def menu_item_name > >>> menus.menu_categories.menu_items.name > >>> end > >>> > >>> And I don't get any results. > >> > >> that can have many reasons ;-) > >> Basically, what you do in the method is irrelevant to ferret, it just > >> indexes what is returned by the method. > >> > >> You should see what fields acts_as_ferret adds to the index > >> in your development log when you save a record. > >> please check there if it adds what you want. > >> > >> Jens > > > -- > Posted via http://www.ruby-forum.com/.
-- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From shingler at gmail.com Fri Jul 14 09:00:07 2006 From: shingler at gmail.com (steven shingler) Date: Fri, 14 Jul 2006 15:00:07 +0200 Subject: [Ferret-talk] RDig config file problem Message-ID: <39ead55f5536eb3f4f0a6dac319c0fe7@ruby-forum.com> Hi All, Hope it is ok to post RDig queries on this forum. Just trying to get RDig working (Ubuntu 6.06, RDig 0.3.0, ferret 0.9.4, rubyful_soup 1.0.4). Here is my output:

  sh:~/rdigtry$ rdig -c config/rdig_config.rb
  discovered content extractor class: RDig::ContentExtractors::PdfContentExtractor
  discovered content extractor class: RDig::ContentExtractors::WordContentExtractor
  discovered content extractor class: RDig::ContentExtractors::HtmlContentExtractor
  /home/steven/rdigtry/config/rdig_config.rb:4
  /usr/lib/ruby/gems/1.8/gems/rdig-0.3.0/lib/rdig.rb:113:in `configuration'
  /home/steven/rdigtry/config/rdig_config.rb:1
  /usr/lib/ruby/gems/1.8/gems/rdig-0.3.0/lib/rdig.rb:226:in `load_configfile'
  /usr/lib/ruby/gems/1.8/gems/rdig-0.3.0/lib/rdig.rb:233:in `run'
  /usr/lib/ruby/gems/1.8/gems/rdig-0.3.0/bin/rdig:13
  /usr/bin/rdig:18
  /usr/lib/ruby/gems/1.8/gems/rdig-0.3.0/lib/rdig.rb:236:in `run': No Configfile found! (RuntimeError)
  undefined method `path=' for nil:NilClass
          from /usr/lib/ruby/gems/1.8/gems/rdig-0.3.0/bin/rdig:13
          from /usr/bin/rdig:18

and here is my config file:

  RDig.configuration do |cfg|

    cfg.crawler.start_urls = [ 'http://bbc.co.uk' ]
    cfg.indexer.path = '/home/steven/rdigtry/index'
    cfg.verbose = true
  end

Seems as though the RDig script can't load my config file? Any advice very gratefully received. Many Thanks, Steven -- Posted via http://www.ruby-forum.com/.
From clare.cavanagh at btclick.com Fri Jul 14 10:35:42 2006 From: clare.cavanagh at btclick.com (BlueJay) Date: Fri, 14 Jul 2006 16:35:42 +0200 Subject: [Ferret-talk] Whitespace Issues Message-ID: I am trying to build up a filtered search using the logic below. bq = Ferret::Search::BooleanQuery.new bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section",section.downcase!)), Ferret::Search::BooleanClause::Occur::MUST) filter = Ferret::Search::QueryFilter.new(bq) @vobjects = VoObject.find_by_contents(search_input,:filter => filter, :sort => ["section", "sale_category"]) This works fine when the "section" is a single word like "book", but when there is whitespace in the query, as in "paperback book", it does not find the appropriate result and comes back with zero hits. I changed this to use FuzzyQuery and it works, but I sometimes get segmentation faults (this was reported in another topic). Does anyone have a solution to this problem for me? Thanks very much. -- Posted via http://www.ruby-forum.com/. From jbensley.ng at gmail.com Fri Jul 14 10:58:31 2006 From: jbensley.ng at gmail.com (Jeremy Bensley) Date: Fri, 14 Jul 2006 09:58:31 -0500 Subject: [Ferret-talk] Whitespace Issues In-Reply-To: References: Message-ID: It's hard to know for sure without seeing how your index is built, but if you are using TOKENIZED on that field, then whenever the index is built the text is split on whitespace, and each element is added as a separate term. It looks like when you are searching, you are trying to find the entire text as a single term. In order to solve this, I believe you can either construct your query using QueryParser, which will use the analyzer / tokenizer and split the terms out for you, or you can simply split the 'section' string on whitespace, build a Term for each resulting element, and add those terms, in order, to a PhraseQuery.
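Why the single TermQuery comes back empty can be seen in a toy model of a tokenized field. The sketch below is hypothetical and is not Ferret's internals (ToyFieldIndex, term_query and phrase_query are illustrative names): it downcases and splits text on whitespace and records each token's position. Looking up "paperback book" as one term finds nothing, while splitting the query the same way and requiring consecutive positions, which is what a phrase query does, finds the document. Separately, String#downcase! returns nil when the string is already lowercase, so section.downcase! in the question's snippet can pass nil as the term text; section.downcase avoids that.

```ruby
# Toy inverted index for ONE tokenized field (hypothetical, not Ferret's API).
class ToyFieldIndex
  def initialize
    # term => list of [doc_id, position] pairs
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Index time: downcase and split on whitespace, one term per token.
  def add(doc_id, text)
    text.downcase.split.each_with_index do |term, pos|
      @postings[term] << [doc_id, pos]
    end
  end

  # Like a TermQuery: the whole string is looked up as a single term.
  def term_query(term)
    @postings.fetch(term, []).map(&:first).uniq
  end

  # Like a PhraseQuery: split the input the same way as at index time,
  # then require every term at consecutive positions in the same document.
  def phrase_query(text)
    terms = text.downcase.split
    return [] if terms.empty?

    matches = @postings.fetch(terms.first, []).select do |doc_id, start|
      terms.each_with_index.all? { |term, offset| @postings.fetch(term, []).include?([doc_id, start + offset]) }
    end
    matches.map(&:first).uniq
  end
end

index = ToyFieldIndex.new
index.add(1, "Paperback Book")
index.add(2, "Book")

p index.term_query("paperback book")   # => []  (no such single term)
p index.phrase_query("paperback book") # => [1]
p index.term_query("book")             # => [1, 2]
```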
I hope this is some help, Jeremy On 7/14/06, BlueJay wrote: > > I am trying to build up a filtered search using the logic below. > > > bq = Ferret::Search::BooleanQuery.new > bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index:: > Term.new("section",section.downcase!)), > Ferret::Search::BooleanClause::Occur::MUST) > > filter = Ferret::Search::QueryFilter.new(bq) > @vobjects = VoObject.find_by_contents(search_input,:filter > => > filter, :sort => ["section", "sale_category"]) > > > This works fine when the "section" is a single word like "book" but when > there is white spaces in the query like "paperback book" it does not > find the appropriate result and comes back with zero hits. > > I changed this to use FuzzyQuery and it works but I sometimes get > segmentation errors (this was reported in another topic). > > Does anyone have a solution to this problem for me? > > Thanks very much. > > -- > Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Jul 14 11:58:27 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 14 Jul 2006 17:58:27 +0200 Subject: [Ferret-talk] RDig config file problem In-Reply-To: <39ead55f5536eb3f4f0a6dac319c0fe7@ruby-forum.com> References: <39ead55f5536eb3f4f0a6dac319c0fe7@ruby-forum.com> Message-ID: <20060714155827.GV17139@cordoba.webit.de> Hi Steven, On Fri, Jul 14, 2006 at 03:00:07PM +0200, steven shingler wrote: > [..] > and here is my config file: > > RDig.configuration do |cfg| > > cfg.crawler.start_urls = [ 'http://bbc.co.uk' ] > cfg.indexer.path = '/home/steven/rdigtry/index' that line should read cfg.index.path = '/home/steven/rdigtry/index' I fixed the readme now to reflect this.
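The "undefined method `path=' for nil:NilClass" line in Steven's trace is what you get when a configuration object answers nil for a key it does not know. A minimal sketch, assuming (as that error suggests) that RDig's configuration behaves like Ruby's OpenStruct with pre-built crawler and index sections; the exact structure of RDig's config object is an assumption here, not taken from RDig's source:

```ruby
require "ostruct"

# Assumed shape of the config object: nested OpenStructs, one per section.
cfg = OpenStruct.new(crawler: OpenStruct.new, index: OpenStruct.new)

cfg.crawler.start_urls = ["http://bbc.co.uk"]

# An unknown key such as `indexer` just returns nil on an OpenStruct,
# so assigning a sub-key on it raises NoMethodError on nil.
begin
  cfg.indexer.path = "/home/steven/rdigtry/index"
rescue NoMethodError => e
  puts "wrong key: #{e.message}"
end

# The corrected key from the updated README.
cfg.index.path = "/home/steven/rdigtry/index"
puts cfg.index.path # prints /home/steven/rdigtry/index
```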
Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From shingler at gmail.com Fri Jul 14 12:02:56 2006 From: shingler at gmail.com (steven shingler) Date: Fri, 14 Jul 2006 18:02:56 +0200 Subject: [Ferret-talk] RDig config file problem In-Reply-To: <20060714155827.GV17139@cordoba.webit.de> References: <39ead55f5536eb3f4f0a6dac319c0fe7@ruby-forum.com> <20060714155827.GV17139@cordoba.webit.de> Message-ID: Jens Kraemer wrote: > that line should read > > cfg.index.path = '/home/steven/rdigtry/index' > > I fixed the readme now to reflect this. > > Jens Super! Thanks Jens. S~ -- Posted via http://www.ruby-forum.com/. From clare.cavanagh at btclick.com Fri Jul 14 12:37:23 2006 From: clare.cavanagh at btclick.com (BlueJay) Date: Fri, 14 Jul 2006 18:37:23 +0200 Subject: [Ferret-talk] Whitespace Issues In-Reply-To: <86b33c1c3078410d03baa99a5f734ce8@ruby-forum.com> References: Message-ID: <86b33c1c3078410d03baa99a5f734ce8@ruby-forum.com> Jeremy Bensley wrote: > It's hard to know for sure without seeing how your index is built, but > if > you are using TOKENIZED on that field, then whenever the index is built > the > text is split on whitespace, and each element is added as a separate > term. Jeremy Thanks for the reply. I am building the index like this... class VoObject < ActiveRecord::Base acts_as_ferret :fields=> ['short_description','section','sale_category','sale_type','outcode'] > It looks like when you are searching, you are trying to find the entire > text > as a single term. > > In order to solve this, I believe you can either construct your query > using > QueryParser, which will use the analyzer / tokenizer and split the terms > out > for you, or you can simply split the 'section' string on whitespace and > build a Term and TermQuery for each resulting element and build a > PhraseQuery from that set. Sorry for asking a silly question but how would I go about doing this?
> I hope this is some help, > > Jeremy -- Posted via http://www.ruby-forum.com/. From jbensley.ng at gmail.com Fri Jul 14 14:16:42 2006 From: jbensley.ng at gmail.com (Jeremy Bensley) Date: Fri, 14 Jul 2006 13:16:42 -0500 Subject: [Ferret-talk] Whitespace Issues In-Reply-To: <86b33c1c3078410d03baa99a5f734ce8@ruby-forum.com> References: <86b33c1c3078410d03baa99a5f734ce8@ruby-forum.com> Message-ID: Method #1 should be shorter / easier, and would look something like this:

  qp = Ferret::QueryParser.new("section") # "section" is the default field for the query
  query = qp.parse("\"#{section}\"")      # quoting the input makes it a phrase query

  # modified boolean query
  bq = Ferret::Search::BooleanQuery.new
  bq.add_query(query, Ferret::Search::BooleanClause::Occur::MUST)
  filter = Ferret::Search::QueryFilter.new(bq)
  @vobjects = VoObject.find_by_contents(search_input, :filter => filter,
                                        :sort => ["section", "sale_category"])

Unless you have more than one query in the boolean query, you should probably just skip that entirely and build your filter directly from the PhraseQuery. On 7/14/06, BlueJay wrote: > > Jeremy Bensley wrote: > > It's hard to know for sure without seeing how your index is built, but > > if > > you are using TOKENIZED on that field, then whenever the index is built > > the > > text is split on whitespace, and each element is added as a separate > > term. > > Jeremy > > Thanks for the reply. I am building the index like this... > > class VoObject < ActiveRecord::Base > acts_as_ferret :fields=> > ['short_description','section','sale_category','sale_type','outcode'] > > > It looks like when you are searching, you are trying to find the entire > > text > > as a single term. > > > > In order to solve this, I believe you can either construct your query > > using > > QueryParser, which will use the analyzer / tokenizer and split the terms > > out > > for you, or you can simply split the 'section' string on whitespace and > > build a Term and TermQuery for each resulting element and build a > > PhraseQuery from that set.
> > Sorry for asking a silly question but how would I go about doing this? > > > I hope this is some help, > > > > Jeremy > > > > > -- > Posted via http://www.ruby-forum.com/. From jordan.w.frank at gmail.com Fri Jul 14 16:26:34 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Fri, 14 Jul 2006 16:26:34 -0400 Subject: [Ferret-talk] adding a custom filter to the query Message-ID: Hi all, I'm trying to figure out how to add a filter into a search. I've created the filter, basically copying the location filter from http://blog.tourb.us/archives/ferret-and-location-based-searches. But when I try to call Index.search and pass the filter in a hash with the key :filter, I get back that it is expecting type Data, and so I'm at a loss to figure out what to check next. Any help would be greatly appreciated. I'm sure I have a lot to learn, but some nudges in the right direction would be wonderful. -- Cheers, Jordan Frank jordan.w.frank at gmail.com From andy.caspar at gmail.com Fri Jul 14 18:47:05 2006 From: andy.caspar at gmail.com (Andy Caspar) Date: Fri, 14 Jul 2006 15:47:05 -0700 Subject: [Ferret-talk] Scaling Ferret Beyond One Server Message-ID: <146582890607141547w196def37lfbc22da3f9ed33bf@mail.gmail.com> Hi Everyone, I was wondering if folks here have had experience scaling Ferret beyond a single server? Currently, we are running Ferret in the same physical server as its Rails front end (via acts_as_ferret), but it is evident that we need a more scalable solution already. How would you split up the tasks (via dRB perhaps?) between two or three servers? Shared disk, replicated Ferret index (?), or any other ideas?
Thanks in advance, AC From dbalmain.ml at gmail.com Sat Jul 15 01:16:24 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 15 Jul 2006 14:16:24 +0900 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/15/06, Jordan Frank wrote: > Hi all, > I'm trying to figure out how to add a filter into a search. I've > created the filter, basically copying the location filter from > http://blog.tourb.us/archives/ferret-and-location-based-searches. But > when I try to call Index.search and pass the filter in a hash with the > key :filter, I get back that it is expecting type Data, and so I'm at > a loss to figure out what to check next. Any help would be greatly > appreciated. I'm sure I have a lot to learn, but some nudges in the > right direction would be wonderful. Hi Jordan, This is a bug which needs to be fixed. Please wait for the next version of Ferret. Or you could use the pure ruby version. Cheers, Dave From dbalmain.ml at gmail.com Sat Jul 15 01:32:13 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 15 Jul 2006 14:32:13 +0900 Subject: [Ferret-talk] Scaling Ferret Beyond One Server In-Reply-To: <146582890607141547w196def37lfbc22da3f9ed33bf@mail.gmail.com> References: <146582890607141547w196def37lfbc22da3f9ed33bf@mail.gmail.com> Message-ID: On 7/15/06, Andy Caspar wrote: > Hi Everyone, > > I was wondering if folks here have had experience scaling Ferret beyond a > single server? Currently, we are running Ferret in the same physical server > as its Rails front end (via acts_as_ferret), but it is evident that we need > a more scalable solution already. How would you split up the tasks (via dRB > perhaps?) between two or three servers? Shared disk, replicated Ferret > index (?), or any other ideas?
> > Thanks in advance, > AC Hi Andy, I guess the answer depends on which part of the application is the bottleneck. If it is Ferret then replicating the index might be the solution but it's complicated and I doubt that is your problem. If Ferret is handling the workload (which it should be if you have the C extension installed) then my guess would be to use a DRb solution. In a few weeks I'm going to start experimenting with using Ferret with DRb and future versions may even come with a DRb server included. In the meantime let me know how you go. Cheers, Dave From bk at benjaminkrause.com Sat Jul 15 04:08:57 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Sat, 15 Jul 2006 10:08:57 +0200 Subject: [Ferret-talk] FieldQuery not returning anything Message-ID: Hey .. The QueryParser RDoc page explains how to search for a specific value in a specific field. This is not working the way I thought it should, so what am I doing wrong? Here's an example .. I'm storing model data in the index like this: doc << Field.new( "object_id", object.id, Field::Store::YES) doc << Field.new( "type", object.class.base_class.to_s, Field::Store::YES) This is working fine, I get something like this: >> i[0] => Document { stored/uncompressed,indexed, stored/uncompressed,indexed, } Now I want to search for all documents with a specific type.. here's my naive approach .. >> i.search_each('type:Category') do |doc, score| puts "found #{doc}" end => 0 Searching for this will return my doc >> i.search_each('object_id:3') do |doc, score| puts "found #{doc}" end found 0 => 1 If I tokenize the 'type' field, my query is working, but I thought I could store the value as-is by leaving it untokenized and still be able to search for it with a FieldQuery. Am I wrong? Thanks, Ben -- Posted via http://www.ruby-forum.com/.
From dbalmain.ml at gmail.com Sat Jul 15 07:51:04 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 15 Jul 2006 20:51:04 +0900 Subject: [Ferret-talk] FieldQuery not returning anything In-Reply-To: References: Message-ID: On 7/15/06, Benjamin Krause wrote: > Hey .. > > The QueryParser RDoc page explains to me on how to search for a specific > value in a specific field. This is not working the way i thought it > should be, what am i doing wrong? Here's an example .. > > I'm storing model data in the index like this: > > doc << Field.new( "object_id", object.id, Field::Store::YES) > doc << Field.new( "type", object.class.base_class.to_s, > Field::Store::YES) > > This is working fine, i get something like this: > > >> i[0] > => Document { > stored/uncompressed,indexed, > stored/uncompressed,indexed, > } > > > Now I want to search for all documents, with a specific type.. here's my > naive approach .. > > >> i.search_each('type:Category') do |doc, score| puts "found #{doc}" end > => 0 > > Searching for this will return my doc > > >> i.search_each('object_id:3') do |doc, score| puts "found #{doc}" end > found 0 > => 1 > > If I tokenize the 'type' field, my query is working, but I thought i can > store the value as-it by leaving it untokenized and am still able to > search for it with a FieldQuery. Am I wrong? > > Thanks, > Ben Hi Ben, The problem is that the Query parser will tokenize the query string. So Category is getting downcased to category. So you have two options. You can downcase all entries you want to make searchable in the untokenized field. Or, if you want to make case meaningful, you need to create an Analyzer that won't downcase the "type" field. 
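Dave's explanation can be reproduced with a toy model. The helpers below are hypothetical and are not Ferret's API: an untokenized field stores the exact value as its only term, while the query side lowercases the term the way the thread describes QueryParser doing. The lookup misses until the stored values are also downcased at index time, which is the first of Dave's two options:

```ruby
# Untokenized field: each document's exact value becomes its single term.
# Returns a hash of term => [doc_ids].
def index_untokenized(docs, field)
  docs.each_with_index
      .group_by { |doc, _id| doc[field] }
      .transform_values { |pairs| pairs.map { |_doc, id| id } }
end

# Query side: the term is lowercased before lookup, as described for
# QueryParser in the thread.
def parse_field_query(value)
  value.downcase
end

docs = [{ type: "Category" }, { type: "Product" }]

as_is = index_untokenized(docs, :type)
p as_is.fetch(parse_field_query("Category"), [])   # => []  ("category" != "Category")

# Option 1 from the thread: downcase the value at index time too.
lowered = index_untokenized(docs.map { |d| { type: d[:type].downcase } }, :type)
p lowered.fetch(parse_field_query("Category"), []) # => [0]
```

Dave's other option, an analyzer that leaves the "type" field's case alone, corresponds to skipping the downcase in parse_field_query for that field. transform_values needs Ruby 2.4 or later.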
Hope that helps, Dave From jordan.w.frank at gmail.com Sat Jul 15 11:02:56 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Sat, 15 Jul 2006 11:02:56 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/15/06, David Balmain wrote: > Hi Jordan, > This is a bug which needs to be fixed. Please wait for the next > version of Ferret. Or you could use the pure ruby version. > > Cheers, > Dave Oh, really...darn, it was kind of important. How do I force it to use the pure ruby version? How long until the next version? Is it a complicated fix or is it fixed in a version that I could access (SVN or something)? -- Cheers, Jordan Frank jordan.w.frank at gmail.com From dbalmain.ml at gmail.com Sat Jul 15 11:22:30 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 16 Jul 2006 00:22:30 +0900 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/16/06, Jordan Frank wrote: > On 7/15/06, David Balmain wrote: > > Hi Jordan, > > This is a bug which needs to be fixed. Please wait for the next > > version of Ferret. Or you could use the pure ruby version. > > > > Cheers, > > Dave > > Oh, really...darn, it was kind of important. How do I force it to use > the pure ruby version? How long until the next version? Is it a > complicated fix or is it fixed in a version that I could access (SVN > or something)? To force it to use the pure ruby version require 'rferret' instead of 'ferret'. Alternatively (I should have mentioned this the first time) you can use a QueryFilter.
For example: filter = QueryFilter.new(TermQuery.new(Term.new("subject", "sport"))) You should be able to build pretty much any filter you need just like that. Hope that helps. Cheers, Dave PS: The fix can't be checked out of svn yet. I still have a lot of work to do. Sorry. From jordan.w.frank at gmail.com Sat Jul 15 12:10:54 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Sat, 15 Jul 2006 12:10:54 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/15/06, David Balmain wrote: > To force it to use the pure ruby version require 'rferret' instead of > 'ferret'. Alternatively (I should have mentioned this the first time) > you can use a QueryFilter. For example; > > filter = QueryFilter.new(TermQuery.new(Term.new("subject", "sport"))) > > You should be able to build pretty much any filter you need just like > that. Hope that helps. > > Cheers, > Dave > > PS: The fix can't be checked out of svn yet. I still have a lot of > work to do. Sorry. Don't apologize man, you've done an exceptional job with it so far. The filter I was trying to add would filter based on location, so I'm not sure that it could be done easily using a query-filter. It takes a latitude, longitude, and radius, then filters for records that are within the radius...think that's doable with the builtin filters? I guess I could do it with a bounding box instead, but I'd prefer to keep it accurate...Anyways, I'll try the rferret route for now, and hopefully by the time this application goes to production, the C version will be fixed up. Thanks for your help. -- Cheers, Jordan Frank jordan.w.frank at gmail.com From f at andreas-s.net (Andreas S.)
Date: Sat, 15 Jul 2006 22:29:52 +0200 Subject: [Ferret-talk] Ferret Wiki Spam - Solutions Anyone? In-Reply-To: References: Message-ID: <0218825a4648da1f1e70952b13fe8fab@ruby-forum.com> David Balmain wrote: > Hi All, > > As some of you may have noticed, the Ferret Wiki has been getting > spammed like crazy. And I haven't been able to do anything about it > because I just don't have the time. I'm getting pretty close to > releasing 0.10.0 Great, I'm looking forward to it. Did you manage to find the source of the occasional Index#search segfaults? > which has been the major draw on my time for the last > couple of months so I'm going to have some time to look into this > soon. I was wondering if any of you web app experts out there could > give me some advice. The two options I'm considering are sticking with > TRAC and forcing registration to add/edit pages or tickets. The other > option I'm considering is moving bug tracking to RubyForge and using a > different Wiki for the website. Hopefully Ruse will be released soon. The Trac spam filter plugin works pretty well; a few simple regexes can keep out 99% of the spam. Andreas -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sun Jul 16 02:52:17 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 16 Jul 2006 15:52:17 +0900 Subject: [Ferret-talk] Ferret Wiki Spam - Solutions Anyone? In-Reply-To: <0218825a4648da1f1e70952b13fe8fab@ruby-forum.com> References: <0218825a4648da1f1e70952b13fe8fab@ruby-forum.com> Message-ID: On 7/16/06, Andreas S. wrote: > David Balmain wrote: > > Hi All, > > > > As some of you may have noticed, the Ferret Wiki has been getting > > spammed like crazy. And I haven't been able to do anything about it > > because I just don't have the time. I'm getting pretty close to > > releasing 0.10.0 > > Great, I'm looking forward to it. Did you manage to find the source of > the occasional Index#search segfaults?
Hi Andreas, A lot of the core has been rewritten so I haven't been fixing bugs like this. It may still be there or there could be new problems. One of the reasons for doing the rewrite was to make it easier to bind the Ruby and C code and hence make it easier to find these bugs. > > which has been the major draw on my time for the last > > couple of months so I'm going to have some time to look into this > > soon. I was wondering if any of you web app experts out there could > > give me some advice. The two options I'm considering are sticking with > > TRAC and forcing registration to add/edit pages or tickets. The other > > option I'm considering is moving bug tracking to RubyForge and using a > > different Wiki for the website. Hopefully Ruse will be released soon. > > The trac spam filter plugin works pretty well, a few simple Regexes can > keep out 99% of the spam. Thanks, I'll try it first then. Cheers, Dave From dbalmain.ml at gmail.com Sun Jul 16 02:55:03 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 16 Jul 2006 15:55:03 +0900 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/16/06, Jordan Frank wrote: > On 7/15/06, David Balmain wrote: > > To force it to use the pure ruby version require 'rferret' instead of > > 'ferret'. Alternatively (I should have mentioned this the first time) > > you can use a QueryFilter. For example; > > > > filter = QueryFilter.new(TermQuery.new(Term.new("subject", "sport"))) > > > > You should be able to build pretty much any filter you need just like > > that. Hope that helps. > > > > Cheers, > > Dave > > > > PS: The fix can't be checked out of svn yet. I still have a lot of > > work to do. Sorry. > Don't apologize man, you've done an exceptional job with it so far. > The filter I was trying to add would filter based on location, so I'm > not sure that It could be done easily using a query-filter. 
It takes a > latitude, longitude, and radius, then filters for records that are > with the radius...think that's doable with the builtin filters? I > guess I could do it with a bounding box instead, but I'd prefer to > keep it accurate...Anyways, I'll try the rferret route for now, and > hopefully by the time this application goes to production, the c > version will be fixed up. Thanks for your help. That is a perfect example of what you can't use the QueryFilter for. I may even use it as an example in the documentation. Thanks and good luck with the pure Ruby version. Cheers, Dave From jordan.w.frank at gmail.com Sun Jul 16 12:19:02 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Sun, 16 Jul 2006 12:19:02 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/16/06, David Balmain wrote: > > Don't apologize man, you've done an exceptional job with it so far. > > The filter I was trying to add would filter based on location, so I'm > > not sure that It could be done easily using a query-filter. It takes a > > latitude, longitude, and radius, then filters for records that are > > with the radius...think that's doable with the builtin filters? I > > guess I could do it with a bounding box instead, but I'd prefer to > > keep it accurate...Anyways, I'll try the rferret route for now, and > > hopefully by the time this application goes to production, the c > > version will be fixed up. Thanks for your help. > > That is a perfect example of what you can't use the QueryFilter for. I > may even use it as an example in the documentation. Thanks and good > luck with the pure Ruby version. > > Cheers, > Dave I tried out the pure ruby version, but I'm having a little bit of trouble wrapping my head around how to write the filter.
It seems that some stuff has changed internally since the sample code that I found at tourb.us was written. I tried looking at the RangeFilter code, but it seems to be solving too different a problem to really be useful as a guide. Do you know of any other filters, or have any pointers to how I would go about writing this filter? It seems really simple, just does a calculation on two of the fields, but because it's not iterating through terms, the RangeFilter code doesn't offer me much help. If you offer some pointers and I manage to get it working, I'd be happy to send you a copy to use as a sample, though it seems like the kind of thing you'd probably be able to write in a few minutes... -- Cheers, Jordan Frank jordan.w.frank at gmail.com From bk at benjaminkrause.com Sun Jul 16 12:39:05 2006 From: bk at benjaminkrause.com (Benjamin) Date: Sun, 16 Jul 2006 18:39:05 +0200 Subject: [Ferret-talk] FieldQuery not returning anything In-Reply-To: References: Message-ID: > The problem is that the Query parser will tokenize the query string. > So Category is getting downcased to category. So you have two options. > You can downcase all entries you want to make searchable in the > untokenized field. Or, if you want to make case meaningful, you need > to create an Analyzer that won't downcase the "type" field. ah.. i see.. thanks a lot :-) Ben -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sun Jul 16 21:54:45 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 17 Jul 2006 10:54:45 +0900 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/17/06, Jordan Frank wrote: > On 7/16/06, David Balmain wrote: > > > > > > That is a perfect example of what you can't use the QueryFilter for. I > > may even use it as an example in the documentation. Thanks and good > > luck with the pure Ruby version. 
> > I tried out the pure ruby version, but I'm having a little bit of > trouble wrapping my head around how to write the filter. It seems that > some stuff has changed internally since the sample code that I found > at tourb.us was written. I tried looking at the RangeFilter code, but > it seems to be solving too different a problem to really be useful as > a guide. Do you know of any other filters, or have any pointers to how > I would go about writing this filter? It seems really simple, just > does a calculation on two of the fields, but because it's not > iterating through terms, the RangeFilter code doesn't offer me much > help. If you offer some pointers and I manage to get it working, I'd > be happy to send you a copy to use as a sample, though it seems like > the kind of thing you'd probably be able to write in a few minutes... I think I slightly misunderstood your problem the first time around. To create this filter, you actually have to iterate through every document in the index. This will take some time but it would be worth it if the filter gets used many times, since it gets cached. However, I don't think this would work for you because I'm guessing the longitude, latitude and radius change on a query by query basis. This is not really what the current filters are designed for. Filters should be common query restrictions that are run over and over again. For example, a blog may have a month filter for retrieving documents from a particular month. This is likely to be used over and over again and RangeFilters are pretty cheap to build. So the current solution to your problem is to actually post-filter your query results yourself (ie filter the results once you have them back). So let's say you need ten results. You'd do a search for maybe 50 and run through each result checking the distance and discarding the ones you don't need. You'd repeat the search until you found enough documents. 
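A runnable version of this post-filtering loop, written against a toy in-memory index so it can run on its own. ToyIndex and its search_each(query, first_doc:, num_docs:) are hypothetical stand-ins that only mimic the shape of the search call used in this thread, and search_within is an illustrative name. The distance test is a planar comparison in degrees, and stored coordinates are converted with to_f on the assumption that they come back as strings. The block uses next rather than break to stop collecting, because break inside a block makes the method return nil, which would lose the hit count needed to decide when to stop paging:

```ruby
# Toy stand-in for an index (hypothetical): documents are hashes whose stored
# field values are strings.
class ToyIndex
  def initialize(docs)
    @docs = docs
  end

  def [](doc_id)
    @docs[doc_id]
  end

  # Yields (doc_id, score) for hits in [first_doc, first_doc + num_docs)
  # and returns the total number of hits, mimicking the call in the thread.
  def search_each(query, first_doc: 0, num_docs: 10)
    hits = (0...@docs.size).select { |id| @docs[id][:text].include?(query) }
    (hits[first_doc, num_docs] || []).each { |id| yield id, 1.0 }
    hits.size
  end
end

# Post-filter: fetch hits in batches, keep those inside the radius
# (planar approximation in degrees), stop once num_docs are collected.
def search_within(index, query, num_docs, lat, lng, radius)
  batch = num_docs * 5
  first_doc = 0
  results = []
  loop do
    total = index.search_each(query, first_doc: first_doc, num_docs: batch) do |doc_id, _score|
      next if results.size >= num_docs # enough already; skip rest of batch
      doc = index[doc_id]
      dlat = doc[:latitude].to_f - lat
      dlng = doc[:longitude].to_f - lng
      results << doc if dlat**2 + dlng**2 < radius**2
    end
    first_doc += batch
    break if results.size >= num_docs || first_doc >= total
  end
  results
end

index = ToyIndex.new([
  { text: "pizza", latitude: "40.0", longitude: "-74.0" },
  { text: "pizza", latitude: "41.5", longitude: "-74.0" },
  { text: "pizza", latitude: "40.1", longitude: "-73.9" },
  { text: "sushi", latitude: "40.0", longitude: "-74.0" }
])
p search_within(index, "pizza", 10, 40.0, -74.0, 0.5).size # => 2
```

For real distances over larger areas, a great-circle formula such as haversine would replace the planar test; the batching logic stays the same.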
Here is a quick and dirty solution (where num_docs is the number of documents you want in your resultset):

  def search(index, query, num_docs, latitude, longitude, radius)
    first_doc = 0
    results = []
    while true
      count = index.search_each(query,
                                :first_doc => first_doc,
                                :num_docs => num_docs*5) do |doc_id, score|
        next if results.size >= num_docs # already have enough docs
        doc = index[doc_id]
        # test distance and add to resultset if ok (stored values are strings)
        if ((doc[:latitude].to_f - latitude) ** 2 +
            (doc[:longitude].to_f - longitude) ** 2) < radius ** 2
          results << doc
        end
      end
      break if results.size >= num_docs # have enough docs
      break if count < (num_docs * 5) # already scanned all results
      first_doc += num_docs * 5
    end
    return results
  end

This gets even messier when you need to page through the results. A much nicer solution than this would be to add a :filter_proc to the search methods. Something like this:

  within_radius = lambda do |doc|
    return ((doc[:latitude].to_f - latitude) ** 2 +
            (doc[:longitude].to_f - longitude) ** 2) < (radius ** 2)
  end

  index.search_each(query, :filter_proc => within_radius) {|d, s| ...}

Does this sound like a good idea? If so I could add it to a future version of Ferret. Please let me know if you can think of a better way to do this. Cheers, Dave From samuelgiffney at gmail.com Mon Jul 17 05:25:11 2006 From: samuelgiffney at gmail.com (Sam Giffney) Date: Mon, 17 Jul 2006 21:25:11 +1200 Subject: [Ferret-talk] adding a custom filter to the query Message-ID: I for one think this custom filter would be an awesome addition. Geospatial and local search is a hot area and it would be cool if ferret facilitated this type of query easily. Would it be a significant performance hit if ferret has to cycle through every document for this search? Fine over a couple of hundred, or thousand? but hundreds of thousands? Just tossing around the idea but... This particular search (distance) can be done quite efficiently with sql. Is it at all feasible that you could 'outsource' the query to sql?
Obviously sql could return the ids simply enough, but I guess then you'd need to go through each document anyway... To return a bitset, would the database need to know about the ferret document order? Or how about the reverse: use ferret to create a list of ids to pass into a sql IN query? Afraid I have no idea how efficient that would be either... Anyone in here have a best practice? > ---------- Forwarded message ---------- > From: "David Balmain" > To: ferret-talk at rubyforge.org > Date: Mon, 17 Jul 2006 10:54:45 +0900 > Subject: Re: [Ferret-talk] adding a custom filter to the query > On 7/17/06, Jordan Frank wrote: > > On 7/16/06, David Balmain wrote: > > > > > > > > > That is a perfect example of what you can't use the QueryFilter for. I > > > may even use it as an example in the documentation. Thanks and good > > > luck with the pure Ruby version. > > > > I tried out the pure ruby version, but I'm having a little bit of > > trouble wrapping my head around how to write the filter. It seems that > > some stuff has changed internally since the sample code that I found > > at tourb.us was written. I tried looking at the RangeFilter code, but > > it seems to be solving too different a problem to really be useful as > > a guide. Do you know of any other filters, or have any pointers to how > > I would go about writing this filter? It seems really simple, just > > does a calculation on two of the fields, but because it's not > > iterating through terms, the RangeFilter code doesn't offer me much > > help. If you offer some pointers and I manage to get it working, I'd > > be happy to send you a copy to use as a sample, though it seems like > > the kind of thing you'd probably be able to write in a few minutes... > > I think I slightly misunderstood your problem the first time around. > To create this filter, you actually have to iterate through every > document in the index.
This will take some time but it would be worth > it if the filter gets used many times, since it gets cached. However, > I don't think this would work for you because I'm guessing the > longitude, latitude and radius change on a query by query basis. This > is not really what the current filters are designed for. Filters > should be common query restrictions that are run over and over again. > For example, a blog may have a month filter for retrieving documents > from a particular month. This is likely to be used over and over again > and RangeFilters are pretty cheap to build. > > So the current solution to your problem is to actually post-filter > your query results yourself (ie filter the results once you have them > back). So let's say you need ten results. You'd do a search for maybe > 50 and run through each result checking the distance and discarding > the ones you don't need. You'd repeat the search until you found > enough documents. Here is a quick and dirty solution (where num_docs > is the number of documents you want in your resultset):
>
>     def search(index, query, num_docs, latitude, longitude, radius)
>       first_doc = 0
>       results = []
>       while true
>         count = index.search_each(query,
>                                   :first_doc => first_doc,
>                                   :num_docs => num_docs*5) do |doc_id, score|
>           doc = index[doc_id]
>           # test distance and add to resultset if ok
>           if ((doc[:latitude] - latitude) ** 2 +
>               (doc[:longitude] - longitude) ** 2) < radius ** 2
>             results << doc
>           end
>           break if results.size == num_docs # have enough docs
>         end
>         break if count < (num_docs * 5) #already scanned all results
>         first_doc += num_docs * 5
>       end
>       return results
>     end
>
> This gets even messier when you need to page through the results. A > much nicer solution than this would be to add a :filter_proc to the > search methods.
Something like this:
>
>     within_radius = lambda do |doc|
>       return ((doc[:latitude] - latitude) ** 2 +
>               (doc[:longitude] - longitude) ** 2) < (radius ** 2)
>     end
>
>     index.search_each(query, :filter_proc => within_radius) {|d, s| ...}
>
> Does this sound like a good idea? If so I could add it to a future > version of Ferret. Please let me know if you can think of a better way > to do this. > > Cheers, > Dave From crafterm at gmail.com Mon Jul 17 07:51:57 2006 From: crafterm at gmail.com (Marcus Crafter) Date: Mon, 17 Jul 2006 13:51:57 +0200 Subject: [Ferret-talk] Draft port of lucene highlighter to ferret Message-ID: <5b402c165123914c847752c4cb4b9661@ruby-forum.com> Hi All, Hope all is going well. After being knocked out with the flu for a week I've been able to make some progress with porting the lucene highlighter to ferret. I've got a draft version available for perusal at: http://crafterm.net/ruby/highlighter/highlighter-ruby-0.1.tar.gz The port is pretty much a complete copy of the Java version at: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/highlighter/ The highlighter_test.rb file includes a few of the test cases from the original lucene test case. 3 of the 7 test cases in that file are working, which shows that basic simple and fuzzy highlighting is working fine. The remaining 4 failing test cases are:

  test_get_wild_card_fragments
  test_get_mid_wild_card_fragments
    - Both of these have problems rewriting a wild card query into a simplified boolean query
  test_get_range_fragments - emits a query parse exception
  test_get_best_fragments_span - emits a not implemented exception

If you run 'rake' you'll see the wild card error first; the rest have been commented out. If anyone would like to take a look into these failing test cases that would be great. Will keep debugging here and keep you all updated, but just wanted to get the code out there in case anyone wants to join in. Cheers, Marcus -- Posted via http://www.ruby-forum.com/.
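The :filter_proc proposal earlier in this thread can be modelled in a few lines of plain Ruby. This is a hypothetical sketch, not Ferret's implementation or API: `top_docs_with_filter`, the `matches` array of `[doc_id, score]` pairs (a stand-in for the scorer's stream of hits) and the example coordinates are all made up, and the filter here receives a document id rather than a document. The distance test is the same flat squared-degree comparison as the lambda above. The idea it illustrates is that the filter runs before a hit enters a bounded, score-ordered queue of `first_doc + num_docs` entries, so the caller asks for `num_docs` hits and gets that many back (or fewer, if not enough matches pass) without guessing an over-fetch ratio.

```ruby
# Hypothetical model of a search-time filter; NOT Ferret's API.
# `matches` stands in for the scorer's stream of [doc_id, score] pairs.
def top_docs_with_filter(matches, first_doc: 0, num_docs: 10, filter_proc: nil)
  return [] if num_docs <= 0
  limit = first_doc + num_docs # hits needed to serve this page
  queue = []                   # [doc_id, score] pairs, best score first

  matches.each do |doc_id, score|
    # Rejected documents never enter the queue, so they never use up a slot.
    next if filter_proc && !filter_proc.call(doc_id)
    # A full queue only accepts hits that beat its current worst entry.
    next if queue.size == limit && score <= queue.last[1]

    # Insert after any equal scores so ties keep their arrival order.
    idx = queue.bsearch_index { |entry| entry[1] < score } || queue.size
    queue.insert(idx, [doc_id, score])
    queue.pop if queue.size > limit
  end

  queue[first_doc, num_docs] || []
end

# Example data: doc_id => stored coordinates (made up for illustration).
docs = {
  1 => { latitude: 49.15, longitude: -123.10 },
  2 => { latitude: 40.71, longitude: -74.00 },
  3 => { latitude: 49.20, longitude: -123.05 },
  4 => { latitude: 49.30, longitude: -122.90 }
}
matches = [[1, 0.9], [2, 0.95], [3, 0.4], [4, 0.7]]

latitude, longitude, radius = 49.2, -123.0, 0.5
within_radius = lambda do |doc_id|
  doc = docs[doc_id]
  (doc[:latitude] - latitude)**2 + (doc[:longitude] - longitude)**2 < radius**2
end

p top_docs_with_filter(matches, num_docs: 2, filter_proc: within_radius)
# prints [[1, 0.9], [4, 0.7]]: doc 2 scores highest but lies outside the radius
```

Paging works the same way: passing `first_doc: 1` makes the queue hold three filtered hits and returns the slice after the first one, which mirrors the `:first_doc`/`:num_docs` options Dave describes.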
From jordan.w.frank at gmail.com Mon Jul 17 09:13:43 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Mon, 17 Jul 2006 09:13:43 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: Comments inline... On 7/17/06, Sam Giffney wrote: > I for one think this custom filter would be an awesome addition. > Geospatial and local search is a hot area and it would be cool if > ferret facilitated this type of query easily. Agreed. Though it does seem like an abuse of the search engine. The search engine's goal is to retrieve as few documents as possible to satisfy the query, as far as I can tell anyways. David is right, and performing a calculation on every document makes less and less sense the more I think about it. > [...] > Just tossing around the idea but... > This particular search (distance) can be done quite efficiently with > sql. Is it at all feasible that you could 'outsource' the query to > sql? Obviously sql could return the id's simply enough, but i guess > then you'd need to go through each document anyway... To return a > bitset, would the database need to know about the ferret document > order? > > Or how about the reverse, use ferret to create a list of ids to pass > into a sql IN query? Afraid I have no idea how efficient that would be > either... This is exactly how I'm doing it now, but the problem is that the data I'm using is so spread out location-wise that sometimes I only get 40-50 good hits for every 1,000 entries returned from ferret. And so I find myself going back to ferret to retrieve more results a few times for each query, when I need to return 100 results that are within a certain distance. This is obviously inefficient. Obviously I could just pull more results out of ferret in the first place, but most of the time 1,000 is more than enough to get 100 good results. 
Obviously testing will let me find the optimal number to pull from ferret, but I figured that if I could put the distance calculation into ferret itself, then I could ask for 100 results, and get 100 results every time. > Anyone in here have a best practice? I would also like to know if anyone else has tackled this and has some tips. > > ---------- Forwarded message ---------- > > From: "David Balmain" > > > > This gets even messier when you need to page through the results. A > > much nicer solution than this would be to add a :filter_proc to the > > search methods. Something like this:
> >
> >     within_radius = lambda do |doc|
> >       return ((doc[:latitude] - latitude) ** 2 +
> >               (doc[:longitude] - longitude) ** 2) < (radius ** 2)
> >     end
> >
> >     index.search_each(query, :filter_proc => within_radius) {|d, s| ...}
> >
> > Does this sound like a good idea? If so I could add it to a future > > version of Ferret. Please let me know if you can think of a better way > > to do this. > > This is how I'm doing it now. I guess adding the filter_proc would clean up my code a bit, and simplify the paging etc. My question would be how you'd handle the problem that I mentioned earlier, that is, how to determine how many documents to retrieve before the filter_proc is evaluated in order to eventually return the desired number of documents. I don't know enough about the internals of ferret to know if I'm bringing up a valid point, but I'm guessing that if I only request the top 5 documents for a query, it doesn't retrieve every single document that satisfies the query and then take the top 5 from that list. Maybe it does though; as I said, I don't know enough about the internals of ferret, though I'd like to... So if the problem that I bring up is legitimate, then the problem would be in coming up with some sort of heuristic based on how many documents are expected to satisfy the filter_proc.
If only 10% of the documents satisfy the filter_proc, then to get the top 5 documents matching a query, we'd want to retrieve the top 50 documents internally, then pass them through the filter_proc, and hopefully we'd be left with at least 5 to return. For my specific application, I'm in a better position to determine this hit percentage, and so I'm in a better position to do the filtering. I don't know whether doing this in ferret would be efficient or even feasible. Anyways, let me know what your thoughts are on this. The filter_proc idea is a good one, as long as it can be implemented efficiently. Otherwise I'll just keep using my two-phase method: retrieve the documents from ferret, then do the location filtering in SQL. -- Cheers, Jordan Frank jordan.w.frank at gmail.com From dbalmain.ml at gmail.com Mon Jul 17 09:44:24 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 17 Jul 2006 22:44:24 +0900 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/17/06, Jordan Frank wrote: > > > > From: "David Balmain" > > > > > > This gets even messier when you need to page through the results. A > > > much nicer solution than this would be to add a :filter_proc to the > > > search methods. Something like this:
> > >
> > >     within_radius = lambda do |doc|
> > >       return ((doc[:latitude] - latitude) ** 2 +
> > >               (doc[:longitude] - longitude) ** 2) < (radius ** 2)
> > >     end
> > >
> > >     index.search_each(query, :filter_proc => within_radius) {|d, s| ...}
> > >
> > > Does this sound like a good idea? If so I could add it to a future > > > version of Ferret. Please let me know if you can think of a better way > > > to do this. > > > > > This is how I'm doing it now. I guess adding the filter_proc would > clean up my code a bit, and simplify the paging etc.
My question would > be how you'd handle the problem that I mentioned earlier, that is how > to determine how many documents to retrieve before the filter_proc is > evaluated in order to eventually return the desired number of > documents. I don't know enough about the internals of ferret to know > if I'm bringing up a valid point, but I'm guessing that if I only > request the top 5 documents for a query, it doesn't retrieve every > single document that satisfies the query and then take the top 5 from > that list. Maybe it does though, as I said, I don't know enough about > the internals of ferret, though I'd like to... Ferret actually has to check the score of every single document in the index that matches the query. It keeps a priority queue of as many documents as it needs to return the result set. So if :num_docs is 50 and :first_doc is 200, Ferret will need to keep a priority queue of 250 documents. > So if the problem that I bring up is legitimate, then the problem > would be in coming up with some sort of heuristic based on how many > documents are expected to satisfy the filter_proc. If only 10% of the > documents satisfy the filter_proc, then to get the top 5 documents > matching a query, we'd want to retrieve the top 50 documents > internally, then pass them through the filter_proc, and hopefully we'd > be left with at least 5 to return. For my specific application, I'm in > a better position to determine this hit percentage, and so I'm in a > better position to do the filtering. I don't know whether doing this > in ferret would be efficient or even feasible. With the :filter_proc idea you wouldn't need to request more documents than you need. You'd just specify :num_docs as usual and you'd get :num_docs back. So if you want 50 documents you'd get 50 documents (or fewer, if fewer documents match both the query and the distance constraint). > Anyways, let me know what your thoughts are on this.
The filter_proc > idea is a good one, as long as it can be implemented efficiently. > Otherwise I'll just keep using my two phase method, retrieve the > documents from ferret, and then do the location filtering in SQL. The proc would only be called once for every document that matches the query, not for every document in the index. It shouldn't be too expensive at all and probably a lot more efficient than filtering using the SQL method. Cheers, Dave From andy.caspar at gmail.com Mon Jul 17 10:25:07 2006 From: andy.caspar at gmail.com (Andy Caspar) Date: Mon, 17 Jul 2006 07:25:07 -0700 Subject: [Ferret-talk] Scaling Ferret Beyond One Server In-Reply-To: References: <146582890607141547w196def37lfbc22da3f9ed33bf@mail.gmail.com> Message-ID: <146582890607170725h384d02f5xc73b0e1edea81780@mail.gmail.com> Dave, Thanks for your feedback and for developing the wonderful Ferret! Besides performance, our application requirement is to have no single point of failure, which is why we are looking at running Ferret (at least the search node) beyond a single server. In the Lucene world, there's an interesting post at http://www.mail-archive.com/lucene-user at jakarta.apache.org/msg12709.html on how Technorati is doing distributed Lucene... Our current options are (1) DRb, (2) some replication technique similar to the one described by Doug Cutting in the above post, and (3) possibly some form of distributed file system like Hadoop (which will also serve other needs for our app). We'll let the list know how it goes. We'd also be interested in hearing about anybody else's experience using Ferret on more than one machine. -AC On 7/14/06, David Balmain wrote: > > On 7/15/06, Andy Caspar wrote: > > Hi Everyone, > > > > I was wondering if folks here have had experience scaling Ferret beyond > a > > single server? Currently, we are running Ferret in the same physical > server > > as its Rails front end (via acts_as_ferret), but it is evident that we > need > > a more scalable solution already.
How would you split up the tasks (via > dRB > perhaps?) between two or three servers? Shared disk, replicated Ferret > index (?), or any other ideas? > > Thanks in advance, > AC > Hi Andy, > > I guess the answer depends on which part of the application is the > bottleneck. If it is Ferret then replicating the index might be the > solution but it's complicated and I doubt that is your problem. > > If Ferret is handling the workload (which it should be if you have the > C extension installed) then my guess would be to use a DRb solution. > In a few weeks I'm going to start experimenting with using Ferret with > DRb and future versions may even come with a DRb server included. In > the meantime let me know how you go. > > Cheers, > Dave > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From jordan.w.frank at gmail.com Mon Jul 17 11:26:11 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Mon, 17 Jul 2006 11:26:11 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/17/06, David Balmain wrote: > Ferret actually has to check the score of every single document in the > index that matches the query. It keeps a priority queue of as many > documents as it needs to return the result set. So if :num_docs is 50, > and :first_doc is 200 Ferret will need to keep a priority queue of 250 > documents. > > The proc would just be called once for every matching document in the > result set, not every document. It shouldn't be too expensive at all > and probably a lot more efficient than filtering using the SQL method.
> If that's the case, then I think the filter_proc idea would be fantastic, and I'd love to see it make its way into a future version. -- Cheers, Jordan Frank jordan.w.frank at gmail.com From garypelliott at gmail.com Mon Jul 17 14:07:09 2006 From: garypelliott at gmail.com (Gary Elliott) Date: Mon, 17 Jul 2006 14:07:09 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: This is a "me too" post. I would love to replace the query filter we use on tourb.us with this. gary On 7/17/06, Jordan Frank wrote: > On 7/17/06, David Balmain wrote: > > Ferret actually has to check the score of every single document in the > > index that matches the query. It keeps a priority queue of as many > > documents as it needs to return the result set. So if :num_docs is 50, > > and :first_doc is 200 Ferret will need to keep a priority queue of 250 > > documents. > > > > The proc would just be called once for every matching document in the > > result set, not every document. It shouldn't be too expensive at all > > and probably a lot more efficient than filtering using the SQL method. > > > > If that's the case, then I think the filter_proc idea would be > fantastic, and I'd love to see it make its way into a future version.
> > -- > Cheers, > Jordan Frank > jordan.w.frank at gmail.com > From ryan at theryanking.com Mon Jul 17 19:37:24 2006 From: ryan at theryanking.com (Guest) Date: Tue, 18 Jul 2006 01:37:24 +0200 Subject: [Ferret-talk] Scaling Ferret Beyond One Server In-Reply-To: <146582890607170725h384d02f5xc73b0e1edea81780@mail.gmail.com> References: <146582890607141547w196def37lfbc22da3f9ed33bf@mail.gmail.com> <146582890607170725h384d02f5xc73b0e1edea81780@mail.gmail.com> Message-ID: <65a8dcab283ec7333c7bc54eda13eda6@ruby-forum.com> Andy Caspar wrote: >(3) possibly some form of distributed file system like hadoop Actually, Hadoop's distributed filesystem is built for applications that only need sequential reading of files. It's not useful for random access. You might want to try something like MogileFS instead. -- Posted via http://www.ruby-forum.com/. From jennyw at dangerousideas.com Mon Jul 17 23:57:42 2006 From: jennyw at dangerousideas.com (jennyw) Date: Mon, 17 Jul 2006 20:57:42 -0700 Subject: [Ferret-talk] Some basic questions Message-ID: <44BC5C36.8050202@dangerousideas.com> Hi, David and everyone, I've had Ferret running fine in a production Rails application for a while now. I haven't updated Ferret or really looked at the Ferret-related code since probably January, but I recently started thinking about trying out the latest version (we were using 0.3.2, I think). I got the latest (0.9.4) and have noticed things break. In particular, I used to refer to the constant Ferret::Analysis::StandardAnalyzer::ENGLISH_STOP_WORDS, but now when I try to reference it I get an uninitialized constant error for StopAnalyzer.
Here's an example IRB session:

     1 irb(main):001:0> require 'rubygems'
     2 => true
     3 irb(main):002:0> require_gem 'ferret'
     4 => true
     5 irb(main):003:0> Ferret::Analysis::StopAnalyzer
     6 NameError: uninitialized constant Ferret
     7         from (irb):3
     8 irb(main):004:0> require 'ferret'
     9 => true
    10 irb(main):005:0> Ferret::Analysis::StopAnalyzer
    11 NameError: uninitialized constant Ferret::Analysis::StopAnalyzer
    12         from (irb):5
    13 irb(main):006:0> Ferret::Analysis::StandardAnalyzer
    14 => Ferret::Analysis::StandardAnalyzer
    15 irb(main):007:0> Ferret::Analysis::StandardAnalyzer.superclass
    16 => Ferret::Analysis::Analyzer

A few questions:

On line 8 -- why is it necessary to require 'ferret' after doing a require_gem 'ferret'?

On lines 10 and 11 -- why isn't it finding StopAnalyzer? It's defined in the same file as StandardAnalyzer (lines 13 and 14).

On lines 15 and 16 -- StandardAnalyzer is defined in ferret-0.9.4/lib/ferret/analysis/analyzers.rb, and it's a subclass of StopAnalyzer. Yet when I try the superclass method, it returns Analyzer. Any ideas why this is?

Maybe I'm missing something really obvious, but I'm finding it quite perplexing right now. Thanks! Jen P.S. David -- thanks for Ferret! It's been running great in our production application. People love the app's search capabilities, including fuzzy search, which are of course all made possible by Ferret. From dbalmain.ml at gmail.com Tue Jul 18 01:17:00 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 18 Jul 2006 14:17:00 +0900 Subject: [Ferret-talk] Some basic questions In-Reply-To: <44BC5C36.8050202@dangerousideas.com> References: <44BC5C36.8050202@dangerousideas.com> Message-ID: On 7/18/06, jennyw wrote: > Hi, David and everyone, > > I've had Ferret running fine in a production Rails application for a > while now.
I haven't updated Ferret or really looked at the > Ferret-related code since probably January, but I recently started > thinking about trying out the latest version (we were using 0.3.2, I > think). I got the latest (0.9.4) and have noticed things break. In > particular, I used to refer to the constant > Ferret::Analysis::StandardAnalyzer::ENGLISH_STOP_WORDS, but now when I > try to reference it I get an uninitialized constant error for > StopAnalyzer. Here's an example IRB session:
>
>  1 irb(main):001:0> require 'rubygems'
>  2 => true
>  3 irb(main):002:0> require_gem 'ferret'
>  4 => true
>  5 irb(main):003:0> Ferret::Analysis::StopAnalyzer
>  6 NameError: uninitialized constant Ferret
>  7         from (irb):3
>  8 irb(main):004:0> require 'ferret'
>  9 => true
> 10 irb(main):005:0> Ferret::Analysis::StopAnalyzer
> 11 NameError: uninitialized constant Ferret::Analysis::StopAnalyzer
> 12         from (irb):5
> 13 irb(main):006:0> Ferret::Analysis::StandardAnalyzer
> 14 => Ferret::Analysis::StandardAnalyzer
> 15 irb(main):007:0> Ferret::Analysis::StandardAnalyzer.superclass
> 16 => Ferret::Analysis::Analyzer
>
> A few questions:

Hi Jenny, > On line 8 -- why is it necessary to require 'ferret' after doing a > require_gem 'ferret'? Because I haven't set the autorequire parameter in the gemspec. Until 30 seconds ago I wasn't even aware of it, so I'll make sure it's set in future versions. > On lines 10 and 11 -- why isn't it finding StopAnalyzer? It's defined in > the same file as StandardAnalyzer (lines 13 and 14). The code you are looking at is not the actual code that you are running. All the analyzers are now defined in the extension (as long as you have the extension loaded). Unfortunately I neglected to add the StopAnalyzer. I'll try and remember for the next version. > On lines 15 and 16 -- StandardAnalyzer is defined in > ferret-0.9.4/lib/ferret/analysis/analyzers.rb, and it's a subclass of > StopAnalyzer. Yet when I try the superclass method, it returns Analyzer.
> Any ideas why this is? Same reason as above. > Maybe I'm missing something really obvious, but I'm finding it quite > perplexing right now. I'd suggest waiting for the next version. There are going to be a lot of major changes. There will no longer be a pure Ruby version of Ferret (Windows users need not fret; there'll be a Windows binary). I had hoped to get it out over the weekend but I still have a few problems. I'll announce it on the list as soon as it's ready. > P.S. David -- thanks for Ferret! It's been running great in our > production application. People love the app's search capabilities, > including fuzzy search, which are of course all made possible by Ferret. I'm very happy to hear that. Cheers, Dave From dbalmain.ml at gmail.com Tue Jul 18 01:56:36 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 18 Jul 2006 14:56:36 +0900 Subject: [Ferret-talk] Some basic questions In-Reply-To: References: <44BC5C36.8050202@dangerousideas.com> Message-ID: On 7/18/06, David Balmain wrote: > On 7/18/06, jennyw wrote: > > On lines 10 and 11 -- why isn't it finding StopAnalyzer? It's defined in > > the same file as StandardAnalyzer (lines 13 and 14). > > The code you are looking at is not the actual code that you are > running. All the analyzers are now defined in the extension (as long > as you have the extension loaded). Unfortunately I neglected to add > the StopAnalyzer. I'll try and remember for the next version. OK, that makes it sound like my process for creating the next release is a bit dodgy. When I say "I'll try and remember", it is because I'm rewriting most of the bindings and the API is changing a lot, so there will be a whole new set of unit tests. It is quite likely that a lot of stuff will be left out, simply because I'm still working on it. So please be patient.
The next release, 0.10.0 will be very alpha quality so From jennyw at dangerousideas.com Tue Jul 18 02:19:30 2006 From: jennyw at dangerousideas.com (jennyw) Date: Mon, 17 Jul 2006 23:19:30 -0700 Subject: [Ferret-talk] Some basic questions In-Reply-To: References: <44BC5C36.8050202@dangerousideas.com> Message-ID: <44BC7D72.5080506@dangerousideas.com> Thanks for the answers, David! David Balmain wrote: > Ok, that makes it sound like my process for creating the next release > is a bit dodgy. When I say "I'll try and remember", it is because I'm > rewriting most of the bindings and the API is changing a lot so there > will be a whole new set of unit tests. It is quite likely that a lot > of stuff will be left out, simply because I'm still working on it. So > please be patient. The next release, 0.10.0 will be very alpha quality > so Looks like you might have gotten cut off, but it sounds like you're saying that we should expect some issues with 0.10.0. Out of curiosity, is there a recommended stable version of Ferret? We've been running on 0.3.2 and it's working well, so there's no real reason to change, but I thought I'd ask. The enhanced performance with the later Ferret versions would be great but not necessary for our needs on this project. We're starting a new project this month, though, that will be for a customer with a highly visited Web site, so I'll probably be putting 0.10.0 through its paces. Thanks again! Jen From dbalmain.ml at gmail.com Tue Jul 18 02:44:45 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 18 Jul 2006 15:44:45 +0900 Subject: [Ferret-talk] Some basic questions In-Reply-To: <44BC7D72.5080506@dangerousideas.com> References: <44BC5C36.8050202@dangerousideas.com> <44BC7D72.5080506@dangerousideas.com> Message-ID: On 7/18/06, jennyw wrote: > Thanks for the answers, David! > > David Balmain wrote: > > Ok, that makes it sound like my process for creating the next release > > is a bit dodgy. 
When I say "I'll try and remember", it is because I'm > > rewriting most of the bindings and the API is changing a lot so there > > will be a whole new set of unit tests. It is quite likely that a lot > > of stuff will be left out, simply because I'm still working on it. So > > please be patient. The next release, 0.10.0 will be very alpha quality > > so > > Looks like you might have gotten cut off, but it sounds like you're > saying that we should expect some issues with 0.10.0. Yes, I was going to say don't start porting your production apps to 0.10.0. I would however suggest that people starting new applications should use 0.10.0 as soon as it comes out. It should quickly become more stable than previous versions. > Out of curiosity, is there a recommended stable version of Ferret? > We've been running on 0.3.2 and it's working well, so there's no real > reason to change, but I thought I'd ask. If you are currently using 0.3.2 I'd stick with that until the 0.10 series becomes stable which will hopefully be very soon. > The enhanced performance with > the later Ferret versions would be great but not necessary for our needs > on this project. We're starting a new project this month, though, that > will be for a customer with a highly visited Web site, so I'll probably > be putting 0.10.0 through its paces. Good idea. I hope you like the new API. Cheers, Dave From julioody at gmail.com Tue Jul 18 02:52:48 2006 From: julioody at gmail.com (Julio Cesar Ody) Date: Tue, 18 Jul 2006 16:52:48 +1000 Subject: [Ferret-talk] searching with chinese chars Message-ID: Hi all, maybe not a Ferret question, but I assume here might have came across that already. I wrote a simple CGI app that adds docs into a Ferret index. The idea is testing asian languages input and searching. The script that does the input seems to be OK. As David mentioned in a question I made a little while ago, Ferret's index is agnostic, in the sense that you can store anything in it. 
I then wrote another one to search the index created. This is what it looks like: #################################### #!/usr/bin/ruby $KCODE = 'u' require 'cgi' require 'ferret' include Ferret index = Index::Index.new(:path => '/var/index', :default_field => "*") cgi = CGI.new("html4") result = "" if cgi['query'] and not cgi['query'].empty? index.search_each(cgi['query']) do |doc, score| result << "
#{index[doc]['tileid']}#{index[doc]['title']}#{index[doc]['description']}
" end end #################################### It's A-OK for searching english. But when trying to input chinese characters in the "query" field, I'm getting the following error in my lighttpd log file: #################################### /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:15:in `search_each': : Error occured at :701 (Exception) Error: exception 2 not handled: Error decoding input string. Check that you have the locale set correctly from /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:15 #################################### Is the error message above suggesting I should specify a chinese locale and not UTF-8? I thought UTF-8 would actually handle chinese and anything else one could throw at it as long as it's a human language. Any help is appreciated. -- Julio C. Ody http://rootshell.be/~julioody From dbalmain.ml at gmail.com Tue Jul 18 03:22:16 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 18 Jul 2006 16:22:16 +0900 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: On 7/18/06, Julio Cesar Ody wrote: > Hi all, > > maybe not a Ferret question, but I assume here might have came across > that already. > > I wrote a simple CGI app that adds docs into a Ferret index. The idea > is testing asian languages input and searching. > > The script that does the input seems to be OK. As David mentioned in a > question I made a little while ago, Ferret's index is agnostic, in the > sense that you can store anything in it. I then wrote another one to > search the index created. This is what it looks like: > > #################################### > > #!/usr/bin/ruby > > $KCODE = 'u' > require 'cgi' > require 'ferret' > include Ferret > > index = Index::Index.new(:path => '/var/index', :default_field => "*") > > cgi = CGI.new("html4") > > result = "" > if cgi['query'] and not cgi['query'].empty? > index.search_each(cgi['query']) do |doc, score| > result << " > >
#{index[doc]['tileid']}#{index[doc]['title']}#{index[doc]['description']}
> " > end > end > #################################### > > It's A-OK for searching english. But when trying to input chinese > characters in the "query" field, I'm getting the following error in my > lighttpd log file: > > #################################### > /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:15:in > `search_each': : Error occured at :701 (Exception) > Error: exception 2 not handled: Error decoding input string. Check > that you have the locale set correctly > from /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:15 > #################################### > > Is the error message above suggesting I should specify a chinese > locale and not UTF-8? I thought UTF-8 would actually handle chinese > and anything else one could throw at it as long as it's a human > language. > > Any help is appreciated. The error is being raised when the analyzer tries to tokenize the query string. My guess would be that the query string either starts in the wrong encoding (when you type it in) or it gets converted somewhere between being typed in the browser and going into your script. UTF-8 can certainly handle Chinese characters if they are UTF-8 encoded but there are other encodings for Chinese as well. If I were trying to debug this, the first thing I'd do is log the query string in a file and check its encoding. Something like:

    File.open("query.log", "w") {|f| f.write(cgi['query'])}

If you want, send me the file and I'll try and see what encoding it is. Cheers, Dave From julioody at gmail.com Tue Jul 18 03:49:03 2006 From: julioody at gmail.com (Julio Cesar Ody) Date: Tue, 18 Jul 2006 17:49:03 +1000 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: > The error is being raised when the analyzer tries to tokenize the > query string My guess would be that the query string either starts in > the wrong encoding (when you type it in) Didn't get that bit.
> or it gets converted > somewhere between being typed in the browser and going into your > script. Umm... maybe yes. > UTF-8 can certainly handle Chinese characters if they are > UTF-8 encoded but there are other encodings for Chinese as well. If I > were trying to debug this, the first thing I'd do is log the query > string in a file and check its encoding. Something like; > > File.open("query.log", "w") {|f| f.write(cgi['query'])} > > If you want, send me the file and I'll try and see what encoding it is. I wrote another script that does just that (writes cgi['query'] to /tmp/query.log). After inputting this in a text field named "query" and submitting this Chinese string: 新闻 This is what appears in /tmp/query.log: &#26032;&#38395; Note that the only thing I did, hoping to have everything magically working in UTF-8, is putting this in my script: $KCODE = 'u' Anything I'm missing? > > Cheers, > Dave -- Julio C. Ody http://rootshell.be/~julioody From dbalmain.ml at gmail.com Tue Jul 18 04:01:28 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 18 Jul 2006 17:01:28 +0900 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: On 7/18/06, Julio Cesar Ody wrote: > I wrote another script that does just that (writes cgi['query'] to > /tmp/query.log). After inputting this in a text field name "query" and > submitting this chinese string: > > 新闻 > > This is what appears in the /tmp/query.log > > &#26032;&#38395; > > Note that the only thing I did hoping to have evething magically > working in UTF-8 is putting this in my script: > > $KCODE = 'u' > > Anything I'm missing? dbalmain at ubuntu:~/ $ irb -Ku irb(main):001:0> require 'cgi' => true irb(main):002:0> CGI.unescapeHTML("&#26032;&#38395;") => "新闻" That should fix your problem.
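Dave's CGI.unescapeHTML call works because the browser submitted the two characters as HTML numeric character references: &#26032; is U+65B0 (新) and &#38395; is U+95FB (闻). Below is a minimal sketch of the same decoding, written for Ruby 1.9 or later. It relies on String#force_encoding and String#valid_encoding?, which the Ruby 1.8 / $KCODE setup in this thread does not have. normalize_query is a hypothetical helper, not part of CGI or Ferret. Besides decoding numeric references, it rejects input whose bytes are not valid UTF-8, which is one likely cause of the "Error decoding input string" message:

```ruby
# Hypothetical helper (not part of CGI or Ferret): decode HTML numeric
# character references such as "&#26032;" or "&#x65B0;" into UTF-8 and
# refuse input whose bytes are not valid UTF-8.
def normalize_query(raw)
  str = raw.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "query is not valid UTF-8" unless str.valid_encoding?

  str.gsub(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/) do
    code = $1 ? $1.to_i(16) : $2.to_i
    code.chr(Encoding::UTF_8) # raises RangeError for an invalid code point
  end
end

logged = "&#26032;&#38395;"  # the contents of /tmp/query.log
query  = normalize_query(logged)
puts query                                                        # 新闻
puts query.codepoints.map { |c| format("U+%04X", c) }.join(" ")   # U+65B0 U+95FB
```

In a CGI script, a helper like this would be applied to cgi['query'] before the string is passed to search_each.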
Dave From jordan.w.frank at gmail.com Tue Jul 18 14:51:02 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Tue, 18 Jul 2006 14:51:02 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/17/06, Gary Elliott wrote: > This is a "me too" post. I would love to replace the query filter we > use on tourb.us with this. > > gary > Maybe Gary, or someone else can help me, but I've put the query filter problem aside, and I'm trying to do this by finding locations within a bounding box using Range queries on the longitude and latitude. Unfortunately I'm running into some problems, since I'm comparing numeric values that can be positive or negative, and as far as I can tell, Ferret (actually I've only been able to find information about Lucene, but I'm assuming it's the same) does the comparisons lexicographically, and not numerically. So I've tried to replicate the encoding as they do in http://wiki.apache.org/jakarta-lucene/SearchNumericalFields, but I'm encountering some strange behaviour that is throwing me off. So I index a bunch of documents, and see the following line of output: Adding field latitude_string with value '004915010' to index So that is the encoded version of 49.1501. Now if I do the following query, I should get this record back: >> Person.ferret_index.search_each("latitude_string:['000000000' '099999999']") => 0 But I don't, and I can verify that lexicographically, ruby sees '004915010' as lying between '000000000' and '099999999': >> '000000000' <= '004915010' and '004915010' <= '099999999' => true But the query returns no results. I've tried a few more, as follows: >> Person.ferret_index.search_each("latitude_string:(> '000000000')") do end => 7 >> Person.ferret_index.search_each("latitude_string:(< '099900000')") do end => 0 And so clearly it is not seeing that '004915010' < '099999999'. If I remove the quotes, it works properly, but the problem is then with the negative values. 
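Jordan's results are consistent with plain byte-wise string comparison. This assumes, as Dave points out later in the thread, that the quote characters typed into the query stay attached to the range bounds. The plain-Ruby sketch below involves no Ferret code. It shows three things: why a quoted upper bound excludes everything (' is 0x27, which sorts before 0 at 0x30), why (> '000000000') matched all 7 documents, and why zero-padded negative numbers come out in reverse order among themselves:

```ruby
# Plain-Ruby illustration (no Ferret): String comparison is byte-wise.
stored = "004915010"                     # the encoded 49.1501 from the post

# Assumption: quotes typed in the query stay attached to the bound terms.
puts stored <= "099999999"               # true:  inside the unquoted upper bound
puts stored <= "'099999999'"             # false: "'" (0x27) sorts before "0" (0x30)
puts stored >  "'000000000'"             # true:  so a quoted lower bound matches everything

# Zero-padded negatives: "-" sorts before every digit, but among negatives a
# larger magnitude compares greater, so their relative order is reversed.
values  = [49, -10, 3, -5]
encoded = values.map { |v| format("%09d", v) }
p encoded.sort                           # ["-00000005", "-00000010", "000000003", "000000049"]
p encoded.sort.map(&:to_i)               # [-5, -10, 3, 49]  (lexicographic order)
p values.sort                            # [-10, -5, 3, 49]  (numeric order)
```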
>> Person.ferret_index.search_each("latitude_string:(> -00000000)") do end => 0 >> Person.ferret_index.search_each("latitude_string:(> '-00000000')") do end => 7 So the quotes affect things, but then what if I need to search between a negative value and a positive value? >> Person.ferret_index.search_each("latitude_string:(< 099999999)") do end => 7 >> Person.ferret_index.search_each("longitude_string:(> '-00000000')") do end => 7 >> Person.ferret_index.search_each("latitude_string:['-00000000' 099999999]") do end => 0 For now, should I just not use range queries at all, and just quote negative values? I'd have to do more testing to see if it's accurate, but it seems to be the only way that works... maybe I could make all values positive by adding a constant to them all? Any ideas why this is occurring? Am I doing this completely backwards? Is there an easier way to do the numeric comparisons? I'm very sorry if this is an issue that has been discussed before, but I did look through the archives and didn't find anything... -- Cheers, Jordan Frank jordan.w.frank at gmail.com From etienne.durand at mail.com Tue Jul 18 15:09:16 2006 From: etienne.durand at mail.com (Jean-Etienne Durand) Date: Tue, 18 Jul 2006 21:09:16 +0200 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: <44BD31DC.1060904@mail.com> Jordan, Why not use NumberTools::long_to_s to convert your numeric values (for both indexing and search)? Jean-Etienne Jordan Frank wrote: > On 7/17/06, Gary Elliott wrote: >> This is a "me too" post. I would love to replace the query filter we >> use on tourb.us with this. >> >> gary >> > > Maybe Gary, or someone else can help me, but I've put the query filter > problem aside, and I'm trying to do this by finding locations within a > bounding box using Range queries on the longitude and latitude.
> Unfortunately I'm running into some problems, since I'm comparing > numeric values that can be positive or negative, and as far as I can > tell, Ferret (actually I've only been able to find information about > Lucene, but I'm assuming it's the same) does the comparisons > lexicographically, and not numerically. > > So I've tried to replicate the encoding as they do in > http://wiki.apache.org/jakarta-lucene/SearchNumericalFields, but I'm > encountering some strange behaviour that is throwing me off. > > So I index a bunch of documents, and see the following line of output: > Adding field latitude_string with value '004915010' to index > So that is the encoded version of 49.1501. > Now if I do the following query, I should get this record back: >>> Person.ferret_index.search_each("latitude_string:['000000000' '099999999']") > => 0 > > But I don't, and I can verify that lexicographically, ruby sees > '004915010' as lying between '000000000' and '099999999': >>> '000000000' <= '004915010' and '004915010' <= '099999999' > => true > > But the query returns no results. I've tried a few more, as follows: >>> Person.ferret_index.search_each("latitude_string:(> '000000000')") do end > => 7 >>> Person.ferret_index.search_each("latitude_string:(< '099900000')") do end > => 0 > > And so clearly it is not seeing that '004915010' < '099999999'. If I > remove the quotes, it works properly, but the problem is then with the > negative values. >>> Person.ferret_index.search_each("latitude_string:(> -00000000)") do end > => 0 >>> Person.ferret_index.search_each("latitude_string:(> '-00000000')") do end > => 7 > > So the quotes affect things, but then what if I need to search between > a negative value and a positive value. 
>>> Person.ferret_index.search_each("latitude_string:(< 099999999)") do end > => 7 >>> Person.ferret_index.search_each("longitude_string:(> '-00000000')") do end > => 7 >>> Person.ferret_index.search_each("latitude_string:['-00000000' > 099999999]") do end > => 0 > > For now should I just not be using range queries at all, and just > quote negative values? I'd have to do more testing to see if it's > accurate, but it seems to be the only way that works...maybe I could > make all values positive by adding a constant to them all? > > Any ideas why this is occuring? Am I doing this completely backwards, > is there an easier way to do the numeric comparisons? I'm very sorry > if this is an issue that has been discussed before, but I did look > through the archives and didn't find anything... > From jordan.w.frank at gmail.com Tue Jul 18 15:58:27 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Tue, 18 Jul 2006 15:58:27 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: <44BD31DC.1060904@mail.com> References: <44BD31DC.1060904@mail.com> Message-ID: On 7/18/06, Jean-Etienne Durand wrote: > Jordan, > > Why not using NumberTools::long_to_s to convert your numeric values > (indexing & search) ? > > Jean-Etienne > Well, because I am a fool, and did not notice this class that seems to be exactly what I need. -- Cheers, Jordan Frank jordan.w.frank at gmail.com From jordan.w.frank at gmail.com Tue Jul 18 16:20:51 2006 From: jordan.w.frank at gmail.com (Jordan Frank) Date: Tue, 18 Jul 2006 16:20:51 -0400 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: <44BD31DC.1060904@mail.com> Message-ID: On 7/18/06, Jordan Frank wrote: > > Well, because I am a fool, and did not notice this class that seems to > be exactly what I need. > > -- > Cheers, > Jordan Frank > jordan.w.frank at gmail.com > Actually, I spoke too soon. It appears that this class has the same problem with negative numbers. 
For example: >> Person.search("latitude_string:[00000000000000 0000000000nesr]").length => 7 >> Person.search("latitude_string:[-1y2p0ij321x6p 0000000000nesr]").length => 0 I've expanded my range, so shouldn't the number of results be at least what it was with all 0's? I've tried with quotes too, and it doesn't help. Again though, if I do the following (note the quotes): >> Person.search("latitude_string:(> '-1y2p0ij321x6p' AND < 0000000000nesr").length => 7 It works... So here's what I've done: since I'm only working with longitudes and latitudes, which are guaranteed to lie between -500 and 500, I just add 500 to them to make them all positive, and then I can use the range queries. I also wrote my own little number-to-string converter, since I'm working with small values. But nevertheless I thank you for your help. -- Cheers, Jordan Frank jordan.w.frank at gmail.com From julioody at gmail.com Tue Jul 18 20:50:41 2006 From: julioody at gmail.com (Julio Cesar Ody) Date: Wed, 19 Jul 2006 10:50:41 +1000 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: Yep, it did. Thanks tons! But I'm not getting any results now. I take it this is because of the default analyzer being used, right? How can I use a whitespace analyzer in my query? (Or something that could work effectively with Asian languages.) For my needs, I suppose the whitespace one could do... On 7/18/06, David Balmain wrote: > On 7/18/06, Julio Cesar Ody wrote: > > I wrote another script that does just that (writes cgi['query'] to > > /tmp/query.log). After inputting this in a text field name "query" and > > submitting this chinese string: > > > > 新闻 > > > > This is what appears in the /tmp/query.log > > > > &#26032;&#38395; > > > > Note that the only thing I did hoping to have evething magically > > working in UTF-8 is putting this in my script: > > > > $KCODE = 'u' > > > > Anything I'm missing?
> > dbalmain at ubuntu:~/ $ irb -Ku > irb(main):001:0> require 'cgi' > => true > irb(main):002:0> CGI.unescapeHTML("&#26032;&#38395;") > => "新闻" > > That should fix your problem. > > Dave > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- Julio C. Ody http://rootshell.be/~julioody From dbalmain.ml at gmail.com Tue Jul 18 21:24:19 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 19 Jul 2006 10:24:19 +0900 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: On 7/19/06, Julio Cesar Ody wrote: > Yep, it did. Thanks tons! > > But I'm not getting any results now. I take this is because of the > default analyzer being used, right? > > How can I use a whitespace analyzer in my query? (or something that > could work effectively with asian languages). > > For my needs, I suppose the whitespace one could do... index = Index::Index.new(:path => '/var/index', :default_field => "*", :analyzer => Ferret::Analysis::WhiteSpaceAnalyzer.new) Although you should probably use the same analyzer I gave you for indexing: http://www.ruby-forum.com/topic/72086#101764 Cheers, Dave From dbalmain.ml at gmail.com Tue Jul 18 21:38:26 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 19 Jul 2006 10:38:26 +0900 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: Message-ID: On 7/19/06, Jordan Frank wrote: > On 7/17/06, Gary Elliott wrote: > > So I index a bunch of documents, and see the following line of output: > Adding field latitude_string with value '004915010' to index > So that is the encoded version of 49.1501.
> Now if I do the following query, I should get this record back: > >> Person.ferret_index.search_each("latitude_string:['000000000' '099999999']") > => 0 irb(main):008:0> index.search("latitude:[000000000 099999999]").size => 1 irb(main):009:0> index.search("latitude:['000000000' '099999999']").size => 0 The quotes are getting tokenized with the terms, so the problem is that "'099999999'" <= '004915010'. Perhaps you already worked that out. Dave From julioody at gmail.com Tue Jul 18 21:45:11 2006 From: julioody at gmail.com (Julio Cesar Ody) Date: Wed, 19 Jul 2006 11:45:11 +1000 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: Thanks, and sorry. I checked the documentation for Index::Index and found it right after I asked the question. My bad. I'm getting segfaults when trying to initialize an index using an analyzer other than the default one (but it works otherwise). But as I can see in this thread, http://www.ruby-forum.com/topic/71620, it ain't stable yet for 64 bit. So I'll wait. Thanks again. On 7/19/06, David Balmain wrote: > On 7/19/06, Julio Cesar Ody wrote: > > Yep, it did. Thanks tons! > > > > But I'm not getting any results now. I take this is because of the > > default analyzer being used, right? > > > > How can I use a whitespace analyzer in my query? (or something that > > could work effectively with asian languages). > > > > For my needs, I suppose the whitespace one could do... > > index = Index::Index.new(:path => '/var/index', :default_field => "*", > :analyzer => Ferret::Analysis::WhiteSpaceAnalyzer.new) > > Although you should probably use the same analyzer I gave you for indexing; > > http://www.ruby-forum.com/topic/72086#101764 > > Cheers, > Dave > -- Julio C.
Ody http://rootshell.be/~julioody From dbalmain.ml at gmail.com Tue Jul 18 21:45:45 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 19 Jul 2006 10:45:45 +0900 Subject: [Ferret-talk] adding a custom filter to the query In-Reply-To: References: <44BD31DC.1060904@mail.com> Message-ID: On 7/19/06, Jordan Frank wrote: > On 7/18/06, Jordan Frank wrote: > > > > Well, because I am a fool, and did not notice this class that seems to > > be exactly what I need. > > > > -- > > Cheers, > > Jordan Frank > > jordan.w.frank at gmail.com > > > > Actually, I spoke too soon. It appears that this class has the same > problem with negative numbers. For example: > > >> Person.search("latitude_string:[00000000000000 0000000000nesr]").length > => 7 > >> Person.search("latitude_string:[-1y2p0ij321x6p 0000000000nesr]").length > => 0 > > I've expanded my range, so shouldn't the number of results be at least > what it was with all 0's? I've tried with quotes too, and it doesn't > help. Again though, if I do the following (note the quotes): > > >> Person.search("latitude_string:(> '-1y2p0ij321x6p' AND < > 0000000000nesr").length > => 7 > > It works... > > So what i've done, is because I'm only working with longitudes and > latitudes, which are guaranteed to lie between -500 and 500, I'm just > adding 500 to them, to make them all positive, and then I can use the > range queries...and I wrote my own little number to string thing, > since I'm working with small values. But nevertheless I thank you for > your help. This seems like the best solution at the moment. I'd forgotten about NumberTools. It's probably one of the first modules I ever wrote in Ruby. Anyway, it looks like it might need an upgrade. I'll try and fix it so that it can handle negative numbers. In C this would be a no-brainer, but Ruby's Bignums make it a little difficult. I might put the challenge to the Ruby mailing list.
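A common way to make a fixed-width string encoding handle negative numbers is sketched below in plain Ruby; it is not Ferret's actual NumberTools code. The idea is to add a bias of 2**63 so every signed 64-bit value becomes non-negative, then zero-pad to a fixed width in base 36, where digits 0-9 sort before a-z. Ruby's arbitrary-precision integers make the biased addition safe. The names sortable_long, decode_long and sortable_coord are hypothetical, as is the scale of 10_000 (four decimal places, as in 49.1501). The generated range bounds are left unquoted, since quotes end up inside the terms:

```ruby
BIAS  = 2**63                          # maps [-2**63, 2**63) onto [0, 2**64)
WIDTH = (2**64 - 1).to_s(36).length    # base-36 digits needed for the largest value

# Encode a signed 64-bit integer so that string order equals numeric order.
def sortable_long(n)
  raise RangeError, "#{n} is outside the signed 64-bit range" unless n.between?(-BIAS, BIAS - 1)
  (n + BIAS).to_s(36).rjust(WIDTH, "0")
end

def decode_long(s)
  s.to_i(36) - BIAS
end

# Coordinates: scale to an integer first (four decimal places), then encode.
SCALE = 10_000
def sortable_coord(degrees)
  sortable_long((degrees * SCALE).round)
end

values  = [-2**63, -500, -1, 0, 1, 49, 2**63 - 1]   # already in numeric order
encoded = values.map { |v| sortable_long(v) }
p encoded.sort == encoded                # true: lexicographic order matches numeric order
p decode_long(sortable_long(-500))       # -500
puts "latitude_string:[#{sortable_coord(-12.5)} #{sortable_coord(49.1501)}]"
```

Jordan's add-500 approach is the same idea with a smaller bias, which is enough when the values are known to be coordinates.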
Cheers, Dave From julioody at gmail.com Tue Jul 18 23:09:58 2006 From: julioody at gmail.com (Julio Cesar Ody) Date: Wed, 19 Jul 2006 13:09:58 +1000 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: Just sharing my experience and asking another question. I tried the analyzer suggested here: http://www.ruby-forum.com/topic/72086#101764. It works fine if you specify the search field you want to use (anyway, it seems that's how it's supposed to work). # CODE analyzer = Ferret::Analysis::PerFieldAnalyzer.new(Ferret::Analysis::StandardAnalyzer.new) analyzer["chinese"] = Ferret::Analysis::RegExpAnalyzer.new(/./, false) index = Index::Index.new(:path => '/var/index', :analyzer => analyzer, :default_field => "*") ... index.search_each("chinese: #{val}") do |doc, score| #val is a chinese char puts "#{doc} - #{score}" end # END CODE This works OK. However, if you try searching like this: # CODE index.search_each(val) do |doc, score| #val is a chinese char puts "#{doc} - #{score}" end # END CODE I get in my lighttpd error log: /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19:in `search_each': : Error occured at :701 (StandardError) Error: exception 2 not handled: Error decoding input string. Check that you have the locale set correctly from /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19 Which MAKES SENSE, since the docs I created before were created like this: doc = { "author" => "englishchars", "title" => "more regular chars", "chinese" => "??"} index << doc and I think search_each is going through all the fields (since I explicitly said it should when I issued :default_field => "*" up there), finding English chars and trying to match them against the Chinese ones I supplied as a search query. So alright, I can use the suggested analyzer.
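Chinese text has no spaces between words, which is why a per-character analyzer like the RegExpAnalyzer in Julio's code works for the chinese field. Below is a rough plain-Ruby approximation, not Ferret's tokenizer, and it assumes Ruby 1.9+ for the \p{Han} property. The hypothetical tokenize_mixed keeps ASCII words whole and lowercased and emits every Han character as its own token. As a result, a query tokenized the same way can match a single character inside a longer word:

```ruby
# Approximate a mixed-script analyzer: each run of ASCII letters/digits
# becomes one lowercased token, and each Han (Chinese) character becomes
# its own token. Whitespace and punctuation are dropped.
def tokenize_mixed(text)
  text.scan(/\p{Han}|[A-Za-z0-9]+/).map(&:downcase)
end

doc_tokens   = tokenize_mixed("Ferret 新闻 search")  # => ["ferret", "新", "闻", "search"]
query_tokens = tokenize_mixed("新")                  # => ["新"]

# Every query token occurs in the document, so a term-based match is possible.
covered = (query_tokens - doc_tokens).empty?
puts covered                                         # true
```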
But my question is: is there a way to use an analyzer that would work with both character types (English and Asian) simply by not returning matches for them, as opposed to giving me an error? Thanks a ton for any help. On 7/19/06, Julio Cesar Ody wrote: > Thanks, and sorry. I checked the documentation for Index::Index and > found it right after I asked the question. My bad. > > I'm getting segfauls when trying to initialize an index using a > different analyzer other than the default one (but it works > otherwise). But as I can see in this thread > > http://www.ruby-forum.com/topic/71620 > > It ain't stable yet for 64 bit. So I'll wait. > > Thanks again. > > > On 7/19/06, David Balmain wrote: > > On 7/19/06, Julio Cesar Ody wrote: > > > Yep, it did. Thanks tons! > > > > > > But I'm not getting any results now. I take this is because of the > > > default analyzer being used, right? > > > > > > How can I use a whitespace analyzer in my query? (or something that > > > could work effectively with asian languages). > > > > > > For my needs, I suppose the whitespace one could do... > > > > index = Index::Index.new(:path => '/var/index', :default_field => "*", > > :analyzer => Ferret::Analysis::WhiteSpaceAnalyzer.new) > > > > Although you should probably use the same analyzer I gave you for indexing; > > > > http://www.ruby-forum.com/topic/72086#101764 > > > > Cheers, > > Dave > > -- > Julio C. Ody > http://rootshell.be/~julioody > -- Julio C. Ody http://rootshell.be/~julioody From dbalmain.ml at gmail.com Tue Jul 18 23:20:12 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 19 Jul 2006 12:20:12 +0900 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: On 7/19/06, Julio Cesar Ody wrote: > Just sharing my experience and asking another question.
> > I tried the analyzer suggested here: > http://www.ruby-forum.com/topic/72086#101764. It works fine if you > specify the search field you want to use (anyway, it seems that's how > it's suppose to work). > > # CODE > analyzer = Ferret::Analysis::PerFieldAnalyzer.new(Ferret::Analysis::StandardAnalyzer.new) > analyzer["chinese"] = Ferret::Analysis::RegExpAnalyzer.new(/./, false) > > index = Index::Index.new(:path => '/var/index', :analyzer => analyzer, > :default_field => "*") > > ... > > index.search_each("chinese: #{val}") do |doc, score| #val is a chinese char > puts "#{doc} - #{score}" > end > # END CODE > > This works OK. However, if you try searching like this: > > # CODE > index.search_each(val) do |doc, score| #val is a chinese char > puts "#{doc} - #{score}" > end > # END CODE > > I get in my lighttpd error log: > > /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19:in > `search_each': : Error occured at :701 (StandardError) > Error: exception 2 not handled: Error decoding input string. Check > that you have the locale set correctly > from /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19 > > Which MAKES SENSE, since the docs I created before are created like this: > > doc = { "author" => "englishchars", "title" => "more regular chars", > "chinese" => "??"} > index << doc > > and I think search_each is going through all the fields (since I > explicitly said it should when I issued :default_field => "*" up > there), finding english chars, and trying to match them against the > chinese ones I supplied as a search query. Actually, it's not because there is a comparison between Chinese and English characters. That shouldn't cause an error. The error is being thrown because val can't be decoded using the StandardAnalyzer. Again, you need to check that val is correctly encoded and that you have your locale set correctly. The only times tokenizing happens are when you add documents to the index and when you run a query through the query parser.
Apart from that, all operations on strings are done at the byte level. I hope that makes sense. > So alright, I can use the suggested analyzer. But my question is: is > there a way to use an analyzer that would work with both character > types (english, and asian) simply by not returning matches them as > opposed to giving me an error? > > Thanks a ton for any help. The answer to this question is that it already should work correctly. Just make sure the locale is set correctly when the search method is called and that whatever you pass as a query to the search method is correctly encoded according to the locale. Cheers, Dave From julioody at gmail.com Tue Jul 18 23:27:02 2006 From: julioody at gmail.com (Julio Cesar Ody) Date: Wed, 19 Jul 2006 13:27:02 +1000 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: Does it take anything other than simply: $KCODE = 'u' right in the beginning of the script? I have that in place already. (it's CGI we're talking about) On 7/19/06, David Balmain wrote: > On 7/19/06, Julio Cesar Ody wrote: > > Just sharing my experience and asking another question. > > > > I tried the analyzer suggested here: > > http://www.ruby-forum.com/topic/72086#101764. It works fine if you > > specify the search field you want to use (anyway, it seems that's how > > it's suppose to work). > > > > # CODE > > analyzer = Ferret::Analysis::PerFieldAnalyzer.new(Ferret::Analysis::StandardAnalyzer.new) > > analyzer["chinese"] = Ferret::Analysis::RegExpAnalyzer.new(/./, false) > > > > index = Index::Index.new(:path => '/var/index', :analyzer => analyzer, > > :default_field => "*") > > > > ... > > > > index.search_each("chinese: #{val}") do |doc, score| #val is a chinese char > > puts "#{doc} - #{score}" > > end > > # END CODE > > > > This works OK. 
However, if you try searching like this: > > > > # CODE > > index.search_each(val) do |doc, score| #val is a chinese char > > puts "#{doc} - #{score}" > > end > > # END CODE > > > > I get in my lighttpd error log: > > > > /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19:in > > `search_each': : Error occured at :701 (StandardError) > > Error: exception 2 not handled: Error decoding input string. Check > > that you have the locale set correctly > > from /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19 > > > > Which MAKES SENSE, since the docs I created before are created like this: > > > > doc = { "author" => "englishchars", "title" => "more regular chars", > > "chinese" => "??"} > > index << doc > > > > and I think search_each is going through all the fields (since I > > explicitly said it should when I issued :default_field => "*" up > > there), finding english chars, and trying to match them against the > > chinese ones I supplied as a search query. > > Actually, it's not because of there is a comparison between Chinese > and English characters. That shouldn't cause an error. The error is > being thrown because val can't be decoded using the StandardAnalyzer. > Again, you need to check that val is correctly encoded and you have > your locale set correctly.The only times tokenizing happens are when > you add documents to the index and when you run a query through the > query parser. Apart from that, all operations on strings are done at > the byte level. I hope that makes sense. > > > So alright, I can use the suggested analyzer. But my question is: is > > there a way to use an analyzer that would work with both character > > types (english, and asian) simply by not returning matches them as > > opposed to giving me an error? > > > > Thanks a ton for any help. > > The answer to this question is that it already should work correctly. 
> Just make sure the locale is set correctly when the search method is > called and that whatever you pass as a query to the search method is > correctly encoded according to the locale. > > Cheers, > Dave -- Julio C. Ody http://rootshell.be/~julioody From julioody at gmail.com Tue Jul 18 23:35:29 2006 From: julioody at gmail.com (Julio Cesar Ody) Date: Wed, 19 Jul 2006 13:35:29 +1000 Subject: [Ferret-talk] searching with chinese chars In-Reply-To: References: Message-ID: Replying to myself: yes. ENV['LANG'] = 'en_US.utf8' did the job. Thanks! On 7/19/06, Julio Cesar Ody wrote: > Does it take anything other than simply: > > $KCODE = 'u' > > right in the beginning of the script? > > I have that in place already. > > (it's CGI we're talking about) > > On 7/19/06, David Balmain wrote: > > On 7/19/06, Julio Cesar Ody wrote: > > > Just sharing my experience and asking another question. > > > > > > I tried the analyzer suggested here: > > > http://www.ruby-forum.com/topic/72086#101764. It works fine if you > > > specify the search field you want to use (anyway, it seems that's how > > > it's suppose to work). > > > > > > # CODE > > > analyzer = Ferret::Analysis::PerFieldAnalyzer.new(Ferret::Analysis::StandardAnalyzer.new) > > > analyzer["chinese"] = Ferret::Analysis::RegExpAnalyzer.new(/./, false) > > > > > > index = Index::Index.new(:path => '/var/index', :analyzer => analyzer, > > > :default_field => "*") > > > > > > ... > > > > > > index.search_each("chinese: #{val}") do |doc, score| #val is a chinese char > > > puts "#{doc} - #{score}" > > > end > > > # END CODE > > > > > > This works OK.
However, if you try searching like this: > > > > > > # CODE > > > index.search_each(val) do |doc, score| #val is a chinese char > > > puts "#{doc} - #{score}" > > > end > > > # END CODE > > > > > > I get in my lighttpd error log: > > > > > > /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19:in > > > `search_each': : Error occured at :701 (StandardError) > > > Error: exception 2 not handled: Error decoding input string. Check > > > that you have the locale set correctly > > > from /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:19 > > > > > > Which MAKES SENSE, since the docs I created before are created like this: > > > > > > doc = { "author" => "englishchars", "title" => "more regular chars", > > > "chinese" => "??"} > > > index << doc > > > > > > and I think search_each is going through all the fields (since I > > > explicitly said it should when I issued :default_field => "*" up > > > there), finding english chars, and trying to match them against the > > > chinese ones I supplied as a search query. > > > > Actually, it's not because of there is a comparison between Chinese > > and English characters. That shouldn't cause an error. The error is > > being thrown because val can't be decoded using the StandardAnalyzer. > > Again, you need to check that val is correctly encoded and you have > > your locale set correctly.The only times tokenizing happens are when > > you add documents to the index and when you run a query through the > > query parser. Apart from that, all operations on strings are done at > > the byte level. I hope that makes sense. > > > > > So alright, I can use the suggested analyzer. But my question is: is > > > there a way to use an analyzer that would work with both character > > > types (english, and asian) simply by not returning matches them as > > > opposed to giving me an error? > > > > > > Thanks a ton for any help. > > > > The answer to this question is that it already should work correctly. 
> > Just make sure the locale is set correctly when the search method is > > called and that whatever you pass as a query to the search method is > > correctly encoded according to the locale. > > > > Cheers, > > Dave > > > -- > Julio C. Ody > http://rootshell.be/~julioody > -- Julio C. Ody http://rootshell.be/~julioody From guest at guest.com Wed Jul 19 12:34:06 2006 From: guest at guest.com (Guest) Date: Wed, 19 Jul 2006 18:34:06 +0200 Subject: [Ferret-talk] Ferret Indexing Message-ID: <3dca6191cad89438af24330890f0d002@ruby-forum.com> Does Ferret only index when you create or update a record? Is there a way to make it index preexisting records? Thanks. -- Posted via http://www.ruby-forum.com/. From JanPrill at blauton.de Wed Jul 19 12:56:05 2006 From: JanPrill at blauton.de (Jan Prill) Date: Wed, 19 Jul 2006 18:56:05 +0200 Subject: [Ferret-talk] Ferret Indexing In-Reply-To: <3dca6191cad89438af24330890f0d002@ruby-forum.com> References: <3dca6191cad89438af24330890f0d002@ruby-forum.com> Message-ID: <562a35c10607190956i2db3f54excf249641a2220447@mail.gmail.com> Hi guest, you are mixing up two different projects: Ferret, the search engine library, and acts_as_ferret, which builds upon this library and provides a convenient way of integrating Ferret into a Rails project. To answer your question: of course you are able to index existing records. From a Ferret perspective there is no difference between a new record (document in Ferret terms) and existing ones. acts_as_ferret does use the ActiveRecord callback methods, but nonetheless it is able to build up an index from existing records. Its default behaviour is to index the existing records if no index exists yet for a model that acts_as_ferret.
Have a look at: http://projects.jkraemer.net/acts_as_ferret/rdoc/classes/FerretMixin/Acts/ARFerret/ClassMethods.html#M000007 Cheers, Jan On 7/19/06, Guest wrote: > > Does ferret only index, when you create, or udpate a record? > > Is there a way to make it index prexisting records? > > Thanks. > > -- > Posted via http://www.ruby-forum.com/. From guest at guest.com Wed Jul 19 13:50:31 2006 From: guest at guest.com (Guest) Date: Wed, 19 Jul 2006 19:50:31 +0200 Subject: [Ferret-talk] Ferret Indexing In-Reply-To: <562a35c10607190956i2db3f54excf249641a2220447@mail.gmail.com> References: <3dca6191cad89438af24330890f0d002@ruby-forum.com> <562a35c10607190956i2db3f54excf249641a2220447@mail.gmail.com> Message-ID: <9fcc58d1487928d5ef8c6dd34f15e86b@ruby-forum.com> Right, acts_as_ferret. Here's the deal: if I delete the existing index (the whole folder for the model), I get an error, and it doesn't recreate the index. Isn't it supposed to reindex with find(:all) if the index doesn't exist? Thanks, Jan. Jan Prill wrote: > Hi guest, > > you are mixing up too different projects: ferret the searchengine > library > and acts_as_ferret which builds upon this library and provides a > convienient > way of integrating ferret into a rails project. > > To answer your question: Of course you are able to index existing > records. > From a ferret perspective there is no difference between a new record > (document in ferret terms) and existing ones. acts_as_ferret is indeed > using > the callback methods of activerecord but nonetheless it is able to build > up > an index from existing records. It's default behaviour is to index the
It's default behaviour is to index the > existing records if no index on a model that acts_as_ferret exists yet. > Have > a look at: > http://projects.jkraemer.net/acts_as_ferret/rdoc/classes/FerretMixin/Acts/ARFerret/ClassMethods.html#M000007 > > Cheers, > Jan -- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Wed Jul 19 13:55:32 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 19 Jul 2006 19:55:32 +0200 Subject: [Ferret-talk] Ferret Indexing In-Reply-To: <9fcc58d1487928d5ef8c6dd34f15e86b@ruby-forum.com> References: <3dca6191cad89438af24330890f0d002@ruby-forum.com> <562a35c10607190956i2db3f54excf249641a2220447@mail.gmail.com> <9fcc58d1487928d5ef8c6dd34f15e86b@ruby-forum.com> Message-ID: <562a35c10607191055j65b97553o996ce90f500953f@mail.gmail.com> inline On 7/19/06, Guest wrote: > > > Right, acts_as_ferret. > > Here's the deal, if I delete the existing index (the whole folder for > the model), I get an error, and it doesn't recreate the index. What kind of error is it you are getting? The maintainers of acts_as_ferret are reading on this mailing list and they might be interested in the error and may be able to help as well.. Isn't it > supposed to reindex with find(:all) if the index doesn't exist? Yes, this is at least my understanding of the API docs. Cheers, Jan Thanks Jan. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060719/6fabb111/attachment.html From guest at guest.com Wed Jul 19 14:07:51 2006 From: guest at guest.com (Guest) Date: Wed, 19 Jul 2006 20:07:51 +0200 Subject: [Ferret-talk] Ferret Indexing In-Reply-To: <562a35c10607191055j65b97553o996ce90f500953f@mail.gmail.com> References: <3dca6191cad89438af24330890f0d002@ruby-forum.com> <562a35c10607190956i2db3f54excf249641a2220447@mail.gmail.com> <9fcc58d1487928d5ef8c6dd34f15e86b@ruby-forum.com> <562a35c10607191055j65b97553o996ce90f500953f@mail.gmail.com> Message-ID: Jan, My bad. 
I actually didn't delete the whole directory for the model, just its contents. It's working great once I delete the directory.

Thanks a lot.

From waspfactory at gggmmail.com  Wed Jul 19 15:34:20 2006
From: waspfactory at gggmmail.com (Caspar)
Date: Wed, 19 Jul 2006 21:34:20 +0200
Subject: [Ferret-talk] sorting and pagination
Message-ID: <0f6d51384c97ab18f33f0667d61e4502@ruby-forum.com>

Hello All,

Okay, I think I'm finally getting everything I want out of ferret working, thanks mostly to reading this forum and getting a lot of questions answered. Thanks a lot, everyone. My last ferret task is to get the results sorted by a field called date_registered and have this working with pagination.
Here is what I'm doing at the moment:

################################
acts_as_ferret :fields=> ['short_description',...,'date_registered']

def VoObject.find_results(query,page)
  sort_fields = []
  sort_fields << Ferret::Search::SortField.new("date_registered", :reverse => :false)

  results2 = VoObject.find_by_contents(query,:num_docs=> 2000000,:sort =>sort_fields )
  num = results2.size

  if page == 1
    page = 0
  else
    page = (page-1)*20
  end

  results = VoObject.find_by_contents(query,:first_doc=>page, :num_docs=> 20,:sort => sort_fields)
  [num,results,results2]
end
##############################

I added date_registered to the ferret fields and rebuilt the index (not sure if this is necessary, but anyway), but my results page still shows the results in a random order. It seems like the sorting has no effect. date_registered is a date_time field. I call find_by_contents twice in the above code because I need to populate some dropdowns with values for the entire returned results so that they can be used to refine the search. Also, just from the look of this code, am I right in thinking that each page of results will be sorted but not the entire returned results?

Any ideas?
thanks
regards
Caspar

From dbalmain.ml at gmail.com  Wed Jul 19 20:13:30 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 20 Jul 2006 09:13:30 +0900
Subject: [Ferret-talk] sorting and pagination
In-Reply-To: <0f6d51384c97ab18f33f0667d61e4502@ruby-forum.com>
References: <0f6d51384c97ab18f33f0667d61e4502@ruby-forum.com>
Message-ID:

On 7/20/06, Caspar wrote:
> Hello All,
> Okay i think I'm finally getting all of what i want out of ferret
> working, thanks mostly to reading this forum and also getting ALOT of
> questions answered, thanks alot everyone. Anyway my last ferret task is
> too get the results sorted by a field called date_registered and have
> this working with pagination.
>
> here is what i'm doing at the moment:
>
>
> ################################
> acts_as_ferret :fields=> ['short_description',...,'date_registered']
>
> def VoObject.find_results(query,page)
>
> sort_fields = []
> sort_fields << Ferret::Search::SortField.new("date_registered",
> :reverse => :false)
>
> results2 = VoObject.find_by_contents(query,:num_docs=> 2000000,:sort
> =>sort_fields )
> num = results2.size
>
> if page == 1
> page = 0
> else
> page = (page-1)*20
> end
>
> results = VoObject.find_by_contents(query,:first_doc=>page,
> :num_docs=> 20,:sort => sort_fields)
> [num,results,results2]
> end
> ##############################
>
> I added date_registered to the ferret fields and rebuilt the index (not
> sure if this is neccessary but anyway) but my results page still shows
> the results in a random order. Seems like the sorting has no effect.
> date_registered is a date_time field. I call ifnd by contents twice in
> the above code because i need to populate some dropdowns with values for
> the entire returned results so that they can be used to refine the
> search. Also just from the look of this code am i right in thinking that
> each page of results will be sorted but not the entire returned results?
>
> Any ideas?
> thanks
> regards
> Caspar

One of the acts_as_ferret guys will be able to confirm this, but I don't think acts_as_ferret automatically converts dates to a format that is searchable. What you need to do is convert the DateTime to a format that is lexicographically sortable:

acts_as_ferret :fields=> ['short_description',...,'ferret_date_registered']

def ferret_date_registered
  date_registered.strftime("%Y%m%d")
end

This should fix your problem. Note that if you need a little more precision you can add hours, minutes, etc.

> results2 = VoObject.find_by_contents(query,:num_docs=> 2000000,:sort =>sort_fields )

Just a word of warning.
Whatever you specify num_docs to be, an array is created to hold that many hits, so even if your query only returns 2 hits, the above line will use 2 Mb (but it will be freed immediately once the search is finished, not garbage collected). I might try to change that behaviour this morning.

On this topic, :first_doc is going to change to :offset and :num_docs to :limit. I'm toying with the idea of making :all a valid option for :limit to get all search results, although :limit => :all sounds a bit funny to me. Perhaps simply :limit => nil would be better. Feedback anyone?

Cheers,
Dave

From kraemer at webit.de  Thu Jul 20 04:30:13 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 20 Jul 2006 10:30:13 +0200
Subject: [Ferret-talk] sorting and pagination
In-Reply-To:
References: <0f6d51384c97ab18f33f0667d61e4502@ruby-forum.com>
Message-ID: <20060720083013.GC18581@cordoba.webit.de>

On Thu, Jul 20, 2006 at 09:13:30AM +0900, David Balmain wrote:
> On 7/20/06, Caspar wrote:
[..]
> > I added date_registered to the ferret fields and rebuilt the index (not
> > sure if this is neccessary but anyway) but my results page still shows
> > the results in a random order. Seems like the sorting has no effect.
> > date_registered is a date_time field. I call ifnd by contents twice in
> > the above code because i need to populate some dropdowns with values for
> > the entire returned results so that they can be used to refine the
> > search. Also just from the look of this code am i right in thinking that
> > each page of results will be sorted but not the entire returned results?
> >
> > Any ideas?
>
> One of the acts_as_ferret guys will be able to confirm this but I
> don't think acts_as_ferret automatically converts dates to a format
> that is searchable.
> What you need to do is convert the DateTime to a
> format that is lexicographically sortable;
>
> acts_as_ferret :fields=> ['short_description',...,'ferret_date_registered']
>
> def ferret_date_registered
> date_registered.strftime("%Y%m%d")
> end

That's right, you'll have to do the datetime-to-string conversion yourself. I think we could add conversions like that to acts_as_ferret in a future version.

> This should fix your problem. Note that you if you need a little more precision
> you can add hours, minutes, etc.
>
> > results2 = VoObject.find_by_contents(query,:num_docs=> 2000000,:sort
> =>sort_fields )
>
> Just a word of warning. Whatever you specify num_docs to be, an array
> is created to hold that many hits, so even if your query only returns
> 2 hits, the above line will use 2 Mb (bit it will be freed immediately
> once the search is finished, not garbage collected). I might try and
> change that behaviour this morning.
>
> On this topic, :first_doc is going to be changing to :offset and
> :num_docs to :limit. I'm toying with the idea of making :all a valid
> option for :limit to get all search results although :limit => :all
> sounds a bit funny to me. Perhaps simply :limit => nil would be
> better. Feedback anyone?

Actually, acts_as_ferret already supports :num_docs => :all, where :all gets transformed into a large, user-definable number of records (set via the :max_results option in the call to acts_as_ferret). I agree that :limit => nil makes more sense.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From waspfactory at gggmmail.com  Thu Jul 20 09:52:58 2006
From: waspfactory at gggmmail.com (Caspar)
Date: Thu, 20 Jul 2006 15:52:58 +0200
Subject: [Ferret-talk] sorting and pagination
Message-ID:

Tried the solution that was posted but no luck.
It seems to make exactly no difference.
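Why the string conversion in this thread matters can be sketched in plain Ruby, without Ferret. The advice rests on the indexed value being compared as a string, so a value rendered like 'Wed Jul 19 12:07:52 BST 2006' (the shape that appears in Caspar's log) orders by weekday name first, while fixed-width, most-significant-first strings such as strftime("%Y%m%d") order chronologically. The three timestamps below are made up for illustration.

```ruby
# Three made-up registration times, listed out of chronological order.
registered = [
  Time.utc(2006, 7, 19, 12, 7, 52),
  Time.utc(2005, 12, 6, 9, 0, 0),
  Time.utc(2006, 1, 5, 18, 30, 0)
]

# Roughly the shape seen in the log ("Wed Jul 19 12:07:52 ... 2006").
# Compared as plain strings, these order by weekday name first.
ctime_style = registered.map { |t| t.strftime("%a %b %d %H:%M:%S %Y") }

# Fixed-width keys with the most significant field first compare chronologically.
day_keys = registered.map { |t| t.strftime("%Y%m%d") }
puts day_keys.sort.inspect # prints ["20051206", "20060105", "20060719"]

# Extra precision, as suggested in the thread: append hours, minutes, seconds.
second_keys = registered.map { |t| t.strftime("%Y%m%d%H%M%S") }
puts second_keys.sort.first # prints 20051206090000
```

In the acts_as_ferret setup discussed here, the point is that the method producing such a key (ferret_date_registered) is what has to appear in :fields, rather than the raw date_registered column.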

def ferret_date_registered
  date_registered.strftime("%Y%m%d")
end

##### log/development.log#################
Adding field date_registered with value 'Wed Jul 19 12:07:52 BST 2006' to index

What am I doing wrong?
regards
Caspar

From dbalmain.ml at gmail.com  Thu Jul 20 10:13:23 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 20 Jul 2006 23:13:23 +0900
Subject: [Ferret-talk] sorting and pagination
In-Reply-To:
References:
Message-ID:

On 7/20/06, Caspar wrote:
> tried the solution that was posted but no luck.
> seems to make exactly no difference.
>
> def ferret_date_registered
> date_registered.strftime("%Y%m%d")
> end
>
> ##### log/development.log#################
> Adding field date_registered with value 'Wed Jul 19 12:07:52 BST 2006'
> to index
>
> what am I doing wrong?
> regards
> Caspar

Have another look at the solution posted. You must have forgotten this line:

acts_as_ferret :fields=> ['short_description',...,'ferret_date_registered']

From vanesam at ece.ubc.ca  Thu Jul 20 16:46:22 2006
From: vanesam at ece.ubc.ca (Vanesa Mirzaee)
Date: Thu, 20 Jul 2006 22:46:22 +0200
Subject: [Ferret-talk] search on fields
Message-ID:

Hi,

I wonder if it is possible to perform "find_by_contents" on a subset of the fields indexed by acts_as_ferret. If so, how?

In my code I have:

acts_as_ferret (:fields => ['title', 'focus', 'purpose'])

However, I'd like to have two search options: one on all fields and one only on the title.

Any help is most appreciated.

Thanks,

From dbalmain.ml at gmail.com  Thu Jul 20 19:29:51 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 21 Jul 2006 08:29:51 +0900
Subject: [Ferret-talk] search on fields
In-Reply-To:
References:
Message-ID:

On 7/21/06, Vanesa Mirzaee wrote:
> Hi,
>
> I wonder if it is possible to perform the "find_by_contents" on a subset
> of fields indexed in acts_as_ferret.If so, how?
>
> In my code I have:
>
> acts_as_ferret (:fields => ['title', 'focus', 'purpose'])
>
> However, I like to have two search options one on all fields and one
> only on the title.
>
> Any help is most appreciated.
>
> Thanks,

Hi Vanesa,

You can search a specific field by prefixing the query with the field name, for example %{title:"Lord of the Rings" purpose:"enjoyment"}. Or, if you don't want users typing in field names, you could modify the query_string to search only the title field like this:

query_string = "title:(#{query_string})"

Hope that helps,
Dave

PS: Jens, please correct me if this won't work with acts_as_ferret.

From vanesam at ece.ubc.ca  Fri Jul 21 01:00:22 2006
From: vanesam at ece.ubc.ca (Vanesa Mirzaee)
Date: Fri, 21 Jul 2006 07:00:22 +0200
Subject: [Ferret-talk] search on fields
In-Reply-To:
References:
Message-ID: <7ed501411def590d8a38dc3a69f3f642@ruby-forum.com>

Thanks David,

This works.

-Vanesa
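Dave's field-prefix approach can be wrapped in a small helper so one search box serves both of the options Vanesa asks for. This is a hypothetical sketch: scoped_query and SEARCHABLE_FIELDS are names made up here, not part of acts_as_ferret. The helper only builds the query string, which would then be passed to find_by_contents as in the thread.

```ruby
# Fields listed in the acts_as_ferret call from the thread.
SEARCHABLE_FIELDS = %w[title focus purpose].freeze

# Hypothetical helper: returns the query unchanged to search all indexed
# fields, or wraps it in "field:( ... )" to restrict it to one field.
def scoped_query(user_query, field = nil)
  return user_query if field.nil?

  unless SEARCHABLE_FIELDS.include?(field.to_s)
    raise ArgumentError, "not an indexed field: #{field}"
  end

  "#{field}:(#{user_query})"
end

puts scoped_query("lord of the rings")         # prints lord of the rings
puts scoped_query("lord of the rings", :title) # prints title:(lord of the rings)
```

Note that the helper does not escape parentheses or other query syntax typed by the user, so unbalanced input may still be rejected by the query parser.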
From rbitar at gmail.com  Fri Jul 21 02:51:48 2006
From: rbitar at gmail.com (Rami Bitar)
Date: Fri, 21 Jul 2006 08:51:48 +0200
Subject: [Ferret-talk] acts_as_ferret with has_many relationship
Message-ID: <5b61362afba0752f08ff15a75f8b0fdd@ruby-forum.com>

Using Ferret and the acts_as_ferret plugin, I would like to retrieve results based on models associated with a parent model:

class Post < ActiveRecord::Base
  acts_as_ferret :fields => [ 'title',
                              'body',
                              :post_author,
                              :post_comments
                            ]
  ...

  has_one :author
  has_many :comments
end

I would like to search all comments of a post and retrieve Post models as search results. Currently, searching across :author fields is achieved using:

def post_author
  self.author.name
end

However, a similar technique does not seem to work for the has_many relationship:

def post_comments
  for comment in self.comments
    @comments << comment.body
    @comments << comment.name
  end
  @comments.join(" ")
end

Searching across author fields finds the correct Post. However, none of the comments seem to be indexed and/or associated with the correct Post, so no search results are found. I would appreciate any help on this.

Thanks!

Rami

--------
Note: The Post model is saved after comment changes are made in order for acts_as_ferret to index the Post. Also note that multi_search would not easily apply here, since I know for certain I would like to retrieve the Post model, and not the Comment or Author model, as search results.

From rrmdf at stimble.net  Fri Jul 21 03:24:49 2006
From: rrmdf at stimble.net (Michael)
Date: Fri, 21 Jul 2006 09:24:49 +0200
Subject: [Ferret-talk] segfaulting at rebiuild_index
Message-ID: <7e60944762d13cc4e1cf48e2fb1f45e5@ruby-forum.com>

Hello,
I can't figure out how to get acts_as_ferret to stop segfaulting.
Every time I run a query on my server (Rails 1.1.4, Ferret 0.9.4 and aaf from svn) I get a segfault:

./script/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:243: [BUG] Segmentation fault

This line is where the index is rebuilt:

def rebuild_index(*additional_models)
  index = Ferret::Index::Index.new(ferret_configuration.merge(:create => true))

I can get index = Ferret::Index::Index.new to run from the console. But if I try the above from a model, Employee.rebuild_index, it segfaults.

A fresh checkout from the same svn repo on my laptop works fine. It still segfaults after chmod 777 index/ -R

Any ideas? I sure would love to be able to use this.

Thanks,
Michael Fairchild

From kraemer at webit.de  Fri Jul 21 05:22:06 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 21 Jul 2006 11:22:06 +0200
Subject: [Ferret-talk] acts_as_ferret with has_many relationship
In-Reply-To: <5b61362afba0752f08ff15a75f8b0fdd@ruby-forum.com>
References: <5b61362afba0752f08ff15a75f8b0fdd@ruby-forum.com>
Message-ID: <20060721092206.GD28283@cordoba.webit.de>

Hi Rami,

could you have a look at your development.log and check what values actually get indexed for the post_comments field when you save the post?

Jens

From kraemer at webit.de  Fri Jul 21 05:26:01 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 21 Jul 2006 11:26:01 +0200
Subject: [Ferret-talk] segfaulting at rebiuild_index
In-Reply-To: <7e60944762d13cc4e1cf48e2fb1f45e5@ruby-forum.com>
References: <7e60944762d13cc4e1cf48e2fb1f45e5@ruby-forum.com>
Message-ID: <20060721092601.GE28283@cordoba.webit.de>

Hi Michael,

On Fri, Jul 21, 2006 at 09:24:49AM +0200, Michael wrote:
> Hello,
> I can't figure out how to get acts_as_ferret to stop segfaulting.
> Every time i run a query on my server (rails 1.1.4 and ferret .9.4 and
> aaf from svn) I get a segfault
>
> ./script/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:243:
> [BUG] Segmentation fault
>
> this line is where the indx is rebuilt.
> "
> def rebuild_index(*additional_models)
> index =
> Ferret::Index::Index.new(ferret_configuration.merge(:create => true))
> "
>
> I can get index = Ferret::Index::Index.new to run from the console.
> But, If i try the above from a model, Employee.rebuild_index, it
> segfaults.
>
> a fresh check out from the same svn repo on my laptop works fine. it
> still segaults after chmod 777 index/ -R

Do you use the compiled version of Ferret on both laptop and server? What does your call to acts_as_ferret look like? Do you specify a custom analyzer or something like that?

Jens

From lmarlow at yahoo.com  Fri Jul 21 13:07:49 2006
From: lmarlow at yahoo.com (Lee Marlow)
Date: Fri, 21 Jul 2006 11:07:49 -0600
Subject: [Ferret-talk] acts_as_ferret with has_many relationship
In-Reply-To: <5b61362afba0752f08ff15a75f8b0fdd@ruby-forum.com>
References: <5b61362afba0752f08ff15a75f8b0fdd@ruby-forum.com>
Message-ID: <7968d7490607211007n4fde2da7jad9ff61576f38106@mail.gmail.com>

You might want to try removing the @ symbol from comments and naming it something else to avoid a conflict with your relationship.

def post_comments
  self.comments.inject([]) { |ary, c| ary << c.body << c.name }.join(' ')
end

-Lee

From rbitar at gmail.com  Sat Jul 22 14:22:18 2006
From: rbitar at gmail.com (Rami Bitar)
Date: Sat, 22 Jul 2006 20:22:18 +0200
Subject: [Ferret-talk] acts_as_ferret with has_many relationship
In-Reply-To: <20060721092206.GD28283@cordoba.webit.de>
References: <5b61362afba0752f08ff15a75f8b0fdd@ruby-forum.com> <20060721092206.GD28283@cordoba.webit.de>
Message-ID:

Hi Jens,

After looking at the development log, I've solved the problem. It turns out that acts_as_ferret was not indexing the comments because another table I was also trying to index was null. As a result, acts_as_ferret was terminating prematurely (due to the null error) before it could even index the comment values. The code I originally posted, therefore, should work fine for others.
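Since the failure came from a null value raising an error inside one of the field methods, a defensive version of the two methods may be worth sketching. In this plain-Ruby sketch, Structs and a small Post class stand in for the ActiveRecord models (an assumption; in Rails, author and comments would be the has_one/has_many associations). The comments method builds a local array, as in Lee's inject version, rather than appending to @comments, which is nil unless it has been initialized elsewhere.

```ruby
# Stand-ins for the ActiveRecord models in the thread (assumption).
Author  = Struct.new(:name)
Comment = Struct.new(:name, :body)

class Post
  attr_reader :author, :comments

  def initialize(author, comments)
    @author = author
    @comments = comments
  end

  # Returns "" instead of raising NoMethodError when there is no author.
  def post_author
    author ? author.name.to_s : ""
  end

  # Collects each comment's body and name into a fresh local array;
  # compact drops nil columns before joining.
  def post_comments
    comments.inject([]) { |terms, c| terms << c.body << c.name }.compact.join(" ")
  end
end

post = Post.new(nil, [Comment.new("Jens", "Check the log"), Comment.new(nil, "Works now")])
puts post.post_author.inspect # prints ""
puts post.post_comments       # prints Check the log Jens Works now
```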
Thanks for your help and all your great work!

Rami

@Lee: Yes, thanks for the suggestion. I will remove the @ symbol to eliminate any possible conflicts.

From guest at guest.com  Mon Jul 24 13:32:25 2006
From: guest at guest.com (Guest)
Date: Mon, 24 Jul 2006 19:32:25 +0200
Subject: [Ferret-talk] Ferret Search Terms
Message-ID:

I'm having a bit of trouble getting some search terms to work with acts_as_ferret, particularly where multiple words are in the query. For instance, if I search for "Time to Kill" it doesn't find any records. However, if I search just for time, Time to Kill comes up.

I'm not sure what I'm missing here.

Any help would be greatly appreciated.

From guest at guest.com  Mon Jul 24 13:33:28 2006
From: guest at guest.com (Guest)
Date: Mon, 24 Jul 2006 19:33:28 +0200
Subject: [Ferret-talk] Ferret Search Terms
In-Reply-To:
References:
Message-ID: <063d58683d03885b8078852f0aec85c0@ruby-forum.com>

I should also add that some multiple-word queries do work, and I haven't been able to spot any pattern there.

Thanks again.

From bk at benjaminkrause.com  Mon Jul 24 18:18:52 2006
From: bk at benjaminkrause.com (Ben)
Date: Tue, 25 Jul 2006 00:18:52 +0200
Subject: [Ferret-talk] error searching for a boolean query
Message-ID:

Hey, I'm not sure if the trac is currently maintained, so I'll post this here as well, just to make sure :)

http://ferret.davebalmain.com/trac/ticket/94

I get a segfault on certain queries. I guess that's a problem with the query parser:

>> Indexer.index.search( "american~0.6 AND NOT type:Language" )
*** glibc detected *** double free or corruption (!prev): 0x0872ac50 ***
Aborted

Maybe only in combination with a fuzzy query, as this is working:

>> Indexer.index.search( "american AND NOT type:Language" )
=> #

Ben

From shingler at gmail.com  Tue Jul 25 05:28:13 2006
From: shingler at gmail.com (Steven Shingler)
Date: Tue, 25 Jul 2006 11:28:13 +0200
Subject: [Ferret-talk] RDig document processing error
Message-ID:

Hi all,

Am having problems using RDig:

With this rdig config...

cfg.crawler.start_urls = ['http://www.defensetech.org']
cfg.crawler.include_hosts = ['www.defensetech.org']
cfg.index.path = '/my/path/to/index'
cfg.verbose = true

...I get this output:

$ rdig -c config/rdig_config.rb
/usr/local/lib/site_ruby/1.8/ferret/index/term.rb:45: warning: method redefined; discarding old text=
/usr/local/lib/site_ruby/1.8/ferret/search/sort_field.rb:69: warning: instance variable @name not initialized
/usr/local/lib/site_ruby/1.8/ferret/search/sort_field.rb:69: warning: instance variable @name not initialized
lib/ferret/query_parser/query_parser.y:128: warning: method redefined; discarding old initialize
lib/ferret/query_parser/query_parser.y:157: warning: method redefined; discarding old parse
lib/ferret/query_parser/query_parser.y:216: warning: method redefined; discarding old clean_string
/usr/lib/ruby/gems/1.8/gems/rubyful_soup-1.0.4/lib/rubyful_soup.rb:230: warning: method redefined; discarding old attrs
discovered content extractor class: RDig::ContentExtractors::PdfContentExtractor
discovered content extractor class: RDig::ContentExtractors::WordContentExtractor
discovered content extractor class: RDig::ContentExtractors::HtmlContentExtractor
using Ferret 0.9.0
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:116: warning: instance variable @patterns not initialized
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:105: warning: instance variable @patterns not initialized
added url http://www.defensetech.org
fetching http://www.defensetech.org
waiting for threads to finish...
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:116: warning: instance variable @patterns not initialized
/usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:105: warning: instance variable @patterns not initialized
added url http://www.defensetech.org
error processing document http://www.defensetech.org/: undefined local variable or method `url' for #
Trace: /usr/local/lib/site_ruby/1.8/rdig/documents.rb:35:in `initialize'
/usr/local/lib/site_ruby/1.8/rdig/documents.rb:107:in `initialize'
/usr/local/lib/site_ruby/1.8/rdig/documents.rb:15:in `create'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:68:in `add_url'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:51:in `process_document'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:50:in `process_document'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:28:in `run'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:25:in `run'
/usr/local/lib/site_ruby/1.8/rdig/crawler.rb:24:in `run'
/usr/local/lib/site_ruby/1.8/rdig.rb:258:in `run'
/usr/bin/rdig:14

If anyone could tell me why @patterns and url aren't being set, I'd really appreciate it.

Am on Ubuntu 6.06, ruby 1.8.4, gems: rdig 0.3.0, rubyful_soup 1.0.4, ferret 0.9.4

Many Thanks,
Steven

From kraemer at webit.de  Wed Jul 26 08:03:55 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 26 Jul 2006 14:03:55 +0200
Subject: [Ferret-talk] Ferret Search Terms
In-Reply-To: <063d58683d03885b8078852f0aec85c0@ruby-forum.com>
References: <063d58683d03885b8078852f0aec85c0@ruby-forum.com>
Message-ID: <20060726120355.GG21135@cordoba.webit.de>

Hi!

Do you use a special analyzer for indexing/searching? A possible explanation might be that at indexing time the 'to' is stripped away as not being a relevant term, but at searching time it isn't. As aaf ANDs query terms together by default, you then won't get a hit.

What does the parsed query look like if you search via aaf (queries should get logged in development.log)?
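Jens's hypothesis can be illustrated without Ferret. In this plain-Ruby sketch the tokenizer and the stop list are simplified assumptions (Ferret's real analyzers and default stop-word list differ). It shows how dropping "to" at indexing time but not at query time makes an all-terms (AND) match fail, while analyzing both sides the same way succeeds, and why a single-word search for "time" still finds the record.

```ruby
require 'set'

# Simplified, hypothetical stop list; not Ferret's actual default list.
STOP_WORDS = Set.new(%w[a an and of the to])

# Lowercase, split into word tokens, optionally drop stop words.
def analyze(text, stop_words = Set.new)
  text.downcase.scan(/\w+/).reject { |term| stop_words.include?(term) }
end

# AND semantics: every query term must occur among the indexed terms.
def and_match?(query_terms, indexed_terms)
  query_terms.all? { |term| indexed_terms.include?(term) }
end

# Index side: stop words removed, leaving "time" and "kill".
indexed = Set.new(analyze("Time to Kill", STOP_WORDS))

mismatched = analyze("Time to Kill")             # query side keeps "to"
consistent = analyze("Time to Kill", STOP_WORDS) # same analysis as the index

puts and_match?(mismatched, indexed)      # prints false
puts and_match?(consistent, indexed)      # prints true
puts and_match?(analyze("time"), indexed) # prints true
```

The parsed query that Jens asks about is the quickest way to see which of these two situations applies, because it shows which terms actually reach the search.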
Jens

From kraemer at webit.de  Wed Jul 26 08:50:30 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 26 Jul 2006 14:50:30 +0200
Subject: [Ferret-talk] RDig document processing error
In-Reply-To:
References:
Message-ID: <20060726125030.GJ21135@cordoba.webit.de>

Hi Steven,

sorry for replying so late; I'm quite busy at the moment.

The error you received was caused by an invalid mailto: link which RDig failed to handle correctly. I just uploaded RDig 0.3.1, which fixes this bug.

While testing with your site I noticed that it takes quite long to parse the index page, so you might have to set cfg.crawler.wait_before_leave to a higher value (20 worked for me) to prevent RDig from exiting before the parser has finished parsing the index page.

RDig's parsing speed is really bad for big pages (your index page weighs around 62kB). I'd happily accept a patch adding a faster HTML content extraction mechanism for RDig users to choose from ;-) Maybe even a special Ferret analyzer that just strips out any HTML tags would do.
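A crude illustration of the "just strip out the HTML tags" idea, as a standalone Ruby method. It is a hypothetical sketch, not part of RDig or Ferret: regular expressions cannot parse arbitrary HTML, so this only removes script/style blocks, comments and tags, decodes a handful of common entities, and collapses whitespace. It would likely be faster but less accurate than building a full parse tree of the page.

```ruby
# Hypothetical, regex-based text extraction; not RDig's or Ferret's API.
def strip_html(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1\s*>}mi, " ") # drop script/style bodies
  text = text.gsub(/<!--.*?-->/m, " ")                      # drop comments
  text = text.gsub(/<[^>]*>/, " ")                          # drop remaining tags
  # Decode a few common entities; "&amp;" last so "&amp;lt;" is not double-decoded.
  {
    "&nbsp;" => " ", "&lt;" => "<", "&gt;" => ">", "&quot;" => '"', "&amp;" => "&"
  }.each { |entity, char| text = text.gsub(entity, char) }
  text.gsub(/\s+/, " ").strip                               # collapse whitespace
end

page = "<html><head><style>p {}</style></head>" \
       "<body><p>Hello&nbsp;<b>world</b> &amp; more</p><script>x()</script></body></html>"
puts strip_html(page) # prints Hello world & more
```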
Regards,
Jens

On Tue, Jul 25, 2006 at 11:28:13AM +0200, Steven Shingler wrote:
> Hi all,
>
> I'm having problems using RDig:
>
> With this rdig config...
>
> cfg.crawler.start_urls = ['http://www.defensetech.org']
> cfg.crawler.include_hosts = ['www.defensetech.org']
> cfg.index.path = '/my/path/to/index'
> cfg.verbose = true
>
> ...I get this output:
>
> $ rdig -c config/rdig_config.rb
> /usr/local/lib/site_ruby/1.8/ferret/index/term.rb:45: warning: method redefined; discarding old text=
> /usr/local/lib/site_ruby/1.8/ferret/search/sort_field.rb:69: warning: instance variable @name not initialized
> /usr/local/lib/site_ruby/1.8/ferret/search/sort_field.rb:69: warning: instance variable @name not initialized
> lib/ferret/query_parser/query_parser.y:128: warning: method redefined; discarding old initialize
> lib/ferret/query_parser/query_parser.y:157: warning: method redefined; discarding old parse
> lib/ferret/query_parser/query_parser.y:216: warning: method redefined; discarding old clean_string
> /usr/lib/ruby/gems/1.8/gems/rubyful_soup-1.0.4/lib/rubyful_soup.rb:230: warning: method redefined; discarding old attrs
> discovered content extractor class: RDig::ContentExtractors::PdfContentExtractor
> discovered content extractor class: RDig::ContentExtractors::WordContentExtractor
> discovered content extractor class: RDig::ContentExtractors::HtmlContentExtractor
> using Ferret 0.9.0
> /usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:116: warning: instance variable @patterns not initialized
> /usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:105: warning: instance variable @patterns not initialized
> added url http://www.defensetech.org
> fetching http://www.defensetech.org
> waiting for threads to finish...
> /usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:116: warning: instance variable @patterns not initialized
> /usr/local/lib/site_ruby/1.8/rdig/url_filters.rb:105: warning: instance variable @patterns not initialized
> added url http://www.defensetech.org
> error processing document http://www.defensetech.org/: undefined local variable or method `url' for #
> Trace: /usr/local/lib/site_ruby/1.8/rdig/documents.rb:35:in `initialize'
> /usr/local/lib/site_ruby/1.8/rdig/documents.rb:107:in `initialize'
> /usr/local/lib/site_ruby/1.8/rdig/documents.rb:15:in `create'
> /usr/local/lib/site_ruby/1.8/rdig/crawler.rb:68:in `add_url'
> /usr/local/lib/site_ruby/1.8/rdig/crawler.rb:51:in `process_document'
> /usr/local/lib/site_ruby/1.8/rdig/crawler.rb:50:in `process_document'
> /usr/local/lib/site_ruby/1.8/rdig/crawler.rb:28:in `run'
> /usr/local/lib/site_ruby/1.8/rdig/crawler.rb:25:in `run'
> /usr/local/lib/site_ruby/1.8/rdig/crawler.rb:24:in `run'
> /usr/local/lib/site_ruby/1.8/rdig.rb:258:in `run'
> /usr/bin/rdig:14
>
> If anyone could tell me why @patterns and url aren't being set, I'd
> really appreciate it.
>
> I'm on Ubuntu 6.06, ruby 1.8.4, gems: rdig 0.3.0, rubyful_soup 1.0.4,
> ferret 0.9.4
>
> Many thanks,
> Steven

From sera at fhwang.net Wed Jul 26 11:48:41 2006
From: sera at fhwang.net (Francis Hwang)
Date: Wed, 26 Jul 2006 11:48:41 -0400
Subject: [Ferret-talk] tweaking minimum word length?
Message-ID:

Hi,

Can Ferret be configured to change the minimum word length of what it indexes?
Right now it seems to drop words of 3 characters or less, but I'd like to include words down to 2 characters. How would I do that?

Francis

From sera at fhwang.net Wed Jul 26 12:04:35 2006
From: sera at fhwang.net (Francis Hwang)
Date: Wed, 26 Jul 2006 12:04:35 -0400
Subject: [Ferret-talk] tweaking minimum word length?
In-Reply-To:
References:
Message-ID: <64E8273D-D7CC-4DE9-8BA4-CB50DAF4D123@fhwang.net>

Sorry, false alarm: I was not indexing some of my records.

On Jul 26, 2006, at 11:48 AM, Francis Hwang wrote:
> Hi,
>
> Can Ferret be configured to change the minimum word length of what it
> indexes? Right now it seems to drop words of 3 characters or less, but
> I'd like to include words down to 2 characters. How would I do
> that?
>
> Francis

From bk at benjaminkrause.com Thu Jul 27 13:10:21 2006
From: bk at benjaminkrause.com (Ben)
Date: Thu, 27 Jul 2006 19:10:21 +0200
Subject: [Ferret-talk] How to use the ruby only version of ferret?
Message-ID: <14f8af891199b66000be6ff767f4a693@ruby-forum.com>

Hey,

I tried to use Ferret on a development system for some days, but it keeps crashing with some glibc error message. I understand that you're currently rewriting most of the C code. Meanwhile I would like to use the Ruby-only version, hoping that it will not crash my FastCGI processes. So, some questions:

1st, how can I use the Ruby-only version of Ferret? (I've installed Ferret via gem on my Linux box.)

2nd, is there any non-binding roadmap or any date when a new, more stable version of Ferret is going to be released?

I really like Ferret, but it is currently not stable enough to use in a production environment. I can send you any log or install any debug libraries, if this would help find the bug. My Ferret is crashing at least twice a day.

Ben
From kraemer at webit.de Thu Jul 27 17:31:33 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 27 Jul 2006 23:31:33 +0200
Subject: [Ferret-talk] How to use the ruby only version of ferret?
In-Reply-To: <14f8af891199b66000be6ff767f4a693@ruby-forum.com>
References: <14f8af891199b66000be6ff767f4a693@ruby-forum.com>
Message-ID: <20060727213133.GA10792@cordoba.webit.de>

Hi!

On Thu, Jul 27, 2006 at 07:10:21PM +0200, Ben wrote:
[..]
> 1st, how can I use the Ruby-only version of Ferret? (I've installed
> Ferret via gem on my Linux box.)

require 'rferret'

> 2nd, is there any non-binding roadmap or any date when a new, more
> stable version of Ferret is going to be released?
>
> I really like Ferret, but it is currently not stable enough to use in
> a production environment. I can send you any log or install any debug
> libraries, if this would help find the bug. My Ferret is crashing at
> least twice a day.

You could try Ferret 0.3.2; it was the last revision before the 0.9.x versions and has generally been very stable.

On the other hand, there are people (at least me ;-)) who are using Ferret 0.9.x in production environments without problems. It might help if you could provide more information on when the error occurs, and what the exact error is.

Jens

From bk at benjaminkrause.com Thu Jul 27 17:41:22 2006
From: bk at benjaminkrause.com (Ben)
Date: Thu, 27 Jul 2006 23:41:22 +0200
Subject: [Ferret-talk] How to use the ruby only version of ferret?
In-Reply-To: <20060727213133.GA10792@cordoba.webit.de>
References: <14f8af891199b66000be6ff767f4a693@ruby-forum.com> <20060727213133.GA10792@cordoba.webit.de>
Message-ID: <1769527af02ae50456639f331886979a@ruby-forum.com>

Hey Jens,

> require 'rferret'

Thank you.

> On the other hand, there are people (at least me ;-)) who are using
> Ferret 0.9.x in production environments without problems. It might help
> if you could provide more information on when the error occurs, and
> what the exact error is.

Well, errors like this one: http://www.ruby-forum.com/topic/74389#new

Most of the time I get some non-descriptive glibc errors or something with a [BUG] prefix :) How can I produce more detailed error messages? What I am currently able to see won't help you...

Ben

From f at andreas-s.net Fri Jul 28 05:28:47 2006
From: f at andreas-s.net (Andreas S.)
Date: Fri, 28 Jul 2006 11:28:47 +0200
Subject: [Ferret-talk] How to use the ruby only version of ferret?
In-Reply-To: <20060727213133.GA10792@cordoba.webit.de>
References: <14f8af891199b66000be6ff767f4a693@ruby-forum.com> <20060727213133.GA10792@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> Hi!
>
> On Thu, Jul 27, 2006 at 07:10:21PM +0200, Ben wrote:
> [..]
>> 1st, how can I use the Ruby-only version of Ferret? (I've installed
>> Ferret via gem on my Linux box.)
>
> require 'rferret'
>
>> 2nd, is there any non-binding roadmap or any date when a new, more
>> stable version of Ferret is going to be released?
>>
>> I really like Ferret, but it is currently not stable enough to use in
>> a production environment. I can send you any log or install any debug
>> libraries, if this would help find the bug. My Ferret is crashing at
>> least twice a day.
>
> You could try Ferret 0.3.2; it was the last revision before the
> 0.9.x versions and has generally been very stable.
>
> On the other hand, there are people (at least me ;-)) who are using
> Ferret 0.9.x in production environments without problems.

You've probably just been lucky. I have seen 0.9.x (and 0.3.x) crashing on several different platforms. Everything works in the beginning, sometimes for a few weeks, but then I add more and more documents, the index gets larger, and at some point queries just start segfaulting. At least I haven't seen any more indexing segfaults since 0.9.x; that's definitely an improvement...

From bk at benjaminkrause.com Sat Jul 29 13:03:02 2006
From: bk at benjaminkrause.com (Ben)
Date: Sat, 29 Jul 2006 19:03:02 +0200
Subject: [Ferret-talk] How to use the ruby only version of ferret?
In-Reply-To: <1769527af02ae50456639f331886979a@ruby-forum.com>
References: <14f8af891199b66000be6ff767f4a693@ruby-forum.com> <20060727213133.GA10792@cordoba.webit.de> <1769527af02ae50456639f331886979a@ruby-forum.com>
Message-ID:

Hey,

I can reproduce these crashes without any problem, so if there is any way I can contribute to the Ferret project, let me know. I'm running Linux 2.6.15 on an Opteron (x86_64, dual core). I can even give you access to the server if that might help in analysing the problems.

David, I would like to go live with my project by the end of October and I need a stable search engine to do that. So please let me know if I should wait for a couple of weeks or if I should start to look for alternatives as well. Unfortunately, the Ruby-only version of Ferret is not an option, as it is far too slow.

Ben

From contact at ezabel.com Sat Jul 29 16:29:30 2006
From: contact at ezabel.com (Ian Zabel)
Date: Sat, 29 Jul 2006 22:29:30 +0200
Subject: [Ferret-talk] Searches limited to 10 results?
Message-ID: <7a70c9473bd52cf8eb87e2a3ce22d0a2@ruby-forum.com>

Hello all,

I've just recently gotten my ActiveRecord models indexed properly in my Rails app. Woohoo!
So I've started testing some searches.

Is there some limit in acts_as_ferret that only allows it to return 10 results? I'm searching for common terms in my data that I _know_ return many results, but I only ever get at most 10 results.

This is in a db with 350k rows:

>> Comment.find_by_contents("ian").size
=> 10

The following SQL returns 1359 rows:

SELECT count(*) FROM comments where comment like '% ian %';

My comment model has this:

acts_as_ferret :fields => [ 'comment' ]

Anything I'm doing wrong here?

Thanks!
Ian.

From dbalmain.ml at gmail.com Sat Jul 29 19:48:13 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sun, 30 Jul 2006 08:48:13 +0900
Subject: [Ferret-talk] Searches limited to 10 results?
In-Reply-To: <7a70c9473bd52cf8eb87e2a3ce22d0a2@ruby-forum.com>
References: <7a70c9473bd52cf8eb87e2a3ce22d0a2@ruby-forum.com>
Message-ID:

On 7/30/06, Ian Zabel wrote:
> Hello all,
>
> I've just recently gotten my ActiveRecord models indexed properly in my
> Rails app. Woohoo! So I've started testing some searches.
>
> Is there some limit in acts_as_ferret that only allows it to return 10
> results? I'm searching for common terms in my data that I _know_ return
> many results, but I only ever get at most 10 results.
>
> This is in a db with 350k rows:
>
> >> Comment.find_by_contents("ian").size
> => 10
>
> The following SQL returns 1359 rows:
> SELECT count(*) FROM comments where comment like '% ian %';
>
> My comment model has this:
>
> acts_as_ferret :fields => [ 'comment' ]
>
> Anything I'm doing wrong here?
>
> Thanks!
> Ian.

Hi Ian,

Try:

>> Comment.find_by_contents("ian", :num_docs => 100).size

Cheers,
Dave

From dbalmain.ml at gmail.com Sat Jul 29 19:53:53 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sun, 30 Jul 2006 08:53:53 +0900
Subject: [Ferret-talk] How to use the ruby only version of ferret?
In-Reply-To:
References: <14f8af891199b66000be6ff767f4a693@ruby-forum.com> <20060727213133.GA10792@cordoba.webit.de> <1769527af02ae50456639f331886979a@ruby-forum.com>
Message-ID:

On 7/30/06, Ben wrote:
> Hey,
>
> I can reproduce these crashes without any problem, so if there is any
> way I can contribute to the Ferret project, let me know. I'm
> running Linux 2.6.15 on an Opteron (x86_64, dual core). I can even give
> you access to the server if that might help in analysing the problems.

If you are still having problems when I get the next version out, I'll definitely take you up on the offer. I'm working on Ferret full-time from now until I finish the next version, so hopefully that will be very soon.

> David, I would like to go live with my project by the end of October and
> I need a stable search engine to do that. So please let me
> know if I should wait for a couple of weeks or if I should start to look
> for alternatives as well.

I would be very disappointed if Ferret wasn't stable by October. And with access to your server I should definitely be able to make it stable for your system at least.

Cheers,
Dave

From contact at ezabel.com Sat Jul 29 22:21:02 2006
From: contact at ezabel.com (Ian Zabel)
Date: Sun, 30 Jul 2006 04:21:02 +0200
Subject: [Ferret-talk] Searches limited to 10 results?
In-Reply-To:
References: <7a70c9473bd52cf8eb87e2a3ce22d0a2@ruby-forum.com>
Message-ID: <0bf42c2bd2c5b469ac48e2d6a44d5320@ruby-forum.com>

David Balmain wrote:
> On 7/30/06, Ian Zabel wrote:
>>
>> Anything I'm doing wrong here?
>>
>> Thanks!
>> Ian.
>
> Hi Ian,
>
> Try:
>
>>> Comment.find_by_contents("ian", :num_docs => 100).size
>
> Cheers,
>
> Dave

Ah, OK. Got it. Thanks!
Ian.

From bk at benjaminkrause.com Sun Jul 30 05:12:16 2006
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Sun, 30 Jul 2006 11:12:16 +0200
Subject: [Ferret-talk] How to use the ruby only version of ferret?
In-Reply-To:
References: <14f8af891199b66000be6ff767f4a693@ruby-forum.com> <20060727213133.GA10792@cordoba.webit.de> <1769527af02ae50456639f331886979a@ruby-forum.com>
Message-ID: <44CC77F0.60900@benjaminkrause.com>

David Balmain wrote:
> I would be very disappointed if Ferret wasn't stable by October. And
> with access to your server I should definitely be able to make it
> stable for your system at least.
>

David, great to hear. I'm currently starting my testing phase. We've got several different queries and a lot of people testing. Searches are very important, as we included many live searches, all based on Ferret.

One thing I can definitely say is that fuzzy searches are broken. As far as I can see, all crashes are related to a #{query}~0.6 call. But it seems to depend on the word you've been searching for: some words like 'star' are working, others like 'memories' aren't.

Ben

From rrmdf at stimble.net Mon Jul 31 02:16:39 2006
From: rrmdf at stimble.net (Michael)
Date: Mon, 31 Jul 2006 08:16:39 +0200
Subject: [Ferret-talk] segfaulting at rebiuild_index
In-Reply-To: <20060721092601.GE28283@cordoba.webit.de>
References: <7e60944762d13cc4e1cf48e2fb1f45e5@ruby-forum.com> <20060721092601.GE28283@cordoba.webit.de>
Message-ID: <519eddba6e8cf68a66504b4c32fd4451@ruby-forum.com>

> Do you use the compiled version of Ferret on both laptop and server?

I believe they are both compiled. My server is Gentoo Linux. To be sure, I just did "gem install ferret" and it does indeed appear to be compiling on both my MacBook and the server. Nothing changed.

I also tried changing my locale in environment.rb:

ENV['LANG'] = 'en_US.utf8'
ENV['LC_TIME'] = 'C'

I also dumped my database and reimported it after converting from latin1 to utf8 as described here http://climbtothestars.org/archives/2004/07/18/converting-mysql-database-contents-to-utf-8/ and here http://textsnippets.com/posts/show/84. It still didn't change anything.
>
> What does your call to acts_as_ferret look like? Do you specify a
> custom analyzer or something like this?

I think they are pretty basic; most are simple, except one line:

acts_as_ferret(:fields => ['id','first_name','last_name', 'email', 'notes'],
               :occur_default => Ferret::Search::BooleanClause::Occur::SHOULD)

I decided to comment out all the acts_as_ferret lines and add them back one at a time. At first I thought I had found the problem this way, but I was mistaken. Whenever Ferret attempts to interact with the index, it segfaults:

/config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:243: [BUG] Segmentation fault

I have noticed that all of my index dirs are empty. I mean the directories are there, e.g. index/development/boat, but they are all empty. Is there a way I could try to build the index manually? Maybe that would help?

Thanks for any help.

>
> Jens

From kraemer at webit.de Mon Jul 31 05:42:39 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 31 Jul 2006 11:42:39 +0200
Subject: [Ferret-talk] segfaulting at rebiuild_index
In-Reply-To: <519eddba6e8cf68a66504b4c32fd4451@ruby-forum.com>
References: <7e60944762d13cc4e1cf48e2fb1f45e5@ruby-forum.com> <20060721092601.GE28283@cordoba.webit.de> <519eddba6e8cf68a66504b4c32fd4451@ruby-forum.com>
Message-ID: <20060731094239.GB8592@cordoba.webit.de>

On Mon, Jul 31, 2006 at 08:16:39AM +0200, Michael wrote:
>
> > Do you use the compiled version of Ferret on both laptop and server?
> I believe they are both compiled. My server is Gentoo Linux.
> To be sure, I just did "gem install ferret" and it does indeed
> appear to be compiling on both my MacBook and the server. Nothing
> changed.
> I also tried changing my locale in environment.rb:
> ENV['LANG'] = 'en_US.utf8'
> ENV['LC_TIME'] = 'C'
> I also dumped my database and reimported it after converting from latin1
> to utf8 as described here
> http://climbtothestars.org/archives/2004/07/18/converting-mysql-database-contents-to-utf-8/
> and here http://textsnippets.com/posts/show/84.
> It still didn't change anything.
>
> >
> > What does your call to acts_as_ferret look like? Do you specify a
> > custom analyzer or something like this?
> I think they are pretty basic;
> most are simple, except one line:
> acts_as_ferret(:fields => ['id','first_name','last_name', 'email',
> 'notes'], :occur_default =>
> Ferret::Search::BooleanClause::Occur::SHOULD)

Maybe this helps: you should not specify the id field in a call to acts_as_ferret. The id is indexed by default.

Jens

From Pedro.CorteReal at iantt.pt Mon Jul 31 06:05:31 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Mon, 31 Jul 2006 11:05:31 +0100
Subject: [Ferret-talk] Sorting performance
Message-ID: <1154340332.5397.2.camel@localhost.localdomain>

I'm using acts_as_ferret to index one of my Rails models. Right after I start the app, the first request that orders by some Ferret field takes a very long time. Subsequent ones seem to be fast. I guess some caching is going on. Any tips on solving this?

Pedro.

From dbalmain.ml at gmail.com Mon Jul 31 06:17:37 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Mon, 31 Jul 2006 19:17:37 +0900
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <1154340332.5397.2.camel@localhost.localdomain>
References: <1154340332.5397.2.camel@localhost.localdomain>
Message-ID:

On 7/31/06, Pedro Côrte-Real wrote:
> I'm using acts_as_ferret to index one of my Rails models.
> Right after I start the app, the first request that orders by some
> Ferret field takes a very long time. Subsequent ones seem to be fast.
> I guess some caching is going on. Any tips on solving this?
>
> Pedro.

You guessed correctly. The sort fields are cached. You can easily preload the cache by running a search when you start up your app. You should also be careful what fields you sort on: you should only sort on untokenized fields. You can also speed up sorting by dates by lowering the precision that you use. For example, if you are storing the date with time to the nearest second, e.g. 2006-08-01 10:13:24, you may get a much faster sort by only storing up to the nearest day, i.e. 2006-08-01.

By the way, what kind of times are we talking about here?

Cheers,
Dave

From Pedro.CorteReal at iantt.pt Mon Jul 31 06:26:44 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Mon, 31 Jul 2006 11:26:44 +0100
Subject: [Ferret-talk] Sorting performance
In-Reply-To:
References: <1154340332.5397.2.camel@localhost.localdomain>
Message-ID: <1154341604.5397.6.camel@localhost.localdomain>

On Mon, 2006-07-31 at 19:17 +0900, David Balmain wrote:
> On 7/31/06, Pedro Côrte-Real wrote:
> > I'm using acts_as_ferret to index one of my Rails models. Right after I
> > start the app, the first request that orders by some Ferret field
> > takes a very long time. Subsequent ones seem to be fast. I guess some
> > caching is going on. Any tips on solving this?
> >
> > Pedro.
>
> You guessed correctly. The sort fields are cached. You can easily
> preload the cache by running a search when you start up your app. You
> should also be careful what fields you sort on: you should only sort
> on untokenized fields.

Is it OK if the field isn't stored in the index? Does anyone know how to set a field to be untokenized in acts_as_ferret?

> You can also speed up sorting by dates by
> lowering the precision that you use.
> For example, if you are storing
> the date with time to the nearest second, e.g. 2006-08-01 10:13:24, you
> may get a much faster sort by only storing up to the nearest day, i.e.
> 2006-08-01.

I'm only using dates, so it should be alright.

> By the way, what kind of times are we talking about here?

300 seconds for a 100MB index.

Pedro.

From guest at guest.com Mon Jul 31 09:34:21 2006
From: guest at guest.com (Guest)
Date: Mon, 31 Jul 2006 15:34:21 +0200
Subject: [Ferret-talk] Indexing a lot of records
Message-ID: <9ccbeaace7e4ef376ab17715e7ec5f48@ruby-forum.com>

When trying to index a rather large database of records (50,000), acts_as_ferret takes a long time, then invariably times out, and not all of the index information is created.

Does anyone know how to rectify this?

From kraemer at webit.de Mon Jul 31 09:58:41 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 31 Jul 2006 15:58:41 +0200
Subject: [Ferret-talk] Indexing a lot of records
In-Reply-To: <9ccbeaace7e4ef376ab17715e7ec5f48@ruby-forum.com>
References: <9ccbeaace7e4ef376ab17715e7ec5f48@ruby-forum.com>
Message-ID: <20060731135841.GB26391@cordoba.webit.de>

Hi!

On Mon, Jul 31, 2006 at 03:34:21PM +0200, Guest wrote:
> When trying to index a rather large database of records (50,000),
> acts_as_ferret takes a long time, then invariably times out, and not all
> of the index information is created.
>
> Does anyone know how to rectify this?

What does the logfile look like?

Jens
From Pedro.CorteReal at iantt.pt Mon Jul 31 11:10:03 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Mon, 31 Jul 2006 16:10:03 +0100
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <1154341604.5397.6.camel@localhost.localdomain>
References: <1154340332.5397.2.camel@localhost.localdomain> <1154341604.5397.6.camel@localhost.localdomain>
Message-ID: <1154358603.5397.11.camel@localhost.localdomain>

On Mon, 2006-07-31 at 11:26 +0100, Pedro Côrte-Real wrote:
> Does anyone know how to set a field to be untokenized in acts_as_ferret?

I forgot that I was actually supplying my own #to_doc, so it was a matter of changing it to not tokenize the fields I want. When using acts_as_ferret the regular way, I don't know if this is possible.

Pedro.

From Pedro.CorteReal at iantt.pt Mon Jul 31 11:11:36 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Mon, 31 Jul 2006 16:11:36 +0100
Subject: [Ferret-talk] Sorting performance
In-Reply-To:
References: <1154340332.5397.2.camel@localhost.localdomain>
Message-ID: <1154358696.5397.14.camel@localhost.localdomain>

On Mon, 2006-07-31 at 19:17 +0900, David Balmain wrote:
> By the way, what kind of times are we talking about here?

I added preloading of this at the start of my app, and it takes 14 minutes for a 100MB index with 4 fields I order by. Is there any way to speed this up? Shouldn't this be cached in the on-disk structure?

Don't think I'm being critical: Ferret is great software, many thanks for it.

Pedro.
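Dave's earlier tip in this thread, storing sort dates only to the nearest day, can be illustrated without Ferret. The plain-Ruby sketch below uses made-up data (1,000 timestamps, one every 7 minutes) and counts the distinct sort keys at second and at day precision. It assumes, without measuring, that fewer distinct terms make Ferret's sort cache cheaper to build; the names `sort_keys`, `times`, `by_second` and `by_day` are illustrative only.

```ruby
# Illustration only: made-up timestamps, no Ferret involved.
# Counts distinct sort keys at second vs. day precision.
def sort_keys(times, format)
  times.map { |t| t.strftime(format) }.uniq
end

start = Time.utc(2006, 8, 1)
times = (0...1000).map { |i| start + i * 7 * 60 } # one record every 7 minutes

by_second = sort_keys(times, '%Y-%m-%d %H:%M:%S')
by_day    = sort_keys(times, '%Y-%m-%d')

puts "distinct keys at second precision: #{by_second.size}"
puts "distinct keys at day precision:    #{by_day.size}"
puts by_day.inspect
```

Zero-padded `YYYY-MM-DD` strings sort chronologically as plain strings, so the coarser keys still give a correct date ordering; the trade-off is that records from the same day are no longer ordered by time of day.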
From guest at guest.com Mon Jul 31 12:44:11 2006
From: guest at guest.com (Guest)
Date: Mon, 31 Jul 2006 18:44:11 +0200
Subject: [Ferret-talk] Indexing a lot of records
In-Reply-To: <20060731135841.GB26391@cordoba.webit.de>
References: <9ccbeaace7e4ef376ab17715e7ec5f48@ruby-forum.com> <20060731135841.GB26391@cordoba.webit.de>
Message-ID: <657b2afa543cfcca2c7171430e5ab4d3@ruby-forum.com>

Well, I just tried it again, and after a long while, I got this error:

StandardError in ListingsController#search
No such file or directory - path to project here/index/development/listing/segments

Jens Kraemer wrote:
> Hi!
>
> On Mon, Jul 31, 2006 at 03:34:21PM +0200, Guest wrote:
>> When trying to index a rather large database of records (50,000),
>> acts_as_ferret takes a long time, then invariably times out, and not all
>> of the index information is created.
>>
>> Does anyone know how to rectify this?
>
> What does the logfile look like?
>
> Jens

From kraemer at webit.de Mon Jul 31 13:23:25 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 31 Jul 2006 19:23:25 +0200
Subject: [Ferret-talk] Ferret Wiki Spam - Solutions Anyone?
In-Reply-To:
References:
Message-ID: <20060731172325.GA19848@cordoba.webit.de>

On Tue, Jul 11, 2006 at 06:50:40PM -0400, Tom Davies wrote:
> My vote is to stick with Trac. I haven't used the SpamFilter plugin,
> but it looks promising.

Actually, it seems to work pretty well. I installed it into the acts_as_ferret Trac 2 weeks ago, and we've had no spam since then :-) The only thing I had to do (besides upgrading Trac and installing the plugin) was to register for an Akismet API key.

Jens
From kraemer at webit.de Mon Jul 31 13:36:17 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 31 Jul 2006 19:36:17 +0200
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <1154358603.5397.11.camel@localhost.localdomain>
References: <1154340332.5397.2.camel@localhost.localdomain> <1154341604.5397.6.camel@localhost.localdomain> <1154358603.5397.11.camel@localhost.localdomain>
Message-ID: <20060731173617.GB19848@cordoba.webit.de>

On Mon, Jul 31, 2006 at 04:10:03PM +0100, Pedro Côrte-Real wrote:
> On Mon, 2006-07-31 at 11:26 +0100, Pedro Côrte-Real wrote:
> > Does anyone know how to set a field to be untokenized in acts_as_ferret?
>
> I forgot that I was actually supplying my own #to_doc, so it was a matter
> of changing it to not tokenize the fields I want. When using
> acts_as_ferret the regular way, I don't know if this is possible.

It is; just provide a hash with the desired options for each field name:

acts_as_ferret(
  :fields => {
    'title'       => { :boost => 2 },
    'description' => { :boost => 1,
                       :index => Ferret::Document::Field::Index::UNTOKENIZED }
  })

Options that can be set this way are (with their defaults given):

:store       => Ferret::Document::Field::Store::NO
:index       => Ferret::Document::Field::Index::TOKENIZED
:term_vector => Ferret::Document::Field::TermVector::NO
:binary      => false
:boost       => 1.0

Jens
From dbalmain.ml at gmail.com Mon Jul 31 20:24:16 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 09:24:16 +0900
Subject: [Ferret-talk] Sorting performance
In-Reply-To: <1154358696.5397.14.camel@localhost.localdomain>
References: <1154340332.5397.2.camel@localhost.localdomain> <1154358696.5397.14.camel@localhost.localdomain>
Message-ID:

On 8/1/06, Pedro Côrte-Real wrote:
> On Mon, 2006-07-31 at 19:17 +0900, David Balmain wrote:
> > By the way, what kind of times are we talking about here?
>
> I added preloading of this at the start of my app, and it takes 14
> minutes for a 100MB index with 4 fields I order by. Is there any way to
> speed this up? Shouldn't this be cached in the on-disk structure?

How many documents are there, and what is the date range (e.g. 2001-01-01 -> 2006-08-01)? These are the critical variables for sort performance. Once I know these numbers I'll be able to replicate the task here and see what I can do.

> Don't think I'm being critical: Ferret is great software, many thanks
> for it.

No offence taken. I'd definitely like to be able to help. I'm guessing I'll probably have to optimize the C code to rectify this.

Cheers,
Dave

From samuelgiffney at gmail.com Mon Jul 31 20:26:59 2006
From: samuelgiffney at gmail.com (Sam Giffney)
Date: Tue, 1 Aug 2006 02:26:59 +0200
Subject: [Ferret-talk] Per field boost values - possible? working?
Message-ID:

I'm making a simple business directory search and I want to boost the relevance of the 'name' field over the 'address' field - both stored in the same document in the same index.
Here is some console code to demonstrate what I am actually doing:

>> include Ferret::Document
=> Object
>> doc = Document.new
=> Document {
}
>> doc << Field.new(:name, "Business Search", Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 2.0)
=> nil
>> doc << Field.new("physical_address", "New Zealand", Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 1.0)
=> nil
>> doc
=> Document {
stored/uncompressed,indexed,tokenized,
stored/uncompressed,indexed,tokenized,
}

I realise the docs say: "Note: this value is not stored directly with the document in the index." So I guess that's why the boost field isn't shown here.

However, browsing the index in Luke shows that the boost value on each field is still set to the default 1.0. Also, empirical testing suggests the boost value I'm entering isn't taken into account at all.

Am I doing something wrong, or is the boost functionality not working? I'm running Ferret 0.9.4 with Ruby 1.8.2 on Debian sarge.

From dbalmain.ml at gmail.com Mon Jul 31 21:35:15 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 1 Aug 2006 10:35:15 +0900
Subject: [Ferret-talk] Per field boost values - possible? working?
In-Reply-To:
References:
Message-ID:

On 8/1/06, Sam Giffney wrote:
> I'm making a simple business directory search and I want to boost the
> relevance of the 'name' field over the 'address' field - both stored in
> the same document in the same index.
>
> Here is some console code to demonstrate what I am actually doing
>
> >> include Ferret::Document
> => Object
> >> doc = Document.new
> => Document {
> }
> >> doc << Field.new(:name, "Business Search", Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 2.0)
> => nil
> >> doc << Field.new("physical_address", "New Zealand", Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 1.0)
> => nil
> >> doc
> => Document {
>   stored/uncompressed,indexed,tokenized,
>   stored/uncompressed,indexed,tokenized,
> }
>
> I realise the docs say: "Note: this value is not stored directly with
> the document in the index." so I guess that's why the boost field isn't
> shown here.

The boost isn't shown here simply because I forgot to add it. It is
stored with the document when you create it. However, it isn't stored
with the document in the index. It is stored in a "norms" file. There
is a norms file for every indexed field in the index (unless you chose
Field::Index::OMIT_NORMS) and the norms file contains a single byte for
every document in the index.

> However, browsing the index in Luke shows that the boost value on each
> field is still set to the default 1.0. Also empirical testing suggests
> the boost value I'm entering isn't taken into account at all.

I'm not sure why it doesn't show up in Luke. The boost is definitely
working. I'm not sure what kinds of empirical tests you did. Try this:

require 'rubygems'
require 'ferret'
include Ferret::Index
include Ferret::Document

index = Index.new
doc = Document.new
doc << Field.new(:name, "Business Search", Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO)
index << doc
doc.field(:name).boost = 2.0
index << doc

puts "Explanation for Doc 0"
puts index.explain("business", 0)
puts ""
puts "Explanation for Doc 1"
puts index.explain("business", 1)

The explain method explains the score for a query and a particular
document. You'll notice the score is doubled for the second document.
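To make the "single byte per document" point concrete, here is a minimal pure-Ruby sketch of how a Lucene-style index folds a field boost into one norm byte. It assumes Ferret 0.9 mirrors Lucene's DefaultSimilarity (norm = field boost * 1/sqrt(number of terms in the field)) and Lucene's SmallFloat encoding, which keeps only 3 mantissa bits; the method names `encode_norm`, `decode_norm` and `field_norm` are hypothetical and not part of Ferret's API.

```ruby
# Sketch only. ASSUMES Ferret mirrors Lucene's DefaultSimilarity and its
# SmallFloat "byte315" norm encoding. Method names are hypothetical.

ZERO_EXP = (63 - 15) << 3 # shifted exponent offset used by the encoding (= 384)

# Squeeze a Float into one byte: 5 exponent bits + 3 mantissa bits.
def encode_norm(f)
  bits = [f].pack('e').unpack1('l<') # raw IEEE-754 single-precision bits
  small = bits >> 21                 # drop all but the top 3 mantissa bits
  return(bits <= 0 ? 0 : 1) if small <= ZERO_EXP # zero, negative or underflow
  return 255 if small >= ZERO_EXP + 0x100        # overflow: clamp to max byte
  small - ZERO_EXP
end

# Expand a norm byte back into a Float (lossy: only 3 mantissa bits survive).
def decode_norm(b)
  return 0.0 if b.zero?
  bits = (b << 21) + ((63 - 15) << 24)
  [bits].pack('l<').unpack1('e')
end

# Lucene-style norm: field boost times length normalisation 1/sqrt(num_terms).
def field_norm(boost, num_terms)
  encode_norm(boost / Math.sqrt(num_terms))
end

plain   = field_norm(1.0, 2) # "Business Search" is 2 terms, default boost
boosted = field_norm(2.0, 2) # the same field with boost 2.0

puts "plain:   byte #{plain}, decodes to #{decode_norm(plain)}"
puts "boosted: byte #{boosted}, decodes to #{decode_norm(boosted)}"
```

Under these assumptions the two bytes come out as 121 and 125, decoding to 0.625 and 1.25. The boosted norm is exactly twice the plain one, consistent with the doubled score in the explain output, while 1/sqrt(2) itself is rounded down from about 0.707 because so few bits are kept.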
Hope that helps,
Dave

PS: anyone interested in porting Luke to ruby? Luke won't work on
future versions of the Ferret index. I'd be happy to help out but I
don't have time to do it by myself.
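On the sorting-performance question earlier in this digest, Dave names the number of documents and the date range as the critical variables. A plausible reason, assuming Ferret's sort cache works like Lucene's string field cache: building it means walking every unique term of the sort field and every posting for that term, so the work grows with the document count and, for dates, with the number of distinct days in the range. The pure-Ruby sketch below shows that general technique only; it is not Ferret's C implementation, and the `postings` hash and `build_sort_index` method are hypothetical.

```ruby
# Hypothetical sketch of a Lucene-style string sort cache; not Ferret's code.
# postings: each unique term of the sort field => array of document numbers.
# Returns [ords, lookup]: ords[doc] is the rank of that doc's term
# (0 = doc has no value), lookup[ord] is the term for a given rank.
def build_sort_index(postings, doc_count)
  ords = Array.new(doc_count, 0)
  lookup = [nil]
  postings.keys.sort.each do |term|               # one step per unique term...
    lookup << term
    ord = lookup.size - 1
    postings[term].each { |doc| ords[doc] = ord } # ...plus one per posting
  end
  [ords, lookup]
end

# Day-resolution dates: every distinct day in the range is a separate term.
postings = {
  "20060801" => [1],
  "20010101" => [0, 3],
  "20040615" => [2]
}
ords, lookup = build_sort_index(postings, 5)

# Once the cache exists, sorting hits is plain integer comparison on ords
# (ties broken by document number).
hits = [0, 1, 2, 3, 4]
sorted = hits.sort_by { |doc| [ords[doc], doc] }
puts sorted.inspect
```

This prints `[4, 0, 3, 2, 1]`: document 4 has no date and sorts first. The expensive part is the one-off cache build, which is consistent with a preload step dominating start-up time on a large index.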