From henke at mac.se Tue Jul 1 08:43:28 2008
From: henke at mac.se (Henrik)
Date: Tue, 1 Jul 2008 14:43:28 +0200
Subject: [Ferret-talk] filter_proc problem with limit
Message-ID:

Hi list,

I have a problem grouping with ferret.

I'm using the filter_proc from Dave's book as seen below

results = {}
group_by_proc = lambda do |doc_id, score, searcher|
  doc = searcher[doc_id]
  (results[doc[:pk_file_id]]||=[]) << doc[:filename] << doc[:path]
  next true
end

The problem is that if I use this it ignores my limit clause.

I set the limit to 10 and I still get 5995 results and it takes several seconds.

How come the limit clause is ignored when using a filter_proc? How can I change this behaviour?

term = "wi"

bool1 = Ferret::Search::BooleanQuery.new()
bool1.add_query(Ferret::Search::PrefixQuery.new(:filename, "#{term}"))
bool1.add_query(Ferret::Search::PrefixQuery.new(:path, "#{term}"))

index.search(bool1, :limit => 10, :filter_proc => group_by_proc)

puts results.size
5995

Cheers,
Henke

From kraemer at webit.de Wed Jul 2 05:43:45 2008
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 2 Jul 2008 11:43:45 +0200
Subject: [Ferret-talk] Find fields beginning with?
In-Reply-To: <42EAD03C-2B3D-4F9B-B36A-1E5D0C13BEA6@oncotype.dk>
References: <9DC46304-D34A-4CC7-AE41-F4A1C8FDF484@oncotype.dk> <20080627134723.GF6614@cordoba.webit.de> <0BFA7B0E-63B8-4EA9-905F-B0798BDA6768@oncotype.dk> <9C154B9C-C4BE-4E86-A9E1-788BCF713142@mac.se> <0B69F0EE-9DA2-4DE2-930F-D41C5C63BA4F@mac.se> <30DC61DC-03AB-46E3-AA46-DE343A6E36BF@mac.se> <833D4BC2-EABB-4D09-B8B1-019AB4C698D4@mac.se> <42EAD03C-2B3D-4F9B-B36A-1E5D0C13BEA6@oncotype.dk>
Message-ID: <20080702094344.GA25820@cordoba.webit.de>

Hi!

On Mon, Jun 30, 2008 at 02:32:27PM +0200, Mattias Bodlund wrote:
> I think so. I have tried almost everything and the common misbehavior I
> get is that I keep getting hits where the query isn't in the start of the
> field but somewhere in the middle or end.

I don't think any of Ferret's default queries will solve your problem, but given that something like SpanFirstQuery exists it should also be possible to implement a SpanFirstTermQuery...

cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From henke at mac.se Fri Jul 4 05:58:00 2008
From: henke at mac.se (Henrik)
Date: Fri, 4 Jul 2008 11:58:00 +0200
Subject: [Ferret-talk] filter_proc problem with limit
In-Reply-To:
References:
Message-ID:

Has anyone tried this? Am I the only one trying to group results from ferret?

Cheers,
Henke

From jeff at boowebb.com Wed Jul 9 00:01:36 2008
From: jeff at boowebb.com (Jeff Webb)
Date: Tue, 8 Jul 2008 21:01:36 -0700
Subject: [Ferret-talk] Pagination, sorting and conditions: the combination is b
In-Reply-To: <66d3a747bc767cb7b72be166d2773179@ruby-forum.com>
References: <20080425115943.GA308@cordoba.webit.de> <66d3a747bc767cb7b72be166d2773179@ruby-forum.com>
Message-ID: <67f7f6410807082101q274513fdmbb415271506a1429@mail.gmail.com>

Sheldon/Jens,

Any chance that this has been figured out and patched?

Great analysis Sheldon!

Jeff

On Thu, May 1, 2008 at 7:24 PM, Sheldon Maloff wrote:
> Hello Jens,
>
> I think I know what's going on here, because our descending sort
> searches are broken too and I have started to investigate what's causing
> the problem and trying to fix it. I have a January '08 version of the
> trunk. I believe it's changed quite a lot since that time.
>
> Jens, I don't think it's anything you "broke" but rather an artifact of
> how MySQL works. At least, I'm using MySQL and this is the behaviour I
> see.
>
> I created 6 records, whose ids are 1 to 6 in my database. I am
> paginating on every 5 records. In my reverse sort I would expect to see
> records 6, 5, 4, 3, 2 on page 1 of the results. And id 1 on page 2 of
> the results.
>
> What I see is a method called ar_find_by_contents. It calls
> find_id_by_contents, which in turn calls ferret and returns an array.
> The array that comes back from ferret is actually correctly sorted:
>
> 6 0.928179502487183
> 5 0.928179502487183
> 4 0.928179502487183
> 3 0.928179502487183
> 2 0.928179502487183
> 1 0.928179502487183
>
> The first number is the id, the second is the rank.
>
> Now what happens is ar_find_by_contents calls retrieve_records. And
> retrieve_records produces a SELECT statement like so:
>
> SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 0, 5
>
> It took me a while to figure out that things are being passed around as
> a hash, and hence the wacky order of the ids in the IN clause. Now the
> problem with this statement is that MySQL doesn't return records in the
> order that the ids appear in the IN clause. MySQL returns records in the
> order of the Primary Key on the table, which happens to be the id
> column. So MySQL is returning records 1, 2, 3, 4, 5, 6, in that order.
> Then the LIMIT clause kicks in and truncates the results to 1 through 5.
>
> Now the rest of ar_find_by_contents valiantly tries to order the AR
> results with the rank returned by ferret (my first table above). The
> problem is, record 6, the youngest, is no longer in the results because
> LIMIT took it out. So AAF sorts records 1 through 5 descending.
>
> Following along we can see how page two returns only record 6. On page
> two, the limit changes to
>
> SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 5, 5
>
> Once again, MySQL returns records 1, 2, 3, 4, 5, 6, but this time the
> limit returns only the last record, id 6. And then AAF sorts that
> descending.
>
> I'm working on a patch for the version I have by making MySQL return only
> the correct set of records in the first place. In other words, ensuring
> that the only ids present in the IN clause are the ones that should
> appear on page 1 of the results, or page 2, or page N.
>
> So my AR query for page 1 looks like
>
> SELECT * FROM model WHERE id IN (6, 5, 4, 3, 2) LIMIT 0, 5
>
> and the AR query for page 2 looks like
>
> SELECT * FROM model WHERE id IN (1) LIMIT 0, 5
>
> I got it working, but in the process have made every other search not
> work. Funny. I'm sure I'll figure it out.
>
> Anyway, Jens, that's the gist of the problem, at least how it relates to
> MySQL. Other databases may vary.
>
> Regards
> Sheldon Maloff
> veer.com
>
>
> Jens Kraemer wrote:
> > Hi Max,
> >
> > thanks for your detailed report. Might well be that I broke one or more
> > of the various combinations of pagination / sorting / active record
> > conditions (where you might specify :order, too, btw) in trunk.
> >
> > I'll look into it asap.
> >
> > Cheers,
> > Jens
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jeff Webb
jeff at boowebb.com
http://boowebb.com/

From palasrinivasarao14 at gmail.com Wed Jul 9 02:36:18 2008
From: palasrinivasarao14 at gmail.com (Srinu Pala)
Date: Wed, 9 Jul 2008 08:36:18 +0200
Subject: [Ferret-talk] acts_as_ferret problem
In-Reply-To: <6e4724782cd65cadc7eb77ff546d5111@ruby-forum.com>
References: <6e4724782cd65cadc7eb77ff546d5111@ruby-forum.com>
Message-ID:

Srinu Pala wrote:
> Hi,
>
> I want to use the ferret plugin.
> I installed the acts_as_ferret. And installed the acts_as_ferret plugin
> also.
> when I am going to generate the controller,
> it is showing one error like
> b/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in
> `gem_original_require': no such file to load -- ferret
> (MissingSourceFile)
> from D:/Program
> Files/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in
> `require'
> from D:/Program
> Files/ruby/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in
> `require'
> from D:/Program
> Files/ruby/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:342:in
> `new_constants_in'
> from D:/Program
> Files/ruby/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in
> `require'
> from D:/Workspac
>
> please help me for the issue.
>
> srinivas rao.pala

Hi,

I had tried the latest version, and it failed to install. That problem is now resolved. If anyone runs into the same problem, install the previous version; it installs fine.

Thank you,
srinivas
--
Posted via http://www.ruby-forum.com/.

From jk at jkraemer.net Wed Jul 9 05:24:14 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Wed, 9 Jul 2008 11:24:14 +0200
Subject: [Ferret-talk] Pagination, sorting and conditions: the combination is b
In-Reply-To: <67f7f6410807082101q274513fdmbb415271506a1429@mail.gmail.com>
References: <20080425115943.GA308@cordoba.webit.de> <66d3a747bc767cb7b72be166d2773179@ruby-forum.com> <67f7f6410807082101q274513fdmbb415271506a1429@mail.gmail.com>
Message-ID: <3F873548-5896-42F1-9A4D-983D85AA9D2C@jkraemer.net>

Hi folks,

On 09.07.2008, at 06:01, Jeff Webb wrote:
> Sheldon/Jens,
>
> Any chance that this has been figured out and patched?

I just committed a fix to this problem to trunk, so pagination with sorting and AR conditions should work now.

cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From arvindsg at gmail.com Sun Jul 13 01:43:53 2008
From: arvindsg at gmail.com (arvind gautam)
Date: Sun, 13 Jul 2008 01:43:53 -0400
Subject: [Ferret-talk] AAF and Location Filter
Message-ID: <89a9739f0807122243s2530a2ck2a2aa9cffc649186@mail.gmail.com>

I am using Acts as Ferret and trying to filter my search results by location (doing a location-based search).

I came across a locationfilter in this blog post: http://blog.tourb.us/archives/ferret-and-location-based-searches

I don't really know how to use this from within my model (where I'm actually searching).

Help please?

-Arvind

From julioody at gmail.com Sun Jul 13 02:04:27 2008
From: julioody at gmail.com (Julio Cesar Ody)
Date: Sun, 13 Jul 2008 16:04:27 +1000
Subject: [Ferret-talk] AAF and Location Filter
In-Reply-To: <89a9739f0807122243s2530a2ck2a2aa9cffc649186@mail.gmail.com>
References: <89a9739f0807122243s2530a2ck2a2aa9cffc649186@mail.gmail.com>
Message-ID:

For location based search (ActiveRecord based), you might want to take a look at http://geokit.rubyforge.org/. It's so cool it supports more than one algorithm to calculate proximity.

I haven't tried to mix ActsAsFerret with Geokit. Would be interesting to know what happens.

From arvindsg at gmail.com Sun Jul 13 02:23:17 2008
From: arvindsg at gmail.com (arvind gautam)
Date: Sun, 13 Jul 2008 02:23:17 -0400
Subject: [Ferret-talk] AAF and Location Filter
In-Reply-To:
References: <89a9739f0807122243s2530a2ck2a2aa9cffc649186@mail.gmail.com>
Message-ID: <89a9739f0807122323h128d1221tdb65772bf7442eb6@mail.gmail.com>

Julio,

I've actually checked that plugin and it seems to be great. But what I'm trying to achieve is pretty much explained in that blog post, http://blog.tourb.us/archives/ferret-and-location-based-searches, à la Google Maps. I want to be able to search for a particular named location (name/description text search) within an X mile radius of a given lat/long. That's why the filter for ferret :)

-Arvind

From toastkid.williams at gmail.com Fri Jul 18 07:11:54 2008
From: toastkid.williams at gmail.com (Max Williams)
Date: Fri, 18 Jul 2008 12:11:54 +0100
Subject: [Ferret-talk] Pagination, sorting and conditions: the combination is b
In-Reply-To: <3F873548-5896-42F1-9A4D-983D85AA9D2C@jkraemer.net>
References: <20080425115943.GA308@cordoba.webit.de> <66d3a747bc767cb7b72be166d2773179@ruby-forum.com> <67f7f6410807082101q274513fdmbb415271506a1429@mail.gmail.com> <3F873548-5896-42F1-9A4D-983D85AA9D2C@jkraemer.net>
Message-ID:

Hi Jens/all

I've been playing with the latest version, and it seems that the combination of AR conditions, paginating and AR order (eg :order => "name") is fine now, even on our server, which was where it was broken before (it always worked locally, weirdly).

However, i've just tried it with results that are sorted by ferret on the basis of their boost score, and it seems broken.

I'm sorry i don't have any tests set up but if i walk through an example can anyone see something that maybe i'm doing wrong? Or shed any light?

I use AR conditions to filter the results through a set of ids that an individual user is allowed to view. I just have a method User#allowed_ids for this that returns an array of integers. For the purpose of illustration, though, let's say that the allowed ids are everything between 1000 & 2000.

allowed_ids = (1000..2000).to_a
=> [1000, 1001, 1002, etc, 2000]

For a particular search (on 'rhythm') i get these results back - the order is the order calculated by ferret/aaf on the basis of boost values: i've collected them by id for clarity. In this example i've set a big per_page to get all the results.

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page => 1000 }, {} ).collect(&:id)
=> [4038, 698, 4039, 1830, 1831, 1826, 1832, 1825, 1833, 411, 1834, 702, 1827, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 2129, 2130, 2131, 1858, 1859, 1860, 1861, 1865, 2132, 2141, 2345, 2350, 2352, 2353, 2356, 2360, 2362, 2366, 2368, 2371, 2372, 2373, 2376, 2377, 2378, 2384, 2385, 2389, 2407, 2414, 2417, 2419, 2427, 2428, 2438, 2439, 2458, 2459, 2473, 2474, 2475, 2477, 2478, 2133, 2349, 2361, 2363, 2365, 2370, 2375, 2383, 2386, 2392, 2415, 2430, 2431, 2440, 2441, 2442, 2472, 2142, 3751, 2161, 1236]

Now, when i apply the condition of only being in the allowed ids, i'd expect the remaining ids to be in the same order as above, and that is in fact the case:

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page => 1000 }, {:conditions => ["id in (?)", allowed_ids] } ).collect(&:id)
=> [1830, 1831, 1826, 1832, 1825, 1833, 1834, 1827, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 1858, 1859, 1860, 1861, 1865, 1236]

Just to check, let's do the original again (with no AR conditions) and just do set intersection with the allowed_ids

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page => 1000 }, {} ).collect(&:id) & allowed_ids
=> [1830, 1831, 1826, 1832, 1825, 1833, 1834, 1827, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 1858, 1859, 1860, 1861, 1865, 1236]

OK - looks good.
But, when i try to paginate into some actual pages, the order breaks: first, without AR conditions:

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page => 10 }, {} ).collect(&:id)
=> [4038, 698, 4039, 1830, 1831, 1826, 1832, 1825, 1833, 411]

That all seems to be in order - i get the first ten results from the big list, above.

Now, if we were to apply the allowed_ids condition here, we'd expect the results to start [1830, 1831, 1826, 1832, 1825, 1833, ...] - right? Because we should have the same ordering applied to the remaining resources, and then we get the first ten for page 1. But, the ordering is different -

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page => 10 }, {:conditions => ["id in (?)", allowed_ids] } ).collect(&:id)
=> [1826, 1825, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 1236]

So, it seems that pagination + ferret score ordering + AR conditions is a bad combination.

Again, sorry to not supply test cases but we don't use them (i know, i know!!!). Can anyone shed any light?

thanks - max

From jeff at boowebb.com Fri Jul 18 18:28:54 2008
From: jeff at boowebb.com (Jeff Webb)
Date: Fri, 18 Jul 2008 15:28:54 -0700
Subject: [Ferret-talk] Pagination, sorting and conditions: the combination is b
In-Reply-To:
References: <20080425115943.GA308@cordoba.webit.de> <66d3a747bc767cb7b72be166d2773179@ruby-forum.com> <67f7f6410807082101q274513fdmbb415271506a1429@mail.gmail.com> <3F873548-5896-42F1-9A4D-983D85AA9D2C@jkraemer.net>
Message-ID: <67f7f6410807181528s41c6393el752412ebd0ad9a49@mail.gmail.com>

Max,

Here is what I changed in the current stable version of AAF in my local plugin install to get conditions and pagination working. I have not fully tested it except for my particular use cases but it seems to work out. Sorting is not factored into these changes.

Basically if there is no pagination OR no AR conditions you can let ferret do all the work for you. If you have both of these then you need to return all results, merge with ferret IDs, then do your offset and limit. I replicated what Jens has done on the AAF trunk.

Hope this works for you. I have not heard if Jens is going to add this to the current branch as a hotfix.

Replace the existing ar_find_by_contents method in class_methods.rb with:

####################################
#changes to ar_find_by_contents AAF 0.4.3#
####################################

def ar_find_by_contents(q, options = {}, find_options = {})
  result_ids = {}
  has_conditions = !find_options[:conditions].blank? || caller.find{ |call| call =~ %r{active_record/associations} }

  # odd case - cannot do pagination combo with AR & Ferret
  # must retrieve all then paginate after
  if options[:per_page] && has_conditions
    late_paginate = true
    offset = find_options.delete(:offset)
    limit = find_options.delete(:limit)
    options.delete(:page)
    options.delete(:per_page)
    find_options.delete(:offset)
    find_options.delete(:limit)
    options[:limit] = :all
  end

  total_hits = find_id_by_contents(q, options) do |model, id, score, data|
    # stores ids, index and score of each hit for later ordering of
    # results
    result_ids[id] = [ result_ids.size + 1, score ]
  end

  result = retrieve_records( { self.name => result_ids }, find_options )

  if has_conditions
    # what we got from the database is our full result set, so take its size
    total_hits = result.length

    if late_paginate
      result = result[offset..offset+limit-1]
    end
  end

  [ total_hits, result ]
end

Jeff

--
Jeff Webb
jeff at boowebb.com
http://boowebb.com/

From toastkid.williams at gmail.com Sat Jul 19 06:51:37 2008
From: toastkid.williams at gmail.com (Max Williams)
Date: Sat, 19 Jul 2008 11:51:37 +0100
Subject: [Ferret-talk] Pagination, sorting and conditions: the combination is b
In-Reply-To: <67f7f6410807181528s41c6393el752412ebd0ad9a49@mail.gmail.com>
References: <20080425115943.GA308@cordoba.webit.de> <66d3a747bc767cb7b72be166d2773179@ruby-forum.com> <67f7f6410807082101q274513fdmbb415271506a1429@mail.gmail.com> <3F873548-5896-42F1-9A4D-983D85AA9D2C@jkraemer.net> <67f7f6410807181528s41c6393el752412ebd0ad9a49@mail.gmail.com>
Message-ID:

Hi Jeff - thanks a lot for looking at this.

I discovered that since last August, will_paginate lets you paginate any arbitrary array - so you can paginate an existing set of results.

so what i'm doing now is this

- get the results using ferret find, passing through the AR conditions and a large per page which means there's no pagination, effectively. The sorting (by ferret score) is correct.
- cache the results with memcache, using params (minus the page k-v pair) as a cache key - this means that the retrieve will be slow the first time but very quick if the user just wants to look at another page - then paginate the results, simply doing "@all_results.paginate(:page => params[:page], :per_page => 20)" This is pretty similar to your patch i think - get results with ferret and then paginate - and it's working fine for me. The size of my result set returned by ferret is rarely more than 150 big, so getting all results isn't a problem. Obviously it's not very scaleable though. thanks! max 2008/7/18 Jeff Webb : > Max, > > Here is what I changed in the current stable version of AAF in my local > plugin install to get conditions and pagination working. I have not fully > tested it except for my particular use cases but it seems to work out. > Sorting is not factored into these changes. > > Basically if there is no pagination OR no AR conditions you can let ferret > do all the work for you. If you have both of these then you need to return > all results, merge with ferret IDs, then do your offset and limit. I > replicated what Jens has done on the AAF trunk. > > Hope this works for you. I have not heard if Jens is going to add this to > the current branch as a hotfix. > > replace the existing ar_by_contents method in class_methods.rb with: > > #################################### > #changes to ar_find_by_contents AAF 0.4.3# > #################################### > > def ar_find_by_contents(q, options = {}, find_options = {}) > result_ids = {} > has_conditions = !find_options[:conditions].blank? 
|| caller.find{ > |call| call =~ %r{active_record/associations} } > > # odd case - cannot do pagination combo with AR & Ferret > # must retrieve all then paginate after > if options[:per_page] && has_conditions > late_paginate = true > offset = find_options.delete(:offset) > limit = find_options.delete(:limit) > options.delete(:page) > options.delete(:per_page) > find_options.delete(:offset) > find_options.delete(:limit) > options[:limit] = :all > end > > total_hits = find_id_by_contents(q, options) do |model, id, score, > data| > # stores ids, index and score of each hit for later ordering of > # results > result_ids[id] = [ result_ids.size + 1, score ] > end > > result = retrieve_records( { self.name => result_ids }, find_options > ) > > if has_conditions > # what we got from the database is our full result set, so take > it's size > total_hits = result.length > > if late_paginate > result = result[offset..offset+limit-1] > end > end > > [ total_hits, result ] > end > > > Jeff > > > > > On Fri, Jul 18, 2008 at 4:11 AM, Max Williams > wrote: > >> Hi Jens/all >> >> I've been playing with latest version, and it seems that the combination >> of AR conditions, paginating and AR order (eg ":order => "name") seems to be >> fine now, even on our server, which was were it was broken before (it always >> worked locally, weirdly). >> >> However, i've just tried it with results that are sorted by ferret on the >> basis of their boost score, and it seems broken. >> >> I'm sorry i don't have any tests set up but if i walk through an example >> can anyone see something that maybe i'm doing wrong? Or shed any light? >> >> I use AR conditions to filter the results through a set of ids that an >> individual user is allowed to view. I just have a method User#allowed_ids >> for this that returns an array of integers. For the purpose of >> illustration, though, let's say that the allowed ids are everything between >> 1000 & 2000. 
>> allowed_ids = (1000..2000).to_a >> => [1000, 1001, 1002, etc, 2000] >> >> For a particular search (on 'rhythm') i get these results back - the order >> is the order calculated by ferret/aaf on the basis of boost values: i've >> collected them by id for clarity. In this example i've set a big per_page >> to get all the results. >> >> >> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page >> => 1000 }, {} ).collect(&:id) >> => [4038, 698, 4039, 1830, 1831, 1826, 1832, 1825, 1833, 411, 1834, 702, >> 1827, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 2129, 2130, 2131, 1858, >> 1859, 1860, 1861, 1865, 2132, 2141, 2345, 2350, 2352, 2353, 2356, 2360, >> 2362, 2366, 2368, 2371, 2372, 2373, 2376, 2377, 2378, 2384, 2385, 2389, >> 2407, 2414, 2417, 2419, 2427, 2428, 2438, 2439, 2458, 2459, 2473, 2474, >> 2475, 2477, 2478, 2133, 2349, 2361, 2363, 2365, 2370, 2375, 2383, 2386, >> 2392, 2415, 2430, 2431, 2440, 2441, 2442, 2472, 2142, 3751, 2161, 1236] >> >> Now, when i apply the condition of only being in the allowed ids, i'd >> expect the remaining ids to be in the same order as above, and that is in >> fact the case: >> >> >> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page >> => 1000 }, {:conditions => ["id in (?)", allowed_ids] } ).collect(&:id) >> => [1830, 1831, 1826, 1832, 1825, 1833, 1834, 1827, 1689, 1680, 1688, >> 1679, 1686, 1684, 1676, 1858, 1859, 1860, 1861, 1865, 1236] >> >> Just to check, let's do the original again (with no AR conditions) and >> just do set intersection with the allowed_ids >> >> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page >> => 1000 }, {} ).collect(&:id) & allowed_ids >> => [1830, 1831, 1826, 1832, 1825, 1833, 1834, 1827, 1689, 1680, 1688, >> 1679, 1686, 1684, 1676, 1858, 1859, 1860, 1861, 1865, 1236] >> >> OK - looks good. 
But, when i try to paginate into some actual pages, the
>> order breaks: first, without AR conditions:
>>
>> >> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page => 10 }, {} ).collect(&:id)
>> => [4038, 698, 4039, 1830, 1831, 1826, 1832, 1825, 1833, 411]
>>
>> That all seems to be in order - i get the first ten results from the big
>> list, above.
>>
>> Now, if we were to apply the allowed_ids condition here, we'd expect the
>> results to start
>> [1830, 1831, 1826, 1832, 1825, 1833, ...] - right? Because we should have
>> the same ordering applied to the remaining resources, and then we get the
>> first ten for page 1. But, the ordering is different -
>>
>> >> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page => 10 }, {:conditions => ["id in (?)", allowed_ids] } ).collect(&:id)
>> => [1826, 1825, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 1236]
>>
>> So, it seems that pagination + ferret score ordering + AR conditions is a
>> bad combination
>>
>> Again, sorry to not supply test cases but we don't use them (i know, i
>> know!!!). Can anyone shed any light?
>>
>> thanks - max
>>
>> 2008/7/9 Jens Kraemer:
>>
>>> Hi folks,
>>>
>>> On 09.07.2008, at 06:01, Jeff Webb wrote:
>>>
>>>> Sheldon/Jens,
>>>>
>>>> Any chance that this has been figured out and patched?
>>>>
>>>
>>> I just committed a fix to this problem to trunk, so pagination with
>>> sorting and AR conditions should work now.
>>>
>>> cheers,
>>> Jens
>>>
>>> --
>>> Jens Krämer
>>> http://www.jkraemer.net/ - Blog
>>> http://www.omdb.org/ - The new free film database
>>>
>>> _______________________________________________
>>> Ferret-talk mailing list
>>> Ferret-talk at rubyforge.org
>>> http://rubyforge.org/mailman/listinfo/ferret-talk
>>>
>
> --
>
> Jeff Webb
> jeff at boowebb.com
> http://boowebb.com/

From john at johnleach.co.uk Mon Jul 28 06:55:11 2008
From: john at johnleach.co.uk (John Leach)
Date: Mon, 28 Jul 2008 11:55:11 +0100
Subject: [Ferret-talk] acts_as_ferret server_manager fails under Capistrano
Message-ID: <1217242511.27100.13.camel@dogen.thepride.>

Hi,

in lib/server_manager.rb we have lines 38 and 39:

#require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
require(File.join(File.dirname(ENV['_']), '../config/environment'))

This works fine if I run it in a normal shell but borks when the server
is run using Capistrano:

** [out :: server-001.vm.brightbox.net] no such file to load -- /bin/../config/environment

Confirmed here by someone else too:

http://www.zorched.net/2008/06/19/capistrano-and-ferret-drb/

Any idea why ENV['_'] is being used here rather than __FILE__ ? Looks
a bit like it's supposed to work better when aaf is installed as a gem,
rather than as a plugin in the vendor dir (as I'm using it here).

Actually, it looks like this is good behaviour by aaf and bad behaviour
by Capistrano, but was wondering if anyone can shed any light on it.

Thanks,

John.
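
[Archive note] A minimal sketch of how line 39 can end up requiring /bin/../config/environment. It assumes ENV['_'] holds the path of whatever program launched the process; environment_path is a hypothetical helper written for illustration, not part of acts_as_ferret, and script/ferret_server is an assumed launcher path.

```ruby
# Hypothetical helper: builds the path that line 39 of server_manager.rb
# would pass to require, given a value for ENV['_'].
def environment_path(launcher)
  File.join(File.dirname(launcher), '../config/environment')
end

# If the process was launched through a shell wrapper, '_' may point at the shell:
puts environment_path('/bin/sh')

# If '_' points at a script inside the application (assumed path):
puts environment_path('script/ferret_server')
```

The first call yields the path seen in the error message above; the second resolves relative to the application's script directory, which is presumably what the line was written to achieve.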
--
http://johnleach.co.uk

From henke at mac.se Mon Jul 28 07:29:57 2008
From: henke at mac.se (Henrik)
Date: Mon, 28 Jul 2008 13:29:57 +0200
Subject: [Ferret-talk] Grouping with ferret
Message-ID:

Hi list,

I have a problem grouping with ferret.

I'm using the filter_proc from Dave's book as seen below

results = {}
group_by_proc = lambda do |doc_id, score, searcher|
  doc = searcher[doc_id]
  (results[doc[:pk_file_id]]||=[]) << doc[:filename] << doc[:path]
  next true
end

The problem is that if I use this it ignores my limit clause.

I set limit on 10 and I still get 5995 results and it takes several
seconds.

How come the limit clause is ignored when using a filter_proc? How
can I change this behaviour?

term = "wi"
bool1 = Ferret::Search::BooleanQuery.new()
bool1.add_query(Ferret::Search::PrefixQuery.new(:filename, "#{term}"))
bool1.add_query(Ferret::Search::PrefixQuery.new(:path, "#{term}"))
index.search(bool1, :limit => 10, :filter_proc => group_by_proc)
puts results.size
5995

Cheers,
Henke

From john at johnleach.co.uk Mon Jul 28 08:19:31 2008
From: john at johnleach.co.uk (John Leach)
Date: Mon, 28 Jul 2008 13:19:31 +0100
Subject: [Ferret-talk] acts_as_ferret server_manager fails under Capistrano
In-Reply-To: <1217242511.27100.13.camel@dogen.thepride.>
References: <1217242511.27100.13.camel@dogen.thepride.>
Message-ID: <1217247571.27100.27.camel@dogen.thepride.>

Fixed!

From the bash man page:

"When bash invokes an external command, the variable _ is set to the
full file name of the command and passed to that command in its
environment."

cap> on server-009.vm.brightbox.net ruby -e "puts ENV['_']"
 ** [out :: server-009.vm.brightbox.net] /bin/sh

Found a solution though. Setting the following option in my Capistrano
recipe:

default_run_options[:shell] = false

gets me:

cap> on server-009.vm.brightbox.net ruby -e "puts ENV['_']"
 ** [out :: server-009.vm.brightbox.net] /usr/bin/ruby

Now the ferret_server runs properly.
This change hasn't affected
anything else for me - according to the Capistrano 2.1 announcement, if
your default shell is POSIX compliant it should be fine (most cases I'd
guess this is true, unless you have some weird setup).

http://groups.google.com/group/capistrano/browse_thread/thread/531ad32aff5fe5a8

For reference, the code to use ENV['_'] was added here:

http://projects.jkraemer.net/acts_as_ferret/ticket/185

Hope that helps someone.

John.

On Mon, 2008-07-28 at 11:55 +0100, John Leach wrote:
> Hi,
>
> in lib/server_manager.rb we have lines 38 and 39:
>
> #require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
> require(File.join(File.dirname(ENV['_']), '../config/environment'))
>
> This works fine if I run it in a normal shell but borks when the server
> is run using Capistrano:
>
> ** [out :: server-001.vm.brightbox.net] no such file to load -- /bin/../config/environment
>
> Confirmed here by someone else too:
>
> http://www.zorched.net/2008/06/19/capistrano-and-ferret-drb/
>
> Any idea why ENV['_'] is being used here rather than __FILE__ ? Looks
> a bit like it's supposed to work better when aaf is installed as a gem,
> rather than as a plugin in the vendor dir (as I'm using it here).
>
> Actually, it looks like this is good behaviour by aaf and bad behaviour
> by Capistrano, but was wondering if anyone can shed any light on it.
>
> Thanks,
>
> John.
> --
--
http://johnleach.co.uk

From julioody at gmail.com Mon Jul 28 23:16:23 2008
From: julioody at gmail.com (Julio Cesar Ody)
Date: Tue, 29 Jul 2008 13:16:23 +1000
Subject: [Ferret-talk] acts_as_ferret server_manager fails under Capistrano
In-Reply-To: <1217247571.27100.27.camel@dogen.thepride.>
References: <1217242511.27100.13.camel@dogen.thepride.> <1217247571.27100.27.camel@dogen.thepride.>
Message-ID:

I've come across that same issue a few times. I solved it by
uncommenting line #38 and commenting out line #39.

On Mon, Jul 28, 2008 at 10:19 PM, John Leach wrote:
> Fixed!
>
> From the bash man page:
>
> "When bash invokes an external command, the variable _ is set to the
> full file name of the command and passed to that command in its
> environment."
>
> cap> on server-009.vm.brightbox.net ruby -e "puts ENV['_']"
>  ** [out :: server-009.vm.brightbox.net] /bin/sh
>
> Found a solution though. Setting the following option in my Capistrano
> recipe:
>
> default_run_options[:shell] = false
>
> gets me:
>
> cap> on server-009.vm.brightbox.net ruby -e "puts ENV['_']"
>  ** [out :: server-009.vm.brightbox.net] /usr/bin/ruby
>
> Now the ferret_server runs properly. This change hasn't affected
> anything else for me - according to the Capistrano 2.1 announcement, if
> your default shell is POSIX compliant it should be fine (most cases I'd
> guess this is true, unless you have some weird setup).
>
> http://groups.google.com/group/capistrano/browse_thread/thread/531ad32aff5fe5a8
>
> For reference, the code to use ENV['_'] was added here:
>
> http://projects.jkraemer.net/acts_as_ferret/ticket/185
>
> Hope that helps someone.
>
> John.
>
> On Mon, 2008-07-28 at 11:55 +0100, John Leach wrote:
>> Hi,
>>
>> in lib/server_manager.rb we have lines 38 and 39:
>>
>> #require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
>> require(File.join(File.dirname(ENV['_']), '../config/environment'))
>>
>> This works fine if I run it in a normal shell but borks when the server
>> is run using Capistrano:
>>
>> ** [out :: server-001.vm.brightbox.net] no such file to load -- /bin/../config/environment
>>
>> Confirmed here by someone else too:
>>
>> http://www.zorched.net/2008/06/19/capistrano-and-ferret-drb/
>>
>> Any idea why ENV['_'] is being used here rather than __FILE__ ? Looks
>> a bit like it's supposed to work better when aaf is installed as a gem,
>> rather than as a plugin in the vendor dir (as I'm using it here).
>>
>> Actually, it looks like this is good behaviour by aaf and bad behaviour
>> by Capistrano, but was wondering if anyone can shed any light on it.
>>
>> Thanks,
>>
>> John.
>> --
>
> --
> http://johnleach.co.uk

From jeremy at hinegardner.org Tue Jul 29 00:08:47 2008
From: jeremy at hinegardner.org (Jeremy Hinegardner)
Date: Mon, 28 Jul 2008 22:08:47 -0600
Subject: [Ferret-talk] trac and svn are not responding
Message-ID: <20080729040847.GG12417@hinegardner.org>

Looks like trac and SVN are down.

- Trac reports 502 Bad Gateway - http://ferret.davebalmain.com/trac
- SVN reports 'Connection refused'

API Docs at http://ferret.davebalmain.com/api/files/README.html are still
available though.

enjoy,

-jeremy

--
========================================================================
Jeremy Hinegardner jeremy at hinegardner.org

From jk at jkraemer.net Tue Jul 29 05:04:41 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Tue, 29 Jul 2008 11:04:41 +0200
Subject: [Ferret-talk] Grouping with ferret
In-Reply-To:
References:
Message-ID:

Hi!

On 28.07.2008, at 13:29, Henrik wrote:

> Hi list,
>
> I have a problem grouping with ferret.
>
> I'm using the filter_proc from Dave's book as seen below
>
> results = {}
> group_by_proc = lambda do |doc_id, score, searcher|
>   doc = searcher[doc_id]
>   (results[doc[:pk_file_id]]||=[]) << doc[:filename] << doc[:path]
>   next true
> end
>
>
> The problem is that if I use this it ignores my limit clause.
>
> I set limit on 10 and I still get 5995 results and it takes several
> seconds.
>
> How come the limit clause is ignored when using a filter_proc? How
> can I change this behaviour?

Filters are applied by Ferret before the result is limited, that's why
your filter gets to see all possible results regardless of the limit
you specify.
If it was implemented the other way around, first limiting and then
filtering, you would possibly end up with fewer than limit results in case
your filter actually filtered out any results. Of course in your case
this wouldn't happen, as your filter does no filtering but always returns
true.

If you really only want the first 10 results, why don't you just use the
results you get back and do your result collecting there, like this?

results = {}
hit_count = index.search_each(query, :limit => 10) do |doc_id, score|
  # search_each yields the document id, so load the stored document first
  doc = index[doc_id]
  (results[doc[:pk_file_id]]||=[]) << doc[:filename] << doc[:path]
end

You could of course also return false in your filter_proc for every
possible hit once your results collection has reached the desired size,
to save the time spent collecting all results.

cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From durante.dev at mac.com Thu Jul 31 18:47:32 2008
From: durante.dev at mac.com (E Durante)
Date: Thu, 31 Jul 2008 15:47:32 -0700
Subject: [Ferret-talk] Monitor ferret Drb with launchd on OS X
Message-ID: <2B7976A7-DE9D-4475-B519-70E17B9D4655@mac.com>

Has anyone tried monitoring the ferret Drb server with launchd (OS X)?
Does anyone know if there is any limitation that would prevent ferret
from being monitored by launchd?

I have tried many different plists and none work. One starts too many
ferret servers and one cannot start any server.

If anyone has experience with this, any information would be most
appreciated.

Thanks.
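
[Archive note] On the "Grouping with ferret" thread above: Jens's last suggestion (returning false from the filter_proc once enough results have been collected) might look roughly like the sketch below. It runs without Ferret: fake_searcher is a hypothetical stand-in for the searcher object Ferret would pass to the proc, the sample documents are invented, and the select loop only imitates Ferret offering every matching document to the filter.

```ruby
# Hypothetical stand-in for a Ferret searcher: doc_id => stored fields.
fake_searcher = {}
25.times do |i|
  fake_searcher[i] = {
    :pk_file_id => i,
    :filename   => "win#{i}.txt",
    :path       => "/files/win#{i}.txt"
  }
end

limit   = 10
results = {}

group_by_proc = lambda do |doc_id, score, searcher|
  # Once enough groups are collected, reject every further hit
  # without loading its document.
  next false if results.size >= limit

  doc = searcher[doc_id]
  (results[doc[:pk_file_id]] ||= []) << doc[:filename] << doc[:path]
  true
end

# Imitate the search offering all 25 matching documents to the filter.
accepted = fake_searcher.keys.select do |doc_id|
  group_by_proc.call(doc_id, 1.0, fake_searcher)
end

puts "accepted #{accepted.size} hits, #{results.size} groups"
```

Note that the proc would presumably still be invoked for every hit; the saving comes from skipping the document load and the collecting once the limit is reached. In this simple form, later hits belonging to an already collected group are dropped as well.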