From john at digitalpulp.com Wed Jan 2 14:30:23 2008
From: john at digitalpulp.com (John Bachir)
Date: Wed, 2 Jan 2008 14:30:23 -0500
Subject: [Ferret-talk] utility of default_field
Message-ID:

The documentation* states that when using a single index for multiple
models, the default_field list should be set to the same thing for
all models.

However, in my application, all my models have very different fields
and this is not possible. I still want the results returned sorted by
term frequency across all indexed content in each model.

What is the purpose of default_field? Under what multi-model
circumstance, if any, is it not necessary to use it?

Thanks,
John

*http://projects.jkraemer.net/rdoc/acts_as_ferret/classes/ActsAsFerret/ActMethods.html#M000009

From jk at jkraemer.net Thu Jan 3 10:38:59 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Thu, 3 Jan 2008 16:38:59 +0100
Subject: [Ferret-talk] utility of default_field
In-Reply-To:
References:
Message-ID: <20080103153859.GT26886@thunder.jkraemer.net>

Hi!

On Wed, Jan 02, 2008 at 02:30:23PM -0500, John Bachir wrote:
> The documentation* states that when using a single index for multiple
> models, the default_field list should be set to the same thing for
> all models.
>
> However, in my application, all my models have very different fields
> and this is not possible. I still want the results returned sorted by
> term frequency across all indexed content in each model.

Short answer:

It's safe for you to specify the same large :default_field list containing
fields from all models in all your acts_as_ferret calls. aaf doesn't use
this list but only hands it through to Ferret's query parser which uses
it to expand queries that have no fields specified.

> What is the purpose of default_field? Under what multi-model
> circumstance, if any, is it not necessary to use it?

Long answer:

The default_field option determines which fields Ferret will search for
when there is no explicit field specified in a query.

Suppose your index has the fields :id and :text (with id being
untokenized). With an empty default_field value (or '*', which means the
same), and a :or_default value of false (as aaf sets it) you get parsed
queries like this:

  'tree'
  --> 'id:tree text:tree'

  'some tree' (meaning some AND tree because or_default == false)
  --> '+(id:some) +(id:tree text:tree)'

With 'some' being a stop word, one would expect the second query to
yield the same result as the first one, but since the query is run
against all fields, including :id, which is untokenized and therefore
has no analyzer, we end up querying our id field with a required term
query and get no result at all.

I remember there has been some debate about this topic a year ago or so,
and in theory it would be possible for Ferret to parse queries the other
way around to work around this issue, but afair Dave brought up some good
reasons to leave it as it is.

The solution is to tell Ferret which fields to search when no fields are
specified for a query (or part of a query) with the :default_field
option. Usually aaf does this automatically by collecting all tokenized
fields from the model. Now with a shared index there are n models but
one index, so here we need to have a joint list of all tokenized fields
across all these models for the :default_field parameter.

Since aaf is called in every single model, I didn't find an easy way to
build this list automatically and decided to leave it up to the user to
specify this list in the acts_as_ferret calls of every model. Not really
DRY indeed. Patches welcome ;-)

Here's a small script reproducing the issue:
http://pastie.caboo.se/134443

So to summarize:

You need to specify :default_field if you're using :single_index => true
in combination with :or_default => false (aaf default) and you have
queries that may contain stop words and that are not constrained to a
list of fields specified in the query string.

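One way to keep the shared list less repetitive might look like the
following. This is only a rough sketch under stated assumptions, not aaf
code: the model names, the field names, MODEL_FIELDS and
SHARED_DEFAULT_FIELDS are all hypothetical, and the snippet is plain Ruby
so that it runs without Rails or Ferret. Only the :single_index and
:default_field options come from the discussion above.

```ruby
# Hypothetical description of which fields each model indexes.
# true marks a tokenized field, false an untokenized one (such as :id).
MODEL_FIELDS = {
  'Article' => { :title => true, :body => true, :id => false },
  'Comment' => { :title => true, :text => true, :id => false }
}

# Build the joint list of all tokenized fields across all models that
# share the index, dropping duplicates and untokenized fields.
def joint_default_fields(model_fields)
  model_fields.values.flat_map do |fields|
    fields.select { |_name, tokenized| tokenized }.keys
  end.uniq
end

SHARED_DEFAULT_FIELDS = joint_default_fields(MODEL_FIELDS)

puts SHARED_DEFAULT_FIELDS.inspect

# Assumed usage: every model with :single_index => true would pass
# SHARED_DEFAULT_FIELDS as its :default_field value in its
# acts_as_ferret call, so all models hand Ferret the same list.
```

The point of the single constant is that the untokenized :id field never
ends up in the list, so a stop word in an unqualified query is not turned
into a required term query against :id.
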
Cheers, Jens -- Jens Kr?mer http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From john at digitalpulp.com Thu Jan 3 11:06:24 2008 From: john at digitalpulp.com (John Bachir) Date: Thu, 3 Jan 2008 11:06:24 -0500 Subject: [Ferret-talk] utility of default_field In-Reply-To: <20080103153859.GT26886@thunder.jkraemer.net> References: <20080103153859.GT26886@thunder.jkraemer.net> Message-ID: <0471BFAB-F709-4817-B4A2-17AE2EF5ED6C@digitalpulp.com> On Jan 3, 2008, at 10:38 AM, Jens Kraemer wrote: > You need to specify :default_field if you're using :single_index => > true > in combination with :or_default => false (aaf default) and you have > queries that may contain stop words and that are not constrained to a > list of fields specified in the query string. Thank you Jens for your elaborate response. Our code removes stop words from all queries before sending them to AAF. In this case, would the lack of setting default_field ever be a problem? Perhaps this is why we have not seen problems even though we have never set default_field. Cheers, John From jk at jkraemer.net Thu Jan 3 13:27:51 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Thu, 3 Jan 2008 19:27:51 +0100 Subject: [Ferret-talk] utility of default_field In-Reply-To: <0471BFAB-F709-4817-B4A2-17AE2EF5ED6C@digitalpulp.com> References: <20080103153859.GT26886@thunder.jkraemer.net> <0471BFAB-F709-4817-B4A2-17AE2EF5ED6C@digitalpulp.com> Message-ID: <20080103182751.GW26886@thunder.jkraemer.net> On Thu, Jan 03, 2008 at 11:06:24AM -0500, John Bachir wrote: > > On Jan 3, 2008, at 10:38 AM, Jens Kraemer wrote: > > > You need to specify :default_field if you're using :single_index => > > true > > in combination with :or_default => false (aaf default) and you have > > queries that may contain stop words and that are not constrained to a > > list of fields specified in the query string. > > Thank you Jens for your elaborate response. 
> > Our code removes stop words from all queries before sending them to > AAF. In this case, would the lack of setting default_field ever be a > problem? Perhaps this is why we have not seen problems even though we > have never set default_field. exactly, in this case you shouldn't have any problems. Cheers, Jens -- Jens Kr?mer http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From imotic at gmail.com Thu Jan 3 14:53:48 2008 From: imotic at gmail.com (Liam Morley) Date: Thu, 3 Jan 2008 20:53:48 +0100 Subject: [Ferret-talk] Problem (with fix?) for passing :multi => [ ] In-Reply-To: References: Message-ID: <12529fbe5d5b8377406e8cef15e68c85@ruby-forum.com> Max, seems to me that your code would help to make things more robust, if it's a public API then it should stand up to a little abuse in ways not intended. maybe more of an enhancement than a bug fix, but a fix just the same. Max Williams wrote: > I'm using acts_as_ferret, passing through multiple classes using :multi > and also some conditions. The way i'm doing this, which seems a bit > dirty (but it's the only way i can work out) is to call find_by_contents > on the first element in an array of classes (called 'classes' here), and > then point :multi at the rest of the array of classes: > > @search_results = classes.first.find_by_contents(@search.term, > #(ferret) options > {:page => (params[:page] || 1), > :per_page => 10, > :multi => classes.slice(1, classes.length) }, > #find_options > {:conditions => ["id = ?", 99]} > ) > > However, if classes has only one class in it, then :multi points to an > empty array, which seems to confuse the find_by_contents method - for > example, when doing a search on a single class, with a condition that > should return 1 result, 'total_hits' equals 1 but @search_results is > empty. 
> > Following the code through method calls in the class_methods file in > acts_as_ferret, it seems that having a value for :multi but not having > any classes in it causes some confusion. > > To get around this, i added the following line at the start of the > find_with_ferret method (which is an alias for find_by_contents) - > > options.delete :multi if options[:multi] == [] > > This removes the confusion and everything's fine as far as i can tell. > My question is this - should what i've done be considered a bug fix for > find_with_ferret, or is the fault mine for sometimes pointing :multi at > an empty array in my call? -- Posted via http://www.ruby-forum.com/. From imotic at gmail.com Thu Jan 3 15:35:53 2008 From: imotic at gmail.com (Liam Morley) Date: Thu, 3 Jan 2008 15:35:53 -0500 Subject: [Ferret-talk] properly escaping special characters in AAF? Message-ID: <3f4e3f140801031235n281a2f91g4c4052f42dff6a9d@mail.gmail.com> For most cases, I've got search working in Rails as follows: ## controller: term = params[:search][:term] @results = MyModel.find_by_contents "#{term}*" The '*' character is appended to the search term so that searches match anything that begins with 'term'. For the most part, this is great, but let's say term is equal to "Title: Some subtitle". This will match anything that has a 'title' attribute equal to "some subtitle", instead of any attribute equal to "Title: Some subtitle", which is what I'm hoping for. If I run my search from within a double-quotes expression, like MyModel.find_by_contents "'\"#{term}*\"'", then it looks like I can get matches for "Title: Some subtitle", but I can't get matches if I search for "Titl" without the 'e', presumably because the '*' is escaped as well? I'm not quite sure. I want something that works in all cases, where I can include a search term that has a special character, but still get matches when my search term isn't equal to an entire word. 
I'm hoping that my situation is a typical one, and that someone out
there has already dealt with this? Thanks very much for any advice.

Liam Morley

From julioody at gmail.com Thu Jan 3 18:11:57 2008
From: julioody at gmail.com (Julio Cesar Ody)
Date: Fri, 4 Jan 2008 10:11:57 +1100
Subject: [Ferret-talk] properly escaping special characters in AAF?
In-Reply-To: <3f4e3f140801031235n281a2f91g4c4052f42dff6a9d@mail.gmail.com>
References: <3f4e3f140801031235n281a2f91g4c4052f42dff6a9d@mail.gmail.com>
Message-ID:

Hey,

these are two separate problems in fact. Let me try to explain.

Throwing a query like "title: foo bar" straight into Ferret, when that
gets turned into FQL, it becomes basically "look for 'foo bar' in the
'title' field", as you figured.

Quoting the whole lot will throw the query towards the index, and the
default_field value will be used to decide which fields should be
queried for "title: foo bar". Since it defaults to "*", it'll query all
fields in every document.

Now the reason why a query like "Title: foo bar" won't match any results
with "title" in it is, simply put, the analyzer you're using. If you're
using the StandardAnalyzer (if you didn't specify otherwise, then that's
what you're using), the behavior you can expect is it will catch whole
words, separated by spaces, minus stop words (or, and, by, etc...). So
"titl" will never match "title".

If you're looking for something that gives you half-string matches, I'd
go for a RegexpAnalyzer and use a regex like "*", which would turn every
character into a token. This is a bit nightmarish because you'll get an
insane number of matches for everything, but right now I can't think of
a better way (maybe declare a minimum number of chars for a query and
filter out results with very low score?).

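For the first of the two problems (special characters in the user's
term), a possible approach is to escape the term before appending the
wildcard. The following is a hedged sketch in plain Ruby: escape_fql,
prefix_query and the FQL_SPECIAL list are hypothetical names, and the
set of special characters and the backslash escaping are assumptions
based on Ferret's query language documentation that should be checked
against the Ferret version in use.

```ruby
# Characters assumed to be special in Ferret's query language (FQL).
FQL_SPECIAL = /([:()\[\]{}!+"~^\-|<>=*?\\])/

# Put a backslash in front of every special character in the input.
def escape_fql(term)
  term.gsub(FQL_SPECIAL) { "\\#{$1}" }
end

# Escape the user's term first, then append the wildcard, so that only
# our own trailing '*' keeps its special meaning.
def prefix_query(term)
  "#{escape_fql(term.strip)}*"
end

puts prefix_query("Title: Some subtitle")
```

The resulting string would then be passed to find_by_contents instead of
the raw "#{term}*". Note that the wildcard only applies to the last word
of the term, and whether the escaped words match still depends on the
analyzer, as described above.
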
Or if you're looking for stemming (query for "titles", "titling" returning results with "title"), have a look at http://rubyforge.org/pipermail/ferret-talk/2007-March/002782.html Hope that helps. On Jan 4, 2008 7:35 AM, Liam Morley wrote: > For most cases, I've got search working in Rails as follows: > ## controller: > term = params[:search][:term] > @results = MyModel.find_by_contents "#{term}*" > > The '*' character is appended to the search term so that searches match > anything that begins with 'term'. For the most part, this is great, but > let's say term is equal to "Title: Some subtitle". This will match anything > that has a 'title' attribute equal to "some subtitle", instead of any > attribute equal to "Title: Some subtitle", which is what I'm hoping for. > > If I run my search from within a double-quotes expression, like > MyModel.find_by_contents "'\"#{term}*\"'", then it looks like I can get > matches for "Title: Some subtitle", but I can't get matches if I search for > "Titl" without the 'e', presumably because the '*' is escaped as well? I'm > not quite sure. > > I want something that works in all cases, where I can include a search term > that has a special character, but still get matches when my search term > isn't equal to an entire word. I'm hoping that my situation is a typical > one, and that someone out there has already dealt with this? Thanks very > much for any advice. > > Liam Morley > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From ndaniels at mac.com Thu Jan 3 20:36:31 2008 From: ndaniels at mac.com (Noah M. 
Daniels) Date: Thu, 3 Jan 2008 20:36:31 -0500 Subject: [Ferret-talk] Strange wildcard problem In-Reply-To: <355290B8-BD7C-45F3-A1FC-CE6D15ABDBD1@mac.com> References: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> <20071106090616.GD30619@cordoba.webit.de> <7435d282c6ea369eacfd1c1775bc341d@ruby-forum.com> <20071106163554.GD2040@cordoba.webit.de> <355290B8-BD7C-45F3-A1FC-CE6D15ABDBD1@mac.com> Message-ID: <9CD4D929-EEBC-4059-8ACB-0A7694F82435@mac.com> Just a 'ping' since I still haven't been able to solve this without doing what I don't want to do (putting this into my local copy of ferret itself) Setting this in init.rb in the acts_as_ferret plugin does nothing. Does anyone have a suggestion for where it would work? thanks! On Nov 6, 2007, at 11:41 AM, Noah M. Daniels wrote: > Unfortunately, it doesn't seem to. For a local index, I can just put > this anywhere in code (even in a controller, or in the console) and > I start getting correct results from my query: > > Ferret::Search::MultiTermQuery.default_max_terms = 5000 > > but on my staging server, where a drb ferret server is used, putting > that line in the init.rb doesn't seem to do anything -- in fact, > even putting it into the initialize method of the LocalIndex class > doesn't help! Any ideas? > > thanks! 
From lebreeze at gmail.com Fri Jan 4 11:50:00 2008 From: lebreeze at gmail.com (Levent Ali) Date: Fri, 4 Jan 2008 17:50:00 +0100 Subject: [Ferret-talk] Records not in search results until I recreate record index Message-ID: our object A has three properties title - a straight db field, open_for_search - a method which returns yes or no site_search - which is a list of codes based on a has_many relationship so our api accepts a title and a list of sites creates object A with object = Object.new object.title = thetitle sites = Site.find_all_by_whatever([whatever]) object.save object.sites << sites Then because sites were added after the object was created we do object.ferret_create If we do Object.find_by_contents('title:thetitle AND site_search:sitecode') we get nothing If we do Object.find_by_contents('title:thetitle') we get the record and if we do Object.find_by_contents('title:thetitle').first.to_doc the site_search field is populated correctly in ferret We have to now manually do ferret_create again from script/console before the query returns the results... Any ideas? -- Posted via http://www.ruby-forum.com/. From john at digitalpulp.com Fri Jan 4 16:59:09 2008 From: john at digitalpulp.com (John Bachir) Date: Fri, 4 Jan 2008 16:59:09 -0500 Subject: [Ferret-talk] utility of default_field In-Reply-To: <20080103153859.GT26886@thunder.jkraemer.net> References: <20080103153859.GT26886@thunder.jkraemer.net> Message-ID: <7AC39732-7C19-4B4E-8C10-293A6A704044@digitalpulp.com> i added your comments to the wiki: http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage? action=diff&version=11 On Jan 3, 2008, at 10:38 AM, Jens Kraemer wrote: > Hi! > > On Wed, Jan 02, 2008 at 02:30:23PM -0500, John Bachir wrote: >> The documentation* states that when using a single index for multiple >> models, the default_field list should be set to the same thing for >> all models. 
>> >> However, in my application, all my models have very different fields >> and this is not possible. I still want the results returned sorted by >> term frequency across all indexed content in each model. > > Short answer: > > It's safe for you to specify the same large :default_field list > containing > fields from all models in all your acts_as_ferret calls. aaf > doesn't use > this list but only hands it through to Ferret's query parser which > uses > it to expand queries that have no fields specified. > >> What is the purpose of default_field? Under what multi-model >> circumstance, if any, is it not necessary to use it? > > Long answer: > > The default_field option determines which fields Ferret will search > for > when there is no explicit field specified in a query. > > Suppose your index has the fields :id and :text (with id being > untokenized). With an empty default_field value (or '*', which > means the > same), and a :or_default value of false (as aaf sets it) you get > parsed > queries like this: > > 'tree' > --> 'id:tree text:tree' > > 'some tree' (meaning some AND tree because or_default == false) > --> '+(id:some) +(id:tree text:tree)' > > With 'some' being a stop word, one would expect the second query to > yield the same result as the first one, but since the query is run > against all fields, including :id, which is untokenized and therefore > has no analyzer, we end up querying our id field with a required term > query and get no result at all. > > I remember there has been some debate about this topic a year ago > or so, > and in theory it would be possible for Ferret to parse queries the > other way > around to work around this issue, but afair Dave brought up some good > reasons to leave it as it is. > > The solution is to tell Ferret which fields to search when no > fields are > specified for a query (or part of a query) with the :default_field > option. 
Usually aaf does this automatically by collecting all > tokenized > fields from the model. Now with a shared index there are n models but > one index, so here we need to have a joint list of all tokenized > fields > across all these models for the :default_field parameter. > > Since aaf is called in every single model, I didn't find an easy > way to > build this list automatically and decided to leave it up to the > user to > specify this list in the acts_as_ferret calls of every model. Not > really > DRY indeed. Patches welcome ;-) > > Here's a small script reproducing the issue: > http://pastie.caboo.se/134443 > > So to summarize: > > You need to specify :default_field if you're using :single_index => > true > in combination with :or_default => false (aaf default) and you have > queries that may contain stop words and that are not constrained to a > list of fields specified in the query string. > > > Cheers, > Jens > > > > > -- > Jens Kr?mer > http://www.jkraemer.net/ - Blog > http://www.omdb.org/ - The new free film database > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From lars.heese at googlemail.com Sun Jan 6 13:16:48 2008 From: lars.heese at googlemail.com (Lars Heese) Date: Sun, 6 Jan 2008 19:16:48 +0100 Subject: [Ferret-talk] Did you mean ...? with act_as_ferret Message-ID: Hello, does anybody know how to implement a "Did you mean ...?" like Google with act_as_ferret? I think this is a possible way: 1. Generate a keyword-list (this is my difficulty. I don't know how to build such a list from the index) with no stop-words from the first index. e. g. (car, ship, plant, house) 2. Build a second index from this word-list where we store the word in the index. 3. Make a Fuzzy-Search over the new list, e. g. "pland" 4. Fetch the stored keyword => plant, now you can write "Did you mean 'plant'?" 5. Make a sharp search with "plant" on the first index. 
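
Steps 3 and 4 above can be tried out without a second index. The sketch
below is only an illustration: it swaps Ferret's fuzzy search for a
plain Levenshtein edit distance over an in-memory array, and the word
list, the method names and the max_distance default are all
hypothetical.

```ruby
# Classic dynamic-programming Levenshtein edit distance.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Return the closest known word, or nil if the input is already a known
# word or nothing lies within max_distance edits.
def did_you_mean(input, words, max_distance = 2)
  return nil if words.include?(input)
  best = words.min_by { |w| edit_distance(input, w) }
  best if best && edit_distance(input, best) <= max_distance
end

WORDS = %w[car ship plant house]
suggestion = did_you_mean('pland', WORDS)
puts "Did you mean '#{suggestion}'?" if suggestion
```

For a large keyword list a real index will probably scale better than
this linear scan; the sketch is only meant to make the fuzzy-lookup step
concrete.
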
How can I generate a word-list from the first (standard) index? Best greetings Lars -- Posted via http://www.ruby-forum.com/. From jk at jkraemer.net Sun Jan 6 15:53:18 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Sun, 6 Jan 2008 21:53:18 +0100 Subject: [Ferret-talk] Did you mean ...? with act_as_ferret In-Reply-To: References: Message-ID: <20080106205317.GC16797@thunder.jkraemer.net> Hi! On Sun, Jan 06, 2008 at 07:16:48PM +0100, Lars Heese wrote: > Hello, > > does anybody know how to implement a "Did you mean ...?" like Google > with act_as_ferret? > > I think this is a possible way: > > 1. Generate a keyword-list (this is my difficulty. I don't know how to > build such a list from the index) with no stop-words from the first > index. > e. g. (car, ship, plant, house) > > 2. Build a second index from this word-list where we store the word in > the index. > > 3. Make a Fuzzy-Search over the new list, e. g. "pland" > > 4. Fetch the stored keyword => plant, now you can write "Did you mean > 'plant'?" > > 5. Make a sharp search with "plant" on the first index. > > How can I generate a word-list from the first (standard) index? TermEnum (http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html) might help here. You can't do this through acts_as_ferret, instead I'd suggest you create a little script outside your application which rebuilds the word-list index from the real index by using Ferret directly to access the index. cheers, Jens -- Jens Kr?mer http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From john at johnleach.co.uk Sun Jan 6 15:56:10 2008 From: john at johnleach.co.uk (John Leach) Date: Sun, 06 Jan 2008 20:56:10 +0000 Subject: [Ferret-talk] Did you mean ...? with act_as_ferret In-Reply-To: References: Message-ID: <1199652970.7756.70.camel@dogen.thepride.> On Sun, 2008-01-06 at 19:16 +0100, Lars Heese wrote: > Hello, > > does anybody know how to implement a "Did you mean ...?" 
like Google > with act_as_ferret? > > I think this is a possible way: Hi Lars, I did a similar thing in a project except the only things I wanted to suggest were pre-defined tag names in a table. So I just indexed that and did a fuzzy search on it. But anyway, you can enumerate all the terms in an index for a given field using the terms method of the IndexReader instance for the index: http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html Then generate your other index from that. You can store the words directly in the Ferret index to avoid the unnecessary overhead of an SQL lookup. With a small list of terms, I'm not sure what the overhead of Ferret would be here though. Might be worth experimenting with some alternatives, like maybe generating an index yourself as an array directly in Ruby. See the Text library for Metaphone and Soundex algorithms: http://text.rubyforge.org/ Ferret will probably be best though tbh. John. -- http://www.brightbox.co.uk - UK Ruby on Rails hosting From julioody at gmail.com Sun Jan 6 17:03:20 2008 From: julioody at gmail.com (Julio Cesar Ody) Date: Mon, 7 Jan 2008 09:03:20 +1100 Subject: [Ferret-talk] Did you mean ...? with act_as_ferret In-Reply-To: References: Message-ID: As a suggestion: why not build that with a spelling checker instead of Ferret? I believe there's free services around for that, not to mention aspell on the console if you're running this on a *nix. You could then rebuild the sentence with the output of the spelling checker, and build a link with it that you can present to the user. Strikes me as easier than building the words index yourself. On Jan 7, 2008 5:16 AM, Lars Heese wrote: > Hello, > > does anybody know how to implement a "Did you mean ...?" like Google > with act_as_ferret? > > I think this is a possible way: > > 1. Generate a keyword-list (this is my difficulty. I don't know how to > build such a list from the index) with no stop-words from the first > index. > e. g. 
(car, ship, plant, house) > > 2. Build a second index from this word-list where we store the word in > the index. > > 3. Make a Fuzzy-Search over the new list, e. g. "pland" > > 4. Fetch the stored keyword => plant, now you can write "Did you mean > 'plant'?" > > 5. Make a sharp search with "plant" on the first index. > > How can I generate a word-list from the first (standard) index? > > Best greetings > > Lars > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From me at benjaminarai.com Mon Jan 7 18:36:11 2008 From: me at benjaminarai.com (Benjamin Arai) Date: Mon, 7 Jan 2008 15:36:11 -0800 Subject: [Ferret-talk] Selecting maximum value from a field Message-ID: <83A0E2F2-05D0-46F6-B64B-D24061D4AC99@benjaminarai.com> Hello, I have a field which contains integer values. How do I obtain the maximum value of the column? Benjamin From julioody at gmail.com Mon Jan 7 18:57:49 2008 From: julioody at gmail.com (Julio Cesar Ody) Date: Tue, 8 Jan 2008 10:57:49 +1100 Subject: [Ferret-talk] Selecting maximum value from a field In-Reply-To: <83A0E2F2-05D0-46F6-B64B-D24061D4AC99@benjaminarai.com> References: <83A0E2F2-05D0-46F6-B64B-D24061D4AC99@benjaminarai.com> Message-ID: You sort by DESC, and fetch the first value of a query for "*"? =) Ferret doesn't have these SGDB-like features built into it, as far as I can remember from the docs. On Jan 8, 2008 10:36 AM, Benjamin Arai wrote: > Hello, > > I have a field which contains integer values. How do I obtain the > maximum value of the column? > > Benjamin > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From ndaniels at mac.com Wed Jan 9 16:02:17 2008 From: ndaniels at mac.com (Noah M. 
Daniels)
Date: Wed, 9 Jan 2008 16:02:17 -0500
Subject: [Ferret-talk] Parallel indexing doesn't work?
Message-ID: <3AABF151-C414-4ECA-A011-00FE3D93F657@mac.com>

Hi,

I'm trying to get parallelized ferret indexing working for my AAF
indices, based on the example in the O'Reilly Ferret shortcut.
However, the resulting indices after merging seem to have no actual
documents.

I went and made minimal changes to the example in the Ferret shortcut
pdf, and indeed can't get that to work either. I'd appreciate any help
anyone can give! Thanks!

The example is below:

#!/usr/bin/env ruby

require 'rubygems'
require 'ferret'
include Ferret::Index

5.times do |i|
  name = "index#{i}"
  puts name
  i = Ferret::I.new(:path => "/tmp/#{i}", :create => true)
  i << {:name => name}
  i.close
end
readers = []
readers << IndexReader.new("/tmp/0")
readers << IndexReader.new("/tmp/1")
readers << IndexReader.new("/tmp/2")
readers << IndexReader.new("/tmp/3")
readers << IndexReader.new("/tmp/4")
index_writer = IndexWriter.new(:path => "/tmp/test")
index_writer.add_readers(readers)
index_writer.close()
readers.each {|reader| reader.close()}
i = Ferret::I.new(:path => '/tmp/test')
res = i.search('name*')
puts res.inspect # gives me: #>
puts res.hits.size # gives me: 0

From jk at jkraemer.net Wed Jan 9 16:24:52 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Wed, 9 Jan 2008 22:24:52 +0100
Subject: [Ferret-talk] Parallel indexing doesn't work?
In-Reply-To: <3AABF151-C414-4ECA-A011-00FE3D93F657@mac.com>
References: <3AABF151-C414-4ECA-A011-00FE3D93F657@mac.com>
Message-ID: <20080109212452.GB14794@thunder.jkraemer.net>

Hi!

seems to me you're indexing strings starting with 'index' but you're
searching for 'name'? Or maybe correcting this already was one of your
minimal changes?

If not, try changing that line:
> res = i.search('name*')
to
> res = i.search('index*')

cheers,
Jens

On Wed, Jan 09, 2008 at 04:02:17PM -0500, Noah M.
Daniels wrote: > Hi, > > I'm trying to get parallelized ferret indexing working for my AAF > indices, based on the example in the O'Reilly Ferret shortcut. > However, the resulting indices after merging seem to have no actual > documents. > > I went and made minimal changes to the example in the Ferret shortcut > pdf, and indeed can't get that to work either. I'd appreciate any help > anyone can give! Thanks! > > The example is below: > > #!/usr/bin/env ruby > > require 'rubygems' > require 'ferret' > include Ferret::Index > > 5.times do |i| > name = "index#{i}" > puts name > i = Ferret::I.new(:path => "/tmp/#{i}", :create => true) > i << {:name => name} > i.close > end > readers = [] > readers << IndexReader.new("/tmp/0") > readers << IndexReader.new("/tmp/1") > readers << IndexReader.new("/tmp/2") > readers << IndexReader.new("/tmp/3") > readers << IndexReader.new("/tmp/4") > index_writer = IndexWriter.new(:path => "/tmp/test") > index_writer.add_readers(readers) > index_writer.close() > readers.each {|reader| reader.close()} > i = Ferret::I.new(:path => '/tmp/test') > res = i.search('name*') > puts res.inspect # gives me: # total_hits=0, hits=[], max_score=0.0, > searcher=#> > > puts res.hits.size # gives me: 0 > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Jens Kr?mer http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From ndaniels at mac.com Wed Jan 9 16:37:45 2008 From: ndaniels at mac.com (Noah M. Daniels) Date: Wed, 9 Jan 2008 16:37:45 -0500 Subject: [Ferret-talk] Parallel indexing doesn't work? In-Reply-To: <20080109212452.GB14794@thunder.jkraemer.net> References: <3AABF151-C414-4ECA-A011-00FE3D93F657@mac.com> <20080109212452.GB14794@thunder.jkraemer.net> Message-ID: Thanks, Jens. Good catch; this little example works correctly after fixing that change. 

However, my ActsAsFerret index merging does _not_ work, and I'm
wondering if it's something to do with AAF's handling of documents in
an index?

Let's call my indexed class Company...

Company.find_by_contents('*')
=> # @total_hits=3, @results=[], @total_pages=1, @per_page=3>

yet on each partial index prior to merging, that query would return a
bunch of results as one would expect.

now, here's how I've built that index... any idea why the merged index
is broken?

module FerretHelpers
  def merge_ferret_index_partitions(model)
    model_dir = File.basename(model.aaf_configuration[:ferret][:path])
    final_index_path = "/tmp/merged_parallel_ferret_index/#{model_dir}"
    partial_index_path = "/tmp/partial_indices/#{model_dir}"
    paths = Dir.glob("#{partial_index_path}/*")
    paths.each do |path|
      i = Ferret::I.new(:path => path, :create => true)
      name = path.split('/').last
      i << {:name => name}
      i.close
    end
    readers = []
    paths.each {|path| readers << IndexReader.new(path) }
    index_writer = IndexWriter.new(:path => final_index_path)
    index_writer.add_readers(readers)
    index_writer.close()
    readers.each {|reader| reader.close()}
    index = Ferret::Index::Index.new(:path => final_index_path)
    index.optimize
    index.close
  end
end

On Jan 9, 2008, at 4:24 PM, Jens Kraemer wrote:
> Hi!
>
> seems to me you're indexing strings starting with 'index' but you're
> searching for 'name'? Or maybe correcting this already was one of your
> minimal changes?
>
> If not, try changing that line:
>> res = i.search('name*')
> to
>> res = i.search('index*')
>
> cheers,
> Jens
>
> On Wed, Jan 09, 2008 at 04:02:17PM -0500, Noah M. Daniels wrote:
>> Hi,
>>
>> I'm trying to get parallelized ferret indexing working for my AAF
>> indices, based on the example in the O'Reilly Ferret shortcut.
>> However, the resulting indices after merging seem to have no actual
>> documents.
>>
>> I went and made minimal changes to the example in the Ferret shortcut
>> pdf, and indeed can't get that to work either. I'd appreciate any help
>> anyone can give! Thanks!
>> >> The example is below: >> >> #!/usr/bin/env ruby >> >> require 'rubygems' >> require 'ferret' >> include Ferret::Index >> >> 5.times do |i| >> name = "index#{i}" >> puts name >> i = Ferret::I.new(:path => "/tmp/#{i}", :create => true) >> i << {:name => name} >> i.close >> end >> readers = [] >> readers << IndexReader.new("/tmp/0") >> readers << IndexReader.new("/tmp/1") >> readers << IndexReader.new("/tmp/2") >> readers << IndexReader.new("/tmp/3") >> readers << IndexReader.new("/tmp/4") >> index_writer = IndexWriter.new(:path => "/tmp/test") >> index_writer.add_readers(readers) >> index_writer.close() >> readers.each {|reader| reader.close()} >> i = Ferret::I.new(:path => '/tmp/test') >> res = i.search('name*') >> puts res.inspect # gives me: #> total_hits=0, hits=[], max_score=0.0, >> searcher=#> >> >> puts res.hits.size # gives me: 0 >> _______________________________________________ >> Ferret-talk mailing list >> Ferret-talk at rubyforge.org >> http://rubyforge.org/mailman/listinfo/ferret-talk >> > > -- > Jens Kr?mer > http://www.jkraemer.net/ - Blog > http://www.omdb.org/ - The new free film database > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From ndaniels at mac.com Wed Jan 9 16:58:20 2008 From: ndaniels at mac.com (Noah M. Daniels) Date: Wed, 9 Jan 2008 16:58:20 -0500 Subject: [Ferret-talk] Parallel indexing doesn't work? In-Reply-To: References: <3AABF151-C414-4ECA-A011-00FE3D93F657@mac.com> <20080109212452.GB14794@thunder.jkraemer.net> Message-ID: <2D8D2CBB-86C9-4C2A-88E1-9FBFB8352223@mac.com> Ok, further update -- there was an obvious and stupid bug in my code that was overwriting the partial indices. So now when that's fixed, I get the proper number of results for a search: Company.find_by_contents('*') => # however, why is @results empty? 
Similarly, find_id_by_contents also returns empty documents, it seems:

>> Company.find_id_by_contents('*')
=> [247, [{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil}]]

when I would have expected:

>> Company.find_id_by_contents('*')
=> [247, [{:model=>"Company", :score=>1.0, :id=>"189", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"2", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"192", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"4", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"6", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"7", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"8", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"37", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"13", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"21", :data=>{}}]]

thanks for the help, and sorry for the silly previous bugs :)

From jeroen at laika.nl Thu Jan 10 10:19:02 2008
From: jeroen at laika.nl (jeroen janssen)
Date: Thu, 10 Jan 2008 16:19:02 +0100
Subject: [Ferret-talk] Error on manual indexing
Message-ID: <4E59B574-20DF-4336-8F11-2213AFBF54AB@laika.nl>

I'm having some problems with getting the drb server to work correctly
on my production server. As a workaround I tried disabling automatic
indexing and have a cron job manually update the index every hour or so.

I disabled the automatic indexing with:

def ferret_enabled?
  false
end

But whenever I try Page.rebuild_index

I get a 'wrong number of arguments' error. I seem to remember that
this used to work, am I doing something wrong or has something changed?

The full error:

ArgumentError: wrong number of arguments (1 for 0)
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/bulk_indexer.rb:19:in `ferret_enabled?'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/bulk_indexer.rb:19:in `index_records'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/bulk_indexer.rb:19:in `each'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/bulk_indexer.rb:19:in `index_records'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/bulk_indexer.rb:29:in `measure_time'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/bulk_indexer.rb:18:in `index_records'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/ferret_extensions.rb:52:in `index_model'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:66:in `records_for_rebuild'
	from ./script/../config/../config/../vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:59:in `transaction'
	from ./script/../config/../config/../vendor/rails/activerecord/lib/active_record/transactions.rb:95:in `transaction'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:61:in `records_for_rebuild'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/ferret_extensions.rb:51:in `index_model'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/ferret_extensions.rb:39:in `index_models'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/ferret_extensions.rb:39:in `each'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/ferret_extensions.rb:39:in `index_models'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:60:in `rebuild_index'
	from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:33:in `rebuild_index'

From jk at jkraemer.net Thu Jan 10 10:43:31 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Thu, 10 Jan 2008 16:43:31 +0100
Subject: [Ferret-talk] Error on manual indexing
In-Reply-To: <4E59B574-20DF-4336-8F11-2213AFBF54AB@laika.nl>
References: <4E59B574-20DF-4336-8F11-2213AFBF54AB@laika.nl>
Message-ID: <20080110154331.GE14794@thunder.jkraemer.net>

Hi!

On Thu, Jan 10, 2008 at 04:19:02PM +0100, jeroen janssen wrote:
> I'm having some problems with getting the drb server to work correctly
> on my production server. As a workaround I tried disabling automatic
> indexing and have a cron job manually update the index every hour or so.
>
> I disabled the automatic indexing with:
>
> def ferret_enabled?
>   false
> end
>
> But whenever I try Page.rebuild_index
>
> I get a 'wrong number of arguments' error. I seem to remember that
> this used to work, am I doing something wrong or has something changed?

ferret_enabled? takes a boolean argument telling it if there's a bulk
indexing going on, or if it's a normal update of a single record. So
in your case, overriding it like that:

def ferret_enabled?(is_bulk_index)
  is_bulk_index
end

should do what you want - disable normal index updates, but allow them
during rebuild and bulk_index calls.

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From jk at jkraemer.net Fri Jan 11 13:53:41 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Fri, 11 Jan 2008 19:53:41 +0100
Subject: [Ferret-talk] Parallel indexing doesn't work?
In-Reply-To: <2D8D2CBB-86C9-4C2A-88E1-9FBFB8352223@mac.com>
References: <3AABF151-C414-4ECA-A011-00FE3D93F657@mac.com>
 <20080109212452.GB14794@thunder.jkraemer.net>
 <2D8D2CBB-86C9-4C2A-88E1-9FBFB8352223@mac.com>
Message-ID: <20080111185341.GI14794@thunder.jkraemer.net>

Hi!

On Wed, Jan 09, 2008 at 04:58:20PM -0500, Noah M. Daniels wrote:
> Ok, further update -- there was an obvious and stupid bug in my code
> that was overwriting the partial indices. So now when that's fixed, I
> get the proper number of results for a search:
>
> Company.find_by_contents('*')
> => #<... @total_hits=247, @results=[], @total_pages=1, @per_page=247>

strange. Did you try to access the merged index with plain Ferret to see
if this works? Additionally, are your partial indexes ok and do they
deliver results with contents when you search only one of them?

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From ndaniels at mac.com Fri Jan 11 14:09:21 2008
From: ndaniels at mac.com (Noah M. Daniels)
Date: Fri, 11 Jan 2008 14:09:21 -0500
Subject: [Ferret-talk] Parallel indexing doesn't work?
In-Reply-To: <20080111185341.GI14794@thunder.jkraemer.net>
References: <3AABF151-C414-4ECA-A011-00FE3D93F657@mac.com>
 <20080109212452.GB14794@thunder.jkraemer.net>
 <2D8D2CBB-86C9-4C2A-88E1-9FBFB8352223@mac.com>
 <20080111185341.GI14794@thunder.jkraemer.net>
Message-ID: <70259383-AA10-4710-92D4-1AE8BBFADC52@mac.com>

Hi, Jens,

I'll try what you suggested with plain ferret. What about
ferret_browser; what should I be looking for?

The partial indices are each fine prior to merging; they deliver
results with contents when searching only one of them.

thanks again

From me at benjaminarai.com Fri Jan 11 14:14:54 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Fri, 11 Jan 2008 11:14:54 -0800
Subject: [Ferret-talk] Date range queries return zero results
Message-ID: <4787C02E.8050909@benjaminarai.com>

Hello,

I am having trouble getting data ranges to work correctly. I am using
the following command to load the db:

index << {:title => row[7].to_i,
          :date => Date.strptime(row[3], '%Y-%m-%d'),
          :page_id => row[5].to_i,
          :page => row[6].to_i,
          :content_type => row[1].to_i,
          :article_id => row[4].to_i,
          :label => row[2],
          :label_sort => row[8],
          :content => row[0]
         }

Notice "Date.strptime(row[3], '%Y-%m-%d')"...

When I query (ex. +label:barbara) I get results in the form:

{:label=>"NEW TOOL FOR BARBERS.", :page_id=>"36", :label_sort=>"NEW TOOL
FOR BARBERS.", :page=>"4", :date=>"1900-03-02", :content_type=>"19",
:title=>"1", :article_id=>"7855", :content=>" NEW TOOL FOR BARBERS."}
...

Which looks correct to me but if I modify the query to include a date
range like "+label:barbara +data:{19000101 19010101}" or even
"+label:barbara +data:(>=19000101 AND <= 19010101}" I get 0 results.
Does anybody know what I am doing incorrectly?

I am using Windows Vista, Ferret version 0.11.5 mswin32.

Benjamin

From jk at jkraemer.net Fri Jan 11 16:21:09 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Fri, 11 Jan 2008 22:21:09 +0100
Subject: [Ferret-talk] Date range queries return zero results
In-Reply-To: <4787C02E.8050909@benjaminarai.com>
References: <4787C02E.8050909@benjaminarai.com>
Message-ID: <20080111212108.GA13155@thunder.jkraemer.net>

Hi!

I'd say your problem is that you index your dates with '-' separators
between year, month and day, but your range queries don't have these.
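
The mismatch between the stored value and the range bounds can be seen with Ruby's standard Date class alone. The sketch below does not load Ferret and does not model its tokenizer or query parser; it only assumes that the stored term is the string form of the date (as the :date=>"1900-03-02" result in this thread suggests) and that a range over terms behaves like a plain string comparison. The `in_range` lambda is a hypothetical helper for illustration.

```ruby
require 'date'

raw  = '1900-03-02'                      # date string as it appears in the thread
date = Date.strptime(raw, '%Y-%m-%d')

hyphenated = date.to_s                   # ISO form with '-' separators
compact    = date.strftime('%Y%m%d')     # digits only

# Range bounds written the way the query in this thread writes them.
lower = '19000101'
upper = '19010101'

# Plain string comparison (an assumption about how a term range behaves).
in_range = lambda { |term| term >= lower && term <= upper }

puts hyphenated              # 1900-03-02
puts compact                 # 19000302
puts in_range.call(hyphenated) # false: '-' sorts before '0'
puts in_range.call(compact)    # true
```

The hyphenated string falls outside the bounds because the '-' character compares lower than any digit, so the bounds and the stored values need to share one format.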
You should get this working (and better performance because of faster integer based sorting) by indexing your dates as '%Y%m%d'. Cheers, Jens On Fri, Jan 11, 2008 at 11:14:54AM -0800, Benjamin Arai wrote: > Hello, > > I am having trouble getting data ranges to work correctly. I am using > the following command to load the db: > > index << {:title => row[7].to_i, > :date => Date.strptime(row[3], '%Y-%m-%d'), > :page_id => row[5].to_i, > :page => row[6].to_i, > :content_type => row[1].to_i, > :article_id => row[4].to_i, > :label => row[2], > :label_sort => row[8], > :content => row[0] > } > > Notice "Date.strptime(row[3], '%Y-%m-%d')"... > > When I query (ex. +label:barbara) I get results in the form: > > {:label=>"NEW TOOL FOR BARBERS.", :page_id=>"36", :label_sort=>"NEW TOOL > FOR BARBERS.", :page=>"4", :date=>"1900-03-02", :content_type=>"19", > :title=>"1", :article_id=>"7855", :content=>" NEW TOOL FOR BARBERS."} > ... > > Which looks correct to me but if I modify the query to include a date > range like "+label:barbara +data:{19000101 19010101}" or even > "+label:barbara +data:(>=19000101 AND <= 19010101}" I get 0 results. > Does anybody know what I am doing incorrectly? > > I am using Windows Vista, Ferret version 0.11.5 mswin32. 
> > Benjamin > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Jens Kr?mer http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From john at johnleach.co.uk Fri Jan 11 16:26:36 2008 From: john at johnleach.co.uk (John Leach) Date: Fri, 11 Jan 2008 21:26:36 +0000 Subject: [Ferret-talk] Date range queries return zero results In-Reply-To: <4787C02E.8050909@benjaminarai.com> References: <4787C02E.8050909@benjaminarai.com> Message-ID: <1200086796.17342.96.camel@dogen.thepride.> On Fri, 2008-01-11 at 11:14 -0800, Benjamin Arai wrote: > Hello, > > I am having trouble getting data ranges to work correctly. I am using > the following command to load the db: > Hi Benjamin, the .to_s method for the Date object returns a date formatted like "2007-12-25, so for one you're searching for a string that doesn't exist in the index. Secondly, I'm pretty sure the hypens in that string will be tokenized by the Ferret tokenizer, so will end up in the database as separate parts, so a range query would be slow (or not work, not certain). Try explicitly returning the date in a useful format when putting it in the index: index << {:title => row[7].to_i, :date => Date.strptime(row[3], '%Y-%m-%d').strftime("%Y%m%d"), :page_id => row[5].to_i, :page => row[6].to_i, :content_type => row[1].to_i, :article_id => row[4].to_i, :label => row[2], :label_sort => row[8], :content => row[0] } John -- http://www.brightbox.co.uk - UK Ruby on Rails hosting > index << {:title => row[7].to_i, > :date => Date.strptime(row[3], '%Y-%m-%d'), > :page_id => row[5].to_i, > :page => row[6].to_i, > :content_type => row[1].to_i, > :article_id => row[4].to_i, > :label => row[2], > :label_sort => row[8], > :content => row[0] > } > > Notice "Date.strptime(row[3], '%Y-%m-%d')"... > > When I query (ex. 
+label:barbara) I get results in the form: > > {:label=>"NEW TOOL FOR BARBERS.", :page_id=>"36", :label_sort=>"NEW TOOL > FOR BARBERS.", :page=>"4", :date=>"1900-03-02", :content_type=>"19", > :title=>"1", :article_id=>"7855", :content=>" NEW TOOL FOR BARBERS."} > ... > > Which looks correct to me but if I modify the query to include a date > range like "+label:barbara +data:{19000101 19010101}" or even > "+label:barbara +data:(>=19000101 AND <= 19010101}" I get 0 results. > Does anybody know what I am doing incorrectly? > > I am using Windows Vista, Ferret version 0.11.5 mswin32. > > Benjamin From me at benjaminarai.com Sat Jan 12 09:17:37 2008 From: me at benjaminarai.com (Benjamin Arai) Date: Sat, 12 Jan 2008 06:17:37 -0800 Subject: [Ferret-talk] Date range queries return zero results In-Reply-To: <1200086796.17342.96.camel@dogen.thepride.> References: <4787C02E.8050909@benjaminarai.com> <1200086796.17342.96.camel@dogen.thepride.> Message-ID: <9F34F2F1-BF19-4891-95A8-3EF627939064@benjaminarai.com> Hello, It worked changing the format too %Y%m%d. So, Ruby does not really handle dates directly? I would have thought Ferret would have parsed the Date object internally. Anyways, thanks for the help! Benjamin On Jan 11, 2008, at 1:26 PM, John Leach wrote: > On Fri, 2008-01-11 at 11:14 -0800, Benjamin Arai wrote: >> Hello, >> >> I am having trouble getting data ranges to work correctly. I am >> using >> the following command to load the db: >> > > Hi Benjamin, > > the .to_s method for the Date object returns a date formatted like > "2007-12-25, so for one you're searching for a string that doesn't > exist > in the index. > > Secondly, I'm pretty sure the hypens in that string will be > tokenized by > the Ferret tokenizer, so will end up in the database as separate > parts, > so a range query would be slow (or not work, not certain). 
>
> Try explicitly returning the date in a useful format when putting it
> in
> the index:
>
> index << {:title => row[7].to_i,
>           :date => Date.strptime(row[3], '%Y-%m-%d').strftime("%Y%m%d"),
>           :page_id => row[5].to_i,
>           :page => row[6].to_i,
>           :content_type => row[1].to_i,
>           :article_id => row[4].to_i,
>           :label => row[2],
>           :label_sort => row[8],
>           :content => row[0]
>          }
> John
> --
> http://www.brightbox.co.uk - UK Ruby on Rails hosting
>
>
>> index << {:title => row[7].to_i,
>>           :date => Date.strptime(row[3], '%Y-%m-%d'),
>>           :page_id => row[5].to_i,
>>           :page => row[6].to_i,
>>           :content_type => row[1].to_i,
>>           :article_id => row[4].to_i,
>>           :label => row[2],
>>           :label_sort => row[8],
>>           :content => row[0]
>>          }
>>
>> Notice "Date.strptime(row[3], '%Y-%m-%d')"...
>>
>> When I query (ex. +label:barbara) I get results in the form:
>>
>> {:label=>"NEW TOOL FOR BARBERS.", :page_id=>"36", :label_sort=>"NEW
>> TOOL
>> FOR BARBERS.", :page=>"4", :date=>"1900-03-02", :content_type=>"19",
>> :title=>"1", :article_id=>"7855", :content=>" NEW TOOL FOR BARBERS."}
>> ...
>>
>> Which looks correct to me but if I modify the query to include a date
>> range like "+label:barbara +data:{19000101 19010101}" or even
>> "+label:barbara +data:(>=19000101 AND <= 19010101}" I get 0 results.
>> Does anybody know what I am doing incorrectly?
>>
>> I am using Windows Vista, Ferret version 0.11.5 mswin32.
>>
>> Benjamin
>
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

From me at benjaminarai.com Sat Jan 12 09:27:46 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Sat, 12 Jan 2008 06:27:46 -0800
Subject: [Ferret-talk] Native release for windows 0.11.6?
Message-ID: <91CD5E3E-FF7D-45EE-AEE1-A9381859718F@benjaminarai.com>

Hello,

Is there going to be a native windows release for 0.11.6?
Alternatively, if a new version is going to be released soon, will
there be a release for windows of that? Thanks!

Benjamin

From u.alberton at gmail.com Sat Jan 12 12:20:52 2008
From: u.alberton at gmail.com (Bira)
Date: Sat, 12 Jan 2008 15:20:52 -0200
Subject: [Ferret-talk] Native release for windows 0.11.6?
In-Reply-To: <91CD5E3E-FF7D-45EE-AEE1-A9381859718F@benjaminarai.com>
References: <91CD5E3E-FF7D-45EE-AEE1-A9381859718F@benjaminarai.com>
Message-ID:

On Jan 12, 2008 12:27 PM, Benjamin Arai wrote:
> Hello,
>
> Is there going to be a native windows release for 0.11.6?
> Alternatively, if a new version is going to be released soon, will
> there be a release for windows of that? Thanks!

Hello,

0.11.6 was released to fix a Linux-only bug, so that's why there isn't
a Windows version. As for future versions, you'll have to wait for the
author's response :).

--
Bira
http://compexplicita.blogspot.com
http://sinfoniaferida.blogspot.com

From jeroen at laika.nl Tue Jan 15 06:29:24 2008
From: jeroen at laika.nl (jeroen janssen)
Date: Tue, 15 Jan 2008 12:29:24 +0100
Subject: [Ferret-talk] Error on manual indexing
In-Reply-To: <20080110154331.GE14794@thunder.jkraemer.net>
References: <4E59B574-20DF-4336-8F11-2213AFBF54AB@laika.nl> <20080110154331.GE14794@thunder.jkraemer.net>
Message-ID: <0A6DE829-A20A-4455-9293-A64D9FDEB3A7@laika.nl>

>> I get a 'wrong number of arguments' error. I seem to remember that
>> this used to work, am I doing something wrong or has something
>> changed?
>
> ferret_enabled? takes a boolean argument telling it if there's a bulk
> indexing going on, or if it's a normal update of a single record. So
> in your case, overriding it like that:
>
> def ferret_enabled?(is_bulk_index)
>   is_bulk_index
> end
>
> should do what you want - disable normal index updates, but allow them
> during rebuild and bulk_index calls..

Thanks... I can update from the console now but I get a 'wrong number
of arguments' error when I add or update a page.
I've added a ferret_enabled? method to the Page model (which
acts_as_ferret), like this:

def ferret_enabled?(is_bulk_index)
  is_bulk_index
end

But now I get the following error on create or update:

ArgumentError in PagesController#create

wrong number of arguments (0 for 1)

RAILS_ROOT: /Users/jeroen/Documents/projecten/hdz01/trunk/config/..
Application Trace | Framework Trace | Full Trace

#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:88:in `ferret_enabled?'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:88:in `ferret_create'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:333:in `send'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:333:in `callback'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:330:in `each'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:330:in `callback'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:255:in `create_without_timestamps'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/timestamp.rb:39:in `create'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1789:in `create_or_update_without_callbacks'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:242:in `create_or_update'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1545:in `save_without_validation'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/validations.rb:752:in `save_without_transactions'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:129:in `save'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:59:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:95:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:121:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:129:in `save'
#{RAILS_ROOT}/app/controllers/pages_controller.rb:78:in `create'
#{RAILS_ROOT}/app/controllers/pages_controller.rb:77:in `create'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel/rails.rb:76:in `process'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel/rails.rb:74:in `synchronize'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel/rails.rb:74:in `process'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:155:in `process_client'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:154:in `each'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:154:in `process_client'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:281:in `run'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:281:in `initialize'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:281:in `new'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:281:in `run'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:264:in `initialize'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:264:in `new'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel.rb:264:in `run'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel/configurator.rb:282:in `run'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel/configurator.rb:281:in `each'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel/configurator.rb:281:in `run'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/bin/mongrel_rails:128:in `run'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/lib/mongrel/command.rb:212:in `run'
/Library/Ruby/Gems/1.8/gems/mongrel-1.1.1/bin/mongrel_rails:281
/usr/bin/mongrel_rails:16:in `load'
/usr/bin/mongrel_rails:16

From me at benjaminarai.com Tue Jan 15 06:55:39 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Tue, 15 Jan 2008 03:55:39 -0800
Subject: [Ferret-talk] Best way to build a query?
Message-ID: <478C9F3B.6020901@benjaminarai.com>

Hello,

I am currently taking various user parameters and building an FQL query.
For example, if I was given:

query = "indian reservation"
date: 20050101 to 20070303
location = USA

Human version:

Get all of the documents containing "indian reservation" which were
published between 20050101 to 20070303 in the USA

FQL:

+content:(indian reservation) +date:{20050101 20070303} +location:(usa)

Is there a better way to do this, building queries using the query
functions?

Benjamin

From me at benjaminarai.com Wed Jan 16 16:03:55 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Wed, 16 Jan 2008 13:03:55 -0800
Subject: [Ferret-talk] Escaping special characters :, (, ), [, ], {, }, !, +, ", ~, ^, -, |, <, >, =, *, ?, \
Message-ID:

Hello,

I am trying to escape words for searching i.e., "hello". The key here
is that the two L's on "hello" are actually vertical bars. Is there
a special function in Ferret or anywhere for that matter that will do
the escaping of the Ferret special characters?

Thanks in advance,

Benjamin

From me at benjaminarai.com Wed Jan 16 16:22:24 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Wed, 16 Jan 2008 13:22:24 -0800
Subject: [Ferret-talk] QueryParser clean_string ?
Message-ID:

Hello,

What happened to "clean_string" in QueryParser?

Benjamin

From me at benjaminarai.com Wed Jan 16 18:51:48 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Wed, 16 Jan 2008 15:51:48 -0800
Subject: [Ferret-talk] Escaping special characters :, (, ), [, ], {, }, !, +, ", ~, ^, -, |, <, >, =, *, ?, \
In-Reply-To:
References:
Message-ID: <67B2B037-CE7F-4EBF-9A35-60CBF604E764@benjaminarai.com>

Hello,

I am going to answer my own questions...

You can do something like:

def escape_string(str)
  str = Regexp.escape(item[:word]).gsub(/([:~!<>="])/,'\\\\\1')
end

At one time there was a function called clean_string in QueryParser
but it appears to be gone.
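
For comparison, here is a minimal plain-Ruby sketch of the same idea. It is not taken from Ferret: the names `FERRET_SPECIAL` and `escape_for_ferret` are hypothetical, the character list is simply the one in this thread's subject line, and it assumes the query parser accepts a backslash before a special character, as the snippet above implies.

```ruby
# Hypothetical helper, not part of Ferret's API.
# Character class built from the subject line of this thread:
# : ( ) [ ] { } ! + " ~ ^ - | < > = * ? \
FERRET_SPECIAL = /[\\:()\[\]{}!+"~^\-|<>=*?]/

# Prefix every special character with a single backslash.
# The block form of gsub avoids the backslash-doubling rules that
# apply to replacement strings.
def escape_for_ferret(str)
  str.gsub(FERRET_SPECIAL) { |ch| "\\#{ch}" }
end

puts escape_for_ferret('he||o')
puts escape_for_ferret('title:foo (bar)')
```

Unlike the Regexp.escape approach, this leaves spaces and dots untouched, which may or may not be what a given query needs.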

Benjamin

On Jan 16, 2008, at 1:03 PM, Benjamin Arai wrote:

> Hello,
>
> I am trying to escape words for searching i.e., "hello". The key here
> is that the two L's on "hello" are actually vertical bars. Is there
> a special function in Ferret or anywhere for that matter that will do
> the escaping of the Ferret special characters?
>
> Thanks in advance,
>
> Benjamin
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

From me at benjaminarai.com Wed Jan 16 20:19:12 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Wed, 16 Jan 2008 17:19:12 -0800
Subject: [Ferret-talk] Escaping special characters :, (, ), [, ], {, }, !, +, ", ~, ^, -, |, <, >, =, *, ?, \
In-Reply-To: <67B2B037-CE7F-4EBF-9A35-60CBF604E764@benjaminarai.com>
References: <67B2B037-CE7F-4EBF-9A35-60CBF604E764@benjaminarai.com>
Message-ID:

Woops... corrected code below:

> def escape_string(str)
>   return Regexp.escape(str).gsub(/([:~!<>="])/,'\\\\\1')
> end

On Jan 16, 2008, at 3:51 PM, Benjamin Arai wrote:

> Hello,
>
> I am going to answer my own questions...
>
> You can do something like:
>
> def escape_string(str)
>   str = Regexp.escape(item[:word]).gsub(/([:~!<>="])/,'\\\\\1')
> end
>
> At one time there was a function called clean_string in QueryParser
> but it appears to be gone.
>
> Benjamin
>
> On Jan 16, 2008, at 1:03 PM, Benjamin Arai wrote:
>
>> Hello,
>>
>> I am trying to escape words for searching i.e., "hello". The key here
>> is that the two L's on "hello" are actually vertical bars. Is there
>> a special function in Ferret or anywhere for that matter that will do
>> the escaping of the Ferret special characters?
>> >> Thanks in advance, >> >> Benjamin >> _______________________________________________ >> Ferret-talk mailing list >> Ferret-talk at rubyforge.org >> http://rubyforge.org/mailman/listinfo/ferret-talk >> > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From ndaniels at mac.com Sun Jan 20 20:21:55 2008 From: ndaniels at mac.com (Noah M. Daniels) Date: Sun, 20 Jan 2008 20:21:55 -0500 Subject: [Ferret-talk] fuzzy search question Message-ID: Hi, I've got a question about FuzzyQueries. Say I'm doing a search for people by name, and I want to allow fuzzy results if there aren't enough hits with a regular query. This is easy enough; just redo the search with the fuzzy query if original_results.total_hits is less than some threshold. However, I would like the exact-match results to have a higher score (show up as the first results) than the fuzzy- match results. One really clumsy way of doing this is to do the two searches and concatenate the results in order into a new set of results (I'm using ActsAsFerret, so I'd be doing some surgery on ActsAsFerret::SearchResults that gets pretty fragile with pagination). Is there an easier way to have a FuzzyQuery return the exact hits first? It looks like the score is the same regardless of the fuzziness of a specific term match. thanks! From jk at jkraemer.net Mon Jan 21 03:47:21 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Mon, 21 Jan 2008 09:47:21 +0100 Subject: [Ferret-talk] fuzzy search question In-Reply-To: References: Message-ID: <20080121084721.GX6349@thunder.jkraemer.net> Hi, didn't try this out but what about doing an OR query and have the non-fuzzy part of the query boosted? name:smith^10 OR name:smith~0.4 Jens On Sun, Jan 20, 2008 at 08:21:55PM -0500, Noah M. Daniels wrote: > Hi, > > I've got a question about FuzzyQueries. 
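Say I'm doing a search for
people by name, and I want to allow fuzzy results if there aren't
enough hits with a regular query. This is easy enough; just redo the
search with the fuzzy query if original_results.total_hits is less
than some threshold. However, I would like the exact-match results to
have a higher score (show up as the first results) than the
fuzzy-match results. One really clumsy way of doing this is to do the two
searches and concatenate the results in order into a new set of
results (I'm using ActsAsFerret, so I'd be doing some surgery on
ActsAsFerret::SearchResults that gets pretty fragile with pagination).

Is there an easier way to have a FuzzyQuery return the exact hits
first? It looks like the score is the same regardless of the fuzziness
of a specific term match.

thanks!

From jk at jkraemer.net Mon Jan 21 03:47:21 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Mon, 21 Jan 2008 09:47:21 +0100
Subject: [Ferret-talk] fuzzy search question
In-Reply-To:
References:
Message-ID: <20080121084721.GX6349@thunder.jkraemer.net>

Hi,

didn't try this out but what about doing an OR query and having the
non-fuzzy part of the query boosted?

name:smith^10 OR name:smith~0.4

Jens

On Sun, Jan 20, 2008 at 08:21:55PM -0500, Noah M. Daniels wrote:
> Hi,
>
> I've got a question about FuzzyQueries. Say I'm doing a search for
> people by name, and I want to allow fuzzy results if there aren't
> enough hits with a regular query. This is easy enough; just redo the
> search with the fuzzy query if original_results.total_hits is less
> than some threshold. However, I would like the exact-match results to
> have a higher score (show up as the first results) than the
> fuzzy-match results. One really clumsy way of doing this is to do the two
> searches and concatenate the results in order into a new set of
> results (I'm using ActsAsFerret, so I'd be doing some surgery on
> ActsAsFerret::SearchResults that gets pretty fragile with pagination).
>
> Is there an easier way to have a FuzzyQuery return the exact hits
> first? It looks like the score is the same regardless of the fuzziness
> of a specific term match.
>
> thanks!
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From me at benjaminarai.com Tue Jan 22 02:06:53 2008
From: me at benjaminarai.com (Benjamin Arai)
Date: Mon, 21 Jan 2008 23:06:53 -0800
Subject: [Ferret-talk] GROUP BY functionality for Ferret
Message-ID:

Hello,

Is there any way to perform GROUP BY operations on specific fields,
namely, a date field?

Benjamin

From pat at trailfire.com Fri Jan 25 00:25:21 2008
From: pat at trailfire.com (Pat Ferrel)
Date: Thu, 24 Jan 2008 21:25:21 -0800
Subject: [Ferret-talk] Ferret+Lucene Index
Message-ID:

We use Nutch and Lucene for our heavy duty text analysis jobs but I'm
trying to use Ferret to do some experiments. I understood that Ferret
used the same index format as Lucene but I cannot look into a Lucene
index with Ferret and cannot read a Ferret index with Luke (the Lucene
index browser). Am I doing something wrong or have the formats diverged?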

From ryan at theryanking.com Fri Jan 25 01:41:15 2008
From: ryan at theryanking.com (Ryan King)
Date: Thu, 24 Jan 2008 22:41:15 -0800
Subject: [Ferret-talk] Ferret+Lucene Index
In-Reply-To:
References:
Message-ID: <4B364861-EFB6-4532-8AE5-A4257ABC8402@theryanking.com>

On Jan 24, 2008, at 9:25 PM, Pat Ferrel wrote:

> We use Nutch and Lucene for our heavy duty text analysis jobs but
> I'm trying to use ferret to do some experiments. I understood that
> Ferret used the same index format as lucene but I cannot look into a
> lucene index with ferret and cannot read a ferret index with luke
> (the lucene index browser). Am I doing something wrong or have the
> formats diverged?

The formats have diverged.

-ryan

From john at johnleach.co.uk Fri Jan 25 10:02:50 2008
From: john at johnleach.co.uk (John Leach)
Date: Fri, 25 Jan 2008 15:02:50 +0000
Subject: [Ferret-talk] ferret success stories?
Message-ID: <1201273370.18749.8.camel@dogen.thepride.>

Hi all,

there was a recent thread[1] on rails-deploy about Ferret in which a lot
of people complained of problems using it in production.

I've been using Ferret (with DRb) for many months now with no serious
issues. I'm assuming the posters know what they're doing so I'm
guessing they're just using Ferret in higher-scale environments than me.

I spoke to someone in person yesterday who claimed that Ferret over DRb
couldn't keep up with their use rate and had been investigating
replicating the ferret database between two machines.

With all these bad experiences, I'd like to hear about some good
experiences. Anyone care to comment? Anyone using it under huge load?
Care to provide some numbers and some notes about how you've made it
work?

John.

[1] http://groups.google.com/group/rubyonrails-deployment/browse_thread/thread/980fe7cb20cb97dd/bc798b52f439020c

--
http://www.brightbox.co.uk - UK Ruby on Rails hosting

From pat at trailfire.com Fri Jan 25 11:12:24 2008
From: pat at trailfire.com (Pat Ferrel)
Date: Fri, 25 Jan 2008 08:12:24 -0800
Subject: [Ferret-talk] Ferret+Lucene Index
In-Reply-To: <4B364861-EFB6-4532-8AE5-A4257ABC8402@theryanking.com>
Message-ID:

Can I use an earlier version of ferret maybe? Does anyone know when the
formats diverged?

On 1/24/08 10:41 PM, "Ryan King" wrote:

> On Jan 24, 2008, at 9:25 PM, Pat Ferrel wrote:
>
>> > We use Nutch and Lucene for our heavy duty text analysis jobs but
>> > I'm trying to use ferret to do some experiments. I understood that
>> > Ferret used the same index format as lucene but I cannot look into a
>> > lucene index with ferret and cannot read a ferret index with luke
>> > (the lucene index browser). Am I doing something wrong or have the
>> > formats diverged?
>
> The formats have diverged.
>
> -ryan
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

From kraemer at webit.de Fri Jan 25 11:33:57 2008
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 25 Jan 2008 17:33:57 +0100
Subject: [Ferret-talk] Ferret+Lucene Index
In-Reply-To:
References: <4B364861-EFB6-4532-8AE5-A4257ABC8402@theryanking.com>
Message-ID: <20080125163357.GF507@cordoba.webit.de>

On Fri, Jan 25, 2008 at 08:12:24AM -0800, Pat Ferrel wrote:
> Can I use an earlier version of ferret maybe? Does anyone know when the
> formats diverged?

Yeah, Versions 0.3.x should definitely work, and possibly also 0.9.x.
Afair there always were some substantial problems in terms of UTF8
character handling, so you might run into problems even with the older
versions.

Cheers,
Jens

>
>
> On 1/24/08 10:41 PM, "Ryan King" wrote:
>
> > On Jan 24, 2008, at 9:25 PM, Pat Ferrel wrote:
> >
> >> > We use Nutch and Lucene for our heavy duty text analysis jobs but
> >> > I'm trying to use ferret to do some experiments. I understood that
> >> > Ferret used the same index format as lucene but I cannot look into a
> >> > lucene index with ferret and cannot read a ferret index with luke
> >> > (the lucene index browser). Am I doing something wrong or have the
> >> > formats diverged?
> >
> > The formats have diverged.
> >
> > -ryan
> >
> > _______________________________________________
> > Ferret-talk mailing list
> > Ferret-talk at rubyforge.org
> > http://rubyforge.org/mailman/listinfo/ferret-talk
>
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From ryan at theryanking.com Fri Jan 25 14:33:18 2008
From: ryan at theryanking.com (Ryan King)
Date: Fri, 25 Jan 2008 11:33:18 -0800
Subject: [Ferret-talk] ferret success stories?
In-Reply-To: <1201273370.18749.8.camel@dogen.thepride.>
References: <1201273370.18749.8.camel@dogen.thepride.>
Message-ID: <586FD54C-913D-4EC9-83E2-E076BD0509D6@theryanking.com>

On Jan 25, 2008, at 7:02 AM, John Leach wrote:
> Hi all,
>
> there was a recent thread[1] on rails-deploy about Ferret in which a
> lot
> of people complained of problems using it in production.
>
> I've been using Ferret (with DRb) for many months now with no serious
> issues.
I'm assuming the posters know what they're doing so I'm
> guessing they're just using Ferret in higher-scale environments than
> me.
>
> I spoke to someone in person yesterday who claimed that Ferret over
> DRb
> couldn't keep up with their use rate and had been investigating
> replicating the ferret database between two machines.
>
> With all these bad experiences, I'd like to hear about some good
> experiences. Anyone care to comment? Anyone using it under huge
> load?
> Care to provide some numbers and some notes about how you've made it
> work?

I had used ferret w/ DRb at Technorati on a project that had several
indexes of 5-10M documents. To make it work well I had to limit the
update rates to the index (I think I took it down to about 1-2/s).

To go to higher update rates I would have had to change how we were
writing and serving indexes.

-ryan

From john at digitalpulp.com Fri Jan 25 16:35:27 2008
From: john at digitalpulp.com (John Bachir)
Date: Fri, 25 Jan 2008 16:35:27 -0500
Subject: [Ferret-talk] strange capistrano problem
Message-ID:

When trying to start ferret with capistrano, I keep getting this:

$ cap services:ferret:stop
domain [redken.digitalpulp.com] :
user [john] :
  * executing `services:ferret:stop'
  * executing "cd /srv/rails/redken/current; script/ferret_server -e production stop"
    servers: ["redken.digitalpulp.com"]
Password:
    [redken.digitalpulp.com] executing command
*** [err :: redken.digitalpulp.com] no such file to load -- /bin/../config/environment
*** [err :: redken.digitalpulp.com]
    command finished
command "cd /srv/rails/redken/current; script/ferret_server -e production stop" failed on redken.digitalpulp.com

Running the same command manually on the server, it is successful. I
have tried both as a regular user and as root.

Any ideas?

Thanks,
John

From john at digitalpulp.com Fri Jan 25 16:53:35 2008
From: john at digitalpulp.com (John Bachir)
Date: Fri, 25 Jan 2008 16:53:35 -0500
Subject: [Ferret-talk] strange capistrano problem
In-Reply-To:
References:
Message-ID: <06AE120E-6FB8-4BD8-BB77-F7BC897AF143@digitalpulp.com>

On Jan 25, 2008, at 4:35 PM, John Bachir wrote:
> When trying to start ferret with capistrano, I keep getting this:
.......
> *** [err :: redken.digitalpulp.com] no such file to load -- /bin/../
> config/environment
........
> Running the same command manually on the server, it is successful.
> I have tried both as a regular user and as root.

More info-- here is the relevant code, in lib/server_manager.rb

ENV['FERRET_USE_LOCAL_INDEX'] = 'true'
ENV['RAILS_ENV'] = $ferret_server_options['environment']
#require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
require(File.join(File.dirname(ENV['_']), '../config/environment'))
require 'acts_as_ferret'
ActsAsFerret::Remote::Server.new.send($ferret_server_action)

so, it seems that in some contexts, ENV['_'] is coming up with /bin/.

Thanks,
John

From casey at nerdle.com Fri Jan 25 18:06:54 2008
From: casey at nerdle.com (Casey)
Date: Fri, 25 Jan 2008 18:06:54 -0500 (EST)
Subject: [Ferret-talk] ferret success stories?
In-Reply-To: <586FD54C-913D-4EC9-83E2-E076BD0509D6@theryanking.com>
References: <1201273370.18749.8.camel@dogen.thepride.> <586FD54C-913D-4EC9-83E2-E076BD0509D6@theryanking.com>
Message-ID:

I'm using Ferret/aaf with the DRb server under a medium load at
Ravelry.com. I think that we peak at 10-12 queries per second and a
little less than 1 update per second.

My biggest problem has been indexing speed.
I've been gradually switching over to Sphinx (http://www.sphinxsearch.com/) for indexes that don't have to be updated in realtime (places where I can afford several minutes of lag). Emulating near realtime index updates in Sphinx is a little hacky but I find that it is worth it. Casey On Fri, 25 Jan 2008, Ryan King wrote: > On Jan 25, 2008, at 7:02 AM, John Leach wrote: >> Hi all, >> >> there was a recent thread[1] on rails-deploy about Ferret in which a >> lot >> of people complained of problems using it in production. >> >> I've been using Ferret (with DRb) for many months now with no serious >> issues. I'm assuming the posters know what they're doing so I'm >> guessing they're just using Ferret in higher-scale environments than >> me. >> >> I spoke to someone in person yesterday who claimed that Ferret over >> DRb >> couldn't keep up with their use rate and had been investigating >> replicating the ferret database between two machines. >> >> With all these bad experiences, I'd like to hear about some good >> experiences. Anyone care to comment? Anyone using it under huge >> load? >> Care to provide some numbers and some notes about how you've made it >> work? > > I had used ferret w/ DrB at Technorati on a project that had several > indexes of 5-10M documents. To make it work well I had to limit the > update rates to the index (I think I took it down to about 1-2/s). > > To go to higher updates rates I would have had to change how we were > writing and serving indexes. > > -ryan > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From alex at liivid.com Fri Jan 25 19:00:33 2008 From: alex at liivid.com (Alex Neth) Date: Fri, 25 Jan 2008 16:00:33 -0800 Subject: [Ferret-talk] ferret success stories? 
In-Reply-To:
References:
Message-ID:

I am successfully using ferret with a few caveats, none of which to my
knowledge would be solved by using another solution.

I believe many corruption problems are associated with a bug in aaf
that causes multiple indexes to be built at the same time in the same
place. I'm not sure if that's been addressed as I didn't see any
response after reporting it months ago.

My primary issue is that the ferret index needs to be optimized before
it performs well. Since optimization locks the index from reading, and
can take 30 seconds on my index, this severely limits how often I can
update my index. I am working on a solution by running two ferret
servers, but this is requiring extensively modifying the aaf plug-in.

I have investigated Sphinx, but I don't think that it solves my problem
of large amounts of constant updates.

--
Alex Neth
Liivid Inc / cribQ
www.liivid.com / www.cribq.com
alex at liivid.com
+1 206 499 4995
+86 13761577188

On Jan 25, 2008, at 1:53 PM, ferret-talk-request at rubyforge.org wrote:

> Message: 5
> Date: Fri, 25 Jan 2008 15:02:50 +0000
> From: John Leach
> Subject: [Ferret-talk] ferret success stories?
> To: ferret-talk at rubyforge.org
> Message-ID: <1201273370.18749.8.camel at dogen.thepride.>
> Content-Type: text/plain
>
> Hi all,
>
> there was a recent thread[1] on rails-deploy about Ferret in which
> a lot
> of people complained of problems using it in production.
>
> I've been using Ferret (with DRb) for many months now with no serious
> issues. I'm assuming the posters know what they're doing so I'm
> guessing they're just using Ferret in higher-scale environments
> than me.
>
> I spoke to someone in person yesterday who claimed that Ferret over
> DRb
> couldn't keep up with their use rate and had been investigating
> replicating the ferret database between two machines.
>
> With all these bad experiences, I'd like to hear about some good
> experiences. Anyone care to comment?
Anyone using it under huge > load? > Care to provide some numbers and some notes about how you've made it > work? > > John. > > [1] http://groups.google.com/group/rubyonrails-deployment/ > browse_thread/thread/980fe7cb20cb97dd/bc798b52f439020c > > -- > http://www.brightbox.co.uk - UK Ruby on Rails hosting > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20080125/732e332c/attachment.html From john at digitalpulp.com Fri Jan 25 19:23:06 2008 From: john at digitalpulp.com (John Bachir) Date: Fri, 25 Jan 2008 19:23:06 -0500 Subject: [Ferret-talk] rebuilding the index completely and consistently In-Reply-To: <20070608115452.GB23116@cordoba.webit.de> References: <88EF8DAF-8AA3-4DD8-961D-BAD6D091852E@digitalpulp.com> <4AC55D32-90E5-4D0E-BE53-DAB9629AFA77@digitalpulp.com> <20070608115452.GB23116@cordoba.webit.de> Message-ID: Hey folks. Here's an update to my Super Duper Ferret Single Index Rebuild that we were discussing back in June. On Jun 8, 2007, at 7:54 AM, Jens Kraemer wrote: > On May 31, 2007, at 2:30 PM, John Bachir wrote: > >> I am using AAF trunk, and I want a way to rebuild an index on a >> production site with little or no interruption to service. The Drb >> Server documentation* states that when an index is rebuilt, it is >> done in a separate location and then swapped into place when >> finished, and so to do a complete rebuild on a live site, one must >> take into consideration objects which have been created or modified >> in the meantime. To achieve this, I have come up with the following >> solution: >> >> http://pastie.textmate.org/66602 >> >> [1] Does this look like a complete solution? I suppose it relies on >> timestamp consistency between system components... it is possible >> that between setting "start = ..." 
and performing the rebuild, >> another thread in the system will have create an earlier timestamp >> for an object that did not get committed until after the rebuild >> began. Is it possible to do a perfect rebuild, or would that require >> building a layer of concurrency logic into AAF? > > The scenario you describe might happen and cause a record not to be > indexed, but I'd implement it just like you did. > > To be safe you can subtract a minute or so from your recorded start > time ;-) I've come up with this rake task: http://pastie.textmate.org/private/4xyk2o0obibzi2tmpbog Jens, what do you think? Anyone have any improvements to offer? Cheers, John From jk at jkraemer.net Sat Jan 26 04:23:23 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Sat, 26 Jan 2008 10:23:23 +0100 Subject: [Ferret-talk] rebuilding the index completely and consistently In-Reply-To: References: <88EF8DAF-8AA3-4DD8-961D-BAD6D091852E@digitalpulp.com> <4AC55D32-90E5-4D0E-BE53-DAB9629AFA77@digitalpulp.com> <20070608115452.GB23116@cordoba.webit.de> Message-ID: <20080126092323.GB31262@thunder.jkraemer.net> Hi! On Fri, Jan 25, 2008 at 07:23:06PM -0500, John Bachir wrote: > Hey folks. > > Here's an update to my Super Duper Ferret Single Index Rebuild that > we were discussing back in June. > [..] > > > I've come up with this rake task: > > http://pastie.textmate.org/private/4xyk2o0obibzi2tmpbog > > Jens, what do you think? Anyone have any improvements to offer? looks great. Mind if I add this as an example to acts_as_ferret? Cheers, Jens -- Jens Kr?mer http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From jk at jkraemer.net Sat Jan 26 04:53:22 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Sat, 26 Jan 2008 10:53:22 +0100 Subject: [Ferret-talk] ferret success stories? In-Reply-To: References: Message-ID: <20080126095322.GC31262@thunder.jkraemer.net> Hi Alex, sorry about not responding to that mail, been pretty busy with other stuff at that time. 
I just dug it out of my mail archive now and yes, I'd like to have a
look at your locking code to prevent parallel index rebuilds :-)

Regarding your optimization troubles - I'm afraid I don't know what's
going wrong there with your index being slow when you don't optimize it
(unless you're sorting your search results by something else than
relevancy - that is indeed known to be slow when the index isn't
optimized).

Did you try to tweak some of Ferret's more obscure indexing parameters
like :merge_factor (lowering it (the Ferret shortcut suggests a value of
2 or 3 for better search performance) will let Ferret merge segments
more frequently so on average there are fewer files in the index which
should improve search performance)?

Cheers,
Jens

On Fri, Jan 25, 2008 at 04:00:33PM -0800, Alex Neth wrote:
> I am successfully using ferret with a few caveats, none of which to
> my knowledge would be solved by using another solution.
>
> I believe many corruption problems are associated with a bug in aaf
> that causes multiple indexes to be built at the same time in the same
> place. I'm not sure if that's been addressed as I didn't see any
> response after reporting it months ago.
>
> My primary issue is that the ferret index needs to be optimized
> before it performs well. Since optimization locks the index from
> reading, and can take 30 seconds on my index, this severely limits
> how often I can update my index. I am working on a solution by
> running two ferret servers, but this is requiring extensively
> modifying the aaf plug-in.
>
> I have investigated Sphinx, but I don't think that it solves my
> problem of large amounts of constant updates.
>
> --
> Alex Neth
> Liivid Inc / cribQ
> www.liivid.com / www.cribq.com
> alex at liivid.com
> +1 206 499 4995
> +86 13761577188
>
>
> On Jan 25, 2008, at 1:53 PM, ferret-talk-request at rubyforge.org wrote:
>
> >Message: 5
> >Date: Fri, 25 Jan 2008 15:02:50 +0000
> >From: John Leach
> >Subject: [Ferret-talk] ferret success stories?
> >To: ferret-talk at rubyforge.org
> >Message-ID: <1201273370.18749.8.camel at dogen.thepride.>
> >Content-Type: text/plain
> >
> >Hi all,
> >
> >there was a recent thread[1] on rails-deploy about Ferret in which
> >a lot of people complained of problems using it in production.
> >
> >I've been using Ferret (with DRb) for many months now with no serious
> >issues. I'm assuming the posters know what they're doing so I'm
> >guessing they're just using Ferret in higher-scale environments
> >than me.
> >
> >I spoke to someone in person yesterday who claimed that Ferret over
> >DRb couldn't keep up with their use rate and had been investigating
> >replicating the ferret database between two machines.
> >
> >With all these bad experiences, I'd like to hear about some good
> >experiences. Anyone care to comment? Anyone using it under huge
> >load?
> >Care to provide some numbers and some notes about how you've made it
> >work?
> >
> >John.
> >
> >[1] http://groups.google.com/group/rubyonrails-deployment/browse_thread/thread/980fe7cb20cb97dd/bc798b52f439020c
> >
> >--
> >http://www.brightbox.co.uk - UK Ruby on Rails hosting
> >
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From john at digitalpulp.com Mon Jan 28 17:54:10 2008
From: john at digitalpulp.com (John Bachir)
Date: Mon, 28 Jan 2008 17:54:10 -0500
Subject: [Ferret-talk] lock index when not using drb server?
Message-ID: <813B05B9-3234-4769-B1B0-0C297549B8A6@digitalpulp.com>

We all know that using ferret/aaf without the drb server is not
thread-safe-- but why not? Would it be so hard to sacrifice
performance by using a simple locking system?

Very often I run into a situation where I want to quickly stage a
project, and I want to use a few mongrels but don't want to configure
every last piece of the system, including the drb server. It would be
nice if I didn't have to worry about index corruption.

Just a thought.

John

From julioody at gmail.com Mon Jan 28 18:45:43 2008
From: julioody at gmail.com (Julio Cesar Ody)
Date: Tue, 29 Jan 2008 10:45:43 +1100
Subject: [Ferret-talk] lock index when not using drb server?
In-Reply-To: <813B05B9-3234-4769-B1B0-0C297549B8A6@digitalpulp.com>
References: <813B05B9-3234-4769-B1B0-0C297549B8A6@digitalpulp.com>
Message-ID:

Please anyone correct me if I'm wrong.

I think it's because the thread safety is implemented at the class
level. Which means it's ok to share a single instance of IndexWriter
across many threads. But when you have a multi-process model going
(Rails), you effectively have many different programs accessing the
same index files, thus running into file locking issues.

Implementing a shared locking mechanism that's fast enough not to get
in the way of performance (in a really bad way) is a subject for a lot
of research. The first idea that springs to my mind is using a tempfile
based one. Solves the problem, but you can kiss goodbye to fast
indexing. (e.g.: if File.exists? 'foo' # index is locked, wait a bit
and try again)

One other solution is to have a common daemon that all processes share
and handle file locking in memory, and that's precisely what the DRb
server is.

Perhaps a quickie would be to tie the DRb server to a thread that runs
along with Rails. This would make it transparent. I think I'll risk it
and reply later with a hack.
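
[Editorial sketch, not part of the original message.] The lock-file idea above could look roughly like the following. It swaps the "check whether a file exists and retry" loop for Ruby's File#flock, an advisory lock that the operating system releases if the process dies. The helper name with_index_lock and the lock path are hypothetical; nothing here is Ferret or acts_as_ferret API, and flock may not behave reliably on network filesystems.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Ferret or acts_as_ferret):
# serialize index access across processes with an advisory lock file.
def with_index_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    f.flock(File::LOCK_EX)     # blocks until no other holder remains
    begin
      yield                    # do the indexing work while holding the lock
    ensure
      f.flock(File::LOCK_UN)   # always release, even if the block raises
    end
  end
end

# Usage: every process that writes to the index wraps its write like this.
lock_path = File.join(Dir.tmpdir, 'ferret_index.lock')  # assumed location
result = with_index_lock(lock_path) { :indexed }
```

As the message says, every writer now queues behind a single lock, so this trades indexing throughput for safety.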

On Jan 29, 2008 9:54 AM, John Bachir wrote:
> We all know that using ferret/aaf without the drb server is not
> thread-safe-- but why not? Would it be so hard to sacrifice
> performance by using a simple locking system?
>
> Very often I run into a situation where I want to quickly stage a
> project, and I want to use a few mongrels but don't want to configure
> every last piece of the system, including the drb server. It would be
> nice if I didn't have to worry about index corruption.
>
> Just a thought.
>
> John
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

From alex at liivid.com Mon Jan 28 21:01:59 2008
From: alex at liivid.com (Alex Neth)
Date: Mon, 28 Jan 2008 18:01:59 -0800
Subject: [Ferret-talk] Ferret-talk Digest, Vol 27, Issue 7
In-Reply-To:
References:
Message-ID:

Thanks for the response Jens.

Indeed I am sorting by something other than relevancy, so that would
explain it. Optimized, it's extremely fast and handles a good load,
but new records kill it until I optimize.

I haven't tried :merge_factor as I wasn't aware of it. I'm not sure
it will help given the above.

Regarding the re-index locking code, it's not 100% threadsafe or
necessarily the desired behavior (calling rebuild_index when the last
rebuild isn't finished won't result in a new index,) but it works for
me. I just added this to the beginning of rebuild_index:

    if @rebuilding
      @logger.warn "already rebuilding index - will lock until rebuilding is complete"
      while @rebuilding
        sleep 1
      end
      return
    end
    @rebuilding = true

The big problem was in new_index_for, which always uses the same
path, and causes the multiple rebuilds to be stomping on each other.
It's not necessary to fix this after adding the above though.

Thanks again.
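
[Editorial sketch, not part of the original message.] The guard above leaves a gap between testing @rebuilding and setting it, which is presumably why it is described as not 100% threadsafe. A minimal variant using Ruby's Mutex#try_lock closes that gap within a single process while keeping the same behavior: a caller that arrives during a rebuild waits for it to finish and then returns without rebuilding. The class name RebuildGuard and its counter are hypothetical and stand in for whatever object owns rebuild_index in acts_as_ferret; this does not help across separate processes.

```ruby
require 'thread'

# Hypothetical stand-in for the object that owns rebuild_index.
class RebuildGuard
  attr_reader :rebuilds

  def initialize
    @mutex = Mutex.new
    @rebuilds = 0   # number of rebuilds that actually ran
  end

  # Runs the block if no rebuild is in progress and returns true.
  # Otherwise waits for the running rebuild to finish and returns false
  # without starting a new one (same behavior as the flag-based guard).
  def rebuild_index
    if @mutex.try_lock          # atomic test-and-set, no race window
      begin
        @rebuilds += 1
        yield
      ensure
        @mutex.unlock
      end
      true
    else
      @mutex.synchronize { }    # block until the running rebuild is done
      false
    end
  end
end

# Usage
guard = RebuildGuard.new
guard.rebuild_index { :rebuild_work_goes_here }
```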

--
Alex Neth
Liivid Inc / cribQ
www.liivid.com / www.cribq.com

On Jan 28, 2008, at 3:45 PM, ferret-talk-request at rubyforge.org wrote:

> Date: Sat, 26 Jan 2008 10:53:22 +0100
> From: Jens Kraemer
> Subject: Re: [Ferret-talk] ferret success stories?
> To: ferret-talk at rubyforge.org
> Message-ID: <20080126095322.GC31262 at thunder.jkraemer.net>
> Content-Type: text/plain; charset=iso-8859-1
>
> Hi Alex,
>
> sorry about not responding to that mail, been pretty busy with other
> stuff at that time.
>
> I just dug it out of my mail archive now and yes, I'd like to have a
> look at your locking code to prevent parallel index rebuilds :-)
>
> Regarding your optimization troubles - I'm afraid I don't know what's
> going wrong there with your index being slow when you don't
> optimize it (unless you're sorting your search results by something
> else than relevancy - that is indeed known to be slow when the index
> isn't optimized).
>
> Did you try to tweak some of Ferret's more obscure indexing parameters
> like :merge_factor (lowering it (the Ferret shortcut suggests a
> value of 2 or 3 for better search performance) will let Ferret merge
> segments more frequently so on average there are fewer files in the
> index which should improve search performance)?
>
>
> Cheers,
> Jens

From syrius.ml at no-log.org Tue Jan 29 11:04:37 2008
From: syrius.ml at no-log.org (syrius.ml at no-log.org)
Date: Tue, 29 Jan 2008 17:04:37 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
In-Reply-To: <20071115090711.GX10556@cordoba.webit.de> (Jens Kraemer's message of "Thu\, 15 Nov 2007 10\:07\:11 +0100")
References: <20071114092537.GB3558@thunder.jkraemer.net> <473B83A9.4050509@plan99.net> <20071115090711.GX10556@cordoba.webit.de>
Message-ID: <87fxwgvcx3.87ejc0vcx3@87d4rkvcx3.message.id>

Jens Kraemer writes:

> On Thu, Nov 15, 2007 at 12:24:25AM +0100, Hongli Lai wrote:
>> Alain Ravet wrote:
>> > class Country < ActiveRecord::Base
>> >   acts_as_ferret(
>> >     :fields => [:name] ,
>> >     :remote => true,
>> >     :ferret => {:analyzer => Test2Analyzer.new([]) }
>> >   )
>> > end
>>
>> Try this:
>>
>> acts_as_ferret({ :fields => [:name], :remote => true },
>>                { :analyzer => Test2Analyzer.new([]) })
>
> this won't help, these are both valid ways to call acts_as_ferret. The
> :ferret syntax is the preferred one, however.

Just for information, I was using an old or bad syntax for aaf.
I was using acts_as_ferret :fields [], :analyzer => MyAnalyzer.new
and it wasn't working. (A raise in initialize of MyAnalyzer was
raising but not in token_stream)

I'm now using :ferret => {:analyzer => MyAnalyzer} and it works as
expected.

--

From john at digitalpulp.com Thu Jan 31 13:11:17 2008
From: john at digitalpulp.com (John Bachir)
Date: Thu, 31 Jan 2008 13:11:17 -0500
Subject: [Ferret-talk] strange capistrano problem
In-Reply-To: <06AE120E-6FB8-4BD8-BB77-F7BC897AF143@digitalpulp.com>
References: <06AE120E-6FB8-4BD8-BB77-F7BC897AF143@digitalpulp.com>
Message-ID: <384C4706-5D38-4A4E-BB91-33B3D54E720D@digitalpulp.com>

On Jan 25, 2008, at 4:53 PM, John Bachir wrote:
> More info-- here is the relevant code, in lib/server_manager.rb
>
> ENV['FERRET_USE_LOCAL_INDEX'] = 'true'
> ENV['RAILS_ENV'] = $ferret_server_options['environment']
> #require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
> require(File.join(File.dirname(ENV['_']), '../config/environment'))
> require 'acts_as_ferret'
> ActsAsFerret::Remote::Server.new.send($ferret_server_action)
>
> so, it seems that in some contexts, ENV['_'] is coming up with /bin/.

I just noticed that trunk has more sophisticated logic here:

    #require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
    if $ferret_server_options['root']
      require File.join($ferret_server_options['root'], 'config', 'environment')
    else
      require(File.join(File.dirname(ENV['_']), '../config/environment'))
    end

Jens-- how stable is trunk? Do you anticipate a point release any
time soon?

Thanks!
John

From john at digitalpulp.com Thu Jan 31 20:32:09 2008
From: john at digitalpulp.com (John Bachir)
Date: Thu, 31 Jan 2008 20:32:09 -0500
Subject: [Ferret-talk] Reducing dependency on remote ferret process
In-Reply-To:
References:
Message-ID:

I've added suggestions from Julio Cesar Ody and Peter Jones to the
relevant ticket:

http://projects.jkraemer.net/acts_as_ferret/ticket/149

If anyone has another solution, consider adding it to the ticket.
Quote code in trac like this:

{{{
my code
}}}