From andy.kuo at gmail.com Fri Sep 1 01:05:31 2006
From: andy.kuo at gmail.com (Andy Kuo)
Date: Fri, 1 Sep 2006 07:05:31 +0200
Subject: [Ferret-talk] Ferret/acts_as_ferret don't seem to be doing anything
Message-ID:

I'm having an odd problem with Ferret 0.9.5 and acts_as_ferret 0.2.3, on
RedHat Linux Enterprise 4.

All of my find_by_contents calls return 0 results. When I try
Foo.rebuild_index via the console, it returns nil.

This is happening on my production machine, during deployment of my app.
Everything works perfectly on my development machine (Windows).

I have no idea where to start debugging this. Am I missing something
obvious? Any help would be greatly appreciated.

Thanks,
Andy

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Fri Sep 1 04:26:54 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 1 Sep 2006 10:26:54 +0200
Subject: [Ferret-talk] Ferret/acts_as_ferret don't seem to be doing anything
In-Reply-To:
References:
Message-ID: <20060901082654.GO9513@cordoba.webit.de>

On Fri, Sep 01, 2006 at 07:05:31AM +0200, Andy Kuo wrote:
> I'm having an odd problem with Ferret 0.9.5 and acts_as_ferret 0.2.3, on
> RedHat Linux Enterprise 4.
>
> All of my find_by_contents calls return 0 results. When I try
> Foo.rebuild_index via the console, it returns nil.

Does anything show up in the log file when you do this? A possible
problem might be missing write permissions to your RAILS_ROOT, though
I think this should result in an error.

Jens

--
webit!
Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer    kraemer at webit.de
Schnorrstraße 76                          Tel +49 351 46766 0
D-01069 Dresden                           Fax +49 351 46766 66

From dbalmain.ml at gmail.com Fri Sep 1 06:18:14 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 1 Sep 2006 19:18:14 +0900
Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario
In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BBFE@maui.bmsoft.com.au>
References: <126EC586577FD611A28E00A0C9A03758B5BBFE@maui.bmsoft.com.au>
Message-ID:

On 8/28/06, Neville Burnell wrote:
> Hi,
>
> I'm building a web server application using Ferret [thanks so much
> Dave], Mongrel and Camping which works fine servicing one request at a
> time, but serialises searches if more than one request arrives, so I'd
> like some advice please about the best way to use multiple readers and
> one writer.
>
> Some background ... query requests which in my case are always read
> only, arrive via Mongrel, which allocates a thread for each request.
> Should I create a new IndexReader for each request also, or can I use
> one IndexReader concurrently?

Creating a new reader per request is not a good idea since creating a
new IndexReader is an expensive operation (although it has been
significantly improved in version 0.10). A lot of data needs to be read
into memory for fast access. In most situations the ideal solution is to
have a single IndexReader per thread. You can have as many IndexReaders
open on an index as your operating system will allow. The one situation
where you might be better off using a single IndexReader is when you are
relying on caching. Filters and Sorts are cached per IndexReader and
Sorts in particular can take up a fair chunk of memory so if you have a
large index (large as in number of documents, not size of data) then you
may be better off with a single IndexReader. IndexReader is thread-safe
so using it concurrently should be fine.
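
[Editor's sketch] The "one reader per thread" advice above can be
illustrated with plain Ruby thread-local storage. This is only a minimal
sketch under stated assumptions: StubReader is a hypothetical stand-in
that counts how often it is opened (it is not Ferret's IndexReader), and
reader_for_current_thread and the :index_reader key are made-up names.
It also assumes worker threads live long enough to serve several
requests; with a brand-new thread per request the reader would still be
reopened every time.

```ruby
require 'thread'

# Hypothetical stand-in for an expensive-to-open index reader.
# It is NOT Ferret's IndexReader; it only counts how often it is opened.
class StubReader
  @opened = 0
  @lock = Mutex.new

  class << self
    attr_reader :opened

    # Simulates the costly "open the index" step and records it.
    def open(path)
      @lock.synchronize { @opened += 1 }
      new(path)
    end
  end

  attr_reader :path

  def initialize(path)
    @path = path
  end
end

# Return the calling thread's reader, opening one only on first use.
def reader_for_current_thread(path)
  Thread.current[:index_reader] ||= StubReader.open(path)
end

threads = Array.new(3) do
  Thread.new do
    # Two "requests" handled by the same thread share one reader.
    first  = reader_for_current_thread('/path/to/index')
    second = reader_for_current_thread('/path/to/index')
    first.equal?(second)
  end
end

reused = threads.map(&:value)
puts "readers opened: #{StubReader.opened}"
puts "reused within each thread: #{reused.all?}"
```

If sort and filter caches matter more than avoiding contention, the same
helper could hand out one shared reader instead, which is the trade-off
described in the message above.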
> Index updates on the other hand are coordinated by a special Update
> Thread which runs every 10 minutes or so. I'm guessing that the best
> approach is to create an IndexWriter for each update run, which can be
> closed and discarded at the end of the update run. Or can I close and
> reuse a single IndexWriter?

You can't reuse an IndexWriter after it has been closed. But you can
commit the changes to disk;

    writer.commit()

IndexWriter#optimize will also commit all changes to disk as an optimal
index but depending on the size of your index you may only want to call
optimize once a day if at all. For a small index however, calling it
every ten minutes is definitely possible.

> I searched http://ferret.davebalmain.com/api for details on the
> MultiReader, but I couldn't find any details. If someone could post a
> link to point me in the right direction that would be great.

You can actually pass an array of readers as the first (only) parameter
to IndexReader.new.

    reader = IndexReader.new([reader1, reader2, reader3])

In the current working version of Ferret you can also pass Directory
objects or paths;

    iw = IndexReader.new([dir, dir2, dir3])
    iw = IndexReader.new(["/path/to/index1", "/path/to/index2"])

Wait for 0.10.2 for this functionality (and an update to include this
info in the API docs).

Cheers,
Dave

From kraemer at webit.de Fri Sep 1 07:26:53 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 1 Sep 2006 13:26:53 +0200
Subject: [Ferret-talk] disabling automatic indexing in acts_as_ferret
In-Reply-To: <1156504226.6728.4.camel@localhost.localdomain>
References: <1156504226.6728.4.camel@localhost.localdomain>
Message-ID: <20060901112653.GQ9513@cordoba.webit.de>

On Fri, Aug 25, 2006 at 12:10:26PM +0100, Pedro Côrte-Real wrote:
> I'd like to be able to enable/disable the automatic indexing of
> documents acts_as_ferret does. Something like MyModel.disable_indexing
> MyModel.enable_indexing would be perfect.
> I need this because I do some
> indexing that requires visiting the parents of the model objects and my
> import method imports the children first, so the information isn't there
> yet. I'd like to disable the indexing, do all the importing and then
> manually index the documents. Having a MyModel#index would be great for
> this too.
>
> Is there anything of the sort?

There's an instance variable @ferret_reindex that's checked before the
indexing takes place. Something like

    def save_noindex
      @ferret_reindex = false
      save
    end

in your model should save the record without reindexing it. The boolean
is set to true in the after_save handler again, so the next call to save
should reindex again.

You may call ferret_update directly to reindex without saving, too.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer           kraemer at webit.de
Schnorrstraße 76                                 Tel +49 351 46766 0
D-01069 Dresden                                  Fax +49 351 46766 66

From andy.kuo at gmail.com Fri Sep 1 08:16:17 2006
From: andy.kuo at gmail.com (Andy Kuo)
Date: Fri, 1 Sep 2006 14:16:17 +0200
Subject: [Ferret-talk] Ferret/acts_as_ferret don't seem to be doing anything
In-Reply-To: <20060901082654.GO9513@cordoba.webit.de>
References: <20060901082654.GO9513@cordoba.webit.de>
Message-ID: <2e8057e6aa0d7a865b3de4b85c4d1305@ruby-forum.com>

>
> Does anything show up in the log file when you do this ? A possible
> problem might be missing write permissions to your RAILS_ROOT, though
> I think this should result in an error.
>

Thanks for the tip, Jens. I think I figured this out. It turns out it
was two separate problems, for those who are interested.

1. My find_by_contents calls are wrapped in a with_scope(:include).
find_by_contents has a line

    conditions = [ "id in (?)", id_array ]

During development, this was causing a MySQL "ambiguous column id"
error, because the tables I was joining each contained an id column.
I had changed the line to

    conditions = [ "#{self.name.pluralize}.id in (?)", id_array ]

which worked fine on my Windows box.

On the Linux box, MySQL was complaining about an "unknown column
Foos.id". I hadn't realized that MySQL on Linux was case-sensitive,
whereas Windows is not, so I changed the line to

    conditions = [ "#{self.name.downcase.pluralize}.id in (?)", id_array ]

2. I had defined a WhiteSpaceAnalyzer (different from the one defined in
Ferret) in my environment.rb file, and was calling it in acts_as_ferret.
For some reason, though, the analyzer being used was the
WhiteSpaceAnalyzer defined in Ferret, instead of mine. My guess as to
why this is happening is because of the order in which environment.rb
and ferret are loaded? I'm not even sure redefining it in
environment.rb is the right way to do it, so I'd be happy if someone
knew what the right way of doing this is. My workaround was just to
rename my analyzer.

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Fri Sep 1 08:26:22 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 1 Sep 2006 14:26:22 +0200
Subject: [Ferret-talk] Ferret/acts_as_ferret don't seem to be doing anything
In-Reply-To: <2e8057e6aa0d7a865b3de4b85c4d1305@ruby-forum.com>
References: <20060901082654.GO9513@cordoba.webit.de> <2e8057e6aa0d7a865b3de4b85c4d1305@ruby-forum.com>
Message-ID: <20060901122622.GR9513@cordoba.webit.de>

Hi!

On Fri, Sep 01, 2006 at 02:16:17PM +0200, Andy Kuo wrote:
[..]
>
> On the Linux box, MySQL was complaining about an "unknown column
> Foos.id". I hadn't realized that MySQL on Linux was case-sensitive,
> whereas Windows is not, so I changed the line to
>
> conditions = [ "#{self.name.downcase.pluralize}.id in (?)", id_array ]

Even better would be "#{self.table_name}.id in (?)", because that works
in case you don't use the standard naming scheme for your tables, too.
Recent versions of aaf contain this fix.

> 2.
> I had defined a WhiteSpaceAnalyzer (different from the one defined in
> Ferret) in my environment.rb file, and was calling it in acts_as_ferret.
> For some reason, though, the analyzer being used was the
> WhiteSpaceAnalyzer defined in Ferret, instead of mine. My guess as to
> why this is happening is because of the order in which environment.rb
> and ferret are loaded? I'm not even sure redefining it in
> environment.rb is the right way to do it, so I'd be happy if someone
> knew what the right way of doing this is. My workaround was just to
> rename my analyzer.

Where you declare your custom Analyzer class shouldn't be that
important. I'd personally place it in lib/ and require that file in
environment.rb. To avoid problems you should indeed choose a unique
class name, though I can't imagine where the problem was in that
special case.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer           kraemer at webit.de
Schnorrstraße 76                                 Tel +49 351 46766 0
D-01069 Dresden                                  Fax +49 351 46766 66

From dbalmain.ml at gmail.com Fri Sep 1 08:27:25 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 1 Sep 2006 21:27:25 +0900
Subject: [Ferret-talk] stop words and /'s
In-Reply-To:
References:
Message-ID:

On 8/28/06, Caspar wrote:
> Hi new version of ferret and acts as ferret have sorted out the scary
> glibc *** linked list pointer errors, thank god! New version are good
> but some searches are still not working. It is mostly the stop words
> ones. For example the "For Sale/Free/Swap" fails but works when "for" is
> stripped out. I have read all the recent posts regarding this issue and
> failed to get it to work in a clean way. I tried:
>
> acts_as_ferret :fields=> ['short_description','section'], :analyzer =>
> Ferret::Analysis::StandardAnalyzer.new([])
>
> and rebuilt my index but same behavior as before.
> There seems to have
> been a lot of effort put into tracking down a solution to this problem so
> has one been found that i'm missing? I'm using ferret 0.10.1 and
> bleeding edge acts_as_ferret (is this the only version that works with
> ferret 0.10.1?).
> Also a search for "Film/Theatre" fails while lots of other searches with
> /'s in them succeed, is there a reason why this might be happening ?

Hi Caspar,

I'm not sure what you are trying to do with the "Film/Theatre" search.
If you are trying to find Film or Theatre you should search for
"Film|Theatre". Also, could you please give an example of a search that
doesn't work including the data that is entered. For example;

irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'ferret'
=> false
irb(main):003:0> include Ferret
=> Object
irb(main):004:0> i = I.new
=> #:*, :dir=>#, :analyzer=>#}, @mon_owner=nil, @id_field=:id, @reader=nil, @mon_waiting_queue=[], @analyzer=#, @default_input_field=:id, @dir=#, @mon_entering_queue=[], @qp=nil, @writer=nil, @mon_count=0>
irb(main):005:0> i << "For Free"
=> nil
irb(main):006:0> i << "For Sale"
=> nil
irb(main):007:0> i << "For Swap"
=> nil
irb(main):008:0> i.search('"For Free|Sale|Swap"')
=> #, #, #], max_score=4.21639537811279>

Cheers,
Dave

From dbalmain.ml at gmail.com Fri Sep 1 09:46:42 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 1 Sep 2006 22:46:42 +0900
Subject: [Ferret-talk] [0.10.0] Index#search is not thread safe ?
In-Reply-To: <20060827165409.GC23274@cordoba.webit.de>
References: <20060827165409.GC23274@cordoba.webit.de>
Message-ID:

On 8/28/06, Jens Kraemer wrote:
> On Sat, Aug 26, 2006 at 04:12:57PM +0200, Florent Solt wrote:
> > This script (http://pastie.caboo.se/10371) give this result :
> >
> > "1"
> > "0"
> > "0"
> > "0"
> >
> > Why the other thread does not have the same result ?
> > Maybe, it's not the correct way to use the index in a multi threaded
> > environment but I don't know how to do.
>
> This should work, imho.
> I even tried to synchronize access to the index
> like that:
>
> guard = Mutex.new
> [..]
> and in the thread's block:
> guard.synchronize do
>   result = index.search('id:42')
>   p result.total_hits
> end
>
> but the output stayed the same.

This was a bug that has been fixed in the current working copy. The fix
will be released in version 0.10.2.

Cheers,
Dave

From thepyrat at gmail.com Fri Sep 1 10:34:42 2006
From: thepyrat at gmail.com (Alastair Brunton)
Date: Fri, 1 Sep 2006 16:34:42 +0200
Subject: [Ferret-talk] acts_as_ferret for Ferret 0.10
In-Reply-To: <20060824151137.GF25651@cordoba.webit.de>
References: <20060824150647.GE25651@cordoba.webit.de> <20060824151137.GF25651@cordoba.webit.de>
Message-ID: <454e0989a3466e2a48f3359748e3b417@ruby-forum.com>

I get the following error with the 0.10.1 and the new acts as ferret..

undefined local variable or method `hits'

Cheers,
Alastair.

Jens Kraemer wrote:
> For installation of the plugin via script/plugin install, append
> '/acts_as_ferret' to the svn urls below.
>
> Jens
>
> On Thu, Aug 24, 2006 at 05:06:47PM +0200, Jens Kraemer wrote:
>> - more_like_this is broken

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Fri Sep 1 10:49:33 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 1 Sep 2006 23:49:33 +0900
Subject: [Ferret-talk] acts_as_ferret, locking, auto_flush
In-Reply-To: <8D24C472-72FB-4F6E-9A60-7D084870CB16@sialia.com>
References: <44f26b25fc4413f6447d2750f5c3fd32@ruby-forum.com> <20060826154625.GC23639@cordoba.webit.de> <20060827115804.GB23274@cordoba.webit.de> <6017D5F8-D58B-4DBB-985E-E6D26C44B3FE@sialia.com> <20060828102028.GF10178@cordoba.webit.de> <8D24C472-72FB-4F6E-9A60-7D084870CB16@sialia.com>
Message-ID:

On 8/29/06, David Ranney wrote:
> Jens,
>
> Thanks again, that makes sense. I'll implement some workarounds in my
> code until we hear from Dave on the matter.

Hey Dave,

I've added a small sleep to the lock-obtain loop which should fix this
problem. There is also a single 2 second sleep and retry in the Index
class (you can change this with the :lock_retry_time parameter). Please
let me know whether or not this works for you. It's already in the
subversion repository and it'll be out in version 0.10.2.

Cheers,
Dave

From dbalmain.ml at gmail.com Fri Sep 1 11:09:44 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 00:09:44 +0900
Subject: [Ferret-talk] how to get the words of a query
In-Reply-To:
References: <0adeabd1dc7f244671e899c6d735004c@symetrie.com> <20060828102229.GG10178@cordoba.webit.de>
Message-ID:

On 8/28/06, Jean-Christophe Michel wrote:
> Hi,
>
> On 28 Aug 06, at 12:22, Jens Kraemer wrote:
> > in Ferret 0.10 there's a highlight method in the Searcher class. Maybe
> > that does what you want ?
> >
> > http://ferret.davebalmain.com/api/classes/Ferret/Search/Searcher.html#M000223
>
> Seems good, will be perfect if your truncate respects multi-byte chars.
> My ruby helper does it, see how it works on
> http://symetrie.com/fr/search
> (it highlights only the first occurrence of each word currently).

Hi Jean-Christophe,

Are you saying the highlight doesn't respect multi-byte characters? If
so, could you give an example? The highlighter uses the byte boundaries
returned by the analyzer during indexing so I can't see any reason
multi-byte characters wouldn't be respected.

Also, it's quite a bit more advanced than your version (and the version
in Lucene contrib for that matter). It highlights only the terms that
match the query. So if you search for the phrase "red truck" the terms
"red" and "truck" will only be highlighted if they appear together. If
you search for "red truck"~1 then the phrase "red fire truck" will be
highlighted. It also uses a pretty clever algorithm to find the excerpts
with the most matching information. It's still quite experimental though
so I need people to try it out and send in their suggestions.

Cheers,
Dave

From dbalmain.ml at gmail.com Fri Sep 1 11:14:21 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 00:14:21 +0900
Subject: [Ferret-talk] Excluding values from search term
In-Reply-To: <20060829065751.GD14476@cordoba.webit.de>
References: <3b366b25f82a78b88b9014122ffa9de0@ruby-forum.com> <20060829065751.GD14476@cordoba.webit.de>
Message-ID:

On 8/29/06, Jens Kraemer wrote:
> On Tue, Aug 29, 2006 at 02:27:18AM +0200, Thiago wrote:
> > I'm trying to do a search that would exclude a value from it and return
> > whatever results aren't matched from the term, but it's not working very
> > well.
> >
> > None of these are working, they all return the results including the
> > ones with tag_id = 701
> >
> > tag_id:(NOT 701)
> > tag_id:(! 701)
> > tag_id:(- 701)
>
> a Query that only has one negative (must not) term is not possible, I
> believe. That's because of how the index works internally, I think.
> Probably such a query would require a full scan of the index, which is
> something you usually don't want to happen.
>
> Jens

That is correct. Single negatives won't work.
You could try this;

    tag_id:(* AND NOT 701)

From raul at murciano.net Fri Sep 1 12:20:20 2006
From: raul at murciano.net (Raul Murciano)
Date: Fri, 1 Sep 2006 18:20:20 +0200
Subject: [Ferret-talk] Excluding a term on a search
Message-ID: <832e87d964ff831c0b25744d4c5ad77c@ruby-forum.com>

Hi there,

I have a model with indexed fields (F1, F2, F3...).

I have a field-based search (+F1:..., +F2:...), and a global search
(which queries all fields with the given terms). Both of them work OK,
but I would like to exclude a term from the global search (though it
should remain indexed to support the field-based search), is it
possible?

Thanks in advance.

--
Posted via http://www.ruby-forum.com/.

From raul at murciano.net Fri Sep 1 12:31:02 2006
From: raul at murciano.net (Raul Murciano)
Date: Fri, 1 Sep 2006 18:31:02 +0200
Subject: [Ferret-talk] Typo
In-Reply-To: <832e87d964ff831c0b25744d4c5ad77c@ruby-forum.com>
References: <832e87d964ff831c0b25744d4c5ad77c@ruby-forum.com>
Message-ID: <9504ec294cf26227dba273f62e046643@ruby-forum.com>

Of course, the thread topic should be 'Excluding a FIELD on a search'.

I'm sorry...

Raul Murciano wrote:
>
> Hi there,
>
> I have a model with indexed fields (F1, F2, F3...).
>
> I have a field-based search (+F1:..., +F2:...), and a global search
> (which queries all fields with the given terms). Both of them works ok,
> but I would like to exclude a term from the global search (though it
> should remain indexed to support the field-based search), is it
> possible?
>
> Thanks in advance.

--
Posted via http://www.ruby-forum.com/.

From mkhumri at allegromedical.com Fri Sep 1 15:01:23 2006
From: mkhumri at allegromedical.com (Mufaddal Khumri)
Date: Fri, 1 Sep 2006 21:01:23 +0200
Subject: [Ferret-talk] installing ferret
In-Reply-To:
References: <4b34d023a849da1d058336537c024749@ruby-forum.com> <50ec9b6262700512c8b20972d2132d53@ruby-forum.com>
Message-ID:

This is weird:

I had the above app working fine for a day and just got back to it today
and get an error.

Here is the code from f.rb:
---------------------------------
require 'rubygems'
require 'ferret'
include Ferret
index = Index::Index.new(:path => '/opt/search-index')
---------------------------------

Exception thrown:
---------------------------------
./ferret.rb:3: uninitialized constant Ferret (NameError)
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require'
        from f.rb:2
---------------------------------

Any ideas?

Mufaddal Khumri wrote:
> Thank you for the quick response. that was precisely the mistake.
>
> Florent Solt wrote:
>> Mufaddal Khumri wrote:
>>> I am trying to test drive ferret on a ubuntu dapper installation.
>>>
>>> I have a ferret-test.rb file like:
>>> ----begin---------
>>> require 'ferret'
>>> include Ferret
>>> index = Index::Index.new(:path => '/opt/search-index')
>>> ----end-----------
>>>
>>
>> Add this : require 'rubygems'

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Fri Sep 1 21:43:37 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 10:43:37 +0900
Subject: [Ferret-talk] [0.10.0] Index#add_document bug with strange value ?
In-Reply-To:
References: <23e21c37d4c5d6119e8617da53371ea4@ruby-forum.com>
Message-ID:

On 8/29/06, David Balmain wrote:
> On 8/26/06, Florent Solt wrote:
> > Perhaps, I found where is my problem (during a big import).
> > Why this silly (really silly :)) example crash ?
> >
> > http://pastie.caboo.se/10357
> >
> > /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:211:in `add_document': IO
> > Error occured at :79 in xraise (IOError)
> > Error occured in fs_store.c:225 - fso_flush_i
> > flushing src of length -2
> >
> > from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:211:in `<<'
> > from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
> > from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:186:in `<<'
> > from test.rb:13
> > from test.rb:8
>
> Hi Florent,
>
> This is something that I still need to work on. The Locale sensitive
> analyzers aren't as robust as they could be. Try using the
> AsciiStandardAnalyzer instead. Or better yet, don't index binary data.
> You can store binary data but indexing it doesn't usually make a lot
> of sense. At least not without a custom analyzer. Having said that, I
> will try and fix this.
>
> Cheers,
> Dave
>

Just an update on this issue. I've now made the StandardAnalyzer more
robust so it won't crash as easily (hopefully not at all) with bad data.
In the process of fixing this I also added a fix so that the
StandardTokenizer will now tokenize negative numbers, i.e. it will parse
"-23" as "-23" instead of just "23".

Cheers,
Dave

From dbalmain.ml at gmail.com Fri Sep 1 21:54:45 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 10:54:45 +0900
Subject: [Ferret-talk] adding new items to index breaks searches with *
In-Reply-To: <20060829065051.GB14476@cordoba.webit.de>
References: <20060829065051.GB14476@cordoba.webit.de>
Message-ID:

On 8/29/06, Jens Kraemer wrote:
> On Tue, Aug 29, 2006 at 02:06:16AM +0200, Clare wrote:
> > Hi after upgrading to ferret 0.10.1 and bleeding edge aaf i'm getting
> > some strange behavior. Generally much better stability with new version
> > of ferret but when i add new items for some reason i can no longer
> > search with a *. Or rather i can but it returns no results and no
> > errors.
> > I can search and get results normally on other searches and when
> > i rebuild the index i can search with * until i add a new item. Has
> > anyone else experienced this? I use * in my browse items page.
>
> do you mean a query only consisting of '*' or wild card queries like
> 'test*' ? The former isn't an allowed query, afaik. Don't know why it
> works before modifying the index. Here's the snippet how I reproduced
> this behavior:
>
> require 'rubygems'
> require 'ferret'
> include Ferret
> i = I.new
> i << 'just some testing'
> i.search('*').total_hits # => 1
> i << 'another testing session'
> i.search('*').total_hits # => 0
>
>
> why don't you just use find(:all) on your browse page ?

Thanks for the snippet Jens. This was a bug (quite a serious one) which
I have now fixed. As Jens said, "*" queries were not a good idea and
would fail on most indexes because of the number of terms (they got
expanded as MultiTermQueries with every single term in the index).
However, I've now modified the QueryParser to translate "*" to a
MatchAllQuery so there should be no problem, performance or otherwise,
with using "*" in your queries.

I should note here that "title:*" will match all documents, including
documents that don't have a :title field. If you only want documents
with a :title field you should use "title:?*". Having said that, if you
are using these types of queries there is probably a better way to do
what you are doing.
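
[Editor's sketch] The difference between "*", "title:*" and "title:?*"
described above can be modelled with a few lines of plain Ruby. This is
only a toy model of the stated semantics, not Ferret's QueryParser: DOCS
and toy_match_ids are hypothetical names, documents are plain hashes,
and only these three query forms are handled.

```ruby
# Toy documents; the second one deliberately has no :title field.
DOCS = [
  { :title => 'ferret', :content => 'search library' },
  { :content => 'no title here' },
  { :title => 'ruby', :content => 'language' }
]

# Returns the ids of the documents a query would match under the
# semantics described in the message above. Toy model only.
def toy_match_ids(query)
  case query
  when '*', /\A\w+:\*\z/
    # "*" and "field:*" both behave like a match-all query.
    (0...DOCS.size).to_a
  when /\A(\w+):\?\*\z/
    # "field:?*" needs at least one character in that field, so only
    # documents that actually have the field can match.
    field = $1.to_sym
    (0...DOCS.size).select { |id| !DOCS[id][field].to_s.empty? }
  else
    raise ArgumentError, "unsupported toy query: #{query}"
  end
end

['*', 'title:*', 'title:?*'].each do |query|
  puts "#{query} -> #{toy_match_ids(query).inspect}"
end
```

In this model "title:*" returns every document id while "title:?*" skips
the document without a title, which is the distinction drawn above.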

Cheers,
Dave

From dbalmain.ml at gmail.com Fri Sep 1 22:12:50 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 11:12:50 +0900
Subject: [Ferret-talk] [0.10.x] Index#search with wildcard bug
In-Reply-To: <38b5b61633a178649f6d31e21899be87@ruby-forum.com>
References: <9c6298fa2d7c8054780ec5a2024cdf53@ruby-forum.com> <20060830100019.GE9513@cordoba.webit.de> <38b5b61633a178649f6d31e21899be87@ruby-forum.com>
Message-ID:

On 8/30/06, Florent Solt wrote:
> Jens Kraemer wrote:
> >
> > there was a mail recently where this problem already came up ( subject
> > was "adding new items to index breaks searches with *" ).
> >
> > I always thought that searching with only a wildcard as in 'id:*'
> > would not be possible at all. So I'd consider it a bug that Ferret
> > delivers results in the first place...
> >
> > Jens
>
> It's not the answer I would like to read :) :)
>
> But do you know how to do this query : "Any document that have an id
> field" ?

Hi Florent,

I answered this on the other thread that Jens mentioned but I better say
it again here. This was a bug which has been fixed. "id:*" will match
all documents with or without an id field. To get all documents with an
id field you should use "id:?*". Wait for version 0.10.2 though before
this will work.

Cheers,
Dave

From dbalmain.ml at gmail.com Fri Sep 1 23:17:44 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 12:17:44 +0900
Subject: [Ferret-talk] AAF Sorting by date - what am I doing wrong?
In-Reply-To: <5a0e13729aa9cad0894c87b7c241b579@ruby-forum.com>
References: <306160cee51217d231f33a03adc3ef0d@ruby-forum.com> <20060830215715.GC5292@cordoba.webit.de> <846f30c70608311321l402ad8f8o9caea2f525266b28@mail.gmail.com> <5a0e13729aa9cad0894c87b7c241b579@ruby-forum.com>
Message-ID:

On 9/1/06, Ian Zabel wrote:
> I'm not sure why sorting by :id is so slow. It takes like 60 seconds or
> more to return a query sorted by id, and only like 0.5 seconds when not
> sorted. Weird.

Hi Ian,

Try optimizing the index. Sorting results by a field will naturally take
a little longer than sorting the results by relevancy because an index
needs to be built for that field. Once the sort-index is built it is
cached for the IndexReader so future sorts should be almost as fast as
getting unsorted results. To build the index Ferret needs to iterate
through all the terms in the index. This takes significantly longer for
unoptimized indexes. Here is a quick benchmark you can try running;

    require 'ferret'
    include Ferret

    words = %w{one two three four five six seven eight nine ten}
    i = I.new

    start_time = Time.now
    100000.times { i << {:id => rand(1000000), :content => words[rand(10)]}}
    puts "Building index took #{Time.new - start_time} seconds"

    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the first time"

    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the second time"

    i.__send__(:ensure_writer_open) # get rid of sort cache

    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the first time"

    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the second time"

    puts "\nOPTIMIZING THE INDEX\n"
    start_time = Time.now
    i.optimize
    puts "Optimizing the index took #{Time.new - start_time} seconds"

    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the first time"

    start_time = Time.now
    i.search("one", :sort => :id)
    puts "Sort by integer took #{Time.new - start_time} seconds the second time"

    i.__send__(:ensure_writer_open) # get rid of sort cache

    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the first time"

    start_time = Time.now
    i.search("one", :sort => [Ferret::Search::SortField.new(:id, :type => :byte)])
    puts "Sort by bytes took #{Time.new - start_time} seconds the second time"

And here are the results on my system;

    Building index took 36.131648 seconds
    Sort by integer took 15.39588 seconds the first time
    Sort by integer took 0.002627 seconds the second time
    Sort by bytes took 15.889957 seconds the first time
    Sort by bytes took 0.001914 seconds the second time

    OPTIMIZING THE INDEX
    Optimizing the index took 0.639831 seconds
    Sort by integer took 0.170887 seconds the first time
    Sort by integer took 0.001423 seconds the second time
    Sort by bytes took 0.029054 seconds the first time
    Sort by bytes took 0.001424 seconds the second time

So optimizing the index before sorting should help a lot.

Cheers,
Dave

From dbalmain.ml at gmail.com Sat Sep 2 01:50:21 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 14:50:21 +0900
Subject: [Ferret-talk] Hyphens
In-Reply-To: <96e5c1aab2b8a87365304ee289f5e4b2@ruby-forum.com>
References: <313cb4dea81d5d8679572d6b8221b9c0@ruby-forum.com> <20060830160044.GJ9513@cordoba.webit.de> <20060830175941.GA5292@cordoba.webit.de> <20060830211645.GB5292@cordoba.webit.de> <96e5c1aab2b8a87365304ee289f5e4b2@ruby-forum.com>
Message-ID:

On 9/1/06, Michael Leung wrote:
> Heya Jens,
>
> Actually, I'm having the problem where no records get returned, when I
> query for the full number: 123-45-55555 for example.
>
> M.

Hi Michael,

It works here in version 0.10.1;

irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'ferret'
=> false
irb(main):003:0> include Ferret
=> Object
irb(main):004:0> i = I.new
=> #
irb(main):005:0> i << {:content => "the phone number is 123-45-55555"}
=> nil
irb(main):006:0> i.search("content:123-45-55555")
=> #], max_score=0.1534264087677>
irb(main):007:0>

I put a bug-fix for this in version 0.10.1.
I think it is fixed in 0.9.6 too but I can't remember for certain.
You're better off upgrading to 0.10.1, especially if you are using
acts_as_ferret (since most of the work has already been done for you).

Cheers,
Dave

From dbalmain.ml at gmail.com Sat Sep 2 01:59:54 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 14:59:54 +0900
Subject: [Ferret-talk] Excluding a term on a search
In-Reply-To: <832e87d964ff831c0b25744d4c5ad77c@ruby-forum.com>
References: <832e87d964ff831c0b25744d4c5ad77c@ruby-forum.com>
Message-ID:

On 9/2/06, Raul Murciano wrote:
>
> Hi there,
>
> I have a model with indexed fields (F1, F2, F3...).
>
> I have a field-based search (+F1:..., +F2:...), and a global search
> (which queries all fields with the given terms). Both of them works ok,
> but I would like to exclude a term from the global search (though it
> should remain indexed to support the field-based search), is it
> possible?

Hi Raul,

You probably want to do something like this;

    require 'rubygems'
    require 'ferret'

    i = Ferret::I.new(:default_field => [:f1, :f2, :f3])

    i << {:f1 => "hello"}
    i << {:f2 => "hello"}
    i << {:f3 => "hello"}
    i << {:f4 => "hello"}

    puts "global"
    i.search_each("hello") {|d, s| puts "#{d}"}
    puts "field"
    i.search_each("f4:hello") {|d, s| puts "#{d}"}

This should print this out;

    global
    0
    1
    2
    field
    3

This should be pretty simple to do through acts_as_ferret too.

Cheers,
Dave

From dbalmain.ml at gmail.com Sat Sep 2 02:05:26 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 2 Sep 2006 15:05:26 +0900
Subject: [Ferret-talk] installing ferret
In-Reply-To:
References: <4b34d023a849da1d058336537c024749@ruby-forum.com> <50ec9b6262700512c8b20972d2132d53@ruby-forum.com>
Message-ID:

On 9/2/06, Mufaddal Khumri wrote:
> This is weird:
>
> I had the above app working fine for a day and just got back to it today
> and get an error.
> > Here is the code from f.rb: > --------------------------------- > require 'rubygems' > require 'ferret' > include Ferret > index = Index::Index.new(:path => '/opt/search-index') > --------------------------------- > > Exception thrown: > --------------------------------- > ./ferret.rb:3: uninitialized constant Ferret (NameError) > from > /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in > `require' > from f.rb:2 > --------------------------------- > > Any ideas? Yes, you have a file called ferret.rb in your current directory which is being loaded instead of the Ferret gem. Try renaming the file and everything should work again. Cheers, Dave From me at mos.cn Sat Sep 2 07:57:39 2006 From: me at mos.cn (hui) Date: Sat, 2 Sep 2006 19:57:39 +0800 Subject: [Ferret-talk] installing ferret In-Reply-To: References: <4b34d023a849da1d058336537c024749@ruby-forum.com> <50ec9b6262700512c8b20972d2132d53@ruby-forum.com> Message-ID: <35ae50b10609020457l55aade19i1824511a0d004a45@mail.gmail.com> Hi Dave, I can't compile ferret on Windows XP, with VC6.0, come compile error. 
D:\Program Files\Microsoft Visual Studio\VC98\include\wingdi.h(93) : warning C40 05: 'ERROR' : macro redefinition except.h(71) : see previous definition of 'ERROR' r_search.c(1142) : error C2275: 'VALUE' : illegal use of this type as an express ion F:/InstantRails/ruby/lib/ruby/1.8/i386-mswin32\ruby.h(79) : see declarat ion of 'VALUE' r_search.c(1142) : error C2146: syntax error : missing ';' before identifier 'v' r_search.c(1142) : error C2065: 'v' : undeclared identifier r_search.c(2041) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' search.h(703) : see declaration of 'Searcher' r_search.c(2059) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' search.h(703) : see declaration of 'Searcher' r_search.c(2461) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' search.h(703) : see declaration of 'Searcher' From fsolt at rift.fr Sat Sep 2 08:31:34 2006 From: fsolt at rift.fr (Florent Solt) Date: Sat, 2 Sep 2006 14:31:34 +0200 Subject: [Ferret-talk] [0.10.x] Index#search with wildcard bug In-Reply-To: References: <9c6298fa2d7c8054780ec5a2024cdf53@ruby-forum.com> <20060830100019.GE9513@cordoba.webit.de> <38b5b61633a178649f6d31e21899be87@ruby-forum.com> Message-ID: <06ca4567f6cfd24eb6a43261a8345254@ruby-forum.com> Thanks a lot Dave, great job ! -- Posted via http://www.ruby-forum.com/. From fsolt at rift.fr Sat Sep 2 08:33:59 2006 From: fsolt at rift.fr (Florent Solt) Date: Sat, 2 Sep 2006 14:33:59 +0200 Subject: [Ferret-talk] [0.10.0] Index#search is not thread safe ? In-Reply-To: References: <20060827165409.GC23274@cordoba.webit.de> Message-ID: <7d770278bdddc945be70df4a9d3da455@ruby-forum.com> Thanks Dave, I'll check it. -- Posted via http://www.ruby-forum.com/. From fsolt at rift.fr Sat Sep 2 08:35:19 2006 From: fsolt at rift.fr (Florent Solt) Date: Sat, 2 Sep 2006 14:35:19 +0200 Subject: [Ferret-talk] [0.10.0] Index#add_document bug with strange value ? 
In-Reply-To: References: <23e21c37d4c5d6119e8617da53371ea4@ruby-forum.com> Message-ID: Cool ! and as usual, great job Dave ! -- Posted via http://www.ruby-forum.com/. From jc.michel at symetrie.com Sat Sep 2 13:08:58 2006 From: jc.michel at symetrie.com (Jean-Christophe Michel) Date: Sat, 2 Sep 2006 19:08:58 +0200 Subject: [Ferret-talk] how to get the words of a query In-Reply-To: References: <0adeabd1dc7f244671e899c6d735004c@symetrie.com> <20060828102229.GG10178@cordoba.webit.de> Message-ID: Hi, On 1 Sept. 06, at 17:09, David Balmain wrote: >> Seems good, will be perfect if your truncate respects multi-byte >> chars. >> My ruby helper does it, see how it works on >> http://symetrie.com/fr/search >> (it highlights only the first occurrence of each word currently). > > > Are you saying the highlight doesn't respect multi-byte characters? If > so, could you give an example? The highlighter uses the byte > boundaries returned by the analyzer during indexing so I can't see any > reason multi-byte characters wouldn't be respected. No, it was a question, I was wondering whether it respected multi-byte characters. It's good news that it can handle Unicode. > Also, it's quite a bit more advanced than your version (and the > version in Lucene contrib for that matter). It highlights only the > terms that match the query. So if you search for the phrase "red > truck" the terms "red" and "truck" will only be highlighted if they > appear together. If you search for "red truck"~1 then the phrase "red > fire truck" will be highlighted. It also uses a pretty clever > algorithm to find the excerpts with the most matching information. > It's still quite experimental though so I need people to try it out > and send in their suggestions. Ok, I'll try. Till now I was using my own ruby highlighter.
Jean-Christophe Michel -- Symétrie, édition de musique et services multimédia 30 rue Jean-Baptiste Say 69001 LYON (FRANCE) tél +33 (0)478 29 52 14 fax +33 (0)478 30 01 11 web www.symetrie.com From dbalmain.ml at gmail.com Sat Sep 2 19:35:34 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 3 Sep 2006 08:35:34 +0900 Subject: [Ferret-talk] installing ferret In-Reply-To: <35ae50b10609020457l55aade19i1824511a0d004a45@mail.gmail.com> References: <4b34d023a849da1d058336537c024749@ruby-forum.com> <50ec9b6262700512c8b20972d2132d53@ruby-forum.com> <35ae50b10609020457l55aade19i1824511a0d004a45@mail.gmail.com> Message-ID: On 9/2/06, hui wrote: > Hi Dave, I can't compile ferret on Windows XP, with VC6.0, come compile error. > > D:\Program Files\Microsoft Visual Studio\VC98\include\wingdi.h(93) : warning C40 > 05: 'ERROR' : macro redefinition > except.h(71) : see previous definition of 'ERROR' > r_search.c(1142) : error C2275: 'VALUE' : illegal use of this type as an express > ion > F:/InstantRails/ruby/lib/ruby/1.8/i386-mswin32\ruby.h(79) : see declarat > ion of 'VALUE' > r_search.c(1142) : error C2146: syntax error : missing ';' before identifier 'v' > > r_search.c(1142) : error C2065: 'v' : undeclared identifier > r_search.c(2041) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' > search.h(703) : see declaration of 'Searcher' > r_search.c(2059) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' > search.h(703) : see declaration of 'Searcher' > r_search.c(2461) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' > search.h(703) : see declaration of 'Searcher' Hi Hui, I'm working on this right now.
From me at mos.cn Sat Sep 2 19:58:34 2006 From: me at mos.cn (hui) Date: Sun, 3 Sep 2006 07:58:34 +0800 Subject: [Ferret-talk] installing ferret In-Reply-To: References: <4b34d023a849da1d058336537c024749@ruby-forum.com> <50ec9b6262700512c8b20972d2132d53@ruby-forum.com> <35ae50b10609020457l55aade19i1824511a0d004a45@mail.gmail.com> Message-ID: <35ae50b10609021658j56985b16jea8bcc42104bfaa4@mail.gmail.com> Thanks a lot, Dave :) On 9/3/06, David Balmain wrote: > On 9/2/06, hui wrote: > > Hi Dave, I can't compile ferret on Windows XP, with VC6.0, come compile error. > > > > D:\Program Files\Microsoft Visual Studio\VC98\include\wingdi.h(93) : warning C40 > > 05: 'ERROR' : macro redefinition > > except.h(71) : see previous definition of 'ERROR' > > r_search.c(1142) : error C2275: 'VALUE' : illegal use of this type as an express > > ion > > F:/InstantRails/ruby/lib/ruby/1.8/i386-mswin32\ruby.h(79) : see declarat > > ion of 'VALUE' > > r_search.c(1142) : error C2146: syntax error : missing ';' before identifier 'v' > > > > r_search.c(1142) : error C2065: 'v' : undeclared identifier > > r_search.c(2041) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' > > search.h(703) : see declaration of 'Searcher' > > r_search.c(2059) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' > > search.h(703) : see declaration of 'Searcher' > > r_search.c(2461) : error C2039: 'rb_w32_close' : is not a member of 'Searcher' > > search.h(703) : see declaration of 'Searcher' > > Hi Hui, I'm working on this right now. 
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- hui http://blog.treasured.cn skype: bourne.z From jc.michel at symetrie.com Sun Sep 3 05:48:59 2006 From: jc.michel at symetrie.com (Jean-Christophe Michel) Date: Sun, 3 Sep 2006 11:48:59 +0200 Subject: [Ferret-talk] using highlight from aaf Message-ID: Hi, I'm trying to use highlight ferret method with trunk aaf and 0.10.1 ferret. In my search display I use: Myindexedclass.ferret_index.searcher.highlight(@query, result_line.id, :content) * searcher is a protected method; how can I access to the searcher from aaf ? * is the doc id in aaf the same as my model id ? * is the first param, query, the string query or the query object ? Jean-Christophe Michel -- symetrie.com Better Nested Set for rails: http://opensource.symetrie.com/trac/better_nested_set From dbalmain.ml at gmail.com Sun Sep 3 09:26:35 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 3 Sep 2006 22:26:35 +0900 Subject: [Ferret-talk] using highlight from aaf In-Reply-To: References: Message-ID: On 9/3/06, Jean-Christophe Michel wrote: > Hi, > > I'm trying to use highlight ferret method with trunk aaf and 0.10.1 > ferret. > In my search display I use: > > Myindexedclass.ferret_index.searcher.highlight(@query, result_line.id, > :content) > > * searcher is a protected method; how can I access to the searcher from > aaf ? I've added a highlight method to Ferret::Index::Index so you'll be able to use it now. > * is the doc id in aaf the same as my model id ? No, but (I'm pretty sure) the model id is automatically stored in the :id field in each document. > * is the first param, query, the string query or the query object ? As with all Index methods, Index#highlight takes a string or a Query object. The Searcher#highlight method however only takes Query objects. 
Note also that you must specify the field to highlight when using Searcher#highlight, whereas Index#highlight uses the default_field unless specified otherwise. Cheers, Dave From jc.michel at symetrie.com Sun Sep 3 16:30:24 2006 From: jc.michel at symetrie.com (Jean-Christophe Michel) Date: Sun, 3 Sep 2006 22:30:24 +0200 Subject: [Ferret-talk] using highlight from aaf In-Reply-To: References: Message-ID: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> Hi Dave, On 3 Sept. 06, at 15:26, David Balmain wrote: > I've added a highlight method to Ferret::Index::Index so you'll be > able to use it now. Thanks. Trying to use this, I updated to 0.10.2 gem. But I cannot get highlight return something else than nil. I suspect highly the doc id not always being my indexed class id, though aaf code seems to create docs with Model.id :/ I tried to do i.search 'rare_word' Mymodel.find result-id # contains no such word i.highlight('rare_word', result-id, :field => :myfield) But if I try to pass a model id containing this word for sure, I still get nil result :/ i.highlight('rare_word', model-id, :field => :myfield) I'll try to turn this into a test case in aaf. -- Jean-Christophe Michel From dbalmain.ml at gmail.com Sun Sep 3 19:48:30 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 4 Sep 2006 08:48:30 +0900 Subject: [Ferret-talk] using highlight from aaf In-Reply-To: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> References: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> Message-ID: On 9/4/06, Jean-Christophe Michel wrote: > Hi Dave, > > On 3 Sept. 06, at 15:26, David Balmain wrote: > > I've added a highlight method to Ferret::Index::Index so you'll be > > able to use it now. > > Thanks. Trying to use this, I updated to 0.10.2 gem. > But I cannot get highlight return something else than nil.
> I suspect highly the doc id not always being my indexed class id, > though aaf code seems to create docs with Model.id :/ > > I tried to do > i.search 'rare_word' > Mymodel.find result-id # contains no such word > i.highlight('rare_word', result-id, :field => :myfield) > > But if I try to pass a model id containing this word for sure, I still > get nil result :/ > i.highlight('rare_word', model-id, :field => :myfield) > > I'll try to turn this into a test case in aaf. > > -- > Jean-Christophe Michel Try this; index.search_each(query) do |doc_id, score| puts index.highlight(query, doc_id, :field => :my_field).join(", ") end Also, I think you should be doing something like this to get the resulting object from the database; MyModel.find index[doc_id][:id] Hope that helps, Dave From Neville.Burnell at bmsoft.com.au Sun Sep 3 21:40:30 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Mon, 4 Sep 2006 11:40:30 +1000 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario Message-ID: <126EC586577FD611A28E00A0C9A03758B5BC6B@maui.bmsoft.com.au> Thanks for your reply Dave, > The one situation where you might be better off using > a single IndexReader is when you are relying on caching. > Filters and Sorts are cached per IndexReader and Sorts > in particular can take up a fair chunk of memory so if > you have a large index (large as in number of documents, > not size of data) then you may be better off with a single > IndexReader. IndexReader is thread-safe so using it concurrently > should be fine. Just to clarify, I'm using Ferret::Index::Index concurrently at the moment, and I'm not getting concurrent searches via #search_each. IE, if a slow wild-card search arrives first, all subsequent searches wait until the wild-card search completes. So I guess #search_each is "synchronised"? Therefore to have multiple searches on an index concurrently, I really need an IndexReader per thread and I would need to manage a pool of reusable IndexReaders? 
Any pointers on how other web apps [not using Rails] handle multiple Ferret readers? > You can actually pass an array of readers as the first (only) parameter to > IndexReader.new. > > reader = IndexReader.new([reader1, reader2, reader3]) > Interesting ... I had a look, but I don't really understand what this does? Would you elaborate please :D Thanks for your help, Neville From contact at ezabel.com Sun Sep 3 21:40:47 2006 From: contact at ezabel.com (Ian Zabel) Date: Mon, 4 Sep 2006 03:40:47 +0200 Subject: [Ferret-talk] AAF Sorting by date - what am I doing wrong? In-Reply-To: References: <306160cee51217d231f33a03adc3ef0d@ruby-forum.com> <20060830215715.GC5292@cordoba.webit.de> <846f30c70608311321l402ad8f8o9caea2f525266b28@mail.gmail.com> <5a0e13729aa9cad0894c87b7c241b579@ruby-forum.com> Message-ID: <9e2bd147449170127662b7773b788b81@ruby-forum.com> Thanks for all the help, everyone. I am now using this statement in my model: acts_as_ferret :fields => { 'comment' => {}, :forum_id => {:index => :untokenized}, 'mod_type' => {:index => :untokenized} , 'user_id' => {:index => :untokenized} , 'ferret_created_at' => {:index => :untokenized} } I rebuilt the index, and sorting now seems to work properly with both "ferret_created_at" and "id", like so sort_fields = [] sort_fields << Ferret::Search::SortField.new("ferret_created_at",:reverse => :true) or sort_fields << Ferret::Search::SortField.new("id",:reverse => :true) Comment.find_by_contents("test", :sort => sort_fields, :limit => 5) Sorting by id is now MUCH faster, as well. The only thing I notice now is that the index is MUCH larger. The index is now about 91MB, whereas before I changed the aaf settings for the model, it was about 20MB. I guess untokenized values take up a lot more space? Thanks again! Ian. -- Posted via http://www.ruby-forum.com/. 
From dbalmain.ml at gmail.com Mon Sep 4 00:05:07 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 4 Sep 2006 13:05:07 +0900 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BC6B@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BC6B@maui.bmsoft.com.au> Message-ID: On 9/4/06, Neville Burnell wrote: > Thanks for your reply Dave, > > > The one situation where you might be better off using > > a single IndexReader is when you are relying on caching. > > Filters and Sorts are cached per IndexReader and Sorts > > in particular can take up a fair chunk of memory so if > > you have a large index (large as in number of documents, > > not size of data) then you may be better off with a single > > IndexReader. IndexReader is thread-safe so using it concurrently > > should be fine. > > Just to clarify, I'm using Ferret::Index::Index concurrently at the > moment, and I'm not getting concurrent searches via #search_each. IE, if > a slow wild-card search arrives first, all subsequent searches wait > until the wild-card search completes. > > So I guess #search_each is "synchronised"? That's correct. Otherwise it would be possible for the document IDs of the documents to change between the time the search is run and the time the document is referenced. For the benefit of those who don't know this, document IDs are not constant. They represent the position of the document in the index. Think of it like an array. Let's add 5 documents to the index. [0,1,2,3,4] Now let's delete documents 1 and 2; [0,3,4] So document 4 now has a doc_id of 2. If this happened in the middle of a search you'd have a problem. So instead we synchronize the Index#search and Index#search_each methods. Now this isn't the case for Searcher#search and Searcher#search_each, since the IndexReader that Searcher uses remains consistent, so you should be able to use Searcher concurrently.
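The array analogy above can be sketched in plain Ruby. This is only an illustration and does not use Ferret at all: FakeIndex and its methods are hypothetical names invented for the example, with an Array standing in for the index and a Mutex standing in for the synchronization applied to Index#search and Index#search_each.

```ruby
require 'thread'

# Hypothetical stand-in for an index (not Ferret code):
# a doc_id is simply the document's current position in an Array.
class FakeIndex
  def initialize
    @docs = []
    @lock = Mutex.new
  end

  # Append a document and return the doc_id it has right now.
  def add(doc)
    @lock.synchronize do
      @docs << doc
      @docs.size - 1
    end
  end

  # Delete documents by doc_id; every later document shifts down.
  def delete(*doc_ids)
    @lock.synchronize do
      doc_ids.sort.reverse.each { |id| @docs.delete_at(id) }
    end
  end

  # Current doc_id of a document, or nil if it is not in the index.
  def doc_id_of(doc)
    @lock.synchronize { @docs.index(doc) }
  end

  # Yield [doc_id, doc] pairs while holding the lock, so no delete
  # can renumber documents in the middle of a search.
  def search_each
    @lock.synchronize do
      @docs.each_with_index { |doc, id| yield id, doc }
    end
  end
end

index = FakeIndex.new
%w[doc0 doc1 doc2 doc3 doc4].each { |doc| index.add(doc) }

id_before = index.doc_id_of("doc4")
index.delete(1, 2)
id_after = index.doc_id_of("doc4")

puts "doc4 moved from doc_id #{id_before} to doc_id #{id_after}"
```

The lock is what serialises searches in this sketch; a searcher working from a fixed snapshot of the array would not need it, which is the trade-off described above.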
> Therefore to have multiple searches on an index concurrently, I really > need an IndexReader per thread and I would need to manage a pool of > reusable IndexReaders? Using Ferret::Index::Index this would be true. But if performance is a concern you should definitely use a Ferret::Search::Searcher object instead anyway and you'll be able to use it concurrently. > Any pointers on how other web apps [not using Rails] handle multiple > Ferret readers? Let us know if using the Searcher object isn't adequate. > > You can actually pass an array of readers as the first (only) > parameter to > > IndexReader.new. > > > > reader = IndexReader.new([reader1, reader2, reader3]) > > > > Interesting ... I had a look, but I don't really understand what this > does? Would you elaborate please :D A MultiReader object was initially what was used to read and search multiple indexes at a time. This functionality is now simply handled by the IndexReader object. There are several uses for this. One was to store each model in a separate index and you could then offer search across multiple models using a MultiReader. Another use-case might be to have multiple indexes to speed up indexing. If for example you are scraping websites it is a very good idea to have multiple scraping processes. The best way to do this is to have each process indexing to its own index. You could then search all indexes at once using a MultiReader or you could also merge all indexes into a single index. Hope that makes sense. Cheers, Dave From dbalmain.ml at gmail.com Mon Sep 4 00:25:52 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 4 Sep 2006 13:25:52 +0900 Subject: [Ferret-talk] AAF Sorting by date - what am I doing wrong? 
In-Reply-To: <9e2bd147449170127662b7773b788b81@ruby-forum.com> References: <306160cee51217d231f33a03adc3ef0d@ruby-forum.com> <20060830215715.GC5292@cordoba.webit.de> <846f30c70608311321l402ad8f8o9caea2f525266b28@mail.gmail.com> <5a0e13729aa9cad0894c87b7c241b579@ruby-forum.com> <9e2bd147449170127662b7773b788b81@ruby-forum.com> Message-ID: On 9/4/06, Ian Zabel wrote: > Thanks for all the help, everyone. > > I am now using this statement in my model: acts_as_ferret :fields => { > 'comment' => {}, :forum_id => {:index => :untokenized}, 'mod_type' => > {:index => :untokenized} , 'user_id' => {:index => :untokenized} , > 'ferret_created_at' => {:index => :untokenized} } > > I rebuilt the index, and sorting now seems to work properly with both > "ferret_created_at" and "id", like so > > sort_fields = [] > sort_fields << > Ferret::Search::SortField.new("ferret_created_at",:reverse => :true) > or > sort_fields << Ferret::Search::SortField.new("id",:reverse => :true) > Comment.find_by_contents("test", :sort => sort_fields, :limit => 5) > > Sorting by id is now MUCH faster, as well. Great to hear. > The only thing I notice now is that the index is MUCH larger. The index > is now about 91MB, whereas before I changed the aaf settings for the > model, it was about 20MB. I guess untokenized values take up a lot more > space? That can be correct but it is surprising for your schema. For example, imagine the following six documents; "one two three" (13-bytes) "one three two" "two three one" "two one three" "three one two" "three two one" If you tokenized the fields you'd have three terms "one" (3-bytes), "two" (3-bytes), "three" (5-bytes) and each term would use six bytes to store the doc_ids of the documents they occur in. So you'd have 3 + 3 + 5 + 3*6 = 29 bytes. Storing the fields as untokenized would take 13 bytes per field plus 1 byte to signify the document each field occurs in which would be (13 + 1) * 6 = 84 bytes.
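The back-of-the-envelope arithmetic above can be reproduced in a few lines of plain Ruby. This is only a sketch of the simplified model in this message (one byte per doc_id, no compression, no positions or term vectors), not a measurement of what Ferret actually writes to disk; the variable names are made up for the example.

```ruby
# The six documents from the example: every ordering of the three words.
docs = %w[one two three].permutation.map { |words| words.join(" ") }

# Tokenized: each distinct term is stored once, plus one byte for every
# document it occurs in (the simplified posting list).
terms = docs.flat_map(&:split).uniq
tokenized_bytes = terms.inject(0) do |total, term|
  postings = docs.count { |doc| doc.split.include?(term) }
  total + term.bytesize + postings
end

# Untokenized: each whole field value is its own term and occurs in
# exactly one document, so it costs its length plus one byte.
untokenized_bytes = docs.inject(0) { |total, doc| total + doc.bytesize + 1 }

puts "tokenized:   #{tokenized_bytes} bytes"
puts "untokenized: #{untokenized_bytes} bytes"
```

In this model the tokenized cost is dominated by the short posting lists, while the untokenized cost repeats the full value for every document, so it grows with the number of distinct values.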
Of course this is a simplification of what is really going on. There is a lot of compression happening, and a lot of other data is stored as well, such as term positions, term frequencies and term vectors, in addition to the stored field data itself. Now, if you want to save space, there are a few other parameters you can set. You can start by discarding :term_vectors. These are used for excerpts and match highlighting but are unnecessary in most cases. Also, there is no need to store all your data. Often, the only fields you'll want to store are the model IDs. If you aren't referencing the field in the document from the Ferret index, don't bother storing it. So for example; :ferret_created_at could be :ferret_created_at => {:index => :untokenized, :store => :no, :term_vectors => :no} Note also I recommend always using Symbols for your field names rather than Strings. Cheers, Dave From dbalmain.ml at gmail.com Mon Sep 4 00:40:32 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 4 Sep 2006 13:40:32 +0900 Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem Message-ID: Hey all, I've just released Ferret version 0.10.2. It is mostly just a bug fix release. The only change is that a highlight method has been added to Ferret::Index::Index. Please try it out and let me know what you think. The big news for this release is that there is also a binary win32 gem included. This is the first time I've built a gem like this so please let me know if there are any issues. Cheers, Dave From me at mos.cn Mon Sep 4 00:56:43 2006 From: me at mos.cn (hui) Date: Mon, 4 Sep 2006 12:56:43 +0800 Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem In-Reply-To: References: Message-ID: <35ae50b10609032156o3a5e9a51p7b51192b0e8193d7@mail.gmail.com> Great, Successfully installed on WindowsXP with InstantRails.
F:\>gem install ferret-0.10.2-mswin32.gem Attempting local installation of 'ferret-0.10.2-mswin32.gem' Successfully installed ferret, version 0.10.2 Installing RDoc documentation for ferret-0.10.2-mswin32... F:\>irb -rrubygems irb(main):001:0> require 'ferret' => false irb(main):002:0> include Ferret => Object irb(main):003:0> index = Index::Index.new() => ##, :dir=>#, :default_field=>:*, :lock_retry_time=>2}, @ reader=nil, @mon_waiting_queue=[], @writer=nil, @default_input_field=:id, @dir=# , @mon_entering_queue=[], @qp=nil, @searc her=nil, @mon_count=0, @default_field=:*, @auto_flush=false> On 9/4/06, David Balmain wrote: > Hey all, > > I've just released Ferret version 0.10.2. It is mostly just a bug fix > release. The only change is that a highlight method has been added to > Ferret::Index::Index. Please try it out and let me know what you > think. > > The big news for this release is that there is also a binary win32 gem > included. This is the first time I've build a gem like this so please > let me know if there are any issues. > > Cheers, > Dave > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- hui http://blog.treasured.cn skype: bourne.z From kraemer at webit.de Mon Sep 4 03:25:48 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 4 Sep 2006 09:25:48 +0200 Subject: [Ferret-talk] AAF Sorting by date - what am I doing wrong? In-Reply-To: References: <306160cee51217d231f33a03adc3ef0d@ruby-forum.com> <20060830215715.GC5292@cordoba.webit.de> <846f30c70608311321l402ad8f8o9caea2f525266b28@mail.gmail.com> <5a0e13729aa9cad0894c87b7c241b579@ruby-forum.com> <9e2bd147449170127662b7773b788b81@ruby-forum.com> Message-ID: <20060904072548.GS9513@cordoba.webit.de> On Mon, Sep 04, 2006 at 01:25:52PM +0900, David Balmain wrote: > On 9/4/06, Ian Zabel wrote: [..] 
> > :ferret_created_at => {:index => :untokenized, :store => :no, > :term_vectors => :no} :store => :no is already the default used by acts_as_ferret, no need to explicitly specify this. term vectors are stored by default :with_positions_offsets, so turning them off might help a bit. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon Sep 4 03:39:29 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 4 Sep 2006 09:39:29 +0200 Subject: [Ferret-talk] using highlight from aaf In-Reply-To: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> References: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> Message-ID: <20060904073929.GT9513@cordoba.webit.de> On Sun, Sep 03, 2006 at 10:30:24PM +0200, Jean-Christophe Michel wrote: > Hi Dave, > > On 3 Sept. 06, at 15:26, David Balmain wrote: > > I've added a highlight method to Ferret::Index::Index so you'll be > > able to use it now. > > Thanks. Trying to use this, I updated to 0.10.2 gem. > But I cannot get highlight return something else than nil. > I suspect highly the doc id not always being my indexed class id, > though aaf code seems to create docs with Model.id :/ the ferret document id is *not* your Model primary key id. retrieving the doc id from a given primary key isn't that easy atm, since aaf's find_by_contents returns model instances and no doc ids. Model.ferret_index.search("id:#{model.id}") should give you access to the ferret search results and therefore the document ids. I'll add better support for this to aaf soon. I've been thinking about something like model_instance.highlight('rare_word', :field => :my_field) Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From angrypirana at googlemail.com Mon Sep 4 07:35:34 2006 From: angrypirana at googlemail.com (Richard) Date: Mon, 4 Sep 2006 13:35:34 +0200 Subject: [Ferret-talk] Installing Message-ID: <4d493d2aed144082a2aae5229c5283dc@ruby-forum.com> I'm sorry to be such a newbie, but can someone please tell me exactly what I need to do to get acts_as_ferret working on Windows XP. I'm just not understanding the install process. Also will I need to change anything before uploading the site to a linux host? -- Posted via http://www.ruby-forum.com/. From angrypirana at googlemail.com Mon Sep 4 09:47:30 2006 From: angrypirana at googlemail.com (Richard) Date: Mon, 4 Sep 2006 15:47:30 +0200 Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem In-Reply-To: References: Message-ID: David Balmain wrote: > Hey all, > > I've just released Ferret version 0.10.2. It is mostly just a bug fix > release. The only change is that a highlight method has been added to > Ferret::Index::Index. Please try it out and let me know what you > think. > > The big news for this release is that there is also a binary win32 gem > included. This is the first time I've build a gem like this so please > let me know if there are any issues. > > Cheers, > Dave What happens when I upload my site from Windows XP to a Linux host, do I have to get another gem? This is the only thing I'm really unsure about. Obviously I need the Win32 as my test environment is a Windows machine. -- Posted via http://www.ruby-forum.com/.
From bk at benjaminkrause.com Mon Sep 4 09:50:51 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Mon, 4 Sep 2006 15:50:51 +0200 (CEST) Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem In-Reply-To: References: Message-ID: <45546.62.180.117.243.1157377851.squirrel@orkland.homeunix.org> > What happens when I upload my site from Windows XP to a Linux host, do I > have to get another gem? This is the only thing I'm really unsure about. > Obviously I need the Win32 as my test environment is a Windows machine. Hey .. it's the same gem.. just run "gem install ferret" on your windows and your linux machine.. gem will take care of the rest.. :-) Ben From angrypirana at googlemail.com Mon Sep 4 11:07:54 2006 From: angrypirana at googlemail.com (Richard) Date: Mon, 4 Sep 2006 17:07:54 +0200 Subject: [Ferret-talk] uninitialized constant BooleanClause Message-ID: I've installed the latest Win32 gem and the acts_as_ferret plugin (i checked out the files and placed them in the vendor/plugins directory). When I try to search I get the following error: uninitialized constant BooleanClause RAILS_ROOT: ./script/../config/.. 
Application Trace | Framework Trace | Full Trace C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing' #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:176:in `acts_as_ferret' #{RAILS_ROOT}/app/models/restaurant.rb:6 #{RAILS_ROOT}/app/controllers/browse_controller.rb:18:in `searcher' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing' #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:176:in `acts_as_ferret' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:140:in `load' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:56:in `require_or_load' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:30:in `depend_on' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:85:in `require_dependency' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:98:in `const_missing' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:131:in `const_missing' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:133:in `const_missing' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/base.rb:910:in `perform_action_without_filters' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/filters.rb:368:in `perform_action_without_benchmark' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' C:/InstantRails/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure' 
C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/rescue.rb:82:in `perform_action' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/base.rb:381:in `process_without_filters' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/filters.rb:377:in `process_without_session_management_support' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/session_management.rb:117:in `process' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/dispatcher.rb:38:in `dispatch' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/webrick_server.rb:115:in `handle_dispatch' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/webrick_server.rb:81:in `service' C:/InstantRails/ruby/lib/ruby/1.8/webrick/httpserver.rb:104:in `service' C:/InstantRails/ruby/lib/ruby/1.8/webrick/httpserver.rb:65:in `run' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:173:in `start_thread' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:162:in `start_thread' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:95:in `start' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:92:in `start' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:23:in `start' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:82:in `start' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/webrick_server.rb:67:in `dispatch' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/commands/servers/webrick.rb:59 C:/InstantRails/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/commands/server.rb:30 
C:/InstantRails/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' script/server:3 C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing' #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:176:in `acts_as_ferret' #{RAILS_ROOT}/app/models/restaurant.rb:6 C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:140:in `load' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:56:in `require_or_load' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:30:in `depend_on' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:85:in `require_dependency' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:98:in `const_missing' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:131:in `const_missing' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:133:in `const_missing' #{RAILS_ROOT}/app/controllers/browse_controller.rb:18:in `searcher' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/base.rb:910:in `perform_action_without_filters' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/filters.rb:368:in `perform_action_without_benchmark' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' C:/InstantRails/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' 
C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/rescue.rb:82:in `perform_action' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/base.rb:381:in `process_without_filters' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/filters.rb:377:in `process_without_session_management_support' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.3/lib/action_controller/session_management.rb:117:in `process' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/dispatcher.rb:38:in `dispatch' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/webrick_server.rb:115:in `handle_dispatch' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/webrick_server.rb:81:in `service' C:/InstantRails/ruby/lib/ruby/1.8/webrick/httpserver.rb:104:in `service' C:/InstantRails/ruby/lib/ruby/1.8/webrick/httpserver.rb:65:in `run' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:173:in `start_thread' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:162:in `start_thread' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:95:in `start' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:92:in `start' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:23:in `start' C:/InstantRails/ruby/lib/ruby/1.8/webrick/server.rb:82:in `start' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/webrick_server.rb:67:in `dispatch' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/commands/servers/webrick.rb:59 C:/InstantRails/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/rails-1.1.4/lib/commands/server.rb:30 C:/InstantRails/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require' 
C:/InstantRails/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require'
script/server:3

This error occured while loading the following files:
restaurant.rb
ferret/search/boolean_clause.rb

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Mon Sep 4 11:26:17 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 4 Sep 2006 17:26:17 +0200
Subject: [Ferret-talk] uninitialized constant BooleanClause
In-Reply-To:
References:
Message-ID: <20060904152617.GV9513@cordoba.webit.de>

On Mon, Sep 04, 2006 at 05:07:54PM +0200, Richard wrote:
> I've installed the latest Win32 gem and the acts_as_ferret plugin (i
> checked out the files and placed them in the vendor/plugins directory).
>
> When I try to search I get the following error:
>
> uninitialized constant BooleanClause

Seems you're using an older version of acts_as_ferret, which isn't
compatible with Ferret 0.10.x yet. I hope to officially release a new
version this week. For the time being, please use the trunk:

script/plugin install svn://projects.jkraemer.net/acts_as_ferret/trunk/plugin/acts_as_ferret

Jens

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From angrypirana at googlemail.com Mon Sep 4 12:29:14 2006
From: angrypirana at googlemail.com (Richard)
Date: Mon, 4 Sep 2006 18:29:14 +0200
Subject: [Ferret-talk] uninitialized constant BooleanClause
In-Reply-To: <20060904152617.GV9513@cordoba.webit.de>
References: <20060904152617.GV9513@cordoba.webit.de>
Message-ID:

Using the trunk I now get the random error:

compile error
./script/../config/../app/views/browse/index.rhtml:40: Invalid char `\002' in expression

Extracted source (around line #40):

37: <%= l %>
38: <% else %>
39: <%= link_to l, :action => 'index', :letter => l %>
40: <% end %>
41: <% end %>
42: <%= if @restaurant_pages.current.previous 43: link_to("Previous", { :page => @restaurant_pages.current.previous, :params => { :sort => params[:sort] } } ) On the index page of my controller which has nothing to do with searching. Any ideas? -- Posted via http://www.ruby-forum.com/. From angrypirana at googlemail.com Mon Sep 4 12:44:33 2006 From: angrypirana at googlemail.com (Richard) Date: Mon, 4 Sep 2006 18:44:33 +0200 Subject: [Ferret-talk] uninitialized constant BooleanClause In-Reply-To: References: <20060904152617.GV9513@cordoba.webit.de> Message-ID: <2edb14d894156dc8dab8a68ef132dcf4@ruby-forum.com> Richard wrote: > Using the trunk I now get the random error: > > compile error > ./script/../config/../app/views/browse/index.rhtml:40: Invalid char > `\002' in expression > > Extracted source (around line #40): > > 37: <%= l %> > 38: <% else %> > 39: <%= link_to l, :action => 'index', :letter => l %> > 40: <% end %> > 41: <% end %>
> 42: <%= if @restaurant_pages.current.previous
> 43: link_to("Previous", { :page => @restaurant_pages.current.previous,
> :params => { :sort => params[:sort] } } )
>
> On the index page of my controller which has nothing to do with
> searching. Any ideas?

I've found this was just a "line end" character in the file. Weird.
Never had that problem before. Anyway, it's not related to
acts_as_ferret so hurrah!

--
Posted via http://www.ruby-forum.com/.

From angrypirana at googlemail.com Mon Sep 4 16:24:19 2006
From: angrypirana at googlemail.com (Richard)
Date: Mon, 4 Sep 2006 22:24:19 +0200
Subject: [Ferret-talk] Fields
Message-ID: <84da6fda8187f6e83b0696d7b97c90b9@ruby-forum.com>

Sorry for asking so many questions recently.

Just wondered. I originally had acts_as_ferret in my model and I
performed a few searches. I then decided to change this line to
acts_as_ferret :fields => [ 'name' ]

However when I perform searches it is still listing results with queries
contained in fields other than the 'name' field. How do I fix this?

--
Posted via http://www.ruby-forum.com/.

From cuong.tran at gmail.com Mon Sep 4 18:16:36 2006
From: cuong.tran at gmail.com (Cuong Tran)
Date: Mon, 4 Sep 2006 17:16:36 -0500
Subject: [Ferret-talk] Fields
In-Reply-To: <84da6fda8187f6e83b0696d7b97c90b9@ruby-forum.com>
References: <84da6fda8187f6e83b0696d7b97c90b9@ruby-forum.com>
Message-ID: <8a73c7940609041516g2786ae77o80fda340250d1f49@mail.gmail.com>

Run Model.rebuild_index

On 9/4/06, Richard wrote:
> Sorry for asking so many questions recently.
>
> Just wondered. I originally had acts_as_ferret in my model and I
> performed a few searches. I then decided to change this line to
> acts_as_ferret :fields => [ 'name' ]
>
> However when I perform searches it is still listing results with queries
> contained in fields other than the 'name' field. How do I fix this?
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

From angrypirana at googlemail.com Mon Sep 4 18:22:20 2006
From: angrypirana at googlemail.com (Richard)
Date: Tue, 5 Sep 2006 00:22:20 +0200
Subject: [Ferret-talk] Fields
In-Reply-To: <8a73c7940609041516g2786ae77o80fda340250d1f49@mail.gmail.com>
References: <84da6fda8187f6e83b0696d7b97c90b9@ruby-forum.com> <8a73c7940609041516g2786ae77o80fda340250d1f49@mail.gmail.com>
Message-ID: <5264f2bc4c8bc28ef6b4a6e3a46288a9@ruby-forum.com>

Cuong Tran wrote:
> Run Model.rebuild_index

How do I initiate this command? More info please or just a link to where
it tells you this stuff.

--
Posted via http://www.ruby-forum.com/.

From jc.michel at symetrie.com Mon Sep 4 19:02:57 2006
From: jc.michel at symetrie.com (Jean-Christophe Michel)
Date: Tue, 5 Sep 2006 01:02:57 +0200
Subject: [Ferret-talk] using highlight from aaf
In-Reply-To:
References: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com>
Message-ID: <5b609d0df1bec0ec11b8944e3dd34887@symetrie.com>

Hi,

On 4 Sept. 06, at 01:48, David Balmain wrote:
> index.search_each(query) do |doc_id, score|
> puts index.highlight(query, doc_id, :field =>
> :my_field).join(", ")
> end

Tried it, I get only nil for index.highlight :/

--
Jean-Christophe Michel

From dbalmain.ml at gmail.com Mon Sep 4 19:18:28 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 5 Sep 2006 08:18:28 +0900
Subject: [Ferret-talk] using highlight from aaf
In-Reply-To: <5b609d0df1bec0ec11b8944e3dd34887@symetrie.com>
References: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> <5b609d0df1bec0ec11b8944e3dd34887@symetrie.com>
Message-ID:

On 9/5/06, Jean-Christophe Michel wrote:
> Hi,
>
> On 4 Sept. 06, at 01:48, David Balmain wrote:
> > index.search_each(query) do |doc_id, score|
> > puts index.highlight(query, doc_id, :field =>
> > :my_field).join(", ")
> > end
>
> Tried it, I get only nil for index.highlight :/

Can you give me an example of what doesn't work? Something like this;

require 'rubygems'
require 'ferret'

i = Ferret::I.new(:default_field => :content)
i << {:content => "here is the content I want to highlight."}

puts i.highlight("content", 0)

Cheers,
Dave

From dbalmain.ml at gmail.com Mon Sep 4 19:22:11 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 5 Sep 2006 08:22:11 +0900
Subject: [Ferret-talk] Fields
In-Reply-To: <5264f2bc4c8bc28ef6b4a6e3a46288a9@ruby-forum.com>
References: <84da6fda8187f6e83b0696d7b97c90b9@ruby-forum.com> <8a73c7940609041516g2786ae77o80fda340250d1f49@mail.gmail.com> <5264f2bc4c8bc28ef6b4a6e3a46288a9@ruby-forum.com>
Message-ID:

On 9/5/06, Richard wrote:
> Cuong Tran wrote:
> > Run Model.rebuild_index
>
> How do I initiate this command? More info please or just a link to where
> it tells you this stuff.

Hi Richard,

You could have an action that calls the command. Or you could call it
from the console. I hope that helps. This is more of a Rails question
than a Ferret question.

Cheers,
Dave

From angrypirana at googlemail.com Mon Sep 4 19:56:37 2006
From: angrypirana at googlemail.com (Richard)
Date: Tue, 5 Sep 2006 01:56:37 +0200
Subject: [Ferret-talk] Fields
In-Reply-To:
References: <84da6fda8187f6e83b0696d7b97c90b9@ruby-forum.com> <8a73c7940609041516g2786ae77o80fda340250d1f49@mail.gmail.com> <5264f2bc4c8bc28ef6b4a6e3a46288a9@ruby-forum.com>
Message-ID:

David Balmain wrote:
> On 9/5/06, Richard wrote:
>> Cuong Tran wrote:
>> > Run Model.rebuild_index
>>
>> How do I initiate this command? More info please or just a link to where
>> it tells you this stuff.
>
> Hi Richard,
>
> You could have an action that calls the command. Or you could call it
> from the console. I hope that helps.
> This is more of a Rails question
> than a Ferret question.
>
> Cheers,
> Dave

Thanks Dave and Cuong, what you said worked perfectly.

--
Posted via http://www.ruby-forum.com/.

From jc.michel at symetrie.com Mon Sep 4 19:57:23 2006
From: jc.michel at symetrie.com (Jean-Christophe Michel)
Date: Tue, 5 Sep 2006 01:57:23 +0200
Subject: [Ferret-talk] using highlight from aaf
In-Reply-To:
References: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> <5b609d0df1bec0ec11b8944e3dd34887@symetrie.com>
Message-ID: <96e6dc6806cffab8d7dc915a90e712b0@symetrie.com>

Hi,

On 5 Sept. 06, at 01:18, David Balmain wrote:
> Can you give me an example of what doesn't work? Something like this;
>
> require 'rubygems'
> require 'ferret'
>
> i = Ferret::I.new(:default_field => :content)
> i << {:content => "here is the content I want to highlight."}
>
> puts i.highlight("content", 0)

In fact this example works. The difference here is probably that the
doc is not stored in the index through aaf. That's probably the reason
why I don't get a result: even Myclass.ferret_index.doc(12) returns {}.

--
Jean-Christophe Michel

From dbalmain.ml at gmail.com Mon Sep 4 20:24:45 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 5 Sep 2006 09:24:45 +0900
Subject: [Ferret-talk] using highlight from aaf
In-Reply-To: <96e6dc6806cffab8d7dc915a90e712b0@symetrie.com>
References: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> <5b609d0df1bec0ec11b8944e3dd34887@symetrie.com> <96e6dc6806cffab8d7dc915a90e712b0@symetrie.com>
Message-ID:

On 9/5/06, Jean-Christophe Michel wrote:
> Hi,
>
>
> On 5 Sept. 06, at 01:18, David Balmain wrote:
> > Can you give me an example of what doesn't work? Something like this;
> >
> > require 'rubygems'
> > require 'ferret'
> >
> > i = Ferret::I.new(:default_field => :content)
> > i << {:content => "here is the content I want to highlight."}
> >
> > puts i.highlight("content", 0)
>
> In fact this example works.
The difference here is that doc is not > stored in the index though aaf probably. It's probably the reason why I > don't have a result: even Myclass.ferret_index.doc(12) returns {}. Ahhhh, of course. Sorry. Jens mentioned that yesterday so I should have realized. You need to store the field as well as its term vector :with_positions_offsets if you want to highlight it. The :term_vector setting is :with_positions_offsets by default in aaf so you only need to change the :store setting for the field you want to highlight. By the way, Myclass.ferret_index.doc(12) will always return {}. The documents are lazy loading now so Myclass.ferret_index.doc(12)[:id] will return the model ID. You can load all fields with the load method. Try; puts Myclass.ferret_index.doc(12).load().inspect() That should show you which fields are actually stored which in the case of acts_as_ferret will only be the model ID (I think??). Cheers, Dave From Neville.Burnell at bmsoft.com.au Mon Sep 4 20:42:56 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Tue, 5 Sep 2006 10:42:56 +1000 Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem Message-ID: <126EC586577FD611A28E00A0C9A03758B5BC7E@maui.bmsoft.com.au> Hi Dave, I seem to be having trouble retrieving docs from the index. Am I missing something obvious? BTW, I am on Windows XP, Ruby 1.8.4. 
require 'rubygems' require 'ferret' p Ferret::VERSION idx = Ferret::Index::Index.new idx << {:id => 1, :name => 'Fred', :occupation => 'Toon'} idx << {:id => 1, :name => 'Barney', :occupation => 'Toon'} p idx.size doc = idx[0] p doc docs = [] query = '*:fred' idx.search_each(query) { |doc, score| docs << idx[doc] } p docs.length p docs.first docs = [] query = '*:toon' idx.search_each(query) { |doc, score| docs << idx[doc] } p docs.length p docs.first ========= ruby test.rb "0.10.2" 2 {} 1 {} 2 {} -----Original Message----- From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of David Balmain Sent: Monday, 4 September 2006 2:41 PM To: ferret-talk at rubyforge.org Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem Hey all, I've just released Ferret version 0.10.2. It is mostly just a bug fix release. The only change is that a highlight method has been added to Ferret::Index::Index. Please try it out and let me know what you think. The big news for this release is that there is also a binary win32 gem included. This is the first time I've build a gem like this so please let me know if there are any issues. Cheers, Dave _______________________________________________ Ferret-talk mailing list Ferret-talk at rubyforge.org http://rubyforge.org/mailman/listinfo/ferret-talk From dbalmain.ml at gmail.com Mon Sep 4 20:53:58 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 5 Sep 2006 09:53:58 +0900 Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BC7E@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BC7E@maui.bmsoft.com.au> Message-ID: On 9/5/06, Neville Burnell wrote: > Hi Dave, > > I seem to be having trouble retrieving docs from the index. Am I missing > something obvious? BTW, I am on Windows XP, Ruby 1.8.4. 
>
> require 'rubygems'
> require 'ferret'
>
> p Ferret::VERSION
>
> idx = Ferret::Index::Index.new
>
> idx << {:id => 1, :name => 'Fred', :occupation => 'Toon'}
> idx << {:id => 1, :name => 'Barney', :occupation => 'Toon'}
> p idx.size
>
> doc = idx[0]
> p doc
>
> docs = []
> query = '*:fred'
> idx.search_each(query) { |doc, score| docs << idx[doc] }
> p docs.length
> p docs.first
>
> docs = []
> query = '*:toon'
> idx.search_each(query) { |doc, score| docs << idx[doc] }
> p docs.length
> p docs.first

Hi Neville,

Documents are now lazy loading so they just look like an empty hash
unless you load all the fields. So you could try this;

p docs.first.load

And you'll see all the stored fields loaded. Of course, you don't need
to call the load method to access the fields. Just referencing a field
will load it automatically.

p docs.first[:occupation]

Hope that helps,
Dave

From Neville.Burnell at bmsoft.com.au Mon Sep 4 22:16:04 2006
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Tue, 5 Sep 2006 12:16:04 +1000
Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem
Message-ID: <126EC586577FD611A28E00A0C9A03758B5BC80@maui.bmsoft.com.au>

Hi Dave,

> Documents are now lazy loading so they just look
> like an empty hash unless you load all the fields.
> So you could try this;
>
> p docs.first.load
>
> And you'll see all the stored fields loaded. Of course,
> you don't need to call the load method to access the fields.
> Just referencing a field will load it automatically.
>
> p docs.first[:occupation]

I see now, thanks!

Neville

From mkhumri at allegromedical.com Tue Sep 5 00:47:04 2006
From: mkhumri at allegromedical.com (Mufaddal Khumri)
Date: Tue, 5 Sep 2006 06:47:04 +0200
Subject: [Ferret-talk] No matches
Message-ID: <7a163939f9f08f9063beeb2cb0135739@ruby-forum.com>

The following script creates a search index and then searches it. I get
no results? Where am I going wrong?

Thanks.
-----------BEGIN SCRIPT---------------- require 'rubygems' require 'ferret' include Ferret path = '/tmp/myindex' field_infos = Ferret::Index::FieldInfos.new() field_infos.add_field(:name, :store => :yes, :index => :yes) field_infos.create_index(path) index = Index::Index.new(:path => path, :field_infos => field_infos, :analyzer => Analysis::AsciiStandardAnalyzer.new) index << {:name => "Joe"} index << {:name => "Sandy"} index << {:name => "Billy"} index << {:name => "Lona"} index << {:name => "Frank"} index.optimize query = Search::TermQuery.new(:name, "Joe") index.search_each(query, {:limit => :all}) do |doc, score| puts 'i am here to just drink some hot chocolate.' puts index[doc]["name"] end -------------END SCRIPT---------------- -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Tue Sep 5 00:59:03 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 5 Sep 2006 13:59:03 +0900 Subject: [Ferret-talk] No matches In-Reply-To: <7a163939f9f08f9063beeb2cb0135739@ruby-forum.com> References: <7a163939f9f08f9063beeb2cb0135739@ruby-forum.com> Message-ID: On 9/5/06, Mufaddal Khumri wrote: > The following script creates a search index and then searches it. I get > no results? Where am I going wrong? > > Thanks. > > -----------BEGIN SCRIPT---------------- > require 'rubygems' > require 'ferret' > > include Ferret > > path = '/tmp/myindex' > field_infos = Ferret::Index::FieldInfos.new() > field_infos.add_field(:name, :store => :yes, :index => :yes) > field_infos.create_index(path) > index = Index::Index.new(:path => path, :field_infos => field_infos, > :analyzer => Analysis::AsciiStandardAnalyzer.new) > > index << {:name => "Joe"} > index << {:name => "Sandy"} > index << {:name => "Billy"} > index << {:name => "Lona"} > index << {:name => "Frank"} > > index.optimize > > query = Search::TermQuery.new(:name, "Joe") Your problem lies here. 
The AsciiStandardAnalyzer downcases all of the data as it is entered into the index, so you should be searching for "joe", not "Joe". Since you are using the Index class you can just do it like this also; index.search_each("name:Joe", {:limit => :all}) do |doc, score| In this case the QueryParser will downcase "Joe" for you. Cheers, Dave > index.search_each(query, {:limit => :all}) do |doc, score| > puts 'i am here to just drink some hot chocolate.' > puts index[doc]["name"] > end > -------------END SCRIPT---------------- From mkhumri at allegromedical.com Tue Sep 5 01:16:50 2006 From: mkhumri at allegromedical.com (Mufaddal Khumri) Date: Tue, 5 Sep 2006 07:16:50 +0200 Subject: [Ferret-talk] No matches In-Reply-To: References: <7a163939f9f08f9063beeb2cb0135739@ruby-forum.com> Message-ID: David Balmain wrote: > On 9/5/06, Mufaddal Khumri wrote: >> >> index << {:name => "Lona"} >> index << {:name => "Frank"} >> >> index.optimize >> >> query = Search::TermQuery.new(:name, "Joe") > > Your problem lies here. The AsciiStandardAnalyzer downcases all of the > data as it is entered into the index, so you should be searching for > "joe", not "Joe". Since you are using the Index class you can just do > it like this also; > > index.search_each("name:Joe", {:limit => :all}) do |doc, score| > > In this case the QueryParser will downcase "Joe" for you. > > Cheers, > Dave Thanks. I thought the AsciiStandardAnalyzer would be used by the index.search_each(..) call. How do I specify a analyzer (AsciiStandardAnalyzer) while searching the index? I thought that I could specify the analyzer via the constructor for the Index class. Later when I index data or search the index the set analyzer would be used. Am I understanding this right? -- Posted via http://www.ruby-forum.com/. 
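
To make the case mismatch discussed in this thread concrete, here is a
minimal sketch that uses a toy in-memory index in plain Ruby rather than
Ferret itself. ToyIndex and its methods are made-up names for
illustration only, and the analysis step (split on non-letters, then
downcase) is just an approximation of what a lowercasing analyzer such
as AsciiStandardAnalyzer does:

```ruby
# Toy stand-in for an analyzed index. This is NOT Ferret's implementation;
# ToyIndex, #add, #term_lookup and #parsed_lookup are hypothetical names.
class ToyIndex
  def initialize
    # term => list of document ids containing that term
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @doc_count = 0
  end

  # Assumed analysis step: split on non-letters and downcase each token,
  # roughly what a lowercasing analyzer does before terms are stored.
  def analyze(text)
    text.scan(/[A-Za-z]+/).map { |token| token.downcase }
  end

  # Index a document: only the analyzed (lowercased) terms are stored.
  def add(text)
    doc_id = @doc_count
    analyze(text).each { |term| @postings[term] << doc_id }
    @doc_count += 1
    doc_id
  end

  # Like a hand-built term query: the term is compared exactly as given.
  def term_lookup(term)
    @postings.fetch(term, [])
  end

  # Like a parsed query string: the input is analyzed before the lookup.
  def parsed_lookup(text)
    analyze(text).map { |term| term_lookup(term) }.flatten.uniq
  end
end

index = ToyIndex.new
%w[Joe Sandy Billy Lona Frank].each { |name| index.add(name) }

p index.term_lookup("Joe")           # => []   raw term, never analyzed
p index.term_lookup("Joe".downcase)  # => [0]  term normalized by hand
p index.parsed_lookup("Joe")         # => [0]  analyzed like the indexed text
```

The same reasoning suggests two ways out: pass a query string so the
parser can analyze it, or lowercase the term yourself before building a
term query by hand.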
From mkhumri at allegromedical.com Tue Sep 5 01:21:33 2006 From: mkhumri at allegromedical.com (Mufaddal Khumri) Date: Tue, 5 Sep 2006 07:21:33 +0200 Subject: [Ferret-talk] No matches In-Reply-To: References: <7a163939f9f08f9063beeb2cb0135739@ruby-forum.com> Message-ID: Mufaddal Khumri wrote: > David Balmain wrote: >> On 9/5/06, Mufaddal Khumri wrote: >>> >>> index << {:name => "Lona"} >>> index << {:name => "Frank"} >>> >>> index.optimize >>> >>> query = Search::TermQuery.new(:name, "Joe") >> >> Your problem lies here. The AsciiStandardAnalyzer downcases all of the >> data as it is entered into the index, so you should be searching for >> "joe", not "Joe". Since you are using the Index class you can just do >> it like this also; >> >> index.search_each("name:Joe", {:limit => :all}) do |doc, score| >> >> In this case the QueryParser will downcase "Joe" for you. >> >> Cheers, >> Dave > > Thanks. > > I thought the AsciiStandardAnalyzer would be used by the > index.search_each(..) call. How do I specify a analyzer > (AsciiStandardAnalyzer) while searching the index? I thought that I > could specify the analyzer via the constructor for the Index class. > Later when I index data or search the index the set analyzer would be > used. Am I understanding this right? Just read David's post carefully :) I guess the QueryParser uses the analyzer set: index.search_each("name:Joe", {:limit => :all}) do |doc, .... My next question is how do I get the analyzer to be used when i create my query and search like this: query = Search::TermQuery.new(:name, "Joe") index.search_each(query, {:limit => :all}) do |doc, score| .... Thanks. -- Posted via http://www.ruby-forum.com/. 
From dbalmain.ml at gmail.com Tue Sep 5 01:36:46 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 5 Sep 2006 14:36:46 +0900 Subject: [Ferret-talk] No matches In-Reply-To: References: <7a163939f9f08f9063beeb2cb0135739@ruby-forum.com> Message-ID: On 9/5/06, Mufaddal Khumri wrote: > Mufaddal Khumri wrote: > > David Balmain wrote: > >> On 9/5/06, Mufaddal Khumri wrote: > >>> > >>> index << {:name => "Lona"} > >>> index << {:name => "Frank"} > >>> > >>> index.optimize > >>> > >>> query = Search::TermQuery.new(:name, "Joe") > >> > >> Your problem lies here. The AsciiStandardAnalyzer downcases all of the > >> data as it is entered into the index, so you should be searching for > >> "joe", not "Joe". Since you are using the Index class you can just do > >> it like this also; > >> > >> index.search_each("name:Joe", {:limit => :all}) do |doc, score| > >> > >> In this case the QueryParser will downcase "Joe" for you. > >> > >> Cheers, > >> Dave > > > > Thanks. > > > > > I thought the AsciiStandardAnalyzer would be used by the > > index.search_each(..) call. How do I specify a analyzer > > (AsciiStandardAnalyzer) while searching the index? I thought that I > > could specify the analyzer via the constructor for the Index class. > > Later when I index data or search the index the set analyzer would be > > used. Am I understanding this right? > > Just read David's post carefully :) > > I guess the QueryParser uses the analyzer set: > index.search_each("name:Joe", {:limit => :all}) do |doc, .... > > My next question is how do I get the analyzer to be used when i create > my query and search like this: > > query = Search::TermQuery.new(:name, "Joe") > index.search_each(query, {:limit => :all}) do |doc, score| .... > > Thanks. I'm not sure why you'd want to do this when you can just pass the query string to the search_each method and it does it for you. What exactly are you trying to do? 
Can't you just do this; query = Search::TermQuery.new(:name, "Joe".downcase) Anything more complicated than this and you are better off leaving it to the QueryParser. Cheers, Dave From Neville.Burnell at bmsoft.com.au Tue Sep 5 03:00:29 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Tue, 5 Sep 2006 17:00:29 +1000 Subject: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and :num_docs Message-ID: <126EC586577FD611A28E00A0C9A03758B5BC87@maui.bmsoft.com.au> Hi, I seem to be having trouble getting more than 10 hits from Index#search_each since upgrading to 0.10.2 (ie, this was working in 0.9.4). Maybe a bug, as the #search_each doesn't seem to use the options parameter any more ? Thanks, Neville =========================================== require 'rubygems' require 'ferret' p Ferret::VERSION idx = Ferret::Index::Index.new idx << {:id => 1, :name => 'Fred', :occupation => 'Toon'} idx << {:id => 2, :name => 'Barney', :occupation => 'Toon'} idx << {:id => 3, :name => 'Wilma', :occupation => 'Toon'} idx << {:id => 4, :name => 'Betty', :occupation => 'Toon'} idx << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'} idx << {:id => 6, :name => 'Superman', :occupation => 'Hero'} idx << {:id => 7, :name => 'Batman', :occupation => 'Hero'} idx << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'} idx << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'} idx << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'} idx << {:id => 11, :name => 'Phantom', :occupation => 'Hero'} p idx.size docs = [] query = 'occupation:(hero toon)' idx.search_each(query, :doc_num => idx.size) { |doc, score| docs << idx[doc] } p query p docs.length ============= ruby idx.rb "0.10.2" 11 "occupation:(hero toon)" 10 From Neville.Burnell at bmsoft.com.au Tue Sep 5 03:08:01 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Tue, 5 Sep 2006 17:08:01 +1000 Subject: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and :num_docs Message-ID: 
<126EC586577FD611A28E00A0C9A03758B5BC88@maui.bmsoft.com.au> Doh, replace: idx.search_each(query, :doc_num => idx.size) { |doc, score| docs << idx[doc] } With: idx.search_each(query, :num_docs => idx.size) { |doc, score| docs << idx[doc] } Problem still remains though. From 
Neville.Burnell at bmsoft.com.au Tue Sep 5 03:21:37 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Tue, 5 Sep 2006 17:21:37 +1000 Subject: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and :num_docs Message-ID: <126EC586577FD611A28E00A0C9A03758B5BC89@maui.bmsoft.com.au> And the problem doesn't occur in Index#search, ie: require 'rubygems' require 'ferret' p Ferret::VERSION idx = Ferret::Index::Index.new idx << {:id => 1, :name => 'Fred', :occupation => 'Toon'} idx << {:id => 2, :name => 'Barney', :occupation => 'Toon'} idx << {:id => 3, :name => 'Wilma', :occupation => 'Toon'} idx << {:id => 4, :name => 'Betty', :occupation => 'Toon'} idx << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'} idx << {:id => 6, :name => 'Superman', :occupation => 'Hero'} idx << {:id => 7, :name => 'Batman', :occupation => 'Hero'} idx << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'} idx << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'} idx << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'} idx << {:id => 11, :name => 'Phantom', :occupation => 'Hero'} p idx.size docs = [] query = 'occupation:(hero toon)' p "query=#{query}" options = {:num_docs => idx.size} idx.search_each(query, options) { |doc, score| docs << idx[doc] } p "search each=#{docs.length}" a = idx.search(query, options) p "search=#{a.total_hits}" ============= ruby idx.rb "0.10.2" 11 "query=occupation:(hero toon)" "search each=10" "search=11" -----Original Message----- From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of Neville Burnell Sent: Tuesday, 5 September 2006 5:08 PM To: ferret-talk at rubyforge.org Subject: Re: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and :num_docs Doh, replace: idx.search_each(query, :doc_num => idx.size) { |doc, score| docs << idx[doc] } With: idx.search_each(query, :num_docs => idx.size) { |doc, score| docs << idx[doc] } Problem still remains though. 
From dbalmain.ml at gmail.com Tue Sep 5 04:00:37 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 5 Sep 2006 17:00:37
+0900 Subject: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and :num_docs In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BC89@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BC89@maui.bmsoft.com.au> Message-ID: On 9/5/06, Neville Burnell wrote: > And the problem doesn't occur in Index#search, ie: > > require 'rubygems' > require 'ferret' > > p Ferret::VERSION > > idx = Ferret::Index::Index.new > > idx << {:id => 1, :name => 'Fred', :occupation => 'Toon'} > idx << {:id => 2, :name => 'Barney', :occupation => 'Toon'} > idx << {:id => 3, :name => 'Wilma', :occupation => 'Toon'} > idx << {:id => 4, :name => 'Betty', :occupation => 'Toon'} > idx << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'} > > idx << {:id => 6, :name => 'Superman', :occupation => 'Hero'} > idx << {:id => 7, :name => 'Batman', :occupation => 'Hero'} > idx << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'} > idx << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'} > idx << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'} > > idx << {:id => 11, :name => 'Phantom', :occupation => 'Hero'} > > p idx.size > > docs = [] > query = 'occupation:(hero toon)' > p "query=#{query}" > options = {:num_docs => idx.size} > > > idx.search_each(query, options) { |doc, score| docs << idx[doc] } > p "search each=#{docs.length}" > > a = idx.search(query, options) > p "search=#{a.total_hits}" > > ============= > > ruby idx.rb > "0.10.2" > 11 > "query=occupation:(hero toon)" > "search each=10" > "search=11" > > > -----Original Message----- > From: ferret-talk-bounces at rubyforge.org > [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of Neville Burnell > Sent: Tuesday, 5 September 2006 5:08 PM > To: ferret-talk at rubyforge.org > Subject: Re: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and > :num_docs > > Doh, replace: > > idx.search_each(query, :doc_num => idx.size) { |doc, score| docs << > idx[doc] } > > With: > > idx.search_each(query, :num_docs => 
idx.size) { |doc, score| docs << > idx[doc] } > > Problem still remains though. > > -----Original Message----- > From: ferret-talk-bounces at rubyforge.org > [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of Neville Burnell > Sent: Tuesday, 5 September 2006 5:00 PM > To: ferret-talk at rubyforge.org > Subject: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and :num_docs > > Hi, > > I seem to be having trouble getting more than 10 hits from > Index#search_each since upgrading to 0.10.2 (ie, this was working in > 0.9.4). Maybe a bug, as the #search_each doesn't seem to use the options > parameter any more ? > > Thanks, > > Neville > =========================================== > > require 'rubygems' > require 'ferret' > > p Ferret::VERSION > > idx = Ferret::Index::Index.new > > idx << {:id => 1, :name => 'Fred', :occupation => 'Toon'} idx << {:id => > 2, :name => 'Barney', :occupation => 'Toon'} idx << {:id => 3, :name => > 'Wilma', :occupation => 'Toon'} idx << {:id => 4, :name => 'Betty', > :occupation => 'Toon'} idx << {:id => 5, :name => 'Pebbles', :occupation > => 'Toon'} > > idx << {:id => 6, :name => 'Superman', :occupation => 'Hero'} idx << > {:id => 7, :name => 'Batman', :occupation => 'Hero'} idx << {:id => 8, > :name => 'Spiderman', :occupation => 'Hero'} idx << {:id => 9, :name => > 'Green Lantern', :occupation => 'Hero'} idx << {:id => 10, :name => 'Dr > Strange', :occupation => 'Hero'} > > idx << {:id => 11, :name => 'Phantom', :occupation => 'Hero'} > > p idx.size > > docs = [] > query = 'occupation:(hero toon)' > idx.search_each(query, :doc_num => idx.size) { |doc, score| docs << > idx[doc] } > > p query > p docs.length > > > ============= > > > ruby idx.rb > "0.10.2" > 11 > "occupation:(hero toon)" > 10 Hi Neville, Some of the parameter names have changed in version 0.10.0. :num_docs has become :limit and :first_doc has become :offset. Unfortunately I neglected to update the documentation. 
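A minimal sketch of the renaming described above, for anyone with callers that still pass the old option names. The helper name modernize_search_options and the constant are made up for illustration and are not part of Ferret; only the :num_docs to :limit and :first_doc to :offset mapping comes from this thread.

```ruby
# Hypothetical helper, not part of Ferret: rewrites the pre-0.10 option names
# mentioned in this thread to their 0.10 names and leaves other keys untouched.
RENAMED_SEARCH_OPTIONS = { :num_docs => :limit, :first_doc => :offset }.freeze

def modernize_search_options(options)
  options.each_with_object({}) do |(key, value), updated|
    updated[RENAMED_SEARCH_OPTIONS.fetch(key, key)] = value
  end
end

p modernize_search_options({ :num_docs => 11, :first_doc => 0 })
```

With something like this in place, the call from the report would presumably become idx.search_each(query, modernize_search_options(options)) { |doc, score| ... }, or the options hash can simply be written with :limit directly.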
I've remedied that and the fix will be out in the next release. Cheers, Dave From Neville.Burnell at bmsoft.com.au Tue Sep 5 06:33:06 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Tue, 5 Sep 2006 20:33:06 +1000 Subject: [Ferret-talk] Ferret 0.10.2 - Index#search_each() and :num_docs Message-ID: <126EC586577FD611A28E00A0C9A03758AB74E7@maui.bmsoft.com.au> Hi Dave, > Some of the parameter names have changed in version 0.10.0. > :num_docs has become :limit and :first_doc has become :offset. > Unfortunately I neglected to update the documentation. I've > remedied that and the fix will be out in the next release. Cool, thanks Neville From rubyonrails at transmogrify.co.uk Tue Sep 5 15:49:29 2006 From: rubyonrails at transmogrify.co.uk (Alastair Moore) Date: Tue, 5 Sep 2006 21:49:29 +0200 Subject: [Ferret-talk] using conditions In-Reply-To: <20060825163541.GD8241@cordoba.webit.de> References: <09dba232fd862badfcd75485cb782a29@ruby-forum.com> <20060825163541.GD8241@cordoba.webit.de> Message-ID: <375200bd29a39c09b80a00c5897c768e@ruby-forum.com> Jens Kraemer wrote: > that of course works if :area_id is an indexed field. > > If area_id is not indexed, the :conditions option can be used to limit > the find call that retrieves the model instances via ActiveRecord. Note > that you might get less results than expected when using :conditions, > since AR filters records out that Ferret counted as a hit. > > correct usage: > > @results = SupplierProduct.find_by_contents(params[:search], {}, { > :conditions => [] }) > > the first hash is for ferret-related options (like :limit, :offset and > so on), the second for activeRecord options. > Hi Jens and thanks for the help but I seem to be getting an error when I use the following - @results = MarketingProduct.find_by_contents(search_text, {}, :conditions => 'live = "t"') I'm getting this error - You have a nil object when you didn't expect it! You might have expected an instance of Array. 
The error occured while evaluating nil.sort! #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:286:in `find_by_contents' #{RAILS_ROOT}/app/controllers/marketing_controller.rb:21:in `results' Any ideas what might be up? Thanks, Alastair -- Posted via http://www.ruby-forum.com/. From rubyonrails at transmogrify.co.uk Tue Sep 5 15:51:52 2006 From: rubyonrails at transmogrify.co.uk (Alastair Moore) Date: Tue, 5 Sep 2006 21:51:52 +0200 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' Message-ID: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> Hello all, Quick question (possibly!) - I've got a few records indexed and doing a search for 'test' reports in no hits even though I know the word 'tests' exists in the indexed field. Doing a search for 'tests' produces a result. I would have thought that 'test' would match 'tests' but no such luck! Thanks, Alastair -- Posted via http://www.ruby-forum.com/. From Neville.Burnell at bmsoft.com.au Tue Sep 5 19:17:41 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Wed, 6 Sep 2006 09:17:41 +1000 Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem Message-ID: <126EC586577FD611A28E00A0C9A03758B5BC8F@maui.bmsoft.com.au> Just a quick note of thanks for the Win32 gem. On my system, Ferret indexing is about 20x faster and Ferret searching is about 100x faster. Neville -----Original Message----- From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of David Balmain Sent: Monday, 4 September 2006 2:41 PM To: ferret-talk at rubyforge.org Subject: [Ferret-talk] [ANN] 0.10.2 release with win32 gem Hey all, I've just released Ferret version 0.10.2. It is mostly just a bug fix release. The only change is that a highlight method has been added to Ferret::Index::Index. Please try it out and let me know what you think. The big news for this release is that there is also a binary win32 gem included. 
This is the first time I've built a gem like this so please let me know
if there are any issues.

Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
Ferret-talk at rubyforge.org
http://rubyforge.org/mailman/listinfo/ferret-talk

From dbalmain.ml at gmail.com Wed Sep 6 00:20:03 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 6 Sep 2006 13:20:03 +0900
Subject: [Ferret-talk] ferret finds 'tests' but not 'test'
In-Reply-To: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com>
References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com>
Message-ID:

On 9/6/06, Alastair Moore wrote:
> Hello all,
>
> Quick question (possibly!) - I've got a few records indexed and doing a
> search for 'test' reports in no hits even though I know the word 'tests'
> exists in the indexed field. Doing a search for 'tests' produces a
> result. I would have thought that 'test' would match 'tests' but no such
> luck!
>
> Thanks,
>
> Alastair

The default analyzer doesn't perform any stemming. You need to create
your own analyzer with a stemmer. Something like this;

require 'rubygems'
require 'ferret'

module Ferret::Analysis
  class MyAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)

index << "test"
index << "tests debate debater debating the for,"
puts index.search("test").total_hits

Hope that helps,
Dave

From Neville.Burnell at bmsoft.com.au Wed Sep 6 01:06:40 2006
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Wed, 6 Sep 2006 15:06:40 +1000
Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario
Message-ID: <126EC586577FD611A28E00A0C9A03758B5BCA0@maui.bmsoft.com.au>

> Otherwise it would be possible for the document IDs of the
> documents to change between the time the search is run and
> the time the document is referenced.
Well, I started coding to use Searcher#search_each and found myself recoding most of the infrastructure of Index#search_each (and its friends) simply to avoid its @dir.synchronize when what you were saying above started to sink in. Ie, as I understand it, I can have concurrent searchers if the index is read-only but not if I have a writer. So while its possible to have multiple readers, 1 writer, the 1 writer requirement forces use of synchronized, which means that the readers must be serialised and not concurrent - is this correct? Kind Regards Neville -----Original Message----- From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of David Balmain Sent: Monday, 4 September 2006 2:05 PM To: ferret-talk at rubyforge.org Subject: Re: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario On 9/4/06, Neville Burnell wrote: > Thanks for your reply Dave, > > > The one situation where you might be better off using a single > > IndexReader is when you are relying on caching. > > Filters and Sorts are cached per IndexReader and Sorts in particular > > can take up a fair chunk of memory so if you have a large index > > (large as in number of documents, not size of data) then you may be > > better off with a single IndexReader. IndexReader is thread-safe so > > using it concurrently should be fine. > > Just to clarify, I'm using Ferret::Index::Index concurrently at the > moment, and I'm not getting concurrent searches via #search_each. IE, > if a slow wild-card search arrives first, all subsequent searches wait > until the wild-card search completes. > > So I guess #search_each is "synchronised"? That's correct. Otherwise it would be possible for the document IDs of the documents to change between the time the search is run and the time the document is referenced. For the benefit of those who don't know this, document IDs are not constant. They represent the position of the document in the index. Think of it like an array. 
Let's add 5 documents to the index.

[0,1,2,3,4]

Now let's delete documents 1 and 2;

[0,3,4]

So document 4 now has a doc_id of 2. If this happened in the middle of a
search you'd have a problem. So instead we synchronize the Index#search
and Index#search_each methods. Now this isn't the case for
Searcher#search and Searcher#search_each since the IndexReader that
Searcher uses remains consistent so you should be able to use Searcher
concurrently.

> Therefore to have multiple searches on an index concurrently, I really
> need an IndexReader per thread and I would need to manage a pool of
> reusable IndexReaders?

Using Ferret::Index::Index this would be true. But if performance is a
concern you should definitely use a Ferret::Search::Searcher object
instead anyway and you'll be able to use it concurrently.

> Any pointers on how other web apps [not using Rails] handle multiple
> Ferret readers?

Let us know if using the Searcher object isn't adequate.

> > You can actually pass an array of readers as the first (only)
> > parameter to IndexReader.new.
> >
> > reader = IndexReader.new([reader1, reader2, reader3])
> >
>
> Interesting ... I had a look, but I don't really understand what this
> does? Would you elaborate please :D

A MultiReader object was initially what was used to read and search
multiple indexes at a time. This functionality is now simply handled by
the IndexReader object. There are several uses for this. One was to
store each model in a separate index and you could then offer search
across multiple models using a MultiReader. Another use-case might be to
have multiple indexes to speed up indexing. If for example you are
scraping websites it is a very good idea to have multiple scraping
processes. The best way to do this is to have each process indexing to
its own index. You could then search all indexes at once using a
MultiReader or you could also merge all indexes into a single index.

Hope that makes sense.
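The array picture earlier in this message can be replayed in plain Ruby. This is only a toy model under the simplifying assumption that a doc_id is nothing more than a position in an Array; the method name doc_id_after_deletes and the document keys are invented for illustration and none of it is Ferret API.

```ruby
# Toy model only, not Ferret API: the Array position stands in for a doc_id.
# Returns the position a document ends up at once the given positions have
# been deleted, or nil if the document itself was deleted.
def doc_id_after_deletes(doc_keys, key, deleted_ids)
  survivors = doc_keys.each_with_index
                      .reject { |_, doc_id| deleted_ids.include?(doc_id) }
                      .map { |doc_key, _| doc_key }
  survivors.index(key)
end

docs = %w[doc0 doc1 doc2 doc3 doc4]

# Deleting positions 1 and 2 leaves [doc0, doc3, doc4], so the document that
# a search reported at position 4 now sits at position 2.
puts doc_id_after_deletes(docs, 'doc4', [1, 2])
```

A hit that was reported as position 4 before the delete would point past the end of the shrunken array afterwards, which is the hazard the synchronization in Index#search and Index#search_each is guarding against.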
Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
Ferret-talk at rubyforge.org
http://rubyforge.org/mailman/listinfo/ferret-talk

From ksibilev at yahoo.com Wed Sep 6 01:14:37 2006
From: ksibilev at yahoo.com (Kent Sibilev)
Date: Wed, 6 Sep 2006 07:14:37 +0200
Subject: [Ferret-talk] Which analyzer to use
Message-ID:

Lucene's standard analyzer splits words separated by underscores.
Ferret doesn't do this. For example, if I create an index with only the
document 'test_case' and search for 'case' it doesn't find anything.
Lucene on the other hand finds it. The same story goes for words
separated by colons.

Which analyzer should I use to emulate Lucene's StandardAnalyzer
behavior?

Thanks.
Kent

--
Posted via http://www.ruby-forum.com/.

From ryansking at gmail.com Wed Sep 6 01:18:34 2006
From: ryansking at gmail.com (Ryan King)
Date: Tue, 5 Sep 2006 22:18:34 -0700
Subject: [Ferret-talk] Installing
In-Reply-To: <4d493d2aed144082a2aae5229c5283dc@ruby-forum.com>
References: <4d493d2aed144082a2aae5229c5283dc@ruby-forum.com>
Message-ID: <846f30c70609052218x5511240cj6a62e813bfd53594@mail.gmail.com>

1. Install rubygems
2. from a cli, issue: gem install ferret

-ryan

On 9/4/06, Richard wrote:
> I'm sorry to be such a newbie, but can someone please tell me exactly
> what I need to do to get acts_as_ferret working on Windows XP. I'm just
> not understanding the install process.
>
> Also will I need to change anything before uploading the site to a linux
> host?
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From Neville.Burnell at bmsoft.com.au Wed Sep 6 02:28:49 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Wed, 6 Sep 2006 16:28:49 +1000 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario Message-ID: <126EC586577FD611A28E00A0C9A03758B5BCA2@maui.bmsoft.com.au> I've whipped up this script to demonstrate what I'm trying [and failing] to achieve. The idea is that thread t1 adds docs to the index over time, while threads t2 and t3 search the same index for the new docs. Unfortunately the script doesn't work, as t2 and t3 don't find the docs that t1 has added. Can anyone point out where I am going wrong. Thanks so much. Neville ================================= require 'rubygems' require 'ferret' p Ferret::VERSION @dir = Ferret::Store::RAMDirectory.new @writer = Ferret::Index::IndexWriter.new(:dir => @dir) @searcher = Ferret::Search::Searcher.new(@dir) @parser = Ferret::QueryParser.new @docs = [] @docs << {:id => 1, :name => 'Fred', :occupation => 'Toon'} @docs << {:id => 2, :name => 'Barney', :occupation => 'Toon'} @docs << {:id => 3, :name => 'Wilma', :occupation => 'Toon'} @docs << {:id => 4, :name => 'Betty', :occupation => 'Toon'} @docs << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'} @docs << {:id => 6, :name => 'Superman', :occupation => 'Hero'} @docs << {:id => 7, :name => 'Batman', :occupation => 'Hero'} @docs << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'} @docs << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'} @docs << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'} @docs << {:id => 11, :name => 'Phantom', :occupation => 'Hero'} #populate index over time t1 = Thread.new do @docs.each do |doc| p "t1: adding #{doc[:id]} to index" @writer << doc sleep(10) end end #search for heroes over time t2 = Thread.new do query_txt = 
'occupation:hero'
  query = @parser.parse(query_txt)

  while true do
    hits = @searcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    return if hits.total_hits == 6
    sleep(5)
  end
end

#search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)

  while true do
    hits = @searcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    return if hits.total_hits == 5
    sleep(5)
  end
end

t1.join; t2.join; t3.join

From dbalmain.ml at gmail.com Wed Sep 6 02:40:19 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 6 Sep 2006 15:40:19 +0900
Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario
In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BCA0@maui.bmsoft.com.au>
References: <126EC586577FD611A28E00A0C9A03758B5BCA0@maui.bmsoft.com.au>
Message-ID:

On 9/6/06, Neville Burnell wrote:
> > Otherwise it would be possible for the document IDs of the
> > documents to change between the time the search is run and
> > the time the document is referenced.
>
> Well, I started coding to use Searcher#search_each and found myself
> recoding most of the infrastructure of Index#search_each (and its
> friends) simply to avoid its @dir.synchronize when what you were saying
> above started to sink in. Ie, as I understand it, I can have concurrent
> searchers if the index is read-only but not if I have a writer.
>
> So while its possible to have multiple readers, 1 writer, the 1 writer
> requirement forces use of synchronized, which means that the readers
> must be serialised and not concurrent - is this correct?

Close. When you open an IndexReader on the index it is opened up on that
particular version (or state) of the index. So any operations on the
IndexReader (like searches) will only show what was in the index at the
time you opened it. Any modifications to the index (usually through an
IndexWriter) that occur after you open the IndexReader will not appear
in your searches. So to keep searches up to date you need to close and
reopen your IndexReader every time you commit changes to the index.

So the writer doesn't force the use of synchronized. Rather it forces
you to decide whether searches need to return the most up to date
results available or if there can be a short delay between changes being
written to the index and changes appearing in the search results. The
Index class makes it as simple as possible to always search the latest
index but there is a performance hit. Most of the time performance
should be fine. The Ferret C core has been highly optimized and will
still beat most other solutions hands down, even when used in this way.

Now, if I were writing an application where search performance is a big
issue (as it seems to be in your case) then I would start by using the
base classes like IndexReader and IndexWriter (as we've already
discussed). Like I just mentioned you might allow a delay between the
time the index is modified and the time those modifications appear in
search results. This would allow you to update the IndexReader every
minute/hour/day/week without regard to what the IndexWriter is doing.
This solution works well when scraping webpages. Google's results, for
example, aren't always completely up to date with the pages they index.
If one of their results is a dead link it isn't the end of the world.

If, however, you are indexing data in a database it often isn't this
simple. If you use the previous solution with a database that allows
deletes then you need some way to handle results that reference objects
that have been deleted from the database. Otherwise you will need some
way to synchronize on the index (probably on the
Ferret::Store::Directory like Ferret::Index::Index does) so that no
searches are done while the deletion is committed to the index and the
IndexReaders are updated.

Another solution which I'm going to experiment with is using the index
as your database. You may still keep your original database but store
any data in the index that will be shown back to the user as the result
of a search. That way you don't need to worry about synchronization with
the database.

I don't think I've explained this very clearly here so feel free to try
and clarify. I will be endeavoring to write this all down in a clearer
and more comprehensible manner so that everyone can work out the
solution that best fits their needs.

Cheers,
Dave

PS: The ideal solution for me would be an object database with
Ferret-like full-text search built in. I've been thinking about this a
lot lately. It would certainly fit the style of development used in many
Rails apps. That is to say, all access to the database must go through
the model as that is where all the validation is. If you are developing
this way, why bother with the relational database and ORM solution? A
good object database would serve the same purpose and would be a LOT
more performant. Obviously this solution wouldn't be for everybody
though so enterprise developers feel free to ignore. ;-)

From dbalmain.ml at gmail.com Wed Sep 6 02:43:02 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 6 Sep 2006 15:43:02 +0900
Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario
In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BCA2@maui.bmsoft.com.au>
References: <126EC586577FD611A28E00A0C9A03758B5BCA2@maui.bmsoft.com.au>
Message-ID:

On 9/6/06, Neville Burnell wrote:
> I've whipped up this script to demonstrate what I'm trying [and failing]
> to achieve. The idea is that thread t1 adds docs to the index over time,
> while threads t2 and t3 search the same index for the new docs.
> Unfortunately the script doesn't work, as t2 and t3 don't find the docs
> that t1 has added.
>
> Can anyone point out where I am going wrong. Thanks so much.

Please let me know if the first paragraph of my previous email doesn't
explain this.
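The behaviour described in that first paragraph (a reader only sees the index as it was when the reader was opened) can be pictured with a small toy model. This is a sketch in plain Ruby, assuming a reader behaves like a frozen copy taken at open time; the class ToySnapshotReader is invented for illustration and does not use or mirror Ferret's actual classes.

```ruby
# Toy model only, not Ferret API: a reader copies the index contents when it
# is opened, so later writes stay invisible until a new reader is opened.
class ToySnapshotReader
  def initialize(index_docs)
    @docs = index_docs.dup.freeze
  end

  # Counts the documents in this reader's snapshot with the given occupation.
  def count(occupation)
    @docs.count { |doc| doc[:occupation] == occupation }
  end
end

index_docs = []
stale_reader = ToySnapshotReader.new(index_docs)      # opened before the write

index_docs << { :name => 'Superman', :occupation => 'Hero' }
fresh_reader = ToySnapshotReader.new(index_docs)      # reopened after the write

puts "stale reader sees #{stale_reader.count('Hero')} hero(es)"
puts "fresh reader sees #{fresh_reader.count('Hero')} hero(es)"
```

In the script from the question, the single @searcher created at the top plays the role of stale_reader for the whole run, which would explain why t2 and t3 never see what t1 adds.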
Cheers,
Dave

From dbalmain.ml at gmail.com Wed Sep 6 02:45:30 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 6 Sep 2006 15:45:30 +0900
Subject: [Ferret-talk] Installing
In-Reply-To: <4d493d2aed144082a2aae5229c5283dc@ruby-forum.com>
References: <4d493d2aed144082a2aae5229c5283dc@ruby-forum.com>
Message-ID:

On 9/4/06, Richard wrote:
> I'm sorry to be such a newbie, but can someone please tell me exactly
> what I need to do to get acts_as_ferret working on Windows XP. I'm just
> not understanding the install process.
>
> Also will I need to change anything before uploading the site to a linux
> host?

You'll need to make sure that the same version of Ferret (at least
0.10.2) and acts_as_ferret are installed on the linux host.

From dbalmain.ml at gmail.com Wed Sep 6 02:55:46 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 6 Sep 2006 15:55:46 +0900
Subject: [Ferret-talk] Which analyzer to use
In-Reply-To:
References:
Message-ID:

On 9/6/06, Kent Sibilev wrote:
> Lucene's standard analyzer splits words separated by underscores.
> Ferret doesn't do this. For example, if I create an index with only
> document 'test_case' and search for 'case' it doesn't find anything.
> Lucene on the other hand finds it. The same story goes for words
> separated by colons.
>
> Which analyzer should I use to emulate Lucene's StandardAnalyzer
> behavior?
>
> Thanks.
> Kent

Hi Kent,

No analyzer currently emulates Lucene's StandardAnalyzer exactly. You'd
have to port it to Ruby which shouldn't be too hard if you know how to
use racc. But it sounds to me like you don't need anything so complex.
If you are indexing code you might want to try using the
AsciiLetterAnalyzer. Or you could use the RegExpAnalyzer and describe
your tokens with a Ruby RegExp.
Something like this; include Ferret include Ferret::Analysis index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]/)) # or if you want case sensitive searches; index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]/, false)) Hope that helps, Dave From Neville.Burnell at bmsoft.com.au Wed Sep 6 03:07:44 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Wed, 6 Sep 2006 17:07:44 +1000 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario Message-ID: <126EC586577FD611A28E00A0C9A03758B5BCA6@maui.bmsoft.com.au> Thanks Dave, I think I understand now ... FWIW, the following script works now I have read your responses. I've posted it here for others to read. ================== require 'rubygems' require 'ferret' p Ferret::VERSION @dir = Ferret::Store::RAMDirectory.new @writer = Ferret::Index::IndexWriter.new(:dir => @dir) @searcher = Ferret::Search::Searcher.new(@dir) @parser = Ferret::QueryParser.new @docs = [] @docs << {:id => 1, :name => 'Fred', :occupation => 'Toon'} @docs << {:id => 2, :name => 'Barney', :occupation => 'Toon'} @docs << {:id => 3, :name => 'Wilma', :occupation => 'Toon'} @docs << {:id => 4, :name => 'Betty', :occupation => 'Toon'} @docs << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'} @docs << {:id => 6, :name => 'Superman', :occupation => 'Hero'} @docs << {:id => 7, :name => 'Batman', :occupation => 'Hero'} @docs << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'} @docs << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'} @docs << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'} @docs << {:id => 11, :name => 'Phantom', :occupation => 'Hero'} #populate index over time t1 = Thread.new do @docs.each do |doc| p "t1: adding #{doc[:id]} to index" @writer << doc sleep(10) end end #search for heroes over time t2 = Thread.new do query_txt = 'occupation:hero' query = @parser.parse(query_txt) while true do hits = @searcher.search(query) p "t2: searching for #{query_txt} found 
#{hits.total_hits}" return if hits.total_hits == 6 sleep(5) end end #search for toons over time t3 = Thread.new do query_txt = 'occupation:toon' query = @parser.parse(query_txt) while true do hits = @searcher.search(query) p "t3: searching for #{query_txt} found #{hits.total_hits}" return if hits.total_hits == 5 sleep(5) end end t1.join; t2.join; t3.join -----Original Message----- From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of David Balmain Sent: Wednesday, 6 September 2006 4:43 PM To: ferret-talk at rubyforge.org Subject: Re: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario On 9/6/06, Neville Burnell wrote: > I've whipped up this script to demonstrate what I'm trying [and > failing] to achieve. The idea is that thread t1 adds docs to the index > over time, while threads t2 and t3 search the same index for the new docs. > Unfortunately the script doesn't work, as t2 and t3 don't find the > docs that t1 has added. > > Can anyone point out where I am going wrong. Thanks so much. Please let me know if the first paragraph of my previous email doesn't explain this. Cheers, Dave _______________________________________________ Ferret-talk mailing list Ferret-talk at rubyforge.org http://rubyforge.org/mailman/listinfo/ferret-talk From Neville.Burnell at bmsoft.com.au Wed Sep 6 03:16:02 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Wed, 6 Sep 2006 17:16:02 +1000 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario Message-ID: <126EC586577FD611A28E00A0C9A03758B5BCA7@maui.bmsoft.com.au> Oops ... My cut & paste buffer was old! The key difference between this script and the old script is that the writer thread, t1, replaces the searcher after each index update, and each reader thread, t2 and t3, grab a new copy of the searcher, which they use for the duration of a search. So the old searchers are GC'd when no longer required. 
===================

require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
@searcher = Ferret::Search::Searcher.new(@dir)
@parser = Ferret::QueryParser.new

@docs = []
@docs << {:id => 1, :name => 'Fred', :occupation => 'Toon'}
@docs << {:id => 2, :name => 'Barney', :occupation => 'Toon'}
@docs << {:id => 3, :name => 'Wilma', :occupation => 'Toon'}
@docs << {:id => 4, :name => 'Betty', :occupation => 'Toon'}
@docs << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'}

@docs << {:id => 6, :name => 'Superman', :occupation => 'Hero'}
@docs << {:id => 7, :name => 'Batman', :occupation => 'Hero'}
@docs << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'}
@docs << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'}
@docs << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'}

@docs << {:id => 11, :name => 'Phantom', :occupation => 'Hero'}

#@docs.each {|doc| @writer << doc}
#@writer.commit
#@searcher = Ferret::Search::Searcher.new(@dir)

#populate index over time
t1 = Thread.new do
  @docs.each do |doc|
    p "t1: adding #{doc[:id]} to index"
    @writer << doc
    @writer.commit

    #new searcher
    @searcher = Ferret::Search::Searcher.new(@dir)

    sleep(10)
  end
end

#search for heroes over time
t2 = Thread.new do
  query_txt = 'occupation:hero'
  query = @parser.parse(query_txt)

  while true do
    mysearcher = @searcher
    hits = mysearcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 6
    sleep(5)
  end
end

#search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)

  while true do
    mysearcher = @searcher
    hits = mysearcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 5
    sleep(5)
  end
end

t1.join; t2.join; t3.join

From kraemer at webit.de Wed Sep 6 04:34:47 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 6 Sep 2006 10:34:47 +0200
Subject: [Ferret-talk] using conditions In-Reply-To: <375200bd29a39c09b80a00c5897c768e@ruby-forum.com> References: <09dba232fd862badfcd75485cb782a29@ruby-forum.com> <20060825163541.GD8241@cordoba.webit.de> <375200bd29a39c09b80a00c5897c768e@ruby-forum.com> Message-ID: <20060906083447.GY9513@cordoba.webit.de> Hi! On Tue, Sep 05, 2006 at 09:49:29PM +0200, Alastair Moore wrote: > Jens Kraemer wrote: > > > that of course works if :area_id is an indexed field. > > > > If area_id is not indexed, the :conditions option can be used to limit > > the find call that retrieves the model instances via ActiveRecord. Note > > that you might get fewer results than expected when using :conditions, > > since AR filters records out that Ferret counted as a hit. > > > > correct usage: > > > > @results = SupplierProduct.find_by_contents(params[:search], {}, { > > :conditions => [] }) > > > > the first hash is for ferret-related options (like :limit, :offset and > > so on), the second for ActiveRecord options. > > > > Hi Jens and thanks for the help but I seem to be getting an error when I > use the following - > > @results = MarketingProduct.find_by_contents(search_text, {}, > :conditions => 'live = "t"') That should read :conditions => [ "live='t'" ]. The clause gets shifted from the array and ANDed with aaf's own where clause, and any additional parameters get added to the conditions array that is then issued to ActiveRecord. But that doesn't seem to be your exact problem... What version of aaf do you use? From the line number, that doesn't seem to be the latest trunk or the last version tagged stable? cheers, Jens > I'm getting this error - > > You have a nil object when you didn't expect it! > You might have expected an instance of Array. > The error occured while evaluating nil.sort! > > #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:286:in > `find_by_contents' > #{RAILS_ROOT}/app/controllers/marketing_controller.rb:21:in `results' > > Any ideas what might be up?
> > Thanks, > > Alastair > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From ksibilev at yahoo.com Wed Sep 6 08:01:09 2006 From: ksibilev at yahoo.com (Kent Sibilev) Date: Wed, 6 Sep 2006 14:01:09 +0200 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: References: Message-ID: David Balmain wrote: > On 9/6/06, Kent Sibilev wrote: > No analyzer currently emulates Lucene's StandardAnalyzer exactly. > You'd have to port it to Ruby which shouldn't be too hard if you know > how to use racc. But it sounds to me like you don't need anything so > complex. If you are indexing code you might want to try using the > AsciiLetterAnalyzer. No, it doesn't do what I want. Looking at the code I'm slightly confused. The criterion is that if isalpha returns 0 then we have reached the end of a token. Does it mean that the '_' character is considered alphanumeric? > Or you could use the RegExpAnalyzer and describe > your tokens with a Ruby RegExp. Something like this; > > include Ferret > include Ferret::Analysis > index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]/)) > > # or if you want case sensitive searches; > index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]/, false)) > It would be great if this code worked, but it segfaulted on me. I've glanced at the code and noticed that for this type of stream

typedef struct RegExpTokenStream {
    CachedTokenStream super;
    VALUE rtext;
    VALUE regex;
    VALUE proc;
    int curr_ind;
} RegExpTokenStream;

you initialize three VALUE objects but never mark them for the garbage collector. Eventually they are being freed behind my back.
What you should do is to keep the type of the stream in TokenStream structure and rework frt_ts_mark method. Hope that helps, Kent -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 6 08:21:21 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 6 Sep 2006 21:21:21 +0900 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: References: Message-ID: On 9/6/06, Kent Sibilev wrote: > David Balmain wrote: > > On 9/6/06, Kent Sibilev wrote: > > No analyzer currently emulates Lucene's StandardAnalyzer exactly. > > You'd have to port it to Ruby which shouldn't be too hard if you know > > how to use racc. But is sounds to me like you don't need anything so > > complex. If you are indexing code you might want to try using the > > AsciiLetterAnalyzer. > > No, it doesn't do what I want. Looking at the code I'm slightly > confused. The criteria is that if isalpha returns 0 then we reached the > end of a token. Does it mean that '_' character is considered > alphanumeric? irb(main):001:0> require 'rubygems' irb(main):002:0> require 'ferret' irb(main):004:0> i = Ferret::I.new(:analyzer => Ferret::Analysis::AsciiLetterAnalyzer.new) irb(main):005:0> i << "test_case" irb(main):006:0> i.search("case") => #], max_score=0.191783010959625> irb(main):007:0> So no, '_' is not considered alphanumeric (or in this case alpha, as AsciiLetterAnalyzer won't match numbers) > > Or you could use the RegExpAnalyzer and describe > > your tokens with a Ruby RegExp. Something like this; > > > > include Ferret > > include Ferret::Analysis > > index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]/)) > > > > # or if you want case sensitive searches; > > index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]/, false)) > > > > It would be great if this code worked, but it segfaulted on me. 
I've > glanced at the code and noticed that for this type of stream > > typedef struct RegExpTokenStream { > CachedTokenStream super; > VALUE rtext; > VALUE regex; > VALUE proc; > int curr_ind; > } RegExpTokenStream; > > > you initialize tree VALUE objects but never mark them for garbage > collector. Eventually they are being freed behind my back. What you > should do is to keep the type of the stream in TokenStream structure and > rework frt_ts_mark method. > > Hope that helps, > Kent Actually, frt_rets_mark already marks the three VALUE objects correctly. What would really help would be if you could give me an example script that segfaults. If you can do this I'll fix it and get a new gem out as soon as possible. Cheers, Dave From dbalmain.ml at gmail.com Wed Sep 6 08:30:42 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 6 Sep 2006 21:30:42 +0900 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: References: Message-ID: On 9/6/06, David Balmain wrote: > correctly. What would really help would be if you could give me an > example script that segfaults. If you can do this I'll fix it and get > a new gem out as soon as possible. Actually, hold on that, I think I've found the problem. From rubyonrails at transmogrify.co.uk Wed Sep 6 08:36:39 2006 From: rubyonrails at transmogrify.co.uk (Alastair Moore) Date: Wed, 6 Sep 2006 14:36:39 +0200 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' In-Reply-To: References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> Message-ID: David Balmain wrote: > On 9/6/06, Alastair Moore wrote: >> Alastair > The default analyzer doesn't perform any stemming. You need to create > your own analyzer with a stemmer. 
Something like this; > > require 'rubygems' > require 'ferret' > > module Ferret::Analysis > class MyAnalyzer > def token_stream(field, text) > StemFilter.new(StandardTokenizer.new(text)) > end > end > end > > index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new) > > index << "test" > index << "tests debate debater debating the for," > puts index.search("test").total_hits > > Hope that helps, > Dave Hi Dave, Many thanks for the help, it does help! However, given the short timespan for this project, I think the users of the site will just have to be a bit more specific in their search terms :) Cheers and will bookmark your reply for a later project. Alastair -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 6 09:26:16 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 6 Sep 2006 22:26:16 +0900 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: References: Message-ID: On 9/6/06, David Balmain wrote: > On 9/6/06, David Balmain wrote: > > correctly. What would really help would be if you could give me an > > example script that segfaults. If you can do this I'll fix it and get > > a new gem out as soon as possible. > > Actually, hold on that, I think I've found the problem. Hi Kent, I've put in a change which I think should fix your segfault. Unfortunately I can't seem to replicate the bug here to test it. Even calling GC.start doesn't seem to collect any of the three VALUEs in RegExpTokenStream. I've had problems like this before when trying to test an implementation of a weak-key Hash. I really need to look into how the Ruby garbage collector works, but it never seems to work predictably for me. Anyway, I was hoping you could help me out, either by testing your code against the latest version of Ferret in subversion or by sending me a short (or long, I don't really care) script which causes the problem. If it'll make it any easier I can email you a gem of the current working version of Ferret.
Cheers, Dave From ksibilev at yahoo.com Wed Sep 6 10:23:30 2006 From: ksibilev at yahoo.com (Kent Sibilev) Date: Wed, 6 Sep 2006 16:23:30 +0200 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: References: Message-ID: <5c029c7406aaf99bf4081168d1e57e67@ruby-forum.com> David Balmain wrote: > On 9/6/06, Kent Sibilev wrote: >> end of a token. Does it mean that '_' character is considered >> alphanumeric? > > irb(main):001:0> require 'rubygems' > irb(main):002:0> require 'ferret' > irb(main):004:0> i = Ferret::I.new(:analyzer => > Ferret::Analysis::AsciiLetterAnalyzer.new) > irb(main):005:0> i << "test_case" > irb(main):006:0> i.search("case") > => # Ferret::Search::Hit doc=0, score=0.191783010959625>], > max_score=0.191783010959625> > irb(main):007:0> > > So no, '_' is not considered alphanumeric (or in this case alpha, as > AsciiLetterAnalyzer won't match numbers) > Yes. It seems to work correctly, but I've noticed that index.search_each doesn't return more that 10 documents. Is there an option to change it? >> >> >> >> you initialize tree VALUE objects but never mark them for garbage >> collector. Eventually they are being freed behind my back. What you >> should do is to keep the type of the stream in TokenStream structure and >> rework frt_ts_mark method. >> >> Hope that helps, >> Kent > > Actually, frt_rets_mark already marks the three VALUE objects > correctly. What would really help would be if you could give me an > example script that segfaults. If you can do this I'll fix it and get > a new gem out as soon as possible. > I guess I didn't look carefully at the code. -- Posted via http://www.ruby-forum.com/. From ksibilev at yahoo.com Wed Sep 6 10:24:30 2006 From: ksibilev at yahoo.com (Kent Sibilev) Date: Wed, 6 Sep 2006 16:24:30 +0200 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: References: Message-ID: David Balmain wrote: > On 9/6/06, David Balmain wrote: >> On 9/6/06, David Balmain wrote: >> > correctly. 
What would really help would be if you could give me an >> > example script that segfaults. If you can do this I'll fix it and get >> > a new gem out as soon as possible. >> >> Actually, hold on that, I think I've found the problem. > > Hi Kent, > > I've put in a fix which I think should fix your segfault. > Unfortunately I can't seem to replicate the bug here to test it. Even > calling GC.start doesn't seem to collect any of the three VALUES in > RegExpTokenStream. I've had problems like this before when trying to > test an implemention of a weak-key Hash. I really need to look into > how the Ruby garbage collector works but it never seems to work > predictable for me. > > Anyway, I was hoping you could help me out, either by testing your > code against the latest version of Ferret in subversion or sending me > a short (or long, I don't really care) script which causes the > problem. If it'll make it any easier I can email you a gem of the > current working version Ferret. Can you send it to my email at ksibilev at yahoo dot com? Thanks. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 6 11:09:28 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 7 Sep 2006 00:09:28 +0900 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: <5c029c7406aaf99bf4081168d1e57e67@ruby-forum.com> References: <5c029c7406aaf99bf4081168d1e57e67@ruby-forum.com> Message-ID: On 9/6/06, Kent Sibilev wrote: > Yes. It seems to work correctly, but I've noticed that index.search_each > doesn't return more that 10 documents. Is there an option to change it? Yep, :limit. The documentation is wrong in 0.10.2. It will be corrected in the next version. index.search_each(query, :limit => 20) #... Or you can get all results like this; index.search_each(query, :limit => :all) #... If you are paging through results, use :offset; index.search_each(query, :limit => 20, :offset => 40) #... 
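As a rough illustration of how :limit and :offset combine when paging, here is a minimal sketch that does not use Ferret at all. The helper names paging_options and page_of are made up for this example, and a plain Array stands in for a ranked hit list, so treat it as a picture of the arithmetic rather than of the library's behaviour.

```ruby
# Hypothetical helper (not part of Ferret): converts a 1-based page
# number into the :limit/:offset pair used in the examples above.
def paging_options(page, per_page = 20)
  raise ArgumentError, "page must be >= 1" if page < 1
  { :limit => per_page, :offset => (page - 1) * per_page }
end

# Stand-in for a ranked hit list. Slicing an Array mimics the effect
# of :limit and :offset; it is an assumption that a real index pages
# its hits the same way.
def page_of(hits, opts)
  hits[opts[:offset], opts[:limit]] || []
end

all_hits = (0...45).to_a                      # pretend 45 documents matched
third_page = page_of(all_hits, paging_options(3))
puts third_page.inspect                       # prints [40, 41, 42, 43, 44]
```

Page 3 at 20 hits per page gives :limit => 20, :offset => 40, which is the same pair as in the last search_each call above.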
Cheers, Dave From ksruby at gmail.com Wed Sep 6 11:24:33 2006 From: ksruby at gmail.com (Kent Sibilev) Date: Wed, 6 Sep 2006 11:24:33 -0400 Subject: [Ferret-talk] Which analyzer to use In-Reply-To: References: <5c029c7406aaf99bf4081168d1e57e67@ruby-forum.com> Message-ID: <477eb2b30609060824x4346f0d3ob50bebf48259a78d@mail.gmail.com> On 9/6/06, David Balmain wrote: > > On 9/6/06, Kent Sibilev wrote: > > Yes. It seems to work correctly, but I've noticed that index.search_each > > doesn't return more than 10 documents. Is there an option to change it? > > Yep, :limit. The documentation is wrong in 0.10.2. It will be > corrected in the next version. > > index.search_each(query, :limit => 20) #... > > Or you can get all results like this; > > index.search_each(query, :limit => :all) #... > > If you are paging through results, use :offset; > > index.search_each(query, :limit => 20, :offset => 40) #... > > Perfect, but I think it would be less confusing if you set :all as the default value for :limit. BTW, Ferret is quite a bit faster than Lucene. I haven't checked all query types, but the ones that I use most of the time are fast. Congrats! -- Kent --- http://www.datanoise.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060906/dad0d021/attachment.html From junkeraccount at yahoo.com Wed Sep 6 13:10:23 2006 From: junkeraccount at yahoo.com (Caleb) Date: Wed, 6 Sep 2006 19:10:23 +0200 Subject: [Ferret-talk] uninitialized constant UNTOKENIZED In-Reply-To: <20060830065505.GD9513@cordoba.webit.de> References: <44F49AAE.6030109@benjaminkrause.com> <7aff201a8586f60b04319ebad11b9c0d@ruby-forum.com> <44F49D07.1000309@benjaminkrause.com> <20060829220951.GA11083@cordoba.webit.de> <70cdbfdb4971c95043470bd0e3cad9cc@ruby-forum.com> <20060830065505.GD9513@cordoba.webit.de> Message-ID: <5b60ca654ba01f07b880ddba39ab39d0@ruby-forum.com> Jens Kraemer wrote: > it should get called whenever acts_as_ferret indexes a record, since it > is referenced in the :fields hash. what does aaf log when you create a > new Url record ? Sorry for the delayed response. I find it annoying when threads are started that could be helpful to others and the author doesn't take time to indicate what ultimately solved the problem. So, I won't do that here. You're right, url_parts IS being called when a Url is CREATED. I was thinking that it would be called upon SEARCHING. I guess that wouldn't make sense unless you wanted to re-index everything on every search (not a good idea). So, the url_parts method works as expected. -- Posted via http://www.ruby-forum.com/. From jc.michel at symetrie.com Wed Sep 6 14:56:23 2006 From: jc.michel at symetrie.com (Jean-Christophe Michel) Date: Wed, 6 Sep 2006 20:56:23 +0200 Subject: [Ferret-talk] using highlight from aaf In-Reply-To: References: <8eea6dfd472d48aad53cf8f945ec3312@symetrie.com> <5b609d0df1bec0ec11b8944e3dd34887@symetrie.com> <96e6dc6806cffab8d7dc915a90e712b0@symetrie.com> Message-ID: <901e25f7bb20e3eda1e29106a827cf8b@symetrie.com> Hi Dave, On 5 Sept. 06, at 02:24, David Balmain wrote: > Ahhhh, of course. Sorry. Jens mentioned that yesterday so I should > have realized.
You need to store the field as well as its term vector > :with_positions_offsets if you want to highlight it. The :term_vector > setting is :with_positions_offsets by default in aaf so you only need > to change the :store setting for the field you want to highlight. I'm not convinced about storing everything once more whereas I already store the texts in db. More, I don't know how to do it in aaf ;-) > By the way, Myclass.ferret_index.doc(12) will always return {}. The > documents are lazy loading now so Myclass.ferret_index.doc(12)[:id] > will return the model ID. You can load all fields with the load > method. Try; > > puts Myclass.ferret_index.doc(12).load().inspect() > > That should show you which fields are actually stored which in the > case of acts_as_ferret will only be the model ID (I think??). You are right. Anyway, I stick to my own ruby highlighting atm. Jean-Christophe Michel From Neville.Burnell at bmsoft.com.au Wed Sep 6 23:56:35 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 7 Sep 2006 13:56:35 +1000 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario Message-ID: <126EC586577FD611A28E00A0C9A03758B5BCB2@maui.bmsoft.com.au> Thanks for your email Dave, I've thought about this overnight, and I've got a few questions please. > When you open an IndexReader on the index it is opened up on > that particular version (or state) of the index Would you elaborate on how Ferret manages versions please. For example, can I have two readers open, one which accesses the old version of the index, and the second which accesses the latest version? > So to keep searches up to date you need to close and reopen > your IndexReader every time you commit changes to the index. I guess by reopen you mean IndexReader.new ? 
I proceeded to replace my Index usage with an IndexReader and Searcher which are closed and recreated after each IndexWriter pass, and the result seems to be that searches are still serialised - i.e., a long running query on thread t1 "blocks" the normally very fast query on thread t2. Might I be seeing another point of synchronisation, or am I just observing a characteristic of ruby threads ? Kind Regards, Neville From dbalmain.ml at gmail.com Thu Sep 7 02:07:12 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 7 Sep 2006 15:07:12 +0900 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BCB2@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BCB2@maui.bmsoft.com.au> Message-ID: On 9/7/06, Neville Burnell wrote: > Thanks for your email Dave, > > I've thought about this overnight, and I've got a few questions please. > > > When you open an IndexReader on the index it is opened up on > > that particular version (or state) of the index > > Would you elaborate on how Ferret manages versions please. For example, > can I have two readers open, one which accesses the old version of the > index, and the second which accesses the latest version? When you open an IndexReader it opens all the files that it needs to read the index and it keeps all of the file handles. Even after the index is updated and those files are deleted they are not actually freed by the operating system. If you then open an IndexReader on a later version it holds file handles to all the files needed for that version. So the answer is yes, you can have multiple IndexReaders open on an index at the same time, all reading different versions. Each version of the index has an internal version number and there is an IndexReader#latest? method to determine if the version of the index that you are reading is the current version.
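The idea of readers pinned to different index versions can be pictured with a small pure-Ruby sketch. Nothing here is Ferret's API: SnapshotReader and SnapshotIndex are hypothetical stand-ins, documents are plain strings, and the latest? method below only imitates the role that IndexReader#latest? plays in the explanation above.

```ruby
# Stand-in for an IndexReader: a frozen view of the documents as they
# were at one version. Hypothetical class, not Ferret's API.
class SnapshotReader
  attr_reader :version

  def initialize(docs, version)
    @docs = docs.dup.freeze   # copy, so later commits do not leak in
    @version = version
  end

  # Number of documents containing the term in this snapshot.
  def count(term)
    @docs.count { |doc| doc.include?(term) }
  end
end

# Holds the writer-side state and hands out the newest snapshot.
class SnapshotIndex
  def initialize
    @lock = Mutex.new
    @docs = []
    @version = 0
    @reader = SnapshotReader.new(@docs, @version)
  end

  # Search threads call this once per search and keep the result.
  def reader
    @lock.synchronize { @reader }
  end

  # Writer thread: add a document, bump the version, swap the reader.
  def commit(doc)
    @lock.synchronize do
      @docs << doc
      @version += 1
      @reader = SnapshotReader.new(@docs, @version)
    end
  end

  # Imitates asking a reader whether it is on the current version.
  def latest?(reader)
    @lock.synchronize { reader.version == @version }
  end
end

index = SnapshotIndex.new
old_reader = index.reader
index.commit("fred toon")
new_reader = index.reader

puts old_reader.count("toon")     # the old snapshot does not see the new doc
puts new_reader.count("toon")     # the new snapshot does
puts index.latest?(old_reader)    # the old snapshot is out of date
```

A search that grabbed old_reader before the commit keeps working against the old state, while any search started afterwards picks up the new one.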
> > So to keep searches up to date you need to close and reopen > > your IndexReader every time you commit changes to the index. > > I guess by reopen you mean IndexReader.new ? That's correct. Don't forget to close the old IndexReader. The garbage collector will do this for you but IndexReaders hold a lot of resources so it's best to close them as soon as you no longer need them. > I proceeded to replace my Index usage with an IndexReader and Searcher > which are closed and recreated after each IndexWriter pass, and the > result seems to be that searches are still serialised - i.e., a long > running query on thread t1 "blocks" the normally very fast query on > thread t2. > > Might I be seeing another point of synchronisation, or am I just > observing a characteristic of ruby threads ? I think it's probably a symptom of using ruby threads. I don't think the interpreter can switch threads in the middle of a call to a C function. It's unusual, however, for a search to take long enough to be a problem. What kind of search is it? If it's a PrefixQuery, FuzzyQuery or WildcardQuery you'll get much better performance on an optimized index. If you are making heavy use of any of these queries it is the one time I'd recommend always keeping the index in an optimized state. cheers, Dave From Pedro.CorteReal at iantt.pt Thu Sep 7 06:33:44 2006 From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real) Date: Thu, 07 Sep 2006 11:33:44 +0100 Subject: [Ferret-talk] disabling automatic indexing in acts_as_ferret In-Reply-To: <20060901112653.GQ9513@cordoba.webit.de> References: <1156504226.6728.4.camel@localhost.localdomain> <20060901112653.GQ9513@cordoba.webit.de> Message-ID: <1157625224.5407.3.camel@localhost.localdomain> On Fri, 2006-09-01 at 13:26 +0200, Jens Kraemer wrote: > there's an instance variable @ferret_reindex that's checked before the > indexing takes place.
> Something like > > def save_noindex > @ferret_reindex = false > save > end > > in your model should save the record without reindexing it. This doesn't work because of this code:

def ferret_before_update
  @ferret_reindex = true
end
alias :ferret_before_create :ferret_before_update

# add to index
def ferret_create
  logger.debug "ferret_create/update: #{self.class.name} : #{self.id}"
  self.class.ferret_index << self.to_doc if @ferret_reindex
  @ferret_reindex = true
  true
end
alias :ferret_update :ferret_create

This makes @ferret_reindex always true when ferret_create runs. > The boolean > is set to true in the after_save handler again, so the next call to save > should reindex again. I'm guessing the development branch has after_ handlers instead of before_. > You may call ferret_update directly to reindex > without saving, too. This worked great. Thanks, Pedro. From kraemer at webit.de Thu Sep 7 08:07:38 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 7 Sep 2006 14:07:38 +0200 Subject: [Ferret-talk] disabling automatic indexing in acts_as_ferret In-Reply-To: <1157625224.5407.3.camel@localhost.localdomain> References: <1156504226.6728.4.camel@localhost.localdomain> <20060901112653.GQ9513@cordoba.webit.de> <1157625224.5407.3.camel@localhost.localdomain> Message-ID: <20060907120738.GB17026@cordoba.webit.de> On Thu, Sep 07, 2006 at 11:33:44AM +0100, Pedro Côrte-Real wrote: > On Fri, 2006-09-01 at 13:26 +0200, Jens Kraemer wrote: > > there's an instance variable @ferret_reindex that's checked before the > > indexing takes place. > > Something like > > > > def save_noindex > > @ferret_reindex = false > > save > > end > > > > in your model should save the record without reindexing it.
> > This doesn't work because of this code: > > def ferret_before_update > @ferret_reindex = true > end > alias :ferret_before_create :ferret_before_update > > # add to index > def ferret_create > logger.debug "ferret_create/update: #{self.class.name} : #{self.id}" > self.class.ferret_index << self.to_doc if @ferret_reindex > @ferret_reindex = true > true > end > alias :ferret_update :ferret_create > > This makes @ferret_reindex always true when ferret_create runs. Right, the only way to avoid this seems to be overriding ferret_before_update. I'll have a look into this. Jens > > > > The boolean > > is set to true in the after_save handler again, so the next call to save > > should reindex again. > > I'm guessing the development branch has after_ handlers instead of > before_. > > > You may call ferret_update directly to reindex > > without saving, too. > > This worked great. > > Thanks, > > Pedro. > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From waspfactory at ggggggggggmaaaail.com Thu Sep 7 10:58:08 2006 From: waspfactory at ggggggggggmaaaail.com (Caspar) Date: Thu, 7 Sep 2006 16:58:08 +0200 Subject: [Ferret-talk] counting occurences of words in the result set Message-ID: Hello, I need to be able to count the occurrences of certain terms in the results. Currently my setup is Ferret 0.10.1 aaf bleeding edge. results = VoObject.find_by_contents(query,:offset=>page, :limit=> 20,:sort => sort_fields) I use results.total_hits for pagination. This all works really nicely. However I need to be able to know how many occurrences of certain predefined terms occur in each result set. So in the animals fields there can be "mouse", "cat", "fish".
A perfect solution would be for the result set to have some extra attributes like results.cat_hits (that would be amazing). In reality there need to be counts for 5 different fields. So is this something that ferret can do easily? How do I get ferret and aaf to produce this data for each search result? What should I go and investigate? Best regards caspar -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu Sep 7 11:10:28 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 7 Sep 2006 17:10:28 +0200 Subject: [Ferret-talk] counting occurences of words in the result set In-Reply-To: References: Message-ID: <20060907151028.GC23939@cordoba.webit.de> On Thu, Sep 07, 2006 at 04:58:08PM +0200, Caspar wrote: > Hello, I need to be able to count the occurrences of certain terms in the > results. > Currently my setup is Ferret 0.10.1 aaf bleeding edge. > > results = VoObject.find_by_contents(query,:offset=>page, :limit=> > 20,:sort => sort_fields) > > I use results.total_hits for pagination. This all works really nicely. > However I need to be able to know how many occurrences of certain > predefined terms occur in each result set. So in the animals fields > there can be "mouse", "cat", "fish". > A perfect solution would be for the result set to have some extra > attributes like results.cat_hits (that would be amazing). In reality > there need to be counts for 5 different fields. > > So is this something that ferret can do easily? > How do I get ferret and aaf to produce this data for each search result? > What should I go and investigate? I'd first try to just issue a separate query for each of your special terms (ANDed with the original query), and take its result count. Ideally you wouldn't use find_by_contents for this (because it fetches results from the db, which you don't want here), but use something like VoObject.ferret_index.search(query + " AND cat",...).total_hits Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From waspfactory at ggggggggggmaaaail.com Thu Sep 7 11:44:17 2006 From: waspfactory at ggggggggggmaaaail.com (Caspar) Date: Thu, 7 Sep 2006 17:44:17 +0200 Subject: [Ferret-talk] counting occurences of words in the result set In-Reply-To: <20060907151028.GC23939@cordoba.webit.de> References: <20060907151028.GC23939@cordoba.webit.de> Message-ID: <67baa891c671463028dc8779eeabda23@ruby-forum.com> Hi Jens, Thank you for getting back so quickly. I should have given more information about the problem though. One of the fields contains about 35 predefined values. I hope there is a more efficient way of producing these counts or I may well have to drop this functionality from the app. Any other ideas? I really appreciate the speed with which people reply on this forum. Regards c Jens Kraemer wrote: > On Thu, Sep 07, 2006 at 04:58:08PM +0200, Caspar wrote: >> there can be "mouse", "cat", "fish". >> A perfect solution would be for the result set to have some extra >> attributes like results.cat_hits (that would be amazing). In reality >> there need to be counts for 5 different fields. >> >> So is this something that ferret can do easily? >> How do I get ferret and aaf to produce this data for each search result? >> What should I go and investigate? > > I'd first try to just issue a separate query for each of your special > terms (ANDed with the original query), and take its result count. > > Ideally you wouldn't use find_by_contents for this (because it fetches > results from the db, which you don't want here), but use something like > > VoObject.ferret_index.search(query + " AND cat",...).total_hits > > > Jens > > -- > webit!
Gesellschaft für neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de > Schnorrstraße 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 -- Posted via http://www.ruby-forum.com/. From waspfactory at ggggggggggmaaaail.com Thu Sep 7 13:55:08 2006 From: waspfactory at ggggggggggmaaaail.com (Caspar) Date: Thu, 7 Sep 2006 19:55:08 +0200 Subject: [Ferret-talk] counting occurences of words in the result set In-Reply-To: <67baa891c671463028dc8779eeabda23@ruby-forum.com> References: <20060907151028.GC23939@cordoba.webit.de> <67baa891c671463028dc8779eeabda23@ruby-forum.com> Message-ID: <9cd1b5b381a560571db0acdcee558142@ruby-forum.com> Hi, okay, we have spent the last few hours trawling through the ferret api and have come across lots of promising leads, and many questions. index_reader.doc_freq(field, term) -> integer Return the number of documents in which the term term appears in the field field. This would seem to partly fit the requirements. However, when I have tried to instantiate a new index_reader like this reader = Ferret::Index::IndexReader.new("/home/c/V_O_2/index/development/vo_object/") and then try to access some of the documents returned by search_each, I am only able to access the :id field. Q1: How do you create an index_reader that is able to access your aaf index? Q2: How do you actually return the contents of a field? Q3: How can I combine doc_freq (which seems perfect) with a search to count the frequency of terms? Any answers would be brilliant. best regards caspar -- Posted via http://www.ruby-forum.com/.
From clare.cavanagh at nospam.co.uk Thu Sep 7 14:11:16 2006 From: clare.cavanagh at nospam.co.uk (Clare) Date: Thu, 7 Sep 2006 20:11:16 +0200 Subject: [Ferret-talk] counting occurences of words in the result set In-Reply-To: <9cd1b5b381a560571db0acdcee558142@ruby-forum.com> References: <20060907151028.GC23939@cordoba.webit.de> <67baa891c671463028dc8779eeabda23@ruby-forum.com> <9cd1b5b381a560571db0acdcee558142@ruby-forum.com> Message-ID: <74bffe861d0b6bcb15632e0e538f2a87@ruby-forum.com> Caspar I have been trying to get the same thing working for a while but did not ever find a solution. It would help greatly if someone has the answer to this because I want to add this capability to my search to provide additional information to the user in the results page. But I only got the :id from the index also... :( Any help would be appreciated on this one. Thanks in advance as always! Clare -- Posted via http://www.ruby-forum.com/. From wiseleyb at gmail.com Thu Sep 7 14:44:17 2006 From: wiseleyb at gmail.com (ben) Date: Thu, 7 Sep 2006 20:44:17 +0200 Subject: [Ferret-talk] invalid characters with win32 Message-ID: I'm running on webbrick 1.3.1, winxp, ruby 1.8.4, rails 1.1.6, ferret-0.10.2-mswin32 Used gem install ferret - didn't report any errors. When I do a require 'ferret' in any of my controllers actionview blows up with a SyntaxError in Default#index /ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:307:in `compile_and_render_template' complaining about invalid characters. Might be related/similar to this bug http://www.ruby-forum.com/topic/62365 I ran the \ferret-0.10.2-mswin32\test\test_all.rb and everything looked good. Any ideas on this? -ben -- Posted via http://www.ruby-forum.com/. 
From contact at ezabel.com Thu Sep 7 22:18:26 2006 From: contact at ezabel.com (Ian Zabel) Date: Fri, 8 Sep 2006 04:18:26 +0200 Subject: [Ferret-talk] invalid characters with win32 In-Reply-To: References: Message-ID: <41815996301eeac97b7416ba0f40f017@ruby-forum.com> ben wrote: > I'm running on webbrick 1.3.1, winxp, ruby 1.8.4, rails 1.1.6, > ferret-0.10.2-mswin32 > I'm running on the same, and am now getting weird errors on two separate machines. Different errors than you're getting, though. compile error ./script/../config/../app/views/layouts/application.rhtml:58: parse error, unexpected $, expecting kEND _erbout.concat "
\n" ^ What doesn't make sense to me is that the errors are not even inside of ruby scriptlets, they're just in normal HTML parts of the view. And all of my previous scriptlets are properly closed. If I mess around with the layout by removing different parts of the html, I get different errors, and then, randomly, some section of html being removed makes it work again. Very, very odd behavior. The SAME code (no messing around) running in my linux environment is working fine. If I remove acts_as_ferret from my model AND the filesystem, things begin to work without any changes to the layout or other views. I don't get it. :/ Maybe I'll try downgrading to Ferret 0.10.2 Ruby instead of win32 and see what happens. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Thu Sep 7 22:58:05 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 8 Sep 2006 11:58:05 +0900 Subject: [Ferret-talk] counting occurences of words in the result set In-Reply-To: <74bffe861d0b6bcb15632e0e538f2a87@ruby-forum.com> References: <20060907151028.GC23939@cordoba.webit.de> <67baa891c671463028dc8779eeabda23@ruby-forum.com> <9cd1b5b381a560571db0acdcee558142@ruby-forum.com> <74bffe861d0b6bcb15632e0e538f2a87@ruby-forum.com> Message-ID: On 9/8/06, Clare wrote: > Caspar > > I have been trying to get the same thing working for a while but did not > ever find a solution. It would help greatly if someone has the answer to > this because I want to add this capability to my search to provide > additional information to the user in the results page. > > But I only got the :id from the index also... :( > > Any help would be appreciated on this one. > > Thanks in advance as always! By default acts_as_ferret only stores the :id. You need to set the :store parameter of any other fields that you want stored. 
Something like this; acts_as_ferret :fields => { :title => { :store => :yes }, :content => { :store => :yes } } As for counting the frequency of terms in a resultset, IndexReader#doc_freq probably won't work. It counts the frequency of terms in the index, not in the resultset. So back to the problem. Jens gave the solution I would probably use. Ferret's searches are fast enough that this solution is quite feasible for most indexes. Try it. You might be surprised. The alternative is counting through the resultset. To do this you will need to set :limit => :all in the search_each method so you get all results back, then iterate through each result counting the occurrences. For a huge index - slow query - small resultset this might be faster. Also, with the new filter_proc method there is another way you can do this without having to retrieve all results; require 'rubygems' require 'ferret' include Ferret index = I.new words = %w{one two three four five} 100000.times do |i| index << {:id => "%05d" % i, :word => words[rand(words.size)]} end counter = Hash.new(0) filter_proc = lambda do |doc, score, searcher| counter[searcher[doc][:word]] += 1 end resultset = index.search("id:[10000 20000}", :limit => 1, :filter_proc => filter_proc) puts resultset.total_hits puts counter.inspect Hope that helps, Dave From contact at ezabel.com Thu Sep 7 23:03:59 2006 From: contact at ezabel.com (Ian Zabel) Date: Fri, 8 Sep 2006 05:03:59 +0200 Subject: [Ferret-talk] invalid characters with win32 In-Reply-To: <41815996301eeac97b7416ba0f40f017@ruby-forum.com> References: <41815996301eeac97b7416ba0f40f017@ruby-forum.com> Message-ID: Well, I tried to install 0.10.2 Ruby on Windows, but that doesn't work since it tries to compile the C libs. So I went back to 0.9.6 and aaf stable, and all is well again. -- Posted via http://www.ruby-forum.com/.
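The "count through the resultset" alternative Dave describes earlier in this thread can be sketched in plain Ruby. This is only an illustration and does not call Ferret: the results array is a hypothetical stand-in for stored documents fetched from a search, and count_field_values is a made-up helper name, not part of Ferret or acts_as_ferret.

```ruby
# Hypothetical stand-in for documents retrieved from a search. In a real
# application each hash would come from the index, and the :word field
# would have to be stored (:store => :yes) to be readable here.
results = [
  { :id => "00001", :word => "one" },
  { :id => "00002", :word => "two" },
  { :id => "00003", :word => "one" },
]

# Tally how often each value of +field+ occurs across +docs+.
def count_field_values(docs, field)
  counter = Hash.new(0)                 # missing keys start at zero
  docs.each { |doc| counter[doc[field]] += 1 }
  counter
end

counts = count_field_values(results, :word)
puts counts.inspect
```

The Hash.new(0) default is the same trick the filter_proc example uses, so the tallying logic would presumably carry over unchanged once the loop is fed real search results.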
From dbalmain.ml at gmail.com Thu Sep 7 23:16:07 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 8 Sep 2006 12:16:07 +0900 Subject: [Ferret-talk] invalid characters with win32 In-Reply-To: References: Message-ID: On 9/8/06, ben wrote: > I'm running on webbrick 1.3.1, winxp, ruby 1.8.4, rails 1.1.6, > ferret-0.10.2-mswin32 > > Used gem install ferret - didn't report any errors. > > When I do a > require 'ferret' > in any of my controllers actionview blows up with a > > SyntaxError in Default#index > /ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:307:in > `compile_and_render_template' > > complaining about invalid characters. > > Might be related/similar to this bug > http://www.ruby-forum.com/topic/62365 I'd say it is definitely related to this. Which version of Ruby do you have? I've compiled against 1.8.4 which isn't the latest. Have you tried the whitespace solution described here; http://rails.techno-weenie.net/question/2006/3/26/how_do_i_get_file_column_plugin_working_on_windows_with_ruby_1_8_4 ie, replace all tabs with spaces. I have no idea why this would make a difference but it seems to. I'll puts out Ferret 0.10.3 today and I'll compile it against 1.8.5. Hopefully that will help. Cheers, Dave > I ran the \ferret-0.10.2-mswin32\test\test_all.rb and everything looked > good. > > Any ideas on this? > > -ben > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From contact at ezabel.com Fri Sep 8 00:41:45 2006 From: contact at ezabel.com (Ian Zabel) Date: Fri, 8 Sep 2006 06:41:45 +0200 Subject: [Ferret-talk] invalid characters with win32 In-Reply-To: References: Message-ID: David Balmain wrote: > I'd say it is definitely related to this. Which version of Ruby do you > have? I've compiled against 1.8.4 which isn't the latest. I'm on 1.8.4 here. 
> Have you tried the whitespace solution described here; > > http://rails.techno-weenie.net/question/2006/3/26/how_do_i_get_file_column_plugin_working_on_windows_with_ruby_1_8_4 > > ie, replace all tabs with spaces. I haven't tried that yet, but I'll give it a shot. Ian. -- Posted via http://www.ruby-forum.com/. From mxcurioni at yahoo.com Fri Sep 8 02:04:34 2006 From: mxcurioni at yahoo.com (Maxime Curioni) Date: Fri, 8 Sep 2006 08:04:34 +0200 Subject: [Ferret-talk] tweaking minimum word length? In-Reply-To: References: <64E8273D-D7CC-4DE9-8BA4-CB50DAF4D123@fhwang.net> Message-ID: Hello Dave, Sorry for responding so late. I am actually using Ferret via the acts_as_ferret Rails plugin. I have a problem with small words, especially when I search for them between quotes. For example, I have indexed the following sentence: "e-commerce growth strategy for a major business to leverage key intangible assets" When I search for the sentence '"for a"' (not just 'for AND a' but the sentence "for a"), I don't get any results. Is there a way to impose to Ferret to return results _strictly_ containing certain words (i.e. exact results, not approximate results) ? I am also experiencing problems with words containing special characters (especially words separated with dashes). Is there a way to send a raw query to Ferret without having to escape the special characters ? Thank you for your help, Maxime Curioni David Balmain wrote: > Hi Maxime, > > Ferret already indexes all words no matter what their length (unless > you add a custom filter). Could you give an example of the problem? > ie. what words are you trying to search for? > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri Sep 8 02:24:27 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 8 Sep 2006 15:24:27 +0900 Subject: [Ferret-talk] tweaking minimum word length? 
In-Reply-To: References: <64E8273D-D7CC-4DE9-8BA4-CB50DAF4D123@fhwang.net> Message-ID: On 9/8/06, Maxime Curioni wrote: > Hello Dave, > Sorry for responding so late. I am actually using Ferret via the > acts_as_ferret Rails plugin. > > I have a problem with small words, especially when I search for them > between quotes. For example, I have indexed the following sentence: > "e-commerce growth strategy for a major business to leverage key > intangible assets" > > When I search for the sentence '"for a"' (not just 'for AND a' but the > sentence "for a"), I don't get any results. Hi Maxime, It's not the length of the words that is the problem. If you did a search for "cat" it would find it. The problem is that the default analyzer which you are using removes common stop-words like "and", "the", "a" and "for". You can create a StandardAnalyzer that doesn't remove stopwords like this; include Ferret::Index include Ferret::Analysis index = Index.new(:analyzer => StandardAnalyzer.new([])) > Is there a way to impose to > Ferret to return results _strictly_ containing certain words (i.e. exact > results, not approximate results) ? I'm not sure what you mean here. Can you give me an example where Ferret returns approximate results? > I am also experiencing problems with words containing special characters > (especially words separated with dashes). Is there a way to send a raw > query to Ferret without having to escape the special characters ? words separated by dashes are treated as single words by the current StandardAnalyzer but that will change in version 0.10.3. Here is an example; require 'rubygems' require 'ferret' index = Ferret::I.new(:analyzer => Ferret::Analysis::StandardAnalyzer.new([])) index << "e-commerce growth strategy for a major business to leverage key intangible assets" puts index.search("e-commerce") puts index.search("commerce") puts index.search("for a") Currently the search for "commerce" won't return any results. 
In version 0.10.3 both "e-commerce" and "commerce" and "e" for that matter will find the document. > Thank you for your help, > Maxime Curioni From Clare at nospam.com Fri Sep 8 03:40:51 2006 From: Clare at nospam.com (Clare) Date: Fri, 8 Sep 2006 09:40:51 +0200 Subject: [Ferret-talk] Performance Testing +counting occurences of words in the res In-Reply-To: References: <20060907151028.GC23939@cordoba.webit.de> <67baa891c671463028dc8779eeabda23@ruby-forum.com> <9cd1b5b381a560571db0acdcee558142@ruby-forum.com> <74bffe861d0b6bcb15632e0e538f2a87@ruby-forum.com> Message-ID: <2cf01a572220d3559ab73a06ffc06624@ruby-forum.com> Thanks David I will try both options. I am infact doing some performance testing now. I have created 100,000 search result set and it takes around 5 seconds (end to end) on my internal server to be returned (with 1 user). I am only doing 6 significant searches on this set. One for the main results and one for the top level categories. This is only on my test server and not in the larger production server and I am happy with this performance. If however I were to do my second level category search that has around 40 nodes in it, that would be 30 searches. I am not sure how this would perform. What I am seeing is CPU hungry search but not memory hungry. This makes sense to me. Q - I have test data set up in my tests that has some random junk in and then a word such as "fish" at the end of it. I am starting to think that I may have set up the test data wrong and should use a lot of different words in the result set because I am sure that Ferret will cache the search. This would give me a false impression on speed of search. I will create more test data however at the weekend but my instinct is that your method outlined above may be faster. I have 5 top level categories and this will not change much. Depending on the search there were be a lot more results in one category that the rest after the initial search. 
Drilling into the second level categories, the most nodes I have in a single second level category is around 40 at the moment although this is likely to be added to over time. The resuls again will not be normally distributed over the results set but assuming for now that they were and I had 500,000 records, and drilled into the second tier category structure I would have 100,000 records in this category. I would be doing 40 searches over 100,000 records. Q - What do you think will perform faster in this instance? I would love to have the time to build a x-dimensional memory resident result (bucket set) that kept all the results parameterised for all the categories, built at the initial time of the search. Would be memory hungry but would make searching through categories and nodes and parameters in subsequent searches lightening fast. Would be a great addition or am I missing something? I am really interested in the performance testing scenarios. As stated above, I only have one word "FISH" in my test data with random made up beforehand. e.g. "sadssderssdaatg FISH" etc. Q - Would I be better using more words in my test data? Also - I am interested in the round trip performance of search. The length of time it takes from when the user clicks on search and gets the results back. I will do this on the production server in the production environment. My rule of thumb is that it should not take longer than 8 seconds to return the results or the user will refresh (even worse for performance). With one user on my test system with 6 searches over 100,000 records it takes 5 seconds at the moment. I am expecting a large number of concurrent searches happening. I am defining concurrency as someone searching at the same time as another user is either searching or waiting for the results to be returned. Most testing tools that I can see only show you what is happening on the server. I am interested from the users perspective. 
I had a thought of setting up a script that would open a number of browser sessions and doing random searches concurrently and hammering the server to see when it 1) breaks search, 2) breaks something else 3) search goes over the 8 second limit. Q - does anyone have any experience in this area. Even better does anyone have a script to do this? If not, and I do write a script to do this would this be of value to the greater community? Sorry for the long winded post. My search page and category search is the most critical part of my site and I am anal on performance of this because if it does not work then my site will not work. Thanks once again for all your assistance. Sorry for any stupid or ignorant thoughts/remarks. Ferret rocks! Clare -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Sep 8 03:43:44 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 8 Sep 2006 09:43:44 +0200 Subject: [Ferret-talk] tweaking minimum word length? In-Reply-To: References: <64E8273D-D7CC-4DE9-8BA4-CB50DAF4D123@fhwang.net> Message-ID: <20060908074344.GD23939@cordoba.webit.de> On Fri, Sep 08, 2006 at 03:24:27PM +0900, David Balmain wrote: > On 9/8/06, Maxime Curioni wrote: > > Hello Dave, > > Sorry for responding so late. I am actually using Ferret via the > > acts_as_ferret Rails plugin. > > > > I have a problem with small words, especially when I search for them > > between quotes. For example, I have indexed the following sentence: > > "e-commerce growth strategy for a major business to leverage key > > intangible assets" > > > > When I search for the sentence '"for a"' (not just 'for AND a' but the > > sentence "for a"), I don't get any results. > > Hi Maxime, > It's not the length of the words that is the problem. If you did a > search for "cat" it would find it. The problem is that the default > analyzer which you are using removes common stop-words like "and", > "the", "a" and "for". 
You can create a StandardAnalyzer that doesn't > remove stopwords like this; > > include Ferret::Index > include Ferret::Analysis > > index = Index.new(:analyzer => StandardAnalyzer.new([])) or, with aaf: acts_as_ferret :analyzer => StandardAnalyzer.new([]) Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From dbalmain.ml at gmail.com Fri Sep 8 04:49:31 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 8 Sep 2006 17:49:31 +0900 Subject: [Ferret-talk] Performance Testing +counting occurences of words in the res In-Reply-To: <2cf01a572220d3559ab73a06ffc06624@ruby-forum.com> References: <20060907151028.GC23939@cordoba.webit.de> <67baa891c671463028dc8779eeabda23@ruby-forum.com> <9cd1b5b381a560571db0acdcee558142@ruby-forum.com> <74bffe861d0b6bcb15632e0e538f2a87@ruby-forum.com> <2cf01a572220d3559ab73a06ffc06624@ruby-forum.com> Message-ID: On 9/8/06, Clare wrote: > Thanks David > > I will try both options. I am infact doing some performance testing now. > I have created 100,000 search result set and it takes around 5 seconds > (end to end) on my internal server to be returned (with 1 user). I am > only doing 6 significant searches on this set. One for the main results > and one for the top level categories. This is only on my test server and > not in the larger production server and I am happy with this > performance. If however I were to do my second level category search > that has around 40 nodes in it, that would be 30 searches. I am not sure > how this would perform. > > What I am seeing is CPU hungry search but not memory hungry. This makes > sense to me. > > Q - I have test data set up in my tests that has some random junk in and > then a word such as "fish" at the end of it.
I am starting to think that > I may have set up the test data wrong and should use a lot of different > words in the result set because I am sure that Ferret will cache the > search. This would give me a false impression on speed of search. Firstly, searches don't get cached. Only filters do. If you want to cache the results from a query (which you would in this instance) then you should use a QueryFilter. Secondly, I'm not sure exactly what you are saying when you say your tests have some random junk and then the word "fish"? If you are putting data like this into every document; index << "asdlgkjhasd askdj asdg asdg asdg asdg lkjh asd fish" Then you probably should work on your test data. As far as search performance goes, this will be no different to doing this; index << "fish" What is important is to remember that TermQueries (fish) perform a lot better than BooleanQueries (fish AND rod) and PhraseQueries ("fishing rod") which perform better again than WildCardQueries (fi*) so you should try these queries too. Here is a much better way to create random strings; WORDS = %w{one two three} def random_sentence(min_size, max_size) len = min_size + rand(max_size - min_size) sentence = [] len.times {sentence << WORDS[Math.sqrt(rand(WORDS.size * WORDS.size))]} sentence.join(" ") end 10.times { puts random_sentence(10, 100) } The Math.sqrt stuff makes sure that words aren't evenly distributed to be more realistic. Words appearing later in the WORDS array will be much more common. Even better than this would be to use a copy of the real data that you will be using though. > I will create more test data however at the weekend but my instinct is > that your method outlined above may be faster. > > I have 5 top level categories and this will not change much. Depending > on the search there were be a lot more results in one category that the > rest after the initial search.
> > Drilling into the second level categories, the most nodes I have in a > single second level category is around 40 at the moment although this is > likely to be added to over time. The resuls again will not be normally > distributed over the results set but assuming for now that they were and > I had 500,000 records, and drilled into the second tier category > structure I would have 100,000 records in this category. I would be > doing 40 searches over 100,000 records. > > Q - What do you think will perform faster in this instance? Impossible to say without testing. Both methods are pretty simple though so I'd try both with a variety of search strings. > I would love to have the time to build a x-dimensional memory resident > result (bucket set) that kept all the results parameterised for all the > categories, built at the initial time of the search. Would be memory > hungry but would make searching through categories and nodes and > parameters in subsequent searches lightening fast. > > Would be a great addition or am I missing something? As far as I'm concerned this functionality is already there with the filter_proc parameter. Make it any less general than this and it isn't much use anymore. For example; require 'rubygems' require 'ferret' include Ferret index = I.new words = %w{one two three four five} 100000.times do |i| index << {:id => "%05d" % i, :word => words[rand(words.size)]} end groups = {} filter_proc = lambda do |doc, score, searcher| word = searcher[doc][:word] (groups[word]||=[]) << doc end resultset = index.search("id:[09900 10000}", :limit => 1, :filter_proc => filter_proc) puts resultset.total_hits puts groups.inspect puts groups["two"].size I really can't see how you could make it any easier than that. > I am really interested in the performance testing scenarios. As stated > above, I only have one word "FISH" in my test data with random made up > beforehand. e.g. "sadssderssdaatg FISH" etc. 
> > Q - Would I be better using more words in my test data? See above. > Also - I am interested in the round trip performance of search. The > length of time it takes from when the user clicks on search and gets the > results back. I will do this on the production server in the production > environment. My rule of thumb is that it should not take longer than 8 > seconds to return the results or the user will refresh (even worse for > performance). With one user on my test system with 6 searches over > 100,000 records it takes 5 seconds at the moment. 5 seconds seems like a long time. Try optimizing your index and see how you go then. The example above took 0.028109 seconds. Personally, I would be worried about anything taking over 1 second which was the whole reason I wrote Ferret in C. > I am expecting a large number of concurrent searches happening. I am > defining concurrency as someone searching at the same time as another > user is either searching or waiting for the results to be returned. > > Most testing tools that I can see only show you what is happening on the > server. I am interested from the users perspective. > > I had a thought of setting up a script that would open a number of > browser sessions and doing random searches concurrently and hammering > the server to see when it 1) breaks search, 2) breaks something else 3) > search goes over the 8 second limit. > > Q - does anyone have any experience in this area. Even better does > anyone have a script to do this? If not, and I do write a script to do > this would this be of value to the greater community? If I were you, I'd test plain old search performance before I tested performance through a browser. And, again, it is pretty hard to generalize a script like this since so many people have different search needs. In my opinion, Ruby makes it easy enough to write this from scratch each time. > Sorry for the long winded post. 
My search page and category search is > the most critical part of my site and I am anal on performance of this > because if it does not work then my site will not work. > > Thanks once again for all your assistance. Sorry for any stupid or > ignorant thoughts/remarks. > > Ferret rocks! You're welcome, Dave > Clare > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From wiseleyb at gmail.com Fri Sep 8 09:38:10 2006 From: wiseleyb at gmail.com (ben) Date: Fri, 8 Sep 2006 15:38:10 +0200 Subject: [Ferret-talk] invalid characters with win32 In-Reply-To: References: Message-ID: <495689e017e24d2ae399a86818ce882f@ruby-forum.com> David Balmain wrote: > I'll puts out Ferret 0.10.3 today and I'll > compile it against 1.8.5. Hopefully that will help. I updated to 1.8.5 this morning. Any chance you put out 0.10.3? -ben -- Posted via http://www.ruby-forum.com/. From mxcurioni at yahoo.com Fri Sep 8 11:00:28 2006 From: mxcurioni at yahoo.com (Maxime Curioni) Date: Fri, 8 Sep 2006 17:00:28 +0200 Subject: [Ferret-talk] tweaking minimum word length? In-Reply-To: References: <64E8273D-D7CC-4DE9-8BA4-CB50DAF4D123@fhwang.net> Message-ID: <392ea24a45e8f9cc99bbf655cd6cd67c@ruby-forum.com> Hello David and Jens, I cannot thank you enough for your prompt answers. I had quickly browsed through both Ferret and aaf APIs but being short on schedule, I did not really have time to dive in the technology. I have successfully used aaf and Ferret out out the box for my product, thanks to your work and the Rails environment. I am realizing now that if I had read the documentation (especially about Ferret analyzers), I could have saved some of your time... so thanks a lot ! I now understand about Ferret parsing the query for common words. I will use the basic analyzer that you provided me with. 
Regarding the "approximate results", after what you have told me, it makes more sense: the record "Defining an e-commerce growth strategy for a major business" would be matched by both '"Defining an"' and '"Defining as"'. I thought that Ferret would match 'approximate results', considering that those queries were somehow close enough to return the previous record as a valid result for both of them. I understand that "an" and "as" are considered common words and Ferret removes them, therefore giving the results of the '"Defining"' query. I understand that the feature I am looking for (matching words separated with dashes) will be available in the next released version: - What can I do, in the meantime, to match those words? Do I need to write an ad hoc analyzer? Could you tell me the list of the "special characters"? - When do you estimate that 0.10.3 will be released? Having to deliver my product soon, I was wondering if that version would make it into my work. Thank you again for your time and your help. Regards. Maxime Curioni -- Posted via http://www.ruby-forum.com/. From wiseleyb at gmail.com Fri Sep 8 12:02:22 2006 From: wiseleyb at gmail.com (ben) Date: Fri, 8 Sep 2006 18:02:22 +0200 Subject: [Ferret-talk] invalid characters with win32 In-Reply-To: <495689e017e24d2ae399a86818ce882f@ruby-forum.com> References: <495689e017e24d2ae399a86818ce882f@ruby-forum.com> Message-ID: fyi: if anyone's interested... got Mongrel up and running this morning and Mongrel + Ruby 1.8.5 + Ferret 0.10.2 = no joy, same weird parsing errors. So it doesn't appear to be a WEBrick thing at least. I tried this because some of the posts above mentioned that using Apache would solve this issue... didn't try that yet though (but was hoping simple Mongrel might solve it). -- Posted via http://www.ruby-forum.com/.
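Maxime's observation earlier in this thread, that '"Defining an"' and '"Defining as"' return the same record, can be illustrated without Ferret. The sketch below is a toy tokenizer, not Ferret's StandardAnalyzer: the stop-word list and the toy_analyze name are assumptions chosen for illustration, and Ferret's real list is longer and may differ.

```ruby
# Toy stop-word list, assumed for this example only.
TOY_STOP_WORDS = %w{a an and as for the to}

# Lowercase the text, split it into runs of letters, and drop any token
# found in +stop_words+. Passing [] keeps every token, which mirrors the
# idea behind StandardAnalyzer.new([]) in the replies above.
def toy_analyze(text, stop_words = TOY_STOP_WORDS)
  text.downcase.scan(/[a-z]+/).reject { |token| stop_words.include?(token) }
end

p toy_analyze("Defining an")   # ["defining"]
p toy_analyze("Defining as")   # ["defining"]
p toy_analyze("for a")         # []
p toy_analyze("for a", [])     # ["for", "a"]
```

Both phrases reduce to the single token "defining", and "for a" reduces to nothing at all, which would explain why that phrase query found no results until the stop-word list was emptied.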
From dbalmain.ml at gmail.com Fri Sep 8 12:46:00 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 9 Sep 2006 01:46:00 +0900 Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released Message-ID: Hey all, I've just released Ferret 0.10.3. It is mostly just a bugfix release. I've also added Ferret::Analysis::HyphenFilter which filters hyphenated words for better search results. Basically the way it works is it concatenates a hyphenated word as well as adding the two separate terms. So "set-up" becomes "setup", "set", "up" so searches for "set-up", "setup" or just plain "set" will all match. This filter is also applied to the StandardAnalyzer. I've also made the process_query method in Index::Index public. Before anyone asks, the reason it is process_query and not parse_query is that it accepts strings or query objects and may also optimize queries in future. Have at it, Dave From wiseleyb at gmail.com Fri Sep 8 13:48:46 2006 From: wiseleyb at gmail.com (ben) Date: Fri, 8 Sep 2006 19:48:46 +0200 Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released In-Reply-To: References: Message-ID: Dave, Got a ERROR: While executing gem ... (Zlib::BufError) buffer error when trying to gem install 0.10.3 mswin32 on xp on ruby 1.8.5 The ruby option failed as well but I think that's expected. Sorry to critize - I know how much it sucks to work your ass of on stuff and receive bug reports. -ben -- Posted via http://www.ruby-forum.com/. From srackham at methods.co.nz Fri Sep 8 20:52:40 2006 From: srackham at methods.co.nz (Stuart Rackham) Date: Sat, 9 Sep 2006 02:52:40 +0200 Subject: [Ferret-talk] search_each segmentation fault and parser anomoly Message-ID: <9833bfe8c00b852f2cde71324e3f37fc@ruby-forum.com> The included test script turned up the following anomolies (run against Ferret 0.10.3, but had same problems with 0.10.2): 1. When the content word is not in the index the inclusion of a wildcard file term causes search_each to throw a segmentation fault. 
$ ./test.rb zzz file:*.txt query: +content:zzz +file:*.txt ./test.rb:28: [BUG] Segmentation fault ruby 1.8.4 (2005-12-24) [i486-linux] Aborted 2. When the file query term is file:* wildcard the parser translates it to +* instead of +file:* $ ./test.rb one file:* query: +content:one +* file: f1.txt Am I missing something here? Cheers Stuart -- Stuart Rackham -----------BEGIN SCRIPT---------------- #!/usr/bin/env ruby require 'rubygems' require 'ferret' include Ferret path = '/tmp/test_index' index = Index::IndexWriter.new(:create => true, :path => path) index.field_infos.add_field(:file, :store => :yes, :index => :untokenized) index.field_infos.add_field(:content, :store => :no, :index => :yes) index << {:content => 'one', :file => 'f1.txt'} index << {:content => 'two', :file => 'f2.txt'} index << {:content => 'three', :file => 'f3.txt'} index << {:content => 'four', :file => 'f4.txt'} index << {:content => 'five', :file => 'f5.txt'} index.optimize index.close query_parser = QueryParser.new({:default_field => :content, :or_default => false, }) query = query_parser.parse(ARGV.join(' ')) puts "query: #{query}" searcher = Search::Searcher.new(path) searcher.search_each(query) do |doc, score| puts "file: #{searcher[doc][:file]}" end -------------END SCRIPT---------------- -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri Sep 8 21:53:33 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 9 Sep 2006 10:53:33 +0900 Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released In-Reply-To: References: Message-ID: On 9/9/06, ben wrote: > Dave, > > Got a > > ERROR: While executing gem ... (Zlib::BufError) > buffer error > > when trying to gem install 0.10.3 mswin32 on xp on ruby 1.8.5 > > The ruby option failed as well but I think that's expected. > > Sorry to critize - I know how much it sucks to work your ass of on stuff > and receive bug reports. > > -ben Hi Ben, I don't know what the problem is and I'm afraid I don't have time to fix it right now.
I rolled back to the latest stable release of the one-click installer (1.8.4-20 stable). Sorry about the trouble. Let me know if this fixes the earlier problem (with the rails view rendering). From dbalmain.ml at gmail.com Fri Sep 8 22:59:11 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 9 Sep 2006 11:59:11 +0900 Subject: [Ferret-talk] search_each segmentation fault and parser anomoly In-Reply-To: <9833bfe8c00b852f2cde71324e3f37fc@ruby-forum.com> References: <9833bfe8c00b852f2cde71324e3f37fc@ruby-forum.com> Message-ID: On 9/9/06, Stuart Rackham wrote: > The included test script turned up the following anomolies (run > against Ferret 0.10.3, but had same problems with 0.10.2): > > 1. When the content word is not in the index the inclusion of a > wildcard file term causes search_each to throw a segmentation > fault. > > $ ./test.rb zzz file:*.txt > query: +content:zzz +file:*.txt > ./test.rb:28: [BUG] Segmentation fault > ruby 1.8.4 (2005-12-24) [i486-linux] > > Aborted Thanks Stuart. This is fixed in subversion. I'll put another gem out ASAP. > 2. When the file query term is file:* wildcard the parser > translates it to +* instead of +file:* > > $ ./test.rb one file:* > query: +content:one +* > file: f1.txt > > Am I missing something here? "*" matches everything including empty strings. So basically it will match documents that don't even contain the :file field. I've therefore optimized it to a MatchAllQuery. Before doing this "*" was pretty much unusable in large indexes since it would create a massive MultiTermQuery with every term in the index (as long as you set the :max_clauses parameter of QueryParser large enough to accept them all). Now, if you do need to only match documents that contain the desired field you can do it like this; $ ./test.rb one file:?* Hope that makes sense. 
Dave From srackham at methods.co.nz Sat Sep 9 00:25:04 2006 From: srackham at methods.co.nz (Stuart Rackham) Date: Sat, 9 Sep 2006 06:25:04 +0200 Subject: [Ferret-talk] search_each segmentation fault and parser anomoly In-Reply-To: References: <9833bfe8c00b852f2cde71324e3f37fc@ruby-forum.com> Message-ID: David Balmain wrote: > Hope that makes sense. > > Dave Yes it does, thanks for the fast response. Stuart -- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Sat Sep 9 05:27:51 2006 From: jan.prill at gmail.com (Jan Prill) Date: Sat, 9 Sep 2006 11:27:51 +0200 Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released In-Reply-To: References: Message-ID: <562a35c10609090227x15e166f5k8de9d137fdccd835@mail.gmail.com> Hi, same Problem here on - Windows 2003 Server - ruby 1.8.4 (2005-12-24) [i386-mswin32] - ferret 0.10.3 (mswin32) ferret 0.10.4 has no mswin32 version - at least on the gem server I'm connected to right now... Cheers, Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060909/4362393a/attachment.html From kraemer at webit.de Sat Sep 9 06:00:10 2006 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 9 Sep 2006 12:00:10 +0200 Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released In-Reply-To: <562a35c10609090227x15e166f5k8de9d137fdccd835@mail.gmail.com> References: <562a35c10609090227x15e166f5k8de9d137fdccd835@mail.gmail.com> Message-ID: <20060909100010.GA14913@cordoba.webit.de> On Sat, Sep 09, 2006 at 11:27:51AM +0200, Jan Prill wrote: > Hi, > > same Problem here on > > - Windows 2003 Server > - ruby 1.8.4 (2005-12-24) [i386-mswin32] > - ferret 0.10.3 (mswin32) > > ferret 0.10.4 has no mswin32 version - at least on the gem server I'm > connected to right now... the 0.10.4 installs itself as ferret-0.1.4 on linux, I guess something's broken there. 0.10.3 works fine here (Linux). 
Jens

> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From dbalmain.ml at gmail.com Sat Sep 9 11:43:23 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sun, 10 Sep 2006 00:43:23 +0900
Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released
In-Reply-To: <20060909100010.GA14913@cordoba.webit.de>
References: <562a35c10609090227x15e166f5k8de9d137fdccd835@mail.gmail.com> <20060909100010.GA14913@cordoba.webit.de>
Message-ID:

On 9/9/06, Jens Kraemer wrote:
> On Sat, Sep 09, 2006 at 11:27:51AM +0200, Jan Prill wrote:
> > Hi,
> >
> > same Problem here on
> >
> > - Windows 2003 Server
> > - ruby 1.8.4 (2005-12-24) [i386-mswin32]
> > - ferret 0.10.3 (mswin32)
> >
> > ferret 0.10.4 has no mswin32 version - at least on the gem server I'm
> > connected to right now...
>
> the 0.10.4 installs itself as ferret-0.1.4 on linux, I guess something's
> broken there. 0.10.3 works fine here (Linux).
>
> Jens

Anyone else having this problem? It's fine here. I have no idea what
might be causing this.
Dave From ksruby at gmail.com Sat Sep 9 11:52:40 2006 From: ksruby at gmail.com (Kent Sibilev) Date: Sat, 9 Sep 2006 11:52:40 -0400 Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released In-Reply-To: References: <562a35c10609090227x15e166f5k8de9d137fdccd835@mail.gmail.com> <20060909100010.GA14913@cordoba.webit.de> Message-ID: <477eb2b30609090852o3c4da23fu3759d5ea1bb04a9d@mail.gmail.com> On 9/9/06, David Balmain wrote: > On 9/9/06, Jens Kraemer wrote: > > On Sat, Sep 09, 2006 at 11:27:51AM +0200, Jan Prill wrote: > > > Hi, > > > > > > same Problem here on > > > > > > - Windows 2003 Server > > > - ruby 1.8.4 (2005-12-24) [i386-mswin32] > > > - ferret 0.10.3 (mswin32) > > > > > > ferret 0.10.4 has no mswin32 version - at least on the gem server I'm > > > connected to right now... > > > > the 0.10.4 installs itself as ferret-0.1.4 on linux, I guess something's > > broken there. 0.10.3 works fine here (Linux). > > > > Jens > > Anyone else having this problem? It's fine here. I have no idea what > might be causing this. No problems here. -- Kent --- http://www.datanoise.com From kraemer at webit.de Sat Sep 9 11:56:55 2006 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 9 Sep 2006 17:56:55 +0200 Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released In-Reply-To: References: <562a35c10609090227x15e166f5k8de9d137fdccd835@mail.gmail.com> <20060909100010.GA14913@cordoba.webit.de> Message-ID: <20060909155655.GB22193@cordoba.webit.de> On Sun, Sep 10, 2006 at 12:43:23AM +0900, David Balmain wrote: [..] > > the 0.10.4 installs itself as ferret-0.1.4 on linux, I guess something's > > broken there. 0.10.3 works fine here (Linux). > > > > Jens > > Anyone else having this problem? It's fine here. I have no idea what > might be causing this. works now. but I swear it didn't when I first tried ;-) btw, there's an issue with the (now public) process_query method: It doesn't call ensure_reader_open, but it should. 
The following crashes things atm:

    require 'rubygems'
    require 'ferret'
    i = Ferret::I.new
    i << 'testing'
    i.process_query 'test*'

doing a search before the call to process_query, or invoking
ensure_reader_open via send() helps, but isn't so nice ;-) I think now
that it's public, the whole process_query has to get its own
@dir.synchronize block, too...

I've worked around this for now in aaf, and will push out the first
stable, 0.10.x compatible version this evening.

cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From dbalmain.ml at gmail.com Sat Sep 9 12:36:24 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sun, 10 Sep 2006 01:36:24 +0900
Subject: [Ferret-talk] [ANN] Ferret-0.10.3 released
In-Reply-To: <20060909155655.GB22193@cordoba.webit.de>
References: <562a35c10609090227x15e166f5k8de9d137fdccd835@mail.gmail.com> <20060909100010.GA14913@cordoba.webit.de> <20060909155655.GB22193@cordoba.webit.de>
Message-ID:

On 9/10/06, Jens Kraemer wrote:
> On Sun, Sep 10, 2006 at 12:43:23AM +0900, David Balmain wrote:
> [..]
> > > the 0.10.4 installs itself as ferret-0.1.4 on linux, I guess something's
> > > broken there. 0.10.3 works fine here (Linux).
> > >
> > > Jens
> >
> > Anyone else having this problem? It's fine here. I have no idea what
> > might be causing this.
>
> works now. but I swear it didn't when I first tried ;-)

Hey, I know the feeling. :-)

> btw, there's an issue with the (now public) process_query method: It
> doesn't call ensure_reader_open, but it should.
The following crashes > things atm: > > require 'rubygems' > require 'ferret' > i = Ferret::I.new > i << 'testing' > i.process_query 'test*' > > doing a search before the call to process_query, or invoking > ensure_reader_open via send() helps, but isn't so nice ;-) I think now > that it's public, the whole process_query has to get it's own > @dir.synchronize block, too... > > I've worked around this for now in aaf, and will push out the first > stable, 0.10.x compatible version this evening. > > > cheers, > Jens Right you are. The fix is in subversion. Thanks again Jens. From kraemer at webit.de Sat Sep 9 13:28:06 2006 From: kraemer at webit.de (Jens Kraemer) Date: Sat, 9 Sep 2006 19:28:06 +0200 Subject: [Ferret-talk] [ANN] acts_as_ferret 0.3.0 Message-ID: <20060909172806.GA21883@cordoba.webit.de> Hi, just wanted to officially announce the release of acts_as_Ferret 0.3.0. As you see, I'm trying to catch up with Ferret's version numbers ;-) svn://projects.jkraemer.net/acts_as_ferret/tags/0.3.0/ or svn://projects.jkraemer.net/acts_as_ferret/tags/stable/ This release is now tagged stable, so in case anybody has used the old stable release via an svn external, please switch over to the 0.2.3 tag at svn://projects.jkraemer.net/acts_as_ferret/tags/0.2.3/ Besides the switch to the shiny new 0.10 API, there are two new features: - Match highlighting through Ferret's highlight method, with even more convenience added ;-) highlight = record.highlight('test') will by default scan all stored fields of this record for highlight matches. However this can be turned off on the field level in case you don't want to get highlights out of a field you stored for other reasons by specifying :highlight => :no as an option to acts_as_ferret. This method takes the same options as the Ferret::Index::Index#highlight one. If you only want to get highlights from exactly one field, just specify it with the :field option. 
- It's now possible to turn off the automatic indexing for the next or
  all following calls to #save. You can even give a block that should be
  executed without indexing, and optionally have aaf index your record
  after the whole block is finished.
  Please see http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
  for usage notes.

- no new feature, but didn't work until now with 0.10.x - the
  more_like_this instance method now works as expected again.

have fun!
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From ksruby at gmail.com Sat Sep 9 16:58:09 2006
From: ksruby at gmail.com (Kent Sibilev)
Date: Sat, 9 Sep 2006 16:58:09 -0400
Subject: [Ferret-talk] Per field analyzer
Message-ID: <477eb2b30609091358v5fb10632x86b674d370c5f3f5@mail.gmail.com>

Is there a way to add per-field analyzer? I can't seem to find a way to do that.

Thanks

--
Kent
---
http://www.datanoise.com

From dbalmain.ml at gmail.com Sat Sep 9 22:08:53 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sun, 10 Sep 2006 11:08:53 +0900
Subject: [Ferret-talk] Per field analyzer
In-Reply-To: <477eb2b30609091358v5fb10632x86b674d370c5f3f5@mail.gmail.com>
References: <477eb2b30609091358v5fb10632x86b674d370c5f3f5@mail.gmail.com>
Message-ID:

On 9/10/06, Kent Sibilev wrote:
> Is there a way to add per-field analyzer? I can't seem to find a way to do that.
>
> Thanks
>
> --
> Kent

Hi Kent,

I'm not sure if you mean add *to* a PerFieldAnalyzer or add a
PerFieldAnalyzer to an Index.
Here is how you do both;

    include Ferret::Analysis
    pfa = PerFieldAnalyzer.new(StandardAnalyzer.new())
    pfa['white'] = WhiteSpaceAnalyzer.new(false)
    pfa['white_l'] = WhiteSpaceAnalyzer.new(true)
    pfa['letter'] = LetterAnalyzer.new(false)
    pfa.add_field('letter', LetterAnalyzer.new(true))
    pfa.add_field('letter_u', LetterAnalyzer.new(false))

    index = Ferret::Index::Index.new(:analyzer => pfa)

I hope that's what you were asking.

Dave

From ksruby at gmail.com Sat Sep 9 23:51:51 2006
From: ksruby at gmail.com (Kent Sibilev)
Date: Sat, 9 Sep 2006 23:51:51 -0400
Subject: [Ferret-talk] Per field analyzer
In-Reply-To:
References: <477eb2b30609091358v5fb10632x86b674d370c5f3f5@mail.gmail.com>
Message-ID: <477eb2b30609092051j2ff9202ct8ded51783449fa6@mail.gmail.com>

On 9/9/06, David Balmain wrote:
> On 9/10/06, Kent Sibilev wrote:
> > Is there a way to add per-field analyzer? I can't seem to find a way to do that.
> >
> > Thanks
> >
> > --
> > Kent
>
> Hi Kent,
>
> I'm not sure if you mean add *to* a PerFieldAnalyzer or add a
> PerFieldAnalyzer to an Index. Here is how you do both;
>
> include Ferret::Analysis
> pfa = PerFieldAnalyzer.new(StandardAnalyzer.new())
> pfa['white'] = WhiteSpaceAnalyzer.new(false)
> pfa['white_l'] = WhiteSpaceAnalyzer.new(true)
> pfa['letter'] = LetterAnalyzer.new(false)
> pfa.add_field('letter', LetterAnalyzer.new(true))
> pfa.add_field('letter_u', LetterAnalyzer.new(false))
>
> index = Ferret::Index::Index.new(:analyzer => pfa)
>
> I hope that's what you were asking.
>

Perfect. That's exactly what I've been looking for. Now I see it in
test cases, but rdoc lacks any information about analyzers.
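For readers without the rdoc at hand, the idea behind a per-field analyzer can be sketched in plain Ruby: one default analyzer plus a table of per-field overrides. ToyPerFieldAnalyzer and the two lambdas below are hypothetical stand-ins written for illustration; they are not Ferret's classes and make no claim about how Ferret tokenizes.

```ruby
# Toy per-field analyzer; illustrative only, not Ferret's classes.
class ToyPerFieldAnalyzer
  # +default+ is any object responding to #call(text) and returning tokens.
  def initialize(default)
    @default = default
    @by_field = {}
  end

  # Register an analyzer for one field (stored under a symbol key).
  def []=(field, analyzer)
    @by_field[field.to_sym] = analyzer
  end

  # Tokenize +text+ with the field's analyzer, or with the default one
  # when no override was registered for that field.
  def tokens(field, text)
    @by_field.fetch(field.to_sym, @default).call(text)
  end
end

# Two stand-in analyzers: one lowercases and keeps only word characters,
# the other splits on whitespace and leaves the text untouched.
lowercase_words = lambda { |text| text.downcase.scan(/[a-z0-9]+/) }
whitespace_only = lambda { |text| text.split }

pfa = ToyPerFieldAnalyzer.new(lowercase_words)
pfa[:white] = whitespace_only

p pfa.tokens(:title, 'Hello, World')   # handled by the default analyzer
p pfa.tokens(:white, 'Hello, World')   # handled by the :white override
```

The same lookup-with-fallback shape is what the pfa['white'] = ... assignments in the reply configure.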
-- Kent --- http://www.datanoise.com From dbalmain.ml at gmail.com Sun Sep 10 00:54:30 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 10 Sep 2006 13:54:30 +0900 Subject: [Ferret-talk] Per field analyzer In-Reply-To: <477eb2b30609092051j2ff9202ct8ded51783449fa6@mail.gmail.com> References: <477eb2b30609091358v5fb10632x86b674d370c5f3f5@mail.gmail.com> <477eb2b30609092051j2ff9202ct8ded51783449fa6@mail.gmail.com> Message-ID: On 9/10/06, Kent Sibilev wrote: > On 9/9/06, David Balmain wrote: > > On 9/10/06, Kent Sibilev wrote: > > > Is there a way to add per-field analyzer? I can't seem to find a way to do that. > > > > > > Thanks > > > > > > -- > > > Kent > > > > Hi Kent, > > > > I'm not sure if you mean add *to* a PerFieldAnalyzer or add a > > PerFieldAnalyzer to and Index. Here is how you do both; > > > > include Ferret::Analysis > > pfa = PerFieldAnalyzer.new(StandardAnalyzer.new()) > > pfa['white'] = WhiteSpaceAnalyzer.new(false) > > pfa['white_l'] = WhiteSpaceAnalyzer.new(true) > > pfa['letter'] = LetterAnalyzer.new(false) > > pfa.add_field('letter', LetterAnalyzer.new(true)) > > pfa.add_field('letter_u', LetterAnalyzer.new(false)) > > > > index = Ferret::Index::Index.new(:analyzer => pfa) > > > > I hope that's what you were asking. > > > > Perfect. That's exactly what I've been looking for. Now I see it in > test cases, but rdoc lacks any the information about analyzers. Whoops. I wrote my own frt_define_class_under to replace rb_define_class_under and rdoc was no longer picking up the documentation. It's fixed now and will be out in the next release. 
In the meantime you can see the full documentation here; http://ferret.davebalmain.com/api/ Cheers, Dave From srackham at methods.co.nz Sun Sep 10 02:18:32 2006 From: srackham at methods.co.nz (Stuart Rackham) Date: Sun, 10 Sep 2006 08:18:32 +0200 Subject: [Ferret-talk] [ANN] ff-1.1.0 Message-ID: <4d039cc317ad12ee9913a51ab9952d7a@ruby-forum.com> I've updated my ff document index/search utility so it now runs under Ferret 0.10.x. ff is a simple *NIX command-line utility that indexes and searches document files using Ferret, it can be found at http://www.methods.co.nz/ff/ Cheers, Stuart -- Stuart Rackham -- Posted via http://www.ruby-forum.com/. From Neville.Burnell at bmsoft.com.au Sun Sep 10 19:40:18 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Mon, 11 Sep 2006 09:40:18 +1000 Subject: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario Message-ID: <126EC586577FD611A28E00A0C9A03758B5BCBC@maui.bmsoft.com.au> > It's unusual, however for a search to take long > enough to be a problem though. What kind of search > is it? Actually I'm misleading you. The searches are very fast, ie, 0.1 sec or faster on my 30,000 doc index. By "slow query" I really mean my "#search_each do" which fetches each doc from the index and appends it to an xml or html response. This is clearly not a Ferret issue I think. Thanks for all your help Dave, Regards Neville From david.wennergren at gmail.com Mon Sep 11 03:48:58 2006 From: david.wennergren at gmail.com (David Wennergren) Date: Mon, 11 Sep 2006 09:48:58 +0200 Subject: [Ferret-talk] Using a datefilter with caching Message-ID: <6c2537955efa3cac93ac19ecd32b0718@ruby-forum.com> I'm trying to use a DateFilter to speed up some searches. The situation is that a have an index with 200 000+ documents and I want to run a few thousand alerts (basically stored searches) on only a small portion of the index (documents added the last hour). Is datefilter the best solution for the situation described above? 
I assumed that the datefilter would be cached in the same way as a QueryFilter. But as far as I can tell from my tests that isn't the case... I read an example in the "Lucene in action"-book that seemed to suggest to use a CachingWrapperFilter in combination with a DateFilter to get the benefits of caching. How is this done in Ferret? I'm using ferret 0.9.6 and acts_as_ferret Thanks a lot! /David Wennergren -- Posted via http://www.ruby-forum.com/. From asbradbury at tekcentral.org Mon Sep 11 05:33:40 2006 From: asbradbury at tekcentral.org (A. S. Bradbury) Date: Mon, 11 Sep 2006 10:33:40 +0100 Subject: [Ferret-talk] [ANN] ff-1.1.0 In-Reply-To: <4d039cc317ad12ee9913a51ab9952d7a@ruby-forum.com> References: <4d039cc317ad12ee9913a51ab9952d7a@ruby-forum.com> Message-ID: <200609111033.40688.asbradbury@tekcentral.org> On Sunday 10 September 2006 07:18, Stuart Rackham wrote: > I've updated my ff document index/search utility so it now runs under > Ferret > 0.10.x. > > ff is a simple *NIX command-line utility that indexes and > searches document files using Ferret, it can be found at > http://www.methods.co.nz/ff/ This looks very handy indeed, I'll have to check it out. Alex From fsolt at rift.fr Mon Sep 11 06:05:36 2006 From: fsolt at rift.fr (Florent Solt) Date: Mon, 11 Sep 2006 12:05:36 +0200 Subject: [Ferret-talk] Boolean query bug Message-ID: Hello all, At the end of this script : http://pastie.caboo.se/12711, I made two search. They have to return the same result, but the first return 2 matches and the second only one match. The first is clearly buggy. Do you have an idea ? Thanks. PS: I've tested with 0.10.4 -- Posted via http://www.ruby-forum.com/. 
From dbalmain.ml at gmail.com Mon Sep 11 07:19:58 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 11 Sep 2006 20:19:58 +0900 Subject: [Ferret-talk] Using a datefilter with caching In-Reply-To: <6c2537955efa3cac93ac19ecd32b0718@ruby-forum.com> References: <6c2537955efa3cac93ac19ecd32b0718@ruby-forum.com> Message-ID: On 9/11/06, David Wennergren wrote: > I'm trying to use a DateFilter to speed up some searches. The situation > is that a have an index with 200 000+ documents and I want to run a few > thousand alerts (basically stored searches) on only a small portion of > the index (documents added the last hour). > > Is datefilter the best solution for the situation described above? > > I assumed that the datefilter would be cached in the same way as a > QueryFilter. But as far as I can tell from my tests that isn't the > case... I read an example in the "Lucene in action"-book that seemed to > suggest to use a CachingWrapperFilter in combination with a DateFilter > to get the benefits of caching. How is this done in Ferret? > > I'm using ferret 0.9.6 and acts_as_ferret > > Thanks a lot! > > /David Wennergren Hi David, Firstly, use a RangeFilter instead of a DateFilter. And, yes, it should be the best solution for the situation described. See the documentation here; http://ferret.davebalmain.com/api/ It is very simple to use. As far as caching goes, all filters in Ferret get cached but the bitvector cache gets garbage collected when the filter is garbage collected. Would it be possible for you to show me what you were doing when the bitvector wasn't getting cached. I'm curious to see how you went about solving the problem. 
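The point about the cache living and dying with the filter object can be illustrated with a toy model in plain Ruby. ToyRangeFilter below is a hypothetical class, not Ferret's RangeFilter, and the reader is just an array of hashes; the sketch only shows why reusing one filter instance across many searches avoids recomputing the matching set.

```ruby
# Toy model of a filter whose cached result lives and dies with the
# filter object. Illustrative only; not Ferret's RangeFilter.
class ToyRangeFilter
  attr_reader :computations

  def initialize(field, lower)
    @field = field
    @lower = lower
    @cache = {}          # reader identity => ids of matching documents
    @computations = 0
  end

  # Return the ids of documents whose field value is >= lower,
  # computing the set only once per reader.
  def bits(reader)
    @cache[reader.object_id] ||= begin
      @computations += 1
      ids = []
      reader.each_with_index do |doc, id|
        ids << id if doc[@field] >= @lower
      end
      ids
    end
  end
end

# Timestamps stored as fixed-width strings compare correctly as text.
reader = [
  { :added => '200609110630' },
  { :added => '200609110915' },
  { :added => '200609110948' }
]

shared = ToyRangeFilter.new(:added, '200609110900')
3.times { shared.bits(reader) }           # one filter reused by every alert

fresh_total = (1..3).inject(0) do |sum, _|
  filter = ToyRangeFilter.new(:added, '200609110900')  # new filter each time
  filter.bits(reader)
  sum + filter.computations
end

puts "shared filter computed its set #{shared.computations} time(s)"
puts "fresh filters computed theirs #{fresh_total} time(s) in total"
```

Under this model, running a few thousand alerts against the same "last hour" range is cheapest when all of them are handed the same filter object.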
Cheers, Dave From dbalmain.ml at gmail.com Mon Sep 11 07:29:52 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 11 Sep 2006 20:29:52 +0900 Subject: [Ferret-talk] Boolean query bug In-Reply-To: References: Message-ID: On 9/11/06, Florent Solt wrote: > > Hello all, > > At the end of this script : http://pastie.caboo.se/12711, I made two > search. > They have to return the same result, but the first return 2 matches and > the > second only one match. The first is clearly buggy. > > Do you have an idea ? > > Thanks. > > PS: I've tested with 0.10.4 Thanks Florent. That definitely looks like a bug. I'll fix it now. Cheers, Dave PS: keep the pasties coming. It's so much easier to fix something when I don't have to mess around trying to duplicate the problem here. From dbalmain.ml at gmail.com Mon Sep 11 07:56:41 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 11 Sep 2006 20:56:41 +0900 Subject: [Ferret-talk] Boolean query bug In-Reply-To: References: Message-ID: On 9/11/06, David Balmain wrote: > On 9/11/06, Florent Solt wrote: > > > > Hello all, > > > > At the end of this script : http://pastie.caboo.se/12711, I made two > > search. > > They have to return the same result, but the first return 2 matches and > > the > > second only one match. The first is clearly buggy. > > > > Do you have an idea ? > > > > Thanks. > > > > PS: I've tested with 0.10.4 > > Thanks Florent. That definitely looks like a bug. I'll fix it now. > > Cheers, > Dave > > PS: keep the pasties coming. It's so much easier to fix something when > I don't have to mess around trying to duplicate the problem here. Found the problem. The fix will be out in the next release. Very nice catch indeed. 
Cheers,
Dave

From fsolt at rift.fr Mon Sep 11 08:00:43 2006
From: fsolt at rift.fr (Florent Solt)
Date: Mon, 11 Sep 2006 14:00:43 +0200
Subject: [Ferret-talk] Boolean query bug
In-Reply-To:
References:
Message-ID: <14a9c2d21f8303d5b0422edc59003d18@ruby-forum.com>

David Balmain wrote:
> On 9/11/06, David Balmain wrote:
>> > Do you have an idea ?
>> PS: keep the pasties coming. It's so much easier to fix something when
>> I don't have to mess around trying to duplicate the problem here.
>
> Found the problem. The fix will be out in the next release. Very nice
> catch indeed.
>
> Cheers,
> Dave

Thanks a lot Dave.

PS: Long life to pasties :)

--
Posted via http://www.ruby-forum.com/.

From mleung at projectrideme.com Mon Sep 11 10:15:36 2006
From: mleung at projectrideme.com (Michael Leung)
Date: Mon, 11 Sep 2006 16:15:36 +0200
Subject: [Ferret-talk] Compilation Errors
Message-ID:

Hey there,

I just updated to ferret 0.10.4, and acts_as_ferret 0.3, and now when I
try to run my app with mongrel on Windows, I get very strange compile
errors:

compile error
C:/rails/app/script/../config/../app/views/login/index.rhtml:8: Invalid char `\002' in expression
C:/rails/app/script/../config/../app/views/login/index.rhtml:9: syntax error
_erbout.concat " ".; _erbout.concat(( text_field("user", "email") ).to_s); _erbout.concat "\n"
^
C:/rails/app/script/../config/../app/views/login/index.rhtml:12: Invalid char `\003' in expression
C:/rails/app/script/../config/../app/views/login/index.rhtml:13: syntax error
_erbout.concat " ".; _erbout.concat(( password_field("user", "password") ).to_s); _erbout.concat "\n"
^
C:/rails/app/script/../config/../app/views/login/index.rhtml:17: Invalid char `\024' in expression
C:/rails/app/script/../config/../app/views/login/index.rhtml:18: syntax error
_erbout.concat " ".; _erbout.concat(( submit_tag "Login" ).to_s); _erbout.concat " \n"

It's strange, like somehow odd characters are being inserted into my
views or something.
Has anyone else come across this? This is true of two different apps, that both use acts_as_ferret. Any project that isn't using it runs fine. Thanks. :) -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon Sep 11 10:53:18 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 11 Sep 2006 23:53:18 +0900 Subject: [Ferret-talk] Compilation Errors In-Reply-To: References: Message-ID: On 9/11/06, Michael Leung wrote: > Hey there, > > I just updated to ferret 10.0.4, and acts_as_ferret 0.3, and now when I > try to run my app with mongrel on Windows, I get very strange compile > errors: > > ompile error > C:/rails/app/script/../config/../app/views/login/index.rhtml:8: Invalid > char `\002' in expression > C:/rails/app/script/../config/../app/views/login/index.rhtml:9: syntax > error > _erbout.concat " ".; _erbout.concat(( text_field("user", "email") > ).to_s); _erbout.concat "\n" > ^ > C:/rails/app/script/../config/../app/views/login/index.rhtml:12: Invalid > char `\003' in expression > C:/rails/app/script/../config/../app/views/login/index.rhtml:13: syntax > error > _erbout.concat " ".; _erbout.concat(( password_field("user", > "password") ).to_s); _erbout.concat "\n" > ^ > C:/rails/app/script/../config/../app/views/login/index.rhtml:17: Invalid > char `\024' in expression > C:/rails/app/script/../config/../app/views/login/index.rhtml:18: syntax > error > _erbout.concat " ".; _erbout.concat(( submit_tag "Login" ).to_s); > _erbout.concat " \n" > > It' strange, like somehow odd characters are being inserted into my > views or something. Has anyone else come across this? This is true of > two different apps, that both use acts_as_ferret. Any project that isn't > using it runs fine. > > Thanks. :) Hi Michael, Please see this thread; http://www.ruby-forum.com/topic/80460#new Apparently replacing all tab characters in your views with spaces helps. 
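If a project has many views, the tab replacement can be scripted. The following is a minimal sketch, assuming the views are .rhtml files and that two spaces per tab is acceptable; expand_tabs_in_files is a hypothetical helper, and the demonstration runs on a throwaway directory rather than a real application.

```ruby
require 'tmpdir'

# Replace every tab character in the given files with +width+ spaces.
# Returns the list of files that were actually rewritten.
def expand_tabs_in_files(paths, width = 2)
  changed = []
  paths.each do |path|
    original = File.read(path)
    next unless original.include?("\t")   # leave tab-free files untouched
    File.write(path, original.gsub("\t", ' ' * width))
    changed << path
  end
  changed
end

# Demonstration on a temporary directory, not on a real app.
Dir.mktmpdir do |dir|
  view = File.join(dir, 'index.rhtml')
  File.write(view, "<p>\n\t<%= text_field(\"user\", \"email\") %>\n</p>\n")
  changed = expand_tabs_in_files(Dir.glob(File.join(dir, '**', '*.rhtml')))
  puts "rewrote #{changed.size} file(s)"
end
```

In a Rails application the call would presumably be expand_tabs_in_files(Dir.glob('app/views/**/*.rhtml')), run from the application root after committing or backing up the views.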
Version 0.10.4 was compiled against one-click ruby 1.8.4-20 stable so make sure you are using that version of Ruby. Cheers, Dave From mleung at projectrideme.com Mon Sep 11 11:10:46 2006 From: mleung at projectrideme.com (Michael Leung) Date: Mon, 11 Sep 2006 17:10:46 +0200 Subject: [Ferret-talk] Compilation Errors In-Reply-To: References: Message-ID: <6d799ad3558d96bc85bc3453c464410b@ruby-forum.com> Hey David, Thanks for the reply. It is indeed the tab character causing the issue. My designer has liberally used tab characters throughout all my views. :( David Balmain wrote: > On 9/11/06, Michael Leung wrote: >> error >> C:/rails/app/script/../config/../app/views/login/index.rhtml:17: Invalid >> >> Thanks. :) > > Hi Michael, > > Please see this thread; > > http://www.ruby-forum.com/topic/80460#new > > Apparently replacing all tab characters in your views with spaces > helps. Version 0.10.4 was compiled against one-click ruby 1.8.4-20 > stable so make sure you are using that version of Ruby. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/. From contact at ezabel.com Mon Sep 11 15:29:18 2006 From: contact at ezabel.com (Ian Zabel) Date: Mon, 11 Sep 2006 21:29:18 +0200 Subject: [Ferret-talk] invalid characters with win32 In-Reply-To: References: Message-ID: The problem I was having has been fixed by replacing tabs with spaces. But, I did try updating to Ferret 0.10.4 and latest aaf trunk, and I was still experiencing the same errors before I removed all tabs. -- Posted via http://www.ruby-forum.com/. From contact at ezabel.com Mon Sep 11 16:11:22 2006 From: contact at ezabel.com (Ian Zabel) Date: Mon, 11 Sep 2006 22:11:22 +0200 Subject: [Ferret-talk] [ANN] acts_as_ferret 0.3.0 In-Reply-To: <20060909172806.GA21883@cordoba.webit.de> References: <20060909172806.GA21883@cordoba.webit.de> Message-ID: <84a33822dd30538c59128809dca5be1d@ruby-forum.com> Thanks for the update, Jens! How can I get more_like_this to work? Do I have to have my fields stored? 
I'm getting this error when I try to run it with fields stored or not: >> t = Topic.find(:first) >> t.more_like_this NoMethodError: You have a nil object when you didn't expect it! You might have expected an instance of Array. The error occured while evaluating nil.each from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/more_like_this.rb:110:in `retrieve_terms' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/more_like_this.rb:65:in `more_like_this' from c:/ruby/lib/ruby/1.8/monitor.rb:229:in `synchronize' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/more_like_this.rb:61:in `more_like_this' from (irb):4 >> This is aaf trunk and ferret 0.10.4 -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon Sep 11 18:05:11 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 12 Sep 2006 00:05:11 +0200 Subject: [Ferret-talk] [ANN] acts_as_ferret 0.3.0 In-Reply-To: <84a33822dd30538c59128809dca5be1d@ruby-forum.com> References: <20060909172806.GA21883@cordoba.webit.de> <84a33822dd30538c59128809dca5be1d@ruby-forum.com> Message-ID: <20060911220510.GA26821@cordoba.webit.de> On Mon, Sep 11, 2006 at 10:11:22PM +0200, Ian Zabel wrote: > Thanks for the update, Jens! > > How can I get more_like_this to work? Do I have to have my fields > stored? > > I'm getting this error when I try to run it with fields stored or not: > > >> t = Topic.find(:first) > >> t.more_like_this > NoMethodError: You have a nil object when you didn't expect it! > You might have expected an instance of Array. > The error occured while evaluating nil.each oh, you'll have to specify an array of the fields you want to use for the similarity search: t.more_like_this(:fields => [ :content, :title ]) The query that is constructed to find similar topics will then be built from relevant terms found in those fields of t. 
more_like_this doesn't need stored field contents; it can even work
without term vectors in the index. In this case it takes the contents
from the db and builds the term information on the fly. ferret's term
vectors will be much faster than that, of course. Stored term vectors
are the default with aaf, so no need to worry about that.

cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From tom at subblue.com Tue Sep 12 11:07:31 2006
From: tom at subblue.com (Tom Beddard)
Date: Tue, 12 Sep 2006 17:07:31 +0200
Subject: [Ferret-talk] Querying against numeric fields? e.g. price:( >= min_price)
Message-ID: <6fdc0ba6d6fd714df36db1796e7449b3@ruby-forum.com>

Using acts_as_ferret I'm trying to do a query like:

    active:(true) title|body:(#{params[:s]}) product_price:( >= #{params[:min]})

Where I want to return only the active products that contain the search
term in the title or body and have a minimum price >= params[:min]

I'm finding that even though I'm indexing the product price as an
integer (so no .00 to cause confusion) I'm getting results in the 50
value range as well as 500 if I set the min price as 500.

I presume ferret is doing the price as a string comparison, but is there
any way to make it do a numeric match?

Thanks

--
Posted via http://www.ruby-forum.com/.

From j_coppedge at yahoo.com Tue Sep 12 13:33:19 2006
From: j_coppedge at yahoo.com (J Coppedge)
Date: Tue, 12 Sep 2006 19:33:19 +0200
Subject: [Ferret-talk] ferret / acts_as_ferret multiple server deployment
Message-ID: <80767f27b7712e9a65f94ab6d4c09987@ruby-forum.com>

Has anyone deployed ferret & acts_as_ferret to a load balanced multiple
server environment? If so, did you simply use a shared network index?

I have a couple of ideas on how to deploy - but each has shortcomings
and I'm hoping to find out if anyone else has deployed ferret in this
manner.
The application is simply load balanced between multiple servers running
the same app for speed and redundancy, and things that are to be indexed
could be changed at the same time on each instance - and to make sure
the index is up to date we'll be using acts_as_ferret, but this seems to
cause a potential problem when coming from multiple servers to the same
index.

Any insight you could provide would be appreciated.

--
Posted via http://www.ruby-forum.com/.

From epugh at opensourceconnections.com Tue Sep 12 14:14:54 2006
From: epugh at opensourceconnections.com (Eric Pugh)
Date: Tue, 12 Sep 2006 14:14:54 -0400
Subject: [Ferret-talk] Ferret on Windows?
Message-ID:

I am trying to use SVN HEAD of acts_as_ferret, and I have ferret 0.10.4
installed from the Windows gem. Unfortunately, whenever I let
acts_as_ferret's init.rb file require ferret, my app blows up, with
weird "Invalid char" errors. I have pasted the page below. If I comment
out the require ferret, the page loads. Of course, acts_as_ferret does
blow up.

My unit tests do work fine however! I can query my model using the
ferret calls.

Any recommendations?

Eric

SyntaxError in Narratives#index

Showing *app/views/narratives/list.rhtml* where line *#2* raised:

compile error
./script/../config/../app/views/narratives/list.rhtml:2: syntax error
./script/../config/../app/views/narratives/list.rhtml:3: Invalid char `\200' in expression
./script/../config/../app/views/narratives/list.rhtml:4: syntax error

Extracted source (around line *#2*):

1:
2: 3: 4: 5:
Trace of template inclusion: /app/views/narratives/list.rhtml

RAILS_ROOT: ./script/../config/..

Application Trace | Framework Trace | Full Trace

#{RAILS_ROOT}/app/views/narratives/list.rhtml:4:in `compile_template'
#{RAILS_ROOT}/app/controllers/narratives_controller.rb:18:in `index'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:307:in `compile_and_render_template'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:292:in `render_template'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:251:in `render_file'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:726:in `render_file'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:648:in `render_with_no_layout'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:769:in `render_without_layout'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:718:in `render_action'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:670:in `render_with_no_layout'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/layout.rb:242:in `render_without_benchmark'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:53:in `render'
c:/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:53:in `render'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:941:in `perform_action_without_filters'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/filters.rb:368:in `perform_action_without_benchmark'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue'
c:/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure'
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/rescue.rb:82:in `perform_action' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:408:in `process_without_filters' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/filters.rb:377:in `process_without_session_management_support' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/session_management.rb:117:in `process' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/dispatcher.rb:38:in `dispatch' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/webrick_server.rb:115:in `handle_dispatch' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/webrick_server.rb:81:in `service' c:/ruby/lib/ruby/1.8/webrick/httpserver.rb:104:in `service' c:/ruby/lib/ruby/1.8/webrick/httpserver.rb:65:in `run' c:/ruby/lib/ruby/1.8/webrick/server.rb:173:in `start_thread' c:/ruby/lib/ruby/1.8/webrick/server.rb:162:in `start_thread' c:/ruby/lib/ruby/1.8/webrick/server.rb:95:in `start' c:/ruby/lib/ruby/1.8/webrick/server.rb:92:in `start' c:/ruby/lib/ruby/1.8/webrick/server.rb:23:in `start' c:/ruby/lib/ruby/1.8/webrick/server.rb:82:in `start' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/webrick_server.rb:67:in `dispatch' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/servers/webrick.rb:59 c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' c:/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/server.rb:30 c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' c:/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' script/server:3 #{RAILS_ROOT}/app/views/narratives/list.rhtml:4:in `compile_template' 
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:307:in `compile_and_render_template' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:292:in `render_template' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_view/base.rb:251:in `render_file' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:726:in `render_file' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:648:in `render_with_no_layout' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:769:in `render_without_layout' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:718:in `render_action' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:670:in `render_with_no_layout' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/layout.rb:242:in `render_without_benchmark' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:53:in `render' c:/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:53:in `render' #{RAILS_ROOT}/app/controllers/narratives_controller.rb:18:in `index' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:941:in `perform_action_without_filters' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/filters.rb:368:in `perform_action_without_benchmark' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' c:/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/rescue.rb:82:in `perform_action' 
c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/base.rb:408:in `process_without_filters' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/filters.rb:377:in `process_without_session_management_support' c:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/session_management.rb:117:in `process' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/dispatcher.rb:38:in `dispatch' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/webrick_server.rb:115:in `handle_dispatch' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/webrick_server.rb:81:in `service' c:/ruby/lib/ruby/1.8/webrick/httpserver.rb:104:in `service' c:/ruby/lib/ruby/1.8/webrick/httpserver.rb:65:in `run' c:/ruby/lib/ruby/1.8/webrick/server.rb:173:in `start_thread' c:/ruby/lib/ruby/1.8/webrick/server.rb:162:in `start_thread' c:/ruby/lib/ruby/1.8/webrick/server.rb:95:in `start' c:/ruby/lib/ruby/1.8/webrick/server.rb:92:in `start' c:/ruby/lib/ruby/1.8/webrick/server.rb:23:in `start' c:/ruby/lib/ruby/1.8/webrick/server.rb:82:in `start' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/webrick_server.rb:67:in `dispatch' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/servers/webrick.rb:59 c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' c:/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' c:/ruby/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/server.rb:30 c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' c:/ruby/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require' script/server:3 Request *Parameters*: None Show session dump <#> --- flash: !map:ActionController::Flash::FlashHash {} Response*Headers*: {"cookie"=>[], "Cache-Control"=>"no-cache"} -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060912/d7e468d7/attachment.html From kraemer at webit.de Tue Sep 12 14:22:04 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 12 Sep 2006 20:22:04 +0200 Subject: [Ferret-talk] Ferret on Windows? In-Reply-To: References: Message-ID: <20060912182204.GA2233@cordoba.webit.de> On Tue, Sep 12, 2006 at 02:14:54PM -0400, Eric Pugh wrote: > I am trying to use SVN HEAD of Acts_as_ferret, and I have ferret > 0.10.4installed from the windows Gem. > > Unfortuanantly, whenever I i let act_as_ferrets init.rb file require ferret, > my app blows up, with weird "Invalid char" errors. I have pasted the page > below. If I comment out the require ferret, the page loads. Of course, > act_as_ferret does blow up. removing all tabs from your *rhtml templates should fix this. really annoying, seems to have s.th. to do with windows+ruby-c-extensions (RMagick has the same problem afair) Jens > My unit tests do work fine however! I can query my model using the ferret > calls. > > Any recommendations? > > Eric [..] > > 1:
> 2: > 3: > 4: > 5:
omg that doesn't look like xhtml ;-) -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Sep 12 14:31:59 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 12 Sep 2006 20:31:59 +0200 Subject: [Ferret-talk] ferret / acts_as_ferret multiple server deployment In-Reply-To: <80767f27b7712e9a65f94ab6d4c09987@ruby-forum.com> References: <80767f27b7712e9a65f94ab6d4c09987@ruby-forum.com> Message-ID: <20060912183159.GB2233@cordoba.webit.de> On Tue, Sep 12, 2006 at 07:33:19PM +0200, J Coppedge wrote: > Has anyone deployed ferret & act_as_ferret to a load balanced multiple > server environment? If so, did you simply use a shared network index? I'm unsure if an index on a shared network drive would work. > I have a couple of ideas on how to deploy - but each have shortcomings > and I'm hoping to find out if anyone else has deployed ferret in this > manner. The application is simply load balanced between multiple > servers running the same app for speed and redundancy, and things that > are to be indexed could be changed at the same time on each instance - > and to make sure the index is up to date we'll be using acts_as_ferret, > but this seems to cause a potential problem when coming from multiple > servers to the same index. Any insight you could provide would > appreciated? interesting problem, that had to come up sooner or later :-) In case the 'index on a network drive' doesn't work out (file locking is one thing that could go wrong), I'd go for a central index server handling all the searching and indexing. This won't work with acts_as_ferret, though. If searching speed is an issue and accuracy of results is not, you could replicate the index to your app servers once in a while and search there. 
I feel it's time for acts_as_remote_ferret ;-) something like aaf, but connecting to a remote index server whenever a record is saved. Or implemented as an option to aaf, which then would be working on local indexes in development and test environments, and against a remote index server in production mode. sounds really interesting... what other deployment scenarios did you think of ? Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From dbalmain.ml at gmail.com Tue Sep 12 15:11:03 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 13 Sep 2006 04:11:03 +0900 Subject: [Ferret-talk] Querying against numeric fields? e.g. price:( >= min_price) In-Reply-To: <6fdc0ba6d6fd714df36db1796e7449b3@ruby-forum.com> References: <6fdc0ba6d6fd714df36db1796e7449b3@ruby-forum.com> Message-ID: On 9/13/06, Tom Beddard wrote: > Using acts_as_ferret I'm trying to do a query like: > > active:(true) title|body:(#{params[:s]}) product_price:( >= > #{params[:min]}) > > > Where I want to return only the active products that contain the search > term in the title or body and has a minimum price >= params[:min] > > I'm finding that even though I'm indexing the product price as an > integer (so no .00 to cause confusion) I'm getting results in the 50 > value range as well as 500 if I set the min price as 500. I presume > ferret is doing the price as a string comparison, but is there any way > to make it do a numeric match? > > Thanks Hi Tom, You need to pad all numbers to a fixed width when adding them to the index as well as when querying the index. Usually you'd write the code to do this yourself. I've recently come up with another way to do this. 
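For the write-it-yourself route, a minimal sketch in plain Ruby could look like the following. The name pad_int and the width of 5 are made up for illustration and are not part of Ferret or acts_as_ferret; the sketch only shows why fixed-width padding makes string comparison agree with numeric order, and it assumes non-negative integers that fit in the chosen width.

```ruby
# Hypothetical helper (not a Ferret API): zero-pad a non-negative integer
# so that lexicographic string order matches numeric order.
def pad_int(n, width = 5)
  format("%0#{width}d", n.to_i)
end

prices = [55, 500, 1000]

# Unpadded strings compare character by character, so "55" ranks above
# "500" and "1000" ranks below it.
unpadded = prices.map { |price| price.to_s }.select { |s| s >= "500" }

# Padded strings compare the same way the numbers do.
padded = prices.map { |price| pad_int(price) }.select { |s| s >= pad_int(500) }

p unpadded  # => ["55", "500"]
p padded    # => ["00500", "01000"]
```

The same padding has to be applied both to the value stored in the index and to the bounds used in the query, otherwise the comparison falls back to the behaviour in the unpadded case. The analyzer-based alternative is the code that follows.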

require 'ferret'

module Ferret::Analysis
  class IntegerTokenizer
    def initialize(num, width)
      @num = num.to_i
      @width = width
      @done = false
    end

    def next
      if @done
        return nil
      else
        @done = true
        puts Token.new("%0#{@width}d" % @num, 0, @width)
        return Token.new("%0#{@width}d" % @num, 0, @width)
      end
    end

    def text=(text)
      @num = text.to_i
      @done = false
    end
  end

  class IntegerAnalyzer
    def initialize(width)
      @width = width
    end

    def token_stream(field, input)
      return IntegerTokenizer.new(input, @width)
    end
  end
end

include Ferret::Analysis

analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new)
analyzer[:num] = IntegerAnalyzer.new(5)
index = Ferret::Index::Index.new(:analyzer => analyzer)

docs = [
  {:num => 1, :data => "yes"},
  {:num => 1, :data => "no"},
  {:num => 10, :data => "yes"},
  {:num => 10, :data => "no"},
  {:num => 100, :data => "yes"},
  {:num => 100, :data => "no"},
  {:num => 1000, :data => "yes"},
  {:num => 1000, :data => "no"}
]

docs.each { |d| index << d }

puts index.process_query('data:yes AND num:[10 100]')
puts index.search('data:yes AND num:[10 100]')

This will only work with the working copy of Ferret from the subversion repository. I'm still not convinced that this is the best way to do it. Cheers, Dave

From kraemer at webit.de Tue Sep 12 15:17:21 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 12 Sep 2006 21:17:21 +0200 Subject: [Ferret-talk] options hash ignored when searching multiple readers Message-ID: <20060912191721.GC2233@cordoba.webit.de> Hi, I'm working on an aaf bug report that led me to what I think is a bug in Ferret itself. The snippet at http://pastie.caboo.se/12950 shows the problem, the last two lines should imho only return one result, because of :offset => 1 or :limit => 1, but both return all (that is, 2) results (Ferret 0.10.4). Cheers, Jens -- webit!
Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Tue Sep 12 15:33:03 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 12 Sep 2006 21:33:03 +0200 Subject: [Ferret-talk] options hash ignored when searching multiple readers In-Reply-To: <20060912191721.GC2233@cordoba.webit.de> References: <20060912191721.GC2233@cordoba.webit.de> Message-ID: <20060912193303.GD2233@cordoba.webit.de> just ignore me, of course total_hits gives the total number of hits. I guess that's why it's named that way :-) Everything works as expected ... Jens On Tue, Sep 12, 2006 at 09:17:21PM +0200, Jens Kraemer wrote: > Hi, > > I'm working on an aaf bug report that led me to what I think is a bug > in Ferret itself. The snippet at > > http://pastie.caboo.se/12950 > > shows the problem, the last two lines should imho only return one > result, because of :offset => 1 or :limit => 1, but both return all > (that is, 2) results (Ferret 0.10.4). > > > Cheers, > Jens > > -- > webit! Gesellschaft f?r neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de > Schnorrstra?e 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! 
Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From j_coppedge at yahoo.com Tue Sep 12 16:57:32 2006 From: j_coppedge at yahoo.com (J Coppedge) Date: Tue, 12 Sep 2006 22:57:32 +0200 Subject: [Ferret-talk] ferret / acts_as_ferret multiple server deployment In-Reply-To: <20060912183159.GB2233@cordoba.webit.de> References: <80767f27b7712e9a65f94ab6d4c09987@ruby-forum.com> <20060912183159.GB2233@cordoba.webit.de> Message-ID: <6e9c57140e75348102f2a5bcaf37a2ce@ruby-forum.com> I believe you touched on each one... 1. Shared network index. 2. Sync of centralized index to individual index on each "slave" server. 3. Centralizing the searching / indexing to a separate search server - however it's possible that you would also need to load balance service at some point... > sounds really interesting... > what other deployment scenarios did you think of ? > -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue Sep 12 17:16:59 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 12 Sep 2006 23:16:59 +0200 Subject: [Ferret-talk] ferret / acts_as_ferret multiple server deployment In-Reply-To: <6e9c57140e75348102f2a5bcaf37a2ce@ruby-forum.com> References: <80767f27b7712e9a65f94ab6d4c09987@ruby-forum.com> <20060912183159.GB2233@cordoba.webit.de> <6e9c57140e75348102f2a5bcaf37a2ce@ruby-forum.com> Message-ID: <20060912211659.GA29768@cordoba.webit.de> On Tue, Sep 12, 2006 at 10:57:32PM +0200, J Coppedge wrote: > I believe you touched on each one... > > 1. Shared network index. > > 2. Sync of centralized index to individual index on each "slave" server. > > 3. Centralizing the searching / indexing to a separate search server - > however it's possible that you would also need to load balance service > at some point... 
load balancing the indexing to several servers can only be done via segmenting the data across those servers, and merging it when searching. This seems possible but is not implemented in Ferret (yet?) Java-Lucene has some kind of RMI stuff for searching multiple remote indexes afair. Even with 2 servers accessing the same physical index on a shared network drive you would see no indexing speed increase, since only one process may write-access the index at a time. searching speed would increase, of course. I don't know what amounts of traffic you expect, but I'd go with the simplest solution (besides the shared disk, where I'm somewhat unsure if it is possible) as long as possible: one centralized server handling all searching/indexing. fail safety could be reached with a replication of the index to another box, that steps in when needed. cheers, Jens > > > sounds really interesting... > > what other deployment scenarios did you think of ? > > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From david.wennergren at gmail.com Wed Sep 13 05:09:15 2006 From: david.wennergren at gmail.com (David Wennergren) Date: Wed, 13 Sep 2006 11:09:15 +0200 Subject: [Ferret-talk] Problem with index file permissions Message-ID: <4c43f739fb6b4cfed20430a75995c8cd@ruby-forum.com> I have a problem with file permissions and Ferret. In our production enivroment the webserver runs as one user and the console (and cron jobs) as another one. When Ferret creates a segment or file in the index directory it's created with only read/write-permissions for the owner, which means that the other user can't access the file. 
How can I affect the permissions on files created by Ferret? Thanks/David Wennergren -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 13 10:07:43 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 13 Sep 2006 23:07:43 +0900 Subject: [Ferret-talk] ferret / acts_as_ferret multiple server deployment In-Reply-To: <20060912211659.GA29768@cordoba.webit.de> References: <80767f27b7712e9a65f94ab6d4c09987@ruby-forum.com> <20060912183159.GB2233@cordoba.webit.de> <6e9c57140e75348102f2a5bcaf37a2ce@ruby-forum.com> <20060912211659.GA29768@cordoba.webit.de> Message-ID: On 9/13/06, Jens Kraemer wrote: > On Tue, Sep 12, 2006 at 10:57:32PM +0200, J Coppedge wrote: > > I believe you touched on each one... > > > > 1. Shared network index. > > > > 2. Sync of centralized index to individual index on each "slave" server. > > > > 3. Centralizing the searching / indexing to a separate search server - > > however it's possible that you would also need to load balance service > > at some point... > > load balancing the indexing to several servers can only be done via > segmenting the data across those servers, and merging it when searching. > This seems possible but is not implemented in Ferret (yet?) The start of this is there (i.e. the MultiSearcher). I just need to implement RemoteSearcher. Don't expect it any time soon however as I'm a little burnt out at the moment. I'm just going to be cleaning up what is currently already built for the time being. Cheers, Dave From dbalmain.ml at gmail.com Wed Sep 13 10:12:41 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 13 Sep 2006 23:12:41 +0900 Subject: [Ferret-talk] German Phrase Message-ID: Hi German users, Can one of you guys give me a German Phrase that I can use to demonstrate tokenizing non-ASCII text? Preferably something about 40 bytes long with lots of umlauts and perhaps a ß.
Cheers, Dave From kraemer at webit.de Wed Sep 13 10:28:45 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 13 Sep 2006 16:28:45 +0200 Subject: [Ferret-talk] German Phrase In-Reply-To: References: Message-ID: <20060913142844.GW23939@cordoba.webit.de> On Wed, Sep 13, 2006 at 11:12:41PM +0900, David Balmain wrote: > Hi German users, > > Can one of you guys give me a German Phrase that I can use to > demonstrate tokenizing non-ascii text. Preferably something about 40 > bytes long with lots of umlauts and perhaps a ß. Zwölf Boxkämpfer jagen Viktor quer über den großen Sylter Deich. or „Fix, Schwyz!“ quäkt Jürgen blöd vom Paß. found on http://de.wikipedia.org/wiki/Pangramm Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From dbalmain.ml at gmail.com Wed Sep 13 10:57:22 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 13 Sep 2006 23:57:22 +0900 Subject: [Ferret-talk] German Phrase In-Reply-To: <20060913142844.GW23939@cordoba.webit.de> References: <20060913142844.GW23939@cordoba.webit.de> Message-ID: On 9/13/06, Jens Kraemer wrote: > On Wed, Sep 13, 2006 at 11:12:41PM +0900, David Balmain wrote: > > Hi German users, > > > > Can one of you guys give me a German Phrase that I can use to > > demonstrate tokenizing non-ascii text. Preferably something about 40 > > bytes long with lots of umlauts and perhaps a ß. > > Zwölf Boxkämpfer jagen Viktor quer über den großen Sylter Deich. > > or > > „Fix, Schwyz!" quäkt Jürgen blöd vom Paß. Thanks Jens. This one is perfect although I can't seem to make sense of it. The translation I could get was; "Fix Schwyz!", croaked Jürgen blood from the pass. Does it make sense in German? Not that it really matters. I'm going to go ahead and use it anyway.
Cheers, Dave From kraemer at webit.de Wed Sep 13 11:06:29 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 13 Sep 2006 17:06:29 +0200 Subject: [Ferret-talk] German Phrase In-Reply-To: References: <20060913142844.GW23939@cordoba.webit.de> Message-ID: <20060913150629.GY23939@cordoba.webit.de> On Wed, Sep 13, 2006 at 11:57:22PM +0900, David Balmain wrote: > On 9/13/06, Jens Kraemer wrote: > > On Wed, Sep 13, 2006 at 11:12:41PM +0900, David Balmain wrote: > > > Hi German users, > > > > > > Can one of you guys give me a German Phrase that I can use to > > > demonstrate tokenizing non-ascii text. Preferably something about 40 > > > bytes long with lots of umlauts and perhaps a ß. > > > > Zwölf Boxkämpfer jagen Viktor quer über den großen Sylter Deich. > > > > or > > > > „Fix, Schwyz!" quäkt Jürgen blöd vom Paß. > > Thanks Jens. This one is perfect although I can't seem to make sense > of it. The translation I could get was; > > "Fix Schwyz!", croaked Jürgen blood from the pass. I'd translate it with "Go Suisse!", Jürgen {oafishly|zanily} croaks from the pass. ^^^^^^^^^^^^^^^ pick one ;-) Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From jan.prill at gmail.com Wed Sep 13 11:18:04 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 13 Sep 2006 17:18:04 +0200 Subject: [Ferret-talk] German Phrase In-Reply-To: References: <20060913142844.GW23939@cordoba.webit.de> Message-ID: <562a35c10609130818i312e2014mb8dc806316ad9cbb@mail.gmail.com> Hi Dave, these seem to be pangrams, which means they use each character of the German alphabet. Heard of this myself for the first time. Naturally they don't make too much sense but all of the words used are correct. The first one means something like: Twelve boxers are hunting Viktor across the big dike of the island of Sylt.
The second one goes something like: "Come on, you Swiss!", croaked Jürgen chuckleheaded up from the pass. Cheers, Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060913/b14ce62e/attachment.html From dbalmain.ml at gmail.com Wed Sep 13 11:39:37 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 14 Sep 2006 00:39:37 +0900 Subject: [Ferret-talk] German Phrase In-Reply-To: <562a35c10609130818i312e2014mb8dc806316ad9cbb@mail.gmail.com> References: <20060913142844.GW23939@cordoba.webit.de> <562a35c10609130818i312e2014mb8dc806316ad9cbb@mail.gmail.com> Message-ID: On 9/14/06, Jan Prill wrote: > Hi Dave, > > these seem to be pangrams, which means they are using each character of the > German abc. Heard of this myself for the first time. Naturally they don't > make too much sense but all of the used words are correct. > > The first one means something like: > Twelve boxers are hunting Viktor across the big bank of the island Sylt. > > The second one goes around: > "Come on, you Swiss!", croaked Jürgen chuckleheaded up from the pass. > > Cheers, > Jan Thanks guys. I guess it makes just as much sense as the quick brown fox jumping over a lazy dog (English pangram). The best perfect pangram I could find in English is "fix Mr. Gluck's hazy TV, PDQ!". Thanks again, Dave From marvin at rectangular.com Wed Sep 13 11:56:07 2006 From: marvin at rectangular.com (Marvin Humphrey) Date: Wed, 13 Sep 2006 08:56:07 -0700 Subject: [Ferret-talk] German Phrase In-Reply-To: References: <20060913142844.GW23939@cordoba.webit.de> <562a35c10609130818i312e2014mb8dc806316ad9cbb@mail.gmail.com> Message-ID: <48515A92-E0FF-4B68-AF88-A6598A559D75@rectangular.com> On Sep 13, 2006, at 8:39 AM, David Balmain wrote: > The best perfect > pangram I could find in English is "fix Mr. Gluck's hazy TV, PDQ!". /me .oO( Where's the 'e'? ) /me .oO( aha! ) New job: fix Mr. Gluck's hazy TV, PDQ!
We now return you to your regularly scheduled programming... Marvin Humphrey Rectangular Research http://www.rectangular.com/ From dbalmain.ml at gmail.com Wed Sep 13 12:04:03 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 14 Sep 2006 01:04:03 +0900 Subject: [Ferret-talk] German Phrase In-Reply-To: <48515A92-E0FF-4B68-AF88-A6598A559D75@rectangular.com> References: <20060913142844.GW23939@cordoba.webit.de> <562a35c10609130818i312e2014mb8dc806316ad9cbb@mail.gmail.com> <48515A92-E0FF-4B68-AF88-A6598A559D75@rectangular.com> Message-ID: On 9/14/06, Marvin Humphrey wrote: > > On Sep 13, 2006, at 8:39 AM, David Balmain wrote: > > The best perfect > > pangram I could find in English is "fix Mr. Gluck's hazy TV, PDQ!". > > /me .oO( Where's the 'e'? ) > > > > /me .oO( aha! ) > > New job: fix Mr. Gluck's hazy TV, PDQ! > > We now return you to your regularly scheduled programming... Whoops, thanks for the correction. I should have counted the letters. :-) From jeff.cabaniss at gmail.com Thu Sep 14 00:08:37 2006 From: jeff.cabaniss at gmail.com (Jeff Cabaniss) Date: Thu, 14 Sep 2006 06:08:37 +0200 Subject: [Ferret-talk] acts_as_ferret with has_many :through relationships? Message-ID: <2bb69ec4884dded09ea27d646c1c132e@ruby-forum.com> I am currently using acts_as_ferret to search on a Posts table (title and content fields). These posts also have tags (with has_many :tags, :through => :questions_tags). I can't figure out how to get acts_as_ferret to work with that relationship and allow searching of title, content, And tags at the same time (with the ability to set boost as well). Can this be done? -- Posted via http://www.ruby-forum.com/. From Neville.Burnell at bmsoft.com.au Thu Sep 14 02:19:30 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 14 Sep 2006 16:19:30 +1000 Subject: [Ferret-talk] Possiible Bug ? 
indexWriter#doc_count counts deleted docs after #commit Message-ID: <126EC586577FD611A28E00A0C9A03758B5BD20@maui.bmsoft.com.au> I'm playing with "updating" docs in my index, and I think I've found a bug with IndexWriter counting deleted docs. Script and output follow:

=====
require 'rubygems'
require 'ferret'

p Ferret::VERSION

@doc = {:id => '44', :name => 'fred', :email => 'abc at ozemail.com.au'}

@dir = Ferret::Store::RAMDirectory.new

def add_then_delete_fred
  @writer = Ferret::Index::IndexWriter.new(:dir => @dir)

  p "adding doc :id=#{@doc[:id]}"
  @writer << @doc
  p "doc_count=#{@writer.doc_count}"

  p "deleting doc :id=#{@doc[:id]}"
  @writer.delete(:id, @doc[:id])
  p "doc_count=#{@writer.doc_count}"

  @writer.commit
  @writer.close
  @writer = nil
end

add_then_delete_fred
add_then_delete_fred
add_then_delete_fred

@reader = Ferret::Index::IndexReader.new(@dir)
p "reader count=#{@reader.num_docs}"

@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
p "writer count=#{@writer.doc_count}"
===

$>ruby test_delete.rb
"0.10.4"
"adding doc :id=44"
"doc_count=1"
"deleting doc :id=44"
"doc_count=1"
"adding doc :id=44"
"doc_count=2"
"deleting doc :id=44"
"doc_count=2"
"adding doc :id=44"
"doc_count=3"
"deleting doc :id=44"
"doc_count=3"
"reader count=0"
"writer count=3"

From dbalmain.ml at gmail.com Thu Sep 14 02:55:02 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 14 Sep 2006 15:55:02 +0900 Subject: [Ferret-talk] Possiible Bug ? indexWriter#doc_count counts deleted docs after #commit In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BD20@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BD20@maui.bmsoft.com.au> Message-ID: On 9/14/06, Neville Burnell wrote: > I'm playing with "updating" docs in my index, and I think I've found bug > with IndexWriter counting deleted docs.
Script and output follow:
>
> =====
> require 'rubygems'
> require 'ferret'
>
> p Ferret::VERSION
>
> @doc = {:id => '44', :name => 'fred', :email => 'abc at ozemail.com.au'}
>
> @dir = Ferret::Store::RAMDirectory.new
>
> def add_then_delete_fred
>   @writer = Ferret::Index::IndexWriter.new(:dir => @dir)
>
>   p "adding doc :id=#{@doc[:id]}"
>   @writer << @doc
>   p "doc_count=#{@writer.doc_count}"
>
>   p "deleting doc :id=#{@doc[:id]}"
>   @writer.delete(:id, @doc[:id])
>   p "doc_count=#{@writer.doc_count}"
>
>   @writer.commit
>   @writer.close
>   @writer = nil
> end
>
> add_then_delete_fred
> add_then_delete_fred
> add_then_delete_fred
>
> @reader = Ferret::Index::IndexReader.new(@dir)
> p "reader count=#{@reader.num_docs}"
>
> @writer = Ferret::Index::IndexWriter.new(:dir => @dir)
> p "writer count=#{@writer.doc_count}"
>
> ===
>
> $>ruby test_delete.rb
> "0.10.4"
> "adding doc :id=44"
> "doc_count=1"
> "deleting doc :id=44"
> "doc_count=1"
> "adding doc :id=44"
> "doc_count=2"
> "deleting doc :id=44"
> "doc_count=2"
> "adding doc :id=44"
> "doc_count=3"
> "deleting doc :id=44"
> "doc_count=3"
> "reader count=0"
> "writer count=3"

Hi Neville, Unfortunately this is the way it has to work. Deleted documents don't get deleted until commit is called so there is no way to reliably tell how many undeleted documents exist in the index from the IndexWriter. It's a performance thing. I should change IndexWriter#doc_count to IndexWriter#max_doc to be consistent with IndexReader. Cheers, Dave

From Neville.Burnell at bmsoft.com.au Thu Sep 14 03:19:38 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 14 Sep 2006 17:19:38 +1000 Subject: [Ferret-talk] Possiible Bug ? indexWriter#doc_count countsdeleted docs after #commit Message-ID: <126EC586577FD611A28E00A0C9A03758B5BD25@maui.bmsoft.com.au> Hi David, > Deleted documents don't get deleted until commit is called Ok, but FYI, my experiments show that #commit doesn't affect #doc_count, even across ruby sessions.
On a different note, I'd like to request a variation of #add_document which returns the doc_id of the document added, as opposed to self. I'm trying to track down an issue with a large test index [600MB, 500k docs] in which I need to update a document. The old document is deleted then added again, but doesn't show up in my searches. A #doc_count on the writer before and after #add_document shows that the index is 1 document larger, but I still can't #search for the updated doc. What do you think about having #add_document "yield" the doc_id if block_given? Neville From dbalmain.ml at gmail.com Thu Sep 14 03:34:08 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 14 Sep 2006 16:34:08 +0900 Subject: [Ferret-talk] Possiible Bug ? indexWriter#doc_count countsdeleted docs after #commit In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BD25@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BD25@maui.bmsoft.com.au> Message-ID: On 9/14/06, Neville Burnell wrote: > Hi David, > > > Deleted documents don't get deleted until commit is called > > Ok, but FYI, my experiments show that #commit doesn't affect #doc_count, > even across ruby sessions. Sorry, I guess I wasn't very clear on that point. The deletes don't get committed until commit is called, which is why I don't have a num_docs method in IndexWriter: there is no way to reliably tell until commit is called. IndexWriter#doc_count is like IndexReader#max_doc. It tells you how many documents there are in the index, deleted or not. > On a different note, I'd like to request a variation of #add_document > which returns the doc_id of the document added, as opposed to self. > > I'm trying to track down an issue with a large test index [600MB, 500k > docs] in which I need to update a document. The old document is deleted > then added again, but doesn't show up in my searches.
>
> A #doc_count on the writer before and after #add_document shows that the
> index is 1 document larger, but I still can't #search for the updated
> doc.
>
> What do you think about having #add_document "yield" the doc_id if
> block_given?
>
> Neville

How about just using the doc_count method? Call it after you add the
document and subtract one and you'll have the document ID of the last
document added. Don't call it before you add the document as a merge
might happen when you add the document, possibly changing all document
IDs when deletes are completely removed.

Cheers,
Dave

From dbalmain.ml at gmail.com Thu Sep 14 03:37:37 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 14 Sep 2006 16:37:37 +0900
Subject: [Ferret-talk] Possiible Bug ? indexWriter#doc_count countsdeleted docs after #commit
In-Reply-To:
References: <126EC586577FD611A28E00A0C9A03758B5BD25@maui.bmsoft.com.au>
Message-ID:

I should also mention the reason I wouldn't want to return the document
ID from any IndexWriter method is that the document ID could become
invalid when the next document is added (if a segment merge is triggered
and deletes exist). At least when using an IndexReader, the document ID
is valid for the life of the reader.

From kraemer at webit.de Thu Sep 14 03:54:41 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 14 Sep 2006 09:54:41 +0200
Subject: [Ferret-talk] acts_as_ferret with has_many :through relationships?
In-Reply-To: <2bb69ec4884dded09ea27d646c1c132e@ruby-forum.com>
References: <2bb69ec4884dded09ea27d646c1c132e@ruby-forum.com>
Message-ID: <20060914075441.GA5846@cordoba.webit.de>

On Thu, Sep 14, 2006 at 06:08:37AM +0200, Jeff Cabaniss wrote:
> I am currently using acts_as_ferret to search on a Posts table (title
> and content fields). These posts also have tags (with has_many :tags,
> :through => :questions_tags). I can't figure out how to get
> acts_as_ferret to work with that relationship and allow searching of
> title, content, And tags at the same time (with the ability to set
> boost as well). Can this be done?

yeah, define a method that returns a string containing all tags, and
index that:

acts_as_ferret :fields => {
  :title      => { :boost => 2 },
  :content    => {},
  :tag_string => {}
}

def tag_string
  tags.collect { |t| t.name }.join(' ')
end

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From ryansking at gmail.com Thu Sep 14 13:08:20 2006
From: ryansking at gmail.com (Ryan King)
Date: Thu, 14 Sep 2006 10:08:20 -0700
Subject: [Ferret-talk] Problem with index file permissions
In-Reply-To: <4c43f739fb6b4cfed20430a75995c8cd@ruby-forum.com>
References: <4c43f739fb6b4cfed20430a75995c8cd@ruby-forum.com>
Message-ID: <846f30c70609141008t235c76abs53a86fa86dfcf519@mail.gmail.com>

Use su? or sudo? Or maybe you should create a group, add the webserver
and yourself to it.

-ryan

On 9/13/06, David Wennergren wrote:
> I have a problem with file permissions and Ferret. In our production
> environment the webserver runs as one user and the console (and cron
> jobs) as another one. When Ferret creates a segment or file in the index
> directory it's created with only read/write-permissions for the owner,
> which means that the other user can't access the file.
>
> How can I affect the permission on files created by Ferret?
>
> Thanks/David Wennergren
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

From Neville.Burnell at bmsoft.com.au Fri Sep 15 01:07:51 2006
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Fri, 15 Sep 2006 15:07:51 +1000
Subject: [Ferret-talk] Trouble with "updating" a document
Message-ID: <126EC586577FD611A28E00A0C9A03758B5BD2F@maui.bmsoft.com.au>

Hi,

I seem to be having trouble updating a doc, ie, deleting then re-adding
to the index.

The following script demonstrates my issue - I'm sure I'm missing
something obvious, but I can't seem to find the problem.

Can someone point out where I am going wrong please ?

Regards

Neville

===
require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@index = Ferret::Index::Index.new(:dir => @dir)

(1..1000).each do |n|
  @index << {:id => "doc#{n}", :name => "name #{n}"}
end

@doc_999 = @index["doc999"]
@doc_999.load if @doc_999
p "doc_999 not found" unless @doc_999
p "doc_999 name=#{@doc_999[:name]}" if @doc_999

@new_doc = {:id => "doc999", :name => "fred"}
p "deleting and adding new doc999"
@index.delete("doc999")
@index << @new_doc

@doc_999 = @index["doc999"]
@doc_999.load if @doc_999
p "new_doc_999 not found" unless @doc_999
p "new_doc_999 name=#{@doc_999[:name]}" if @doc_999

@index.close
@index = nil
===

$> ruby test_delete2.rb
"0.10.4"
"doc_999 name=name 999"
"deleting and adding new doc999"
"new_doc_999 not found"

From Neville.Burnell at bmsoft.com.au Fri Sep 15 01:50:43 2006
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Fri, 15 Sep 2006 15:50:43 +1000
Subject: [Ferret-talk] Possiible Bug ? indexWriter#doc_countcountsdeleted docs after #commit
Message-ID: <126EC586577FD611A28E00A0C9A03758B5BD32@maui.bmsoft.com.au>

> I should also mention the reason I wouldn't want
> to return the document ID from any IndexWriter method
> is that the document ID could become invalid when the
> next document is added (if a segment merge is triggered
> and deletes exist). At least when using an IndexReader,
> the document ID is valid for the life of the reader.

Thanks for your detail Dave!

Regards,

Neville

From Neville.Burnell at bmsoft.com.au Fri Sep 15 02:27:01 2006
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Fri, 15 Sep 2006 16:27:01 +1000
Subject: [Ferret-talk] Trouble with "updating" a document
Message-ID: <126EC586577FD611A28E00A0C9A03758B5BD34@maui.bmsoft.com.au>

BTW, I just ran the same script [ie, without #load] with Ferret 0.9.3
and it worked "correctly", ie, "fred" is found as I expect.

From f at andreas-s.net Fri Sep 15 09:00:50 2006
From: f at andreas-s.net (Andreas Schwarz)
Date: Fri, 15 Sep 2006 15:00:50 +0200
Subject: [Ferret-talk] Crashes and tests failures again with 0.10.4
Message-ID: <1fbe3d780292747c4ab59b7c9393d357@ruby-forum.com>

In the beginning 0.10.4 looked promising, but now that my index has
grown to > 100 MB I'm getting segfaults on some searches again:

>> Post.find_by_contents('rubyforum') # ok
>> Post.find_by_contents('ruby-forum')
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:351: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i686-linux]

The tests run fine on Linux, on OS X testall crashes with a segfault on
test_index and several of the Ruby tests fail:

  1) Failure:
test_sorts(SearchAndSortTest)
    [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:40:in `do_test_top_docs'
    ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:39:in `do_test_top_docs'
    ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:113:in `test_sorts']:
<8> expected but was <1>.

  2) Failure:
test_boolean_query(SearcherTest)
    [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in `check_hits'
    ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:90:in `test_boolean_query']:
<14> expected but was <2>.

  3) Failure:
test_boolean_query(SimpleMultiSearcherTest)
    [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in `check_hits'
    ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:90:in `test_boolean_query']:
<14> expected but was <2>.

149 tests, 11066 assertions, 3 failures, 0 errors

--
Posted via http://www.ruby-forum.com/.

From siohan at watson.ibm.com Fri Sep 15 14:39:42 2006
From: siohan at watson.ibm.com (Olivier Siohan)
Date: Fri, 15 Sep 2006 14:39:42 -0400
Subject: [Ferret-talk] Custom analyzer not invoked?
Message-ID: <200609151439.42397.siohan@watson.ibm.com>

Hello,

I'm trying to define my own analyzer by doing something like:

#-----------------------------------------------------
require 'ferret'
include Ferret

class MyAnalyzer < Analysis::Analyzer
  def token_stream(field, str)
    # Display results of analysis
    puts 'Analyzing: field:%s str:%s' % [field, str]
    t = Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
    while true
      n = t.next()
      break if n == nil
      puts n.to_s
    end
    return Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
  end
end

puts '== Adding document to index...'
index = Index::Index.new(:analyzer => MyAnalyzer.new())
index << { :content => "The quick brown fox" }
index << { :content => "The cow jumps over the moon" }

puts '== Searching Brown...'
index.search_each('content:Brown') do |doc, score|
  puts "Document #{doc} found with a score of #{score}"
end

puts '== Searching Foo...'
index.search_each('content:Foo') do |doc, score|
  puts "Document #{doc} found with a score of #{score}"
end

puts '== Searching Brown...'
index.search_each('content:Brown') do |doc, score|
  puts "Document #{doc} found with a score of #{score}"
end

puts '== Searching Cow...'
index.search_each('content:Cow') do |doc, score|
  puts "Document #{doc} found with a score of #{score}"
end
#-----------------------------------------------------

The output is:

== Adding document to index...
Analyzing: field:content str:
Analyzing: field:content str:
== Searching Brown...
Analyzing: field:content str:Brown
token["brown":0:5:1]
Document 0 found with a score of 0.5
== Searching Foo...
== Searching Brown...
Document 0 found with a score of 0.5
== Searching Cow...
Document 1 found with a score of 0.375

The result is correct, i.e. documents are retrieved as expected.
However, I don't understand why I don't see my 'Analyzing...' comment
with the corresponding string being analyzed, except when searching for
'Brown', and why I'm getting an empty string in
'Analyzing: field:content str:' when the 2 documents are pushed into the
index.

Any explanations? I apologize if this is a trivial issue; I'm quite new
to Ferret/Lucene. I use ferret-0.10.4 under Linux.

Many thanks.

--
Olivier
I'm trying to understand

From sebastien.hugues at gmail.com Sat Sep 16 06:53:48 2006
From: sebastien.hugues at gmail.com (Sebastien Hugues)
Date: Sat, 16 Sep 2006 12:53:48 +0200
Subject: [Ferret-talk] nfs shared and ferret segfault
Message-ID:

Hi,

I use ferret 0.10.4 with a shared index over an NFS directory.
There are 2 application servers. The web server is Mongrel 0.3.13.3
and mongrel_cluster 0.2.0. There are 20 Mongrel processes on each
server.

Each time my application updates a model, the Mongrel process
stops running with this error in its log:

/usr/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:663:
[BUG] Segmentation fault

These servers are serving a lot of requests. Is it good to have
a shared NFS index? Is there a lock problem?

This is very urgent. 10000 thanks for any help.

Regards
Seb

From kraemer at webit.de Sun Sep 17 07:07:33 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Sun, 17 Sep 2006 13:07:33 +0200
Subject: [Ferret-talk] nfs shared and ferret segfault
In-Reply-To:
References:
Message-ID: <20060917110733.GA15253@cordoba.webit.de>

Hi!

On Sat, Sep 16, 2006 at 12:53:48PM +0200, Sebastien Hugues wrote:
> Hi,
>
> I use ferret 0.10.4 with a shared index over an NFS directory.
> There are 2 application servers. The web server is Mongrel 0.3.13.3
> and mongrel_cluster 0.2.0. There are 20 Mongrel processes on each
> server.
>
> Each time my application updates a model, the Mongrel process
> stops running with this error in its log:
>
> /usr/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:663:
> [BUG] Segmentation fault
>
> These servers are serving a lot of requests. Is it good to have
> a shared NFS index? Is there a lock problem?

Possible that there are locking problems. We had this question come up
some days ago on the list, hope Dave can shed some light on this.

I'm working on remote indexing capabilities for acts_as_ferret right
now. Atm I have a little drb server that does the indexing/searching,
and an experimental branch of acts_as_ferret that can talk to this
server. So you'd only have one index and both machines talking to this.
Drop me a line if you're interested in a solution like this.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From erik at ehatchersolutions.com Sun Sep 17 12:12:28 2006
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Sun, 17 Sep 2006 12:12:28 -0400
Subject: [Ferret-talk] nfs shared and ferret segfault
In-Reply-To: <20060917110733.GA15253@cordoba.webit.de>
References: <20060917110733.GA15253@cordoba.webit.de>
Message-ID: <02690AB3-9B37-4346-A197-EA3E50F62FB7@ehatchersolutions.com>

FYI, this is a long-known issue between Java Lucene and NFS mounted
file systems. There is actually quite a bit of active work going on in
this space in the Java Lucene community.

Erik

From frankfan at 163.com Sun Sep 17 22:37:57 2006
From: frankfan at 163.com (Frank)
Date: Mon, 18 Sep 2006 04:37:57 +0200
Subject: [Ferret-talk] indexing multiple languages with acts_as_ferret
In-Reply-To: <44146.212.227.62.4.1156233055.squirrel@orkland.homeunix.org>
References: <846f30c70608211454g794e480fsd18af7f8666418a5@mail.gmail.com> <44146.212.227.62.4.1156233055.squirrel@orkland.homeunix.org>
Message-ID: <3ab6ebc76a73991432f61eadd15106d2@ruby-forum.com>

>
> hi..
>
> i'm using ferret (not acts_as_ferret, but this shouldn't matter) to
> index content in german, english, polish, japanese, chinese, french ..
> all in UTF8 and i haven't had any problem with it yet :-) (using ferret
> 0.9.4 and 0.9.5)
>
> Ben

Hi Ben,

Have you modified any of Ferret's code? I have also used Ferret to index
CJK (Chinese, Korean, Japanese) languages, all of which are encoded in
UTF-8, but I cannot get them searched correctly.

Frank

--
Posted via http://www.ruby-forum.com/.

From sid137 at gmail.com Mon Sep 18 00:32:27 2006
From: sid137 at gmail.com (Sidney Burks)
Date: Mon, 18 Sep 2006 06:32:27 +0200
Subject: [Ferret-talk] Automatic reindexing of associated columns acts_as_taggable
Message-ID:

Hi,

So I'm trying to use acts_as_taggable with the acts_as_ferret plugin,
where I have a Post.rb model, which has a method tag_list made available
through acts_as_taggable, and returns a string of associated tag words
from the tags table (tag.rb). I've set up my Post.rb model in the
following way.

class Post < ActiveRecord::Base
  acts_as_taggable
  acts_as_ferret :fields => ["title", "description", :tag_list]

  ...

end

I'm noticing that when I submit a new post, the titles and descriptions
are automatically indexed, and are searchable with ferret. However,
the tags do not show up in the results when I do a tag search. However,
once I use script console and try to manually rebuild the Post index:

Post.rebuild_index(Post)

(is this syntax even correct? it returns a result of false)

the tags immediately start to appear in the tag search results. So is
there a way to set ferret to automatically rebuild the tag index when a
Post is saved? I imagine it probably shouldn't rebuild the entire index
for all posts every time a post is saved, as that might slow things up
eventually, right? So what would be a good solution..?

Thanks,

-Sidney

--
Posted via http://www.ruby-forum.com/.

From f at andreas-s.net Mon Sep 18 03:09:07 2006
From: f at andreas-s.net (Andreas Schwarz)
Date: Mon, 18 Sep 2006 09:09:07 +0200
Subject: [Ferret-talk] nfs shared and ferret segfault
In-Reply-To: <20060917110733.GA15253@cordoba.webit.de>
References: <20060917110733.GA15253@cordoba.webit.de>
Message-ID: <088d822c66f6fa2fd72df75cbbfa5e1d@ruby-forum.com>

Jens Kraemer wrote:
> Hi!
> On Sat, Sep 16, 2006 at 12:53:48PM +0200, Sebastien Hugues wrote:
>> /usr/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:663:
>> [BUG] Segmentation fault
>>
>> These servers are serving a lot of requests. Is it good to have
>> a shared NFS index? Is there a lock problem?
>
> Possible that there are locking problems. We had this question come up
> some days ago on the list, hope Dave can shed some light on this.
>
> I'm working on remote indexing capabilities for acts_as_ferret right
> now. Atm I have a little drb server that does the indexing/searching,
> and an experimental branch of acts_as_ferret that can talk to this
> server.
> So you'd only have one index and both machines talking to this. Drop me
> a line if you're interested in a solution like this.

An alternative and IMO better solution is to disable indexing in the
Rails app and use an external process that periodically reindexes
changed (updated_at > last_update) records. This has the additional
advantage that the index update can't block or crash the Rails app
servers.

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Mon Sep 18 05:50:50 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 18 Sep 2006 11:50:50 +0200
Subject: [Ferret-talk] nfs shared and ferret segfault
In-Reply-To: <088d822c66f6fa2fd72df75cbbfa5e1d@ruby-forum.com>
References: <20060917110733.GA15253@cordoba.webit.de> <088d822c66f6fa2fd72df75cbbfa5e1d@ruby-forum.com>
Message-ID: <20060918095050.GI9050@cordoba.webit.de>

On Mon, Sep 18, 2006 at 09:09:07AM +0200, Andreas Schwarz wrote:
> Jens Kraemer wrote:
> > Hi!
> > On Sat, Sep 16, 2006 at 12:53:48PM +0200, Sebastien Hugues wrote:
> >> /usr/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:663:
> >> [BUG] Segmentation fault
> >>
> >> These servers are serving a lot of requests. Is it good to have
> >> a shared NFS index? Is there a lock problem?
> >
> > Possible that there are locking problems. We had this question come up
> > some days ago on the list, hope Dave can shed some light on this.
> >
> > I'm working on remote indexing capabilities for acts_as_ferret right
> > now. Atm I have a little drb server that does the indexing/searching,
> > and an experimental branch of acts_as_ferret that can talk to this
> > server.
> > So you'd only have one index and both machines talking to this. Drop me
> > a line if you're interested in a solution like this.
>
> An alternative and IMO better solution is to disable indexing in the
> Rails app and use an external process that periodically reindexes
> changed (updated_at > last_update) records. This has the additional
> advantage that the index update can't block or crash the Rails app
> servers.

Right, depending on the exact application this may be a better solution.
Plus you'd get better performance with batch-indexing new/updated
records.

but, if an index update crashes your rails server, it'll most likely
crash your background indexer, too - you're just moving the problem away
from the surface. Of course, having a stale index but the rest of the
app working is better than having the whole app crashed ;-)

In addition, the index won't be guaranteed to be always up to date, and
deleted records will have to be handled, maybe by flagging them as
deleted instead of really deleting them.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From kraemer at webit.de Mon Sep 18 07:39:49 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 18 Sep 2006 13:39:49 +0200
Subject: [Ferret-talk] Automatic reindexing of associated columns acts_as_taggable
In-Reply-To:
References:
Message-ID: <20060918113949.GK9050@cordoba.webit.de>

Hi!

On Mon, Sep 18, 2006 at 06:32:27AM +0200, Sidney Burks wrote:
> Hi,
>
> So i'm trying to use acts_as_taggable with the acts_as_ferret plugin,
> where I have Post.rb model, which has a method tag_list made available
> through acts_as_taggable, as returns a string of associated tag words
> from the tags table (tag.rb). I've set up my Post.rb model in the
> following way.
>
> class Post < ActiveRecord::Base
>   acts_as_taggable
>   acts_as_ferret :fields => ["title", "description", :tag_list]
>
>   ...
>
> end
>
> I'm noticing that when I submit a new post, the titles and descriptions
> are automatically indexed, and are searchable with ferret.. However,
> the tags do not show up in the results when I do a tag search. However,
> once I use script console and try to manually rebuild the Post index:
>
> Post.rebuild_index(Post)
>
> (is this syntax even correct? it returns a result of false)

you can omit the parameter in this case, as Post is the class you call
rebuild_index upon.
But doesn't hurt either.

> the tags immediately start to appear in the tag search results. So is
> there a way to set ferret to automatically rebuild the tag index when a
> Post is saved? I imagine it probably shouldn't rebuild the entire index
> for all posts every time a post is saved, as that might slow things up
> eventually, right? So what would be a good solution..?
>

I don't remember right now how acts_as_taggable works, but it seems the
tags are set after the post is indexed. Calling post.ferret_update after
applying the tags to the post object should reindex the object with tags
included (this should work in 0.2.x versions of aaf, too). If you are
concerned about performance, and use aaf 0.3, you can also suppress the
first indexing of the non-tagged post:

post = Post.new(...)
post.disable_ferret
post.save
# set tags here ...
post.ferret_update

or even more elegant:

post.disable_ferret(:index_when_finished) do
  post.save
  # set tags here
end

which calls ferret_update after executing the block.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

From epugh at opensourceconnections.com Mon Sep 18 08:31:00 2006
From: epugh at opensourceconnections.com (Eric Pugh)
Date: Mon, 18 Sep 2006 13:31:00 +0100
Subject: [Ferret-talk] Automatic reindexing of associated columns acts_as_taggable
In-Reply-To: <20060918113949.GK9050@cordoba.webit.de>
References: <20060918113949.GK9050@cordoba.webit.de>
Message-ID: <8F2A2D24-96F4-46A0-9F47-1B21653DD3F6@opensourceconnections.com>

I was noticing a similar problem as well.. This looks like the
solution for me! thanks...

Eric

From david.sheldon at torchbox.com Mon Sep 18 11:28:45 2006
From: david.sheldon at torchbox.com (David Sheldon)
Date: Mon, 18 Sep 2006 17:28:45 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
Message-ID:

Hi,

I have a model which has properties, these are your standard name/value
pairs, but also have attributes that affect how I want to store them in
ferret. I was using 0.9.5 with 0.2 of aaf, which seemed fine, I just
copied and pasted (yes, I know, ick) the to_doc method and added code to
iterate through the properties that that model had, and add relevant
fields to the document.

It seems that this will be a bit harder now with the FieldInfos. Has
anyone else done this, and is there a recognised way of doing it?

David

--
Posted via http://www.ruby-forum.com/.

From angrypirana at googlemail.com Mon Sep 18 12:06:00 2006
From: angrypirana at googlemail.com (Richard)
Date: Mon, 18 Sep 2006 18:06:00 +0200
Subject: [Ferret-talk] Using the wildcard plus partial searched
Message-ID: <2426e1f71ec93c48e175f489fdb7d9be@ruby-forum.com>

I've seen no definitive answer to this question.

I've noticed that typing "t" in the search box will return no results
however if I type "t*" it brings up all results beginning with t.

I would like this behaviour on by default without having to type the
wildcard. Is there a way to do this?

--
Posted via http://www.ruby-forum.com/.
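
For the wildcard question above, one possible approach is to rewrite the
query string before it reaches the query parser, so that plain words
become prefix queries. The following is only a minimal sketch under
stated assumptions: add_prefix_wildcards, RESERVED_WORDS and
QUERY_SYNTAX are hypothetical names that are not part of Ferret or
acts_as_ferret, and the list of operator keywords is an assumption about
the query language, not something taken from the Ferret documentation.

```ruby
# Hypothetical helper; NOT part of Ferret or acts_as_ferret.
# Words assumed to be query-language operators are left untouched.
RESERVED_WORDS = %w[AND OR NOT REQ].freeze

# Characters that suggest the user already wrote explicit query syntax
# (wildcards, phrases, field prefixes, ranges, boosts, +/- operators).
QUERY_SYNTAX = /[*?"~^:()\[\]{}<>|&!+-]/

# Appends '*' to every plain word in a whitespace-separated query string.
def add_prefix_wildcards(query)
  terms = query.split.map do |term|
    if RESERVED_WORDS.include?(term) || term =~ QUERY_SYNTAX
      term          # operator or explicit syntax: leave it alone
    else
      "#{term}*"    # plain word: turn it into a prefix query
    end
  end
  terms.join(' ')
end

puts add_prefix_wildcards('t')              # prints: t*
puts add_prefix_wildcards('ruby AND fer*')  # prints: ruby* AND fer*
```

The rewritten string would then be passed to find_by_contents (or to an
index search call) in place of the raw user input. Leaving any term that
already contains query syntax untouched keeps the helper from mangling
queries written by people who know the query language, at the cost of
not expanding hyphenated words.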
From kraemer at webit.de Mon Sep 18 12:59:17 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 18 Sep 2006 18:59:17 +0200 Subject: [Ferret-talk] Dynamic fields and AAF In-Reply-To: References: Message-ID: <20060918165917.GH31018@cordoba.webit.de> On Mon, Sep 18, 2006 at 05:28:45PM +0200, David Sheldon wrote: > Hi, > > I have a model which has properties, these are your standard name/value > pairs, but also have attributes that affect how I want to store them in > ferret. I was using 0.9.5 with 0.2 of aaf, which seemed fine, I just > copied and pasted (yes, I know, ick) the to_doc method and added code to > iterate though the properties that that model had, and add relavent > fields to the document. instead copy'n paste you could just call super: def to_doc doc = super # custom code here doc end > It seems that this will be a bit harder now with the FieldInfos. Has > anyone else done this, and is there a recognised way of doing it? imho adding arbitrary fields should work, you just can't specify any special per-field storage/indexing options, since the defaults determined at index creation will be used. With aaf this means :store => :no, :index => :tokenize changing the characteristics of a field for a special document doesn't seem to be possible any more. Was that what you did until now, i.e. tokenize or store a field's value sometimes, and sometimes not ? Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From kraemer at webit.de Mon Sep 18 13:02:28 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 18 Sep 2006 19:02:28 +0200
Subject: [Ferret-talk] Using the wildcard plus partial searched
In-Reply-To: <2426e1f71ec93c48e175f489fdb7d9be@ruby-forum.com>
References: <2426e1f71ec93c48e175f489fdb7d9be@ruby-forum.com>
Message-ID: <20060918170228.GI31018@cordoba.webit.de>

On Mon, Sep 18, 2006 at 06:06:00PM +0200, Richard wrote:
> I've seen no definitive answer to this question.
>
> I've noticed that typing "t" in the search box will return no results
> however if I type "t*" it brings up all results beginning with t.
>
> I would like this behaviour on by default without having to type the
> wildcard. Is there a way to do this?

Manually append a wild card to the query terms before giving it to the
parser. I once did something like this for a customer. The problem is
that this can get complex with more complex queries, i.e. you won't want
to append a '*' to the word AND, since that's part of the query
language.

Would be better to have a special query parser for this, or even an
option in the stock QueryParser to force it to use wild card queries
all the time.

Jens

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From mleung at projectrideme.com Mon Sep 18 14:45:33 2006
From: mleung at projectrideme.com (Michael Leung)
Date: Mon, 18 Sep 2006 20:45:33 +0200
Subject: [Ferret-talk] Sorting Boolean fields
Message-ID: <457d1fc188b027b3863b34e24d137d92@ruby-forum.com>

Hey there,

I came across this thread: http://www.ruby-forum.com/topic/78148#126235
where sorting in reverse by a boolean field wasn't working properly.
I'm having the same issue now, but there's not a reply in that thread. I
was wondering if anyone has figured this out yet?

Basically, I want all the records with a "1" (true) to float to the top
of the results.

Thanks!

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Mon Sep 18 19:39:35 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 08:39:35 +0900
Subject: [Ferret-talk] Sorting Boolean fields
In-Reply-To: <457d1fc188b027b3863b34e24d137d92@ruby-forum.com>
References: <457d1fc188b027b3863b34e24d137d92@ruby-forum.com>
Message-ID:

On 9/19/06, Michael Leung wrote:
> Hey there,
>
> I came across this thread: http://www.ruby-forum.com/topic/78148#126235
> where sorting in reverse by a boolean field wasn't working properly. I'm
> having the same issue now, but there's not a reply in that thread. I was
> wondering if anyone has figured this out yet?
>
> Basically, I want all the records with a "1" (true) to float to the top
> of the results.
>
> Thanks!

Hi Michael,

See the documentation for Ferret::Search::Sort and
Ferret::Search::SortField:

  http://ferret.davebalmain.com/api/

Alternatively you can just pass a sort string like this:

  hits = index.search(query, :sort => "boolean_field DESC, title")

Cheers,
Dave

From dbalmain.ml at gmail.com Mon Sep 18 20:16:20 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 09:16:20 +0900
Subject: [Ferret-talk] Trouble with "updating" a document
In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BD34@maui.bmsoft.com.au>
References: <126EC586577FD611A28E00A0C9A03758B5BD34@maui.bmsoft.com.au>
Message-ID:

On 9/15/06, Neville Burnell wrote:
> BTW, I just ran the same script [ie, without #load] with Ferret 0.9.3
> and it worked "correctly", ie, "fred" is found as I expect.
>
> -----Original Message-----
> From: ferret-talk-bounces at rubyforge.org
> [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of Neville Burnell
> Sent: Friday, 15 September 2006 3:08 PM
> To: ferret-talk at rubyforge.org
> Subject: [Ferret-talk] Trouble with "updating" a document
>
> Hi,
>
> I seem to be having trouble updating a doc, ie, deleting then re-adding
> to the index.
>
> The following script demonstrates my issue - I'm sure I'm missing
> something obvious, but I can't seem to find the problem. Can someone
> point out where I am going wrong please ?
>
> Regards
>
> Neville
>
> ===
> require 'rubygems'
> require 'ferret'
>
> p Ferret::VERSION
>
> @dir = Ferret::Store::RAMDirectory.new
>
> @index = Ferret::Index::Index.new(:dir => @dir)
>
> (1..1000).each do |n|
>   @index << {:id => "doc#{n}", :name => "name #{n}"}
> end
>
> @doc_999 = @index["doc999"]
> @doc_999.load if @doc_999
> p "doc_999 not found" unless @doc_999
> p "doc_999 name=#{@doc_999[:name]}" if @doc_999
>
> @new_doc = {:id => "doc999", :name => "fred"}
>
> p "deleting and adding new doc999"
> @index.delete("doc999")
> @index << @new_doc
>
> @doc_999 = @index["doc999"]
> @doc_999.load if @doc_999
> p "new_doc_999 not found" unless @doc_999
> p "new_doc_999 name=#{@doc_999[:name]}" if @doc_999
>
> @index.close
> @index = nil
>
> ===
>
> $> ruby test_delete2.rb
> "0.10.4"
> "doc_999 name=name 999"
> "deleting and adding new doc999"
> "new_doc_999 not found"

Hi Neville,

Thanks for letting me know about this. It has been fixed in the current
version. I'll get a new gem out soon.
Cheers,
Dave

From dbalmain.ml at gmail.com Mon Sep 18 20:26:22 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 09:26:22 +0900
Subject: [Ferret-talk] Crashes and tests failures again with 0.10.4
In-Reply-To: <1fbe3d780292747c4ab59b7c9393d357@ruby-forum.com>
References: <1fbe3d780292747c4ab59b7c9393d357@ruby-forum.com>
Message-ID:

On 9/15/06, Andreas Schwarz wrote:
> In the beginning 0.10.4 looked promising, but now that my index has
> grown to > 100 MB I'm getting segfaults on some searches again:
>
> >> Post.find_by_contents('rubyforum')
> # ok
> >> Post.find_by_contents('ruby-forum')
> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:351:
> [BUG] Segmentation fault
> ruby 1.8.4 (2005-12-24) [i686-linux]
>
> The tests run fine on Linux, on OS X testall crashes with a segfault on
> test_index and several of the Ruby tests fail:
>
> 1) Failure:
> test_sorts(SearchAndSortTest)
> [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:40:in
> `do_test_top_docs'
> ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:39:in
> `do_test_top_docs'
> ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_search_and_sort.rb:113:in
> `test_sorts']:
> <8> expected but was
> <1>.
>
> 2) Failure:
> test_boolean_query(SearcherTest)
> [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in
> `check_hits'
> ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:90:in
> `test_boolean_query']:
> <14> expected but was
> <2>.
>
> 3) Failure:
> test_boolean_query(SimpleMultiSearcherTest)
> [./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tc_index_searcher.rb:39:in
> `check_hits'
> ./test/unit/../unit/analysis/../../unit/index/../../unit/query_parser/../../unit/search/tm_searcher.rb:90:in
> `test_boolean_query']:
> <14> expected but was
> <2>.
>
> 149 tests, 11066 assertions, 3 failures, 0 errors

Hi Andreas,

Sorry for the slow reply. Do you think you would be able to run this
through gdb?

  $ gdb ruby
  (gdb) run yourscript.rb
  (gdb) backtrace

Cheers,
Dave

From dbalmain.ml at gmail.com Mon Sep 18 20:35:23 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 09:35:23 +0900
Subject: [Ferret-talk] Custom analyzer not invoked?
In-Reply-To: <200609151439.42397.siohan@watson.ibm.com>
References: <200609151439.42397.siohan@watson.ibm.com>
Message-ID:

On 9/16/06, Olivier Siohan wrote:
> Hello,
>
> I'm trying to define my own analyzer by doing something like:
>
> #-----------------------------------------------------
> require 'ferret'
> include Ferret
>
> class MyAnalyzer < Analysis::Analyzer
>   def token_stream(field, str)
>
>     # Display results of analysis
>     puts 'Analyzing: field:%s str:%s' % [field, str]
>     t = Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
>     while true
>       n = t.next()
>       break if n == nil
>       puts n.to_s
>     end
>
>     return Analysis::LowerCaseFilter.new(Analysis::StandardTokenizer.new(str))
>   end
> end
>
>
> puts '== Adding document to index...'
> index = Index::Index.new(:analyzer => MyAnalyzer.new())
> index << { :content => "The quick brown fox" }
> index << { :content => "The cow jumps over the moon" }
>
> puts '== Searching Brown...'
> index.search_each('content:Brown') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
>
> puts '== Searching Foo...'
> index.search_each('content:Foo') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
>
> puts '== Searching Brown...'
> index.search_each('content:Brown') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
>
> puts '== Searching Cow...'
> index.search_each('content:Cow') do |doc, score|
>   puts "Document #{doc} found with a score of #{score}"
> end
> #-----------------------------------------------------
>
> The output is:
> == Adding document to index...
> Analyzing: field:content str:
> Analyzing: field:content str:
> == Searching Brown...
> Analyzing: field:content str:Brown
> token["brown":0:5:1]
> Document 0 found with a score of 0.5
> == Searching Foo...
> == Searching Brown...
> Document 0 found with a score of 0.5
> == Searching Cow...
> Document 1 found with a score of 0.375
>
> The result is correct, i.e. documents are retrieved as expected.
> However, I don't understand why I don't see my 'Analyzing...' comment
> with the corresponding string being analyzed, except when searching
> for 'Brown', and why I'm getting an empty string in 'Analyzing:
> field:content str:' when the 2 documents are pushed into the index.
>
> Any explanations? I apologize if this is a trivial issue; I'm quite
> new to Ferret/Lucene. I use ferret-0.10.4 under linux.
>
> Many thanks.
>
> -- Olivier

Hi Olivier,

This is a bug I came across recently. It's fixed in the working
version. However, if you need it to work right away, take out the
inheritance from Analysis::Analyzer. It makes Ferret think you are
passing a C implemented Analyzer. The next gem will be out soon.
Cheers,
Dave

From dbalmain.ml at gmail.com Mon Sep 18 20:41:05 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 09:41:05 +0900
Subject: [Ferret-talk] indexing multiple languages with acts_as_ferret
In-Reply-To: <3ab6ebc76a73991432f61eadd15106d2@ruby-forum.com>
References: <846f30c70608211454g794e480fsd18af7f8666418a5@mail.gmail.com>
 <44146.212.227.62.4.1156233055.squirrel@orkland.homeunix.org>
 <3ab6ebc76a73991432f61eadd15106d2@ruby-forum.com>
Message-ID:

On 9/18/06, Frank wrote:
> >
> > hi..
> >
> > i'm using ferret (not acts_as_ferret, but this shouldn't matter) to
> > index
> > content in german, english, polish, japanese, chinese, french .. all in
> > UTF8 and i don't had any problem with it yet :-) (using ferret 0.9.4 and
> > 0.9.5)
> >
> > Ben
>
> Hi,Ben
> Have u modified any code of ferret? I have also used ferret to index
> CJK(Chinese,Korea,Japanese) languages,all of which are encoded in
> utf-8,but i can not get them searched correctly
>
>
>
> Frank

Hi Frank,

Someone else had this problem earlier. I think the Chinese characters
were being escaped by the browser. Are you running your searches
through a browser? If so, you may need to call CGI.unescape on the
query string. At any rate, the first thing I would check is the actual
query string that you are passing to Ferret. Make sure it looks like
you would expect it to and it really is UTF-8, not some other Chinese
character encoding.

cheers,
Dave

From dbalmain.ml at gmail.com Mon Sep 18 20:50:25 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 09:50:25 +0900
Subject: [Ferret-talk] Crashes and tests failures again with 0.10.4
In-Reply-To:
References: <1fbe3d780292747c4ab59b7c9393d357@ruby-forum.com>
Message-ID:

One other question for you. What version of gcc do you have?
From sid137 at gmail.com Tue Sep 19 00:56:03 2006
From: sid137 at gmail.com (sidney)
Date: Tue, 19 Sep 2006 06:56:03 +0200
Subject: [Ferret-talk] Automatic reindexing of associated columns acts_as_tagga
In-Reply-To: <20060918113949.GK9050@cordoba.webit.de>
References: <20060918113949.GK9050@cordoba.webit.de>
Message-ID: <4906cc76aeb119c822d629ee51f8fbbf@ruby-forum.com>

Nice, that's awesome.. thanks a lot!

--
Posted via http://www.ruby-forum.com/.

From david.sheldon at torchbox.com Tue Sep 19 02:50:29 2006
From: david.sheldon at torchbox.com (David Sheldon)
Date: Tue, 19 Sep 2006 08:50:29 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To: <20060918165917.GH31018@cordoba.webit.de>
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> instead copy'n paste you could just call super:
>
> def to_doc
>   doc = super
>   # custom code here
>   doc
> end

Ah, I had missed out on that, I don't really understand how super works
in ruby. I had been trying to rename the method and create a new one
aliased to it which didn't work. I'm still a bit confused as to_doc is
created by the mixin as an instance method, is there still a superclass
version? Anyway thanks for that tip, I'll try it.

> changing the characteristics of a field for a special document doesn't
> seem to be possible any more. Was that what you did until now, i.e.
> tokenize or store a field's value sometimes, and sometimes not ?

Yes. Some are strings (tokenize), some are integers (don't tokenize,
ideally use a different analyser), and some are choices from lists
(either untokenized String or treat as integer index of choice). Dates
are treated as integers, and we may want to include some strings in the
DB so they can be displayed in the search results.

David

--
Posted via http://www.ruby-forum.com/.
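On the question of whether `super` can reach a method that came from a mixin: in plain Ruby, an included module sits behind the class in the method lookup chain, so a method defined in the class body can call `super` to reach the module's version. The sketch below shows only that language mechanism. `IndexedInstanceMethods` and `Post` are hypothetical stand-ins, and it assumes the plugin supplies `to_doc` by including a module rather than by defining the method directly on the model class.

```ruby
# Stand-in for the module a plugin mixes into the model (hypothetical name).
module IndexedInstanceMethods
  def to_doc
    { :id => 1, :title => 'hello' }
  end
end

class Post
  # Plays the role of the acts_as_ferret call, which is assumed to
  # include a module of instance methods into the model.
  include IndexedInstanceMethods

  # Defined in the class itself, so it is found before the module's
  # version, and `super` continues the lookup into the module.
  def to_doc
    doc = super
    doc[:extra] = 'custom field'
    doc
  end
end

p Post.ancestors.first(2)   # [Post, IndexedInstanceMethods]
p Post.new.to_doc           # module's fields plus :extra
```

If the method were instead defined directly on the class by the plugin, a later `def to_doc` would replace it and `super` would not find it; in that case aliasing the original before redefining would be the usual fallback.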
From kraemer at webit.de Tue Sep 19 04:04:52 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 19 Sep 2006 10:04:52 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID: <20060919080452.GJ31018@cordoba.webit.de>

On Tue, Sep 19, 2006 at 08:50:29AM +0200, David Sheldon wrote:
> Jens Kraemer wrote:
> > instead copy'n paste you could just call super:
> >
> > def to_doc
> >   doc = super
> >   # custom code here
> >   doc
> > end
>
> Ah, I had missed out on that, I don't really understand how super works
> in ruby. I had been trying to rename the method and create a new one
> aliased to it which didn't work. I'm still a bit confused as to_doc is
> created by the mixin as an instance method, is there still a superclass
> version? Anyway thanks for that tip, I'll try it.

ah, good point. But this should still work if you do the override after
calling acts_as_ferret.

> > changing the characteristics of a field for a special document doesn't
> > seem to be possible any more. Was that what you did until now, i.e.
> > tokenize or store a field's value sometimes, and sometimes not ?
>
> Yes. Some are strings (tokenize), some are integers (don't tokenize,
> ideally use a different analyser), and some are choices from lists
> (either untokenized String or treat as integer index of choice). Dates
> are treated as integers, and we may want to include some strings in the
> DB so they can be displayed in the search results.

difficult. you could declare one field per type of data (in terms of
indexed/stored) you possibly run into, and in your to_doc then decide
which data has to go into which field. doesn't sound really nice to me,
but might work. For searching you would then always have to search all
these fields, of course.

Jens

--
webit!
Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From dbalmain.ml at gmail.com Tue Sep 19 04:10:23 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 17:10:23 +0900
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID:

On 9/19/06, David Sheldon wrote:
> Jens Kraemer wrote:
> > changing the characteristics of a field for a special document doesn't
> > seem to be possible any more. Was that what you did until now, i.e.
> > tokenize or store a field's value sometimes, and sometimes not ?
>
> Yes. Some are strings (tokenize), some are integers (dont tokenize,
> ideally use a different analyser), and some are choices from lists
> (either untokenized String or treat as integer index of choice). Dates
> are treated as integers, and we may want to include some strings in the
> DB so they can be displayed in the search results.
>
> David

Hi David,

Is there any reason you need them all to be in the same field? Or am I
misunderstanding you? You do realize that different fields can have
different properties right?

Cheers,
Dave

From f at andreas-s.net Tue Sep 19 06:02:36 2006
From: f at andreas-s.net (Andreas Schwarz)
Date: Tue, 19 Sep 2006 12:02:36 +0200
Subject: [Ferret-talk] Crashes and tests failures again with 0.10.4
In-Reply-To:
References: <1fbe3d780292747c4ab59b7c9393d357@ruby-forum.com>
Message-ID: <46ec9a5ce6eeabe5e45ee22824c49521@ruby-forum.com>

First a testall run on OS X (gcc version 4.0.0 20041026 (Apple Computer,
Inc. build 4061)):
http://andreas-s.net/gdb-testall-segfault-osx.txt

My crashed search on Linux (gcc version 3.3.5 (Debian 1:3.3.5-13)):
http://andreas-s.net/gdb-mysearch-segfault.txt

--
Posted via http://www.ruby-forum.com/.
From david.sheldon at torchbox.com Tue Sep 19 06:59:32 2006
From: david.sheldon at torchbox.com (David Sheldon)
Date: Tue, 19 Sep 2006 12:59:32 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID:

David Balmain wrote:
> Is there any reason you need them all to be in the same field? Or am I
> misunderstanding you? You do realize that different fields can have
> different properties right?

Yes, I want them all in different fields, named after the property, that
way you could search for someone's name by 'name:Bob' or their year of
matriculation with 'matriculation:1978'. The problem is that on creation
of the index I do not know what properties will be associated with users
so cannot define their field infos. Previously I was able to just
specify the properties when adding that field to the document.

David

--
Posted via http://www.ruby-forum.com/.

From jan.prill at gmail.com Tue Sep 19 08:12:00 2006
From: jan.prill at gmail.com (Jan Prill)
Date: Tue, 19 Sep 2006 14:12:00 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID: <562a35c10609190512w48da8ec3yc849b832fcd6ffa7@mail.gmail.com>

without reading the whole thread:

1. you know that users have properties, right?
2. these properties are like key value pairs. one could have a property
   like hobby: 'cars', another user might have a property like
   place-of-birth: 'Hamburg, Germany'
3. users might build their property key-value dynamically. You don't
   know which user chooses to inform you about which property
4. couldn't you use Ruby's reflection, inflection, whatever features to
   iterate over the properties a user has many of and then inflect the
   key-value pairs to put them into the index?
5. this would mean that the field list of the index might grow to a
   great number. don't know how this would affect ferret.
this further means that you need to know which fields one is able to
search for. you would need to build something like an extended search
form with all of these fields or inform the user about which fields he
might use in his queries with effect. he should also be informed that
just because a field exists does not mean every user has provided this
information. maybe it's only one user that informed you about his
place-of-birth.

cheers,
Jan

From jan.prill at gmail.com Tue Sep 19 08:18:32 2006
From: jan.prill at gmail.com (Jan Prill)
Date: Tue, 19 Sep 2006 14:18:32 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To: <562a35c10609190512w48da8ec3yc849b832fcd6ffa7@mail.gmail.com>
References: <20060918165917.GH31018@cordoba.webit.de>
 <562a35c10609190512w48da8ec3yc849b832fcd6ffa7@mail.gmail.com>
Message-ID: <562a35c10609190518t4b130c6bx2f7a94ce8c52cd6d@mail.gmail.com>

imho the described problem of a growing field list is one of the reasons
for the popularity of tags. Simply let the user choose how to tag
himself, his question, comment whatever and don't care about the field.
it's fulltext search for a reason. imho you've got two sides in things
like this:

1. predefine a field list, that would be filled in by most users and
   therefore is valuable information for your search,
2. choose tags for the stuff where users should be able to freely decide
   about the categorization.

cheers,
Jan

From dbalmain.ml at gmail.com Tue Sep 19 08:21:49 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 19 Sep 2006 21:21:49 +0900
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID:

On 9/19/06, David Sheldon wrote:
> David Balmain wrote:
>
> > Is there any reason you need them all to be in the same field? Or am I
> > misunderstanding you? You do realize that different fields can have
> > different properties right?
>
> Yes, I want them all in different fields, named after the property, that
> way you could search for someone's name by 'name:Bob' or their year of
> matriculation with 'matriculation:1978'. The problem is that on creation
> of the index I do not know what properties will be associated with users
> so cannot define their field infos. Previously I was able to just
> specify the properties when adding that field to the document.
>
> David

I'm assuming the matriculation field is always going to be a number.
It won't change at a later date. So you can just set up the field
whenever you use it for the first time.

  require 'rubygems'
  require 'ferret'

  i = Ferret::I.new
  puts i.field_infos
  if not i.field_infos[:matriculation]
    i.field_infos.add_field(:matriculation,
                            :index => :untokenized)
  end
  puts i.field_infos
  i << {:matriculation => 1978}

Of course you only need to do this for fields which vary from the
norm. Whatever properties you instantiated the FieldInfos with will be
used for fields added with the FieldInfos#add_field method unless
otherwise specified.
So if most of your fields are number or date fields you'd create the
FieldInfos like this:

  fis = FieldInfos.new(:index => :untokenized_omit_norms,
                       :term_vector => :no)

Now when you add a text field you'll need to explicitly set it to
tokenized and store term vectors:

  if not i.field_infos[:content]
    i.field_infos.add_field(:content,
                            :term_vector => :with_positions_offsets,
                            :index => :yes)
  end

Let me know if this helps or not.

Cheers,
Dave

From david.sheldon at torchbox.com Tue Sep 19 11:52:22 2006
From: david.sheldon at torchbox.com (David Sheldon)
Date: Tue, 19 Sep 2006 17:52:22 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID:

David Balmain wrote:
> On 9/19/06, David Sheldon wrote:
>> so cannot define their field infos. Previously I was able to just
>> specify the properties when adding that field to the document.
>>
>> David
>
> I'm assuming the matriculation field is always going to be a number.
> It won't change at a later date. So you can just set up the field
> whenever you use it for the first time.

I've considered this. I use aaf, and this requires the model that
describes what fields are allowed on objects to have access to the index
model's indexer, this isn't too bad. The only problem is when the index
is created by something like rebuild_index, which needs to be extended
to create all the extra fields.

I don't want to add the fields to fields_for_ferret, as that would mean
calling #{fieldname}_for_ferret for each possible property, rather than
taking the properties defined on that user, and adding them. Would the
fields_for_ferret solution be the correct way, somehow populating this
out of the database and then overriding the foo_to_ferret methods to
look in a cache?

This was really easy with the old API. It seems a shame that it is so
hard now.

David

--
Posted via http://www.ruby-forum.com/.
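The "set up the field the first time you use it" idea from this thread can be wrapped in a small helper. What follows is a rough sketch only, not acts_as_ferret code. To keep it runnable without Ferret installed it uses `FakeFieldInfos`, a hypothetical stand-in that mimics just the two calls shown in the thread (`[]` and `add_field`); `ensure_field` and `OPTIONS_BY_TYPE` are likewise illustrative names, and the option values are copied from the examples above rather than checked against a particular Ferret release.

```ruby
# Hypothetical stand-in for an index's field registry, so the sketch runs
# without Ferret. Only the two calls used in the thread are mimicked.
class FakeFieldInfos
  def initialize
    @fields = {}
  end

  # Returns the stored options for a field, or nil if it is unknown.
  def [](name)
    @fields[name.to_sym]
  end

  def add_field(name, options = {})
    @fields[name.to_sym] = options
  end
end

# Register a field with its options the first time it is seen. Later calls
# cost one lookup and leave the existing definition untouched.
def ensure_field(field_infos, name, options)
  field_infos.add_field(name, options) unless field_infos[name]
  field_infos[name]
end

# ASSUMPTION: per-type options keyed by the kind of property being indexed.
OPTIONS_BY_TYPE = {
  :string  => { :index => :yes },
  :integer => { :index => :untokenized }
}

infos = FakeFieldInfos.new
ensure_field(infos, :matriculation, OPTIONS_BY_TYPE[:integer])
ensure_field(infos, :matriculation, OPTIONS_BY_TYPE[:string])  # no effect, already defined
p infos[:matriculation]
```

With a real index, the first argument would be whatever object answers `field_infos` for that index, and the helper would be called from `to_doc` for each dynamic property before the value is added to the document.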
From david.sheldon at torchbox.com Tue Sep 19 12:04:00 2006
From: david.sheldon at torchbox.com (David Sheldon)
Date: Tue, 19 Sep 2006 18:04:00 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de>
Message-ID: <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com>

David Balmain wrote:
> I'm assuming the matriculation field is always going to be a number.
> It won't change at a later date. So you can just set up the field
> whenever you use it for the first time.
>
> require 'rubygems'
> require 'ferret'
> i = Ferret::I.new
> puts i.field_infos
> if not i.field_infos[:matriculation]
>   i.field_infos.add_field(:matriculation,
>                           :index => :untokenized)
> end
> puts i.field_infos
> i << {:matriculation => 1978}

Oh, I didn't really read this last time. It looks like this might be
handy, http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html
only lists the IndexReader as having the field_infos.

How much overhead would it be to write an "add_value" method that is
called, say 10 times per doc, which will lookup the field we're going to
add in the index, and add it if it isn't already there? Is this what the
old code did anyway?

David

--
Posted via http://www.ruby-forum.com/.

From shammond at patientslikeme.com Tue Sep 19 12:19:35 2006
From: shammond at patientslikeme.com (Steven Hammond)
Date: Tue, 19 Sep 2006 12:19:35 -0400
Subject: [Ferret-talk] acts_as_ferret and Fuzzy Searching
Message-ID: <45101897.10800@patientslikeme.com>

Hi there,

I'd like to be able to tune the results of a Fuzzy search in a rails
application. I've tried setting the following in my environment.rb
file.

  Ferret::Search::FuzzyQuery.default_min_similarity = 0.75
  Ferret::Search::FuzzyQuery.default_prefix_length = 2

When I go into the console, I can see those values as the default but
when I run a search like

  Post.find_by_contents('word~')

I always get the same results, no matter how I set the above variables.
I can say

  Post.find_by_contents('word~0.75')

and

  Post.find_by_contents('word~0.5')

and get different results.

Any help is appreciated.

Thanks,
Steve

From miguel.wong at gmail.com Tue Sep 19 21:31:13 2006
From: miguel.wong at gmail.com (Miguel)
Date: Wed, 20 Sep 2006 03:31:13 +0200
Subject: [Ferret-talk] Unit and Functional Tests Bombing with Ferret
Message-ID: <1266e96b852f555af6b2c4a771e29da3@ruby-forum.com>

Hello,

I am currently using ferret 0.9.5 and acts_as_ferret 0.2.3 on windows XP

All my unit and functional tests that used to work before I installed
ferret are erroring out. My index is on a model named Post, and it looks
like every test containing methods that do CRUD on the Post model
bombs out.

Is there anything special I need to do before running unit and
functional tests?

Thanks for your help in advance!

Miguel

--
Posted via http://www.ruby-forum.com/.

From Neville.Burnell at bmsoft.com.au Wed Sep 20 01:40:03 2006
From: Neville.Burnell at bmsoft.com.au (Neville Burnell)
Date: Wed, 20 Sep 2006 15:40:03 +1000
Subject: [Ferret-talk] Understanding boost ?
Message-ID: <126EC586577FD611A28E00A0C9A03758B5BD68@maui.bmsoft.com.au>

Hi,

I'm confused about managing field boosting ...

I have set the :boost for the :name field in my docs to 10, via
:boost => 10

Then I performed a search for 'keith' over all fields via
*:(keith*), expecting a doc with Keith in the :name field to come out on
top. But another doc with Keith mentioned in other fields (:comments,
:address) scored higher.

I viewed the explanation from the searcher, but it wasn't clear to me
why the boost wasn't pushing the :name = Keith document to the top.

Any help on understanding field boosting and explain would be great.
Regards

Neville

PS, the two explains are:

Doc1:

0.3352959 = product of:
  8.047102 = sum of:
    4.011141 = weight(comments: in 4697), product of:
      0.5685414 = query_weight(comments:), product of:
        28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + (keith=115) = 117>)
        0.02014635 = query_norm
      7.055143 = field_weight(comments: in 4697), product of:
        1.0 = The sum of:
          1.0 = tf(term_freq(comments:keithex)=1)^1.0
        28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + (keith=115) = 117>)
        0.25 = field_norm(field=comments, doc=4697)
    4.03596 = weight(address: in 4697), product of:
      0.4032613 = query_weight(address:), product of:
        20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>)
        0.02014635 = query_norm
      10.0083 = field_weight(address: in 4697), product of:
        1.0 = The sum of:
          1.0 = tf(term_freq(address:keithex)=1)^1.0
        20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>)
        0.5 = field_norm(field=address, doc=4697)
  0.04166667 = coord(2/48)

Doc2:

0.2977623 = product of:
  14.29259 = weight(name: in 31416), product of:
    0.2028171 = query_weight(name:), product of:
      10.06719 = idf(name:<(keith=3) = 3>)
      0.02014635 = query_norm
    70.47034 = field_weight(name: in 31416), product of:
      1.0 = The sum of:
        1.0 = tf(term_freq(name:keith)=1)^1.0
      10.06719 = idf(name:<(keith=3) = 3>)
      7.0 = field_norm(field=name, doc=31416)
  0.02083333 = coord(1/48)

From kraemer at webit.de Wed Sep 20 02:22:14 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 20 Sep 2006 08:22:14 +0200
Subject: [Ferret-talk] Unit and Functional Tests Bombing with Ferret
In-Reply-To: <1266e96b852f555af6b2c4a771e29da3@ruby-forum.com>
References: <1266e96b852f555af6b2c4a771e29da3@ruby-forum.com>
Message-ID: <20060920062214.GA3032@cordoba.webit.de>

On Wed, Sep 20, 2006 at 03:31:13AM +0200, Miguel wrote:
> Hello,
>
> I am currently using ferret 0.9.5 and acts_as_ferret 0.2.3 on windows XP
>
> All my unit and functional tests that used to work before I installed
> ferret are erroring out.
My index is on a model name Post, and it looks
> like all tests that contains methods which does CRUD to the Post model
> bombs out.
>
> Is there anything special I need to do before running unit and
> functional test?

well, if you told us what errors you get, we could probably tell you
what the problem is ;-)

Jens

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From miguel.wong at gmail.com Wed Sep 20 04:12:39 2006
From: miguel.wong at gmail.com (Miguel)
Date: Wed, 20 Sep 2006 10:12:39 +0200
Subject: [Ferret-talk] Unit and Functional Tests Bombing with Ferret
In-Reply-To: <20060920062214.GA3032@cordoba.webit.de>
References: <1266e96b852f555af6b2c4a771e29da3@ruby-forum.com>
 <20060920062214.GA3032@cordoba.webit.de>
Message-ID:

Oops. sorry.

Looks like there is an EOFError. I have copied the entire error output
below (for a unit test)

Also, if I delete the test index directory, and then add a line in the
test that does some random Post.find_by_contents('?') before running the
test, it would work.

Thanks.

EOFError: EOFError
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/store/buffered_index_io.rb:178:in `refill'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/store/buffered_index_io.rb:94:in `read_byte'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/store/index_io.rb:32:in `read_int'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/segment_term_enum.rb:22:in `initialize'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/term_infos_io.rb:122:in `initialize'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/segment_reader.rb:29:in `initialize'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/segment_reader.rb:11:in `get'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index_reader.rb:117:in `open'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index_reader.rb:116:in `open'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/store/directory.rb:135:in `while_locked'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index_reader.rb:107:in `open'
c:/ruby/lib/ruby/1.8/monitor.rb:229:in `synchronize'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index_reader.rb:105:in `open'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index.rb:643:in `ensure_reader_open'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index.rb:650:in `ensure_searcher_open'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index.rb:390:in `query_delete'
c:/ruby/lib/ruby/1.8/monitor.rb:229:in `synchronize'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index.rb:389:in `query_delete'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index.rb:293:in `<<'
c:/ruby/lib/ruby/1.8/monitor.rb:229:in `synchronize'
c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/index/index.rb:258:in `<<'
C:/_dev_alpha_3/src/woblo/config/../vendor/plugins/0.2.3/acts_as_ferret/lib/acts_as_ferret.rb:510:in `ferret_update'
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/callbacks.rb:344:in `callback'
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/callbacks.rb:341:in `callback'
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/callbacks.rb:279:in `update_without_timestamps'
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/timestamp.rb:39:in `update'
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/base.rb:1718:in `create_or_update_without_callbacks'
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/callbacks.rb:253:in `create_or_update'
c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/base.rb :1392:in `save_without_validation' c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/validat ions.rb:736:in `save_without_transactions' c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transac tions.rb:126:in `save' c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/connect ion_adapters/abstract/database_statements.rb:51:in `transaction' c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transac tions.rb:91:in `transaction' c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transac tions.rb:118:in `transaction' c:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transac tions.rb:126:in `save' ./test/unit/tagfield_test.rb:14:in `test_tagfield' -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed Sep 20 04:20:33 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 20 Sep 2006 10:20:33 +0200 Subject: [Ferret-talk] Understanding boost ? In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BD68@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BD68@maui.bmsoft.com.au> Message-ID: <20060920082033.GN31018@cordoba.webit.de> Hi! On Wed, Sep 20, 2006 at 03:40:03PM +1000, Neville Burnell wrote: > Hi, > > I'm confused about managing field boosting ... > > I have set the :boost for the :name field in my docs to 10, via :boost > => 10 > > Then I performed a search for 'keith' over all fields via with > *:(keith*), expecting a doc with Keith in the :name field to come out on > top. But another doc with Keith mentioned in other fields (:comments, > :address) scored higher. > > I viewed the explanation from the searcher, but it wasn't clear to me > why the boost wasn't pushing the :name = Keith document to the top. as you can see from the explanation, the score for both fields that matched the query got summed up (8... 
= sum of:), if 'keith' only had shown up in one field, the other document would have had the higher score. I don't know of any methodology to determine the proper boost setting for a field, imho it's just a question of experimenting with queries and the results you expect. If you always want to have matches in the name ranked on the top, regardless of how many times a term is mentioned in other parts of your document, set the boost to 100 ;-) I don't know what the coord value is, though, maybe someone else can step in here ? Jens > PS, the two explains are: > > Doc1: > 0.3352959 = product of: > 8.047102 = sum of: > 4.011141 = weight(comments: in > 4697), product of: > 0.5685414 = > query_weight(comments:), product of: > 28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + > (keith=115) = 117>) > 0.02014635 = query_norm > 7.055143 = field_weight(comments: > in 4697), product of: > 1.0 = The sum of: > 1.0 = tf(term_freq(comments:keithex)=1)^1.0 > 28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + > (keith=115) = 117>) > 0.25 = field_norm(field=comments, doc=4697) > 4.03596 = weight(address: in 4697), product of: > 0.4032613 = query_weight(address:), product of: > 20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>) > 0.02014635 = query_norm > 10.0083 = field_weight(address: in 4697), product > of: > 1.0 = The sum of: > 1.0 = tf(term_freq(address:keithex)=1)^1.0 > 20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>) > 0.5 = field_norm(field=address, doc=4697) > 0.04166667 = coord(2/48) > > > Doc2: > 0.2977623 = product of: > 14.29259 = weight(name: in 31416), product of: > 0.2028171 = query_weight(name:), product of: > 10.06719 = idf(name:<(keith=3) = 3>) > 0.02014635 = query_norm > 70.47034 = field_weight(name: in 31416), product of: > 1.0 = The sum of: > 1.0 = tf(term_freq(name:keith)=1)^1.0 > 10.06719 = idf(name:<(keith=3) = 3>) > 7.0 = field_norm(field=name, doc=31416) > 0.02083333 = coord(1/48) > > > > > 
_______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From dbalmain.ml at gmail.com Wed Sep 20 04:53:54 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 20 Sep 2006 17:53:54 +0900
Subject: [Ferret-talk] acts_as_ferret and Fuzzy Searching
In-Reply-To: <45101897.10800@patientslikeme.com>
References: <45101897.10800@patientslikeme.com>
Message-ID:

On 9/20/06, Steven Hammond wrote:
>
> Hi there,
>
> I'd like to be able to tune the results of a Fuzzy search in a rails
> application. I've tried setting the following in my environment.rb file.
>
> Ferret::Search::FuzzyQuery.default_min_similarity = 0.75
> Ferret::Search::FuzzyQuery.default_prefix_length = 2
>
> When I go into the console, I can see those values as the default
> but when I run a search like Post.find_by_contents('word~') I always get
> the same results, no matter how I set the above variables. I can say
> Post.find_by_contents('word~0.75') and Post.find_by_contents('word~0.5')
> and get different results.
>
> Any help is appreciated.
>
> Thanks,
> Steve

Hi Steven,

This was a bug so thanks for letting me know about it. I've fixed it now. You can get the latest version from the subversion repository or wait for the next gem.
Cheers, Dave From dbalmain.ml at gmail.com Wed Sep 20 05:01:51 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 20 Sep 2006 18:01:51 +0900 Subject: [Ferret-talk] Dynamic fields and AAF In-Reply-To: <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com> References: <20060918165917.GH31018@cordoba.webit.de> <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com> Message-ID: On 9/20/06, David Sheldon wrote: > David Balmain wrote: > > > I'm assuming the matriculation field is always going to be a number. > > It won't change at a later date. So you can just set up the field > > whenever you use it for the first time. > > > > require 'rubygems' > > require 'ferret' > > i = Ferret::I.new > > puts i.field_infos > > if not i.field_infos[:matriculation] > > i.field_infos.add_field(:matriculation, > > :index => :untokenized) > > end > > puts i.field_infos > > i << {:matriculation => 1978} > > Oh, I didn't really read this last time. > > It looks like this might be handy, > > http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html only > lists the IndexReader as having the field_infos. > > How much overhead would it be to write an "add_value" method that is > called, say 10 times per doc, which will lookup the field we're going to > add in the index, and add it if it isn't already there? Not a lot. It's a hash lookup so it's fast and it should be rare (after a while at least) that new fields are added. ie, it's probably not going to happen for every document. > Is this what the old code did anyway? > > David The old code created a completely new FieldInfos object for every document you add to the index. It then merges the field_infos objects when the documents are merged. In other words it was a lot more complex. This is one of the reasons for the API change. Even after adding the add_value method, I'd guess that the newer version of Ferret will still index a lot faster. 
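The add-if-missing lookup discussed in this reply could be sketched roughly as follows. This is only an illustration, not acts_as_ferret or Ferret code: `ensure_field` is a hypothetical helper name, and `FakeFieldInfos` is a stand-in that mimics just the two calls shown in the quoted snippet (`field_infos[name]` and `field_infos.add_field(name, options)`) so that the example runs without Ferret installed.

```ruby
# Stand-in for illustration only; it mimics just the two calls used in the
# thread: field_infos[name] and field_infos.add_field(name, options).
class FakeFieldInfos
  def initialize
    @fields = {}
  end

  def [](name)
    @fields[name]
  end

  def add_field(name, options = {})
    @fields[name] = options
  end
end

# Hypothetical helper: register a field only the first time it is seen.
# Returns true when the field had to be added, false when it already existed.
def ensure_field(field_infos, name, options = { :index => :untokenized })
  return false if field_infos[name]
  field_infos.add_field(name, options)
  true
end

infos = FakeFieldInfos.new
puts ensure_field(infos, :matriculation)  # true  (field added)
puts ensure_field(infos, :matriculation)  # false (plain lookup, nothing added)
```

After the first document that uses a field, every later call is just the lookup, which matches the point made above that new fields should become rare over time.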
Cheers, Dave From david.sheldon at torchbox.com Wed Sep 20 05:22:52 2006 From: david.sheldon at torchbox.com (David Sheldon) Date: Wed, 20 Sep 2006 11:22:52 +0200 Subject: [Ferret-talk] Dynamic fields and AAF In-Reply-To: <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com> References: <20060918165917.GH31018@cordoba.webit.de> <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com> Message-ID: David Sheldon wrote: > How much overhead would it be to write an "add_value" method that is > called, say 10 times per doc, which will lookup the field we're going to > add in the index, and add it if it isn't already there? Ok, I've done this. But it was causing problems when called from rebuild_index, as there isn't an index at that point, and I was calling ferret_index on my model, which was creating a new index which couldnt get a write lock for my new fields. I have solved this by giving to_doc an optional index parameter that is passed in when rebuild is running, but if it is nil, it will call Model.ferret_index. It seems like an incorrect separation for the index to be passed in to the to_doc method. Have you any suggestions on how to make this nicer? David -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 20 06:22:27 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 20 Sep 2006 19:22:27 +0900 Subject: [Ferret-talk] Understanding boost ? In-Reply-To: <126EC586577FD611A28E00A0C9A03758B5BD68@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A03758B5BD68@maui.bmsoft.com.au> Message-ID: On 9/20/06, Neville Burnell wrote: > Hi, > > I'm confused about managing field boosting ... > > I have set the :boost for the :name field in my docs to 10, via :boost > => 10 > > Then I performed a search for 'keith' over all fields via with > *:(keith*), expecting a doc with Keith in the :name field to come out on > top. But another doc with Keith mentioned in other fields (:comments, > :address) scored higher. 
> > I viewed the explanation from the searcher, but it wasn't clear to me > why the boost wasn't pushing the :name = Keith document to the top. > > Any help on understanding field boosting and explain would be great. > > Regards > > Neville > > PS, the two explains are: > > Doc1: > 0.3352959 = product of: > 8.047102 = sum of: > 4.011141 = weight(comments: in > 4697), product of: > 0.5685414 = > query_weight(comments:), product of: > 28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + > (keith=115) = 117>) > 0.02014635 = query_norm > 7.055143 = field_weight(comments: > in 4697), product of: > 1.0 = The sum of: > 1.0 = tf(term_freq(comments:keithex)=1)^1.0 > 28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + > (keith=115) = 117>) > 0.25 = field_norm(field=comments, doc=4697) > 4.03596 = weight(address: in 4697), product of: > 0.4032613 = query_weight(address:), product of: > 20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>) > 0.02014635 = query_norm > 10.0083 = field_weight(address: in 4697), product > of: > 1.0 = The sum of: > 1.0 = tf(term_freq(address:keithex)=1)^1.0 > 20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>) > 0.5 = field_norm(field=address, doc=4697) > 0.04166667 = coord(2/48) > > > Doc2: > 0.2977623 = product of: > 14.29259 = weight(name: in 31416), product of: > 0.2028171 = query_weight(name:), product of: > 10.06719 = idf(name:<(keith=3) = 3>) > 0.02014635 = query_norm > 70.47034 = field_weight(name: in 31416), product of: > 1.0 = The sum of: > 1.0 = tf(term_freq(name:keith)=1)^1.0 > 10.06719 = idf(name:<(keith=3) = 3>) > 7.0 = field_norm(field=name, doc=31416) > 0.02083333 = coord(1/48) Hi Neville, The field's boost value affects the field_norm value in the Explanations above. 
Here is how it is calculated:

    field_norm = field_info->boost * doc->boost * field->boost *
                 (1 / sqrt(field->num_terms))

So as you can see from the Explanations above, field_norm is 7.0 on the boosted field which is more than 10 times the field_norms on the other two fields (0.25, 0.5) so at least you can see the boost is having an effect. The address field probably has a higher field_norm value than the comments field because the comments field is longer (see that last part of the field_norm equation). Note that the reason the field_norm is 7.0 and not 10.0 is that the field_norm gets stored in a single byte so there is quite a large loss of precision.

Having said all this, there does seem to be a problem with the calculations. I don't think I've calculated the idf value correctly for MultiTermQueries. I've rectified this in subversion so the next version should give your results in an order that you'd expect. For information on tf and idf, check out this page:

http://en.wikipedia.org/wiki/Tf-idf

Hope that helps. I'd love to give a better explanation of the scoring but I don't have time right now.

Cheers,
Dave

From clare.cav at arogent.co.uk Wed Sep 20 06:36:01 2006
From: clare.cav at arogent.co.uk (Clare)
Date: Wed, 20 Sep 2006 12:36:01 +0200
Subject: [Ferret-talk] Range searches some times they work, some times not...
Message-ID:

Hi, I'm using Ferret to enable geographical postcode searches. I take a postcode and distance in miles from the user, strip off the outcode and then retrieve the associated x y coordinates in metres from the db. Then I get two temp x's and y's and search for all results that are within the box, see code below.
Problems start to occur when i search on big distances so for example 40 miles from "G1" VoObject.ferret_index.search(" x:[206826 335573] AND y:[590526 719273]").total_hits => 165 300 miles VoObject.ferret_index.search("y:[172098 1137702]").total_hits Ferret::QueryParser::QueryParseException: Error occured in q_range.c:121 - range_new Upper bound must be greater than lower bound. "1137702" < "172098" from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in `parse' from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in `process_query' from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:560:in `do_search' from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:233:in `search' from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:232:in `search' from (irb):16 So what am i doing wrong? How have other people used ferret for geographical searches? Is there another way that i can define the range so that it works properly? because I'm also getting other crazy and just plain wrong results VoObject.ferret_index.search("y:[0 9]").total_hits => 167 thats telling me that all the test data is with 8 metres of the origin... thanks in advance. clare if their_outcode && their_outcode.size > 0 temp_hwz = HwzPostcode.find(:first, :conditions => ['outcode = ?',their_outcode]) range_x_left = temp_hwz.x - (postcode_distance.to_f*1.60934 * 1000) range_x_right = temp_hwz.x + (postcode_distance.to_f*1.60934 * 1000) range_y_top = temp_hwz.y + (postcode_distance.to_f*1.60934 * 1000) range_y_bottom = temp_hwz.y - (postcode_distance.to_f*1.60934 * 1000) query += " AND x:[#{range_x_left.to_i} #{range_x_right.to_i}] AND y:[#{range_y_bottom.to_i} #{range_y_top.to_i}]" end -- Posted via http://www.ruby-forum.com/. 
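The error text in the message above already hints at the cause: the two bounds are compared as strings ("1137702" < "172098"), and a later reply in this thread suggests padding all numbers to a fixed width. A minimal sketch of that idea in plain Ruby, with no Ferret calls involved; the helper name `pad` and the width of 7 are assumptions chosen for illustration, and the values stored in the index would have to be padded the same way. Negative coordinates are not handled here.

```ruby
# Hypothetical helper: zero-pad a non-negative integer to a fixed width so
# that string (lexical) order agrees with numeric order.
WIDTH = 7

def pad(n, width = WIDTH)
  format("%0#{width}d", n)
end

# Unpadded strings compare character by character, so "1137702" sorts
# before "172098" even though the number is larger.
puts "1137702" < "172098"        # true

# Padded to the same width, the comparison agrees with the numbers.
puts pad(1137702) < pad(172098)  # false

# Building the range clause from the 300 mile example with padded bounds.
lower = pad(172098)
upper = pad(1137702)
query = "y:[#{lower} #{upper}]"
puts query                       # y:[0172098 1137702]
```

The same padding also explains the "y:[0 9]" result: once every value has the same number of digits, "0000167" no longer falls between "0000000" and "0000009".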
From kraemer at webit.de Wed Sep 20 07:40:05 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 20 Sep 2006 13:40:05 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To:
References: <20060918165917.GH31018@cordoba.webit.de> <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com>
Message-ID: <20060920114005.GO31018@cordoba.webit.de>

Hi!

On Wed, Sep 20, 2006 at 11:22:52AM +0200, David Sheldon wrote:
> David Sheldon wrote:
>
> > How much overhead would it be to write an "add_value" method that is
> > called, say 10 times per doc, which will lookup the field we're going to
> > add in the index, and add it if it isn't already there?
>
> Ok, I've done this. But it was causing problems when called from
> rebuild_index, as there isn't an index at that point, and I was calling
> ferret_index on my model, which was creating a new index which couldnt
> get a write lock for my new fields.
>
> I have solved this by giving to_doc an optional index parameter that is
> passed in when rebuild is running, but if it is nil, it will call
> Model.ferret_index.
>
> It seems like an incorrect separation for the index to be passed in to
> the to_doc method. Have you any suggestions on how to make this nicer?

I could change the way rebuild_index works so that it uses and initializes the Ferret index instance returned by ferret_index. So you could access the index instance in to_doc when being called by rebuild_index, too.

Jens

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From dbalmain.ml at gmail.com Wed Sep 20 07:50:03 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 20 Sep 2006 20:50:03 +0900
Subject: [Ferret-talk] Understanding boost ?
In-Reply-To: <20060920082033.GN31018@cordoba.webit.de> References: <126EC586577FD611A28E00A0C9A03758B5BD68@maui.bmsoft.com.au> <20060920082033.GN31018@cordoba.webit.de> Message-ID: On 9/20/06, Jens Kraemer wrote: > Hi! > > On Wed, Sep 20, 2006 at 03:40:03PM +1000, Neville Burnell wrote: > > Hi, > > > > I'm confused about managing field boosting ... > > > > I have set the :boost for the :name field in my docs to 10, via :boost > > => 10 > > > > Then I performed a search for 'keith' over all fields via with > > *:(keith*), expecting a doc with Keith in the :name field to come out on > > top. But another doc with Keith mentioned in other fields (:comments, > > :address) scored higher. > > > > I viewed the explanation from the searcher, but it wasn't clear to me > > why the boost wasn't pushing the :name = Keith document to the top. > > as you can see from the explanation, the score for both fields that > matched the query got summed up (8... = sum of:), if 'keith' only had > shown up in one field, the other document would have had the higher > score. > > I don't know of any methodology to determine the proper boost setting > for a field, imho it's just a question of experimenting with queries and > the results you expect. > > If you always want to have matches in the name ranked on the top, > regardless of how many times a term is mentioned in other parts of your > document, set the boost to 100 ;-) > > I don't know what the coord value is, though, maybe someone else can > step in here ? > > Jens The coord factor is the number of clauses in a BooleanQuery that matched over the number of clauses. It would seem that in the example, there were 48 clauses. When you submit a query over all fields (ie. "*:term") the query is rewritten as a boolean query with a clause for every field in your index. So it would seem that Neville has 48 fields in his index. 
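As a quick check of the coord explanation, the top-level scores in Neville's two explanations can be reproduced by multiplying the summed weights by the coord factor. The sketch below uses only the numbers quoted in the thread; the helper name `coord` is just for illustration and is not a Ferret method.

```ruby
# Illustrative helper (not part of Ferret): the fraction of BooleanQuery
# clauses that matched.
def coord(matched, total)
  matched.to_f / total
end

# Doc1 matched in two of the 48 field clauses (comments and address).
doc1 = 8.047102 * coord(2, 48)
# Doc2 matched in one of the 48 field clauses (name).
doc2 = 14.29259 * coord(1, 48)

puts format("%.7f", doc1)  # 0.3352959
puts format("%.7f", doc2)  # 0.2977623
puts doc1 > doc2           # true
```

So although the boosted name field gives Doc2 the larger raw weight, Doc1 matched twice as many clauses and its coord factor is twice as large, which is enough to put it ahead.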
Hope that makes sense, Dave PS: This might be a good time to mention that if you have an index with a lot of fields like this, it is probably worth thinking about what to set the :default_field and :all_fields parameters to. :all_fields is what "*:#{query}" expands to. It doesn't necessarily have to be all fields in the index. Usually you only want "*" to expand to all text fields, not actually all fields. For example, you'd probably want date fields to be excluded. And I've only just fixed this so it will work when you use a Ferret::Index::Index object. Previously the QueryParser had all fields in the index added to the :all_fields parameter. Now that only happens if :all_fields isn't set explicitly. From dbalmain.ml at gmail.com Wed Sep 20 08:21:59 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 20 Sep 2006 21:21:59 +0900 Subject: [Ferret-talk] Range searches some times they work, some times not... In-Reply-To: References: Message-ID: On 9/20/06, Clare wrote: > Hi i'm using ferret to enable geographical postcode. I take a postcode > and distance in miles from the user, strip off the outcode and then > retrieve the associated x y coordinates in metres from the db. Then i > get two temp x's and y's and search for all results that are within the > box, see code below. > > Problems start to occur when i search on big distances so for example > > 40 miles from "G1" > VoObject.ferret_index.search(" x:[206826 335573] AND y:[590526 > 719273]").total_hits > => 165 > > > 300 miles > VoObject.ferret_index.search("y:[172098 1137702]").total_hits > Ferret::QueryParser::QueryParseException: Error occured in q_range.c:121 > - range_new > Upper bound must be greater than lower bound. 
"1137702" < > "172098" > > from > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in > `parse' > from > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in > `process_query' > from > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:560:in > `do_search' > from > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:233:in > `search' > from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' > from > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:232:in > `search' > from (irb):16 > > > So what am i doing wrong? How have other people used ferret for > geographical searches? Is there another way that i can define the range > so that it works properly? > > because I'm also getting other crazy and just plain wrong results > > VoObject.ferret_index.search("y:[0 9]").total_hits > => 167 > > thats telling me that all the test data is with 8 metres of the > origin... > > thanks in advance. > clare > > > if their_outcode && their_outcode.size > 0 > temp_hwz = HwzPostcode.find(:first, :conditions => ['outcode = > ?',their_outcode]) > range_x_left = temp_hwz.x - (postcode_distance.to_f*1.60934 * 1000) > range_x_right = temp_hwz.x + (postcode_distance.to_f*1.60934 * 1000) > range_y_top = temp_hwz.y + (postcode_distance.to_f*1.60934 * 1000) > range_y_bottom = temp_hwz.y - (postcode_distance.to_f*1.60934 * 1000) > > query += " AND x:[#{range_x_left.to_i} #{range_x_right.to_i}] AND > y:[#{range_y_bottom.to_i} #{range_y_top.to_i}]" > end Hi Clare, Ranges are calculated according to lexical ordering, not numerical ordering. Try this: puts ["0", "9", "167"].sort You'll see that "167" does indeed fall between "0" and "9". Now try this: puts ["000", "009", "167"].sort So that should explain what you have to do. You need to pad all numbers to a fixed width. Alternatively you could build a custom IntegerRangeFilter and combine it with a ConstantScoreQuery. 
Here is an example for Floats:

    require 'rubygems'
    require 'ferret'

    class FloatRangeFilter
      attr_accessor :field, :upper, :lower, :upper_op, :lower_op

      def initialize(field, options)
        @field = field
        @upper = options[:<] || options[:<=]
        @lower = options[:>] || options[:>=]
        if @upper.nil? and @lower.nil?
          raise ArgumentError, "Must specify a bound"
        end
        # exclusive bound if :< or :> was given, inclusive otherwise
        @upper_op = options[:<].nil? ? :<= : :<
        @lower_op = options[:>].nil? ? :>= : :>
      end

      # set a bit for every document that has a term inside the range
      def bits(index_reader)
        bit_vector = Ferret::Utils::BitVector.new
        term_doc_enum = index_reader.term_docs
        index_reader.terms(@field).each do |term, freq|
          float = term.to_f
          next if @upper and not float.send(@upper_op, @upper)
          next if @lower and not float.send(@lower_op, @lower)
          term_doc_enum.seek(@field, term)
          term_doc_enum.each {|doc_id, freq| bit_vector.set(doc_id)}
        end
        return bit_vector
      end

      # hash and eql? let equal filters share a cache entry
      def hash
        return @field.hash ^ @upper.hash ^ @lower.hash ^
               @upper_op.hash ^ @lower_op.hash
      end

      def eql?(o)
        return (o.instance_of?(FloatRangeFilter) and @field == o.field and
                @upper == o.upper and @lower == o.lower and
                @upper_op == o.upper_op and @lower_op == o.lower_op)
      end
    end

You'll have to work out what is going on here yourself though. I have no time for explanation. Note that this won't perform very well compared to the padded field version because so much is going on in the Ruby code. I could possibly be persuaded to implement this in C.

Cheers,
Dave

From david.sheldon at torchbox.com Wed Sep 20 09:56:39 2006
From: david.sheldon at torchbox.com (David Sheldon)
Date: Wed, 20 Sep 2006 15:56:39 +0200
Subject: [Ferret-talk] Dynamic fields and AAF
In-Reply-To: <20060920114005.GO31018@cordoba.webit.de>
References: <20060918165917.GH31018@cordoba.webit.de> <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com> <20060920114005.GO31018@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> I could change the way rebuild_index works so that it uses and
> initializes the Ferret index instance returned by ferret_index.
So you > could access the index instance in to_doc when being called by > rebuild_index, too. That sounds good. The other thing I noticed was that if you wanted to create a field that is created by rebuild_index, but isn't actually put in there by the standard to_doc you can specifiy the fields along with :ignore => true, for example { :index => :untokenized, :ignore => true }. I want to do this as there is a field that I want to include many times on a document, and returning an array from foo_for_ferret didn't add a field for each. David, are you supposed to be able to set several values for a field in the document? Thanks for all you guy's support. David -- Posted via http://www.ruby-forum.com/. From david.wennergren at gmail.com Wed Sep 20 10:22:18 2006 From: david.wennergren at gmail.com (David Wennergren) Date: Wed, 20 Sep 2006 16:22:18 +0200 Subject: [Ferret-talk] acts_as_ferret limit on multi_search not working? Message-ID: I'm using acts_as_ferret to do a query like this: Model1.multi_search("my query",[Model2,Model3], :limit => 2) No matter what number i set limit to I get 10 items in the resultset. Am I doing something wrong? Thanks/David -- Posted via http://www.ruby-forum.com/. From rafe.colburn at extension.org Wed Sep 20 11:23:14 2006 From: rafe.colburn at extension.org (Rafe Colburn) Date: Wed, 20 Sep 2006 11:23:14 -0400 Subject: [Ferret-talk] bad interaction of ferret (0.10.5) and mongrel on linux Message-ID: <370AF7C5-4359-47D7-A4B7-00DA25A53106@extension.org> We have an application that uses Ferret and acts_as_ferret that we just upgraded to Ferret 0.10.5 (from 0.9.5) and the corresponding version of the acts_as_ferret. Everything works as expected on my laptop, which is running mongrel 0.3.13.3. When I deploy the application on our server, which is running some version of Red Hat Linux, I can't get Ferret to work at all when running under mongrel (again, version 0.3.13.3). 
Oddly, everything works fine if I open a console and rebuild the index or run a search. When I access the site via the web and try to modify a document that triggers indexing, I get the following error: A NoMethodError occurred in questions#update_faq: undefined method `new' for Ferret::Document:Module [RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/instance_methods.rb: 110:in `to_doc' When I try to run a search, I get no results at all even though the documents are indexed. (I can run searches fine from the Rails console.) Anyone ever seen anything like this? I have no idea what it is about the ferret/mongrel/Linux combination that's causing these problems. --Rafe From miguel.wong at gmail.com Wed Sep 20 12:52:22 2006 From: miguel.wong at gmail.com (Miguel) Date: Wed, 20 Sep 2006 18:52:22 +0200 Subject: [Ferret-talk] Unit and Functional Tests Bombing with Ferret In-Reply-To: References: <1266e96b852f555af6b2c4a771e29da3@ruby-forum.com> <20060920062214.GA3032@cordoba.webit.de> Message-ID: Does anyone know why the EOFError is popping up? Thanks a bunch! > EOFError: EOFError > c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/store/buffered_index_ > io.rb:178:in `refill' > c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/store/buffered_index_ > io.rb:94:in `read_byte' > c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.5/lib/ferret/store/index_io.rb:32: > in `read_int' > .... -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 20 13:14:21 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 21 Sep 2006 02:14:21 +0900 Subject: [Ferret-talk] Dynamic fields and AAF In-Reply-To: References: <20060918165917.GH31018@cordoba.webit.de> <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com> <20060920114005.GO31018@cordoba.webit.de> Message-ID: On 9/20/06, David Sheldon wrote: > David, are you supposed to be able to set several values for a field in > the document? I think I know what you are asking here but I'm not sure. 
You can do this in Ferret:

    index << {:content => "yada yada yada",
              :tags => ["ruby", "rails", "ferret"]}

So :tags has multiple values. But you can't do this:

    doc = Ferret::Document.new
    doc[:tag] = "ruby"
    doc[:tag] = "rails"
    doc[:tag] = "ferret"

You should do this:

    doc[:tag] = ["ruby", "rails", "ferret"]

Or this:

    doc[:tag] = ["ruby"]
    doc[:tag] << "rails"
    doc[:tag] << "ferret"

After all, Ferret::Document is just a Hash with a boost field. Perhaps I have just misunderstood you completely so please let me know if I did.

Cheers,
Dave

From dbalmain.ml at gmail.com Wed Sep 20 13:32:55 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 21 Sep 2006 02:32:55 +0900
Subject: [Ferret-talk] Updating to the bleeding edge version of Ferret
Message-ID:

Hey guys,

It has occurred to me that a lot of people need some of the fixes I make to Ferret ASAP and don't like having to wait too long for the gem. On the other hand, it is a bit of a pain to download and install from subversion because then you need to uninstall when the next gem comes out. So I thought I may as well put some instructions out as to how you can build your own Ferret gem that will get overridden by the next official version to come out. Here goes:

    $ svn co svn://www.davebalmain.com/exp/ ferret
    $ cd ferret/ruby/

Now optionally run the tests to make sure I haven't checked any dodgy changes in:

    $ rake

Build the gem. REL should be the current release with .1 appended. If you do this a second time between releases, append .2 and so on. The current version is 0.10.5 so we'll build 0.10.5.1:

    $ rake package REL=0.10.5.1
    $ cd pkg
    $ ls -l
    drwxr-xr-x 5 dbalmain dbalmain   4096 2006-09-21 02:28 ferret-0.10.5.1
    -rw-r--r-- 1 dbalmain dbalmain 415744 2006-09-21 02:28 ferret-0.10.5.1.gem
    -rw-r--r-- 1 dbalmain dbalmain 233614 2006-09-21 02:28 ferret-0.10.5.1.tgz
    -rw-r--r-- 1 dbalmain dbalmain 281737 2006-09-21 02:28 ferret-0.10.5.1.zip
    $ sudo gem install ferret-0.10.5.1.gem

If you find any mistakes here please let me know.
I've also added it to the wiki here: http://ferret.davebalmain.com/trac/wiki/DownloadCurrent cheers, Dave From kraemer at webit.de Wed Sep 20 14:48:21 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 20 Sep 2006 20:48:21 +0200 Subject: [Ferret-talk] bad interaction of ferret (0.10.5) and mongrel on linux In-Reply-To: <370AF7C5-4359-47D7-A4B7-00DA25A53106@extension.org> References: <370AF7C5-4359-47D7-A4B7-00DA25A53106@extension.org> Message-ID: <20060920184821.GA11593@cordoba.webit.de> Hi ! On Wed, Sep 20, 2006 at 11:23:14AM -0400, Rafe Colburn wrote: > We have an application that uses Ferret and acts_as_ferret that we > just upgraded to Ferret 0.10.5 (from 0.9.5) and the corresponding > version of the acts_as_ferret. Everything works as expected on my > laptop, which is running mongrel 0.3.13.3. When I deploy the > application on our server, which is running some version of Red Hat > Linux, I can't get Ferret to work at all when running under mongrel > (again, version 0.3.13.3). Oddly, everything works fine if I open a > console and rebuild the index or run a search. > > When I access the site via the web and try to modify a document > that triggers indexing, I get the following error: > > A NoMethodError occurred in questions#update_faq: > > undefined method `new' for Ferret::Document:Module > [RAILS_ROOT]/vendor/plugins/acts_as_ferret/lib/instance_methods.rb: > 110:in `to_doc' > > When I try to run a search, I get no results at all even though the > documents are indexed. (I can run searches fine from the Rails console.) > > Anyone ever seen anything like this? I have no idea what it is > about the ferret/mongrel/Linux combination that's causing these > problems. is it possible that there is an old Ferret 0.9.x still lying around on your server ? imho in 0.9.x Ferret::Document was a module, and Ferret::Document::Document was the class. In 0.10.x Ferret::Document is a class. 
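One way to check which Ferret a long-running server process has actually loaded is to log whether `Ferret::Document` is a class or a module there. The helper below is a hypothetical sketch (the name `constant_kind` is not from Ferret or acts_as_ferret), demonstrated on core Ruby constants so that it runs without Ferret installed.

```ruby
# Hypothetical helper: report whether a constant is a class or a module.
# Class is checked first because every Class is also a Module.
def constant_kind(const)
  return "class" if const.is_a?(Class)
  return "module" if const.is_a?(Module)
  "other"
end

puts constant_kind(String)      # class
puts constant_kind(Comparable)  # module

# Inside the application (assumption: ferret has already been required
# there) one could log:
#   constant_kind(Ferret::Document)
```

Going by the description above, "module" would point at a 0.9.x library still being in use by that process and "class" at 0.10.x.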
I have different versions of Ferret and aff running with Mongrel without any problem. Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Wed Sep 20 14:49:01 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 20 Sep 2006 20:49:01 +0200 Subject: [Ferret-talk] acts_as_ferret limit on multi_search not working? In-Reply-To: References: Message-ID: <20060920184901.GB11593@cordoba.webit.de> On Wed, Sep 20, 2006 at 04:22:18PM +0200, David Wennergren wrote: > I'm using acts_as_ferret to do a query like this: > > Model1.multi_search("my query",[Model2,Model3], :limit => 2) > > No matter what number i set limit to I get 10 items in the resultset. Am > I doing something wrong? nothing, this is supposed to work. what version of Ferret/aaf do you use ? Jens -- webit! Gesellschaft f?r neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de Schnorrstra?e 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From rafe.colburn at extension.org Wed Sep 20 15:05:05 2006 From: rafe.colburn at extension.org (Rafe Colburn) Date: Wed, 20 Sep 2006 15:05:05 -0400 Subject: [Ferret-talk] bad interaction of ferret (0.10.5) and mongrel on linux In-Reply-To: <370AF7C5-4359-47D7-A4B7-00DA25A53106@extension.org> References: <370AF7C5-4359-47D7-A4B7-00DA25A53106@extension.org> Message-ID: <45EC8BBB-6C89-43AB-8182-54641988D0FC@extension.org> Ignore this question -- as it turns out Mongrel had not been restarted after the Gem was updated. --Rafe On Sep 20, 2006, at 11:23 AM, Rafe Colburn wrote: > We have an application that uses Ferret and acts_as_ferret that we > just upgraded to Ferret 0.10.5 (from 0.9.5) and the corresponding > version of the acts_as_ferret. Everything works as expected on my > laptop, which is running mongrel 0.3.13.3. 
When I deploy the > application on our server, which is running some version of Red Hat > Linux, I can't get Ferret to work at all when running under mongrel > (again, version 0.3.13.3). Oddly, everything works fine if I open a > console and rebuild the index or run a search. From Neville.Burnell at bmsoft.com.au Wed Sep 20 19:50:07 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 21 Sep 2006 09:50:07 +1000 Subject: [Ferret-talk] Understanding boost ? Message-ID: <126EC586577FD611A28E00A0C9A0375886EE7A@maui.bmsoft.com.au> Thanks Dave, Having boost seamingly absent from the explain calculation confused me, but your explanation of field_norm helps a lot. Neville -----Original Message----- From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of David Balmain Sent: Wednesday, 20 September 2006 8:22 PM To: ferret-talk at rubyforge.org Subject: Re: [Ferret-talk] Understanding boost ? On 9/20/06, Neville Burnell wrote: > Hi, > > I'm confused about managing field boosting ... > > I have set the :boost for the :name field in my docs to 10, via :boost > => 10 > > Then I performed a search for 'keith' over all fields via with > *:(keith*), expecting a doc with Keith in the :name field to come out > on top. But another doc with Keith mentioned in other fields > (:comments, > :address) scored higher. > > I viewed the explanation from the searcher, but it wasn't clear to me > why the boost wasn't pushing the :name = Keith document to the top. > > Any help on understanding field boosting and explain would be great. 
> > Regards > > Neville > > PS, the two explains are: > > Doc1: > 0.3352959 = product of: > 8.047102 = sum of: > 4.011141 = weight(comments: in > 4697), product of: > 0.5685414 = > query_weight(comments:), product of: > 28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + > (keith=115) = 117>) > 0.02014635 = query_norm > 7.055143 = > field_weight(comments: > in 4697), product of: > 1.0 = The sum of: > 1.0 = tf(term_freq(comments:keithex)=1)^1.0 > 28.22057 = idf(comments:<(keithex=1) + (keithb at zzzzzz.com=1) + > (keith=115) = 117>) > 0.25 = field_norm(field=comments, doc=4697) > 4.03596 = weight(address: in 4697), product of: > 0.4032613 = query_weight(address:), product of: > 20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>) > 0.02014635 = query_norm > 10.0083 = field_weight(address: in 4697), product > of: > 1.0 = The sum of: > 1.0 = tf(term_freq(address:keithex)=1)^1.0 > 20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>) > 0.5 = field_norm(field=address, doc=4697) > 0.04166667 = coord(2/48) > > > Doc2: > 0.2977623 = product of: > 14.29259 = weight(name: in 31416), product of: > 0.2028171 = query_weight(name:), product of: > 10.06719 = idf(name:<(keith=3) = 3>) > 0.02014635 = query_norm > 70.47034 = field_weight(name: in 31416), product of: > 1.0 = The sum of: > 1.0 = tf(term_freq(name:keith)=1)^1.0 > 10.06719 = idf(name:<(keith=3) = 3>) > 7.0 = field_norm(field=name, doc=31416) > 0.02083333 = coord(1/48) Hi Neville, The field's boost value affects the field_norm value in the Explanations above. Here is how it is calculated: field_norm = field_info->boost * doc->boost * field->boost * (1 / sqrt(field->num_terms)) So as you can see from the Explanations above, field_norm is 7.0 on the boosted field which is more than 10 times the field_norms on the other two fields (0.25, 0.5) so at least you can see the boost is having an effect.
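As a rough sketch of that formula in plain Ruby (the method name and the term counts below are assumptions picked for illustration; this is not Ferret's C implementation and it ignores any rounding that happens when the value is stored):

```ruby
# Sketch of the field_norm formula quoted above. All three boosts default
# to 1.0 in this illustration; num_terms is the number of terms in the field.
def field_norm(field_info_boost, doc_boost, field_boost, num_terms)
  field_info_boost * doc_boost * field_boost * (1.0 / Math.sqrt(num_terms))
end

puts field_norm(1.0, 1.0, 1.0, 16)   # unboosted 16-term field  -> 0.25
puts field_norm(1.0, 1.0, 1.0, 4)    # unboosted 4-term field   -> 0.5
puts field_norm(10.0, 1.0, 1.0, 4)   # same length, boost of 10 -> 5.0
```

The point is only that a longer field gets a smaller norm and that a boost scales the result linearly.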
The address field probably has a higher field_norm value than the comments field because the comments field is longer (see that last part of the field_norm equation). Note that the reason the boost is 7.0 and not 10.0 is that the field_norm gets stored in a single byte so there is quite a large loss of precision. Having said all this, there does seem to be a problem with the calculations. I don't think I've calculated the idf value correctly for MultiTermQueries. I've rectified this in subversion so the next version should give your results in an order that you'd expect. For information on tf and idf, check out this page: http://en.wikipedia.org/wiki/Tf-idf Hope that helps. I'd love to give a better explanation of the scoring but I don't have time right now. Cheers, Dave _______________________________________________ Ferret-talk mailing list Ferret-talk at rubyforge.org http://rubyforge.org/mailman/listinfo/ferret-talk From Neville.Burnell at bmsoft.com.au Wed Sep 20 20:25:14 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 21 Sep 2006 10:25:14 +1000 Subject: [Ferret-talk] Understanding boost ? Message-ID: <126EC586577FD611A28E00A0C9A0375886EE7E@maui.bmsoft.com.au> -----Original Message----- From: ferret-talk-bounces at rubyforge.org [mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of David Balmain Sent: Wednesday, 20 September 2006 9:50 PM To: ferret-talk at rubyforge.org Subject: Re: [Ferret-talk] Understanding boost ? On 9/20/06, Jens Kraemer wrote: > Hi! > > On Wed, Sep 20, 2006 at 03:40:03PM +1000, Neville Burnell wrote: > > Hi, > > > > I'm confused about managing field boosting ... > > > > I have set the :boost for the :name field in my docs to 10, via > > :boost => 10 > > > > Then I performed a search for 'keith' over all fields via with > > *:(keith*), expecting a doc with Keith in the :name field to come > > out on top. But another doc with Keith mentioned in other fields > > (:comments, > > :address) scored higher. 
> > > > I viewed the explanation from the searcher, but it wasn't clear to > > me why the boost wasn't pushing the :name = Keith document to the top. > > as you can see from the explanation, the score for both fields that > matched the query got summed up (8... = sum of:), if 'keith' only had > shown up in one field, the other document would have had the higher > score. > > I don't know of any methodology to determine the proper boost setting > for a field, imho it's just a question of experimenting with queries > and the results you expect. > > If you always want to have matches in the name ranked on the top, > regardless of how many times a term is mentioned in other parts of > your document, set the boost to 100 ;-) > > I don't know what the coord value is, though, maybe someone else can > step in here ? > > Jens The coord factor is the number of clauses in a BooleanQuery that matched over the number of clauses. It would seem that in the example, there were 48 clauses. When you submit a query over all fields (ie. "*:term") the query is rewritten as a boolean query with a clause for every field in your index. So it would seem that Neville has 48 fields in his index. Hope that makes sense, Dave PS: This might be a good time to mention that if you have an index with a lot of fields like this, it is probably worth thinking about what to set the :default_field and :all_fields parameters to. :all_fields is what "*:#{query}" expands to. It doesn't necessarily have to be all fields in the index. Usually you only want "*" to expand to all text fields, not actually all fields. For example, you'd probably want date fields to be excluded. And I've only just fixed this so it will work when you use a Ferret::Index::Index object. Previously the QueryParser had all fields in the index added to the :all_fields parameter. Now that only happens if :all_fields isn't set explicitly. 
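To see how coord feeds into the final score, here is a small plain-Ruby sketch using the numbers from the two Explanations earlier in the thread. The coord helper is a made-up name for illustration, not a Ferret method:

```ruby
# Sketch only: coord as described above, matched clauses over total clauses.
def coord(matched, total)
  matched.to_f / total
end

# Top-level score = (sum of the matching field weights) * coord
doc1 = 8.047102 * coord(2, 48)   # :comments and :address matched
doc2 = 14.29259 * coord(1, 48)   # only :name matched

puts doc1.round(4)   # 0.3353
puts doc2.round(4)   # 0.2978
```

With fewer fields in the expansion the denominator shrinks for every document, so the relative order depends on how many of the listed fields each document matches.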
_______________________________________________ Ferret-talk mailing list Ferret-talk at rubyforge.org http://rubyforge.org/mailman/listinfo/ferret-talk From Neville.Burnell at bmsoft.com.au Wed Sep 20 20:26:14 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 21 Sep 2006 10:26:14 +1000 Subject: [Ferret-talk] Understanding boost ? Message-ID: <126EC586577FD611A28E00A0C9A0375886EE7F@maui.bmsoft.com.au> >> it is probably worth thinking about what to >> set the :default_field and :all_fields parameters to. Hi Dave, Thanks for pointing this out. Neville -----Original Message----- PS: This might be a good time to mention that if you have an index with a lot of fields like this, it is probably worth thinking about what to set the :default_field and :all_fields parameters to. :all_fields is what "*:#{query}" expands to. It doesn't necessarily have to be all fields in the index. Usually you only want "*" to expand to all text fields, not actually all fields. For example, you'd probably want date fields to be excluded. And I've only just fixed this so it will work when you use a Ferret::Index::Index object. Previously the QueryParser had all fields in the index added to the :all_fields parameter. Now that only happens if :all_fields isn't set explicitly. _______________________________________________ Ferret-talk mailing list Ferret-talk at rubyforge.org http://rubyforge.org/mailman/listinfo/ferret-talk From Neville.Burnell at bmsoft.com.au Wed Sep 20 20:35:29 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 21 Sep 2006 10:35:29 +1000 Subject: [Ferret-talk] Understanding boost ? Message-ID: <126EC586577FD611A28E00A0C9A0375886EE80@maui.bmsoft.com.au> >> it would seem that Neville has 48 fields in his index. Yes, there are 48 fields. But around 17 fields are marked as :index => :no because they are only used as detail in the retrieved doc and not for indexing purposes. 
Shouldn't that affect both the coord factor and the :all_fields expansion ? Kind Regards Neville -----Original Message----- The coord factor is the number of clauses in a BooleanQuery that matched over the number of clauses. It would seem that in the example, there were 48 clauses. When you submit a query over all fields (ie. "*:term") the query is rewritten as a boolean query with a clause for every field in your index. So it would seem that Neville has 48 fields in his index. Hope that makes sense, Dave PS: This might be a good time to mention that if you have an index with a lot of fields like this, it is probably worth thinking about what to set the :default_field and :all_fields parameters to. :all_fields is what "*:#{query}" expands to. It doesn't necessarily have to be all fields in the index. Usually you only want "*" to expand to all text fields, not actually all fields. For example, you'd probably want date fields to be excluded. And I've only just fixed this so it will work when you use a Ferret::Index::Index object. Previously the QueryParser had all fields in the index added to the :all_fields parameter. Now that only happens if :all_fields isn't set explicitly. _______________________________________________ Ferret-talk mailing list Ferret-talk at rubyforge.org http://rubyforge.org/mailman/listinfo/ferret-talk From samuelgiffney at gmail.com Wed Sep 20 22:28:16 2006 From: samuelgiffney at gmail.com (Sam Giffney) Date: Thu, 21 Sep 2006 04:28:16 +0200 Subject: [Ferret-talk] Range searches some times they work, some times not... In-Reply-To: References: Message-ID: <2af4de9a4436f125451a523c6c4c53ac@ruby-forum.com> David Balmain wrote: > On 9/20/06, Clare wrote: > > You'll have to work out what is going on here yourself though. I have > no time for explanation. Note that this won't perform very well > compared to the padded field version because so much is going on in > the Ruby code. I could possibly be persuaded to implement this in C. 
> > Cheers, > Dave I've also implemented a geographic search using lucene/ferret. There are a couple of key points that helped me 'get it' - 1 - lucene does lexicographic, not numeric, search so to search on numbers you need to convert them to a string which works for lexicographic sort (usually by adding leading zeros or a fixed number of decimal places after the decimal point) [as pointed out by Dave above] 2 - a range search is actually converted into a boolean search internally (someone please correct me if I got that wrong) so doing a range search over massive ranges may be problematic by exceeding accepted query lengths. Then you start a trade off between accuracy (more decimal places) and speed. The way I got round it was to assume that for my purposes search only needed to be accurate to about 100m so formatting longitude/latitude to 3 decimal places would work fine (I live in a small country :) Sam -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 20 23:43:00 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 21 Sep 2006 12:43:00 +0900 Subject: [Ferret-talk] [ANN] Ferret-0.10.6 released Message-ID: Hey folks, I just released Ferret 0.10.6. In case you haven't noticed, we have gone beta so the API should be pretty solid now. From now on, backwards compatibility will be a major priority. Also, I'm going to try and do a better job of recording changes and making the release announcements regularly so people know what is going on. == Changes: * Fixed scoring of MultiTermQuery. MultiTermQueries with a high number of terms were being weighted too highly against other queries. * Fixed TUTORIAL so that it is current. On that note, please send in bug reports for documentation errors as well as the usual bug reports. Happy Ferreting Dave From dbalmain.ml at gmail.com Wed Sep 20 23:51:05 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 21 Sep 2006 12:51:05 +0900 Subject: [Ferret-talk] Understanding boost ?
In-Reply-To: <126EC586577FD611A28E00A0C9A0375886EE80@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A0375886EE80@maui.bmsoft.com.au> Message-ID: On 9/21/06, Neville Burnell wrote: > >> it would seem that Neville has 48 fields in his index. > > Yes, there are 48 fields. > > But around 17 fields are marked as :index => :no because they are only > used as detail in the retrieved doc and not for indexing purposes. > > Shouldn't that affect both the coord factor and the :all_fields > expansion ? > > Kind Regards > > Neville Yes, that's a good idea. It wouldn't be too much trouble to modify Index to only add indexed fields to the :all_fields value. I'll change this in the next release. Even with 17 fields though it may still be worth setting :all_fields up yourself, but I have no idea what is in those 17 fields so I may be wrong. By the way, the coord factor's divisor will always be equal to the size of :all_fields in this type of query so I just need to fix the setting of :all_fields. Cheers, Dave From dbalmain.ml at gmail.com Thu Sep 21 00:02:17 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 21 Sep 2006 13:02:17 +0900 Subject: [Ferret-talk] Ferret is now accepting donations Message-ID: Hey folks, Since this is the Ferret mailing list I should preface this by saying that the donation of time and help by people like Jens Kraemer, Jan Prill and all the other people who have contributed to Ferret is just as valuable to me as any financial assistance. Thanks. Ferret has been a labour of love but it has taken up a lot more of my life than I ever expected. At in excess of 50,000 lines of code, I believe it is one of the largest Ruby projects, especially with only a single developer. (previous version before rewrite had >70,000 LOC so added together that is a lot of work). I would love to keep pushing Ferret forward at the rate it has been going but other things are going to have to start taking priority (like putting food on the table).
If you find Ferret useful in your application and you aren't able to contribute with the development, please consider making a donation at the Ferret website: http://ferret.davebalmain.com/trac So where do I see Ferret going in the future? I'd really like to build an object-database based on Ferret, with ActiveRecord and Og bindings. Why?: * Fixes the current DRY problems with Ferret. ie, should you store data in the Ferret index to take advantage or highlighting? Or build your own highlighter so that the data isn't stored in two places. * Simplifies things. You'll be able to forget about IndexReaders, IndexWriters, file-locking, etcetera. Just create the database as you usually would and you have Ferret full-text search built in. * Range queries just work. No need to pad numbers or format dates correctly. * Sort just works. And it won't take forever to build the sort-index (currently a problem on very large indexes). * Performance, performance, performance. As people are often pointing out, the bottle neck in many applications falls in the data access layer. Mapping relational database schemas to Ruby objects (or any OO language for that matter) can be very expensive at run-time. A good object database should easily outperform even SQLite. (and I'm being very cautious here) Right now, I'd need to raise at least 5 figures before I'd consider this undertaking so please send some encouragement my way if you would be interested in something like this. Otherwise I'd appreciate any kind of contribution, financial or assistance with development. In the meantime I will continue to improve test coverage and Ferret documentation, fix bugs and help people on the Ferret mailing list. Happy Ferreting. Dave From dbalmain.ml at gmail.com Thu Sep 21 02:07:41 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 21 Sep 2006 15:07:41 +0900 Subject: [Ferret-talk] Range searches some times they work, some times not... 
In-Reply-To: <2af4de9a4436f125451a523c6c4c53ac@ruby-forum.com> References: <2af4de9a4436f125451a523c6c4c53ac@ruby-forum.com> Message-ID: On 9/21/06, Sam Giffney wrote: > 2 - a range search is actually converted into a boolean search > internally (someone please correct me if I got that wrong) so doing a > range search over massive ranges may be problematic by exceeding > accepted query lengths. Then you start a trade off between accuracy > (more decimal places) and speed. The way I got round it was to assume > that for my purposes search only needed to be accurate to about 100m so > formatting longitude/latitude to 3 decimal places would work fine (I > live in a small country :) This used to be correct, but it is no longer the case in either Ferret or Lucene (version 2.0). RangeQueries get reduced to ConstantScoreQueries which use a Filter. So Sam, you can now feel free to use RangeQueries with as large a Range as you like :-). WildcardQueries, FuzzyQueries and PrefixQueries do however get rewritten as BooleanQueries in Lucene and MultiTermQueries in Ferret so you do need to be careful when using these queries. Ferret's MultiTermQuery is a lot more efficient than a BooleanQuery for this task so it allows a lot more clauses than you could probably use efficiently in Lucene. Also, the query "*" gets rewritten as a MatchAllQuery so it is safe to use. Cheers, Dave From david.sheldon at torchbox.com Thu Sep 21 04:31:40 2006 From: david.sheldon at torchbox.com (David Sheldon) Date: Thu, 21 Sep 2006 10:31:40 +0200 Subject: [Ferret-talk] Dynamic fields and AAF In-Reply-To: References: <20060918165917.GH31018@cordoba.webit.de> <1be98078ddfc6b9bff12420560cb3ee1@ruby-forum.com> <20060920114005.GO31018@cordoba.webit.de> Message-ID: <6fd3934d01e5df9d8c09128f76360464@ruby-forum.com> David Balmain wrote: > So :tags has multiple values.
But you can't do this: > > doc = Ferret::Document.new > doc[:tag] = "ruby" > doc[:tag] = "rails" > doc[:tag] = "ferret" > > You should do this: > > doc[:tag] = ["ruby", "rails", "ferret"] That is exactly what I mean. And it looks like that is another way I can simplify my code with the new API. I can return an array from foo_for_ferret and have all the individual values counted. Previously I did basically networks.each { |net| doc << Field.new('network', net.name) } Thanks. David -- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Thu Sep 21 05:06:33 2006 From: jan.prill at gmail.com (Jan Prill) Date: Thu, 21 Sep 2006 11:06:33 +0200 Subject: [Ferret-talk] Ferret is now accepting donations In-Reply-To: References: Message-ID: <562a35c10609210206xf3ea7a1l467b40bca424eb3e@mail.gmail.com> Hey, this is great! I'll definitly hit the donation button myself in near future, but don't expect much since I'm a little away from having a job that really pays out right now. Anyway: Your plans for the future sound *very* interesting. I've been using lucene quite some time ago for the one and only persistence layer in a project and it did work out. An optimized ferret for these kinds of things would definitly be great. IMHO people - including myself - often have a sql layer in the back office mainly because of security reasons.. relational dbms are so long around and often you are relying on things simply because they have proven to work. On the other hand performance sensible apps, gameservers or telco servers with many concurrent users often can't rely on rdbms out of performance reasons. So this sounds really interesting from a practical point of view as well. Hopefully such companies will make a noticeable donation once they are aware that something like ferret exists and has future plans like the ones you've described. 
Being noticed seems to be very important, imho it needs a marketing genius like DHH to give an open source project an breakthrough like the one that rails has seen. Regarding the 'I need something on the table to eat' imho the best thing that could happen to an open source developer who loves his child project is that one of the big companys hires him and lets him work on what he loves. Seems as this has just happened to Charles Nutter et al who have been hired from sun because of their work on jruby. On the other hand many developers don't want to work for a big company because of the loss of freedom. Whatever you decide to do in the future there really should be some great opportunities (and donations) for an incredible developer who is pushing forward a project of the size of ferret in such a great pace. My personal opinion: If one is making money with the help of an open source project like ferret and there is an opportunity to donate, they should feel a need to give something back. May it be patches or money... Cheers, Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060921/598a10fe/attachment-0001.html From david.wennergren at gmail.com Thu Sep 21 05:58:30 2006 From: david.wennergren at gmail.com (David Wennergren) Date: Thu, 21 Sep 2006 11:58:30 +0200 Subject: [Ferret-talk] acts_as_ferret limit on multi_search not working? In-Reply-To: <20060920184901.GB11593@cordoba.webit.de> References: <20060920184901.GB11593@cordoba.webit.de> Message-ID: <97efd6d34b6fe3a8f444c287d835331f@ruby-forum.com> Jens Kraemer wrote: > On Wed, Sep 20, 2006 at 04:22:18PM +0200, David Wennergren wrote: >> I'm using acts_as_ferret to do a query like this: >> >> Model1.multi_search("my query",[Model2,Model3], :limit => 2) >> >> No matter what number i set limit to I get 10 items in the resultset. Am >> I doing something wrong? > > nothing, this is supposed to work. 
what version of Ferret/aaf do you > use ? > > Jens > > -- > webit! Gesellschaft f?r neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Kr?mer kraemer at webit.de > Schnorrstra?e 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 I'm using ferret 0.10.4 and aaf 0.3.0. I'll try to make it into a testcase so it easier to reproduce. This is my actual query: >> Pressrelease.multi_search("con*",[Event,Image],:limit => 2).size => 10 /david -- Posted via http://www.ruby-forum.com/. From miguel.wong at gmail.com Thu Sep 21 14:13:34 2006 From: miguel.wong at gmail.com (Miguel) Date: Thu, 21 Sep 2006 20:13:34 +0200 Subject: [Ferret-talk] EOF Error with Unit Tests Message-ID: <3f35515adc44e16be64c2d3b5854ba92@ruby-forum.com> I am getting this weird EOFError when running tests using rake When running unit tests one by one (test file by test file), this error does not pop up. Does anyone know what is happening? Thanks! -- Posted via http://www.ruby-forum.com/. From miguel.wong at gmail.com Thu Sep 21 14:15:39 2006 From: miguel.wong at gmail.com (Miguel) Date: Thu, 21 Sep 2006 20:15:39 +0200 Subject: [Ferret-talk] EOF Error with Unit Tests In-Reply-To: <3f35515adc44e16be64c2d3b5854ba92@ruby-forum.com> References: <3f35515adc44e16be64c2d3b5854ba92@ruby-forum.com> Message-ID: <0aadb024eeeafcfac48b26984fb76590@ruby-forum.com> Oh I forget to mention, obviously some tests include CRUD operations on the object that has been indexed in addition to using fixtures, I also have written some test helper methods (add_post() for example) Thanks. Miguel wrote: > I am getting this weird EOFError when running tests using rake > > When running unit tests one by one (test file by test file), this error > does not pop up. Does anyone know what is happening? > > Thanks! -- Posted via http://www.ruby-forum.com/. 
From sera at fhwang.net Thu Sep 21 16:18:02 2006 From: sera at fhwang.net (Francis Hwang) Date: Thu, 21 Sep 2006 16:18:02 -0400 Subject: [Ferret-talk] strange matching: maybe a multilanguage collation problem? Message-ID: <247D2FF6-4223-41B3-AB82-6FBAD03ED877@fhwang.net> Hi, We're using Ferret in a slightly unorthodox way: We're indexing a large (>100,000) list of names of places all around the world. Mostly we're quite happy with it, and have been able to graft on our own particular required functionality with just a little tweaking. There's one strange problem, though: We've got a place in Cyprus called "Gazima\304\237usa" (that \304\237 is a multibyte character in UTF-8), and it matches a search for "usa". We'd rather it not match. I don't know that much about Ferret or about this sort of indexing in general, but is this because Ferret views \304\237 as a word break, and splits the name into two words? If so, is there a way you'd recommend to get around this -- keeping in mind that we've got names in romanized forms of many different languages? Thanks in advance, Francis From samuelgiffney at gmail.com Thu Sep 21 20:19:08 2006 From: samuelgiffney at gmail.com (Sam Giffney) Date: Fri, 22 Sep 2006 02:19:08 +0200 Subject: [Ferret-talk] QueryParser bug? Message-ID: I cooked up a little script to show what I mean. This doesn't look right to me, but maybe I just completely misunderstand QueryParser. 
Same output on mswin32, unix, ferret 0.9 and 0.10 Cheers, Sam require 'rubygems' require 'ferret' p Ferret::VERSION # 0.10.6 index = Ferret::Index::Index.new() index << {:title => "Programming Ruby", :content => "yada yada yada"} puts index.search("ruby").total_hits # returns 1 query_parser = Ferret::QueryParser.new( :default_field => :title ) query = query_parser.parse("title:ruby") puts index.search(query).total_hits # returns 1 query_parser = Ferret::QueryParser.new( ) # :default_field: Default: "*" # The default field to search when no field is specified in the search string. query = query_parser.parse("ruby") puts index.search(query).total_hits # returns 0 -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Thu Sep 21 22:02:33 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 22 Sep 2006 11:02:33 +0900 Subject: [Ferret-talk] EOF Error with Unit Tests In-Reply-To: <0aadb024eeeafcfac48b26984fb76590@ruby-forum.com> References: <3f35515adc44e16be64c2d3b5854ba92@ruby-forum.com> <0aadb024eeeafcfac48b26984fb76590@ruby-forum.com> Message-ID: On 9/22/06, Miguel wrote: > Oh I forget to mention, obviously some tests include CRUD operations on > the object that has been indexed > > in addition to using fixtures, I also have written some test helper > methods (add_post() for example) > > Thanks. > > Miguel wrote: > > I am getting this weird EOFError when running tests using rake > > > > When running unit tests one by one (test file by test file), this error > > does not pop up. Does anyone know what is happening? > > > > Thanks! > Hi Miguel, A couple of questions will help us answer this. Are you on Windows? Is your application a Rails app? Are you using acts_as_ferret? The first thing I'd check is that you are closing your Index, IndexReader or IndexWriter when you are finished with it (i.e. in your test methods). Not doing this can possibly cause an EOFError.
Also, on Windows, I had a lot of trouble making sure files get deleted correctly, but I may have made a mistake somewhere. I hope we can help you out, Dave From dbalmain.ml at gmail.com Thu Sep 21 22:20:48 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 22 Sep 2006 11:20:48 +0900 Subject: [Ferret-talk] strange matching: maybe a multilanguage collation problem? In-Reply-To: <247D2FF6-4223-41B3-AB82-6FBAD03ED877@fhwang.net> References: <247D2FF6-4223-41B3-AB82-6FBAD03ED877@fhwang.net> Message-ID: On 9/22/06, Francis Hwang wrote: > Hi, > > We're using Ferret in a slightly unorthodox way: We're indexing a > large (>100,000) list of names of places all around the world. Mostly > we're quite happy with it, and have been able to graft on our own > particular required functionality with just a little tweaking. > > There's one strange problem, though: We've got a place in Cyprus > called "Gazima\304\237usa" (that \304\237 is a multibyte character in > UTF-8), and it matches a search for "usa". We'd rather it not match. > I don't know that much about Ferret or about this sort of indexing in > general, but is this because Ferret views \304\237 as a word break, > and splits the name into two words? If so, is there a way you'd > recommend to get around this -- keeping in mind that we've got names > in romanized forms of many different languages? > > Thanks in advance, > > Francis Hi Francis, It is because Ferret sees that as a word break. This must be either because you are using an ASCII Analyzer (which I doubt) or your locale isn't set to handle UTF-8. You can set your locale like this: ENV['LANG'] = 'en_US.utf8' Or use whatever locale your data is stored as. Let me know if that helps. Cheers, Dave PS: if not all your data is UTF-8 you may need to convert it. In that case you should check out Ruby's iconv standard library.
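To picture what goes wrong, here is a rough plain-Ruby sketch. It is not how Ferret's analyzers are actually written, and it assumes a current Ruby with encoding-aware strings (String#b and \p{L} are not available on 1.8). Splitting on ASCII letters over the raw bytes breaks the name at the two bytes of the multibyte character, while a character-aware scan keeps it whole:

```ruby
name = "Gazima\u011Fusa"   # the place name from the question, as UTF-8

# Byte-wise view: only ASCII letters count as word characters, so the two
# bytes \304\237 act as a separator and "usa" becomes its own token.
byte_tokens = name.b.scan(/[A-Za-z]+/)

# Character-wise view: any Unicode letter is a word character, so the
# name stays in one piece and a search for "usa" no longer matches it.
char_tokens = name.scan(/\p{L}+/)

p byte_tokens   # ["Gazima", "usa"]
p char_tokens.length   # 1
```

That is the difference the locale setting is meant to make on the Ferret side.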
From jk at jkraemer.net Tue Sep 19 14:38:32 2006 From: jk at jkraemer.net (Jens Kraemer) Date: Tue, 19 Sep 2006 20:38:32 +0200 Subject: [Ferret-talk] strange acts_as_ferret bug in my enviorment In-Reply-To: <45101D47.6070907@rubylicio.us> References: <45101D47.6070907@rubylicio.us> Message-ID: <20060919183832.GH17212@thunder.jkraemer.net> Bouncing this to the list since mail to the original sender doesn't seem to get through - hope he reads here... short intro to the problem - SQL queries get localized (',' instead of '.' as decimal separator) when having aaf in vendor/plugins... Hi! On Tue, Sep 19, 2006 at 06:39:35PM +0200, admin wrote: [..] > When I check activerecord logfiles I see that WITH acts_as_ferret > installed I have > --- > `unit_weight` = 50,0 > --- > > and Without acts_as_ferret I get: > --- > `unit_weight` = 50.0 > --- > > One has "," -- the other has ".". strange, I once had the same problem with strange Javascript errors, where some duration of an effect had a ',' instead of a '.'. I solved this by explicitly setting ENV['LC_NUMERIC'] = 'en_US.UTF-8' in environment.rb however I doubt acts_as_ferret is the problem, it might be Ferret in general. to check this, you could just require 'ferret' in environment.rb, with aaf removed from vendor/plugins. Ferret indeed does something with locales, it at least looks at the environment to determine what analyzer to use... What does your system environment look like, i.e. what gives calling `locale` in a terminal as the user your server runs with ? cheers, Jens -- Jens Kr?mer jk at jkraemer.net From shammond at patientslikeme.com Tue Sep 19 11:02:22 2006 From: shammond at patientslikeme.com (Steven Hammond) Date: Tue, 19 Sep 2006 11:02:22 -0400 Subject: [Ferret-talk] acts_as_ferret and Fuzzy Searching Message-ID: <4510067E.30204@patientslikeme.com> Hi there, I'd like to be able to tune the results of a Fuzzy search in a rails application. I've tried setting the following in my environment.rb file. 
Ferret::Search::FuzzyQuery.default_min_similarity = 0.75 Ferret::Search::FuzzyQuery.default_prefix_length = 2 When I go into the console, I can see those values as the default but when I run a search like Post.find_by_contents('word~') I always get the same results, no matter how I set the above variables. I can say Post.find_by_contents('word~0.75') and Post.find_by_contents('word~0.5') and get different results. Any help is appreciated. Thanks, Steve From ilya at fortehost.com Fri Sep 22 00:33:31 2006 From: ilya at fortehost.com (Ilya Grigorik) Date: Fri, 22 Sep 2006 06:33:31 +0200 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem Message-ID: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> Hey, Has anyone managed to bypass or fix the ferret's .dump method problem? When I include acts_as_ferret my whole rails app just blows up because of Ferret's .dump method. Ex: --- print "\t hello".dump >> "\t hello">Exit code: 0 --- --- require 'ferret' print "\t hello".dump >> " hello"(NUL char)>Exit code: 0 --- Essentially this breaks erb and thus Rails template engines dies. I've found topics on this issue before and best recommendation I've found is 'remove all tabs in your rhtml files'. Now, first off it fails even on non-tab characters in my case, so this is a no go... (Not to mention that this is hardly a solution) I'm running: ruby 1.8.4 (2006-04-14) [i386-mswin32] rails 1.1.6 ferret 0.10.4 (10.6 and all the rest all have same issue) -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri Sep 22 02:24:28 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 22 Sep 2006 15:24:28 +0900 Subject: [Ferret-talk] QueryParser bug? In-Reply-To: References: Message-ID: On 9/22/06, Sam Giffney wrote: > I cooked up a little script to show what I mean. This doesn't look right > to me, but maybe I just completely misunderstand QueryParser. 
> Same output on mswin32, unix, ferret 0.9 and 0.10 > Cheers, Sam > > require 'rubygems' > require 'ferret' > p Ferret::VERSION # 0.10.6 > > index = Ferret::Index::Index.new() > > index << {:title => "Programming Ruby", :content => "yada yada yada"} > puts index.search("ruby").total_hits # returns 1 > > query_parser = Ferret::QueryParser.new( :default_field => :title ) > query = query_parser.parse("title:ruby") > puts index.search(query).total_hits # returns 1 > > query_parser = Ferret::QueryParser.new( ) > # :default_field: Default: "*" > # The default field to search when no field is specified in the search > string. > query = query_parser.parse("ruby") > puts index.search(query).total_hits # returns 0 Hi Sam, The way "*" works is that it expands the query to search all fields specified by the :fields parameter. So you could also try this: query = query_parser.parse("*:ruby") puts index.search(query).total_hits # returns 0 Without knowing what fields are available the query parser wouldn't know how to expand the "*" field specifier. Note that you can print the query to see how it was parsed. require 'rubygems' require 'ferret' p Ferret::VERSION # 0.10.6 index = Ferret::Index::Index.new() index << {:title => "Programming Ruby", :content => "yada yada yada"} puts index.search("ruby").total_hits # returns 1 query_parser = Ferret::QueryParser.new( :default_field => :title ) query = query_parser.parse("title:ruby") puts index.search(query).total_hits # returns 1 query_parser = Ferret::QueryParser.new( ) # :default_field: Default: "*" # The default field to search when no field is specified in the search string. query = query_parser.parse("ruby") puts query.to_s puts index.search(query).total_hits # returns 0 query_parser = Ferret::QueryParser.new( :fields => [:title, :content]) query = query_parser.parse("ruby") puts query.to_s puts index.search(query).total_hits # returns 0 Hope that clears things up for you. 
Cheers, Dave From Neville.Burnell at bmsoft.com.au Fri Sep 22 02:30:30 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Fri, 22 Sep 2006 16:30:30 +1000 Subject: [Ferret-talk] Error with :create => true and existing index Message-ID: <126EC586577FD611A28E00A0C9A0375886EE96@maui.bmsoft.com.au> I implemented a "reindex" command which simply creates an IndexWriter with :create => true for a prexisting index. The "reindexing" seems to start out ok, with several thousand docs added, then Ferret throws an exception: IO Error occured: couldn't rename file "index\_0.tmp" to "index\_0.cfs": I guess that _0.cfs is held open by an IndexReader, so the :create is failing to delete it and hence the rename is failing. Kind Regards Neville From kraemer at webit.de Fri Sep 22 05:17:04 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 22 Sep 2006 11:17:04 +0200 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> Message-ID: <20060922091703.GA11602@cordoba.webit.de> Hi! please see http://rubyforge.org/tracker/?func=detail&aid=3837&group_id=12&atid=133 for extensive info on this issue. Apparently this doesn't have to do with Ferret itself, but with you using another ruby version than the one the Ferret binary extension was built against. I don't know, however, with which Ruby version the win32 gem is built. Jens On Fri, Sep 22, 2006 at 06:33:31AM +0200, Ilya Grigorik wrote: > Hey, > > Has anyone managed to bypass or fix the ferret's .dump method problem? > When I include acts_as_ferret my whole rails app just blows up because > of Ferret's .dump method. Ex: > > --- > print "\t hello".dump >> "\t hello">Exit code: 0 > --- > > --- > require 'ferret' > print "\t hello".dump >> " hello"(NUL char)>Exit code: 0 > --- > > Essentially this breaks erb and thus Rails template engines dies. 
I've > found topics on this issue before and best recommendation I've found is > 'remove all tabs in your rhtml files'. Now, first off it fails even on > non-tab characters in my case, so this is a no go... (Not to mention > that this is hardly a solution) > > I'm running: > ruby 1.8.4 (2006-04-14) [i386-mswin32] > rails 1.1.6 > ferret 0.10.4 (10.6 and all the rest all have same issue) > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From dbalmain.ml at gmail.com Fri Sep 22 05:55:47 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 22 Sep 2006 18:55:47 +0900 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> Message-ID: On 9/22/06, Ilya Grigorik wrote: > Hey, > > Has anyone managed to bypass or fix the ferret's .dump method problem? > When I include acts_as_ferret my whole rails app just blows up because > of Ferret's .dump method. Ex: > > --- > print "\t hello".dump >> "\t hello">Exit code: 0 > --- > > --- > require 'ferret' > print "\t hello".dump >> " hello"(NUL char)>Exit code: 0 > --- > > Essentially this breaks erb and thus Rails template engines dies. I've > found topics on this issue before and best recommendation I've found is > 'remove all tabs in your rhtml files'. Now, first off it fails even on > non-tab characters in my case, so this is a no go... 
(Not to mention > that this is hardly a solution) > > I'm running: > ruby 1.8.4 (2006-04-14) [i386-mswin32] > rails 1.1.6 > ferret 0.10.4 (10.6 and all the rest all have same issue) Hi Ilya, Firstly, String#dump isn't a Ferret method. Secondly, I have no idea why this is occurring. The error isn't happening in Ferret code so it is difficult for me to diagnose the problem. I didn't realize that it was this easy to replicate the problem though so I'll have another look at it tonight. Don't get your hopes up though. It isn't only Ferret that is causing this problem so I'm not going to count out the possibility that the bug is in Ruby or VC6. Cheers, Dave From dbalmain.ml at gmail.com Fri Sep 22 06:46:48 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 22 Sep 2006 19:46:48 +0900 Subject: [Ferret-talk] Error with :create => true and existing index In-Reply-To: <126EC586577FD611A28E00A0C9A0375886EE96@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A0375886EE96@maui.bmsoft.com.au> Message-ID: On 9/22/06, Neville Burnell wrote: > I implemented a "reindex" command which simply creates an IndexWriter > with :create => true for a prexisting index. > > The "reindexing" seems to start out ok, with several thousand docs > added, then Ferret throws an exception: > > IO Error occured: couldn't rename file "index\_0.tmp" to "index\_0.cfs": > > > I guess that _0.cfs is held open by an IndexReader, so the :create is > failing to delete it and hence the rename is failing. > > Kind Regards > > Neville > Hi Neville, Probably a good guess. That is why you need to close your IndexReader when you finish with it. I'm not sure if you are asking for a confirmation that this was the problem or looking for a solution. Let me know if closing all IndexReaders doesn't fix the error. 
Cheers, Dave From alex at blackkettle.org Fri Sep 22 08:25:56 2006 From: alex at blackkettle.org (Alex Young) Date: Fri, 22 Sep 2006 13:25:56 +0100 Subject: [Ferret-talk] IOError on clearing locks Message-ID: <4513D654.4080909@blackkettle.org> Hi all, I've got a slight problem with using Ferret in unit tests. In order to create as little cross-contamination between test suites as possible, some of my tests are creating a fresh index per test case, and then calling Index#close and deleting the containing dir during the teardown. The problem comes when GC.start kicks in after the deleting the directory: IOError: IO Error occured at :79 in xraise Error occured in fs_store.c:146 - fs_clear_locks clearing locks in persistence_path/00000000001: The persistence_path/ directory is the one that was File.rm_r'd. How can I stop this from happening? Is it a bug, or have I messed something up? This worked with 0.9.5, but doesn't now that I've updated to 0.10.6. Any clues? -- Alex From bk at benjaminkrause.com Fri Sep 22 09:36:04 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Fri, 22 Sep 2006 15:36:04 +0200 Subject: [Ferret-talk] Query Objects vs. Query Strings Message-ID: <4513E6C4.1060905@benjaminkrause.com> Hi .. I tried to build some query objects to get some documents from my index.. without success.. Is something wrong here? q = Ferret::Search::BooleanQuery.new q1 = Ferret::Search::TermQuery.new(:type, "movie") q2 = Ferret::Search::TermQuery.new(:name, "Indiana") q.add_query(q1, :should) q.add_query(q2, :should) Indexer.index.search_each(q) do |doc, score| puts doc end 0 Indexer.index.search_each(q.to_s) do |doc, score| puts doc end 70 65 68 5368 197 => 5 Ben From bk at benjaminkrause.com Fri Sep 22 09:48:16 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Fri, 22 Sep 2006 15:48:16 +0200 Subject: [Ferret-talk] Searching untokenized fields Message-ID: <4513E9A0.4050406@benjaminkrause.com> Hi .. 
I tried to exclude certain objects from my search, by adding appropriate term queries .. i = Ferret::Index::Index.new i.field_infos.add_field(:type, :index => :untokenized, :term_vector => :no) i << {:type => "Movie", :name => "Indiana" } i << {:type => "Movie", :name => "Forrest" } i << {:type => "People", :name => "Forrest" } now searching for forrest should give 2 results.. >> i.search_each("forrest") do end => 2 now i would like to exclude the movie, so i tried to do this: >> i.search_each("forrest AND NOT type:movie") do end => 2 >> i.search_each("forrest AND NOT type:Movie") do end => 2 So how to exclude objects with a certain untokenized value in it? having the field tokenized works great .. Ben From dbalmain.ml at gmail.com Fri Sep 22 11:52:44 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 23 Sep 2006 00:52:44 +0900 Subject: [Ferret-talk] IOError on clearing locks In-Reply-To: <4513D654.4080909@blackkettle.org> References: <4513D654.4080909@blackkettle.org> Message-ID: On 9/22/06, Alex Young wrote: > Hi all, > > I've got a slight problem with using Ferret in unit tests. In order to > create as little cross-contamination between test suites as possible, > some of my tests are creating a fresh index per test case, and then > calling Index#close and deleting the containing dir during the teardown. > The problem comes when GC.start kicks in after the deleting the directory: > > IOError: IO Error occured at :79 in xraise > Error occured in fs_store.c:146 - fs_clear_locks > clearing locks in persistence_path/00000000001: or directory> > > The persistence_path/ directory is the one that was File.rm_r'd. How > can I stop this from happening? Is it a bug, or have I messed something > up? This worked with 0.9.5, but doesn't now that I've updated to > 0.10.6. Any clues? > > -- > Alex > Hi Alex, This is a bug which I'm fixing right now. If you open any FSDirectories then you must close them too before you rm_f the index dir. 
Unfortunately FSDirectory#close doesn't currently work and the Index class doesn't call it either so try 0.10.7 when I release it. Cheers, Dave From dbalmain.ml at gmail.com Fri Sep 22 13:11:12 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 23 Sep 2006 02:11:12 +0900 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: <20060922091703.GA11602@cordoba.webit.de> References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> <20060922091703.GA11602@cordoba.webit.de> Message-ID: On 9/22/06, Jens Kraemer wrote: > Hi! > > please see > http://rubyforge.org/tracker/?func=detail&aid=3837&group_id=12&atid=133 > for extensive info on this issue. > > Apparently this doesn't have to do with Ferret itself, but with you using > another ruby version than the one the Ferret binary extension was built > against. I don't know, however, with which Ruby version the win32 gem is > built. > > > Jens After looking into this myself I think Tim must be right, although for some reason I'm having trouble replicating the error here, even when I do install different versions of Ruby. Anyway, I will endeavor to always build Ferret with the latest stable version of One-Click Ruby. Currently that is 1.8.4-20 stable. If you are still running into this problem and you have that version of Ruby installed then let me know. Make sure you try restarting your computer first though. I think that may be why I couldn't replicate the problem. Cheers, Dave From dbalmain.ml at gmail.com Fri Sep 22 14:10:35 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 23 Sep 2006 03:10:35 +0900 Subject: [Ferret-talk] Query Objects vs. Query Strings In-Reply-To: <4513E6C4.1060905@benjaminkrause.com> References: <4513E6C4.1060905@benjaminkrause.com> Message-ID: You need to downcase Indiana. The QueryParser does it for you but when you build the Query object yourself you need to make sure the terms are downcased as if they had already been through the Analyzer. 
Similarly, if you use a StopFilter then you shouldn't add "the" (or "das" or whatever is a stop-word in the language you are working with) terms to your queries. That's why the QueryParser is so useful. It handles all of this for you. :-) On 9/22/06, Benjamin Krause wrote: > Hi .. > > I tried to build some query objects to get some documents from my > index.. without success.. Is something wrong here? > > q = Ferret::Search::BooleanQuery.new > q1 = Ferret::Search::TermQuery.new(:type, "movie") > q2 = Ferret::Search::TermQuery.new(:name, "Indiana") > q.add_query(q1, :should) > q.add_query(q2, :should) > > Indexer.index.search_each(q) do |doc, score| puts doc end > 0 > Indexer.index.search_each(q.to_s) do |doc, score| puts doc end > 70 > 65 > 68 > 5368 > 197 > => 5 > > Ben > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From dbalmain.ml at gmail.com Fri Sep 22 14:17:01 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 23 Sep 2006 03:17:01 +0900 Subject: [Ferret-talk] Searching untokenized fields In-Reply-To: <4513E9A0.4050406@benjaminkrause.com> References: <4513E9A0.4050406@benjaminkrause.com> Message-ID: On 9/22/06, Benjamin Krause wrote: > Hi .. > > I tried to exclude certain objects from my search, by adding appropriate > term queries .. > > i = Ferret::Index::Index.new > i.field_infos.add_field(:type, :index => :untokenized, :term_vector => :no) > i << {:type => "Movie", :name => "Indiana" } > i << {:type => "Movie", :name => "Forrest" } > i << {:type => "People", :name => "Forrest" } > > now searching for forrest should give 2 results.. 
> > >> i.search_each("forrest") do end > => 2 > > now i would like to exclude the movie, so i tried to do this: > > >> i.search_each("forrest AND NOT type:movie") do end > => 2 > >> i.search_each("forrest AND NOT type:Movie") do end > => 2 > > So how to exclude objects with a certain untokenized value in it? having > the field tokenized works great .. > > Ben This is happening because in this case the QueryParser is downcasing "Movie", even though it remained in uppercase in the index. You need to give the QueryParser a specialized Analyzer that won't analyze untokenized fields. This will be automated in the next version of Ferret: 0.10.7 for the Index class. From sera at fhwang.net Fri Sep 22 17:30:23 2006 From: sera at fhwang.net (Francis Hwang) Date: Fri, 22 Sep 2006 17:30:23 -0400 Subject: [Ferret-talk] strange matching: maybe a multilanguage collation problem? In-Reply-To: References: <247D2FF6-4223-41B3-AB82-6FBAD03ED877@fhwang.net> Message-ID: <4DF6098D-7665-45E1-9C57-3AE738965EB3@fhwang.net> On Sep 21, 2006, at 10:20 PM, David Balmain wrote: > On 9/22/06, Francis Hwang wrote: >> Hi, >> >> We're using Ferret in a slightly unorthodox way: We're indexing a >> large (>100,000) list of names of places all around the world. Mostly >> we're quite happy with it, and have been able to graft on our own >> particular required functionality with just a little tweaking. >> >> There's one strange problem, though: We've got a place in Cyprus >> called "Gazima\304\237usa" (that \304\237 is a multibyte character in >> UTF-8), and it matches a search for "usa". We'd rather it not match. >> I don't know that much about Ferret or about this sort of indexing in >> general, but is this because Ferret views \304\237 as a word break, >> and splits the name into two words? If so, is there a way you'd >> recommend to get around this -- keeping in mind that we've got names >> in romanized forms of many different languages? 
>> >> Thanks in advance, >> >> Francis > > Hi Francis, > > It is because Ferret sees that as a word break. This must be either > because you are using an ASCII Analzyer (which I doubt) or your locale > isn't set to handle UTF-8. You can set your locale like this: > > ENV['LANG'] = 'en_US.utf8' > > Or use whatever locale your data is stored as. Let me know if that > helps. > > Cheers, > Dave > > PS: if not all your data is UTF-8 you may need to convert it. In that > case you should check out the Ruby's iconv standard library. I tried that and it made no difference. The data is in UTF-8 already. And as far as the analyzer, we're just using the StandardAnalyzer. (I actually don't know much about what all the different analyzers do, at any rate.) Any other ideas? Francis From bk at benjaminkrause.com Fri Sep 22 17:40:28 2006 From: bk at benjaminkrause.com (Benjamin Krause) Date: Fri, 22 Sep 2006 23:40:28 +0200 Subject: [Ferret-talk] Searching untokenized fields In-Reply-To: References: <4513E9A0.4050406@benjaminkrause.com> Message-ID: <4514584C.6010008@benjaminkrause.com> > This is happening because in this case the QueryParser is downcasing > "Movie", even though it remained in uppercase in the index. You need > to give the QueryParser a specialized Analyzer that won't analyze > untokenized fields. This is will be automated in the next version of > Ferret: 0.10.7 for the Index class. great.. thanks :-) From alex at blackkettle.org Fri Sep 22 17:52:38 2006 From: alex at blackkettle.org (Alex Young) Date: Fri, 22 Sep 2006 22:52:38 +0100 Subject: [Ferret-talk] IOError on clearing locks In-Reply-To: References: <4513D654.4080909@blackkettle.org> Message-ID: <45145B26.3030800@blackkettle.org> David Balmain wrote: > On 9/22/06, Alex Young wrote: >> Hi all, >> >> I've got a slight problem with using Ferret in unit tests. 
In order to >> create as little cross-contamination between test suites as possible, >> some of my tests are creating a fresh index per test case, and then >> calling Index#close and deleting the containing dir during the teardown. >> The problem comes when GC.start kicks in after the deleting the directory: >> >> IOError: IO Error occured at :79 in xraise >> Error occured in fs_store.c:146 - fs_clear_locks >> clearing locks in persistence_path/00000000001: > or directory> >> >> The persistence_path/ directory is the one that was File.rm_r'd. How >> can I stop this from happening? Is it a bug, or have I messed something >> up? This worked with 0.9.5, but doesn't now that I've updated to >> 0.10.6. Any clues? >> >> -- >> Alex >> > > Hi Alex, > > This is a bug which I'm fixing right now. If you open any > FSDirectories then you must close them too before you rm_f the index > dir. Unfortunately FSDirectory#close doesn't currently work and the > Index class doesn't call it either so try 0.10.7 when I release it. Ah, ok. It's not actually affecting live code, it just makes a mess of my tests as it stands. Keep up the good work :-) -- Alex From ilya at fortehost.com Fri Sep 22 21:10:47 2006 From: ilya at fortehost.com (Ilya Grigorik) Date: Sat, 23 Sep 2006 03:10:47 +0200 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> <20060922091703.GA11602@cordoba.webit.de> Message-ID: Dave and Tim.. I tried running all ms-win versions of ferret on 1.8.5 / 1.8.4-20 and even 1.8.2, all of them resulted in the same error. It's absurd, I don't understand why you guys are unable to reproduce this error. (I did restart and all that jazz... with each reinstall of ruby etc). Mind you, I only tried this on my main development machine so I will see if I can reproduce this on my laptop in a second. 
From my readings on this problem, RMagick seems to have had the same bug because it overwrote the default .dump method. Hence I assumed that Ferret is doing the same. I'll let you guys know on the results of my experiment.. Ilya -- Posted via http://www.ruby-forum.com/. From ilya at fortehost.com Fri Sep 22 21:31:51 2006 From: ilya at fortehost.com (Ilya Grigorik) Date: Sat, 23 Sep 2006 03:31:51 +0200 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> <20060922091703.GA11602@cordoba.webit.de> Message-ID: <6141455475abb0854179ef5af1b09af4@ruby-forum.com> Same problem, 1.8.4-20 stable... C:\Documents and Settings\Ilya>irb irb(main):001:0> "\t\tabcd".dump => "\"\\t\\tabcd\"" irb(main):002:0> require 'ferret' => true irb(main):003:0> "\t\tabcd".dump => "\" abcd\"\000\000" irb(main):004:0> Null chars, spaces.. Oi. Am I missing something something in my install? I'm just installing the one-click ruby and then 'gem install ferret'. Any dependencies I'm not aware of? Ilya -- Posted via http://www.ruby-forum.com/. From Neville.Burnell at bmsoft.com.au Fri Sep 22 22:05:04 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Sat, 23 Sep 2006 12:05:04 +1000 Subject: [Ferret-talk] Error with :create => true and existing index Message-ID: <126EC586577FD611A28E00A0C9A0375886EE9B@maui.bmsoft.com.au> > I'm not sure if you are asking for a confirmation that this was the > problem or looking for a solution. I guess I am looking for a solution to rebuilding the index while readers are active. For the time being I could simply delete all docs, then add all docs. Would a IndexWriter.delete_all be possible? Kind Regards Neville From dbalmain.ml at gmail.com Sat Sep 23 00:56:23 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 23 Sep 2006 13:56:23 +0900 Subject: [Ferret-talk] strange matching: maybe a multilanguage collation problem? 
In-Reply-To: <4DF6098D-7665-45E1-9C57-3AE738965EB3@fhwang.net> References: <247D2FF6-4223-41B3-AB82-6FBAD03ED877@fhwang.net> <4DF6098D-7665-45E1-9C57-3AE738965EB3@fhwang.net> Message-ID: On 9/23/06, Francis Hwang wrote: > On Sep 21, 2006, at 10:20 PM, David Balmain wrote: > > > On 9/22/06, Francis Hwang wrote: > >> Hi, > >> > >> We're using Ferret in a slightly unorthodox way: We're indexing a > >> large (>100,000) list of names of places all around the world. Mostly > >> we're quite happy with it, and have been able to graft on our own > >> particular required functionality with just a little tweaking. > >> > >> There's one strange problem, though: We've got a place in Cyprus > >> called "Gazima\304\237usa" (that \304\237 is a multibyte character in > >> UTF-8), and it matches a search for "usa". We'd rather it not match. > >> I don't know that much about Ferret or about this sort of indexing in > >> general, but is this because Ferret views \304\237 as a word break, > >> and splits the name into two words? If so, is there a way you'd > >> recommend to get around this -- keeping in mind that we've got names > >> in romanized forms of many different languages? > >> > >> Thanks in advance, > >> > >> Francis > > > > Hi Francis, > > > > It is because Ferret sees that as a word break. This must be either > > because you are using an ASCII Analzyer (which I doubt) or your locale > > isn't set to handle UTF-8. You can set your locale like this: > > > > ENV['LANG'] = 'en_US.utf8' > > > > Or use whatever locale your data is stored as. Let me know if that > > helps. > > > > Cheers, > > Dave > > > > PS: if not all your data is UTF-8 you may need to convert it. In that > > case you should check out the Ruby's iconv standard library. > > I tried that and it made no difference. The data is in UTF-8 already. > And as far as the analyzer, we're just using the StandardAnalyzer. (I > actually don't know much about what all the different analyzers do, > at any rate.) 
Any other ideas? > > Francis Hi Francis, I don't really have any other ideas. Did you re-index the data after you set ENV["LANG"]? Could you try this code and tell me what you get; require 'rubygems' require 'ferret' p Ferret::VERSION # 0.10.6 p Ferret::locale # "en_US.UTF-8" index = Ferret::I.new() index << {:place => "Gazima\304\237usa"} index << {:place => "U.S.A."} puts "Search: USA" index.search_each("USA") {|id, score| puts index[id][:place]} # Search: USA # U.S.A. puts "Search: Gazima\304\237usa" index.search_each("Gazima\304\237usa") {|id, score| puts index[id][:place]} # Search: Gazima?usa # Gazima?usa Cheers, Dave From dbalmain.ml at gmail.com Sat Sep 23 02:47:08 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 23 Sep 2006 15:47:08 +0900 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> <20060922091703.GA11602@cordoba.webit.de> Message-ID: On 9/23/06, Ilya Grigorik wrote: > Dave and Tim.. > > I tried running all ms-win versions of ferret on 1.8.5 / 1.8.4-20 and > even 1.8.2, all of them resulted in the same error. It's absurd, I don't > understand why you guys are unable to reproduce this error. (I did > restart and all that jazz... with each reinstall of ruby etc). > > Mind you, I only tried this on my main development machine so I will see > if I can reproduce this on my laptop in a second. From my readings on > this problem, RMagick seems to have had the same bug because it > overwrote the default .dump method. Hence I assumed that Ferret is doing > the same. Hi Ilya, Actually, if you look at that link that Jens posted, RMagick doesn't overwrite the String#dump method; "Of course RMagick doesn't intentionally alter String.dump, and your example doesn't even use RMagick for anything." And neither does Ferret, although I can certainly see why people would think it did. Believe me, I understand your frustration. If I could find the problem I would. 
I'll give it a try on my laptop and see if I can reproduce the problem there. Perhaps you could try asking on the One-Click Ruby list or on the ruby-talk list. I'll keep trying to find the problem at this end too. Cheers, Dave From dbalmain.ml at gmail.com Sat Sep 23 02:54:27 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 23 Sep 2006 15:54:27 +0900 Subject: [Ferret-talk] Error with :create => true and existing index In-Reply-To: <126EC586577FD611A28E00A0C9A0375886EE9B@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A0375886EE9B@maui.bmsoft.com.au> Message-ID: On 9/23/06, Neville Burnell wrote: > > I'm not sure if you are asking for a confirmation that this was the > > problem or looking for a solution. > > I guess I am looking for a solution to rebuilding the index while > readers are active. > > For the time being I could simply delete all docs, then add all docs. > > Would a IndexWriter.delete_all be possible? > > Kind Regards > > Neville Hi Neville, You could do it like this: reader.max_doc.times {|i| reader.delete(i)} That will delete all documents. Make sure you close your IndexWriter before doing this though as the IndexReader will try and obtain locks on the index for the deletions. Alternatively you could use a rotating index directory. When you rebuild, rebuild in a new directory. Then when you have finished rebuilding, re-open the IndexReaders on the new directory. Both solutions should work equally well. Cheers, Dave From ilya at fortehost.com Sat Sep 23 12:29:09 2006 From: ilya at fortehost.com (Ilya Grigorik) Date: Sat, 23 Sep 2006 18:29:09 +0200 Subject: [Ferret-talk] Win XP / Ferret & Acts_as_ferret .dump problem In-Reply-To: References: <0dd7a17d3f24fa5d2093e07b4d3d7f18@ruby-forum.com> <20060922091703.GA11602@cordoba.webit.de> Message-ID: <426978784cdd2e7bb1d6e0b31e296a8e@ruby-forum.com> Thanks Dave.. I've finally managed to find a version of Ruby (1.8.2-15) which passes the .dump test above. 
Using: http://rubyforge.org/frs/?group_id=167 Now, the headache is.. I can't find an RMagick gem to go with it (which I need for my Rails app) - the rmagick repo only has gems for 1.8.4 and 1.8.5. Sigh. :) I know this won't be a problem when I deploy my app, but it totally screws up my ability to development an app on XP. :) Ilya -- Posted via http://www.ruby-forum.com/. From wmorgan-ferret at masanjin.net Sat Sep 23 17:22:52 2006 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Sat, 23 Sep 2006 14:22:52 -0700 Subject: [Ferret-talk] TermQuery problem Message-ID: <20060923212252.GA3808@masanjin.net> Hi, Using the 0.10.4 gem under ruby 1.8.5 (2006-08-25) [i686-linux], I get different results with a TermQuery and a search string. Namely, using a search string seems to always work whereas using a TermQuery often doesn't return any entries. For example: > x=@i[450][:message_id] => "9e7db9110509070759732b21c4 at mail.gmail.com" > @i.search("message_id:#{x}") => #], max_score=6.51688194274902> > @i.search(Ferret::Search::TermQuery.new(:message_id, x)) => # But sometimes it works fine: > x=@i[123][:message_id] => "c715e64050831145815d9262c at mail.gmail.com" > @i.search("message_id:#{x}") => #], max_score=7.21260595321655> > @i.search(Ferret::Search::TermQuery.new(:message_id, x)) => #], max_score=7.21260595321655> So how come the first search doesn't return anything? FWIW, I am creating the index like this: field_infos = Ferret::Index::FieldInfos.new :store => :yes field_infos.add_field :message_id # ... field_infos.create_index dir @i = Ferret::Index::Index.new(:path => dir) Thanks for any help! 
-- William From wmorgan-ferret at masanjin.net Sat Sep 23 17:32:30 2006 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Sat, 23 Sep 2006 14:32:30 -0700 Subject: [Ferret-talk] svn problems Message-ID: <20060923213230.GB3808@masanjin.net> I can consistently segfault the 0.10.4 gem, so I'm trying to get the subversion version working with hopes towards tracking the problem down. I have a fresh SVN checkout but: a) the version (in ferret.rb) claims to be 0.9.6; and b) Ferret::Index::FieldInfos and a couple other classes are missing at run time. It looks like this is because they're not exported in the C extension (although I do see the corresponding C objects in the code.) Have I managed to acquire some outdated version of Ferret? Thanks for any help! -- William From dbalmain.ml at gmail.com Sat Sep 23 23:52:25 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 24 Sep 2006 12:52:25 +0900 Subject: [Ferret-talk] svn problems In-Reply-To: <20060923213230.GB3808@masanjin.net> References: <20060923213230.GB3808@masanjin.net> Message-ID: On 9/24/06, William Morgan wrote: > I can consistently segfault the 0.10.4 gem, so I'm trying to get the > subversion version working with hopes towards tracking the problem down. > > I have a fresh SVN checkout but: > > a) the version (in ferret.rb) claims to be 0.9.6; and > b) Ferret::Index::FieldInfos and a couple other classes are missing at > run time. It looks like this is because they're not exported in the C > extension (although I do see the corresponding C objects in the > code.) > > Have I managed to acquire some outdated version of Ferret? > > Thanks for any help! Hi William, The 0.10.* series was developed in a different subversion repository. You can check it out from: $ svn co svn://www.davebalmain.com/exp ferret If I have time today I might roll it into the original repository. I'm not sure exactly how I'm going to do it though. 
By the way, the 0.10.7 gem is out and it has all changes in it, including the fix for your TermQuery problem.

Cheers,
Dave

From dbalmain.ml at gmail.com Sun Sep 24 00:53:12 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sun, 24 Sep 2006 13:53:12 +0900
Subject: [Ferret-talk] [ANN] Ferret 0.10.7 released
Message-ID:

Hey guys,

I've just released Ferret 0.10.7. It is still in beta but we are getting closer and closer to a 1.0 release. The main change in this release is better handling of fields by the QueryParser. You can now give the QueryParser a list of fields that are tokenized so that only those fields will be analyzed in the QueryParser. This means that you can search untokenized fields for terms with spaces in them. For example:

results = index.search('title:Shawshank\ Redemption')

If :title is an untokenized field, this query will be parsed as a single TermQuery. Previously the search would have failed.

I've also changed the StandardTokenizer behaviour so that it will handle email addresses that start with numbers. This also means that 200km will be parsed as a single term instead of separate terms. Whether this is a good thing or not is a matter of taste. You can easily use a filter to split these terms up if you need to.

There are a lot of other bug fixes as well so Ferret should be a little more stable.

Happy Ferreting,
Dave

From wmorgan-ferret at masanjin.net Sun Sep 24 15:20:21 2006
From: wmorgan-ferret at masanjin.net (William Morgan)
Date: Sun, 24 Sep 2006 12:20:21 -0700
Subject: [Ferret-talk] svn problems
In-Reply-To:
References: <20060923213230.GB3808@masanjin.net>
Message-ID: <20060924192021.GB16880@masanjin.net>

Excerpts from David Balmain's mail of 23 Sep 2006 (PDT):
> The 0.10.* series was developed in a different subversion repository.
> You can check it out from:
>
> $ svn co svn://www.davebalmain.com/exp ferret

Thanks! See patch in following message.
> By the way, the 0.10.7 gem is out and it has all changes in it, > including the fix for your TermQuery problem. Sadly it doesn't seem to fix the problem, but I'll spend some more time playing around now that I have the updated source. -- William From wmorgan-ferret at masanjin.net Sun Sep 24 18:44:34 2006 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Sun, 24 Sep 2006 15:44:34 -0700 Subject: [Ferret-talk] [patch] buffer overflow in q_parser.y Message-ID: <20060924224434.GD16880@masanjin.net> Hi Dave, The patch below corrects a buffer overflow bug in q_parser.y. Since it is triggered by excessively long query strings, I believe that this bug could be exploited to allow arbitrary code execution if a query string supplied by a user is passed in directly to Ferret and not truncatated. If I'm right, you should consider a new release asap. I've fixed it to simply allocate more memory if the standard buffers aren't enough (because I had some long (i.e. > 255) query strings that I needed to support), but there are other solutions as well. Index: c/include/search.h =================================================================== --- c/include/search.h (revision 615) +++ c/include/search.h (working copy) @@ -819,6 +819,7 @@ char *qstr; char *qstrp; char buf[QP_CONC_WORDS][MAX_WORD_SIZE]; + char *dynbuf; int buf_index; HashTable *field_cache; HashSet *fields; Index: c/src/q_parser.y =================================================================== --- c/src/q_parser.y (revision 615) +++ c/src/q_parser.y (working copy) @@ -173,6 +173,11 @@ char *bufp = buf; qp->buf_index = (qp->buf_index + 1) % QP_CONC_WORDS; + if (qp->dynbuf) { + free(qp->dynbuf); + qp->dynbuf = NULL; + } + qp->qstrp--; /* need to back up one character */ while (!strchr(not_word, (c=*qp->qstrp++))) { @@ -192,6 +197,15 @@ default: *bufp++ = c; } + /* we've exceeded the static buffer. switch to the dynamic + one. 
*/ + + if (!qp->dynbuf && ((bufp - buf) == MAX_WORD_SIZE)) { + qp->dynbuf = calloc(strlen(qp->qstr) + 1, sizeof(char)); + strncpy(qp->dynbuf, buf, MAX_WORD_SIZE); + buf = qp->dynbuf; + bufp = buf + MAX_WORD_SIZE; + } } qp->qstrp--; /* check for keywords. There are only four so we have a bit of a hack which @@ -262,7 +276,7 @@ } mutex_unlock(&qp->mutex); RAISE(PARSE_ERROR, "couldn't parse query ``%s''. Error message " - " was %se", buf, (char *)msg); + "was: %s", buf, (char *)msg); } return 0; } @@ -707,6 +721,9 @@ if (self->tokenized_fields) { hs_destroy(self->tokenized_fields); } + if (self->dynbuf) { + free(self->dynbuf); + } hs_destroy(self->all_fields); hs_destroy(self->fields_buf); h_destroy(self->field_cache); @@ -754,6 +771,7 @@ self->analyzer = analyzer; self->ts_cache = h_new_str(&free, (free_ft)&ts_deref); self->buf_index = 0; + self->dynbuf = NULL; self->non_tokenizer = non_tokenizer_new(); mutex_init(&self->mutex, NULL); return self; -- William From dbalmain.ml at gmail.com Sun Sep 24 19:27:39 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 25 Sep 2006 08:27:39 +0900 Subject: [Ferret-talk] svn problems In-Reply-To: <20060924192021.GB16880@masanjin.net> References: <20060923213230.GB3808@masanjin.net> <20060924192021.GB16880@masanjin.net> Message-ID: On 9/25/06, William Morgan wrote: > Excerpts from David Balmain's mail of 23 Sep 2006 (PDT): > > The 0.10.* series was developed in a different subversion repository. > > You can check it out from: > > > > $ svn co svn://www.davebalmain.com/exp ferret > > Thanks! See patch in following message. > > > By the way, the 0.10.7 gem is out and it has all changes in it, > > including the fix for your TermQuery problem. > > Sadly it doesn't seem to fix the problem, but I'll spend some more time > playing around now that I have the updated source. Hi William, Did you rebuild the index? You'll need to do that before it makes any difference. 
Cheers, Dave From dbalmain.ml at gmail.com Sun Sep 24 22:09:20 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 25 Sep 2006 11:09:20 +0900 Subject: [Ferret-talk] [patch] buffer overflow in q_parser.y In-Reply-To: <20060924224434.GD16880@masanjin.net> References: <20060924224434.GD16880@masanjin.net> Message-ID: On 9/25/06, William Morgan wrote: > Hi Dave, > > The patch below corrects a buffer overflow bug in q_parser.y. Since it > is triggered by excessively long query strings, I believe that this bug > could be exploited to allow arbitrary code execution if a query string > supplied by a user is passed in directly to Ferret and not truncatated. > If I'm right, you should consider a new release asap. > > I've fixed it to simply allocate more memory if the standard buffers > aren't enough (because I had some long (i.e. > 255) query strings that I > needed to support), but there are other solutions as well. > > Index: c/include/search.h > =================================================================== > --- c/include/search.h (revision 615) > +++ c/include/search.h (working copy) > @@ -819,6 +819,7 @@ > char *qstr; > char *qstrp; > char buf[QP_CONC_WORDS][MAX_WORD_SIZE]; > + char *dynbuf; > int buf_index; > HashTable *field_cache; > HashSet *fields; > Index: c/src/q_parser.y > =================================================================== > --- c/src/q_parser.y (revision 615) > +++ c/src/q_parser.y (working copy) > @@ -173,6 +173,11 @@ > char *bufp = buf; > qp->buf_index = (qp->buf_index + 1) % QP_CONC_WORDS; > > + if (qp->dynbuf) { > + free(qp->dynbuf); > + qp->dynbuf = NULL; > + } > + > qp->qstrp--; /* need to back up one character */ > > while (!strchr(not_word, (c=*qp->qstrp++))) { > @@ -192,6 +197,15 @@ > default: > *bufp++ = c; > } > + /* we've exceeded the static buffer. switch to the dynamic > + one. 
*/ > + > + if (!qp->dynbuf && ((bufp - buf) == MAX_WORD_SIZE)) { > + qp->dynbuf = calloc(strlen(qp->qstr) + 1, sizeof(char)); > + strncpy(qp->dynbuf, buf, MAX_WORD_SIZE); > + buf = qp->dynbuf; > + bufp = buf + MAX_WORD_SIZE; > + } > } > qp->qstrp--; > /* check for keywords. There are only four so we have a bit of a hack which > @@ -262,7 +276,7 @@ > } > mutex_unlock(&qp->mutex); > RAISE(PARSE_ERROR, "couldn't parse query ``%s''. Error message " > - " was %se", buf, (char *)msg); > + "was: %s", buf, (char *)msg); > } > return 0; > } > @@ -707,6 +721,9 @@ > if (self->tokenized_fields) { > hs_destroy(self->tokenized_fields); > } > + if (self->dynbuf) { > + free(self->dynbuf); > + } > hs_destroy(self->all_fields); > hs_destroy(self->fields_buf); > h_destroy(self->field_cache); > @@ -754,6 +771,7 @@ > self->analyzer = analyzer; > self->ts_cache = h_new_str(&free, (free_ft)&ts_deref); > self->buf_index = 0; > + self->dynbuf = NULL; > self->non_tokenizer = non_tokenizer_new(); > mutex_init(&self->mutex, NULL); > return self; > Thanks, I'll release a new gem ASAP. cheers, Dave From samuelgiffney at gmail.com Mon Sep 25 00:06:25 2006 From: samuelgiffney at gmail.com (Sam Giffney) Date: Mon, 25 Sep 2006 06:06:25 +0200 Subject: [Ferret-talk] Odd indexing issue Message-ID: Hey Dave, I just contributed $100 to the ferret donation box. My project is earning no money yet (but hopefully will), for now I hope this helps you out and covers me for asking stupid questions ;). To get a distance sorted output, I am passing an array of the id field from a ferret search through to mysql in a custom select statement. SELECT ... id IN (#{ids.join(",")}) This has been working fine through ferret 0.9. I moved to 0.10 this week and it has been ok but I'm not sure if I just wasn't 'activating' the error. It happens on 0.10.6 and on 0.10.7. Today the sql statement was invalid on a certain query. 
This turned out to be because 1 or more of the ids passed into the IN statement were not numbers but some sort of weird character sequence like \240\236D\010 or \350\240\227\010.

I've tried deleting the index and rebuilding it. It keeps happening, although on different items in the index on each rebuild. This happens on 2 different machines, each Debian sarge.

Below is a little console script with output showing the oddness. The relevant model code is at the bottom of this post; please let me know if there's anything else I can supply.

Sam

--------ruby script/console
Entry.create_ferret_index
index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS)
# an arbitrary query to return all results from index
index.search_each("*", {:limit => 6000}) do |doc, score|
  if docindex !~ /^\d*$/ then # show me ids that aren't numeric
    p doc.to_s + " " + docindex = index[doc][:id]
  end
end

----------- OUTPUT FROM THE ABOVE 1st TIME
"542 \2102\032"
"2294 0\3075\010"
"4186 \250* \010"

OUTPUT FROM THE ABOVE 2nd TIME
"1762 \260\020\036\010"
"2617 \000\000\000\000"
"2719 0`+\010"
"3176 p`0\010"

---------------from entry.rb
def self.create_ferret_index()
  field_infos = Ferret::Index::FieldInfos.new(:store => :no, :index => :yes, :term_vector => :no, :boost => 1.0)
  field_infos.add_field(:name, :store => :no, :index => :yes, :term_vector => :with_positions_offsets, :boost => 10.0)
  field_infos.add_field(:address, :store => :no, :index => :yes, :term_vector => :with_positions_offsets, :boost => 1.0)
  field_infos.add_field(:tags, :store => :no, :index => :yes, :term_vector => :with_positions_offsets, :boost => 5.0)
  field_infos.add_field(:id, :store => :yes, :index => :untokenized, :term_vector => :no)
  field_infos.create_index(FerretConfig::INDEXPATH)
  index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS)
  batch_size = 1000
  Entry.transaction do
    0.step(Entry.count, batch_size) do |i|
      Entry.find(:all, :limit => batch_size, :offset => i).each do |rec|
        index << rec.make_entry_ferret_doc
      end
end end index.flush index.optimize index.close end def make_entry_ferret_doc doc = Ferret::Document.new doc[:id] = self.id doc[:name] = self.name doc[:address] = self.physical_address doc[:tags] = self.tags doc end -- Posted via http://www.ruby-forum.com/. From wintonius at gmail.com Mon Sep 25 01:12:59 2006 From: wintonius at gmail.com (Winton) Date: Mon, 25 Sep 2006 07:12:59 +0200 Subject: [Ferret-talk] acts_as_ferret highlight Message-ID: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> I am getting nil returned when doing the following: r.highlight(@condition, :field => 'body') 'r' is an instance of the a_a_f model. 'body' is a tokenized saved field. I am running latest ferret and a_a_f trunk. Am I doing something wrong? -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon Sep 25 01:13:21 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 25 Sep 2006 14:13:21 +0900 Subject: [Ferret-talk] [ANN] Ferret 0.10.8 Security Patch Release Message-ID: Hey folks, I've just released Ferret 0.10.8. It has a security hole patched as well as a couple other bug fixes. You should "gem update" ASAP. I'd also like to take the opportunity to thank those who have been kind enough to donate to the project. I really do appreciate the support and so will all other Ferret users. Cheers, Dave From gwcoffey at gmail.com Mon Sep 25 22:05:52 2006 From: gwcoffey at gmail.com (Geoff Coffey) Date: Tue, 26 Sep 2006 04:05:52 +0200 Subject: [Ferret-talk] Some documents not found Message-ID: I'm a ferret newbie, so hopefully I'm missing something simple :) I am using ferret to index data about 36,000 products from a MySQL database. The index has one document for each product, with these important fields: id: the id (unique) of the product record in the database content: a concatenation of several bits of information from the product and associated records I have a few tools to manage my index. 
First, my bulk indexer creates a new index from scratch with every product in the database. Second, my updater can delete a single product from the index and re-insert it. Finally, given an id, I can dump all the stored fields in the index for the associated document (to help hunt down this problem, I started storing the content field; normally it would not be stored).

If I run this query against a newly created index:

content:"blood pressure"

I get 4 hits. But I know there are more expected results. I can easily find an example product that should be returned but is not. If I look this product up in the index by id, and dump the results, I can see that the content field has the correct data stored (and "blood pressure" appears in this field more than once). But for some reason it isn't returned from the above query.

Oddly, if I then update this product (ie: delete the document, and re-insert it using the update tool) my query suddenly begins including this product -- and returns 5 hits instead of 4. I have been able to repeat this for several "missing" products.

I have re-run my bulk indexer several times with identical results. The bulk indexer is a little complex for the sake of performance, and I suspected that it was somehow broken, so I modified it so that it simply calls the updater code once for each product. This method was slower, and the resulting index was clearly different. My same "blood pressure" query now returned 14 hits instead of 4, but it was still missing many. I was again able to make missing products start working by removing them from the index and reinserting them.

If I dump one of these mysterious documents before and after updating, according to diff they are identical. It is as though the product data is _stored_ in the index correctly, but the actual index (the index _in_ the index, if you will...) is borked in some way.
I have been able to reproduce these exact results in two configurations: 1: Ruby 1.8.4 / Ferret 10.6 / Mac OS X 10.4.7 on Intel 2: Ruby 1.8.4 / Ferret 10.8 / Debian Linux on Intel In one case, after a full index the normal way, it seemed to return 5 results for my query instead of 4. It is possible I made a mistake, or perhaps the exact number of results is semi-random per run of the indexer. If anybody can help me understand what is going on, I would be very appreciative. Thanks, Geoff PS: Here is some relevant code in case it helps. If you need more, please ask, but this should be everything that matters. If necessary, I can try to produce a simple test case the reproduces the problem... ### --- bulk indexer --- # create an empty index... fi = Ferret::Index::FieldInfos.new(:term_vector => :no) fi.add_field(:id, :index => :untokenized, :term_vector => :no, :store => :yes) fi.add_field(:content, :index => :yes, :term_vector => :no, :store => :no) fi.create_index("search-index-new") # open it... index = Ferret::Index::Index.new(:path => 'search-index-new', :analyzer => Ferret::Analysis::AsciiStandardAnalyzer.new ) # get the products... start = Time.new puts 'loading product data' offset = 0 batch_size = 100 loop do prods = Vandelay::Product.find(:all, :limit => batch_size, :offset => offset, :include => [:descriptions, :categories, {:skus => :supplieritems}]) offset += batch_size break if prods.size == 0 populate_index(index, prods) end # optimize it... puts 'optimizing index...' index.optimize index.close # and finally copy it into place FileUtils.remove_dir('search-index') FileUtils.move('search-index-new', 'search-index') ### --- populate_index method --- def populate_index(index, products) # get the ids of every product for caching purposes... 
ids = products.collect {|p| p.id} # pre-cache all the keywords for the products kwcache = {} Vandelay::Keyword.find_by_sql(["select productId, term from product_keywords where productId in (?)", ids]).each {|kw| sym = kw.productId.to_sym kwcache[sym] = [] if !kwcache[sym] kwcache[sym] << kw.term } # pre-cache all the attribute values for the products attr_cache = {} Vandelay::ProductStringAttribute.find_by_sql(["select productId, name, value from product_stringattribute where productId in (?)", ids]).each {|a| sym = a.productId.to_sym attr_cache[sym] = [] if !attr_cache[sym] attr_cache[sym] << a } Vandelay::ProductBooleanAttribute.find_by_sql(["select productId, name, value from product_booleanattribute where productId in (?)", ids]).each {|a| sym = a.productId.to_sym attr_cache[sym] = [] if !attr_cache[sym] attr_cache[sym] << a } # now populate the index with data puts "indexing #{products.size} products..." products.each {|prod| index << prod.index_document(:keywords => kwcache, :attribute_values => attr_cache) } end ### --- updater --- index = Ferret::Index::Index.new(:path => 'search-index', :analyzer => Ferret::Analysis::AsciiStandardAnalyzer.new ) index.delete(:id, product.id) index << product.index_document index.close ### --- Vandelay::Product::index_document method --- def index_document(caches = {}) result = {} result[:id] = self.id result[:active] = self.isActive # add attributes if caches[:attribute_values] != nil build_attribute_cache(caches[:attribute_values][self.id.to_sym]) end ALL_ATTRIBUTES.each { |sa| result["attr_#{sa.name}".to_sym] = self.attribute_value(sa) } # add content content = '' content << self.id << ' ' << self.name << ' ' self.descriptions.each {|d| content << d.text << ' '} if caches[:keywords] != nil kwterms = caches[:keywords][self.id.to_sym] else kwterms = self.keywords.collect {|k| k.term} end kwterms.each {|k| content << k << ' '} if kwterms self.skus.each{|s| content << s.displayName << ' '} self.categories.each {|c| content << 
c.name << ' '} result[:content] = content return result end -- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Tue Sep 26 01:02:23 2006 From: jan.prill at gmail.com (Jan Prill) Date: Tue, 26 Sep 2006 07:02:23 +0200 Subject: [Ferret-talk] erb problems Message-ID: <562a35c10609252202m1cf4f999jdd99aee460bbb9ac@mail.gmail.com> Hi List, I just read this mail on the rails mailing list, that might be of interest for ferret. I don't even know if it is true, that ferret is causing the erb problems. If it is, then the fix of rmagick might be applicable to ferret too... Cheers, Jan Caleb wrote: > Sure enough, I did a search and replace for tabs and replaced them with > spaces. Is this a difference in the way Linux and Windows encode/read > tab characters? If it isn't an encoding issue then this seems like a > ruby bug (since it works in one environment and not the other). Any chance you're using RMagick or Ferret? These have been known to cause Erb problems on Windows - specifically errors in your .rhtml files claiming that you have invalid characters in your file. I believe RMagick was just updated to fix this problem, but I don't know about Ferret. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060926/35409006/attachment.html From dbalmain.ml at gmail.com Tue Sep 26 01:20:00 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 26 Sep 2006 14:20:00 +0900 Subject: [Ferret-talk] Some documents not found In-Reply-To: References: Message-ID: On 9/26/06, Geoff Coffey wrote: > PS: Here is some relevant code in case it helps. If you need more, > please ask, but this should be everything that matters. If necessary, I > can try to produce a simple test case the reproduces the problem... Hi Geoff, If you could produce a simple test case then that would be great. I'll try and find the problem but it can be difficult when I can't reproduce the problem here. 
Cheers, Dave From dbalmain.ml at gmail.com Tue Sep 26 03:54:32 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 26 Sep 2006 16:54:32 +0900 Subject: [Ferret-talk] erb problems In-Reply-To: <562a35c10609252202m1cf4f999jdd99aee460bbb9ac@mail.gmail.com> References: <562a35c10609252202m1cf4f999jdd99aee460bbb9ac@mail.gmail.com> Message-ID: On 9/26/06, Jan Prill wrote: > Hi List, > > I just read this mail on the rails mailing list, that might be of interest > for ferret. I don't even know if it is true, that ferret is causing the erb > problems. If it is, then the fix of rmagick might be applicable to ferret > too... > > Cheers, Jan > Thanks Jan, This is a known problem. The only fix I'm aware of in RMagick is they released a gem built with Ruby 1.8.4.20 - stable. Ferret is already built with this version. Details of the problem can be found here: http://rubyforge.org/tracker/index.php?func=detail&aid=3837&group_id=12&atid=133 Cheers, Dave From kraemer at webit.de Tue Sep 26 05:33:57 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 26 Sep 2006 11:33:57 +0200 Subject: [Ferret-talk] acts_as_ferret highlight In-Reply-To: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> Message-ID: <20060926093357.GC11602@cordoba.webit.de> Hi! On Mon, Sep 25, 2006 at 07:12:59AM +0200, Winton wrote: > I am getting nil returned when doing the following: > > r.highlight(@condition, :field => 'body') > 'r' is an instance of the a_a_f model. 'body' is a tokenized saved > field. > > I am running latest ferret and a_a_f trunk. Am I doing something wrong? maybe ;-) you should not use strings for field names any more. Maybe this already fixes things... what does happen if you call highlight without the :field option ? Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From wintonius at gmail.com Tue Sep 26 07:50:09 2006
From: wintonius at gmail.com (Winton)
Date: Tue, 26 Sep 2006 13:50:09 +0200
Subject: [Ferret-talk] acts_as_ferret highlight
In-Reply-To: <20060926093357.GC11602@cordoba.webit.de>
References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de>
Message-ID:

Should I be using symbols then?

When I omit the :field option I get @condition thrown at me (which, btw, is my query string).

Thanks for your help.

- Winton

Jens Kraemer wrote:
> Hi!
> On Mon, Sep 25, 2006 at 07:12:59AM +0200, Winton wrote:
>> I am getting nil returned when doing the following:
>>
>> r.highlight(@condition, :field => 'body')
>> 'r' is an instance of the a_a_f model. 'body' is a tokenized saved
>> field.
>>
>> I am running latest ferret and a_a_f trunk. Am I doing something wrong?
>
> maybe ;-)
> you should not use strings for field names any more. Maybe this already
> fixes things...
>
> what does happen if you call highlight without the :field option ?
>
> Jens
>
>
> --
> webit! Gesellschaft für neue Medien mbH www.webit.de
> Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
> Schnorrstraße 76 Tel +49 351 46766 0
> D-01069 Dresden Fax +49 351 46766 66

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Tue Sep 26 10:18:44 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 26 Sep 2006 23:18:44 +0900
Subject: [Ferret-talk] acts_as_ferret highlight
In-Reply-To:
References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de>
Message-ID:

Hi Winton,

On 9/26/06, Winton wrote:
> Should I be using symbols then?

Yes, although I'm not sure that is the problem in this case.
I tried to write all the code to work with Strings as well as Symbols but all Ferret's unit tests use Symbols for field names so String field names are not supported. > When I omit the :field option I get @condition thrown at me (which, btw, > is my query string). With what error message? I'm not sure why that would be happening. Cheers, Dave From dbalmain.ml at gmail.com Tue Sep 26 10:25:46 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 26 Sep 2006 23:25:46 +0900 Subject: [Ferret-talk] Some documents not found In-Reply-To: References: Message-ID: On 9/26/06, David Balmain wrote: > On 9/26/06, Geoff Coffey wrote: > > > PS: Here is some relevant code in case it helps. If you need more, > > please ask, but this should be everything that matters. If necessary, I > > can try to produce a simple test case the reproduces the problem... > > Hi Geoff, > > If you could produce a simple test case then that would be great. I'll > try and find the problem but it can be difficult when I can't > reproduce the problem here. > Never mind, I managed to reproduce the problem. It was a bug after all and a fix will be released in a moment. I just need to swap to Windows and compile a win32 gem. Thanks for letting me know about this, Geoff. Cheers, Dave From josh.nug at gmail.com Tue Sep 26 11:16:25 2006 From: josh.nug at gmail.com (Josh D.) Date: Tue, 26 Sep 2006 17:16:25 +0200 Subject: [Ferret-talk] concurrency / #search_each problem / segfault Message-ID: <36115846c5a17377b2c332f69ffd37be@ruby-forum.com> Hello everyone, I was stress-testing my application (running on Rails via FastCGI) by letting two concurrent users (not human .. an app called 'siege') a) save an Article and b) search for all Articles. I am searching via Article.ferret_index.search_each( ..) do |doc_id,score| doc = index[doc_id] .. 
end

and writing via

Article.ferret_index << self.to_doc

where Article.ferret_index is implemented as in 'acts_as_ferret':

@@ferret_index = nil
def Article::ferret_index
  @@ferret_index ||= Ferret::Index::Index.new(
    :path => ferret_path,
    :auto_flush => true,
    :create_if_missing => false
  )
end

The 2 errors I got (when I do "doc = index[doc_id]") were:

ArgumentError (:12250 is out of range [0..12243] for IndexWriter#[]):
    /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in `[]'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in `[]'
    /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:375:in `[]'
    ...

Ferret::StateError (State Error occured at :79 in xraise
Error occured in index.c:3404 - sr_get_lazy_doc
    Document 0 has already been deleted
):
    /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in `[]'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in `[]'
    /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:375:in `[]'
    ...

So, obviously the index changed after #search_each and before "doc = index[doc_id]". Is this expected behaviour? How can I avoid this?

Then I did the same thing without a webserver, just 2 consoles. One for saving, one for searching. The one searching now just ends with

"/usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:363: [BUG] Segmentation fault"

For now, to avoid the problem, I use an IndexReader that I newly create for every search. But I guess that is not the best approach?

I am using ferret 0.10.8 and ruby 1.8.4 on Debian Sarge.

Best regards
josh

--
Posted via http://www.ruby-forum.com/.
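
A note on the race described in the message above: the doc ids returned by a search are only meaningful against the index state that produced them. The following is a minimal sketch of that idea in plain Ruby. It does not use Ferret and is not how Ferret works internally; ToyIndex, snapshot, delete_at and search_titles are hypothetical names made up for the illustration. Writers swap in a new frozen array, and a reader takes one snapshot and uses it for both the search and the lookups.

```ruby
require 'thread'

# Illustrative only: a toy in-memory index, not Ferret's API.
class ToyIndex
  def initialize
    @lock = Mutex.new
    @docs = [].freeze # current immutable snapshot
  end

  # Writers build a new frozen array and swap it in under the lock.
  def <<(doc)
    @lock.synchronize { @docs = (@docs + [doc]).freeze }
    self
  end

  # Deleting renumbers later documents, just as an index rewrite would.
  def delete_at(doc_id)
    @lock.synchronize do
      docs = @docs.dup
      docs.delete_at(doc_id)
      @docs = docs.freeze
    end
    self
  end

  # Readers take one snapshot and keep using it until they are done.
  def snapshot
    @lock.synchronize { @docs }
  end
end

# Search and lookup run against the same snapshot, so ids stay valid
# even if another thread writes to the index in between.
def search_titles(index, word)
  docs = index.snapshot
  ids = docs.each_index.select { |i| docs[i][:title].include?(word) }
  ids.map { |i| docs[i][:title] }
end
```

Under these assumptions, a reader that resolves ids against a fresh view of the index after a write can hit a missing or renumbered document, which is the same shape as the two errors quoted above.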
From gwcoffey at gmail.com Tue Sep 26 12:11:28 2006
From: gwcoffey at gmail.com (Geoff Coffey)
Date: Tue, 26 Sep 2006 18:11:28 +0200
Subject: [Ferret-talk] Some documents not found
In-Reply-To:
References:
Message-ID:

David Balmain wrote:
> If you could produce a simple test case then that would be great. I'll
> try and find the problem but it can be difficult when I can't
> reproduce the problem here.

I'm trying but not having much luck. Maybe someone can help me understand something that might shed some light on the problem. I can search for blood pressure in three different ways:

+content:"blood pressure"

This method returns a limited number of results (7 right now) and misses lots of products that have the exact words "blood pressure" in the content field. It includes one product that does not have the exact phrase "blood pressure" but does have the word "pressure" and then, several words later, the word "blood".

+content:"pressure blood"

This method returns just 2 results, neither of which has "pressure blood" in their content. Both have "blood pressure" though.

+content:"blood" +content:"pressure"

This method returns 99 results, which as far as I can tell is every product with "blood pressure" in the content, plus a few that have both "blood" and "pressure".

So what is the "right" way to search multi-term phrases like this? I suspect all my oddness centers on my lack of understanding of how this _should_ work. Ideally, I'm looking for an exact match on the phrase, and I was going to play with adding some slop if Ferret supports it.

Note that my content field is tokenized. Does the analyzer on the Index::Index object matter when searching, or should I be preprocessing my search phrase in some way?

Thanks!

Geoff

--
Posted via http://www.ruby-forum.com/.
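
For reference on the phrase question above, here is a rough sketch of what an exact, in-order phrase match with optional slop means in terms of token positions. It is plain Ruby and is not Ferret's implementation; tokenize and phrase_match? are hypothetical helpers, the lowercase split on non-alphanumerics is an assumed stand-in for a real analyzer, and slop is simplified here to "extra tokens allowed between the first and last phrase term".

```ruby
# Illustrative only: a toy phrase matcher, not Ferret's implementation.

# Assumed stand-in for an analyzer: lowercase, split on non-alphanumerics.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# True when the phrase terms occur in order in the text, with at most
# `slop` extra tokens in total between the first and last phrase term.
def phrase_match?(text, phrase, slop = 0)
  tokens = tokenize(text)
  terms  = tokenize(phrase)
  return false if terms.empty?

  # Map each token to the list of positions where it occurs.
  positions = Hash.new { |h, k| h[k] = [] }
  tokens.each_with_index { |tok, i| positions[tok] << i }

  positions[terms.first].any? do |start|
    prev = start
    # Greedily take the earliest later occurrence of each following term.
    found = terms.drop(1).all? do |term|
      nxt = positions[term].find { |pos| pos > prev }
      prev = nxt if nxt
      !nxt.nil?
    end
    found && (prev - start) - (terms.size - 1) <= slop
  end
end
```

With this definition, "blood pressure" matches a text containing those two words side by side in that order, does not match a text where "pressure" comes first, and matches "blood and arterial pressure" only when slop is at least 2.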
From wmorgan-ferret at masanjin.net Tue Sep 26 15:43:18 2006 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Tue, 26 Sep 2006 12:43:18 -0700 Subject: [Ferret-talk] svn problems In-Reply-To: References: <20060923213230.GB3808@masanjin.net> <20060924192021.GB16880@masanjin.net> Message-ID: <20060926194318.GA3460@masanjin.net> Hi Dave, Excerpts from David Balmain's mail of 24 Sep 2006 (PDT): > Did you rebuild the index? You'll need to do that before it makes any > difference. Yes, the original example now works---thanks! Unfortunately, I still see a lot of queries that return nothing in TermQuery form, but work fine in String form. For example: > (0..10).each do |j| > m = @i[j][:message_id] > n1 = @i.search(Ferret::Search::TermQuery.new(:message_id, m)).total_hits > n2 = @i.search("message_id:#{m}").total_hits > puts "#{m}: #{n1} #{n2}" > end 43134A26.5010503 at qwest.com: 0 1 20050830.032307.1370960293.aamine at loveruby.net: 1 1 43137684.4090506 at qwest.com: 1 1 39AA6550E5AA554AB1456707D6E5563D0DCCF5 at QTOMAE2K3M01.AD.QINTRA.COM: 0 1 200508292246.j7TMkwdh001657 at sharui.nakada.niregi.kanuma.tochigi.jp: 0 1 87zmr017on.fsf at m17n.org: 1 1 1125383295.382347.22398.nullmailer at x31.priv.netlab.jp: 1 1 9B68375A-AA86-4EB9-AEC9-675E7C6EFBA6 at pobox.com: 0 1 20050905154808.53555.qmail at web50313.mail.yahoo.com: 1 1 431C7204.80505 at pobox.com: 0 1 200509052114.j85LEek4030178 at rubyforge.org: 0 1 Based on the first and third entries, I can't imagine this is a tokenization problem. What do you think? -- William From andy.caspar at gmail.com Tue Sep 26 17:13:45 2006 From: andy.caspar at gmail.com (Andy Caspar) Date: Tue, 26 Sep 2006 14:13:45 -0700 Subject: [Ferret-talk] RAMDirectory with acts_as_ferret Message-ID: <146582890609261413m25e5fccake25d6709d2d8087d@mail.gmail.com> Hi There, Is anyone using RAMDirectory as the data store with acts_as_ferret? 
I would love some pointers on how to configure acts_as_ferret to correctly use RAMDirectory (rather than FSDirectory). Thanks in advance. AC From wintonius at gmail.com Tue Sep 26 17:27:20 2006 From: wintonius at gmail.com (Winton) Date: Tue, 26 Sep 2006 23:27:20 +0200 Subject: [Ferret-talk] acts_as_ferret highlight In-Reply-To: References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de> Message-ID: <7ad9a4655d7431d13150e7b01b44e63f@ruby-forum.com> > With what error message? I'm not sure why that would be happening. Here's my dev log: ActionView::TemplateError (can't convert Hash into String) on line #46 of app/views/search/index.rhtml: 46: body = r.highlight(@condition, :field => :body) #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_view/helpers/text_helper.rb:38:in `escape' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_view/helpers/text_helper.rb:38:in `highlight' #{RAILS_ROOT}/app/views/search/index.rhtml:46:in `_run_rhtml_search_index' #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:66:in `method_missing' #{RAILS_ROOT}/app/views/search/index.rhtml:45:in `_run_rhtml_search_index' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_view/base.rb:316:in `compile_and_render_template' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_view/base.rb:292:in `render_template' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_view/base.rb:251:in `render_file' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/base.rb:726:in `render_file' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/base.rb:648:in `render_with_no_layout' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/layout.rb:245:in `render_without_benchmark' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/benchmarking.rb:53:in `render'
/Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/1.8/benchmark.rb:293:in `measure' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/benchmarking.rb:53:in `render' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/base.rb:942:in `perform_action_without_filters' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/filters.rb:368:in `perform_action_without_benchmark' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/1.8/benchmark.rb:293:in `measure' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/rescue.rb:82:in `perform_action' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/base.rb:408:in `process_without_filters' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/filters.rb:377:in `process_without_session_management_support' #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/session_management.rb:117:in `process' #{RAILS_ROOT}/vendor/rails/railties/lib/dispatcher.rb:38:in `dispatch' #{RAILS_ROOT}/vendor/rails/railties/lib/fcgi_handler.rb:150:in `process_request' #{RAILS_ROOT}/vendor/rails/railties/lib/fcgi_handler.rb:54:in `process!' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/fcgi.rb:600:in `each_cgi' /Applications/Locomotive2/Bundles/rails112.locobundle/powerpc/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/fcgi.rb:597:in `each_cgi' #{RAILS_ROOT}/vendor/rails/railties/lib/fcgi_handler.rb:53:in `process!' #{RAILS_ROOT}/vendor/rails/railties/lib/fcgi_handler.rb:23:in `process!' /Users/Merry/Sites/fall-release/public/dispatch.fcgi:24 @condition = title|body:"test" Search works fine when I omit calling highlight. - Winton -- Posted via http://www.ruby-forum.com/. 
From gwcoffey at gmail.com Tue Sep 26 17:37:40 2006 From: gwcoffey at gmail.com (Geoff Coffey) Date: Tue, 26 Sep 2006 23:37:40 +0200 Subject: [Ferret-talk] Some documents not found In-Reply-To: References: Message-ID: David Balmain wrote: > Never mind, I managed to reproduce the problem. It was a bug after all > and a fix will be released in a moment. I just need to swap to Windows > and compile a win32 gem. You so completely rock. I just tested 10.9 and it works like a champ. Thank you for all your hard work, Geoff -- Posted via http://www.ruby-forum.com/. From colin at 3x.to Tue Sep 26 18:13:02 2006 From: colin at 3x.to (Colin Cc) Date: Wed, 27 Sep 2006 00:13:02 +0200 Subject: [Ferret-talk] Scoring/similarity, biased towards small fields? Message-ID: Lucene, and perhaps most search engines, are biased towards small fields with little content (where thus the term frequency is higher). Lucene has the option to define a custom (Similarity) class to calculate the similarity between two fields (custom calculation of lengthNorm and tf) in different documents. But how do I do this in ferret? (I know to boost a field, but this is not what I (think to) need, I need to be able to influence the relative importance between the same field) -- Posted via http://www.ruby-forum.com/. From colin at 3x.to Tue Sep 26 18:16:26 2006 From: colin at 3x.to (Colin Cc) Date: Wed, 27 Sep 2006 00:16:26 +0200 Subject: [Ferret-talk] Scoring/similarity, biased towards small fields? In-Reply-To: References: Message-ID: <1bdd6c7e939e33b5ef5b924c09d7e58f@ruby-forum.com> Forgot to say, ferret seems to be really amazing, especially considering how much it has been improved in the last couple of months! -- Posted via http://www.ruby-forum.com/.
From dbalmain.ml at gmail.com Tue Sep 26 23:06:24 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 27 Sep 2006 12:06:24 +0900 Subject: [Ferret-talk] acts_as_ferret highlight In-Reply-To: <7ad9a4655d7431d13150e7b01b44e63f@ruby-forum.com> References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de> <7ad9a4655d7431d13150e7b01b44e63f@ruby-forum.com> Message-ID: On 9/27/06, Winton wrote: > > With what error message? I'm not sure why that would be happening. > > Here's my dev log: > > ActionView::TemplateError (can't convert Hash into String) on line #46 > of app/views/search/index.rhtml: > 46: body = r.highlight(@condition, :field => :body) > #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_view/helpers/text_helper.rb:38:in > `escape' > #{RAILS_ROOT}/vendor/rails/actionpack/lib/action_view/helpers/text_helper.rb:38:in > `highlight' > #{RAILS_ROOT}/app/views/search/index.rhtml:46:in > `_run_rhtml_search_index' > #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:66:in > `method_missing' If you look at line 66 in acts_as_ferret it seems that r is actually an instance of ARFerret::SearchResults which is basically an array of acts_as_ferret model instances with a total_hits attribute. Try this: body = r[0].highlight(@condition, :field => :body) That should fix the problem. Cheers, Dave From dbalmain.ml at gmail.com Tue Sep 26 23:30:43 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 27 Sep 2006 12:30:43 +0900 Subject: [Ferret-talk] Scoring/similarity, biased towards small fields?
In-Reply-To: References: Message-ID: On 9/27/06, Colin Cc wrote: > Lucene, and perhaps most search engines, are biased towards small fields > with little content (where thus the term frequency is higher). Lucene > has the option to define a custom (Similarity) class to calculate the > similarity between two fields (custom calculation of lengthNorm and tf) > in different documents. But how do I do this in ferret? (I know to boost > a field, but this is not what I (think to) need, I need to be able to > influence the relative importance between the same field) > Hi Colin, Ferret uses the same similarity scoring as Lucene. Scoring is based more on the ratio of number of matches to the length of the field, rather than just the length of the field. So a small field with a single match will score higher than a large field with a single match. But a large field with many matches may still score more highly than a small field with a single match. The Similarity class is still unavailable in the Ruby API and it isn't high on my list of priorities to write the bindings for it (unless someone was willing to compensate me). However, I don't think you need it for what you are describing. Boosts should do the job perfectly. If you want to make the :title field more important than the :content field then you set the boost of the :title FieldInfo, probably like this: fi = FieldInfos.new fi.add_field(:title, :boost => 10.0) But I think you want to make the same field more important in different documents. So you can set the boost of the field when you add it. You can either set the boost for the whole document: doc = Ferret::Document.new(20.0) doc[:title] = "Braveheart" doc[:actors] = ["Mel Gibson", "Sophie Marceau"] This will affect all fields in the document. Or you can set the boost of the field directly. 
doc = { :title => Field.new("Legally Blonde", 0.02), :actors => Field.new(["Reese Witherspoon", "Luke Wilson"], 2.0) } Hope that helps, Cheers, Dave From dbalmain.ml at gmail.com Tue Sep 26 23:44:41 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 27 Sep 2006 12:44:41 +0900 Subject: [Ferret-talk] svn problems In-Reply-To: <20060926194318.GA3460@masanjin.net> References: <20060923213230.GB3808@masanjin.net> <20060924192021.GB16880@masanjin.net> <20060926194318.GA3460@masanjin.net> Message-ID: On 9/27/06, William Morgan wrote: > Hi Dave, > > Excerpts from David Balmain's mail of 24 Sep 2006 (PDT): > > Did you rebuild the index? You'll need to do that before it makes any > > difference. > > Yes, the original example now works---thanks! Unfortunately, I still see > a lot of queries that return nothing in TermQuery form, but work fine in > String form. > > For example: > > > (0..10).each do |j| > > m = @i[j][:message_id] > > n1 = @i.search(Ferret::Search::TermQuery.new(:message_id, m)).total_hits > > n2 = @i.search("message_id:#{m}").total_hits > > puts "#{m}: #{n1} #{n2}" > > end > 43134A26.5010503 at qwest.com: 0 1 > 20050830.032307.1370960293.aamine at loveruby.net: 1 1 > 43137684.4090506 at qwest.com: 1 1 > 39AA6550E5AA554AB1456707D6E5563D0DCCF5 at QTOMAE2K3M01.AD.QINTRA.COM: 0 1 > 200508292246.j7TMkwdh001657 at sharui.nakada.niregi.kanuma.tochigi.jp: 0 1 > 87zmr017on.fsf at m17n.org: 1 1 > 1125383295.382347.22398.nullmailer at x31.priv.netlab.jp: 1 1 > 9B68375A-AA86-4EB9-AEC9-675E7C6EFBA6 at pobox.com: 0 1 > 20050905154808.53555.qmail at web50313.mail.yahoo.com: 1 1 > 431C7204.80505 at pobox.com: 0 1 > 200509052114.j85LEek4030178 at rubyforge.org: 0 1 > > Based on the first and third entries, I can't imagine this is a > tokenization problem. What do you think? > > -- > William Hi William, You need to downcase the term when you add it to a TermQuery. 
The StandardAnalyzer downcases all text so you need to do the same with any terms you add to any hand built queries. One way to see what might possibly be wrong is to run the term through the analyzer yourself. require 'rubygems' require 'ferret' include Ferret::Analysis EMAILS = [ "43134A26.5010503 at qwest.com", "20050830.032307.1370960293.aamine at loveruby.net", "43137684.4090506 at qwest.com", "39AA6550E5AA554AB1456707D6E5563D0DCCF5 at QTOMAE2K3M01.AD.QINTRA.COM", "200508292246.j7TMkwdh001657 at sharui.nakada.niregi.kanuma.tochigi.jp", "87zmr017on.fsf at m17n.org", "1125383295.382347.22398.nullmailer at x31.priv.netlab.jp", "9B68375A-AA86-4EB9-AEC9-675E7C6EFBA6 at pobox.com", "20050905154808.53555.qmail at web50313.mail.yahoo.com", "431C7204.80505 at pobox.com", "200509052114.j85LEek4030178 at rubyforge.org" ] a = StandardAnalyzer.new EMAILS.each do |email| print email + ":" tz = a.token_stream(:field, email) puts email == tz.next.text end Hope that clears things up. Cheers, Dave From wintonius at gmail.com Wed Sep 27 01:33:33 2006 From: wintonius at gmail.com (Winton) Date: Wed, 27 Sep 2006 07:33:33 +0200 Subject: [Ferret-talk] acts_as_ferret highlight In-Reply-To: References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de> <7ad9a4655d7431d13150e7b01b44e63f@ruby-forum.com> Message-ID: <48161c0e91d91155c1667ec9343de9e8@ruby-forum.com> Sorry for not being clear, r comes from a "for r in @results" iteration. So r would be an instance of the model. Thanks for your help, Jens and Dave. - Winton -- Posted via http://www.ruby-forum.com/. 
From kraemer at webit.de Wed Sep 27 02:07:49 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 27 Sep 2006 08:07:49 +0200 Subject: [Ferret-talk] RAMDirectory with acts_as_ferret In-Reply-To: <146582890609261413m25e5fccake25d6709d2d8087d@mail.gmail.com> References: <146582890609261413m25e5fccake25d6709d2d8087d@mail.gmail.com> Message-ID: <20060927060749.GA12735@cordoba.webit.de> Hi Andy, this isn't possible atm, but shouldn't be hard to implement. Could you please file a feature request for this, so I don't forget it when preparing the next version ? http://projects.jkraemer.net/acts_as_ferret cheers, Jens On Tue, Sep 26, 2006 at 02:13:45PM -0700, Andy Caspar wrote: > Hi There, > > Is anyone using RAMDirectory as the data store with acts_as_ferret? I would > love some pointers on how to configure acts_as_ferret to correctly use > RAMDirectory (rather than FSDirectory). > > Thanks in advance. > AC > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From david.wennergren at gmail.com Wed Sep 27 03:33:57 2006 From: david.wennergren at gmail.com (David Wennergren) Date: Wed, 27 Sep 2006 09:33:57 +0200 Subject: [Ferret-talk] RangeFilter performance Message-ID: <3786fd1f0a933a4d37f8655bed538238@ruby-forum.com> I'm using a RangeFilter to limit a search to only the most recently added documents. My index is about 150 000 articles and the RangeFilter typically selects about 1000 of them to run a batch of searches against. The performance is great as long as my index is newly optimized. As soon as add a few new documents the average search time for a batch of searches using the rangefilter increases about 10 times.
And if i keep adding a few documents the search time increases even more. Until I optimize the index again, and performance is back to superfast. I can't do optimize as often as I would like since it takes a long time. Have anyone excperienced anything similiar? Thanks/David -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Wed Sep 27 04:55:58 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 27 Sep 2006 17:55:58 +0900 Subject: [Ferret-talk] acts_as_ferret highlight In-Reply-To: <48161c0e91d91155c1667ec9343de9e8@ruby-forum.com> References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de> <7ad9a4655d7431d13150e7b01b44e63f@ruby-forum.com> <48161c0e91d91155c1667ec9343de9e8@ruby-forum.com> Message-ID: On 9/27/06, Winton wrote: > Sorry for not being clear, r comes from a "for r in @results" iteration. > So r would be an instance of the model. > > Thanks for your help, Jens and Dave. > > - Winton Well, I'm not sure what is going on but it isn't a problem in acts_as_ferret or Ferret. The highlight method being called is ActionView::Helpers::TextHelper#highlight: def highlight(text, phrase, highlighter = '\1') if phrase.blank? then return text end text.gsub(/(#{Regexp.escape(phrase)})/i, highlighter) unless text.nil? end As you can see, the phrase is supposed to be a String, not a Hash. That is why the exception is being thrown. Cheers, Dave From dbalmain.ml at gmail.com Wed Sep 27 05:30:15 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 27 Sep 2006 18:30:15 +0900 Subject: [Ferret-talk] RangeFilter performance In-Reply-To: <3786fd1f0a933a4d37f8655bed538238@ruby-forum.com> References: <3786fd1f0a933a4d37f8655bed538238@ruby-forum.com> Message-ID: On 9/27/06, David Wennergren wrote: > I'm using a RangeFilter to limit a search to only the most recently > added documents. 
My index is about 150 000 articles and the RangeFilter > typically selects about 1000 of them to run a batch of searches against. > > The performance is great as long as my index is newly optimized. As soon > as add a few new documents the average search time for a batch of > searches using the rangefilter increases about 10 times. And if i keep > adding a few documents the search time increases even more. Until I > optimize the index again, and performance is back to superfast. I can't > do optimize as often as I would like since it takes a long time. > > Have anyone excperienced anything similiar? > > Thanks/David > > Hi David, I know about this problem. It also occurs when sorting large indexes. I'm currently trying to work on a solution. I'll keep you updated. Cheers, Dave From colin at 3x.to Wed Sep 27 05:30:58 2006 From: colin at 3x.to (Colin Cc) Date: Wed, 27 Sep 2006 11:30:58 +0200 Subject: [Ferret-talk] Scoring/similarity, biased towards small fields? In-Reply-To: References: Message-ID: <77b40ac801998b640ee25d7c86f5a8e2@ruby-forum.com> Thanks for answering! I couldn't find anything of relevance in the docs/api, now i know not to look for that functionality in the ruby api again :) Actually boosting doesn't really help in my case. I use lucene to index some articles with bodies of variable length. But whether a word occurs in a short or long article, the article is supposed to be equally relevant (of course, words occurring in title fields will make the result more important, for that there is boosting (and this bias towards short fields)) But it's only a small issue, maybe i'll start spitting through the source-code sometime to see if i can add it. -- Posted via http://www.ruby-forum.com/. 
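To make the length bias in this thread concrete, here is a minimal sketch in plain Ruby. It assumes Lucene's classic default factors, tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms); Ferret is described above as using the same scoring, but the method names below (default_tf, default_length_norm, flat_length_norm, partial_score) are made up for illustration and are not part of Ferret's Ruby API. Only the tf * lengthNorm part of the score is modelled; idf, boosts and the other factors are left out.

```ruby
# Assumed: Lucene's classic default similarity factors.
def default_tf(freq)
  Math.sqrt(freq)
end

def default_length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

# Hypothetical alternative that ignores field length entirely,
# which is roughly what Colin is asking for.
def flat_length_norm(_num_terms)
  1.0
end

# Only the tf * lengthNorm part of a score, nothing else.
def partial_score(freq, num_terms, norm = :default_length_norm)
  default_tf(freq) * send(norm, num_terms)
end

puts partial_score(1, 100)                        # one match in a 100-term body
puts partial_score(1, 10_000)                     # one match in a 10,000-term body
puts partial_score(100, 10_000)                   # many matches in the long body
puts partial_score(1, 10_000, :flat_length_norm)  # length ignored
```

With the default norm, a single match in the short body outweighs a single match in the long one by a factor of ten, while the flat norm treats both the same. A per-document boost cannot reproduce that exactly unless it is computed from the field length at indexing time.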
From dbalmain.ml at gmail.com Wed Sep 27 09:13:56 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 27 Sep 2006 22:13:56 +0900 Subject: [Ferret-talk] concurrency / #search_each problem / segfault In-Reply-To: <36115846c5a17377b2c332f69ffd37be@ruby-forum.com> References: <36115846c5a17377b2c332f69ffd37be@ruby-forum.com> Message-ID: On 9/27/06, Josh D. wrote: > Hello everyone, > > I was stress-testing my application (running on Rails via FastCGI) by > letting two concurrent users (not human .. an app called 'siege') > a) save an Article and b) search for all Articles. > > I am searching via > Article.ferret_index.search_each( ..) do |doc_id,score| > doc = index[doc_id] > .. > end > > and writing via > Article.ferret_index << self.to_doc > > where Article.ferret_index is implemented as in 'act_as_ferret': > @@ferret_index = nil > def Article::ferret_index > @@ferret_index ||= Ferret::Index::Index.new( :path => ferret_path, > :auto_flush => true, :create_if_missing => false ) > end > > The 2 errors I got (when I do "doc = index[doc_id]") were : > ArgumentError (:12250 is out of range [0..12243] for IndexWriter#[]): > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in > `[]' > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in > `[]' > /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:375:in > `[]' > ... > > Ferret::StateError (State Error occured at :79 in xraise > Error occured in index.c:3404 - sr_get_lazy_doc > Document 0 has already been deleted > > ): > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in > `[]' > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:382:in > `[]' > /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize' > /usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:375:in > `[]' > ... > > So, obviously the index changed after #search_each and before "doc = > index[doc_id]". 
> Is this expected behaviour? How to avoid this? > > Then I did the same thing without a webserver, just 2 consoles. One for > saving, one for searching. > The one searching now just ends with > "/usr/lib/ruby/gems/1.8/gems/ferret-0.10.8/lib/ferret/index.rb:363: > [BUG] Segmentation fault" > > As for now, to avoid the problem, I use an IndexReader that I newly > create for every search. > But I guess that is not the best approach? > > I am using ferret 0.10.8 and ruby 1.8.4 on Debian Sarge. > > > Best regards > > josh > Hi Josh, Just thought I'd let you know that I'm working on fixing this. Expect a solution in the next release. Cheers, Dave From dbalmain.ml at gmail.com Wed Sep 27 09:21:22 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 27 Sep 2006 22:21:22 +0900 Subject: [Ferret-talk] concurrency / #search_each problem / segfault In-Reply-To: <36115846c5a17377b2c332f69ffd37be@ruby-forum.com> References: <36115846c5a17377b2c332f69ffd37be@ruby-forum.com> Message-ID: On 9/27/06, Josh D. wrote: > Hello everyone, > > I was stress-testing my application (running on Rails via FastCGI) by > letting two concurrent users (not human .. an app called 'siege') > a) save an Article and b) search for all Articles. > > I am searching via > Article.ferret_index.search_each( ..) do |doc_id,score| > doc = index[doc_id] > .. > end Hi Josh, Just to you this time. What is the rest of the code in this loop (above). ie, what is the "..". It should help me sort out the problem. By the way, you should upgrade to Ferret-0.10.9. Cheers, Dave From wiseleyb at gmail.com Wed Sep 27 13:27:14 2006 From: wiseleyb at gmail.com (ben) Date: Wed, 27 Sep 2006 19:27:14 +0200 Subject: [Ferret-talk] Seg Fault - crashed our server Message-ID: <4f2f8a19e8d0dab4dadb33229b70a09d@ruby-forum.com> We were using ferret (with acts_as_ferret) on our production boxes. Everything was going OK for a few days then we got a seg fault from it that brought down the box. 
The specs: * Rails 1.1.6 * Ruby 1.8.4 * Ferret 0.10.6 * RedHat ES 3 * Apache * Pound * 5 instances of Mongrel Can ferret handle multiple processes accessing its files at the same time? Not that we were doing this (but we'd like to in the future) any ideas on how to run this off a networked drive (like in a server farm implementation)? We were also getting a bunch of these errors: A EOFError occurred in agents#confirm: End-of-File Error occured at :79 in xraise Error occured in compound_io.c:123 - cmpdi_read_i Tried to read past end of file. File length is <3> and tried to read to <975> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.6/lib/ferret/index.rb:506:in `close' We were just doing basic stuff like: acts_as_ferret :fields => { :title => {:boost => 2}, :description => {:boost => 1.5} } Model.find_by_contents("find stuff") On tables that had maybe 10,000+ rows Anyone have any ideas? -ben http://activerain.com -- Posted via http://www.ruby-forum.com/. From wintonius at gmail.com Wed Sep 27 16:36:15 2006 From: wintonius at gmail.com (Winton) Date: Wed, 27 Sep 2006 22:36:15 +0200 Subject: [Ferret-talk] acts_as_ferret highlight In-Reply-To: References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de> <7ad9a4655d7431d13150e7b01b44e63f@ruby-forum.com> <48161c0e91d91155c1667ec9343de9e8@ruby-forum.com> Message-ID: <06697ad18bb549659d30d8792a3b07e5@ruby-forum.com> So would this be a rails issue? Not that I am doubting your expertise, but why would I be getting #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:66:in `method_missing' before it tries the ActionView method? Thanks again, Winton -- Posted via http://www.ruby-forum.com/.
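One plausible reading of that trace can be reproduced without Rails or acts_as_ferret. The sketch below uses a made-up FakeSearchResults class that, like the SearchResults wrapper described earlier in the thread, forwards unknown methods to an internal array. Since "for r in results" calls results.each, and each is only reachable through method_missing, the loop body runs inside the method_missing frame, so anything raised in the loop body carries method_missing in its backtrace. The class and method names are assumptions for illustration only.

```ruby
# Stand-in for a results wrapper that forwards unknown methods to an array.
class FakeSearchResults
  attr_reader :total_hits

  def initialize(results, total_hits)
    @results = results
    @total_hits = total_hits
  end

  # Delegate anything the wrapped array understands (each, size, [] ...).
  def method_missing(name, *args, &block)
    if @results.respond_to?(name)
      @results.send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @results.respond_to?(name, include_private) || super
  end
end

# Raise inside the loop body and return the backtrace that results.
def frames_for_failure_inside_loop
  results = FakeSearchResults.new(["a", "b"], 2)
  for r in results # `for` calls results.each, which lands in method_missing
    raise "failure inside the loop body for #{r}"
  end
rescue RuntimeError => e
  e.backtrace
end

frames = frames_for_failure_inside_loop
puts frames.grep(/method_missing/)
```

So a method_missing line in the acts_as_ferret plugin can appear below the failing view line simply because the iteration passes through it, not because the plugin's own highlight was called.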
From wmorgan-ferret at masanjin.net Wed Sep 27 16:53:42 2006 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Wed, 27 Sep 2006 13:53:42 -0700 Subject: [Ferret-talk] Seg Fault - crashed our server In-Reply-To: <4f2f8a19e8d0dab4dadb33229b70a09d@ruby-forum.com> References: <4f2f8a19e8d0dab4dadb33229b70a09d@ruby-forum.com> Message-ID: <20060927205342.GA18702@masanjin.net> Excerpts from ben's mail of 27 Sep 2006 (PDT): > We were using ferret (with acts_as_ferret) on our production boxes. > Everything was going OK for a few days then we got a seg fault from it > that brought down the box. I know 0.10.8 definitely fixes one segfault, though it may not be yours. -- William From wmorgan-ferret at masanjin.net Wed Sep 27 23:06:02 2006 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Wed, 27 Sep 2006 20:06:02 -0700 Subject: [Ferret-talk] svn problems In-Reply-To: References: <20060923213230.GB3808@masanjin.net> <20060924192021.GB16880@masanjin.net> <20060926194318.GA3460@masanjin.net> Message-ID: <20060928030602.GC24924@masanjin.net> Hi Dave, Excerpts from David Balmain's mail of 26 Sep 2006 (PDT): > You need to downcase the term when you add it to a TermQuery. The > StandardAnalyzer downcases all text so you need to do the same with > any terms you add to any hand built queries. Thanks for the response. Downcasing the string passed into the TermQuery does, in fact, retrieve the document. BUT, I had used a WhitespaceAnalyzer with no downcasing on that field, so it should have preserved case in the index. 
In fact, some experimentation shows: > mid = "43134A26.5010503 at qwest.com" > i = Ferret::Index::Index.new > wsa = Ferret::Analysis::WhiteSpaceAnalyzer.new false > wsa.token_stream(:message_id, mid).next => token["43134A26.5010503 at qwest.com":0:26:1] > i.add_document({:message_id => mid}, wsa) > i.search(Ferret::Search::TermQuery.new(:message_id, mid)) => # > i.search(Ferret::Search::TermQuery.new(:message_id, mid.downcase)) => #], max_score=0.3068528175354> So it looks like WSA#token_stream does the right thing. Is it possible isn't not actually being called at insertion time? Or am I misunderstanding something? -- William From dbalmain.ml at gmail.com Thu Sep 28 00:10:16 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 28 Sep 2006 13:10:16 +0900 Subject: [Ferret-talk] svn problems In-Reply-To: <20060928030602.GC24924@masanjin.net> References: <20060923213230.GB3808@masanjin.net> <20060924192021.GB16880@masanjin.net> <20060926194318.GA3460@masanjin.net> <20060928030602.GC24924@masanjin.net> Message-ID: On 9/28/06, William Morgan wrote: > Hi Dave, > > Excerpts from David Balmain's mail of 26 Sep 2006 (PDT): > > You need to downcase the term when you add it to a TermQuery. The > > StandardAnalyzer downcases all text so you need to do the same with > > any terms you add to any hand built queries. > > Thanks for the response. Downcasing the string passed into the TermQuery > does, in fact, retrieve the document. BUT, I had used a > WhitespaceAnalyzer with no downcasing on that field, so it should have > preserved case in the index. 
> > In fact, some experimentation shows: > > > mid = "43134A26.5010503 at qwest.com" > > i = Ferret::Index::Index.new > > wsa = Ferret::Analysis::WhiteSpaceAnalyzer.new false > > wsa.token_stream(:message_id, mid).next > => token["43134A26.5010503 at qwest.com":0:26:1] > > i.add_document({:message_id => mid}, wsa) > > i.search(Ferret::Search::TermQuery.new(:message_id, mid)) > => # > > i.search(Ferret::Search::TermQuery.new(:message_id, mid.downcase)) > => #], max_score=0.3068528175354> > > So it looks like WSA#token_stream does the right thing. Is it possible > isn't not actually being called at insertion time? Or am I > misunderstanding something? > > -- > William Hi William, Ok, this is definitely a a bug. I've already fixed it and it'll be out in the next release. By the way, you probably already know this but you can set the analyzer used by the index. Ferret::Index::Index.new(:analyzer => wsa) You probably have a good reason to be doing it the way you are but I just wanted to check. Cheers, Dave From dbalmain.ml at gmail.com Thu Sep 28 00:17:20 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 28 Sep 2006 13:17:20 +0900 Subject: [Ferret-talk] acts_as_ferret highlight In-Reply-To: <06697ad18bb549659d30d8792a3b07e5@ruby-forum.com> References: <80d1cfd241af40aca05cb671aff452e5@ruby-forum.com> <20060926093357.GC11602@cordoba.webit.de> <7ad9a4655d7431d13150e7b01b44e63f@ruby-forum.com> <48161c0e91d91155c1667ec9343de9e8@ruby-forum.com> <06697ad18bb549659d30d8792a3b07e5@ruby-forum.com> Message-ID: On 9/28/06, Winton wrote: > So would this be a rails issue? > > Not that I am doubting your expertise, but why would I be getting > > #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:66:in > `method_missing' > > before it tries the ActionView method? > > Thanks again, > Winton > Hi Winton, The method_missing gets called when you do "for r in @results". 
I think this actually calls the @results.each method which is then delegated to its internal array of results. I couldn't say whether it was a rails issue or not without looking at your code. All I know is that Ferret's highlight method is not being called. Cheers, Dave From dbalmain.ml at gmail.com Thu Sep 28 00:19:41 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 28 Sep 2006 13:19:41 +0900 Subject: [Ferret-talk] Seg Fault - crashed our server In-Reply-To: <20060927205342.GA18702@masanjin.net> References: <4f2f8a19e8d0dab4dadb33229b70a09d@ruby-forum.com> <20060927205342.GA18702@masanjin.net> Message-ID: On 9/28/06, William Morgan wrote: > Excerpts from ben's mail of 27 Sep 2006 (PDT): > > We were using ferret (with acts_as_ferret) on our production boxes. > > Everything was going OK for a few days then we got a seg fault from it > > that brought down the box. > > I know 0.10.8 definitely fixes one segfault, though it may not be yours. > 0.10.9 fixes another one too. When upgrading to 0.10.9 you will need to rebuild your indexes. From Neville.Burnell at bmsoft.com.au Thu Sep 28 03:15:51 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Thu, 28 Sep 2006 17:15:51 +1000 Subject: [Ferret-talk] A few questions about numbers and dates Message-ID: <126EC586577FD611A28E00A0C9A0375886EED5@maui.bmsoft.com.au> Hi, I just noticed that Ferret seems to convert every field to a string [ruby code appended for those interested], which has thwarted my attempt to format Dates (to "dd/mm/yyyy") and Floats (to "n.nn") for consumption further down the line based on the class of the field stored. I considered pre-formatting Dates and Floats prior to indexing, which would store the field exactly as I need for presentation purposes, however I would lose sorting and range searching. I'm wondering what approaches people are using to manage formatting post retrieve from Ferret. 
Any pointers appreciated, Kind Regards Neville ========================== require 'rubygems' require 'ferret' require 'date' p Ferret::VERSION @dir = Ferret::Store::RAMDirectory.new @index = Ferret::Index::Index.new(:dir => @dir) invoice = {:invoice_date => Date.new(2006,9,20), :invoice_value => 44.50, :invoice_no => 45656, :invoice_to => 'Nev'} @index << invoice doc = @index[0].load doc.fields.each do |f| p f p doc[f].class p doc[f] end ========================== >ruby test_format.rb "0.10.6" :invoice_date String "2006-09-20" :invoice_value String "44.5" :invoice_no String "45656" :invoice_to String "Nev" From dbalmain.ml at gmail.com Thu Sep 28 05:05:12 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 28 Sep 2006 18:05:12 +0900 Subject: [Ferret-talk] A few questions about numbers and dates In-Reply-To: <126EC586577FD611A28E00A0C9A0375886EED5@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A0375886EED5@maui.bmsoft.com.au> Message-ID: On 9/28/06, Neville Burnell wrote: > Hi, > > I just noticed that Ferret seems to convert every field to a string > [ruby code appended for those interested], which has thwarted my attempt > to format Dates (to "dd/mm/yyyy") and Floats (to "n.nn") for consumption > further down the line based on the class of the field stored. > > I considered pre-formatting Dates and Floats prior to indexing, which > would store the field exactly as I need for presentation purposes, > however I would lose sorting and range searching. > > I'm wondering what approaches people are using to manage formatting post > retrieve from Ferret. > > Any pointers appreciated, > > Kind Regards > > Neville Hi Neville, One possible solution is to add those fields twice, once in a stored/unindexed field in the correct format for output and another in an unstored/indexed field with the correct format for sorting and range queries. 
Actually, sorting will already work on those fields above; it is just range queries you have to worry about, and even they will work on your date field. I would certainly be amenable to developing a FloatRangeQuery and IntegerRangeQuery with the right motivation. ;-) Finally, I've mentioned before that I'm very interested in building an object database based on Ferret. I mentioned it here: http://www.ruby-forum.com/topic/82086#new By turning documents into objects and giving fields types it would solve the problems you have just mentioned plus many more. Cheers, Dave From sera at fhwang.net Thu Sep 28 10:54:56 2006 From: sera at fhwang.net (Francis Hwang) Date: Thu, 28 Sep 2006 10:54:56 -0400 Subject: [Ferret-talk] strange matching: maybe a multilanguage collation problem? In-Reply-To: References: <247D2FF6-4223-41B3-AB82-6FBAD03ED877@fhwang.net> <4DF6098D-7665-45E1-9C57-3AE738965EB3@fhwang.net> Message-ID: On Sep 23, 2006, at 12:56 AM, David Balmain wrote: > I don't really have any other ideas. Did you re-index the data after > you set ENV["LANG"]? Could you try this code and tell me what you get; > > require 'rubygems' > require 'ferret' > p Ferret::VERSION # 0.10.6 > p Ferret::locale # "en_US.UTF-8" > > index = Ferret::I.new() > > index << {:place => "Gazima\304\237usa"} > index << {:place => "U.S.A."} > puts "Search: USA" > index.search_each("USA") {|id, score| puts index[id][:place]} > # Search: USA > # U.S.A. > > puts "Search: Gazima\304\237usa" > index.search_each("Gazima\304\237usa") {|id, score| puts index[id][:place]} > # Search: Gazimağusa > # Gazimağusa In the end, setting ENV['LANG'] didn't seem to have an effect, but setting Ferret::locale directly seems to work: Ferret::locale = 'en_US.UTF-8' Thanks!
Francis From damektretiak at gmail.com Thu Sep 28 12:35:59 2006 From: damektretiak at gmail.com (Damek) Date: Thu, 28 Sep 2006 18:35:59 +0200 Subject: [Ferret-talk] uninitialized constant BooleanClause In-Reply-To: <20060904152617.GV9513@cordoba.webit.de> References: <20060904152617.GV9513@cordoba.webit.de> Message-ID: <45af7a44e6acffc457b190ac23ac8123@ruby-forum.com> Jens Kraemer wrote: > On Mon, Sep 04, 2006 at 05:07:54PM +0200, Richard wrote: >> I've installed the latest Win32 gem and the acts_as_ferret plugin (I >> checked out the files and placed them in the vendor/plugins directory). >> >> When I try to search I get the following error: >> >> uninitialized constant BooleanClause > > Seems you're using an older version of acts_as_ferret, which isn't > compatible with Ferret 0.10.x yet. I hope to officially release a new > version this week. For the time being, please use the trunk: > > script/plugin install > svn://projects.jkraemer.net/acts_as_ferret/trunk/plugin/acts_as_ferret > > Jens > > -- > webit! Gesellschaft für neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de > Schnorrstraße 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 Hi Jens - I seem to be getting the same error even with the latest trunk version. Is this still an issue? ~ Damek -- Posted via http://www.ruby-forum.com/. From wmorgan-ferret at masanjin.net Thu Sep 28 13:18:40 2006 From: wmorgan-ferret at masanjin.net (William Morgan) Date: Thu, 28 Sep 2006 10:18:40 -0700 Subject: [Ferret-talk] svn problems In-Reply-To: References: <20060923213230.GB3808@masanjin.net> <20060924192021.GB16880@masanjin.net> <20060926194318.GA3460@masanjin.net> <20060928030602.GC24924@masanjin.net> Message-ID: <20060928171840.GA2170@masanjin.net> Excerpts from David Balmain's mail of 27 Sep 2006 (PDT): > Ok, this is definitely a bug. I've already fixed it and it'll be out > in the next release. Thank you.
> By the way, you probably already know this but you can set the > analyzer used by the index. > > Ferret::Index::Index.new(:analyzer => wsa) > > You probably have a good reason to be doing it the way you are but I > just wanted to check. Nope, no good reason. Just an incomplete understanding of the API. This way's much better. -- William From kraemer at webit.de Thu Sep 28 17:15:32 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 28 Sep 2006 23:15:32 +0200 Subject: [Ferret-talk] uninitialized constant BooleanClause In-Reply-To: <45af7a44e6acffc457b190ac23ac8123@ruby-forum.com> References: <20060904152617.GV9513@cordoba.webit.de> <45af7a44e6acffc457b190ac23ac8123@ruby-forum.com> Message-ID: <20060928211532.GA32641@cordoba.webit.de> On Thu, Sep 28, 2006 at 06:35:59PM +0200, Damek wrote: > Jens Kraemer wrote: > > On Mon, Sep 04, 2006 at 05:07:54PM +0200, Richard wrote: > >> I've installed the latest Win32 gem and the acts_as_ferret plugin (i > >> checked out the files and placed them in the vendor/plugins directory). > >> > >> When I try to search I get the following error: > >> > >> uninitialized constant BooleanClause > > > > Seems you're using an older version of acts_as_ferret, which isn't > > compatible with Ferret 0.10.x yet. I hope to officially release a new > > version this week. For the time being, please use the trunk: > > > > script/plugin install > > svn://projects.jkraemer.net/acts_as_ferret/trunk/plugin/acts_as_ferret > > > > Hi Jens - > > I seem to be getting the same error even with the latest trunk version. > Is this still an issue? no, I use aaf with Ferret 0.10.x myself without problems. Are you sure you don't use the BooleanClause class somewhere in your own code ? Jens -- webit! 
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From Neville.Burnell at bmsoft.com.au Thu Sep 28 20:10:21 2006 From: Neville.Burnell at bmsoft.com.au (Neville Burnell) Date: Fri, 29 Sep 2006 10:10:21 +1000 Subject: [Ferret-talk] A few questions about numbers and dates Message-ID: <126EC586577FD611A28E00A0C9A0375886EEDD@maui.bmsoft.com.au> Hi David, > One possible solution is to add those fields twice, > once in a stored/unindexed field in the correct format > for output and another in an unstored/indexed field > with the correct format for sorting and range queries. Yes, I'll likely do this, in combination with some form of Hungarian notation to indicate the "native" type of the field. This would allow all dates to be stored as is by Ferret, but allow the presentation layer to reformat the date for display to dd/mm/yyyy format. For numbers I am planning to pad the string with zeros which will likewise support ranges/sorting, and then strip the zeros in the presentation layer. > Actually, sorting will already work on those fields above I don't see how sorting could work for dates stored as "dd/mm/yyyy" ... Would you elaborate? > Finally, I've mentioned before that I'm very interested in > building an object database based on Ferret. I mentioned it here: > http://www.ruby-forum.com/topic/82086#new > By turning documents into objects and giving fields types it > would solve the problems you have just mentioned plus many more. Yes, I think this is a very interesting idea, very useful and reminiscent of WinFS for objects!
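The zero-padding and date-reformatting idea described above can be sketched in plain Ruby, with no Ferret calls at all. This is only a rough illustration: the helper names (`pad_number`, `display_number`, `index_date`, `display_date`) and the width of 10 digits are assumptions made up for the example, and it only handles non-negative integers (floats such as 44.50 would need their own fixed-width format).

```ruby
require 'date'

# Hypothetical helpers, not part of Ferret.
# Assumed width: choose one large enough for the biggest value you expect.
PAD_WIDTH = 10

# Left-pad a non-negative integer with zeros so that string order
# matches numeric order.
def pad_number(n)
  n.to_s.rjust(PAD_WIDTH, '0')
end

# Strip the padding again in the presentation layer.
def display_number(padded)
  padded.to_i.to_s
end

# ISO dates ("yyyy-mm-dd") already sort correctly as strings,
# so that is the form to index.
def index_date(date)
  date.strftime('%Y-%m-%d')
end

# Reformat the indexed form to dd/mm/yyyy for display only.
def display_date(iso)
  Date.strptime(iso, '%Y-%m-%d').strftime('%d/%m/%Y')
end

numbers = [45656, 7, 1200]
sorted_numbers = numbers.map { |n| pad_number(n) }.sort
puts sorted_numbers.map { |s| display_number(s) }.inspect

dates = [Date.new(2006, 9, 20), Date.new(2005, 12, 1)]
sorted_dates = dates.map { |d| index_date(d) }.sort
puts sorted_dates.map { |s| display_date(s) }.inspect
```

The padded and ISO strings are what would go into the indexed field; the display helpers are applied only after retrieval, so sorting and range comparisons never see the dd/mm/yyyy form.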
Kind Regards Neville From dbalmain.ml at gmail.com Thu Sep 28 22:42:41 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 29 Sep 2006 11:42:41 +0900 Subject: [Ferret-talk] A few questions about numbers and dates In-Reply-To: <126EC586577FD611A28E00A0C9A0375886EEDD@maui.bmsoft.com.au> References: <126EC586577FD611A28E00A0C9A0375886EEDD@maui.bmsoft.com.au> Message-ID: On 9/29/06, Neville Burnell wrote: > > Actually, sorting will already work on those fields above > > I don't see how sorting could work for dates stored as "dd/mm/yyyy" ... > Would you elaborate? Sorry, I meant that the date 2006-09-20 in your example would sort correctly. You are quite right. "dd/mm/yyyy" won't sort correctly. From wintonius at gmail.com Fri Sep 29 01:20:05 2006 From: wintonius at gmail.com (Winton) Date: Fri, 29 Sep 2006 07:20:05 +0200 Subject: [Ferret-talk] limit results to specific IDs Message-ID: <41acfcdb74fe49e363ef5fc1f688c0c2@ruby-forum.com> How would you go about searching only a specific set of IDs in acts_as_ferret? Should I just create a huge "or" query or is there a more stylish way? Thanks, Winton -- Posted via http://www.ruby-forum.com/. From vpuz at rogers.com Fri Sep 29 07:18:25 2006 From: vpuz at rogers.com (Vince Puzzella) Date: Fri, 29 Sep 2006 07:18:25 -0400 Subject: [Ferret-talk] limit results to specific IDs In-Reply-To: <41acfcdb74fe49e363ef5fc1f688c0c2@ruby-forum.com> Message-ID: New to this, but can't you do something like: your_index.search('id:1|2|3|4|5|6|7') ??? On 9/29/06 1:20 AM, "Winton" wrote: > How would you go about searching only a specific set of IDs in > acts_as_ferret? Should I just create a huge "or" query or is there a > more stylish way?
> > Thanks, > Winton From dbalmain.ml at gmail.com Fri Sep 29 08:18:38 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 29 Sep 2006 21:18:38 +0900 Subject: [Ferret-talk] limit results to specific IDs In-Reply-To: <41acfcdb74fe49e363ef5fc1f688c0c2@ruby-forum.com> References: <41acfcdb74fe49e363ef5fc1f688c0c2@ruby-forum.com> Message-ID: On 9/29/06, Winton wrote: > How would you go about searching only a specific set of IDs in > acts_as_ferret? Should I just create a huge "or" query or is there a > more stylish way? > > Thanks, > Winton > That depends. If all the IDs are in a certain range then you are best off using a range query. If they are randomly distributed then you can use an "or" query, although a MultiTermQuery would be better. To specify it in Ferret Query Language just add the terms separated by a "|". For example: index.search(query + ' id:"1257|8732|3428|2387"') Or if the IDs are in an array you'd do it like this: index.search(query + ' id:"' + id_ary.join('|') + '"') You could also build a filter but I don't think it would be worth it in this case. Cheers, Dave From albert at mymail.nospam.com Fri Sep 29 08:37:04 2006 From: albert at mymail.nospam.com (Albert) Date: Fri, 29 Sep 2006 14:37:04 +0200 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' In-Reply-To: References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> Message-ID: Hi there, Thanks for this useful piece of information! What I'm wondering is how to do stemming on queries as well. My first try was: query = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new).parse(query_string) index.search_each(query) { |doc, score| ... } But this does not work the way I would expect it to work, i.e., it seems to deliver empty results independent of the input. Does anybody have an idea what I'm doing wrong? Cheers, Albert David Balmain wrote: > On 9/6/06, Alastair Moore wrote: >> Alastair > The default analyzer doesn't perform any stemming.
You need to create > your own analyzer with a stemmer. Something like this; > > require 'rubygems' > require 'ferret' > > module Ferret::Analysis > class MyAnalyzer > def token_stream(field, text) > StemFilter.new(StandardTokenizer.new(text)) > end > end > end > > index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new) > > index << "test" > index << "tests debate debater debating the for," > puts index.search("test").total_hits > > Hope that helps, > Dave -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Fri Sep 29 08:50:01 2006 From: kraemer at webit.de (Jens Kraemer) Date: Fri, 29 Sep 2006 14:50:01 +0200 Subject: [Ferret-talk] limit results to specific IDs In-Reply-To: References: <41acfcdb74fe49e363ef5fc1f688c0c2@ruby-forum.com> Message-ID: <20060929125001.GX11602@cordoba.webit.de> On Fri, Sep 29, 2006 at 07:18:25AM -0400, Vince Puzzella wrote: > New to this, but can't you do something like: > > your_index.search('id:1|2|3|4|5|6|7') that would result in the huge 'or' query mentioned :-) however, in that special case a range query would be more appropriate: 'id:[1 7]' with acts_as_ferret, you can also limit the result set on the active_record side of things, by giving a standard AR conditions argument: find_by_contents('query', {}, :conditions => ['id in (?)', id_array]) Note that this makes ferret's limit and offset parameters useless, as the ar conditions further limit the resultset found in the ferret index. Usually this is intended to be used on fields that aren't in the ferret index or are easier to query via the DB. In your case I'd go for the or query approach, maybe you can even optimize them to some range queries. Jens > > On 9/29/06 1:20 AM, "Winton" wrote: > > > How would you go about searching only a specific set of IDs in > > acts_as_ferret? Should I just create a huge "or" query or is there a > > more stylish way?
> > > > Thanks, > > Winton > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From dbalmain.ml at gmail.com Fri Sep 29 10:31:04 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 29 Sep 2006 23:31:04 +0900 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' In-Reply-To: References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> Message-ID: On 9/29/06, Albert wrote: > David Balmain wrote: > > On 9/6/06, Alastair Moore wrote: > >> Alastair > > The default analyzer doesn't perform any stemming. You need to create > > your own analyzer with a stemmer. Something like this; > > > > require 'rubygems' > > require 'ferret' > > > > module Ferret::Analysis > > class MyAnalyzer > > def token_stream(field, text) > > StemFilter.new(StandardTokenizer.new(text)) > > end > > end > > end > > > > index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new) > > > > index << "test" > > index << "tests debate debater debating the for," > > puts index.search("test").total_hits > > > > Hope that helps, > > Dave > > Hi there, > > Thanks for this useful piece of information! What I'm wondering is how > to do stemming on queries as well. My first try was: > > query = Ferret::QueryParser.new(:analyzer => > Ferret::Analysis::StemmingAnalyzer.new).parse(query_string) > > index.search_each(query) { |doc, score| ... } > > But this does not work the way I would expect it to work, i.e., it seems > to deliver empty results independent of the input. > > Does anybody have an idea what I'm doing wrong? > > Cheers, > > Albert Hi Albert, Could you show us your implementation of StemmingAnalyzer as well.
Also, you need to be sure to use the same analyzer for both indexing and searching, although I think you already knew this. Cheers, Dave From albert at mymail.nospam.com Fri Sep 29 12:45:10 2006 From: albert at mymail.nospam.com (Albert) Date: Fri, 29 Sep 2006 18:45:10 +0200 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' In-Reply-To: References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> Message-ID: <76bb1ab46e8a110a4f91338864f3f22a@ruby-forum.com> Hi Dave, Thanks for following up! The StemmingAnalyzer is actually just the MyAnalyzer from the example above: module Ferret::Analysis class StemmingAnalyzer def token_stream(field, text) StemFilter.new(StandardTokenizer.new(text)) end end end I've been trying to find the error but with no success. The searching is done this way: i = Ferret::Index::Index.new(:path => index) qp = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new) query = qp.parse(query_string) i.search_each(query) { |doc, score| ... } What I don't get is that search_each(query) never returns a result whereas when I use the original query string as in i = Ferret::Index::Index.new(:path => index) # qp = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new) # query = qp.parse(query_string) i.search_each(query_string) { |doc, score| ... } ------------ things work as expected (modulo the stemming, of course). So, it may be that I fundamentally misunderstand something or make a stupid mistake ... Cheers, Albert David Balmain wrote: > On 9/29/06, Albert wrote: >> > class MyAnalyzer >> > puts index.search("test").total_hits >> Ferret::Analysis::StemmingAnalyzer.new).parse(query_string) >> Albert > Hi Albert, > > Could you show us your implementation of StemmingAnalyzer as well. > Also, you need to be sure to use the same analyzer for both indexing > and searching, although I think you already knew this. > > Cheers, > Dave -- Posted via http://www.ruby-forum.com/.
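The advice quoted above, that the same analyzer has to be applied when indexing and when searching, can be illustrated with a toy inverted index in plain Ruby. Nothing here is Ferret code: `toy_analyze` and `ToyIndex` are made-up names, and the "stemmer" just strips a trailing "s". It is only meant to show why an unanalyzed query term can miss postings that were stemmed at index time; it is not a diagnosis of the empty results reported in this thread.

```ruby
# Toy illustration only: a naive "analyzer" and an in-memory inverted index.
# None of these names are Ferret APIs.

# Lowercase, split on non-letters, and strip a trailing "s" as a crude stemmer.
def toy_analyze(text)
  text.downcase.scan(/[a-z]+/).map { |t| t.sub(/s\z/, '') }
end

class ToyIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = [] }
    @docs = []
  end

  # Index-time analysis: only the stemmed terms are stored.
  def <<(text)
    doc_id = @docs.size
    @docs << text
    toy_analyze(text).uniq.each { |term| @postings[term] << doc_id }
    self
  end

  # Look a term up exactly as given, with no analysis applied.
  def raw_lookup(term)
    @postings.fetch(term, [])
  end

  # Query-time analysis using the same function that was used for indexing.
  def search(query)
    toy_analyze(query).map { |term| raw_lookup(term) }.flatten.uniq.sort
  end
end

index = ToyIndex.new
index << "test" << "tests debate debater debating"

# The raw term "tests" was never stored, so this lookup comes back empty.
puts index.raw_lookup("tests").inspect
# Running the query through the same analysis finds both documents.
puts index.search("tests").inspect
```

In the toy index the term "tests" is stored as "test", so only a query that goes through `toy_analyze` can reach it.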
From wintonius at gmail.com Fri Sep 29 17:47:08 2006 From: wintonius at gmail.com (Winton) Date: Fri, 29 Sep 2006 23:47:08 +0200 Subject: [Ferret-talk] limit results to specific IDs In-Reply-To: <20060929125001.GX11602@cordoba.webit.de> References: <41acfcdb74fe49e363ef5fc1f688c0c2@ruby-forum.com> <20060929125001.GX11602@cordoba.webit.de> Message-ID: <0baac64b9b03b2b2e8fd33c91800b698@ruby-forum.com> Great, thanks to all of you for your help. Looks like I'll just be making a big ol' query since the IDs aren't sequential. - Winton -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri Sep 29 19:00:52 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 30 Sep 2006 08:00:52 +0900 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' In-Reply-To: <76bb1ab46e8a110a4f91338864f3f22a@ruby-forum.com> References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> <76bb1ab46e8a110a4f91338864f3f22a@ruby-forum.com> Message-ID: On 9/30/06, Albert wrote: > > Hi Dave, > > Thanks for following up! The StemmingAnalyzer is actually just the > MyAnalyzer from the example above: > > module Ferret::Analysis > class StemmingAnalyzer > def token_stream(field, text) > StemFilter.new(StandardTokenizer.new(text)) > end > end > end > > I've been trying to find the error but with no success. The searching is > done this way: > > i = Ferret::Index::Index.new(:path => index) > qp = Ferret::QueryParser.new(:analyzer => > Ferret::Analysis::StemmingAnalyzer.new) > query = qp.parse(query_string) > i.search_each(query) { |doc, score| ... } > > What I don't get is that search_each(query) never returns a result > whereas when I use the original query string as in > > i = Ferret::Index::Index.new(:path => index) > # qp = Ferret::QueryParser.new(:analyzer => > Ferret::Analysis::StemmingAnalyzer.new) > # query = qp.parse(query_string) > i.search_each(query_string) { |doc, score| ... } > ------------ > > things work as expected (modulo the stemming, of course).
So, it may > be that I fundamentally misunderstand something or make a stupid mistake > ... > > Cheers, > > Albert > Sorry, I must have been tired last night. The problem is obvious to me now. You need to set the :fields parameter. The above query parser should work as long as you explicitly specify all fields in your query. For example: "content:(ruby rails) title:(ruby rails)" But if you want to search all fields by default then you need to tell the QueryParser what fields exist. The Index class will handle all of this for you, including using the same analyzer as is used during indexing. It looks like you are using the Index class for your searches so why not just leave the query parsing to it. Otherwise you can get the fields from the reader. query = Ferret::QueryParser.new( :analyzer => Ferret::Analysis::StemmingAnalyzer.new, :fields => reader.fields, :tokenized_fields => reader.tokenized_fields ).parse(query_string) index.search_each(query) { |doc, score| ... } Hope that helps, Dave From albert at mymail.nospam.com Sat Sep 30 03:04:02 2006 From: albert at mymail.nospam.com (Albert) Date: Sat, 30 Sep 2006 09:04:02 +0200 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' In-Reply-To: References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> <76bb1ab46e8a110a4f91338864f3f22a@ruby-forum.com> Message-ID: <9e12f4174e00f8c863545b99d31110fb@ruby-forum.com> Hi Dave, Wonderful! Thanks! I should have taken a deeper look at the documentation, indeed. Anyway, thanks for your patience! Cheers, Al. David Balmain wrote: > On 9/30/06, Albert wrote: >> end >> i.search_each(query) { |doc, score| ... } >> >> things work as expected (modulo the stemming, of course). So, it may >> be that I fundamentally misunderstand something or make a stupid mistake >> ... >> >> Cheers, >> >> Albert >> > > Sorry, I must have been tired last night. The problem is obvious to me > now. You need to set the :fields parameter.
The above query parser > should work as long as you explicitly specify all fields in your > query. For example: > > "content:(ruby rails) title:(ruby rails)" > > But if you want to search all fields by default then you need to tell > the QueryParser what fields exist. The Index class will handle all of > this for you including using the same analyzer as is used during > indexing. It looks like you are using the Index class for your > searches so why not just leave the query parsing to it. Otherwise you > can get the fields from the reader. > > query = Ferret::QueryParser.new( > :analyzer => Ferret::Analysis::StemmingAnalyzer.new, > :fields => reader.fields, > :tokenized_fields => reader.tokenized_fields > ).parse(query_string) > > index.search_each(query) { |doc, score| ... } > > Hope that helps, > Dave -- Posted via http://www.ruby-forum.com/. From clare at nospam.com Sat Sep 30 04:39:38 2006 From: clare at nospam.com (Clare) Date: Sat, 30 Sep 2006 10:39:38 +0200 Subject: [Ferret-talk] Blistit - on web in Beta Message-ID: <009c21314ca219027829f12d905625e8@ruby-forum.com> I would like to thank everyone who has been of great assistance on this project using ROR, AJAX and most importantly Ferret. I would especially like to thank David and Jens for their help. Quite frankly this project would not have got to this stage without your help and we at Blistit appreciate it! I saw your post David with regards to putting food on the table and when we are able to put food on the table too you will not be forgotten! We have finally put Blistit up on the web in Beta and hope to go fully into production at the end of October. Blistit is a free premier listing service in the United Kingdom where standard listings are free to post. The url is http://www.blistit.com There are around 300 listings in there at the moment so please feel free to check it out, especially the browse and the "Show Advanced Search Options" button in the results page. 
Also feel free to register and post listings if you wish and you live in the UK. While it is in Beta there is a chance that your listing will be promoted to a highlighted or premier listing free of charge. All feedback is welcome. If it is of benefit to the group then please give feedback here; otherwise log in and send "admin" a message. Thanks once again for all your help and assistance and remember, "If you want it.... Blistit!". Regards Clare. -- Posted via http://www.ruby-forum.com/. From clare at nospam.com Sat Sep 30 04:50:08 2006 From: clare at nospam.com (Clare) Date: Sat, 30 Sep 2006 10:50:08 +0200 Subject: [Ferret-talk] ferret finds 'tests' but not 'test' In-Reply-To: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> References: <94cbc17ff76e8950daeea9a13b10afd6@ruby-forum.com> Message-ID: Alastair Moore wrote: > Hello all, > > Quick question (possibly!) - I've got a few records indexed and doing a > search for 'test' results in no hits even though I know the word 'tests' > exists in the indexed field. Doing a search for 'tests' produces a > result. I would have thought that 'test' would match 'tests' but no such > luck! > > Thanks, > > Alastair Alastair - if you only want to find the plural of something and not the full stem of words then ROR has a pluralization capability. It will take test and bring back all the plurals or take tests and bring back the singulars. You can then search on all these words. It is not a full stemmer but in some circumstances perhaps this may be all that you are wanting to do. One thing to watch that caught us out was that, as standard, pluralization of words ending in 'ss' does not work properly. For example, "glass" would come back as "glas" from the pluralizer. There is a simple fix that is in the ROR forum that covers all this off. I would only use the ROR pluralizer if all you are looking to do is bring back plurals of words and are not interested in the full stemming of the words.
For example, if you do a search on "tax" full stemming should also search on "taxes" and "taxation". Pluralise would not search on "taxation". Hope this helps. Clare -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sat Sep 30 08:51:12 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 30 Sep 2006 21:51:12 +0900 Subject: [Ferret-talk] Blistit - on web in Beta In-Reply-To: <009c21314ca219027829f12d905625e8@ruby-forum.com> References: <009c21314ca219027829f12d905625e8@ruby-forum.com> Message-ID: Nice work Clare. I like the simple interface. If you like, I'd love to see it listed here: http://ferret.davebalmain.com/trac/wiki/PoweredBy I'll put it up if you like. Or you could do it. Or not. It's up to you. Cheers, Dave On 9/30/06, Clare wrote: > I would like to thank everyone who has been of great assistance on this > project using ROR, AJAX and most importantly Ferret. I would especially > like to thank David and Jens for their help. Quite frankly this project > would not have got to this stage without your help and we at Blistit > appreciate it! I saw your post David with regards to putting food on the > table and when we are able to put food on the table too you will not be > forgotten! > > We have finally put Blistit up on the web in Beta and hope to go fully > into production at the end of October. Blistit is a free premier listing > service in the United Kingdom where standard listings are free to post. > > The url is http://www.blistit.com > > There are around 300 listings in there at the moment so please feel free > to check it out, especially the browse and the "Show Advanced Search > Options" button in the results page. Also feel free to register and post > listings if you wish and you live in the UK. > > While it is in Beta there is a chance that your listing will be promoted > to a highlighted or premier listing free of charge. > > All feedback is welcome. 
If it is of benefit to the group then please > feedback here otherwise login and send "admin" a message. > > Thanks once again for all your help and assistance and remember, "If you > want it.... Blistit!". > > Regards > > Clare. > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From clare at nospam.com Sat Sep 30 10:50:02 2006 From: clare at nospam.com (Clare) Date: Sat, 30 Sep 2006 16:50:02 +0200 Subject: [Ferret-talk] Blistit - on web in Beta In-Reply-To: References: <009c21314ca219027829f12d905625e8@ruby-forum.com> Message-ID: David - please feel free to put it up. Thanks -- Posted via http://www.ruby-forum.com/.