From vamlists at gmx.net Mon Jan 2 07:20:13 2006
From: vamlists at gmx.net (Vamsee Kanakala)
Date: Mon, 02 Jan 2006 17:50:13 +0530
Subject: [Ferret-talk] Multiple indexes?
Message-ID: <43B91A7D.3090807@gmx.net>

Hi,

I'm indexing records from different database tables and they have
identical column names in many cases. Does this mean I have to create
different indexes for each table?

TIA,
Vamsee.

From erik at ehatchersolutions.com Mon Jan 2 08:25:26 2006
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Mon, 2 Jan 2006 08:25:26 -0500
Subject: [Ferret-talk] Multiple indexes?
In-Reply-To: <43B91A7D.3090807@gmx.net>
References: <43B91A7D.3090807@gmx.net>
Message-ID:

Not necessarily. If you index the table name as a separate field for
each document added to the index, it can be used (as a TermQuery
clause) to constrain searches for just documents from a specific
table. The acts_as_ferret posted to the Ferret wiki does this, for
example.

Erik

On Jan 2, 2006, at 7:20 AM, Vamsee Kanakala wrote:

> Hi,
>
> I'm indexing records from different database tables and they have
> identical column names in many cases. Does this mean I have to create
> different indexes for each table?
>
> TIA,
> Vamsee.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

From fcsmith at gmail.com Mon Jan 2 18:20:14 2006
From: fcsmith at gmail.com (Finn Smith)
Date: Mon, 2 Jan 2006 18:20:14 -0500
Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's
Message-ID: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com>

Recently I've been revisiting some of my search code. With a greater
understanding of how Java Lucene implements its search methods, I
realized that one level of abstraction is not present in the Ferret
classes/methods.
Here are the relevant method signatures:

Ferret's search methods:

in Ferret::Index::Index:
  search(query, options = {}) -> returns a TopDocs
  search_each(query, options = {}) {|doc, score| ...} -> yields to
    context w/ doc and score for each hit

in Ferret::Search::IndexSearcher:
  search(query, options = {}) -> returns a TopDocs
  search_each(query, filter = nil) {|doc, score| ...} -> yields to
    context w/ doc and score for each hit

Lucene's search methods:

in the interface Searchable:
  public void search(Query query, Filter filter, HitCollector results)
  public TopDocs search(Query query, Filter filter, int n)
  public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)

in org.apache.lucene.search.Searcher (which implements Searchable):
  public final Hits search(Query query)
  public Hits search(Query query, Filter filter)
  public Hits search(Query query, Sort sort)
  public Hits search(Query query, Filter filter, Sort sort)

I was wondering if there were plans to implement the Hits class in
Ferret. (Or if someone were to write a patch implementing it, would
David integrate it into the source?) It seems like a useful
abstraction since TopDocs does not allow you to access its hits by
index, only by the .each() method call.

Some questions:
* Will changing these methods break people's existing code?
* Where is the proper place to put these methods? Move the methods
  that return TopDocs to a module, which is more or less the same as a
  Java interface, and implement the methods that return Hits directly in
  the class? What is a good way to do this that feels Rubyish and takes
  advantage of its strengths and idioms?
* The options to limit the search (first_doc and num_docs) in
  Search::IndexSearcher and the code that implements them should
  probably be moved out of Search::IndexSearcher into Index::Index.
* Are there lower level issues I am not aware of that would make any
  of this a bad idea?

Am I missing something here?
Are there reasons not to have Ferret's
implementation of these methods and classes follow Java Lucene's as
closely as possible? I'd appreciate hearing your thoughts.

-F

From vamlists at gmx.net Tue Jan 3 07:35:31 2006
From: vamlists at gmx.net (Vamsee Kanakala)
Date: Tue, 03 Jan 2006 18:05:31 +0530
Subject: [Ferret-talk] Multiple indexes?
In-Reply-To:
References: <43B91A7D.3090807@gmx.net>
Message-ID: <43BA6F93.3090102@gmx.net>

Erik Hatcher wrote:

>Not necessarily. If you index the table name as a separate field for
>each document added to the index, it can be used (as a TermQuery
>clause) to constrain searches for just documents from a specific
>table. The acts_as_ferret posted to the Ferret wiki does this, for
>example.
>

Thanks, Erik. That raises the obvious question I should've asked before:
which is the better approach, and which is faster?

TIA,
Vamsee.

From kraemer at webit.de Tue Jan 3 07:55:18 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 3 Jan 2006 13:55:18 +0100
Subject: [Ferret-talk] Multiple indexes?
In-Reply-To: <43BA6F93.3090102@gmx.net>
References: <43B91A7D.3090807@gmx.net> <43BA6F93.3090102@gmx.net>
Message-ID: <20060103125518.GL21872@cordoba.webit.de>

On Tue, Jan 03, 2006 at 06:05:31PM +0530, Vamsee Kanakala wrote:
> Erik Hatcher wrote:
>
> >Not necessarily. If you index the table name as a separate field for
> >each document added to the index, it can be used (as a TermQuery
> >clause) to constrain searches for just documents from a specific
> >table. The acts_as_ferret posted to the Ferret wiki does this, for
> >example.
> >
>
> Thanks, Erik. That raises the obvious question I should've asked before
> - which is a better approach and which is faster?

IMHO this depends on what your queries look like. If you want to run
queries across all tables, having only one index should be faster
because no merging of results from different indexes has to take place.
On the other hand, if you want to query only data of one of your tables,
having a dedicated index for that table should be faster.

But unless you have huge amounts of data in your indexes, in practice
the difference in speed won't matter. At least that's what I experienced
with Java Lucene.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Telefon +49 351 46766 0
D-01069 Dresden                         Telefax +49 351 46766 66

From f at andreas-s.net Thu Jan 5 13:53:12 2006
From: f at andreas-s.net (Andreas S.)
Date: Thu, 5 Jan 2006 19:53:12 +0100
Subject: [Ferret-talk] Problems with Ferret in RForum
Message-ID:

Hi,

I have been using Ferret on ruby-forum.com for a few days. While the
searching works with reasonable performance, there are a lot of problems
related to indexing. Sometimes the process hangs with 100% CPU usage,
sometimes it leaves lockfiles in the directory (causing other processes
to fail while one is still working), sometimes it even crashes with a
segfault.

I'm going to try to use only one indexing process to solve the locking
problems. A process that looks periodically for posts that need to be
indexed is probably the most robust solution.

Andreas

--
Posted via http://www.ruby-forum.com/.

From erik at ehatchersolutions.com Thu Jan 5 16:09:36 2006
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Thu, 5 Jan 2006 16:09:36 -0500
Subject: [Ferret-talk] Java Lucene compatibility?
Message-ID: <17E7E11D-5AAB-4E86-9B85-CCF252C44ABC@ehatchersolutions.com>

I haven't dug into this yet, but wanted to report it. I've built an
index with Java Lucene (1.9, from svn trunk) and am now trying to
search it with Ferret, and I get this:

/usr/lib/ruby/site_ruby/1.8/ferret/index/term_buffer.rb:31:in `read': undefined method `name' for nil:NilClass (NoMethodError)
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:90:in `next?'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:118:in `scan_to'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:285:in `scan_for_term_info'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:163:in `get_term_info'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/segment_reader.rb:176:in `doc_freq'
        from /usr/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:47:in `doc_freq'
        from /usr/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:13:in `initialize'
        from /usr/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `new'
        from /usr/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `create_weight'
        from /usr/lib/ruby/site_ruby/1.8/ferret/search/query.rb:51:in `weight'
        from /usr/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:104:in `search'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/index.rb:606:in `do_search'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/index.rb:303:in `search_each'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/index.rb:302:in `synchronize'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index/index.rb:302:in `search_each'

I'm using trunk of Ferret also.

I'll work on building an index solely with Ferret and comparing with
Java Lucene.

Erik

From vamlists at gmx.net Fri Jan 6 00:01:45 2006
From: vamlists at gmx.net (Vamsee Kanakala)
Date: Fri, 06 Jan 2006 10:31:45 +0530
Subject: [Ferret-talk] Problems with Ferret in RForum
In-Reply-To:
References:
Message-ID: <43BDF9B9.50201@gmx.net>

Andreas S. wrote:

>I'm going to try to use only one indexing process to solve the locking
>problems. A process that looks periodically for posts that need to be
>indexed is probably the most robust solution.
>

You must've tried this already, just checking - did you set :auto_flush
=> true while creating your index? The documentation says it prevents
locking when multiple processes are accessing the index.

Vamsee.

From f at andreas-s.net Fri Jan 6 07:32:07 2006
From: f at andreas-s.net (Andreas S.)
Date: Fri, 6 Jan 2006 13:32:07 +0100
Subject: [Ferret-talk] Problems with Ferret in RForum
In-Reply-To: <43BDF9B9.50201@gmx.net>
References: <43BDF9B9.50201@gmx.net>
Message-ID: <6f25ec12efae510f2d71daa1428fb7e3@ruby-forum.com>

Vamsee Kanakala wrote:
> Andreas S. wrote:
>
>>I'm going to try to use only one indexing process to solve the locking
>>problems. A process that looks periodically for posts that need to be
>>indexed is probably the most robust solution.
>>
>
> You must've tried this already, just checking - did you set :auto_flush
> => true while creating your index?

Yes.

> The documentation says it prevents
> locking when multiple processes are accessing the index.

Locking usually works correctly, but *sometimes* I get lockfiles that
are never removed. What I don't understand is why one of the processes is
still able to write to the index then, while the others aren't.

--
Posted via http://www.ruby-forum.com/.

From vamlists at gmx.net Fri Jan 6 07:40:40 2006
From: vamlists at gmx.net (Vamsee Kanakala)
Date: Fri, 06 Jan 2006 18:10:40 +0530
Subject: [Ferret-talk] Constructing a query from an object
Message-ID: <43BE6548.8020209@gmx.net>

Hi,

Apart from checking each field in an object created from form
params, is there an easy way I can construct a query from an object? For
example, I'm doing this:

  @candidate = Candidate.new(params[:candidate])

and then I go on to check whether each attribute is blank, constructing
a query as I go. It's pretty common, so I'm guessing there should be an
easier way.

Thanks,
Vamsee.

From JanPrill at blauton.de Fri Jan 6 08:03:09 2006
From: JanPrill at blauton.de (Jan Prill)
Date: Fri, 06 Jan 2006 14:03:09 +0100
Subject: [Ferret-talk] Problems with Ferret in RForum
In-Reply-To: <6f25ec12efae510f2d71daa1428fb7e3@ruby-forum.com>
References: <43BDF9B9.50201@gmx.net> <6f25ec12efae510f2d71daa1428fb7e3@ruby-forum.com>
Message-ID: <43BE6A8D.50406@blauton.de>

Hi, Andreas,

maybe a stupid guess: Maybe because this one process is the locking one?

Regards
Jan

Andreas S. wrote:

>Vamsee Kanakala wrote:
>
>>Andreas S. wrote:
>>
>>>I'm going to try to use only one indexing process to solve the locking
>>>problems. A process that looks periodically for posts that need to be
>>>indexed is probably the most robust solution.
>>>
>>
>>You must've tried this already, just checking - did you set :auto_flush
>>=> true while creating your index?
>>
>
>Yes.
>
>>The documentation says it prevents
>>locking when multiple processes are accessing the index.
>>
>
>Locking usually works correctly, but *sometimes* I get lockfiles that
>are never removed. What I don't understand is why one of the processes is
>still able to write to the index then, while the others aren't.
>

From f at andreas-s.net Fri Jan 6 09:38:55 2006
From: f at andreas-s.net (Andreas S.)
Date: Fri, 6 Jan 2006 15:38:55 +0100
Subject: [Ferret-talk] Problems with Ferret in RForum
In-Reply-To: <43BE6A8D.50406@blauton.de>
References: <43BDF9B9.50201@gmx.net> <6f25ec12efae510f2d71daa1428fb7e3@ruby-forum.com> <43BE6A8D.50406@blauton.de>
Message-ID: <1635d71b074c1308a12dfa7aaa3622c6@ruby-forum.com>

Jan Prill wrote:
> Hi, Andreas,
>
> maybe a stupid guess: Maybe because this one process is the locking one?
>
> Regards
> Jan

I don't know. I have two fastcgi processes, and it seems that they are
actually both able to write to the index (at least I never got
exceptions when I created a post in the forum), only the third process
(mail receiver) isn't.

Andreas

--
Posted via http://www.ruby-forum.com/.

From nick.snels at gmail.com Sat Jan 7 14:46:29 2006
From: nick.snels at gmail.com (Nick Snels)
Date: Sat, 7 Jan 2006 20:46:29 +0100
Subject: [Ferret-talk] Problems with Ferret in RForum
In-Reply-To:
References:
Message-ID:

Hi Andreas,

I am using a modified version of your Ferret code in my application,
which is still in development mode, so no actual user testing has been
done. I also encountered the locking problem.
What I did was remove the
auto_flush => true and handle flushing myself. So in the create_doc
method I placed:

  index << doc
  index.flush

at the end and removed your last line 'doc'. This also made the update
method obsolete, so I made update an alias of create_doc. I haven't had
any locks during my testing, but I haven't done any real world testing
yet. So I'm eager to know how you managed to fix it, once it is fixed.

Kind regards,

Nick

Andreas S. wrote:
> Hi,
>
> I have been using Ferret on ruby-forum.com for a few days. While the
> searching works with reasonable performance, there are a lot of problems
> related to indexing. Sometimes the process hangs with 100% CPU usage,
> sometimes it leaves lockfiles in the directory (causing other processes
> to fail while one is still working), sometimes it even crashes with a
> segfault.
>
> I'm going to try to use only one indexing process to solve the locking
> problems. A process that looks periodically for posts that need to be
> indexed is probably the most robust solution.
>
> Andreas

--
Posted via http://www.ruby-forum.com/.

From f at andreas-s.net Sat Jan 7 15:34:28 2006
From: f at andreas-s.net (Andreas S.)
Date: Sat, 7 Jan 2006 21:34:28 +0100
Subject: [Ferret-talk] Problems with Ferret in RForum
In-Reply-To:
References:
Message-ID: <2ddc7939218c06ab49d6a4d890c439da@ruby-forum.com>

Nick Snels wrote:
> Hi Andreas,
>
> I am using a modified version of your Ferret code in my application,
> which is still in development mode, so no actual user testing has been
> done. I also encountered the locking problem. What I did was remove the
> auto_flush => true and handle flushing myself.

I tried that too, but had the same problem as before.

> at the end and removed your last line 'doc'. Which also made the update
> method obsolete, so I made update an alias of create_doc . I haven't had
> any locks during my testing, but I haven't done any real world testing
> yet.
> So I'm eager to know how you managed to fix it, once it is fixed.

I am now using a process that checks periodically for new/changed posts
and updates the index. This seems to be the best solution. You can find
the code in RForum SVN.

Andreas

--
Posted via http://www.ruby-forum.com/.

From vamlists at gmx.net Tue Jan 10 02:22:49 2006
From: vamlists at gmx.net (Vamsee Kanakala)
Date: Tue, 10 Jan 2006 12:52:49 +0530
Subject: [Ferret-talk] search_each returns partial results?
Message-ID: <43C360C9.8040606@gmx.net>

Hi,

I'm having some trouble with ferret search_each. I'm posting rails'
script/console output, so I guess you can decrypt it:

>> res = []
=> []
>> index.search_each('name: a*') do |doc, score|
?>   res << doc
>> end
=> 50
>> res.size
=> 10
>>

I'm guessing the '=>50' after search_each indicates that there are
50 documents returned. How come I can only see 10 in the array res?

Thanks much,
Vamsee.

From lists at sourceillustrated.com Tue Jan 10 08:55:34 2006
From: lists at sourceillustrated.com (John Wells)
Date: Tue, 10 Jan 2006 14:55:34 +0100
Subject: [Ferret-talk] Ferret with IMAP dirs
Message-ID:

I'd like to use ferret to build an imap indexer and search utility, but
want to check first to see if anyone else is working on this and offer
my help. Anyone?

Also, if you could provide any helpful pointers on indexing directories
via ferret, it'll be very much appreciated. I'm a lucene nuby.

Thanks!
John

--
Posted via http://www.ruby-forum.com/.

From fcsmith at gmail.com Tue Jan 10 09:53:34 2006
From: fcsmith at gmail.com (Finn Smith)
Date: Tue, 10 Jan 2006 09:53:34 -0500
Subject: [Ferret-talk] search_each returns partial results?
In-Reply-To: <43C360C9.8040606@gmx.net>
References: <43C360C9.8040606@gmx.net>
Message-ID: <6e72bbd70601100653j42d3d8dan35cd5d508f5b0bbe@mail.gmail.com>

On 1/10/06, Vamsee Kanakala wrote:
> I'm guessing the '=>50' after search_each indicates that there are
> 50 documents returned. How come I can only see 10 in the array res?
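
The 50-versus-10 mismatch can be reproduced without Ferret. What follows
is only a minimal sketch, assuming a search that counts every match but
yields a single default-sized page. StubIndex is a hypothetical stand-in
written for this illustration, not Ferret's code; only the option names
:first_doc and :num_docs are taken from this thread.

```ruby
# Hypothetical stand-in for an index; this is NOT Ferret's implementation.
class StubIndex
  DEFAULT_NUM_DOCS = 10 # assumed default page size

  def initialize(doc_ids)
    @doc_ids = doc_ids
  end

  # Yields (doc, score) for one page of hits and returns the total
  # number of matches, which can be larger than the page.
  def search_each(_query, options = {})
    first_doc = options.fetch(:first_doc, 0)
    num_docs  = options.fetch(:num_docs, DEFAULT_NUM_DOCS)
    page = @doc_ids[first_doc, num_docs] || []
    page.each { |doc| yield doc, 1.0 }
    @doc_ids.size
  end
end

index = StubIndex.new((0...50).to_a)

# Default options: the block only sees the first page.
res = []
total = index.search_each('name: a*') { |doc, _score| res << doc }

# Asking for as many docs as there are matches collects all of them.
all = []
index.search_each('name: a*', :num_docs => total) { |doc, _score| all << doc }
```

In this sketch total counts every match while res only holds one page,
so a second call with a larger :num_docs is needed to collect everything.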

If you look at Index::Index's search_each method you'll see it calls
do_search internally, which in turn calls Search::IndexSearcher's search
method. This search method takes an options hash which accepts the
following keys:

# filter::    filters docs from the search result
# first_doc:: The index in the results of the first doc retrieved.
#             Default is 0
# num_docs::  The number of results returned. Default is 10
# sort::      An array of SortFields describing how to sort the results.

So it is returning 10 docs because num_docs defaults to 10.

I've been waiting for David to return to bring this up again, but I
think this code should be moved out of Search::IndexSearcher into
Index::Index and that there should be a Hits class a la Java Lucene to
manage the hits returned.

-F

From jennyw at dangerousideas.com Tue Jan 10 15:27:03 2006
From: jennyw at dangerousideas.com (jennyw)
Date: Tue, 10 Jan 2006 12:27:03 -0800
Subject: [Ferret-talk] Ferret with IMAP dirs
In-Reply-To:
References:
Message-ID: <43C41897.8040003@dangerousideas.com>

John Wells wrote:

>I'd like to use ferret to build an imap indexer and search utility, but
>want to check first to see if anyone else is working on this and offer
>my help. Anyone?
>

This could be really challenging if you want it to work for multiple
IMAP servers. If you target a specific one, though, you might have
better luck. The biggest issue I see is with message UIDs: although the
IMAP RFC implies that a message's UID always stays the same, my
understanding is that this doesn't hold on all implementations. Also, it
may be tough to keep track of all changes to a user's inbox. If there's
a way to communicate with the IMAP server via an API specific to that
server, especially if there's a hook that can be called on updates to
the message store, that would be ideal.

Good luck!

Jen

From lists at sourceillustrated.com Wed Jan 11 07:17:01 2006
From: lists at sourceillustrated.com (John Wells)
Date: Wed, 11 Jan 2006 13:17:01 +0100
Subject: [Ferret-talk] Ferret with IMAP dirs
In-Reply-To: <43C41897.8040003@dangerousideas.com>
References: <43C41897.8040003@dangerousideas.com>
Message-ID: <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com>

jennyw wrote:
> This could be really challenging if you want it to work for multiple
> IMAP servers. If you target a specific one, though, you might have
> better luck. The biggest issue I see is that the UID of messages,
> although implied to always be the same by the IMAP RFC, my understanding
> is that it's not always the same on all implementations. Also, it may be
> tough to keep track of all changes to a user's inbox. If there's a way
> to communicate with the IMAP server via an API specific to that server,
> especially if there's a hook that can be called on updates to the
> message store, that would be ideal.

Thanks Jen. I know Zoe (http://www.zoe.nu) uses Lucene to index IMAP
dirs, but I'm uncertain how it goes about it...that might be a place to
start.

Thanks!

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Wed Jan 11 19:58:00 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 12 Jan 2006 09:58:00 +0900
Subject: [Ferret-talk] Indexing so slow......
In-Reply-To: <35ae50b10512231828o754b4512q@mail.gmail.com>
References: <35ae50b10512190611l6761152bo@mail.gmail.com>
	<6e72bbd70512190821h1aa276a7uad4e57faaeaa51d2@mail.gmail.com>
	<1135013028.2981.5.camel@localhost.localdomain>
	<35ae50b10512191649t4f86750fr@mail.gmail.com>
	<35ae50b10512200606u72abb805h@mail.gmail.com>
	<35ae50b10512230120n51b5f9efn@mail.gmail.com>
	<35ae50b10512231828o754b4512q@mail.gmail.com>
Message-ID:

Hi Hui,

On 12/24/05, hui wrote:
> I tried lucene last night, it is so fast.
> about just one hour indexed 130,000 records (stored all data), which
> was 10 hours using ferret (only id and without optimizing).

Lucene is certainly a lot faster than Ferret. That's why I'm working on
cFerret.

>
> and it seems ferret cannot use the lucene index data, I got an error:
>

The error probably occurred because of a difference in the way Lucene
handles UTF-8 strings. Ferret always treats strings as an array of
bytes, while Lucene treats them as an array of characters in some
instances (not all). For example, a Chinese character might have a
length of 1 in a Lucene index and 4 in a Ferret index, so the two indexes
will be incompatible. This is something I discovered just before I went
on holiday. I'm still contemplating what to do about this. Treating
strings as arrays of bytes still seems preferable to me but it will make
the Lucene indexes incompatible. Anyway, I'm afraid it's not high
priority right now. I'd rather get cFerret finished so fewer people
need/want to use Lucene.

Hope this isn't too confusing.

Cheers,
Dave

From erik at ehatchersolutions.com Wed Jan 11 20:26:46 2006
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Wed, 11 Jan 2006 20:26:46 -0500
Subject: [Ferret-talk] Indexing so slow......
In-Reply-To:
References: <35ae50b10512190611l6761152bo@mail.gmail.com>
	<6e72bbd70512190821h1aa276a7uad4e57faaeaa51d2@mail.gmail.com>
	<1135013028.2981.5.camel@localhost.localdomain>
	<35ae50b10512191649t4f86750fr@mail.gmail.com>
	<35ae50b10512200606u72abb805h@mail.gmail.com>
	<35ae50b10512230120n51b5f9efn@mail.gmail.com>
	<35ae50b10512231828o754b4512q@mail.gmail.com>
Message-ID:

On Jan 11, 2006, at 7:58 PM, David Balmain wrote:
> Treating strings as arrays of bytes still seems preferable to me
> but it will make the Lucene indexes incompatible.

Please please don't let Ferret stay incompatible with Java Lucene.
Interoping indexes is a major *feature* for me at least. There are
very solid reasons to want interoperability.
For example, there are
fine libraries in Java to index various types of content that don't
have decent Ruby counterparts, so indexing with Java can be
preferable in these cases.

If Java Lucene needs to change (and yes this issue has come up with
someone doing a port of Java Lucene to Perl, no not Plucene but a
different port). Check the java-dev archives (or maybe java-user?).
There were patches offered, but there were downsides to it in terms
of performance, if I recall correctly.

Erik

From dbalmain.ml at gmail.com Wed Jan 11 20:43:47 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 12 Jan 2006 10:43:47 +0900
Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's
In-Reply-To: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com>
References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com>
Message-ID:

On 1/3/06, Finn Smith wrote:
> Recently I've been revisiting some of my search code. With a greater
> understanding of how Java Lucene implements its search methods, I
> realized that one level of abstraction is not present in the Ferret
> classes/methods.
> Here are the relevant method signatures:
>
> Ferret's search methods:
>
> in Ferret::Index::Index:
>   search(query, options = {}) -> returns a TopDocs
>   search_each(query, options = {}) {|doc, score| ...} -> yields to
>     context w/ doc and score for each hit
>
> in Ferret::Search::IndexSearcher:
>   search(query, options = {}) -> returns a TopDocs
>   search_each(query, filter = nil) {|doc, score| ...} -> yields to
>     context w/ doc and score for each hit
>
>
> Lucene's search methods:
>
> in the interface Searchable:
>   public void search(Query query, Filter filter, HitCollector results)
>   public TopDocs search(Query query, Filter filter, int n)
>   public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
>
> in org.apache.lucene.search.Searcher (which implements Searchable):
>   public final Hits search(Query query)
>   public Hits search(Query query, Filter filter)
>   public Hits search(Query query, Sort sort)
>   public Hits search(Query query, Filter filter, Sort sort)
>
>
> I was wondering if there were plans to implement the Hits class in
> Ferret. (Or if someone were to write a patch implementing them, would
> David integrate it into the source?)

I'd be happy to integrate it if someone sends me a patch. Having said that...

> It seems like it is a useful
> abstraction since TopDocs does not allow you to access its hits by
> index, only by the .each() method call.

Actually you can access the hits by index like this:

  hit_three = topdocs.score_docs[2]

The reason I didn't bother implementing the hits class is that I can't
see that it adds anything useful. Really it all just seems a matter of
notation: what is easiest for people to understand and remember.
Adding the hits class might just make everything a little more
complicated.
Please refer to Martin Fowler's discussion on the Humane
interface:

http://www.martinfowler.com/bliki/HumaneInterface.html

While Java likes to have multiple different implementations of simple
interfaces and a separate class for each data structure, in Ruby you
can use an array for many different jobs: stack, list, queue, etc. I
feel it would be better to do the same thing with TopDocs. Rather than
adding the Hits class I feel it would be better to add the desired
functionality to TopDocs. I'm happy to listen to other points of view.

> Some questions:
> * Will changing these methods break people's existing code?

Perhaps. Depends what we change. Ferret is still beta though so I
think it's open to non-backwards compatible changes if necessary,
although we should avoid this if possible.

> * Where is the proper place to put these methods? Move the methods
> that return TopDocs to a module, which is more or less the same as a
> Java interface, and implement the methods that return Hits directly in
> the class? What is a good way to do this that feels Rubyish and takes
> advantage of its strengths and idioms?

I think I answered this already. I'd like to keep TopDocs as a class
and add the desired functionality to it.

> * The options to limit the search (first_doc and num_docs) in
> Search::IndexSearcher and the code that implements them should
> probably be moved out of Search::IndexSearcher into Index::Index

I think this needs to stay in IndexSearcher as it limits the amount of
memory used by a search. Even the Java version allows you to specify
nDocs.

Hope this helps. Feedback is welcome.

Cheers,
Dave

> * Are there lower level issues I am not aware of that would make any
> of this a bad idea?
>
> Am I missing something here? Are there reasons not to have Ferret's
> implementation of these methods and classes follow Java Lucene's as
> closely as possible? I'd appreciate hearing your thoughts.
>
> -F
>

From dbalmain.ml at gmail.com Wed Jan 11 21:23:30 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 12 Jan 2006 11:23:30 +0900
Subject: [Ferret-talk] Indexing so slow......
In-Reply-To:
References: <35ae50b10512190611l6761152bo@mail.gmail.com>
	<1135013028.2981.5.camel@localhost.localdomain>
	<35ae50b10512191649t4f86750fr@mail.gmail.com>
	<35ae50b10512200606u72abb805h@mail.gmail.com>
	<35ae50b10512230120n51b5f9efn@mail.gmail.com>
	<35ae50b10512231828o754b4512q@mail.gmail.com>
Message-ID:

On 1/12/06, Erik Hatcher wrote:
>
> On Jan 11, 2006, at 7:58 PM, David Balmain wrote:
> > Treating strings as arrays of bytes still seems preferable to me
> > but it will make the Lucene indexes incompatible.
>
> Please please don't let Ferret stay incompatible with Java Lucene.
> Interoping indexes is a major *feature* for me at least. There are
> very solid reasons to want interoperability. For example, there are
> fine libraries in Java to index various types of content that don't
> have decent Ruby counterparts, so indexing with Java can be
> preferable in these cases.

Agreed. Don't for one second assume that I don't think this is
important. It's just that it's not an easy issue to solve and I'd be
wasting my time if I started working on it in the pure Ruby version of
Ferret. I'd have to repeat the work once cFerret is finished.

> If Java Lucene needs to change (and yes this issue has come up with
> someone doing a port of Java Lucene to Perl, no not Plucene but a
> different port). Check the java-dev archives (or maybe java-user?).
> There were patches offered, but there were downsides to it in terms
> of performance, if I recall correctly.

Here is the discussion:

http://www.gossamer-threads.com/lists/lucene/java-dev/28334?search_string=perl%20unicode;#28334

From reading this, there are more issues at hand than just the
performance. And I haven't seen any patches coming in for this so I'm
evidently not the only person who thinks this is a difficult problem.
My feeling is that I'll be better off submitting a patch to Lucene
rather than fitting Ferret to work with the current Lucene files. That
is probably what I'll do once I finish cFerret. Hopefully someone will
get to it before I do. ;-)

Just for the sake of discussion, the alternative is to add another
Directory implementation that is compatible with Lucene indexes. Not the
most elegant solution but it will do the job and we won't have to
sacrifice performance in Ferret for non-Java indexes. I should note at
this point that there would be a definite sacrifice in performance to
make Ferret compatible with Lucene indexes but I'm not so sure the
same is true the other way around.

Dave

> Erik
>

From erik at ehatchersolutions.com Thu Jan 12 08:24:14 2006
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Thu, 12 Jan 2006 08:24:14 -0500
Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's
In-Reply-To:
References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com>
Message-ID:

On Jan 11, 2006, at 8:43 PM, David Balmain wrote:
>> I was wondering if there were plans to implement the Hits class in
>> Ferret. (Or if someone were to write a patch implementing them, would
>> David integrate it into the source?)
>
> I'd be happy to integrate it if someone sends me a patch. Having
> said that...
>
>> It seems like it is a useful
>> abstraction since TopDocs does not allow you to access its hits by
>> index, only by the .each() method call.
>
> Actually you can access the hits by index like this;
>
> hit_three = topdocs.score_docs[2]
>
> The reason I didn't bother implementing the hits class is that I can't
> see that it adds anything useful. Really it all just seems a matter of
> notation.

It's more than just notation. Hits performs some caching of Document
objects as well as providing a means to iterate through the hits
without having to manually re-search as it does it under the covers.
Sure, it's perhaps a mere convenience, but a handy abstraction
nonetheless.

> What is easiest for people to understand and remember.
> Adding the hits class might just make everything a little more
> complicated. Please refer to Martin Fowler's discussion on the Humane
> interface;
>
> http://www.martinfowler.com/bliki/HumaneInterface.html
>
> While Java likes to have multiple different implementations of simple
> interfaces and a separate class for each data structure, in Ruby you
> can use an array for many different jobs; stack, list queue etc. I
> feel it would be better to do the same thing with TopDocs. Rather than
> adding the Hits class I feel it would be better to add the desired
> functionality to TopDocs. I'm happy to listen to other points of view.

I think not having Hits makes it more complicated for those coming
from Java Lucene at least, but it is also a conceptual abstraction.
One thinks of getting "hits" back from a search, not "top docs". So
in that sense, the semantics of having Hits is powerful. Part of
Fowler's argument is to have redundancy, aliases, and conveniences
for the humane interface, and I think Hits would qualify in that regard.
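
To make the caching and re-searching point concrete, here is a minimal
sketch of a Hits-style wrapper in Ruby. It is only an illustration under
stated assumptions: SimpleHits and StubSearcher are hypothetical names
invented for this example (neither exists in Ferret or Lucene), and the
searcher is assumed to offer a search_each that accepts :first_doc and
:num_docs and returns the total hit count.

```ruby
# Hypothetical sketch; SimpleHits is not part of Ferret or Lucene.
class SimpleHits
  include Enumerable

  attr_reader :length

  def initialize(searcher, query, page_size = 10)
    @searcher  = searcher
    @query     = query
    @page_size = page_size
    @cache     = {} # hit position => [doc, score]
    @length    = fetch_page(0)
  end

  # Returns [doc, score] for the hit at position i, running another
  # search behind the scenes when that position is not cached yet.
  def [](i)
    return nil if i < 0 || i >= @length
    fetch_page(i - (i % @page_size)) unless @cache.key?(i)
    @cache[i]
  end

  def each
    @length.times { |i| yield self[i] }
  end

  private

  # Runs one search for the page starting at first_doc, caches its hits
  # and returns the total number of matches reported by the searcher.
  def fetch_page(first_doc)
    pos = first_doc
    @searcher.search_each(@query, :first_doc => first_doc,
                                  :num_docs => @page_size) do |doc, score|
      @cache[pos] = [doc, score]
      pos += 1
    end
  end
end

# Hypothetical stand-in searcher so the sketch runs without Ferret.
class StubSearcher
  attr_reader :searches

  def initialize(docs)
    @docs = docs
    @searches = 0
  end

  def search_each(_query, options = {})
    @searches += 1
    first = options.fetch(:first_doc, 0)
    num   = options.fetch(:num_docs, 10)
    (@docs[first, num] || []).each_with_index do |doc, i|
      yield doc, 1.0 / (first + i + 1)
    end
    @docs.size
  end
end

searcher = StubSearcher.new((100...150).to_a)
hits  = SimpleHits.new(searcher, 'name: a*')
third = hits[2]  # served from the first page, no new search
later = hits[25] # triggers one more search, for positions 20..29
```

The same behaviour could equally be added to TopDocs itself; the sketch
only shows what the wrapper would have to do, not where it should live.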
Erik From erik at ehatchersolutions.com Thu Jan 12 08:37:37 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Thu, 12 Jan 2006 08:37:37 -0500 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> Message-ID: <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> On Jan 11, 2006, at 7:17 AM, John Wells wrote: > jennyw jennyw wrote: >> This could be really challenging if you want it to work for multiple >> IMAP servers. If you target a specific one, though, you might have >> better luck. The biggest issue I see is that the UID of messages, >> although implied to always be the same by the IMAP RFC, my >> understanding >> is that it's not always the same on all implementations. Also, it >> may be >> tough to keep track of all changes to a user's inbox. If there's >> a way >> to communicate with the IMAP server via an API specific to that >> server, >> especially if there's a hook that can be called on updates to the >> message store, that would be ideal. > > Thanks Jen. I know Zoe (http://www.zoe.nu) uses Lucene to index IMAP > dirs, but I'm uncertain how it goes about it...that might be a > place to > start. Thanks! ZOE uses the IMAP (and POP, and others) networking protocols to read e-mail and then to index it in all sorts of intense and sophisticated ways. I'm not sure what Java library ZOE uses for this, but knowing the creator of it (we met once a couple of years ago) he probably built his own IMAP API from scratch using sockets. net/imap is built into Ruby itself, and is probably the way to start what you're doing. 
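For anyone starting from net/imap, a minimal sketch of turning a mailbox into documents might look like the following. The helper name mailbox_docs and the choice of fields are assumptions made for illustration; only examine, uid_search and uid_fetch are real Net::IMAP calls, and the connection itself is left to the caller.

```ruby
# Hypothetical helper (not part of Ferret or net/imap): turn every message
# in a mailbox into a Hash that could be added to a Ferret index with
# `index << doc`. `imap` is expected to behave like a logged-in Net::IMAP
# connection, e.g. one created with:
#   require 'net/imap'
#   imap = Net::IMAP.new('mail.example.com')   # host is a placeholder
#   imap.login('user', 'password')
def mailbox_docs(imap, mailbox = 'INBOX')
  imap.examine(mailbox)                        # open the mailbox read-only
  imap.uid_search(['ALL']).map do |uid|
    data = imap.uid_fetch(uid, ['ENVELOPE', 'BODY[TEXT]']).first
    {
      :uid     => uid.to_s,                    # see the caveat on UID stability above
      :subject => data.attr['ENVELOPE'].subject.to_s,
      :content => data.attr['BODY[TEXT]'].to_s
    }
  end
end
```

Fetching one message per round trip keeps the sketch short; a real indexer would probably fetch UIDs in batches.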
Erik From jennyw at dangerousideas.com Thu Jan 12 12:49:27 2006 From: jennyw at dangerousideas.com (jennyw) Date: Thu, 12 Jan 2006 09:49:27 -0800 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> Message-ID: <43C696A7.6060801@dangerousideas.com> Erik Hatcher wrote: >ZOE uses the IMAP (and POP, and others) networking protocols to read >e-mail and then to index it in all sorts of intense and sophisticated >ways. I'm not sure what Java library ZOE uses for this, but knowing >the creator of it (we met once a couple of years ago) he probably >built his own IMAP API from scratch using sockets. > > I'm pretty sure ZOE downloads all e-mail from the server and into its own message store. You then point your e-mail client to ZOE as your server. Last I checked, ZOE only supported POP clients, though. Jen From erik at ehatchersolutions.com Thu Jan 12 14:00:09 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Thu, 12 Jan 2006 14:00:09 -0500 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <43C696A7.6060801@dangerousideas.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> Message-ID: On Jan 12, 2006, at 12:49 PM, jennyw wrote: > Erik Hatcher wrote: > >> ZOE uses the IMAP (and POP, and others) networking protocols to read >> e-mail and then to index it in all sorts of intense and sophisticated >> ways. I'm not sure what Java library ZOE uses for this, but knowing >> the creator of it (we met once a couple of years ago) he probably >> built his own IMAP API from scratch using sockets. >> >> > I'm pretty sure ZOE downloads all e-mail from the server and into its > own message store. 
You then point your e-mail client to ZOE as your > server. Last I checked, ZOE only supported POP clients, though. I guess it's a bit confusing which aspect we're talking about here. ZOE is both a client and a server. ZOE is both a POP and IMAP _client_, but also a POP server as well as an SMTP server. I think it also serves as an IMAP server, though I'm not entirely sure. Pretty snazzy, and its use of Lucene is uncanny. The main point here is that ZOE does speak IMAP and can grab mails from it. Erik From dbalmain.ml at gmail.com Thu Jan 12 18:52:08 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 13 Jan 2006 08:52:08 +0900 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> Message-ID: On 1/12/06, Erik Hatcher wrote: > > On Jan 11, 2006, at 8:43 PM, David Balmain wrote: > >> I was wondering if there were plans to implement the Hits class in > >> Ferret. (Or if someone were to write a patch implementing them, would > >> David integrate it into the source?) > > > > I'd be happy to integrate it if someone sends me a patch. Having > > said that... > > > >> It seems like it is a useful > >> abstraction since TopDocs does not allow you to access its hits by > >> index, only by the .each() method call. > > > > Actually you can access the hits by index like this; > > > > hit_three = topdocs.score_docs[2] > > > > The reason I didn't bother implementing the hits class is that I can't > > see that it adds anything useful. Really it all just seems a matter of > > notation. > > It's more than just notation. Hits performs some caching of Document > objects as well as providing a means to iterate through the hits > without having to manually re-search as it does it under the covers. > Sure, it's perhaps a mere convenience, but a handy abstraction > nonetheless. > > > What is easiest for people to understand and remember. 
> > Adding the hits class might just make everthing a little more > > complicated. Please refer to Martin Fowler's discussion on the Humane > > interface; > > > > http://www.martinfowler.com/bliki/HumaneInterface.html > > > > While Java likes to have multiple different implementations of simple > > interfaces and a separate class for each data structure, in Ruby you > > can use an array for many different jobs; stack, list queue etc. I > > feel it would be better to do the same thing with TopDocs. Rather than > > adding the Hits class I feel it would be better to add the desired > > functionality to TopDocs. I'm happy to listen to other points of view. > > I think not having Hits makes it more complicated for those coming > from Java Lucene at least, but it is also a conceptual abstraction. > One thinks of getting "hits" back from a search, not "top docs". So > in that sense, the semantics of having Hits is powerful. Part of > Fowler's argument is to have redundancy, aliases, and conveniences > for the humane interface, and I think Hits would qualify in that regard. > > Erik I'm not arguing that TopDocs is a better name than Hits. Rather that having search methods return two different classes is unnecessary and not "The Ruby Way". My goal is to make Ferret easy for Ruby programmers to use, not Java programmers. So what I'd like to hear is an argument as to why having two separate classes - TopDocs and Hits - is superior to combining the functionality of both into one class. My personal feeling is that this is where the difference lies between Java and Ruby but I could easily be swayed. 
Dave From erik at ehatchersolutions.com Thu Jan 12 21:12:57 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Thu, 12 Jan 2006 21:12:57 -0500 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> Message-ID: On Jan 12, 2006, at 6:52 PM, David Balmain wrote: > On 1/12/06, Erik Hatcher wrote: >> >> On Jan 11, 2006, at 8:43 PM, David Balmain wrote: >>>> I was wondering if there were plans to implement the Hits class in >>>> Ferret. (Or if someone were to write a patch implementing them, >>>> would >>>> David integrate it into the source?) >>> >>> I'd be happy to integrate it if someone sends me a patch. Having >>> said that... >>> >>>> It seems like it is a useful >>>> abstraction since TopDocs does not allow you to access its hits by >>>> index, only by the .each() method call. >>> >>> Actually you can access the hits by index like this; >>> >>> hit_three = topdocs.score_docs[2] >>> >>> The reason I didn't bother implementing the hits class is that I >>> can't >>> see that it adds anything useful. Really it all just seems a >>> matter of >>> notation. >> >> It's more than just notation. Hits performs some caching of Document >> objects as well as providing a means to iterate through the hits >> without having to manually re-search as it does it under the covers. >> Sure, it's perhaps a mere convenience, but a handy abstraction >> nonetheless. >> >>> What is easiest for people to understand and remember. >>> Adding the hits class might just make everthing a little more >>> complicated. Please refer to Martin Fowler's discussion on the >>> Humane >>> interface; >>> >>> http://www.martinfowler.com/bliki/HumaneInterface.html >>> >>> While Java likes to have multiple different implementations of >>> simple >>> interfaces and a separate class for each data structure, in Ruby you >>> can use an array for many different jobs; stack, list queue etc. 
I >>> feel it would be better to do the same thing with TopDocs. Rather >>> than >>> adding the Hits class I feel it would be better to add the desired >>> functionality to TopDocs. I'm happy to listen to other points of >>> view. >> >> I think not having Hits makes it more complicated for those coming >> from Java Lucene at least, but it is also a conceptual abstraction. >> One thinks of getting "hits" back from a search, not "top docs". So >> in that sense, the semantics of having Hits is powerful. Part of >> Fowler's argument is to have redundancy, aliases, and conveniences >> for the humane interface, and I think Hits would qualify in that >> regard. >> >> Erik > > I'm not arguing that TopDocs is a better name than Hits. Rather that > having search methods return two different classes is unnecessary and > not "The Ruby Way". My goal is to make Ferret easy for Ruby > programmers to use, not Java programmers. So what I'd like to hear is > an argument as to why having two separate classes - TopDocs and Hits - > is superior to combining the functionality of both into one class. My > personal feeling is that this is where the difference lies between > Java and Ruby but I could easily be swayed. It seems an injustice to Java in this regard. Surely Hits and TopDocs could have their functionality blended together into single class. There was an intentional separation, not some constraint that Java the language imposed. I'm being a bit defensive of the Lucene API here and don't want to see Ferret diverge too much from it for no real benefit. What's one more class in Ruby in this situation to maintain consistency across languages for the finest search engine available? Seems a small sacrifice of Ruby "purity" to make for the noble cause :) Just my $0.02. Practically no one in Java Lucene uses TopDocs - you'll notice that all of those search methods are marked as "Expert". 
Hits is the most common way to access search results, allowing them to automatically be paged through and have a bit of caching along with it. Erik From vamlists at gmx.net Fri Jan 13 00:18:53 2006 From: vamlists at gmx.net (Vamsee Kanakala) Date: Fri, 13 Jan 2006 10:48:53 +0530 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> Message-ID: <43C7383D.2080505@gmx.net> David Balmain wrote: > My >personal feeling is that this is where the difference lies between >Java and Ruby but I could easily be swayed. > > > Hi Dave & Erik, I don't intend to hurt anybody's opinions, but let me speak up on something: I did some 2 years of Java programming, and was never really comfortable with its verbosity though I liked it for other things. I felt Ferret's API is already a bit un-Rubyish, if you know what I mean. It almost feels like I'm back to using Java libraries again. Learning Rails was a complete break from the way I was doing J2EE programming. But I totally love it for its refreshing simplicity, and thereby came to love Ruby too. So what I mean to say is, don't be afraid to break compatibility - if it makes the programmer's life easier. I agree with Dave's sentiments that Ferret should be better than Lucene. I'm guilty of not understanding what it really takes, but I think we should put 'making a developer's job easy' before anything else. If it means breaking compatibility with Lucene, I don't really mind. Of course, I speak from a very selfish point of view - I don't have any running Lucene apps to run or port. Just my 2 cents. Regards, Vamsee. 
From lists at sourceillustrated.com Fri Jan 13 01:13:05 2006 From: lists at sourceillustrated.com (John Wells) Date: Fri, 13 Jan 2006 07:13:05 +0100 Subject: [Ferret-talk] uninitialized constant Ferret Message-ID: I get this error message from the following code: require 'rubygems' require 'ferret' include Ferret index = Index::Index.new(:path => '/tmp/index') index << {:title => "Programming Ruby", :content => "blah blah blah"} index << {:title => "Programming Ruby", :content => "yada yada yada"} Yes, gems is installed and ferret is as well. Here's the exact output: ./ferret.rb:3: uninitialized constant Ferret (NameError) from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:18:in `require__' from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:18:in `require' from test_ferret.rb:2 Any ideas what I'm doing wrong? Ruby version 1.8.2 Thanks! John -- Posted via http://www.ruby-forum.com/. From mrj at bigpond.net.au Fri Jan 13 02:28:07 2006 From: mrj at bigpond.net.au (Mark James) Date: Fri, 13 Jan 2006 18:28:07 +1100 Subject: [Ferret-talk] Scoring by field Message-ID: <43C75687.10208@bigpond.net.au> Hello, I've added a feature to Ferret that breaks down the score of each matched document by the contribution from each of the document's fields. IndexSearcher now has a search_each_with_field_scores method, and TopDocs has an each_with_field_scores method. Both yield a third parameter, being a new score_by_field attribute of ScoreDoc. score_by_field is a hash with field_name keys and score values. I find this useful when I have indexed a document by using a separate field for each section of the document. The score breakdown can be used to provide a user with not only a link to each document in the search results, but to also a link to the most relevant section of those documents. Would this be something worthwhile to integrate into the Ferret distribution? 
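To make the "most relevant section" use concrete, here is a small sketch that works only on a hash of the shape described above (field name keys, score values). The field names and numbers are invented for illustration, and best_section is a hypothetical helper, not part of the patch.

```ruby
# Hypothetical per-field breakdown for one matched document, in the shape
# described above: field name => contribution to the document's score.
score_by_field = { :intro => 0.12, :methods => 0.61, :results => 0.27 }

# Return the field (i.e. document section) that contributed most to the
# score, or nil when the breakdown is empty.
def best_section(score_by_field)
  field, _score = score_by_field.max_by { |_name, score| score }
  field
end

section = best_section(score_by_field)
puts "link to section: #{section}"
```

A search results page could then link each hit to the anchor for that section rather than to the top of the document.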
None of the high-level APIs are broken, though scorers now return an array pair [score, score_by_field], rather than just the score. From mrj at bigpond.net.au Fri Jan 13 03:13:54 2006 From: mrj at bigpond.net.au (Mark James) Date: Fri, 13 Jan 2006 19:13:54 +1100 Subject: [Ferret-talk] uninitialized constant Ferret In-Reply-To: References: Message-ID: <43C76142.1030006@bigpond.net.au> John Wells wrote: > I get this error message from the following code: > > require 'rubygems' > require 'ferret' > include Ferret > > index = Index::Index.new(:path => '/tmp/index') > > index << {:title => "Programming Ruby", :content => "blah blah blah"} > index << {:title => "Programming Ruby", :content => "yada yada yada"} > > > > Yes, gems is installed and ferret is as well. Here's the exact output: > > ./ferret.rb:3: uninitialized constant Ferret (NameError) > from > /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:18:in > `require__' > from > /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:18:in > `require' > from test_ferret.rb:2 > > Any ideas what I'm doing wrong? Ruby version 1.8.2 I think it's because you've called your file 'ferret.rb'. It's requiring itself. From dbalmain.ml at gmail.com Fri Jan 13 05:13:52 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 13 Jan 2006 19:13:52 +0900 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> Message-ID: On 1/13/06, Erik Hatcher wrote: > > On Jan 12, 2006, at 6:52 PM, David Balmain wrote: > > > On 1/12/06, Erik Hatcher wrote: > >> > >> On Jan 11, 2006, at 8:43 PM, David Balmain wrote: > >>>> I was wondering if there were plans to implement the Hits class in > >>>> Ferret. (Or if someone were to write a patch implementing them, > >>>> would > >>>> David integrate it into the source?) > >>> > >>> I'd be happy to integrate it if someone sends me a patch. Having > >>> said that... 
> >>> > >>>> It seems like it is a useful > >>>> abstraction since TopDocs does not allow you to access its hits by > >>>> index, only by the .each() method call. > >>> > >>> Actually you can access the hits by index like this; > >>> > >>> hit_three = topdocs.score_docs[2] > >>> > >>> The reason I didn't bother implementing the hits class is that I > >>> can't > >>> see that it adds anything useful. Really it all just seems a > >>> matter of > >>> notation. > >> > >> It's more than just notation. Hits performs some caching of Document > >> objects as well as providing a means to iterate through the hits > >> without having to manually re-search as it does it under the covers. > >> Sure, it's perhaps a mere convenience, but a handy abstraction > >> nonetheless. > >> > >>> What is easiest for people to understand and remember. > >>> Adding the hits class might just make everthing a little more > >>> complicated. Please refer to Martin Fowler's discussion on the > >>> Humane > >>> interface; > >>> > >>> http://www.martinfowler.com/bliki/HumaneInterface.html > >>> > >>> While Java likes to have multiple different implementations of > >>> simple > >>> interfaces and a separate class for each data structure, in Ruby you > >>> can use an array for many different jobs; stack, list queue etc. I > >>> feel it would be better to do the same thing with TopDocs. Rather > >>> than > >>> adding the Hits class I feel it would be better to add the desired > >>> functionality to TopDocs. I'm happy to listen to other points of > >>> view. > >> > >> I think not having Hits makes it more complicated for those coming > >> from Java Lucene at least, but it is also a conceptual abstraction. > >> One thinks of getting "hits" back from a search, not "top docs". So > >> in that sense, the semantics of having Hits is powerful. Part of > >> Fowler's argument is to have redundancy, aliases, and conveniences > >> for the humane interface, and I think Hits would qualify in that > >> regard. 
> >> > >> Erik > > > > I'm not arguing that TopDocs is a better name than Hits. Rather that > > having search methods return two different classes is unnecessary and > > not "The Ruby Way". My goal is to make Ferret easy for Ruby > > programmers to use, not Java programmers. So what I'd like to hear is > > an argument as to why having two separate classes - TopDocs and Hits - > > is superior to combining the functionality of both into one class. My > > personal feeling is that this is where the difference lies between > > Java and Ruby but I could easily be swayed. > > It seems an injustice to Java in this regard. Surely Hits and > TopDocs could have their functionality blended together into single > class. There was an intentional separation, not some constraint that > Java the language imposed. I was never implying there was some constraint imposed by the language itself. I'm talking about the way things are done in Ruby versus the way things are done in Java. There was an intentional seperation of ArrayList, Vector, Stack etc in Java too but it doesn't mean we have to do the same thing in Ruby. I'm not saying one way is better than the other. But Ferret is a Ruby library so I'd like to do it the Ruby way where possible. > I'm being a bit defensive of the Lucene API here and don't want to > see Ferret diverge too much from it for no real benefit. What's one > more class in Ruby in this situation to maintain consistency across > languages for the finest search engine available? Seems a small > sacrifice of Ruby "purity" to make for the noble cause :) Just my > $0.02. > > Practically no one in Java Lucene uses TopDocs - you'll notice that > all of those search methods are marked as "Expert". Hits is the most > common way to access search results, allowing them to automatically > be paged through and have a bit of caching along with it. If this is the case then maybe I should just return a Hits object, roll the TopDocs functionality into it and be done with it. 
If practically no one is using TopDocs then practically no one will miss it. ;-) What I was really looking for (and still hope to see) was an argument discussing the pros and cons of having the two separate classes (and "That's what they did in Java" doesn't count :-). Nevertheless, I've revisited the Hits class in Lucene and I've thought more about the issue at hand and Hits will be coming in the next release of Ferret. I haven't decided exactly how I'm going to do it yet. There will probably still be some differences from the Lucene API. For example, search_each() is here to stay. I'll probably bring it up for discussion again when I come to it. I still have a fair bit of work in cFerret before I get to that stage. Cheers, Dave From erik at ehatchersolutions.com Fri Jan 13 05:28:39 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Fri, 13 Jan 2006 05:28:39 -0500 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: <43C7383D.2080505@gmx.net> References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> <43C7383D.2080505@gmx.net> Message-ID: <3E81E9C7-C6DB-4300-865E-079B45C3904A@ehatchersolutions.com> Vamsee, No feelings hurt here, and I completely understand your sentiment. There are folks using Java Lucene that have expressed similar sentiment about its API, which is more C-like than Java-like in many ways. But, let's focus on the heart of this thing... a high-powered full-text search engine. The goal is speed and efficient use of resources. An elegant API is desirable but of a secondary nature. Hits is a pretty elegant way to navigate search results. I hope this simple class can find its way into Ferret and that the IndexSearcher API be made reasonably similar to Java Lucene. 
Dave has done a great job with the Index class in Ferret, which has features that Java Lucene does not: being able to flush and see changes right away (which is harder to manage in Java Lucene), and having keys to documents and managing an "update". In Java Lucene there is no concept of an update - there is only a remove and an add. I'm all for a slick Ruby API, but I would very much like to see it built on top of a Lucene-compatible index format for interoperability. That interoperability is important to me and will very likely be important to others. Consider Nutch for example. It is an incredibly scalable web crawler and indexer. With index compatibility you could use Nutch to crawl the web and use Ferret for searching. Further, the HTML parsers I've used in Ruby are lousy compared to what is available in Java. Indexing in Java makes a lot of sense in many circumstances, but fronting an application with Rails and Ferret also makes a lot of sense. Dave is, of course, the creator and driver of Ferret. I encourage him to consider keeping index file compatibility, to keep the basic API intact for the classes that are direct ports of Lucene, and to innovate on top of them rather than change them. He certainly may choose to do otherwise, but doing so would likely drive me (and perhaps others?) to other solutions. Erik On Jan 13, 2006, at 12:18 AM, Vamsee Kanakala wrote: > Hi Dave & Erik, > > I don't intend to hurt anybody's opinions, but let me speak up on > something: I did some 2 years of Java programming, and was never > really > comfortable with its verbosity though I liked it for other things. I > felt Ferret's API is already a bit un-Rubyish, if you know what I > mean. > It almost feels like I'm back to using Java libraries again. > > Learning Rails took a complete break from the way I was doing J2EE > programming. But I totally love it, for it's refreshing simplicity and > thereby came to love Ruby too. 
So what I mean to say is, don't be > afraid > to break compatibility - if it makes programmer's life easier. I agree > with Dave's sentiments that Ferret should be better than Lucene. I'm > guilty of not understanding what it really takes, but I think we > should > put 'making a developer's job easy' before anything else. If it means > breaking compatibility with Lucene, I don't really mind. > > Of course, I speak from a very selfish point of view - I don't have > any > running Lucene apps to run or port. Just my 2 cents. > > Regards, > Vamsee. > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From erik at ehatchersolutions.com Fri Jan 13 05:36:31 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Fri, 13 Jan 2006 05:36:31 -0500 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> Message-ID: On Jan 13, 2006, at 5:13 AM, David Balmain wrote: > >> I'm being a bit defensive of the Lucene API here and don't want to >> see Ferret diverge too much from it for no real benefit. What's one >> more class in Ruby in this situation to maintain consistency across >> languages for the finest search engine available? Seems a small >> sacrifice of Ruby "purity" to make for the noble cause :) Just my >> $0.02. >> >> Practically no one in Java Lucene uses TopDocs - you'll notice that >> all of those search methods are marked as "Expert". Hits is the most >> common way to access search results, allowing them to automatically >> be paged through and have a bit of caching along with it. > > If this is the case then maybe I should just return a Hits object, > roll the TopDocs functionality into it and be done with it. If > practically no one is using TopDocs then practically no one will miss > it. 
;-) My argument is mainly on why there is an issue with one more internal class. You've ported quite faithfully the underlying Lucene class structure and API. Why is this one more item a big deal? TopDocs is useful, don't get me wrong. It is just not used by the general Lucene consuming public, but many expert level folks do use it. I hope that Hits and TopDocs can stay, and whether it makes sense for them to be separate classes or not is really immaterial, but at least keep the most public and useful part of Lucene's API, IndexSearcher, as compatible as possible. > What I was really looking for (and still hope to see) was an argument > discussing the pros and cons of having the two separate classes (and > "That's what they did in Java" doesn't count :-). Keeping a consistent IndexSearcher API between Java Lucene and Ferret is definitely an argument that counts for me personally. Innovating "Ruby Way" features alongside that is also greatly desirable for sure! > Nevertheless, I've > revisted the Hits class in Lucene and I've thought more about the > issue at hand and Hits will be coming in the next release of Ferret. Yay!!! > I > haven't decided exactly how I'm going to do it yet. There will > probably still be some differences from the Lucene API. For example, > search_each() is here to stay. I'll probably bring it up for > discussion again when I come to it. I still have a fair bit of work in > cFerret before I get to that stage. Adding conveniences with block iteration and such make me extremely happy! PyLucene did the same thing. Erik p.s. whispering *GCJ... 
SWIG....* :) From lists at sourceillustrated.com Fri Jan 13 06:15:37 2006 From: lists at sourceillustrated.com (John Wells) Date: Fri, 13 Jan 2006 12:15:37 +0100 Subject: [Ferret-talk] uninitialized constant Ferret In-Reply-To: <43C76142.1030006@bigpond.net.au> References: <43C76142.1030006@bigpond.net.au> Message-ID: Mark James wrote: > John Wells wrote: >> >> `require' >> from test_ferret.rb:2 >> >> Any ideas what I'm doing wrong? Ruby version 1.8.2 > > I think it's because you've called your file 'ferret.rb'. > It's requiring itself. Hah...so it is! I changed it in the output I posted because I thought it might lead to confusion, but it never occurred to me that it might be causing problems for ruby itself. Working now...thanks! John -- Posted via http://www.ruby-forum.com/. From lists at sourceillustrated.com Fri Jan 13 09:31:54 2006 From: lists at sourceillustrated.com (John Wells) Date: Fri, 13 Jan 2006 15:31:54 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> Message-ID: <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> Erik Hatcher wrote: > The main point here is that ZOE does speak IMAP and can grab mails > from it. Yep, and using net/imap in combination with Ferret is working very well so far. What a great project...thanks! John -- Posted via http://www.ruby-forum.com/. 
From fcsmith at gmail.com Fri Jan 13 14:05:08 2006 From: fcsmith at gmail.com (Finn Smith) Date: Fri, 13 Jan 2006 14:05:08 -0500 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> Message-ID: <6e72bbd70601131105k126322bcm176804e802f52b71@mail.gmail.com> On 1/11/06, David Balmain wrote: > > * The options to limit the search (first_doc and num_doc) in > > Search::IndexSearcher and the code that implements them should > > probably be moved out of Search::IndexSearcher into Index::Index > > I think this needs to stay in IndexSearcher as it limits the amount of > memory used by a search. Even the java version allows you to specify > nDocs. Reviewing the code again, and taking another look at the Java code I think you're right about this. If there is a more general search method exposed that returns Hits I'll be happy. -F From fcsmith at gmail.com Fri Jan 13 14:42:26 2006 From: fcsmith at gmail.com (Finn Smith) Date: Fri, 13 Jan 2006 14:42:26 -0500 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> Message-ID: <6e72bbd70601131142t1d1d09f8w16f43b04eec3e698@mail.gmail.com> On 1/13/06, David Balmain wrote: > On 1/13/06, Erik Hatcher wrote: > > Practically no one in Java Lucene uses TopDocs - you'll notice that > > all of those search methods are marked as "Expert". Hits is the most > > common way to access search results, allowing them to automatically > > be paged through and have a bit of caching along with it. > > If this is the case then maybe I should just return a Hits object, > roll the TopDocs functionality into it and be done with it. If > practically no one is using TopDocs then practically no one will miss > it. 
;-) > > What I was really looking for (and still hope to see) was an argument > discussing the pros and cons of having the two separate classes (and > "That's what they did in Java" doesn't count :-). Nevertheless, I've > revisted the Hits class in Lucene and I've thought more about the > issue at hand and Hits will be coming in the next release of Ferret. I > haven't decided exactly how I'm going to do it yet. There will > probably still be some differences from the Lucene API. For example, > search_each() is here to stay. I'll probably bring it up for > discussion again when I come to it. I still have a fair bit of work in > cFerret before I get to that stage. I was curious how this problem was addressed in other languages that are not as strongly typed as Java so I took a look at the Plucene implementation. In Plucene there is an abstract base class Searcher which IndexSearcher inherits from. Searcher has the method search which instantiates a Hits object and passes "self" in as the searcher argument before returning the newly created Hits object. The abstract method search_top is implemented in IndexSearcher and returns TopDocs. The search_top method is used internally by Hits when retrieving results. This follows the Java implementation pretty closely while still having some of the advantages of more dynamic languages. A method isn't defined for each possible combination of arguments. Rather, methods are identified by their functionality as reflected in their name. This is in contrast to Java where a bunch of methods with the same name ("search") are identified by the method signature consisting of return type, name and arguments. I don't know if it will be any help, but it might be worth glancing through the Plucene code for another perspective on how to organize the various objects and their interactions. 
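For the sake of discussion, the layering described above could be sketched in Ruby roughly as follows. This is only an illustration of the shape (search returns Hits, Hits calls search_top under the covers and caches what it gets back); every class here is a hypothetical stand-in, not Ferret's or Plucene's actual code, and the in-memory ArraySearcher exists just to make the sketch runnable.

```ruby
# Hypothetical stand-ins for the result structures; not Ferret's real classes.
ScoreDoc = Struct.new(:doc, :score)
TopDocs  = Struct.new(:total_hits, :score_docs)

# Cached view over a search. When a hit beyond the cached range is asked
# for, it re-runs the search with a larger limit via searcher.search_top.
class Hits
  include Enumerable

  def initialize(searcher, query, batch = 50)
    @searcher, @query = searcher, query
    @cache = []
    fetch_more(batch)
  end

  def length
    @total
  end

  # Array-style access by rank; returns a ScoreDoc or nil if out of range.
  def [](n)
    return nil if n < 0 || n >= @total
    fetch_more((n + 1) * 2) if n >= @cache.size
    @cache[n]
  end

  # Block iteration in the style of search_each: yields doc id and score.
  def each
    length.times do |i|
      sd = self[i]
      yield sd.doc, sd.score
    end
  end

  private

  def fetch_more(limit)
    top = @searcher.search_top(@query, limit)
    @total = top.total_hits
    @cache = top.score_docs
  end
end

class Searcher
  # General entry point: hands itself to a new Hits object.
  def search(query)
    Hits.new(self, query)
  end

  # "Abstract" method: subclasses return a TopDocs with at most `limit` entries.
  def search_top(query, limit)
    raise NotImplementedError
  end
end

# Toy subclass over [doc_id, text] pairs, scoring by substring count.
class ArraySearcher < Searcher
  def initialize(docs)
    @docs = docs
  end

  def search_top(query, limit)
    matches = @docs.select { |_id, text| text.include?(query) }
    scored  = matches.map { |id, text| ScoreDoc.new(id, text.scan(query).size.to_f) }
    scored  = scored.sort_by { |sd| -sd.score }
    TopDocs.new(scored.size, scored.first(limit))
  end
end
```

With this shape, callers who want the low-level result can still call search_top directly, while search gives index access and block iteration through one object.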
-F From lists at sourceillustrated.com Fri Jan 13 23:11:42 2006 From: lists at sourceillustrated.com (John Wells) Date: Sat, 14 Jan 2006 05:11:42 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> Message-ID: <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> John Wells wrote: > Yep, and using net/imap in combination with Ferret is working very well > so far. Correction...was working fine. It seems to freeze up when the index directory size hits around 178 MB (I'm indexing a 2.2 GB mail account). Has anyone else experienced any problems with large indexes? Attaching strace to the process shows no activity at all, yet CPU utilization by the process is at 97.6%. Any ideas? Btw, the index it was able to create works great...I can't wait to have the whole 2 GB indexed! Thanks, John -- Posted via http://www.ruby-forum.com/.
From lists at sourceillustrated.com Fri Jan 13 23:41:35 2006 From: lists at sourceillustrated.com (John Wells) Date: Sat, 14 Jan 2006 05:41:35 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> Message-ID: <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> Here's the stack trace when I control+c out of it: /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/analysis/tokenizers.rb:49:in `scan_until': Interrupt from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/analysis/tokenizers.rb:49:in `next' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/analysis/token_filters.rb:21:in `next' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/analysis/token_filters.rb:52:in `next' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/document_writer.rb:122:in `invert_document' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/document_writer.rb:88:in `each' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/document_writer.rb:88:in `invert_document' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/document_writer.rb:58:in `add_document' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:158:in `add_document' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index.rb:270:in `<<' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index.rb:238:in `synchronize' from /usr/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index.rb:238:in `<<' from /home/jb/ruby/fermail.rb:43:in `index_it' from /home/jb/ruby/fermail.rb:18:in `each' from /home/jb/ruby/fermail.rb:18:in `index_it' from 
/home/jb/ruby/fermail.rb:70 from /home/jb/ruby/fermail.rb:64:in `each' from /home/jb/ruby/fermail.rb:64 -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sat Jan 14 05:20:21 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 14 Jan 2006 19:20:21 +0900 Subject: [Ferret-talk] aligning Ferret's IndexSearcher.search API with Lucene's In-Reply-To: <6e72bbd70601131142t1d1d09f8w16f43b04eec3e698@mail.gmail.com> References: <6e72bbd70601021520s325943c7p7ff99be722c8aa03@mail.gmail.com> <6e72bbd70601131142t1d1d09f8w16f43b04eec3e698@mail.gmail.com> Message-ID: On 1/14/06, Finn Smith wrote: > On 1/13/06, David Balmain wrote: > > On 1/13/06, Erik Hatcher wrote: > > > Practically no one in Java Lucene uses TopDocs - you'll notice that > > > all of those search methods are marked as "Expert". Hits is the most > > > common way to access search results, allowing them to automatically > > > be paged through and have a bit of caching along with it. > > > > If this is the case then maybe I should just return a Hits object, > > roll the TopDocs functionality into it and be done with it. If > > practically no one is using TopDocs then practically no one will miss > > it. ;-) > > > > What I was really looking for (and still hope to see) was an argument > > discussing the pros and cons of having the two separate classes (and > > "That's what they did in Java" doesn't count :-). Nevertheless, I've > > revisted the Hits class in Lucene and I've thought more about the > > issue at hand and Hits will be coming in the next release of Ferret. I > > haven't decided exactly how I'm going to do it yet. There will > > probably still be some differences from the Lucene API. For example, > > search_each() is here to stay. I'll probably bring it up for > > discussion again when I come to it. I still have a fair bit of work in > > cFerret before I get to that stage. 
> > I was curious how this problem was addressed in other languages that > are not as strongly typed as Java so I took a look at the Plucene > implementation. > > In Plucene there is an abstract base class Searcher which > IndexSearcher inherits from. Searcher has the method search which > instantiates a Hits object and passes "self" in as the searcher > argument before returning the newly created Hits object. The abstract > method search_top is implemented in IndexSearcher and returns TopDocs. > The search_top method is used internally by Hits when retrieving > results. > > This follows the Java implementation pretty closely while still having > some of the advantages of more dynamic languages. A method isn't > defined for each possible combination of arguments. Rather, methods > are identified by their functionality as reflected in their name. This > is in contrast to Java where a bunch of methods with the same name > ("search") are identified by the method signature consisting of return > type, name and arguments. > > I don't know if it will be any help, but it might be worth glancing > through the Plucene code for another perspective on how to organize > the various objects and their interactions. > > -F Thanks Finn. I have downloaded PyLucene, Plucene, Lupy etc. and I have been using all of them to solve various problems. I will certainly study all of their search APIs. Cheers, Dave From mrj at bigpond.net.au Sat Jan 14 07:37:54 2006 From: mrj at bigpond.net.au (Mark James) Date: Sat, 14 Jan 2006 23:37:54 +1100 Subject: [Ferret-talk] Adjusting scores Message-ID: <43C8F0A2.10502@bigpond.net.au> One other mod to Ferret I've found useful is to add the following line at the top of the each_hit() block in Search::IndexSearcher.search: score = yield( doc, score ) if block_given? This allows a block attached to a search call to adjust document scores before documents are sorted, based on some (possibly dynamic) numerical factors associated with the document, e.g. 
the number and importance of incoming links to the document (Google's PageRank). From lists at sourceillustrated.com Sat Jan 14 09:56:10 2006 From: lists at sourceillustrated.com (John Wells) Date: Sat, 14 Jan 2006 15:56:10 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> Message-ID: <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> I removed the message it was hanging on, but it's still stopping at 178 meg, no matter what I do. Any ideas what might be causing this? I have plenty of disk space... Thanks, John -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sat Jan 14 18:24:15 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 15 Jan 2006 08:24:15 +0900 Subject: [Ferret-talk] Adjusting scores In-Reply-To: <43C8F0A2.10502@bigpond.net.au> References: <43C8F0A2.10502@bigpond.net.au> Message-ID: Thanks Mark. Keep these ideas coming. I'll be considering them all for the next version of Ferret. Dave On 1/14/06, Mark James wrote: > One other mod to Ferret I've found useful is to add > the following line at the top of the each_hit() block > in Search::IndexSearcher.search: > > score = yield( doc, score ) if block_given? > > This allows a block attached to a search call to adjust > document scores before documents are sorted, based on > some (possibly dynamic) numerical factors associated > with the document, e.g. the number and importance of > incoming links to the document (Google's PageRank). 
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From dbalmain.ml at gmail.com Sat Jan 14 18:28:04 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 15 Jan 2006 08:28:04 +0900 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> Message-ID: Hi John, I'm not exactly sure what is causing your problem. It may just be that the 178Mgb mark is the point where you have 10,000 documents being merged or something. Do you know how documents are in the index at that point? Anyway, I don't really have time to look into it right now as I think most of these types of problems will be sorted out when I finally release the new version of Ferret backed by cFerret. I can't say when that will be but hopefully it won't be too far away. Sorry to keep everyone waiting. Cheers, Dave On 1/14/06, John Wells wrote: > I removed the message it was hanging on, but it's still stopping at 178 > meg, no matter what I do. Any ideas what might be causing this? I have > plenty of disk space... > > Thanks, > John > > -- > Posted via http://www.ruby-forum.com/. 
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From lists at sourceillustrated.com Sun Jan 15 00:57:44 2006 From: lists at sourceillustrated.com (John Wells) Date: Sun, 15 Jan 2006 06:57:44 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> Message-ID: <69db56a9e39706c2746b48aceb09806e@ruby-forum.com> David Balmain wrote: > I'm not exactly sure what is causing your problem. It may just be that > the 178Mgb mark is the point where you have 10,000 documents being > merged or something. Do you know how documents are in the index at > that point? Anyway, I don't really have time to look into it right now > as I think most of these types of problems will be sorted out when I > finally release the new version of Ferret backed by cFerret. I can't > say when that will be but hopefully it won't be too far away. Hello Dave, It stops consistently at 2902 documents, but when I disabled fetching of the email body it went beyond this. Strange error indeed. I'm going to continue trying to figure out what's going on. Any chance you could reenable the cFerret svn repository on your server? Tried to download per instructions but received connection refused. Thanks for your help! John -- Posted via http://www.ruby-forum.com/. 
From dbalmain.ml at gmail.com Sun Jan 15 06:31:08 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 15 Jan 2006 20:31:08 +0900 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <69db56a9e39706c2746b48aceb09806e@ruby-forum.com> References: <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> <69db56a9e39706c2746b48aceb09806e@ruby-forum.com> Message-ID: > Any chance you could reenable the cFerret svn repository on your server? > Tried to download per instructions but received connection refused. done From lists at sourceillustrated.com Mon Jan 16 12:01:46 2006 From: lists at sourceillustrated.com (John Wells) Date: Mon, 16 Jan 2006 18:01:46 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> <69db56a9e39706c2746b48aceb09806e@ruby-forum.com> Message-ID: <000a4eb92375754d4c0f454b70576985@ruby-forum.com> David Balmain wrote: >> Any chance you could reenable the cFerret svn repository on your server? >> Tried to download per instructions but received connection refused. David, Thanks. Btw, I'm very interested in still understanding what's causing my current problem. I'd like to take a stab at it myself, but would ask for a pointer on getting started. What approach would you take in tracking this problem down? 
I thought about running the script in the debugger but man, the added overhead would've caused it to run forever. Any debug logging I can enable in ferret? Anything else you could suggest? Thanks for the great work and the help! John -- Posted via http://www.ruby-forum.com/. From seth.fitzsimmons at gmail.com Wed Jan 18 14:54:09 2006 From: seth.fitzsimmons at gmail.com (Seth Fitzsimmons) Date: Wed, 18 Jan 2006 14:54:09 -0500 Subject: [Ferret-talk] different Analyzer defaults Message-ID: Hi. While experimenting with the QueryParser to search fields containing only numbers, I discovered that the default Analyzer for IndexWriter differs from QueryParser's (StandardAnalyzer vs. Analyzer). Is there a reason for this? (Trac ticket is here:) http://ferret.davebalmain.com/trac/ticket/27 seth From atomgiant at gmail.com Fri Jan 20 08:39:42 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 20 Jan 2006 08:39:42 -0500 Subject: [Ferret-talk] Questions about Searching Message-ID: Hi, I have some questions about searching with Ferret. I have a user index with first_name, last_name and full_name (which is just first plus last with a space). Here are a couple of questions: 1) If I store the fields tokenized, it appears as though queries are case-insensitive. However, for untokenized, the query is case-sensitive. How can I make the untokenized searches case-insensitive? 2) If I have a field with whitespace in it, how can I search for the whitespace using wildcard searches. For instance, if the full_name I am searching for is "John Doe", how can I build a query for that. I have tried numerous combinations, here are a couple I tried: full_name:"#{query}"* <-- This will match every field in the index full_name:"#{query}*" <-- This matches nothing 3) When I store the fields as untokenized, exact matches seem to not work for me anymore. 
For instance, this query worked for tokenized first_name, but does not for untokenized first_name: first_name:John But this query will return results: first_name:Joh? 4) Is there a better way to search for the first and last name combination than storing another index with them concatenated? Thanks, Tom From erik at ehatchersolutions.com Fri Jan 20 10:34:17 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Fri, 20 Jan 2006 10:34:17 -0500 Subject: [Ferret-talk] Questions about Searching In-Reply-To: References: Message-ID: <8344221E-14BF-4857-8908-3AE69C2642C8@ehatchersolutions.com> On Jan 20, 2006, at 8:39 AM, Tom Davies wrote: > Here are a couple of questions: > > 1) If I store the fields tokenized, it appears as though queries are > case-insensitive. However, for untokenized, the query is > case-sensitive. How can I make the untokenized searches > case-insensitive? By lowercasing the text you index and lowercasing the text in the query. Search matches are case sensitive always, but generally tokenized fields get lowercased along the way, and the query parser lowercases terms also (generally by the same analyzer). > 2) If I have a field with whitespace in it, how can I search for the > whitespace using wildcard searches. For instance, if the full_name I > am searching for is "John Doe", how can I build a query for that. I > have tried numerous combinations, here are a couple I tried: > full_name:"#{query}"* <-- This will match every field in the index > full_name:"#{query}*" <-- This matches nothing I strongly suspect the issue is the field being analyzed during query parsing. I'm not sure what facilities Ferret has for doing this exactly off the top of my head, but in Java Lucene there is a PerFieldAnalyzerWrapper that helps with this. The space would be problematic, as well as the double quotes in how you have created it. You may need to create a WildcardQuery via the API rather than using the parser.
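To make the case and whitespace points concrete, here is a toy model in plain Ruby. It is an assumption-laden illustration, not Ferret code: an untokenized field is modelled as one whole term per document, compared character for character, and normalize and prefix_matches are hypothetical helper names.

```ruby
# Toy model of an untokenized field: each stored value is a single term,
# spaces included, and matching is exact. Not Ferret's implementation.
def normalize(text)
  text.strip.downcase
end

# Prefix ("piece*") match against whole untokenized terms.
def prefix_matches(terms, prefix)
  terms.select { |term| term.start_with?(prefix) }
end

raw_terms        = ["John Doe", "Jane Roe"]            # indexed exactly as entered
lowercased_terms = raw_terms.map { |t| normalize(t) }  # lowercased before indexing

# If the query side lowercases "John D" to "john d", the mixed-case
# stored term no longer matches, but the lowercased one does.
p prefix_matches(raw_terms, normalize("John D"))         # prints []
p prefix_matches(lowercased_terms, normalize("John D"))  # prints ["john doe"]
```

The same reasoning would explain why first_name:John fails against an untokenized "John" when the parser lowercases the term, while first_name:Joh? can still succeed if wildcard terms are not analyzed.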
> 3) When I store the fields as untokenized, exact matches seem to not > work for me anymore. For instance, this query worked for tokenized > first_name, but does not for untokenized first_name: > first_name:John > > But this query will return results: > first_name:Joh? This again has to do with the case and analyzer issue. You are using a parser that does analysis of the text. Try using the parser to create a Query and see what it consists of (.to_s?). > 4) Is there a better way to search for the first and last name > combination that storing another index with them concatenated? It really all depends on what your searching needs are. What does the user interface for searching demand? Erik From atomgiant at gmail.com Fri Jan 20 10:56:46 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 20 Jan 2006 10:56:46 -0500 Subject: [Ferret-talk] Questions about Searching In-Reply-To: <8344221E-14BF-4857-8908-3AE69C2642C8@ehatchersolutions.com> References: <8344221E-14BF-4857-8908-3AE69C2642C8@ehatchersolutions.com> Message-ID: Thanks Erik. Very informative. I suspect the QueryParser either has some bugs or is not designed to handle this scenario. I will try manually building the specific types of queries via the API. > It really all depends on what your searching needs are. What does > the user interface for searching demand? For the full name searches, I just wanted wild card matches on the right hand side of the query. For instance, any of these should result in john doe being found: J, Jo, Joh, John, John D, etc. Tom From erik at ehatchersolutions.com Fri Jan 20 13:15:33 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Fri, 20 Jan 2006 13:15:33 -0500 Subject: [Ferret-talk] Questions about Searching In-Reply-To: References: <8344221E-14BF-4857-8908-3AE69C2642C8@ehatchersolutions.com> Message-ID: <8E046E6E-B8E7-46FB-A0A6-F34EA21FA82F@ehatchersolutions.com> On Jan 20, 2006, at 10:56 AM, Tom Davies wrote: > Thanks Erik. Very informative. 
I suspect the QueryParser either has > some bugs or is not designed to handle this scenario. I will try > manually building the specific types of queries via the API. There are many tricky scenarios because of the necessity for whitespace and special characters to be handled as separators and operators and the analyzer (and when it is used) with the query parser. So no bugs, per se, I don't think in this case. My article at java.net covers this (in the context of Java) in some of its glory and frustration I think: >> It really all depends on what your searching needs are. What does >> the user interface for searching demand? > > For the full name searches, I just wanted wild card matches on the > right hand side of the query. For instance, any of these should > result in john doe being found: > J, Jo, Joh, John, John D, etc. The simplest thing to do in this case is what you're doing for indexing... combine a field with "firstname lastname" as untokenized, though lowercased. Then build a WildcardQuery for "piece*" - though this isn't going to be possible with the whitespace involved when using the parser, I don't think (unless you can escape it somehow). Be sure to lowercase the query also. Erik From blee at alumni.caltech.edu Sat Jan 21 19:31:33 2006 From: blee at alumni.caltech.edu (Ben Lee) Date: Sun, 22 Jan 2006 01:31:33 +0100 Subject: [Ferret-talk] Balancing relevancy and recentness Message-ID: <71889b359b8616848f44a801fb2eacd2@ruby-forum.com> I was wondering if there was a good way to either balance the relevancy score with recentness of matching documents- or include the recentness in the score somehow? Thanks, Ben -- Posted via http://www.ruby-forum.com/. 
From dbalmain.ml at gmail.com Sat Jan 21 20:10:54 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 22 Jan 2006 10:10:54 +0900 Subject: [Ferret-talk] Balancing relevancy and recentness In-Reply-To: <71889b359b8616848f44a801fb2eacd2@ruby-forum.com> References: <71889b359b8616848f44a801fb2eacd2@ruby-forum.com> Message-ID: Hi Ben, Currently there is no way to do this. You can easily sort by the age of the document but to score by the age of the document is not possible without making a change to Ferret. Mark James came up with this idea recently; > One other mod to Ferret I've found useful is to add > the following line at the top of the each_hit() block > in Search::IndexSearcher.search: > > score = yield( doc, score ) if block_given? > > This allows a block attached to a search call to adjust > document scores before documents are sorted, based on > some (possibly dynamic) numerical factors associated > with the document, e.g. the number and importance With this change you'd be able to modify the score based on the age of the document. Hope that helps. Cheers, Dave On 1/22/06, Ben Lee wrote: > I was wondering if there was a good way to either balance the relevancy > score with recentness of matching documents- or include the recentness > in the score somehow? > > Thanks, > Ben > > -- > Posted via http://www.ruby-forum.com/. 
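As a rough sketch of how the yield-based adjustment could fold document age into the score: the method below is a stand-in for the search loop, not Ferret's IndexSearcher, and the age table and 30-day half-life are made-up values chosen only for the example.

```ruby
# Hypothetical stand-in for the search loop; not Ferret's API.
# raw_hits is an array of [doc_id, score] pairs as produced by scoring.
def search_with_adjustment(raw_hits)
  adjusted = raw_hits.map do |doc, score|
    # Mirrors the proposed line: let the caller rewrite the score.
    score = yield(doc, score) if block_given?
    [doc, score]
  end
  # Sort after adjustment, best score first (ties broken by doc id).
  adjusted.sort_by { |doc, score| [-score, doc] }
end

age_in_days = { 0 => 400, 1 => 2, 2 => 30 } # assumed ages per document id
half_life   = 30.0                          # assumed: score halves every 30 days

raw = [[0, 1.0], [1, 0.6], [2, 0.8]]
ranked = search_with_adjustment(raw) do |doc, score|
  score * 0.5**(age_in_days[doc] / half_life)
end
ranked.each { |doc, score| puts format("doc %d  score %.4f", doc, score) }
```

Here the oldest document drops to the bottom despite having the best raw score. An exponential decay is only one choice; any function of age could be returned from the block.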
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From erik at ehatchersolutions.com Sun Jan 22 20:37:05 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Sun, 22 Jan 2006 20:37:05 -0500 Subject: [Ferret-talk] Balancing relevancy and recentness In-Reply-To: References: <71889b359b8616848f44a801fb2eacd2@ruby-forum.com> Message-ID: <87A9B956-B922-42E8-A107-75561B1A5E66@ehatchersolutions.com> As long as Ferret does what Lucene does with boosts, you could scale document boosts at indexing time by some factor related to age and that will factor into scoring. Right, Dave? For a real-world example of this, look at TheServerSide case study in "Lucene in Action" and online here: (search for "boost" to hone in on the specific topic) Erik On Jan 21, 2006, at 8:10 PM, David Balmain wrote: > Hi Ben, > > Currently there is no way to do this. You can easily sort by the age > of the document but to score by the age of the document is not > possible without making a change to Ferret. Mark James came up with > this idea recently; > >> One other mod to Ferret I've found useful is to add >> the following line at the top of the each_hit() block >> in Search::IndexSearcher.search: >> >> score = yield( doc, score ) if block_given? >> >> This allows a block attached to a search call to adjust >> document scores before documents are sorted, based on >> some (possibly dynamic) numerical factors associated >> with the document, e.g. the number and importance > > With this change you'd be able to modify the score based on the age of > the document. Hope that helps. > > Cheers, > Dave > > On 1/22/06, Ben Lee wrote: >> I was wondering if there was a good way to either balance the >> relevancy >> score with recentness of matching documents- or include the >> recentness >> in the score somehow? >> >> Thanks, >> Ben >> >> -- >> Posted via http://www.ruby-forum.com/. 
>> _______________________________________________ >> Ferret-talk mailing list >> Ferret-talk at rubyforge.org >> http://rubyforge.org/mailman/listinfo/ferret-talk >> > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From atomgiant at gmail.com Tue Jan 24 08:05:00 2006 From: atomgiant at gmail.com (Tom Davies) Date: Tue, 24 Jan 2006 08:05:00 -0500 Subject: [Ferret-talk] Questions about Searching In-Reply-To: <8E046E6E-B8E7-46FB-A0A6-F34EA21FA82F@ehatchersolutions.com> References: <8344221E-14BF-4857-8908-3AE69C2642C8@ehatchersolutions.com> <8E046E6E-B8E7-46FB-A0A6-F34EA21FA82F@ehatchersolutions.com> Message-ID: Thanks Erik. Nice article. I was able to get the wildcard search to work including whitespace by manually creating the query as follows: qp = Ferret::QueryParser.new query = qp.get_wild_query('full_name', "#{partial}*") INDEX.search_each(query) do |doc, score| where #{partial} is the partial portion of the full name. Thanks for your responses. Tom On 1/20/06, Erik Hatcher wrote: > > On Jan 20, 2006, at 10:56 AM, Tom Davies wrote: > > Thanks Erik. Very informative. I suspect the QueryParser either has > > some bugs or is not designed to handle this scenario. I will try > > manually building the specific types of queries via the API. > > There are many tricky scenarios because of the necessity for > whitespace and special characters to be handled as separators and > operators and the analyzer (and when it is used) with the query parser. > > So no bugs, per se, I don't think in this case. > > My article at java.net covers this (in the context of Java) in some > of its glory and frustration I think: > > > > >> It really all depends on what your searching needs are. What does > >> the user interface for searching demand? > > > > For the full name searches, I just wanted wild card matches on the > > right hand side of the query. 
For instance, any of these should > > result in john doe being found: > > J, Jo, Joh, John, John D, etc. > > The simplest thing to do in this case is what you're doing for > indexing... combine a field with "firstname lastname" as untokenized, > though lowercased. Then build a WildcardQuery for "piece*" - though > this isn't going to be possible with the whitespace involved when > using the parser, I don't think (unless you can escape it somehow). > Be sure to lowercase the query also. > > Erik > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From dbalmain.ml at gmail.com Tue Jan 24 21:04:29 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 25 Jan 2006 11:04:29 +0900 Subject: [Ferret-talk] Balancing relevancy and recentness In-Reply-To: <87A9B956-B922-42E8-A107-75561B1A5E66@ehatchersolutions.com> References: <71889b359b8616848f44a801fb2eacd2@ruby-forum.com> <87A9B956-B922-42E8-A107-75561B1A5E66@ehatchersolutions.com> Message-ID: On 1/23/06, Erik Hatcher wrote: > As long as Ferret does what Lucene does with boosts, you could scale > document boosts at indexing time by some factor related to age and > that will factor into scoring. Right, Dave? Sorry for the slow reply. Sure you could do this. Ferret handles boosts in exactly the same way. > For a real-world example of this, look at TheServerSide case study in > "Lucene in Action" and online here: > > > > (search for "boost" to hone in on the specific topic) > > Erik > > > On Jan 21, 2006, at 8:10 PM, David Balmain wrote: > > > Hi Ben, > > > > Currently there is no way to do this. You can easily sort by the age > > of the document but to score by the age of the document is not > > possible without making a change to Ferret. 
Mark James came up with > > this idea recently; > > > >> One other mod to Ferret I've found useful is to add > >> the following line at the top of the each_hit() block > >> in Search::IndexSearcher.search: > >> > >> score = yield( doc, score ) if block_given? > >> > >> This allows a block attached to a search call to adjust > >> document scores before documents are sorted, based on > >> some (possibly dynamic) numerical factors associated > >> with the document, e.g. the number and importance > > > > With this change you'd be able to modify the score based on the age of > > the document. Hope that helps. > > > > Cheers, > > Dave > > > > On 1/22/06, Ben Lee wrote: > >> I was wondering if there was a good way to either balance the > >> relevancy > >> score with recentness of matching documents- or include the > >> recentness > >> in the score somehow? > >> > >> Thanks, > >> Ben > >> > >> -- > >> Posted via http://www.ruby-forum.com/. > >> _______________________________________________ > >> Ferret-talk mailing list > >> Ferret-talk at rubyforge.org > >> http://rubyforge.org/mailman/listinfo/ferret-talk > >> > > > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From oordopjes at hotmail.com Wed Jan 25 05:05:29 2006 From: oordopjes at hotmail.com (Joost) Date: Wed, 25 Jan 2006 11:05:29 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <000a4eb92375754d4c0f454b70576985@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> 
<8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> <69db56a9e39706c2746b48aceb09806e@ruby-forum.com> <000a4eb92375754d4c0f454b70576985@ruby-forum.com> Message-ID: <3eed96da91d7e2c5b7450d70f6ec4c78@ruby-forum.com> John, I am very interested in the Ruby-Ferret IMAP search tool. Did you already manage to index 2Gb of emails? Are you willing to share your code so I can also search thru my email? It's not yet 2Gb but keeps on growing :) Joost -- Posted via http://www.ruby-forum.com/. From weibel at gmail.com Wed Jan 25 08:23:06 2006 From: weibel at gmail.com (Kasper Weibel) Date: Wed, 25 Jan 2006 14:23:06 +0100 Subject: [Ferret-talk] acts_as_ferret now under MIT license Message-ID: Hi all After reading one of DHH's recent posts, I realized that it's important to release even small pieces of software under a license. As the author of the first incarnation of acts_as_ferret I felt the need to put it under a licensing scheme. MIT was chosen, as suggested by DHH. It's now included in the source listing on the ferret wiki. http://ferret.davebalmain.com/trac/wiki/FerretOnRails -- Posted via http://www.ruby-forum.com/. 
From lists at sourceillustrated.com Wed Jan 25 16:07:36 2006 From: lists at sourceillustrated.com (John Wells) Date: Wed, 25 Jan 2006 22:07:36 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <3eed96da91d7e2c5b7450d70f6ec4c78@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> <69db56a9e39706c2746b48aceb09806e@ruby-forum.com> <000a4eb92375754d4c0f454b70576985@ruby-forum.com> <3eed96da91d7e2c5b7450d70f6ec4c78@ruby-forum.com> Message-ID: <53489df3a62d606727cbf62d9b569984@ruby-forum.com> Joost wrote: > John, I am very interested in the Ruby-Ferret IMAP search tool. Did you > already manage to index 2Gb of emails? Are you willing to share your > code so I can also search thru my email? It's not yet 2Gb but keeps on > growing :) Hi Joost, Well, it's certainly not perfect code...more of a dirty hack to try it out. And, as noted, if I try to index the body it doesn't fare very well. That said, I'd be happy to share it. I'll post it later tonight when I have access to it. Thanks, John -- Posted via http://www.ruby-forum.com/. From admin at mad4dos.com Wed Jan 25 17:23:44 2006 From: admin at mad4dos.com (andrew yearp) Date: Wed, 25 Jan 2006 23:23:44 +0100 Subject: [Ferret-talk] problem with id's Message-ID: <0a4029a21c283b5da36105208f6ad90d@ruby-forum.com> Please can someone advise how to solve this problem: I am using acts_as_ferret to index 3 ActiveRecord models. The problem is that it's mixing up the ids from the models, i.e. one model contains ids up to 45 and the other only up to 10. If I search on the smaller one, it gives the error "item with id 45 not found". Is there any way to fix this?
--
Posted via http://www.ruby-forum.com/.

From chan.joseph.w at gmail.com Wed Jan 25 19:14:52 2006
From: chan.joseph.w at gmail.com (Joseph Chan)
Date: Thu, 26 Jan 2006 11:14:52 +1100
Subject: [Ferret-talk] Question on scalability
Message-ID:

Hi all,

I've had a lot of fun over the last couple of days integrating Ferret
with my Rails app. Many thanks to Dave Balmain for writing this.

Everything seems to work well and my site now has full search
capabilities. But I'm wondering how I can scale my site now?

What would be the recommended approach in terms of sharing indexes
across multiple servers? Is the solution to use a shared filesystem? Is
it ok for multiple applications to update the indexes concurrently?

Thanks,
Joe

From lists at sourceillustrated.com Wed Jan 25 21:14:07 2006
From: lists at sourceillustrated.com (John Wells)
Date: Thu, 26 Jan 2006 03:14:07 +0100
Subject: [Ferret-talk] Ferret with IMAP dirs
In-Reply-To: <53489df3a62d606727cbf62d9b569984@ruby-forum.com>
References: <43C41897.8040003@dangerousideas.com>
	<674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com>
	<91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com>
	<43C696A7.6060801@dangerousideas.com>
	<0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com>
	<5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com>
	<8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com>
	<774932d859a3d14419ed384fa19e73b4@ruby-forum.com>
	<69db56a9e39706c2746b48aceb09806e@ruby-forum.com>
	<000a4eb92375754d4c0f454b70576985@ruby-forum.com>
	<3eed96da91d7e2c5b7450d70f6ec4c78@ruby-forum.com>
	<53489df3a62d606727cbf62d9b569984@ruby-forum.com>
Message-ID:

Ok...it's neither pretty nor clean nor idiomatic Ruby (I'm a nuby ;),
but as a dirty hack it works (unless you fetch the body...that is).

Let me know if you have any questions:

#!/usr/bin/env ruby
require 'rubygems'
require 'ferret'
include Ferret
include Ferret::Document
require 'net/imap'

index = Index::Index.new(:path=>"/path/to/index/goes/here")
$count = 0
$retry = 0   # must start at 0, otherwise the first failure computes 1 + nil
$imap = Net::IMAP.new('server_ip_address_goes_here', 143, false)
$imap.login('username_goes_here', 'password_goes_here')
print $imap.examine("INBOX")

# Index every message in the currently selected mailbox.
def index_it(imapobj, index, box)
  imapobj.search(["ALL"]).each do |message_id|
    begin
      msg = imapobj.fetch(message_id, "(UID RFC822.SIZE ENVELOPE BODY[TEXT])")[0]
      envelope = msg.attr["ENVELOPE"]
      body = msg.attr["BODY[TEXT]"]
      uid = msg.attr["UID"]
      size = msg.attr["RFC822.SIZE"]
      date = envelope.date
      subject = envelope.subject
      if envelope.from != nil and envelope.from.size > 0
        from = envelope.from[0].name
      end
      sender = envelope.sender
      to = envelope.to
      in_reply_to = envelope.in_reply_to

      # One Ferret document per message; the body is indexed but not stored.
      doc = Document.new
      doc << Field.new("id", message_id, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("body", body, Field::Store::NO, Field::Index::TOKENIZED)
      doc << Field.new("from", from, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("subject", subject, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("date", date, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("uid", uid, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("size", size, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("sender", sender, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("in_reply_to", in_reply_to, Field::Store::YES, Field::Index::TOKENIZED)
      doc << Field.new("mailbox", box, Field::Store::YES, Field::Index::UNTOKENIZED)
      index << doc

      $count = $count + 1
      print "#{$count} : #{from} <==> #{subject}\n"
      $retry = 0   # success, so reset the retry counter
    rescue => detail
      print detail
      print detail.backtrace.join("\n")
      print "Retrying\n"
      $retry = 1 + $retry
      if $retry < 20
        retry      # re-run the begin block for the same message
      else
        print "Retry threshold reached. Exiting..."
        exit!(99)
      end
    end
  end
end

# Walk every mailbox on the server and index its messages.
$imap.examine("INBOX")
$imap.list("", "*").each do |box|
  name = box.name
  print "NAME: #{name}:#{box.class}\n"
  if name and name != "" and name !~/customflags/
    begin
      $imap.select(name)
      index_it($imap, index, name)
    rescue => detail
      print "ERROR: " + detail.message + "\n"
    end
  end
end

--
Posted via http://www.ruby-forum.com/.

From oordopjes at hotmail.com Thu Jan 26 08:38:39 2006
From: oordopjes at hotmail.com (Joost)
Date: Thu, 26 Jan 2006 14:38:39 +0100
Subject: [Ferret-talk] Ferret with IMAP dirs
In-Reply-To:
References: <43C41897.8040003@dangerousideas.com>
	<674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com>
	<91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com>
	<43C696A7.6060801@dangerousideas.com>
	<0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com>
	<5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com>
	<8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com>
	<774932d859a3d14419ed384fa19e73b4@ruby-forum.com>
	<69db56a9e39706c2746b48aceb09806e@ruby-forum.com>
	<000a4eb92375754d4c0f454b70576985@ruby-forum.com>
	<3eed96da91d7e2c5b7450d70f6ec4c78@ruby-forum.com>
	<53489df3a62d606727cbf62d9b569984@ruby-forum.com>
Message-ID: <5ce5420a554410824ed820ed85c89580@ruby-forum.com>

Hi John,

Thanks for the quick reaction. I'm a nuby too :) At the moment I haven't
got the time to look at the code, but when I do, I certainly will. I hope
there is a new version of Ferret out by then, so it'll work completely &
fast.

Thanks,
Joost

--
Posted via http://www.ruby-forum.com/.

From f at andreas-s.net Thu Jan 26 08:42:17 2006
From: f at andreas-s.net (Andreas S.)
Date: Thu, 26 Jan 2006 14:42:17 +0100
Subject: [Ferret-talk] Question on scalability
In-Reply-To:
References:
Message-ID: <3ccf8bb7f6e473ed1f25817679ff86df@ruby-forum.com>

Joseph Chan wrote:
> What would be the recommended approach in terms of sharing indexes across
> multiple servers?

I'd try using a DRb server for the index. It's very easy to implement
and should avoid locking problems.

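A minimal sketch of the DRb idea, assuming one process owns the index and every other application talks to it over DRb. IndexServer and its add/search methods are hypothetical stand-ins (a plain in-memory Array, not Ferret's API) so the sketch runs on its own; a real server would wrap the Ferret index in the same place.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the single process that owns the index.
# A real server would wrap a Ferret index here instead of an Array.
class IndexServer
  def initialize
    @docs = []
    @lock = Mutex.new
  end

  # Every write goes through this one object, so there is only one writer.
  def add(doc)
    @lock.synchronize do
      @docs << doc
      @docs.size - 1          # position of the new document
    end
  end

  # Naive substring search over every field value.
  def search(term)
    @lock.synchronize do
      @docs.each_index.select do |i|
        @docs[i].values.any? { |value| value.to_s.include?(term) }
      end
    end
  end
end

# Server side: port 0 asks the OS for any free port.
server = DRb.start_service('druby://localhost:0', IndexServer.new)

# Client side: any application that knows the URI can use the index.
remote = DRbObject.new_with_uri(server.uri)
remote.add({ 'title' => 'Question on scalability' })
remote.add({ 'title' => 'Grouping results' })
hits = remote.search('Grouping')
puts hits.inspect

DRb.stop_service
```

The point of the design is that the web applications never open the index files themselves, so lock files are only ever touched by the one server process.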
> Is the solution to use a shared filesystem? Is > it ok > for multiple applications to update the indexes concurrently? In my experience, no. I often had problems with stale lockfiles and other weird behaviour, and I found it much more reliable to use only one writing process. -- Posted via http://www.ruby-forum.com/. From lists at sourceillustrated.com Thu Jan 26 08:53:00 2006 From: lists at sourceillustrated.com (John Wells) Date: Thu, 26 Jan 2006 14:53:00 +0100 Subject: [Ferret-talk] Ferret with IMAP dirs In-Reply-To: <5ce5420a554410824ed820ed85c89580@ruby-forum.com> References: <43C41897.8040003@dangerousideas.com> <674e6b47e05e868ae78d5b78f79e9062@ruby-forum.com> <91FEA35F-2E66-409A-8B05-D9BA47004F43@ehatchersolutions.com> <43C696A7.6060801@dangerousideas.com> <0998009b68866bc69a0b2ef9a9daedaa@ruby-forum.com> <5b56a5d485eec9f0bd4b39cdbfd73448@ruby-forum.com> <8a7a35eaa0d44213aa890d083b6158cc@ruby-forum.com> <774932d859a3d14419ed384fa19e73b4@ruby-forum.com> <69db56a9e39706c2746b48aceb09806e@ruby-forum.com> <000a4eb92375754d4c0f454b70576985@ruby-forum.com> <3eed96da91d7e2c5b7450d70f6ec4c78@ruby-forum.com> <53489df3a62d606727cbf62d9b569984@ruby-forum.com> <5ce5420a554410824ed820ed85c89580@ruby-forum.com> Message-ID: <5a6f9ef2a02664c113d16797210c2f2c@ruby-forum.com> Joost wrote: > Hi John, > > Thanks for the quick reaction. I'm a nuby too :) At the moment I haven't > got the time to look at the code.. when I have I'll certainly do. I hope > there is a new version of Ferret out by then..so it'll work completely & > fast. Ok... ;) Btw, that code only creates the index. You'll then have to implement code to search it, and you'll probably want it to dig out the UID for you. 
Here's a sample of a search:

############################################
#!/usr/bin/env ruby
require 'rubygems'
require 'ferret'
include Ferret
require 'net/imap'

50.times { print "-" }; print "\n"
index = Index::Index.new(:path=>"/path/to/index/goes/here")

# Phrase search on the body field; the phrase comes from the command line.
index.search_each('body:"' + ARGV[0] + '"') do |doc, score|
  puts "Document #{doc} found with a score of #{score}"
  # from, subject and uid were stored when the index was built
  print index[doc]["from"] + " <--> " + index[doc]["subject"] + " " + index[doc]["uid"] + "\n"
end
50.times { print "-" }; print "\n"
############################################

--
Posted via http://www.ruby-forum.com/.

From nick.snels at gmail.com Thu Jan 26 16:11:33 2006
From: nick.snels at gmail.com (Nick Snels)
Date: Thu, 26 Jan 2006 22:11:33 +0100
Subject: [Ferret-talk] How to handle non-ascii characters
Message-ID:

Hi,

for the last couple of days I've been trying to index some txt files.
Once they are indexed I have the habit of checking the contents of the
Ferret index with Luke. But every time I tried to open the index I got a
'read past EOF' error. I managed to narrow it down to the way Ferret
handles non-ascii characters. I have one txt file with the following
content 'a o b c' and one with '? ? ? ?' . If I index the first one I
can read the index perfectly, however when I index the second one I get
the EOF error. The error occurs with the standard and whitespace
analyzers. The stop analyzer just ignores these characters. How can I
solve this, so that Ferret handles these 'special' characters correctly?
Thanks.

Kind regards,

Nick

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Thu Jan 26 22:09:35 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 27 Jan 2006 12:09:35 +0900
Subject: [Ferret-talk] How to handle non-ascii characters
In-Reply-To:
References:
Message-ID:

Hi Nick,

Sorry, but this is due to an incompatibility between Ferret's and
Lucene's indexes. It's complicated but basically, Ferret counts string
lengths in bytes while Lucene sometimes uses number of characters.
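To make the bytes-versus-characters point concrete, here is a small sketch in plain Ruby (1.9 or later; it is not Ferret or Lucene code). The length-prefixed layout and both helper names are made up for illustration, the real index format is more involved, and "café" is only a stand-in because the original non-ASCII characters did not survive in the post.

```ruby
require 'stringio'

# Writer: 4-byte length prefix counted in BYTES, then the raw bytes.
def write_with_byte_length(str)
  [str.bytesize].pack('N') + str.b
end

# Reader: interprets the same prefix as a count of CHARACTERS.
def read_with_char_length(data)
  io = StringIO.new(data.dup.force_encoding('UTF-8'))
  count = io.read(4).unpack1('N')
  Array.new(count) { io.getc || raise(EOFError, 'read past EOF') }.join
end

ascii    = 'a o b c'
accented = "caf\u00e9"   # the last character needs two bytes in UTF-8

puts "#{ascii.length} chars, #{ascii.bytesize} bytes"
puts "#{accented.length} chars, #{accented.bytesize} bytes"

# Pure ASCII round-trips, because bytes and characters agree.
puts read_with_char_length(write_with_byte_length(ascii))

# With a multi-byte character the reader asks for more characters than exist.
begin
  read_with_char_length(write_with_byte_length(accented))
rescue EOFError => e
  puts "accented text: #{e.message}"
end
```

This matches the symptom in the report: the ASCII-only file reads back fine, while the file with non-ASCII characters runs off the end of the data.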
I do plan to fix this in the future but it could be a month or two. Hope you can wait that long. Cheers, Dave On 1/27/06, Nick Snels wrote: > Hi, > > the last couple of days I'm trying to index some txt files. Once indexed > I have the habit of checking the contents of the Ferret index with Luke. > But everytime I tried to open the index I got a 'read past EOF' error. I > managed to get it down to the way Ferret handles non-ascii characters. I > have one txt file with the following content 'a o b c' and one with '? ? > ? ?' . If I index the first one I can read the index perfectly, however > when I index the second one I get the EOF error. The error is with the > standard and whitespace analyzers. The stop analyzer just ignores these > characters. How can I solve this, so that Ferret handles these 'special' > characters correctly. Thanks. > > Kind regards, > > Nick > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From nick.snels at gmail.com Fri Jan 27 05:29:21 2006 From: nick.snels at gmail.com (Nick Snels) Date: Fri, 27 Jan 2006 11:29:21 +0100 Subject: [Ferret-talk] How to handle non-ascii characters In-Reply-To: References: Message-ID: Hi David, good to hear that it will be fixed in the near future. For me personally it doesn't matter that it takes a month or two. I have tons of other stuff I have to add, before it is finished. Will this be around the same period that cFerret will be ready for prime time? Kind regards, Nick -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri Jan 27 06:20:07 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 27 Jan 2006 20:20:07 +0900 Subject: [Ferret-talk] How to handle non-ascii characters In-Reply-To: References: Message-ID: On 1/27/06, Nick Snels wrote: > Hi David, > > good to hear that it will be fixed in the near future. 
For me personally > it doesn't matter that it takes a month or two. I have tons of other > stuff I have to add, before it is finished. Will this be around the same > period that cFerret will be ready for prime time? Hopefully cFerret will be finished before then. I just have to finish implementing span queries and threading and then I'll be ready to start adding the ruby bindings. The fix to make the indexes of Ferret and Lucene compatible will hopefully involve a patch to Lucene rather than a fix to Ferret but I may have difficulty getting it accepted. I realize index compatibility with Lucene is a show stopper for many people so it's definitely high priority and I'll get it done one way or another. Dave > Kind regards, > > Nick > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From atomgiant at gmail.com Fri Jan 27 08:10:03 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 27 Jan 2006 08:10:03 -0500 Subject: [Ferret-talk] Using ID as Key Message-ID: Hi, I followed the howto to use keys for documents: http://ferret.davebalmain.com/trac/wiki/HowTos#Howtousekeysfordocument If I add two documents with the same id, only one gets added to the index as expected. However, I have found the key and id do not match. So, attempting to access the index with the id does not work. For instance, when I run this search: INDEX.search_each(query) do |doc, score| logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}") end The following is output: Found doc: 3, id: 69 Found doc: 17, id: 88 Is this as designed or am I missing something? Thanks, Tom From atomgiant at gmail.com Fri Jan 27 08:34:08 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 27 Jan 2006 08:34:08 -0500 Subject: [Ferret-talk] Grouping results Message-ID: I have a general question about using a Ferret/Lucene index for grouping results. 
I am not sure how much of the heavy lifting the index can do for me, so
I would appreciate any input. I am using ferret to index some objects
that have the following properties:

url, image_url, price, tags (space separated tags), created_at

I would like to search the index for any documents that match a specific
tag. The way these results will be processed is as follows:

Each URL must be unique in the results. If there are duplicates, I
would like to merge the results using some fuzzy merge criteria.
Ideally, this merge would take the most common occurrence of each of
the properties and apply them to the final single result.

My current thought on how to implement this is to run a standard search
on the index, sorted by URL. Then I will just manually apply the merge
logic to each set of URLs.

Does this sound reasonable?

Thanks,
Tom

From erik at ehatchersolutions.com Fri Jan 27 13:33:17 2006
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Fri, 27 Jan 2006 13:33:17 -0500
Subject: [Ferret-talk] Using ID as Key
In-Reply-To:
References:
Message-ID: <1B958E7E-22CA-4206-9478-EED63BAC58A8@ehatchersolutions.com>

On Jan 27, 2006, at 8:10 AM, Tom Davies wrote:
> I followed the howto to use keys for documents:
>
> http://ferret.davebalmain.com/trac/wiki/HowTos#Howtousekeysfordocument
>
> If I add two documents with the same id, only one gets added to the
> index as expected. However, I have found the key and id do not match.
> So, attempting to access the index with the id does not work.
>
> For instance, when I run this search:
>
> INDEX.search_each(query) do |doc, score|
>   logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}")
> end
>
> The following is output:
>
> Found doc: 3, id: 69
> Found doc: 17, id: 88
>
> Is this as designed or am I missing something?

The doc variable in your code is what is known in Lucene as the
document "id". This is an internal number used by the index. It has
no relation to the primary key feature that Ferret adds.
You've called your field "id", which confuses things a bit. The document id is subject to change, if documents are deleted in the middle and the index is optimized. So don't rely on the internal number for anything long-term. Erik From atomgiant at gmail.com Fri Jan 27 17:11:38 2006 From: atomgiant at gmail.com (Tom Davies) Date: Fri, 27 Jan 2006 17:11:38 -0500 Subject: [Ferret-talk] Using ID as Key In-Reply-To: <1B958E7E-22CA-4206-9478-EED63BAC58A8@ehatchersolutions.com> References: <1B958E7E-22CA-4206-9478-EED63BAC58A8@ehatchersolutions.com> Message-ID: Hi Erik, Thanks for your response. Perhaps I am misunderstanding the how to, but it implies that when you create an index and map the key to the id as follows: index = Index::Index.new(:key => :id) index << {:id => 23, :data => "This is the data..."} index << {:id => 23, :data => "This is the new data..."} Then you can access this document by using either of the following: index["23"] #Get document with key 23 index[23] #Get document with internal number 23. It is NOT key field. It is just internal Ferret id. This implies that the id and key are the same, but according to my first email example, they are not. Is this howto just misleading? Based on what you said, the internal number will not necessarily match the key. Tom On 1/27/06, Erik Hatcher wrote: > > On Jan 27, 2006, at 8:10 AM, Tom Davies wrote: > > I followed the howto to use keys for documents: > > > > http://ferret.davebalmain.com/trac/wiki/HowTos#Howtousekeysfordocument > > > > If I add two documents with the same id, only one gets added to the > > index as expected. However, I have found the key and id do not match. > > So, attempting to access the index with the id does not work. 
> > > > For instance, when I run this search: > > > > INDEX.search_each(query) do |doc, score| > > logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}") > > end > > > > The following is output: > > > > Found doc: 3, id: 69 > > Found doc: 17, id: 88 > > > > Is this as designed or am I missing something? > > The doc variable in your code is what is known in Lucene as the > document "id". This is an internal number used by the index. It has > no relation to the primary key feature that Ferret adds. You've > called your field "id", which confuses things a bit. > > The document id is subject to change, if documents are deleted in the > middle and the index is optimized. So don't rely on the internal > number for anything long-term. > > Erik > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From dbalmain.ml at gmail.com Fri Jan 27 23:09:09 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 28 Jan 2006 13:09:09 +0900 Subject: [Ferret-talk] Using ID as Key In-Reply-To: References: <1B958E7E-22CA-4206-9478-EED63BAC58A8@ehatchersolutions.com> Message-ID: Hi Tom, I can see how this would be confusing. The internal id and the id you give a document are unrelated and they'll only be the same like this when you add documents in order starting with id 0. I'll change the howto to remove the confusion. Cheers, Dave On 1/28/06, Tom Davies wrote: > Hi Erik, > > Thanks for your response. Perhaps I am misunderstanding the how to, > but it implies that when you create an index and map the key to the id > as follows: > > index = Index::Index.new(:key => :id) > index << {:id => 23, :data => "This is the data..."} > index << {:id => 23, :data => "This is the new data..."} > > Then you can access this document by using either of the following: > > index["23"] #Get document with key 23 > index[23] #Get document with internal number 23. It is NOT key > field. 
It is just internal Ferret id. > > This implies that the id and key are the same, but according to my > first email example, they are not. Is this howto just misleading? > Based on what you said, the internal number will not necessarily match > the key. > > Tom > > > On 1/27/06, Erik Hatcher wrote: > > > > On Jan 27, 2006, at 8:10 AM, Tom Davies wrote: > > > I followed the howto to use keys for documents: > > > > > > http://ferret.davebalmain.com/trac/wiki/HowTos#Howtousekeysfordocument > > > > > > If I add two documents with the same id, only one gets added to the > > > index as expected. However, I have found the key and id do not match. > > > So, attempting to access the index with the id does not work. > > > > > > For instance, when I run this search: > > > > > > INDEX.search_each(query) do |doc, score| > > > logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}") > > > end > > > > > > The following is output: > > > > > > Found doc: 3, id: 69 > > > Found doc: 17, id: 88 > > > > > > Is this as designed or am I missing something? > > > > The doc variable in your code is what is known in Lucene as the > > document "id". This is an internal number used by the index. It has > > no relation to the primary key feature that Ferret adds. You've > > called your field "id", which confuses things a bit. > > > > The document id is subject to change, if documents are deleted in the > > middle and the index is optimized. So don't rely on the internal > > number for anything long-term. 
> > > > Erik > > > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From dbalmain.ml at gmail.com Fri Jan 27 23:25:22 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 28 Jan 2006 13:25:22 +0900 Subject: [Ferret-talk] Grouping results In-Reply-To: References: Message-ID: On 1/27/06, Tom Davies wrote: > I have a general question about using a Ferret/Lucene index for > grouping results. I am not sure how much of the heavy lifting the > index can do for me, so I would appreciate any input. I am using > ferret to index some objects that have the following properties: > > url, image_url, price, tags (space separated tags), created_at > > I would like search the index for any documents that match a specific > tag. The way these results will be processed is as follows: > > Each URL must be unique in the results. If there are duplicates, I > would like to merge the results using some fuzzy merge criteria. > Ideally, this merge would take the most common occurrence of each of > the properties and apply them to the final single result. > > My current thoughts on how to implement this is to search the index > using a standard search and sorting by the URL. Then I will just > manually apply the merge logic to each set of URLs. > > Does this sound reasonable? Hi Tom, That sounds like the way I'd probably do it. I don't know if this will help but did you know that documents can contain multiple fields with the same name? So effectively you could store a unique document for each URL and store an array of image_urls, prices and tags in that document. 
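The search-then-merge approach discussed above can be sketched in plain Ruby. This is only an illustration under assumptions: each hash stands in for the stored fields of one search hit, the sample data is made up, and most_common and merge_by_url are hypothetical helper names, not part of Ferret.

```ruby
# Made-up search hits; in practice these would be built from the
# stored fields of each document the search returned.
hits = [
  { 'url' => 'http://example.com/a', 'image_url' => 'a-1.png', 'price' => '10' },
  { 'url' => 'http://example.com/a', 'image_url' => 'a-1.png', 'price' => '12' },
  { 'url' => 'http://example.com/a', 'image_url' => 'a-2.png', 'price' => '10' },
  { 'url' => 'http://example.com/b', 'image_url' => 'b.png',   'price' => '5'  }
]

# Most frequent non-nil value in a list (nil if the list is empty).
def most_common(values)
  values.compact.group_by { |v| v }.max_by { |_, same| same.size }&.first
end

# Collapse hits that share a URL into one result per URL, taking the
# most common value of every other field.
def merge_by_url(hits)
  hits.group_by { |hit| hit['url'] }.map do |url, group|
    fields = group.flat_map(&:keys).uniq - ['url']
    merged = { 'url' => url }
    fields.each { |f| merged[f] = most_common(group.map { |hit| hit[f] }) }
    merged
  end
end

merged = merge_by_url(hits)
merged.each { |result| puts result.inspect }
```

Because group_by collects all hits for a URL regardless of order, the results do not strictly need to be sorted by URL first; sorting only matters if the merge is done in a single pass over consecutive runs.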
Hope that helps,
Dave

From atomgiant at gmail.com Sat Jan 28 07:27:32 2006
From: atomgiant at gmail.com (Tom Davies)
Date: Sat, 28 Jan 2006 07:27:32 -0500
Subject: [Ferret-talk] Grouping results
In-Reply-To:
References:
Message-ID:

Thanks Dave. Actually I did not know that. That may be a useful
feature. The only problem I foresee is how to remove a reference to
each of those properties from the array when a document is deleted. I
will give it some more thought, but it is nice to have options.

Thanks again,
Tom

On 1/27/06, David Balmain wrote:
> On 1/27/06, Tom Davies wrote:
> > I have a general question about using a Ferret/Lucene index for
> > grouping results. I am not sure how much of the heavy lifting the
> > index can do for me, so I would appreciate any input. I am using
> > ferret to index some objects that have the following properties:
> >
> > url, image_url, price, tags (space separated tags), created_at
> >
> > I would like search the index for any documents that match a specific
> > tag. The way these results will be processed is as follows:
> >
> > Each URL must be unique in the results. If there are duplicates, I
> > would like to merge the results using some fuzzy merge criteria.
> > Ideally, this merge would take the most common occurrence of each of
> > the properties and apply them to the final single result.
> >
> > My current thoughts on how to implement this is to search the index
> > using a standard search and sorting by the URL. Then I will just
> > manually apply the merge logic to each set of URLs.
> >
> > Does this sound reasonable?
>
> Hi Tom,
>
> That sounds like the way I'd probably do it. I don't know if this will
> help but did you know that documents can contain multiple fields with
> the same name? So effectively you could store a unique document for
> each URL and store an array of image_urls, prices and tags in that
> document.
>
> Hope that helps,
> Dave

From marcora at caltech.edu Sun Jan 29 19:04:24 2006
From: marcora at caltech.edu (Edoardo Marcora)
Date: Mon, 30 Jan 2006 01:04:24 +0100
Subject: [Ferret-talk] Example on how to boost a field?
Message-ID: <1ca775d4188b3cf9ea0749256ac8d6f9@ruby-forum.com>

I am wondering whether one could boost certain fields (e.g., title and
keywords) to make them 'weigh' more in searches across all fields.

I know that Document::Field has a boost attribute and that it accepts a
boost argument in the constructor (although I haven't had much luck with
it; I just get exceptions on init with a boost), but I was wondering
whether the boost parameter could be set on the index or the search
rather than at indexing time on the doc field. Moreover, I see that the
default is set to 1.0, but what's the max? What are reasonable values
for the boost parameter? How would boosting up to 1.1 or 10 or 100
affect the search results?

Thank you in advance for your consideration,

Dado

--
Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Mon Jan 30 02:55:58 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Mon, 30 Jan 2006 16:55:58 +0900
Subject: [Ferret-talk] Example on how to boost a field?
In-Reply-To: <1ca775d4188b3cf9ea0749256ac8d6f9@ruby-forum.com>
References: <1ca775d4188b3cf9ea0749256ac8d6f9@ruby-forum.com>
Message-ID:

Hi Dado,

Could you give an example of your code that is causing exceptions? If
you want to see the effect that a boost has on the results you can use
IndexSearcher#explain(query, doc) to see how the boost affects the
query score. There is no maximum or minimum value for the boost.

Hope that helps,
Dave

On 1/30/06, Edoardo Marcora wrote:
> I am wondering whether one could boost certain fields (e.g., title and
> keywords) to make them 'weigh' more against searches across all fields.
>
> I know that Document::Field has a boost attribute and that it accepts a
> boost argument in the constructor (although I've had not much luck with
> it, I am just getting back exceptions upon init w/ boost), but I was
> wondering whether the boost parameter could be set on the index or
> search and not at indexing time on the doc field. Moreover, I see that
> the default is set to 1.0 but what's the max? What are reasonable values
> for the boost parameter. How would boosting up to 1.1 or 10 or 100
> affect the search results?
>
> Thank you in advance for your consideration,
>
> Dado
>
> --
> Posted via http://www.ruby-forum.com/.
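
On the question of what 1.1, 10 or 100 would do, a toy model may help build intuition. This is explicitly not Ferret's or Lucene's scoring formula: it assumes a document's score is just the sum of made-up per-field match scores, each multiplied by that field's boost. RAW_SCORES and rank are hypothetical names; for real numbers, the explain call mentioned above is the thing to look at.

```ruby
# Made-up per-field match scores for one query (not real Ferret output).
RAW_SCORES = {
  'doc1' => { 'title' => 0.0, 'body' => 0.9 },  # strong body match only
  'doc2' => { 'title' => 0.5, 'body' => 0.1 }   # matches mostly in the title
}

# Rank documents by the boosted sum of their field scores, best first.
# Fields without an explicit boost use the default of 1.0.
def rank(raw_scores, boosts)
  scored = raw_scores.map do |doc, fields|
    total = fields.sum { |field, score| score * boosts.fetch(field, 1.0) }
    [doc, total]
  end
  scored.sort_by { |_, total| -total }.map(&:first)
end

[1.0, 1.1, 10, 100].each do |boost|
  puts "title boost #{boost}: #{rank(RAW_SCORES, 'title' => boost).join(', ')}"
end
```

In this toy model a boost of 1.1 leaves the order unchanged, 10 is enough to put the title match first, and 100 gives the same order as 10. What matters is whether the boost is large enough to outweigh matches in the other fields, so there is no single "reasonable" value independent of the data.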