From ehabkost at raisama.net Thu Nov 13 13:29:25 2008
From: ehabkost at raisama.net (Eduardo Habkost)
Date: Thu, 13 Nov 2008 16:29:25 -0200
Subject: [Ferret-talk] Ferret crash when using lazy_doc after closing IndexReader
Message-ID: <20081113182925.GQ4481@blackpad>

Hi,

As the Ferret site has been down for a while, I am reporting this bug
here so that it gets documented somewhere and people with more
experience with Ferret can comment.

I was hitting this crash easily in Sup, which uses Ferret for its
index[1]. After some investigation I found the cause of the crash, but
I don't know what the best behaviour for Ferret would be in this case.

The crash happens when the IndexReader object from which a lazy_doc was
loaded gets closed. After the IndexReader is closed, trying to get a
field from the lazy_doc triggers a read from a closed and freed
InputStream, sometimes causing segfaults and sometimes causing spurious
I/O errors.

I initially saw the bug with Ferret 0.11.6, but I have also tested
Ferret from the git repository[2], and it happens there too.

Below is a simple script that triggers the crash:

========================================
# Example A
require "ferret"

p = "/tmp/ferret-test.#$$"
puts "Using #{p} as storage"

i = Ferret::Index::Index.new(:path => p)
i << { :body => "Loren ipsum dolor "*1000 }
doc = i[0]
# this will cause the IndexReader to be closed by Ferret::Index::Index
i << { :body => "another document" }
puts doc[:body]
========================================

It happens because writing to the Ferret index closes the IndexReader.
Simpler code that triggers the crash is:

========================================
# Example B
require "ferret"

p = "/tmp/ferret-test.#$$"
puts "Using #{p} as storage"

puts "Generating a simple index"
i = Ferret::Index::Index.new(:path => p)
i << { :body => "Loren ipsum dolor "*1000 }
i.close

puts "Closed it. Will reopen and use it"
i = Ferret::Index::IndexReader.new(p)
doc = i[0]
i.close
puts doc[:body]
========================================

I see two issues here:

The first one is the crash itself: what should happen to loaded
lazy_docs when an IndexReader is closed? The Lucene documentation[3]
says an exception may be thrown in these cases. The same behavior could
be the proper fix for Ferret in Example B, which can be considered
invalid usage of the IndexReader anyway.

The second issue is what the behaviour of Ferret::Index::Index should
be after writing to the index while documents are loaded (Example A).
Should it really invalidate all lazy_docs read from the index on every
write? That is the current behavior, because its IndexReader is always
closed when writing to the index, but I wonder whether it is really
desired.

[1] http://rubyforge.org/pipermail/sup-talk/2008-November/001782.html
[2] http://github.com/dbalmain/ferret
[3] http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexReader.html#document(int,%20org.apache.lucene.document.FieldSelector)

--
Eduardo

From lyesjob at gmail.com Sun Nov 30 03:49:43 2008
From: lyesjob at gmail.com (Lyes Amazouz)
Date: Sun, 30 Nov 2008 09:49:43 +0100
Subject: [Ferret-talk] Need some information about Ferret
Message-ID: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>

Hi everybody!

In our company, we want to use Ferret as the main index/search engine
of our applications, and we are looking for testimonials about how
efficient Ferret is when deployed in production.

* Has Ferret already been deployed in production at some companies?
  Are there any testimonials about that?
* What is the maximum number of documents we can index with Ferret?
  Does anyone have information about that?
* What is the best way to access a very large Ferret index? Can we
  distribute it across several machines or not?

By the way, can Ferret read Solr indexes, as they are both clones of
Lucene?
Thank you

--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From erik at ehatchersolutions.com Sun Nov 30 04:48:21 2008
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Sun, 30 Nov 2008 04:48:21 -0500
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
Message-ID: <54218BA4-A5E2-42A2-B6A6-C0FF7E875C35@ehatchersolutions.com>

On Nov 30, 2008, at 3:49 AM, Lyes Amazouz wrote:

> By the way, can Ferret read Solr indexes, as they are both clones of
> Lucene?

No. While Ferret was designed around the Lucene index file format, it
is not compatible with Java Lucene (and thus Solr).

	Erik

From lyesjob at gmail.com Sun Nov 30 10:54:22 2008
From: lyesjob at gmail.com (Lyes Amazouz)
Date: Sun, 30 Nov 2008 16:54:22 +0100
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To: <54218BA4-A5E2-42A2-B6A6-C0FF7E875C35@ehatchersolutions.com>
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com> <54218BA4-A5E2-42A2-B6A6-C0FF7E875C35@ehatchersolutions.com>
Message-ID: <60d886530811300754t47c9efb1uf3b3f4e5c18c0330@mail.gmail.com>

On Sun, Nov 30, 2008 at 10:48 AM, Erik Hatcher wrote:

>
> On Nov 30, 2008, at 3:49 AM, Lyes Amazouz wrote:
>
>> By the way, can Ferret read Solr indexes, as they are both clones of
>> Lucene?
>>
>
> No. While Ferret was designed around the Lucene index file format, it is
> not compatible with Java Lucene (and thus Solr).
>
>         Erik
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

Hello Erik!

Thank you for the information. But is there a way to recover the
content of an existing Solr index and reindex it with Ferret?
--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From erik at ehatchersolutions.com Sun Nov 30 12:04:17 2008
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Sun, 30 Nov 2008 12:04:17 -0500
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To: <60d886530811300754t47c9efb1uf3b3f4e5c18c0330@mail.gmail.com>
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com> <54218BA4-A5E2-42A2-B6A6-C0FF7E875C35@ehatchersolutions.com> <60d886530811300754t47c9efb1uf3b3f4e5c18c0330@mail.gmail.com>
Message-ID: <1C29582C-07C7-43A1-BBDB-984587E0C3C3@ehatchersolutions.com>

On Nov 30, 2008, at 10:54 AM, Lyes Amazouz wrote:

>
> Thank you for the information. But is there a way to recover the
> content of an existing Solr index and reindex it with Ferret?

It'll probably be easier and faster to reindex your original content,
which presumably you still have handy.

But... you'd have to have your fields "stored" in Solr for them to be
recoverable. Using solr-ruby's Solr::Importer::SolrSource would make
it easy to iterate over all documents in Solr (using a query of *:*).

But why move from Solr to Ferret?

	Erik
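Erik's suggestion of walking every stored document with a *:* query can
also be sketched without solr-ruby. This is a minimal sketch, not
solr-ruby's or Ferret's own code, and it makes several assumptions: the
Solr server exposes the standard /select handler and accepts the q,
start, rows and wt=json parameters (the handler URL you pass in is up to
you), and the fields you want back are marked "stored" in the schema.
The method names fetch_solr_page and reindex_from_solr are hypothetical.
The `index` argument can be anything that accepts a hash through <<. A
Ferret::Index::Index does, as in the examples earlier in this thread.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: fetch one page of all documents (q=*:*) from a Solr
# /select handler, asking for the JSON response writer (wt=json).
def fetch_solr_page(select_url, start, rows)
  uri = URI(select_url)
  uri.query = URI.encode_www_form(q: "*:*", start: start, rows: rows, wt: "json")
  JSON.parse(Net::HTTP.get(uri))
end

# Hypothetical helper: page through every document and hand its fields to
# `index` via <<. Solr only returns stored fields, so unstored fields cannot
# be recovered this way. `fetch_page` is any callable taking (start, rows)
# and returning a parsed Solr JSON response. Returns the number of docs copied.
def reindex_from_solr(index, fetch_page, rows: 100)
  start = 0
  loop do
    response = fetch_page.call(start, rows)["response"]
    docs = response["docs"]
    # Solr's JSON uses string keys; the Ferret examples above use symbols.
    docs.each { |doc| index << doc.transform_keys(&:to_sym) }
    start += docs.size
    # Stop on an empty page too, so a miscounted numFound cannot loop forever.
    break if docs.empty? || start >= response["numFound"]
  end
  start
end
```

With Ferret, you would pass a Ferret::Index::Index as `index` and a
lambda such as ->(s, r) { fetch_solr_page(url, s, r) } as `fetch_page`,
then close the index when done. Paging with `rows` avoids asking Solr
for the whole result set at once. Multi-valued Solr fields come back as
arrays and may need converting before being added to a Ferret index.
That step is not shown here.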