From aslak.hellesoy at gmail.com Wed Nov 16 21:57:22 2005
From: aslak.hellesoy at gmail.com (aslak hellesoy)
Date: Wed, 16 Nov 2005 21:57:22 -0500
Subject: [Ferret-talk] lock problems from concurrent processes.
Message-ID: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com>

Hi!

First, thanks a LOT for ferret. The API and documentation are great.

I'm trying to integrate ferret into a RoR app (DamageControl) and have
run into a problem with locks.
DamageControl consists of two processes that start up and run in
parallel. The first one is the webapp (which is just a plain RoR app).
The second is a daemon process that runs in the background.

The daemon process writes to the index, and the webapp reads from it.
It's the same index, stored in the same directory.

My problem is that the webapp gets lock errors:

/usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/store/fs_store.rb:226:in `obtain': could not obtain lock: /Users/aslakhellesoy/scm/dc_svn/branches/damagecontrol_active_record/testdata/index/ferret-11f222dc32bbe019198a2b42644196f9write.lock (RuntimeError)
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index_writer.rb:100:in `initialize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:111:in `new'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:111:in `initialize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:109:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:109:in `initialize'
        from ./lib/damagecontrol/ferret_config.rb:6:in `new'
        from ./lib/damagecontrol/ferret_config.rb:6:in `get_index'
        from ./lib/damagecontrol/build_daemon.rb:10

Is it possible to create a 'read-only' index that doesn't try to
acquire a lock? Or is there a different way to achieve concurrent
access to an index from different processes where one of them is only
writing and the other is only reading?

Cheers,
Aslak

From dbalmain.ml at gmail.com Thu Nov 17 08:03:54 2005
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 17 Nov 2005 22:03:54 +0900
Subject: [Ferret-talk] lock problems from concurrent processes.
In-Reply-To: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com>
References: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com>
Message-ID:

Hi Aslak,

Great to hear you are integrating Ferret into DamageControl. I'll try
to be as much help as possible. Ferret is designed to work with
multiple processes accessing the index (hence the locks) so this
problem shouldn't be too hard to solve. You have two options.

The first might be a little easier but performance won't be as good.
That is to flush the index after you do a write so that it won't hold
the lock for an extended period of time. See
Ferret::Index::Index#flush(). This is the best solution if multiple
processes are reading and writing.

The second option shouldn't be too difficult either although I haven't
documented it very well yet. That is to use Index::IndexWriter for
writing to the index and Search::IndexSearcher for searching the
index. Actually, you could continue to use Index::Index for the
process that is writing to the index and Search::IndexSearcher for the
read-only process. Search::IndexSearcher will never obtain any locks.
Probably the best place to look for examples of how to use
Index::IndexWriter and Search::IndexSearcher is actually within the
Index::Index class itself.

Hope this helps.
Cheers,
Dave

PS: One thing I should mention is that deletes actually happen through
Index::IndexReader. This probably seems a little confusing. It did to
me to start with anyway. Again, check out the code in Index::Index to
see how it handles deletes.

> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

From aslak.hellesoy at gmail.com Thu Nov 17 09:35:42 2005
From: aslak.hellesoy at gmail.com (aslak hellesoy)
Date: Thu, 17 Nov 2005 09:35:42 -0500
Subject: [Ferret-talk] lock problems from concurrent processes.
In-Reply-To:
References: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com>
Message-ID: <8d961d900511170635y2bb9ff4ej9c6578f0afe4bd94@mail.gmail.com>

On 11/17/05, David Balmain wrote:
> The second option shouldn't be too difficult either although I haven't
> documented it very well yet. That is to use Index::IndexWriter for
> writing to the index and Search::IndexSearcher for searching the
> index. Actually, you could continue to use Index::Index for the
> process that is writing to the index and Search::IndexSearcher for the
> read-only process. Search::IndexSearcher will never obtain any locks.
> Probably the best place to look for examples of how to use
> Index::IndexWriter and Search::IndexSearcher is actually within the
> Index::Index class itself.

Great - this makes a lot of sense.

> PS: One thing I should mention is that deletes actually happen through
> Index::IndexReader. This probably seems a little confusing. It did to
> me to start with anyway.

Something you wrote was confusing for you? Would it be possible to
make it a bit more intuitive?

Ferret is one of the best-written Ruby frameworks I have seen so far
(both in implementation and API design), so you might as well shoot
for complete excellence :-)

From aslak.hellesoy at gmail.com Thu Nov 17 09:43:22 2005
From: aslak.hellesoy at gmail.com (aslak hellesoy)
Date: Thu, 17 Nov 2005 09:43:22 -0500
Subject: [Ferret-talk] indexing source code
Message-ID: <8d961d900511170643n3f73ea9bsb7e8b59fdc68a317@mail.gmail.com>

Hi again,

I'm using ferret to index source code - DamageControl will allow users
to search for text in source code.

Currently I'm using the default index with no custom analyzer (I'm
using the StandardAnalyzer). Do you have any recommendations about how
to write an analyzer that will index source code in a more 'optimal'
way? I.e. disregard common source code tokens and take into account
dots and such when tokenizing.

For example, if the source code looks like this:

  def foo(bar)
    bar.zap()
  end

searching for 'def' should not match (too common). Searching for 'zap'
should match (even though it is surrounded not by spaces but by
'ignorable characters').

Also, it might make sense to use different analyzers for different
source code types (java, ruby, haskell etc).

Some hints for this would be great.

Cheers,
Aslak

From erik at ehatchersolutions.com Thu Nov 17 10:05:31 2005
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Thu, 17 Nov 2005 10:05:31 -0500
Subject: [Ferret-talk] lock problems from concurrent processes.
In-Reply-To: <8d961d900511170635y2bb9ff4ej9c6578f0afe4bd94@mail.gmail.com>
References: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com> <8d961d900511170635y2bb9ff4ej9c6578f0afe4bd94@mail.gmail.com>
Message-ID: <07C0EF6F-B496-4C23-AF64-3822C3A5A902@ehatchersolutions.com>

On 17 Nov 2005, at 09:35, aslak hellesoy wrote:
>> PS: One thing I should mention is that deletes actually happen through
>> Index::IndexReader. This probably seems a little confusing.
>> It did to me to start with anyway.
>
> Something you wrote was confusing for you? Would it be possible to
> make it a bit more intuitive?
>
> Ferret is one of the best-written Ruby frameworks I have seen so far
> (both in implementation and API design), so you might as well shoot
> for complete excellence :-)

Dave has indeed done an amazing job of porting Lucene! The confusing
aspect here is that his port is faithful enough to pass through a
confusing piece of the Java Lucene API. His Index::Index class is not
really part of Java Lucene, so you're getting an extra bonus there
instead of dealing with IndexWriter, IndexReader, and IndexSearcher
directly.

IndexReader in Java Lucene is used for reading _and_ deleting
documents - this is just the nature of the beast. Deleting a document
in Lucene merely flags it as deleted and doesn't actually remove
anything - thus the IndexReader facility is used for this operation,
not the IndexWriter, which has a much more substantial role in indexing
new documents.

    Erik

From dbalmain.ml at gmail.com Thu Nov 17 21:31:52 2005
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 18 Nov 2005 11:31:52 +0900
Subject: [Ferret-talk] Fwd: indexing source code
In-Reply-To:
References: <8d961d900511170643n3f73ea9bsb7e8b59fdc68a317@mail.gmail.com>
Message-ID:

Hi Aslak,

Ignoring things like def is easy using the stop filter. Something like
this;

  include Ferret

  analyzer = Analysis::StandardAnalyzer.new(["def", "class", "end", "module"])
  index = Index::Index.new(:analyzer => analyzer)

But the standard analyzer is probably not the best to use. It will
parse name.split.each as one token. I'm not sure that this is what you
want. I'd use a simple letter tokenizer but I'd add a few other
symbols. Probably something like this;

  class CodeTokenizer < Analysis::RegExpTokenizer
    protected
      # Collects only characters which satisfy the regular expression
      # _/[[:alpha:]_@$?!]+/_.
      def token_re()
        /[[:alpha:]_@$?!]+/
      end
  end

  class CodeAnalyzer < Analysis::StopAnalyzer
    # An array containing some common ruby words that are not usually
    # useful for searching.
    CODE_STOP_WORDS = ["def", "class", "end", "module"] # etc

    # Builds an analyzer which removes words in the provided array.
    def initialize(stop_words = CODE_STOP_WORDS)
      @stop_words = stop_words
    end

    # Filters CodeTokenizer with StopFilter.
    def token_stream(field, string)
      return Analysis::StopFilter.new(CodeTokenizer.new(string), @stop_words)
    end
  end

This should be a good start for you. Incidentally, my next project is
going to be to build a kind of documentation wiki. I'll be integrating
rdoc with ferret to make the documentation searchable and commentable.
I'll actually be using Ferret quite heavily so it should be a good
example app for people to work off. I'm working on cFerret integration
(huge performance improvement) at the moment though so it could be a
while.

Cheers,
Dave

From aslak.hellesoy at gmail.com Thu Nov 17 22:29:45 2005
From: aslak.hellesoy at gmail.com (aslak hellesoy)
Date: Thu, 17 Nov 2005 22:29:45 -0500
Subject: [Ferret-talk] lock problems from concurrent processes.
In-Reply-To:
References: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com>
Message-ID: <8d961d900511171929p16a6f7c2kf04ddd3aadc26624@mail.gmail.com>

On 11/17/05, David Balmain wrote:
> The second option shouldn't be too difficult either although I haven't
> documented it very well yet. That is to use Index::IndexWriter for
> writing to the index and Search::IndexSearcher for searching the
> index. Actually, you could continue to use Index::Index for the
> process that is writing to the index and Search::IndexSearcher for the
> read-only process. Search::IndexSearcher will never obtain any locks.
> Probably the best place to look for examples of how to use
> Index::IndexWriter and Search::IndexSearcher is actually within the
> Index::Index class itself.

Reader/Searcher sounds like the best option for me. I still have some
questions though:

Search::IndexSearcher.search_query doesn't understand String queries
like Index::Index does - I need a Search::Query object.
Since I still want my API to be able to use FQL, I need a QueryParser.

In order to understand how to use QueryParser I peeked at
Index::Index' use of QueryParser. I see:

  if @qp.nil?
    @qp = Ferret::QueryParser.new(@default_search_field, @options)
  end
  # we need to set this every time, in case a new field has been added
  @qp.fields = @reader.get_field_names.to_a
  query = @qp.parse(query)

So to my best judgement it looks like I need an Index::IndexReader in
order to use QueryParser.
This is where I run into problems.

I'm using IndexReader.open, passing in a dir as a String. What happens
then is that my existing index directory gets wiped out. It happens
when open invokes Store::FSDirectory.new(directory, true).

I'm now sufficiently deep into my rabbit hole that I'm not sure when I
dug too deep :-)

So my question is: How do I create and use a QueryParser without
wiping out the existing index files?

Cheers,
Aslak

From dbalmain.ml at gmail.com Thu Nov 17 22:51:13 2005
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 18 Nov 2005 12:51:13 +0900
Subject: [Ferret-talk] lock problems from concurrent processes.
In-Reply-To: <8d961d900511171929p16a6f7c2kf04ddd3aadc26624@mail.gmail.com>
References: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com> <8d961d900511171929p16a6f7c2kf04ddd3aadc26624@mail.gmail.com>
Message-ID:

Hi Aslak,

Probably the easiest way to get the reader is straight from the
Search::IndexSearcher object.
reader is one of its attributes so there is no need to open a new one.
That should solve your problem. ie;

  if @qp.nil?
    @qp = Ferret::QueryParser.new(@default_search_field, @options)
  end
  # we need to set this every time, in case a new field has been added
  @qp.fields = @searcher.reader.get_field_names.to_a
  query = @qp.parse(query)

However, you don't really need the "@qp.fields" line unless you want
to allow multi field queries with the '*' symbol. For example;

  index.search_each("*:Customer") do |doc, score|

That searches for the word Customer in all fields in the document. But
in your case you may have only one field, in which case it isn't
necessary. Or, you are going to know what fields exist beforehand so
you can just feed them in when you create the query parser like this;

  if @qp.nil?
    options[:analyzer] = @analyzer
    options[:fields] = ["source", "comments"]
    @qp = Ferret::QueryParser.new(@default_search_field, options)
  end
  query = @qp.parse(query)

Having said all this, I have to admit that you've actually found a bug
so it'll be fixed in the next version. IndexReader#open should invoke
Store::FSDirectory.new(directory, false). Thanks.

Cheers,
Dave

From aslak.hellesoy at gmail.com Thu Nov 17 23:06:57 2005
From: aslak.hellesoy at gmail.com (aslak hellesoy)
Date: Thu, 17 Nov 2005 23:06:57 -0500
Subject: [Ferret-talk] lock problems from concurrent processes.
In-Reply-To: <8d961d900511171929p16a6f7c2kf04ddd3aadc26624@mail.gmail.com>
References: <8d961d900511161857v4a53b40co6a786088a343ff0f@mail.gmail.com> <8d961d900511171929p16a6f7c2kf04ddd3aadc26624@mail.gmail.com>
Message-ID: <8d961d900511172006q3d616f26t9423fc128c26da4b@mail.gmail.com>

> So my question is: How do I create and use a QueryParser without
> wiping out the existing index files?

Please disregard my previous question.
I did it like this:

  module RevisionFileSearching
    # Searches for RevisionFile instances using the Ferret index
    def search_each(query) #:yield: revision_file
      dir = Ferret::Store::FSDirectory.new("my_index_dir", false)
      @index_searcher ||= Ferret::Search::IndexSearcher.new(dir)
      @index_reader ||= Ferret::Index::IndexReader.open(dir, false)
      @query_parser ||= Ferret::QueryParser.new("data", {})
      @query_parser.fields = @index_reader.get_field_names.to_a
      query = @query_parser.parse(query)
      @index_searcher.search_each(query) do |doc, score|
        id = @index_reader.get_document(doc)["id"]
        yield RevisionFile.find(id)
      end
    end
  end

I'll knock up a blog entry about this and join the Ferret propaganda
machine :-)

Cheers,
Aslak

From itsme213 at hotmail.com Tue Nov 22 11:53:17 2005
From: itsme213 at hotmail.com (itsme213)
Date: Tue, 22 Nov 2005 10:53:17 -0600
Subject: [Ferret-talk] Ferret NoMethodError
Message-ID:

Using the current ferret gem (on Win-XP):

irb(main):001:0> require 'ferret'
c:/ruby/lib/ruby/1.8/fileutils.rb:950: warning: already initialized constant OPT_TABLE
=> true
irb(main):002:0> include Ferret
=> Object
irb(main):003:0> index = Index::Index.new
=> #
irb(main):004:0> index << "This is a new doc"
=> nil
irb(main):005:0> index.search_each('*:this') do |d, s| puts d end
NoMethodError: undefined method `weight' for nil:NilClass
        from c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.2.2/lib/ferret/search/index_searcher.rb:104:in `search'
        from c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.2.2/lib/ferret/index/index.rb:588:in `do_search'
        from c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.2.2/lib/ferret/index/index.rb:291:in `search_each'
        from c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.2.2/lib/ferret/index/index.rb:290:in `synchronize'
        from c:/ruby/lib/ruby/gems/1.8/gems/ferret-0.2.2/lib/ferret/index/index.rb:290:in `search_each'
        from (irb):5
irb(main):006:0>

From fcsmith at gmail.com Tue Nov 22 14:18:19 2005
From: fcsmith at gmail.com (Finn Smith)
Date: Tue, 22 Nov 2005 14:18:19 -0500
Subject: [Ferret-talk] Ferret NoMethodError
In-Reply-To:
References:
Message-ID: <6e72bbd70511221118k21266715t4aa0afd442a012e9@mail.gmail.com>

On 11/22/05, itsme213 wrote:
> Using the current ferret gem (on Win-XP):
>
> irb(main):005:0> index.search_each('*:this') do |d, s| puts d end
> NoMethodError: undefined method `weight' for nil:NilClass

Probably not too much help, but I can confirm the same sequence of
code leads to the same exception being thrown using the 0.2.2 gem on
Linux.

Why is this exception thrown? On line 598 of lib/ferret/index/index.rb:

  query = @qp.parse(query)

@qp.parse(query) is returning nil. This nil object is passed back up
into Index::Index's private do_search method. Then the nil object is
passed as an argument into the searcher's search method, where the
exception is thrown.

I couldn't figure out why the QueryParser is returning a nil object
from its parse() method. It looks like a bug in the racc grammar file
that generates the code that parses the queries.

I just installed the 0.2.2 gem, so I haven't encountered any other
errors of this nature myself.

-F

From anatol.pomozov at gmail.com Fri Nov 25 21:36:33 2005
From: anatol.pomozov at gmail.com (Anatol Pomozov)
Date: Sat, 26 Nov 2005 03:36:33 +0100
Subject: [Ferret-talk] Several questions about Ferret.
Message-ID: <3665a1a00511251836q7e5edfc8tc105cbe1352548f3@mail.gmail.com>

Hi.

First of all I would like to say "thank you" to David for his really
valuable work. Ferret is a great project and it has a great future.

Well, now my questions as a beginner in Ferret.

How do I remove ALL documents from the index? Removing the files is
not a solution. I am interested in something like index.remove_index.
What is the usual way of doing it?

What is the name of the default key field (the field that we could
later use in a method like index.remove("23"))? In some docs I have
seen the name :id, in others :key.

What is the difference between storing a field as a string and as an
integer?
For example how should be id field stored. As integer?? ( index << {:id=>self.id.to_s} )?? How index.update() works?? What if document with given id not found. Is such document will be created?? What is the best practice to use Rails hooks for ferret?? I am tried to use following code but it seems does not work correctly. Document indexed twice after object update. Could you help me to write right Rails hook methods?? def after_save index = FerretConfig::INDEX index.remove(self.id.to_s) index.update(self.id.to_s, self.to_document) index.optimize end def before_destroy index = FerretConfig::INDEX index.remove(self.id.to_s) index.optimize end def to_document doc = Document.new doc << Field.new('id', self.id.to_s, Field::Store::YES, Field::Index::UNTOKENIZED) doc << Field.new('body_en', self.body_en, Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 1.0) doc << Field.new('title_en', self.title_en, Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 3.0) -- anatol (http://pomozov.info) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20051126/ed338171/attachment-0001.htm From anatol.pomozov at gmail.com Sat Nov 26 04:33:48 2005 From: anatol.pomozov at gmail.com (Anatol Pomozov) Date: Sat, 26 Nov 2005 10:33:48 +0100 Subject: [Ferret-talk] Several questions about Ferret. In-Reply-To: References: <3665a1a00511251836q7e5edfc8tc105cbe1352548f3@mail.gmail.com> Message-ID: <3665a1a00511260133o54f053dey10c3eb4c681ab5ce@mail.gmail.com> Hi, David. Thank you for the answers. It were helpful. I just missed :key=>:id in index creation. Is any FAQ page in wiki where I could add this information?? I just have added Ferret to rails app at seems that it plays correctly. I also could add some code examples to Ferret wiki if you dont mind. And again - thank you David for the Ferret. 
On 11/26/05, David Balmain wrote: > > On 11/26/05, Anatol Pomozov wrote: > > Hi. > > > > First of all I would like to say "thank you" to David for its really > > valuable work. Ferret is a great project and it have great future. > > Hi Anatol, > You're welcome. I hope you find it fills your needs. > > > Well now is my questions as beginner in Ferret. > > > > How to remove ALL documents from index. Remove files is not a solution. > I am > > interesting in something like > > index.remove_index or something like this. What is a usual way of doing > it?? > > Perhaps this is the best solution; > > index.size.times {|i| index.delete(i)} > > > What is the name of default key field. (Field that we could later used > in > > method like as index.remove("23") ). In some docs I seen the name :id in > > other as :key > > The id field used in update, doc and delete is always "id". I may make > this an option in future. This field is used when you pass a string to > any of those three methods. If you pass an integer, the document > number in the index is assumed. > > If you want to use a different field as your key, for example "key", > you can use the query_delete method; > > index.query_delete("key:23") > > > What is the difference in soring field as string and as integer. For > example > > how should be id field stored. As integer?? ( index << {:id=> > self.id.to_s} > > )?? > > All fields are stored as strings. Even if you use index << > {:id=>self.id}, id will be converted into a string. > > > How index.update() works?? What if document with given id not found. Is > such > > document will be created?? > > update only updates a document if it already exists. What you are > looking for is add_document ("<<") mixed with the :key option. 
For > example; > > index = Index::Index.new(:key => :id) > > index << {:id => 23, :data => "This is the data..."} > index << {:id => 23, :data => "This is the new data..."} > > You can even use this when indexing multiple tables like this; > > index = Index::Index.new(:key => [:id, :table]) > > index << {:id => 23, :table => "content", :data => "This is the > data..."} > index << {:id => 23, :table => "content", :data => "This is the > new data..."} > > Note that :key is nil by default so adding a new document will always > add a new document. > > > > > What is the best practice to use Rails hooks for ferret?? I am tried to > use > > following code but it seems does not work correctly. Document indexed > twice > > after object update. Could you help me to write right Rails hook > methods?? > > def after_save > > index = FerretConfig::INDEX > > index.remove(self.id.to_s) > > index.update(self.id.to_s, self.to_document) > > > > index.optimize > > end > > > > def before_destroy > > index = FerretConfig::INDEX > > index.remove(self.id.to_s) > > index.optimize > > end > > > > def to_document > > doc = Document.new > > doc << Field.new('id', self.id.to_s, Field::Store::YES, > > Field::Index::UNTOKENIZED) > > doc << Field.new('body_en', self.body_en, Field::Store::YES, > > Field::Index::TOKENIZED, Field::TermVector::NO, false, 1.0) > > doc << Field.new('title_en', self.title_en, Field::Store::YES, > > Field::Index::TOKENIZED, Field::TermVector::NO, false, 3.0) > > Unfortunately I haven't had enough time to play with Ferret + Rails > yet. There is a tutorial by Jan Prill here; > > http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails > > I'm not sure where the remove method comes from. Perhaps you've mapped > it to delete. Also, personally, I wouldn't use optimize all the time > like that unless updates are very rare. It's not really necessary. Of > course there is a payoff between update speed and query speed. 
You > should play around with or without the optimize to see what works best > for you. This is what I would do; > > # create the index with the :key option plus whatever other options > you're using; > FerretConfig::INDEX = Index::Index.new(:key => :id) > > def after_save > FerretConfig::INDEX << self.to_document > end > > def before_destroy > # NOTE: the "to_s" is necessary here so that Ferret > #knows to use the id field > FerretConfig::INDEX.delete(self.id.to_s) > end > > If you find a better way to do this, please contribute to Jan Prills > rails wiki entry. I'll work on better integration between Ferret and > Rails when I finish working on the performance. I'm currently very > busy integrating my C indexer which has been a little harder than I > thought. Currently 2000 lines of C this week and still growing. > > Please let me know if you run into any more problems. > > Cheers, > Dave > -- anatol (http://pomozov.info) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20051126/8741dd6a/attachment.htm From anatol.pomozov at gmail.com Sat Nov 26 08:00:07 2005 From: anatol.pomozov at gmail.com (Anatol Pomozov) Date: Sat, 26 Nov 2005 14:00:07 +0100 Subject: [Ferret-talk] Get number of found documents Message-ID: <3665a1a00511260500l41a69938g15161938bac5a1c2@mail.gmail.com> Hi David again. I would say that Ferret works great with Rails. And now I am trying to create pagination. Because site could have millions of documents I need to create on page link something like "Page #100". Rather usual situation. But to create this links I need to know how many documents Ferret found in index. For now I am doing it with following code index = FerretConfig::INDEX index_size = index.search(query).size @document_pages = Paginator.new self, index_size, PAGE_SIZE, page_num But I am not sure that statement index.search(query).size will be effective here. 
Because it returns an array of found documents. What if such documents will be 100000?? I don't want to select all this documents, just to know how many found, and then select for example 10 of them index.search_each(query, :num_docs=>PAGE_SIZE, :first_doc=>PAGE_SIZE*(page_num-1)) do |doc_num, score| @documents << index[doc_num] end So quesition is: what is the most effective way to calculate (just calculate) number of found documents? -- anatol (http://pomozov.info) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20051126/af356487/attachment.htm From dbalmain.ml at gmail.com Sat Nov 26 12:23:49 2005 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 27 Nov 2005 02:23:49 +0900 Subject: [Ferret-talk] Several questions about Ferret. In-Reply-To: References: <3665a1a00511251836q7e5edfc8tc105cbe1352548f3@mail.gmail.com> Message-ID: On 11/26/05, Anatol Pomozov wrote: > Hi. > > First of all I would like to say "thank you" to David for its really > valuable work. Ferret is a great project and it have great future. Hi Anatol, You're welcome. I hope you find it fills your needs. > Well now is my questions as beginner in Ferret. > > How to remove ALL documents from index. Remove files is not a solution. I am > interesting in something like > index.remove_index or something like this. What is a usual way of doing it?? Perhaps this is the best solution; index.size.times {|i| index.delete(i)} > What is the name of default key field. (Field that we could later used in > method like as index.remove("23") ). In some docs I seen the name :id in > other as :key The id field used in update, doc and delete is always "id". I may make this an option in future. This field is used when you pass a string to any of those three methods. If you pass an integer, the document number in the index is assumed. 
If you want to use a different field as your key, for example "key", you can use the query_delete method; index.query_delete("key:23") > What is the difference in soring field as string and as integer. For example > how should be id field stored. As integer?? ( index << {:id=>self.id.to_s} > )?? All fields are stored as strings. Even if you use index << {:id=>self.id}, id will be converted into a string. > How index.update() works?? What if document with given id not found. Is such > document will be created?? update only updates a document if it already exists. What you are looking for is add_document ("<<") mixed with the :key option. For example; index = Index::Index.new(:key => :id) index << {:id => 23, :data => "This is the data..."} index << {:id => 23, :data => "This is the new data..."} You can even use this when indexing multiple tables like this; index = Index::Index.new(:key => [:id, :table]) index << {:id => 23, :table => "content", :data => "This is the data..."} index << {:id => 23, :table => "content", :data => "This is the new data..."} Note that :key is nil by default so adding a new document will always add a new document. > > What is the best practice to use Rails hooks for ferret?? I am tried to use > following code but it seems does not work correctly. Document indexed twice > after object update. Could you help me to write right Rails hook methods?? 
> def after_save
>   index = FerretConfig::INDEX
>   index.remove(self.id.to_s)
>   index.update(self.id.to_s, self.to_document)
>
>   index.optimize
> end
>
> def before_destroy
>   index = FerretConfig::INDEX
>   index.remove(self.id.to_s)
>   index.optimize
> end
>
> def to_document
>   doc = Document.new
>   doc << Field.new('id', self.id.to_s, Field::Store::YES,
>     Field::Index::UNTOKENIZED)
>   doc << Field.new('body_en', self.body_en, Field::Store::YES,
>     Field::Index::TOKENIZED, Field::TermVector::NO, false, 1.0)
>   doc << Field.new('title_en', self.title_en, Field::Store::YES,
>     Field::Index::TOKENIZED, Field::TermVector::NO, false, 3.0)

Unfortunately I haven't had enough time to play with Ferret + Rails yet.
There is a tutorial by Jan Prill here;

http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails

I'm not sure where the remove method comes from. Perhaps you've mapped it to
delete. Also, personally, I wouldn't use optimize all the time like that
unless updates are very rare. It's not really necessary. Of course there is a
trade-off between update speed and query speed. You should play around with
and without the optimize to see what works best for you. This is what I would
do;

    # create the index with the :key option plus whatever other options
    # you're using;
    FerretConfig::INDEX = Index::Index.new(:key => :id)

    def after_save
      FerretConfig::INDEX << self.to_document
    end

    def before_destroy
      # NOTE: the "to_s" is necessary here so that Ferret
      # knows to use the id field
      FerretConfig::INDEX.delete(self.id.to_s)
    end

If you find a better way to do this, please contribute to Jan Prill's rails
wiki entry. I'll work on better integration between Ferret and Rails when I
finish working on the performance. I'm currently very busy integrating my C
indexer, which has been a little harder than I thought. Currently 2000 lines
of C this week and still growing.

Please let me know if you run into any more problems.

Cheers, Dave From dbalmain.ml at gmail.com Sat Nov 26 12:23:57 2005 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 27 Nov 2005 02:23:57 +0900 Subject: [Ferret-talk] Several questions about Ferret. In-Reply-To: References: <3665a1a00511251836q7e5edfc8tc105cbe1352548f3@mail.gmail.com> <3665a1a00511260133o54f053dey10c3eb4c681ab5ce@mail.gmail.com> Message-ID: On 11/26/05, Anatol Pomozov wrote: > Hi, David. > > Thank you for the answers. It were helpful. I just missed :key=>:id in index > creation. > > Is any FAQ page in wiki where I could add this information?? At the moment there is a howtos page here; http://ferret.davebalmain.com/trac/wiki/HowTos It's not very well organized yet. It's another thing I just haven't gotten around to. Please feel free to add to it. Also, I just added a Powered By page so please add your site there when it goes live. And if you write any articles or blog entries, please link to them on the FerretArticles page. Thanks, Dave > I just have added Ferret to rails app at seems that it plays correctly. I > also could add some code examples to Ferret wiki if you dont mind. > > And again - thank you David for the Ferret. > > > On 11/26/05, David Balmain wrote: > > On 11/26/05, Anatol Pomozov wrote: > > > Hi. > > > > > > First of all I would like to say "thank you" to David for its really > > > valuable work. Ferret is a great project and it have great future. > > > > Hi Anatol, > > You're welcome. I hope you find it fills your needs. > > > > > Well now is my questions as beginner in Ferret. > > > > > > How to remove ALL documents from index. Remove files is not a solution. > I am > > > interesting in something like > > > index.remove_index or something like this. What is a usual way of doing > it?? > > > > Perhaps this is the best solution; > > > > index.size.times {|i| index.delete(i)} > > > > > What is the name of default key field. (Field that we could later used > in > > > method like as index.remove("23") ). 
In some docs I seen the name :id in > > > other as :key > > > > The id field used in update, doc and delete is always "id". I may make > > this an option in future. This field is used when you pass a string to > > any of those three methods. If you pass an integer, the document > > number in the index is assumed. > > > > If you want to use a different field as your key, for example "key", > > you can use the query_delete method; > > > > index.query_delete("key:23") > > > > > What is the difference in soring field as string and as integer. For > example > > > how should be id field stored. As integer?? ( index << > {:id=>self.id.to_s} > > > )?? > > > > All fields are stored as strings. Even if you use index << > > {:id=>self.id }, id will be converted into a string. > > > > > How index.update() works?? What if document with given id not found. Is > such > > > document will be created?? > > > > update only updates a document if it already exists. What you are > > looking for is add_document ("<<") mixed with the :key option. For > > example; > > > > index = Index::Index.new(:key => :id) > > > > index << {:id => 23, :data => "This is the data..."} > > index << {:id => 23, :data => "This is the new data..."} > > > > You can even use this when indexing multiple tables like this; > > > > index = Index::Index.new(:key => [:id, :table]) > > > > index << {:id => 23, :table => "content", :data => "This is the > data..."} > > index << {:id => 23, :table => "content", :data => "This is the > > new data..."} > > > > Note that :key is nil by default so adding a new document will always > > add a new document. > > > > > > > > What is the best practice to use Rails hooks for ferret?? I am tried to > use > > > following code but it seems does not work correctly. Document indexed > twice > > > after object update. Could you help me to write right Rails hook > methods?? 
> > > def after_save > > > index = FerretConfig::INDEX > > > index.remove(self.id.to_s) > > > index.update(self.id.to_s , self.to_document) > > > > > > index.optimize > > > end > > > > > > def before_destroy > > > index = FerretConfig::INDEX > > > index.remove(self.id.to_s) > > > index.optimize > > > end > > > > > > def to_document > > > doc = Document.new > > > doc << Field.new('id', self.id.to_s, Field::Store::YES, > > > Field::Index::UNTOKENIZED) > > > doc << Field.new('body_en', self.body_en, Field::Store::YES, > > > Field::Index::TOKENIZED, Field::TermVector::NO, false, 1.0) > > > doc << Field.new('title_en', self.title_en, Field::Store::YES, > > > Field::Index::TOKENIZED, Field::TermVector::NO, false, 3.0) > > > > Unfortunately I haven't had enough time to play with Ferret + Rails > > yet. There is a tutorial by Jan Prill here; > > > > > http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails > > > > I'm not sure where the remove method comes from. Perhaps you've mapped > > it to delete. Also, personally, I wouldn't use optimize all the time > > like that unless updates are very rare. It's not really necessary. Of > > course there is a payoff between update speed and query speed. You > > should play around with or without the optimize to see what works best > > for you. This is what I would do; > > > > # create the index with the :key option plus whatever other options > > you're using; > > FerretConfig::INDEX = Index::Index.new(:key => :id) > > > > def after_save > > FerretConfig::INDEX << self.to_document > > end > > > > def before_destroy > > # NOTE: the "to_s" is necessary here so that Ferret > > #knows to use the id field > > FerretConfig::INDEX.delete(self.id.to_s) > > end > > > > If you find a better way to do this, please contribute to Jan Prills > > rails wiki entry. I'll work on better integration between Ferret and > > Rails when I finish working on the performance. 
I'm currently very > > busy integrating my C indexer which has been a little harder than I > > thought. Currently 2000 lines of C this week and still growing. > > > > Please let me know if you run into any more problems. > > > > Cheers, > > Dave > > > > > > -- > anatol (http://pomozov.info) > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > > > From dbalmain.ml at gmail.com Sat Nov 26 12:24:46 2005 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 27 Nov 2005 02:24:46 +0900 Subject: [Ferret-talk] Get number of found documents In-Reply-To: References: <3665a1a00511260500l41a69938g15161938bac5a1c2@mail.gmail.com> Message-ID: On 11/26/05, Anatol Pomozov wrote: > Hi David again. > > I would say that Ferret works great with Rails. > > And now I am trying to create pagination. Because site could have millions > of documents I need to create on page link something like > "Page #100". Rather usual situation. > > But to create this links I need to know how many documents Ferret found in > index. > For now I am doing it with following code > > index = FerretConfig::INDEX > index_size = index.search(query).size > @document_pages = Paginator.new self, index_size, PAGE_SIZE, page_num Actually, > index_size = index.search(query).size will return the number of hits in the result set which will be at most :num_docs which defaults to 10. You would need to set set num_docs to 10000 if you wanted to get 10000 results back. What you want is something like this; result_set = index.search(query, :num_docs => 20) puts result_set.size # => 20 puts result_set.total_hits # => 293487 So total hits could be 10000000 but the number of results will still only be the top 10 or whatever you set :num_docs to. If you really want, you could set :num_docs to 1. 
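A minimal sketch of the paging arithmetic this implies, in plain Ruby with no
Ferret calls. It assumes the total comes from result_set.total_hits as
described above, and that :first_doc is a zero-based offset, as in Anatol's
PAGE_SIZE*(page_num-1) expression. The helper names page_count and
first_doc_for are hypothetical and not part of Ferret.

```ruby
# Hypothetical helpers (not part of Ferret): plain integer arithmetic only.

# Number of pages needed to show total_hits results, page_size per page.
def page_count(total_hits, page_size)
  raise ArgumentError, "page_size must be positive" unless page_size > 0
  (total_hits + page_size - 1) / page_size # integer ceiling division
end

# Zero-based offset of the first result on a 1-based page number,
# matching the PAGE_SIZE * (page_num - 1) expression used in the thread.
def first_doc_for(page_num, page_size)
  raise ArgumentError, "page_num starts at 1" unless page_num >= 1
  page_size * (page_num - 1)
end

total_hits = 293_487 # the example figure quoted in the thread
page_size  = 20

puts page_count(total_hits, page_size) # prints 14675
puts first_doc_for(100, page_size)     # prints 1980
```

With these two numbers, one search with :num_docs => page_size and
:first_doc => first_doc_for(page_num, page_size) should be enough to render
both the current page and the "Page #100" style links.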
You could also use search_each like this; total_hits = index.search_each(query) do |doc_num, score| @documents << index[doc_num] end Hope that helps, Dave > But I am not sure that statement index.search(query).size will be effective > here. Because it returns an array of found documents. What if such documents > will be 100000?? I don't want to select all this documents, just to know how > many found, and then select for example 10 of them > > index.search_each(query, :num_docs=>PAGE_SIZE, > :first_doc=>PAGE_SIZE*(page_num-1)) do |doc_num, score| > @documents << index[doc_num] > end > > So quesition is: what is the most effective way to calculate (just > calculate) number of found documents? > > -- > anatol (http://pomozov.info) > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > > > From anatol.pomozov at gmail.com Sat Nov 26 12:47:31 2005 From: anatol.pomozov at gmail.com (Anatol Pomozov) Date: Sat, 26 Nov 2005 18:47:31 +0100 Subject: [Ferret-talk] Get number of found documents In-Reply-To: References: <3665a1a00511260500l41a69938g15161938bac5a1c2@mail.gmail.com> Message-ID: <3665a1a00511260947l4ae183ean1d5d14459187795c@mail.gmail.com> On 11/26/05, David Balmain wrote: > > On 11/26/05, Anatol Pomozov wrote: > > Hi David again. > > > > I would say that Ferret works great with Rails. > > > > And now I am trying to create pagination. Because site could have > millions > > of documents I need to create on page link something like > > "Page #100". Rather usual situation. > > > > But to create this links I need to know how many documents Ferret found > in > > index. 
> > For now I am doing it with following code > > > > index = FerretConfig::INDEX > > index_size = index.search(query).size > > @document_pages = Paginator.new self, index_size, PAGE_SIZE, > page_num > > Actually, > > > index_size = index.search(query).size > > will return the number of hits in the result set which will be at most > :num_docs which defaults to 10. You would need to set set num_docs to > 10000 if you wanted to get 10000 results back. > > What you want is something like this; > > result_set = index.search(query, :num_docs => 20) > puts result_set.size # => 20 > puts result_set.total_hits # => 293487 > > So total hits could be 10000000 but the number of results will still > only be the top 10 or whatever you set :num_docs to. If you really > want, you could set :num_docs to 1. You could also use search_each > like this; > > total_hits = index.search_each(query) do |doc_num, score| > @documents << index[doc_num] > end Wow. It is really cool and simple. I aready *LOVE* Ferret. Hope that helps, > Dave > > > > But I am not sure that statement index.search(query).size will be > effective > > here. Because it returns an array of found documents. What if such > documents > > will be 100000?? I don't want to select all this documents, just to know > how > > many found, and then select for example 10 of them > > > > index.search_each(query, :num_docs=>PAGE_SIZE, > > :first_doc=>PAGE_SIZE*(page_num-1)) do |doc_num, score| > > @documents << index[doc_num] > > end > > > > So quesition is: what is the most effective way to calculate (just > > calculate) number of found documents? > > > > -- > > anatol (http://pomozov.info) > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > -- anatol (http://pomozov.info) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20051126/9ab173d7/attachment.htm

From anatol.pomozov at gmail.com Sat Nov 26 12:49:16 2005
From: anatol.pomozov at gmail.com (Anatol Pomozov)
Date: Sat, 26 Nov 2005 18:49:16 +0100
Subject: [Ferret-talk] Get number of found documents
In-Reply-To:
References: <3665a1a00511260500l41a69938g15161938bac5a1c2@mail.gmail.com>
Message-ID: <3665a1a00511260949o22e36ef5t178d2dbe32190d59@mail.gmail.com>

It is just a small contribution to your great work. Looking forward to
helping you with Ferret further (probably testing cferret on Windoze??)

On 11/26/05, David Balmain wrote:
>
> Hi Anatol,
>
> Thanks for adding to the Howtos page. I really appreciate any help I can
> get.
>
> Regards,
> Dave
>

--
anatol (http://pomozov.info)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20051126/1ac75a3d/attachment.htm

From anatol.pomozov at gmail.com Wed Nov 30 14:04:59 2005
From: anatol.pomozov at gmail.com (Anatol Pomozov)
Date: Wed, 30 Nov 2005 20:04:59 +0100
Subject: [Ferret-talk] Compilation of ferret C-extension under Windows.
Message-ID: <3665a1a00511301104o40a16c2ta30a92141a12cbb9@mail.gmail.com>

Hi, David.

I have recently fixed the ferret C sources and successfully compiled the
extension with MSVC.Net. The problem was that the MS compiler is stricter
than GCC and requires that all variables be declared before they are used.
There were ~30 such declarations. I have fixed them all.

But I am not sure that it works, because the tests failed with the following
error on both the clean and the patched versions. So it seems to be a ferret
internal error.

test_persist_index(IndexTest):
RuntimeError: could not obtain lock: C:/work/opensource/1/111/test/temp/fsdir/ferret-e0bcfc4d8e4ef5b2678a85120e4b572ccommit.lock
    C:/work/opensource/1/111/test/../lib/ferret/store/fs_store.rb:234:in `obtain'
    C:/work/opensource/1/111/test/../lib/ferret/store/directory.rb:133:in `while_locked'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:417:in `merge_segments'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:416:in `synchronize'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:416:in `merge_segments'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:183:in `optimize'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:173:in `synchronize'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:173:in `optimize'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:221:in `add_indexes'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:198:in `synchronize'
    C:/work/opensource/1/111/test/../lib/ferret/index/index_writer.rb:198:in `add_indexes'
    C:/work/opensource/1/111/test/../lib/ferret/index/index.rb:545:in `persist'
    C:/work/opensource/1/111/test/../lib/ferret/index/index.rb:535:in `synchronize'
    C:/work/opensource/1/111/test/../lib/ferret/index/index.rb:535:in `persist'
    C:/work/opensource/1/111/test/unit/../unit/analysis/../../unit/document/../../unit/index/tc_index.rb:260:in `test_persist_index'

Anyway I could share or send a patch for C sources if you like.

--
anatol (http://pomozov.info)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20051130/45d158ae/attachment.htm

From dbalmain.ml at gmail.com Wed Nov 30 20:56:55 2005
From: dbalmain.ml at gmail.com (David Balmain)
Date: Thu, 1 Dec 2005 10:56:55 +0900
Subject: [Ferret-talk] Compilation of ferret C-extension under Windows.
In-Reply-To:
References: <3665a1a00511301104o40a16c2ta30a92141a12cbb9@mail.gmail.com>
Message-ID:

Hi Anatol,

On 12/1/05, Anatol Pomozov wrote:
> Hi, David.
>
> I have recently fixed ferret C sources and successfully compile extension
> with MSVC.Net The problem was that MS compiler is more stricter that GCC and
> require that all variables were declared before using. There was ~30 such
> declaration. I have fixed them all.
>
> But I am not sure that it works because tests failed with following error
> both on clean and patched versions. So seems that it is ferret internal
> error.
>
> test_persist_index(IndexTest):
> RuntimeError: could not obtain lock:
> C:/work/opensource/1/111/test/temp/fsdir/ferret-e0bcfc4d8e4ef5b2678a85120e4b572ccommit.lock
>

This isn't a bug; it is caused by a lock that is still open in your index. I
have put finalizers on the lock in the version of Ferret in trunk to stop
this from happening, but it is better if you make sure that you close the
index before you shut down the process. I think a lot of people are getting
this error when they're running ferret in a webapp and they kill the server
process. To get it working again, just delete the lock file.

>
> Anyway I could share or send a patch for C sources if you like.

A patch would be great if it's not too much trouble. Otherwise, I'd love to
see an example of what exactly is causing the error. Do you mean it doesn't
accept;

    int x = 3;

Thanks for your help.
Dave
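
The "just delete the lock file" step can be sketched in plain Ruby. This is a
minimal sketch, not part of Ferret: the helper name remove_stale_locks is
hypothetical, and the "ferret-*.lock" pattern is an assumption based only on
the lock file names that appear in the error messages in this thread
(ferret-<hash>write.lock and ferret-<hash>commit.lock). It should only be run
when no process has the index open.

```ruby
# Hypothetical helper (not part of Ferret): removes leftover lock files
# from an index directory and returns the paths it deleted.
# ASSUMPTION: lock files match "ferret-*.lock", as seen in the error
# messages quoted in this thread.
# Only call this when no other process is reading or writing the index.
def remove_stale_locks(index_dir)
  pattern = File.join(index_dir, "ferret-*.lock")
  Dir.glob(pattern).each { |path| File.delete(path) }
end
```

A possible use is to call remove_stale_locks("path/to/index") once at
startup, before the index is opened, after a server process was killed.
Closing the index on a clean shutdown, as suggested above, avoids the stale
lock in the first place.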