From g2710 at hotmail.com Mon May 1 02:55:22 2006
From: g2710 at hotmail.com (SchmakO)
Date: Mon, 1 May 2006 08:55:22 +0200
Subject: [Ferret-talk] pagination in acts_as_ferret

I'm just wondering where I would put the pagination for search results
when using "acts_as_ferret".

At the moment my search code is..

  def search
    @query = params[:query] || ''
    unless @query.blank?
      @results = Tutorial.find_by_contents @query
    end
  end

Cheers
SchmakO

--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Tue May 2 03:07:49 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 2 May 2006 09:07:49 +0200
Subject: [Ferret-talk] Locale not set error when trying to use C version in a rails app
Message-ID: <20060502070749.GB29289@cordoba.webit.de>

On Sat, Apr 29, 2006 at 07:38:23PM -0600, Carl Youngblood wrote:
> I'm getting a locale not set error. Does anyone know how I should set
> my locale in my rails environment so that ferret knows what to do?

try setting ENV['LANG']='en_US.UTF-8' in your environment.rb. That
worked for me.

> Why isn't this a problem in the ruby version of ferret?

I have no idea...

Jens

--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From weibel at gmail.com Tue May 2 04:33:28 2006
From: weibel at gmail.com (Kasper Weibel)
Date: Tue, 2 May 2006 10:33:28 +0200
Subject: [Ferret-talk] Ferret failing to rebuild_index - occasionally unable to

I have looked at this problem recently but didn't solve it entirely. I
suspect that windows is not releasing the files soon enough for the next
test to work on them.

This post might offer some insight
http://www.talkaboutprogramming.com/group/comp.lang.ruby/messages/78404.html

Kasper

> Loaded suite E:/rails/thatsprogress/test/unit/item_test
> Started
> ........EEEE
> Finished in 8.922 seconds.
>
>   1) Error:
> test_search_for_count(ItemTest):
> Errno::EACCES: Permission denied -
> E:/rails/thatsprogress/config/../index/test/Item/_j.cfs
> ....

From dbalmain.ml at gmail.com Tue May 2 10:46:40 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Tue, 2 May 2006 23:46:40 +0900
Subject: [Ferret-talk] Migrating to 0.9.1
References: <0af228cb1ab449e88a50228d909183eb@ruby-forum.com>

On 4/28/06, Onur Turgay wrote:
> hi david,
> will 0.9.2 contain defining our own analyzers, filters and tokenizers?

Yes. But there are still a few bugs in there. If you run into any
problems I'd recommend sticking with the pure Ruby version.

From steven_shingler at hotmail.com Tue May 2 11:16:55 2006
From: steven_shingler at hotmail.com (steven)
Date: Tue, 2 May 2006 17:16:55 +0200
Subject: [Ferret-talk] Indexing Speed?
Message-ID: <2f78c4522fee00919326d127a7bacc02@ruby-forum.com>

Hi all,

Have been looking at lucene and ferret.

Have noticed that ferret takes ~463 seconds to index 200Mb of docs,
whereas lucene takes ~60 seconds.

I'm using the standard "get you started" sort of code provided by both
libraries.

My ruby code is: (abridged)

  @index = Index::Index.new(:path => inIndexPath)

  def createIndex(inRepositoryPath)
    Find.find(inRepositoryPath) do |path|
      if FileTest.file?(path)
        File.open(path) do |file|
          @index.add_document(:file => path, :content => file.readlines)
        end

My Java code is basically a direct port.

Has anyone else noticed this difference in speed? Am I doing something
wrong? Is this speed normal?

Any advice gratefully received.
Thanks,
Steven
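Steven's loop above passes file.readlines (an Array of lines) as the content and walks every file in the repository, including the PDFs and other binaries he mentions later in the thread. Below is a minimal pure-Ruby sketch of just the file-collection step. It assumes you want each file read into a single String with File.read, and that it is acceptable to skip any file whose first 4 KB contain a NUL byte. That binary test is a common heuristic chosen here for illustration; neither Ferret nor Lucene does it. `binary_file?` and `each_text_document` are hypothetical helper names, and the sketch has not been benchmarked against either library.

```ruby
require 'find'

# Number of leading bytes inspected when guessing whether a file is binary.
# The NUL-byte test is a common heuristic, not Ferret or Lucene behaviour.
BINARY_SNIFF_BYTES = 4096

# Hypothetical helper: true if the first few KB of +path+ contain a NUL byte.
def binary_file?(path)
  # File.binread can return nil for an empty file, so fall back to ''.
  chunk = File.binread(path, BINARY_SNIFF_BYTES) || ''
  chunk.include?("\0")
end

# Hypothetical helper: yields one hash per text file under +root+, using the
# same :file / :content keys as the add_document call above. File.read
# returns the whole file as one String, whereas readlines returns an Array.
def each_text_document(root)
  Find.find(root) do |path|
    next unless File.file?(path)
    next if binary_file?(path)
    yield(:file => path, :content => File.read(path))
  end
end
```

Assuming the same @index object as in Steven's code, each yielded hash could be passed straight to it, e.g. `each_text_document(inRepositoryPath) { |doc| @index.add_document(doc) }`. Yielding one document at a time avoids holding the whole 200Mb in memory.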
From carl at youngbloods.org Tue May 2 15:42:56 2006
From: carl at youngbloods.org (Carl Youngblood)
Date: Tue, 2 May 2006 13:42:56 -0600
Subject: [Ferret-talk] Locale not set error when trying to use C version in a rails app
In-Reply-To: <20060502070749.GB29289@cordoba.webit.de>
References: <20060502070749.GB29289@cordoba.webit.de>

Thanks. I tried that after an online search (before receiving your
response) and it worked.

On 5/2/06, Jens Kraemer wrote:
> On Sat, Apr 29, 2006 at 07:38:23PM -0600, Carl Youngblood wrote:
> > I'm getting a locale not set error. Does anyone know how I should set
> > my locale in my rails environment so that ferret knows what to do?
>
> try setting ENV['LANG']='en_US.UTF-8' in your environment.rb. That
> worked for me.
>
> > Why isn't this a problem in the ruby version of ferret?
>
> I have no idea...
>
> Jens
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

From akb at mutualaid.org Tue May 2 18:18:53 2006
From: akb at mutualaid.org (akb)
Date: Wed, 3 May 2006 00:18:53 +0200
Subject: [Ferret-talk] Ferret failing to rebuild_index - occasionally unable to
Message-ID: <5b52265f4407bceb632ee40d0de52946@ruby-forum.com>

Most of the tests also fail on Windows with a similar error on the demo
provided in svn. I tried adding some sleeps but that doesn't make a
difference.

Kasper Weibel wrote:
> I have looked at this problem recently but didn't solve it entirely. I
> suspect that windows is not releasing the files soon enough for the next
> test to work on them.
>
> This post might offer some insight
> http://www.talkaboutprogramming.com/group/comp.lang.ruby/messages/78404.html
>
> Kasper
>
>> Loaded suite E:/rails/thatsprogress/test/unit/item_test
>> Started
>> ........EEEE
>> Finished in 8.922 seconds.
>>
>>   1) Error:
>> test_search_for_count(ItemTest):
>> Errno::EACCES: Permission denied -
>> E:/rails/thatsprogress/config/../index/test/Item/_j.cfs
>> ....

From carl at youngbloods.org Tue May 2 22:45:32 2006
From: carl at youngbloods.org (Carl Youngblood)
Date: Tue, 2 May 2006 20:45:32 -0600
Subject: [Ferret-talk] Is it safe to delete ferret-write.lck if it is stale?

I'm noticing that ferret-write.lck sometimes stays in my index
directory and throws exceptions whenever someone tries to do a search.
Apparently there are some cases where ferret doesn't realize that the
file is old and can be deleted. I'm wondering what the best way to
recover from this error is. Am I safe just writing a cron job that
deletes this file if it is over 10 minutes old or something? Is there
any additional cleanup that I need to do?

Thanks!

Carl

From dbalmain.ml at gmail.com Tue May 2 23:01:06 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 3 May 2006 12:01:06 +0900
Subject: [Ferret-talk] Is it safe to delete ferret-write.lck if it is stale?

Hi Carl,

I don't think this is a good idea. If the lock file is still open then
there are probably changes that still need to be written to the index.
Instead we should try and work out why this is happening. If you could
somehow write a failing test I'm sure I'll be able to fix this.

Cheers,
Dave

On 5/3/06, Carl Youngblood wrote:
> I'm noticing that ferret-write.lck sometimes stays in my index
> directory and throws exceptions whenever someone tries to do a search.
> Apparently there are some cases where ferret doesn't realize that the
> file is old and can be deleted.
> I'm wondering what the best way to
> recover from this error is. Am I safe just writing a cron job that
> deletes this file if it is over 10 minutes old or something? Is there
> any additional cleanup that I need to do?
>
> Thanks!
>
> Carl

From dbalmain.ml at gmail.com Tue May 2 22:53:15 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Wed, 3 May 2006 11:53:15 +0900
Subject: [Ferret-talk] Indexing Speed?
In-Reply-To: <2f78c4522fee00919326d127a7bacc02@ruby-forum.com>
References: <2f78c4522fee00919326d127a7bacc02@ruby-forum.com>

Hi Steven,

Are the indexes you get the same size? My guess is that the code isn't
really equivalent. Ferret should be faster than Lucene. Try this:

  include Ferret::Document

  @index = Index::Index.new(:path => inIndexPath)

  def createIndex(inRepositoryPath)
    Find.find(inRepositoryPath) do |path|
      if FileTest.file?(path)
        File.open(path) do |file|
          doc = Document.new()
          doc << Field.new(:file, path, Field::Store::YES, Field::Index::UNTOKENIZED)
          doc << Field.new(:content, file.readlines, Field::Store::NO, Field::Index::TOKENIZED)
          @index << doc
        end
      end
    end
  end

Let me know if this helps.

Cheers,
Dave

On 5/3/06, steven wrote:
> Hi all,
>
> Have been looking at lucene and ferret.
>
> Have noticed that ferret takes ~463 seconds to index 200Mb of docs,
> whereas lucene takes ~60 seconds.
>
> I'm using the standard "get you started" sort of code provided by both
> libraries.
>
> My ruby code is: (abridged)
>
>   @index = Index::Index.new(:path => inIndexPath)
>
>   def createIndex(inRepositoryPath)
>     Find.find(inRepositoryPath) do |path|
>       if FileTest.file?(path)
>         File.open(path) do |file|
>           @index.add_document(:file =>path, :content =>
>             file.readlines)
>         end
>
> My Java code is basically a direct port.
>
> Has anyone else noticed this difference in speed?
> Am I doing something
> wrong? Is this speed normal?
>
> Any advice gratefully received.
> Thanks,
> Steven

From carl at youngbloods.org Wed May 3 11:45:26 2006
From: carl at youngbloods.org (Carl Youngblood)
Date: Wed, 3 May 2006 09:45:26 -0600
Subject: [Ferret-talk] Is it safe to delete ferret-write.lck if it is stale?

I'll work on trying to reproduce it. I should also mention that I'm
running the pure ruby version of ferret since the C one doesn't
compile on freebsd yet.

On 5/2/06, David Balmain wrote:
> Hi Carl,
>
> I don't think this is a good idea. If the lock file is still open then
> there are probably changes that still need to be written to the index.
> Instead we should try and work out why this is happening. If you could
> somehow write a failing test I'm sure I'll be able to fix this.
>
> Cheers,
> Dave
>
> On 5/3/06, Carl Youngblood wrote:
> > I'm noticing that ferret-write.lck sometimes stays in my index
> > directory and throws exceptions whenever someone tries to do a search.
> > Apparently there are some cases where ferret doesn't realize that the
> > file is old and can be deleted. I'm wondering what the best way to
> > recover from this error is. Am I safe just writing a cron job that
> > deletes this file if it is over 10 minutes old or something? Is there
> > any additional cleanup that I need to do?
> >
> > Thanks!
> >
> > Carl

From kraemer at webit.de Wed May 3 12:30:44 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 3 May 2006 18:30:44 +0200
Subject: [Ferret-talk] pagination in acts_as_ferret
Message-ID: <20060503163044.GS29289@cordoba.webit.de>

Hi!

On Mon, May 01, 2006 at 08:55:22AM +0200, SchmakO wrote:
> I'm just wondering where I would put the pagination for search results
> when using "acts_as_ferret".
>
> At the moment my search code is..
>
>   def search
>     @query = params[:query] || ''
>     unless @query.blank?
>       @results = Tutorial.find_by_contents @query
>     end
>   end

find_by_contents has two options suitable for paging:
:first_doc (first result to retrieve) and
:num_docs (number of results to retrieve).

so to retrieve results 10 to 20, you would use

  @results = Tutorial.find_by_contents(@query, :first_doc => 10, :num_docs => 10)

hth,
Jens

From kraemer at webit.de Wed May 3 12:54:50 2006
From: kraemer at webit.de (Jens Kraemer)
Date: Wed, 3 May 2006 18:54:50 +0200
Subject: [Ferret-talk] Search multiple models
Message-ID: <20060503165450.GT29289@cordoba.webit.de>

Hi, and sorry for the late reply.

On Wed, Apr 26, 2006 at 12:41:10PM +0200, Frank Rosquin wrote:
> Hello,
>
> Lets say you have a few models like Post, Article, Wiki, Comment, And
> you want to use ferret to search all of them at once. How would I set up
> the latest acts_as_ferret to accomplish this? And what would be fastest
> for searches?
> 1 index for all models, or have an index per model?

Which would be fastest depends on the type of your queries. If most of
your queries search all models at once, a single index should be
faster. If you tend to query mainly a single model and queries across
all models are the exception, the index-per-model approach should be
better suited. However, the difference won't matter until you get to
really big indexes.

If you go the multiple index route (declaring acts_as_ferret in each of
the models you want to search), you can use the

  multi_search(query, additional_models = [], options = {})

method on any of these model classes, giving the list of all other
model classes to search through as the second parameter. The options
hash is the same as for find_by_contents.

You have to add the :store_class_name => true option to your
acts_as_ferret calls. That turns class name storage in the indexes on
and lets multi_search know what class to query for a given hit.

For the single index route, using Rails single table inheritance is the
easiest approach. Just call acts_as_ferret once in your base class, and
use find_by_contents as usual. This is known to work; I use this with
Typo's Content base class.

If this is no option for you, you can configure each model class to use
the same index directory. This approach should work but hasn't had much
(if any) testing so far. One problem here is that we use the id column
as a key in ferret indexes, too. So the id has to be unique across the
models you want to search. In addition, you would be on your own for
querying the index; I don't think any of the existing searching methods
will work out of the box in this scenario. The :store_class_name option
to acts_as_ferret should be useful in this context, too.

Patches regarding these issues would be very welcome - my hacking time
is quite constrained atm...

All in all, I'd suggest either the STI approach or, if that doesn't
fit, the multi-index route.

hope this helps,
Jens

--
webit!
Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

From shanti at braford.org Thu May 4 19:20:33 2006
From: shanti at braford.org (Shanti Braford)
Date: Fri, 5 May 2006 01:20:33 +0200
Subject: [Ferret-talk] How to install Ferret to get the best performance

Hey all,

After dabbling with ActiveSearch, we're coming back around to take
another look at Ferret.

ActiveSearch slowed to a crawl after indexing about 20k documents, each
about 20 lines long.

This time we may attempt to create multiple Ferret indexes (isolating
each organization's data individually), since we eventually could have
upwards of 20k documents for some organizations.

I was wondering - what is the best current way to get optimal
performance out of Ferret? (C bindings, etc)

I'm developing locally on OS X Tiger (latest updates). (PowerPC
architecture, not Intel just yet)

I tried the instructions laid out here for the Archive Install of
ferret-0.9.1-alpha:
http://ferret.davebalmain.com/trac/

After running:

  > sudo ruby setup.rb

I got:

  make: *** [ferret_ext.bundle] Error 1
  The C extensions were not installed. But don't worry. Everything
  should work fine

Then I tried installing the gem and everything seemed to work like a
charm. (no error messages, etc)

Does installing the gem give you the fast C extensions?

Also, we'll eventually be deploying this onto a production Linux box.
Should we use the Archive Install method or gem install method for that
(to get the C extensions), or does it not matter?

Thanks!
- Shanti

http://sproutit.com - group email for support@, sales@ addresses
http://sablog.com - personal blog
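The setup.rb output in Shanti's message ("The C extensions were not installed. But don't worry. Everything should work fine") describes a library that tries a compiled extension first and quietly falls back to pure Ruby. The sketch below shows only that general Ruby pattern; it is not Ferret's actual loader. `hypothetical_fast_ext`, `FastTokenizer`, `PureRubyTokenizer` and `MiniSearch` are made-up names. The made-up extension cannot be found, so `require` raises LoadError and the pure-Ruby path is chosen.

```ruby
# Generic "prefer the compiled extension, fall back to pure Ruby" loader.
# All names here are hypothetical; this is not Ferret's real code.
module MiniSearch
  # Pure-Ruby implementation used when the extension is unavailable.
  module PureRubyTokenizer
    def self.tokenize(text)
      text.downcase.scan(/\w+/)
    end
  end

  begin
    # A compiled extension would be expected to define MiniSearch::FastTokenizer.
    require 'hypothetical_fast_ext'
    TOKENIZER = FastTokenizer
    C_EXTENSIONS = true
  rescue LoadError
    TOKENIZER = PureRubyTokenizer
    C_EXTENSIONS = false
  end

  # Callers use the same API whichever implementation was loaded.
  def self.tokenize(text)
    TOKENIZER.tokenize(text)
  end
end

puts "C extensions loaded: #{MiniSearch::C_EXTENSIONS}"
```

Run as-is, it prints `C extensions loaded: false`. Callers only ever use `MiniSearch.tokenize`, so the fallback shows up mainly as a difference in speed. This matches Dave's confirmation later in the thread that Ferret falls back to the pure Ruby version when the C extensions fail to compile.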
From shanti at braford.org Thu May 4 19:32:34 2006
From: shanti at braford.org (Shanti Braford)
Date: Fri, 5 May 2006 01:32:34 +0200
Subject: [Ferret-talk] How to install Ferret to get the best performance
Message-ID: <807df4c6e9aed7d59a65e2bd0b280acb@ruby-forum.com>

Ok, still reading up a bit here =)

This post sheds some light on the situation:
http://article.gmane.org/gmane.comp.lang.ruby.rails/27008

Eagerly awaiting cFerret,
- Shanti

From dbalmain.ml at gmail.com Thu May 4 21:15:53 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 5 May 2006 10:15:53 +0900
Subject: [Ferret-talk] How to install Ferret to get the best performance
In-Reply-To: <807df4c6e9aed7d59a65e2bd0b280acb@ruby-forum.com>
References: <807df4c6e9aed7d59a65e2bd0b280acb@ruby-forum.com>

Hi Shanti,

Actually, cferret is now used as the back end for ferret. But if you
got an error running the install then it won't have been installed and
you'll just be using the pure C version. There was a known issue with
the compile on OS X which I fixed a while back but I haven't released
it yet as I've been trying to get ferret to compile on Windows.

That has been a major hassle so I might just put a release out today.
As for method of install, it shouldn't make a difference in terms of
performance. I'd recommend the gem install for ease of use.

Cheers,
Dave

On 5/5/06, Shanti Braford wrote:
> Ok, still reading up a bit here =)
>
> This post sheds some light on the situation:
> http://article.gmane.org/gmane.comp.lang.ruby.rails/27008
>
> Eagerly awaiting cFerret,
> - Shanti
From weibel at gmail.com Thu May 4 21:24:24 2006
From: weibel at gmail.com (Kasper Weibel)
Date: Fri, 5 May 2006 03:24:24 +0200
Subject: [Ferret-talk] Ferret failing to rebuild_index - occasionally unable to
In-Reply-To: <5b52265f4407bceb632ee40d0de52946@ruby-forum.com>
References: <5b52265f4407bceb632ee40d0de52946@ruby-forum.com>
Message-ID: <00615ea77943d230bf32ee446f5a4b3d@ruby-forum.com>

I think windows is locking the file for deletion until the test process
ends - or something similar.

akb wrote:
> Most of the tests also fail on Windows with a similar error on the demo
> provided in svn. I tried adding some sleeps but that doesn't make a
> difference.

From Pedro.CorteReal at iantt.pt Fri May 5 07:14:34 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Fri, 05 May 2006 12:14:34 +0100
Subject: [Ferret-talk] Sorting by score
Message-ID: <1146827674.10604.1.camel@localhost.localdomain>

I'm trying to sort by score but it seems like SortField::SortType::SCORE
is 0 instead of a SortType. A test case is attached. Without the C
extensions the test passes, so I guess it's a bug in them. Should I be
using it without the extensions? Because if that's the case I have some
other bugs to report.

Greetings,

Pedro Côrte-Real
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ferret_sort_field.rb
Type: application/x-ruby
Size: 218 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/ferret-talk/attachments/20060505/1a7a08c6/attachment.bin

From steven_shingler at hotmail.com Fri May 5 09:15:20 2006
From: steven_shingler at hotmail.com (steven)
Date: Fri, 5 May 2006 15:15:20 +0200
Subject: [Ferret-talk] Indexing Speed?
References: <2f78c4522fee00919326d127a7bacc02@ruby-forum.com>

Hi Dave,

Thanks very much for getting back to me.

You were right about the indexes being different...

Your snippet has helped - but still nowhere near as fast as the Java
version:

  doc.add(new Field("path", f.getPath(), Field.Store.YES,
      Field.Index.UN_TOKENIZED));
  doc.add(new Field("modified", DateTools.timeToString(f.lastModified(),
      DateTools.Resolution.MINUTE), Field.Store.YES,
      Field.Index.UN_TOKENIZED));
  doc.add(new Field("contents", new FileReader(f)));

Could it be that ruby's file.readlines is slower than Java's FileReader?

Another possible snafu is that the Directory contains loads of pdfs and
other binary files which neither lucene or ferret can index - could it
be that ferret is slower at dealing with things like that? (Just a
thought)

Would love to hear any thoughts.

Many Thanks,
Steven.

From dbalmain.ml at gmail.com Fri May 5 10:41:53 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 5 May 2006 23:41:53 +0900
Subject: [Ferret-talk] Indexing Speed?
References: <2f78c4522fee00919326d127a7bacc02@ruby-forum.com>

Hi Steven,

Once you made those changes were the indexes approximately the same
size? You'll get the most accurate results if the indexes are
identical. Also, which version of Ferret are you using?

I just tried 200Mb here (~600 files). In my case all of it is text and
everything gets indexed. Lucene took ~120 seconds and Ferret took ~55
seconds. Both indexes are identical. I'm using the Sun JVM.

I look forward to your reply.

Cheers,
Dave

On 5/5/06, steven wrote:
> Hi Dave,
>
> Thanks very much for getting back to me.
>
> You were right about the indexes being different...
>
> Your snippet has helped - but still nowhere near as fast as the Java
> version:
>
>   doc.add(new Field("path", f.getPath(), Field.Store.YES,
>       Field.Index.UN_TOKENIZED));
>   doc.add(new Field("modified", DateTools.timeToString(f.lastModified(),
>       DateTools.Resolution.MINUTE), Field.Store.YES,
>       Field.Index.UN_TOKENIZED));
>   doc.add(new Field("contents", new FileReader(f)));
>
> Could it be that ruby's file.readlines is slower than Java's FileReader?
>
> Another possible snafu is that the Directory contains loads of pdfs and
> other binary files which neither lucene or ferret can index - could it
> be that ferret is slower at dealing with things like that? (Just a
> thought)
>
> Would love to hear any thoughts.
>
> Many Thanks,
> Steven.

From dbalmain.ml at gmail.com Fri May 5 10:49:06 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Fri, 5 May 2006 23:49:06 +0900
Subject: [Ferret-talk] Sorting by score
In-Reply-To: <1146827674.10604.1.camel@localhost.localdomain>
References: <1146827674.10604.1.camel@localhost.localdomain>

Hi Pedro,

This isn't a bug in either. Ferret with the extensions differs
slightly from Ferret without the extensions. This is unfortunate but
it's something you'll have to live with for a while yet. I have a lot
of other things to work on before I get back to working on the pure
Ruby version of Ferret. Sorry about this.

Dave

On 5/5/06, Pedro Côrte-Real wrote:
> I'm trying to sort by score but it seems like SortField::SortType::SCORE
> is 0 instead of a SortType. A test case is attached. Without the C
> extensions the test passes, so I guess it's a bug in them. Should I be
> using it without the extensions? Because if that's the case I have some
> other bugs to report.
>
> Greetings,
>
> Pedro Côrte-Real

From Pedro.CorteReal at iantt.pt Fri May 5 11:00:36 2006
From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real)
Date: Fri, 05 May 2006 16:00:36 +0100
Subject: [Ferret-talk] Sorting by score
References: <1146827674.10604.1.camel@localhost.localdomain>
Message-ID: <1146841236.10604.3.camel@localhost.localdomain>

On Fri, 2006-05-05 at 23:49 +0900, David Balmain wrote:
> Hi Pedro,
>
> This isn't a bug in either. Ferret with the extensions differs
> slightly from Ferret without the extensions. This is unfortunate but
> it's something you'll have to live with for a while yet. I have a lot
> of other things to work on before I get back to working on the pure
> Ruby version of Ferret. Sorry about this.

No problem, but then how do I sort by score with the extensions?

Pedro.

From dbalmain.ml at gmail.com Fri May 5 11:26:01 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 6 May 2006 00:26:01 +0900
Subject: [Ferret-talk] Sorting by score
In-Reply-To: <1146841236.10604.3.camel@localhost.localdomain>
References: <1146827674.10604.1.camel@localhost.localdomain> <1146841236.10604.3.camel@localhost.localdomain>

On 5/6/06, Pedro Côrte-Real wrote:
> On Fri, 2006-05-05 at 23:49 +0900, David Balmain wrote:
> > Hi Pedro,
> >
> > This isn't a bug in either. Ferret with the extensions differs
> > slightly from Ferret without the extensions. This is unfortunate but
> > it's something you'll have to live with for a while yet. I have a lot
> > of other things to work on before I get back to working on the pure
> > Ruby version of Ferret. Sorry about this.
>
> No problem, but then how do I sort by score with the extensions?
>
> Pedro.

Oh, now I see what you are asking. Just the same way as you would
without the extensions.
This much is the same in both versions. The fact that the SortTypes are
integers instead of SortType objects is just an implementation issue.
Actually you should use SortField::FIELD_SCORE. So if you wanted to sort
by create_date (stored like "20060505") and then by score you could
create a sort like this:

  date_sf = SortField.new(:create_date,
                          {:sort_type => SortField::SortType::INTEGER})
  sort = Sort.new([date_sf, SortField::FIELD_SCORE])

In fact, it will work out that the field is an integer field itself so
you could just do this:

  sort = Sort.new(["create_date", SortField::FIELD_SCORE])

Hope that helps,
Dave

From atomgiant at gmail.com Fri May 5 12:42:22 2006
From: atomgiant at gmail.com (Tom Davies)
Date: Fri, 5 May 2006 12:42:22 -0400
Subject: [Ferret-talk] pagination in acts_as_ferret
In-Reply-To: <20060503163044.GS29289@cordoba.webit.de>
References: <20060503163044.GS29289@cordoba.webit.de>

To add to what Jens said, you may find this code useful:

In your model:

  def self.search(q, options = {})
    return nil if q.nil?
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options[:page].to_i-1)
    ... snip ...
    num = INDEX.search_each(query, {:num_docs => options[:limit],
                                    :first_doc => options[:offset]}) do |doc, score|
      ... snip ...
    [num, results]
  end

Notice that I return the total number of matches as num, plus the
results. The total is necessary to generate a paginator across all the
items.
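The offset arithmetic in Tom's search method, and Jens's earlier :first_doc / :num_docs example, can be pulled out into two small helpers. This is a minimal sketch that makes two assumptions. First, :first_doc is a zero-based offset, which is what Tom's `limit * (page - 1)` implies. Second, a missing or invalid page parameter should fall back to page 1. `paging_options` and `page_count` are hypothetical names; they are not part of acts_as_ferret or Ferret.

```ruby
# Hypothetical helper: converts a 1-based page number (e.g. params[:page],
# which may be nil or a String) into the :first_doc / :num_docs options
# discussed in this thread. :first_doc is treated as a zero-based offset.
def paging_options(page, per_page = 10)
  page = page.to_i
  page = 1 if page < 1 # nil.to_i and "abc".to_i are both 0
  { :first_doc => (page - 1) * per_page, :num_docs => per_page }
end

# Hypothetical helper: number of pages needed for +total_hits+ results,
# i.e. the ceiling of total_hits / per_page using integer arithmetic.
# Zero hits gives zero pages.
def page_count(total_hits, per_page = 10)
  (total_hits + per_page - 1) / per_page
end
```

For example, `paging_options(2)` returns `{:first_doc => 10, :num_docs => 10}`, the same values Jens uses, and `page_count(95, 25)` is 4. The total that Tom's method returns as num is the value `page_count` would take.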
For the pagination, I created this simple method in my application
controller (note it assumes a params[:page] being passed around):

  def pages_for(size, options = {})
    default_options = {:per_page => 10}
    options = default_options.merge options
    pages = Paginator.new self, size, options[:per_page], (params[:page]||1)
    pages
  end

And lastly, to use it in a controller:

  @total, @results = YourModel.search(@query, :page => (params[:page]||1))
  @result_pages = pages_for(@total)

Tom

On 5/3/06, Jens Kraemer wrote:
> Hi!
>
> On Mon, May 01, 2006 at 08:55:22AM +0200, SchmakO wrote:
> > I'm just wondering where I would put the pagination for search results
> > when using "acts_as_ferret".
> >
> > At the moment my search code is..
> >
> >   def search
> >     @query = params[:query] || ''
> >     unless @query.blank?
> >       @results = Tutorial.find_by_contents @query
> >     end
> >   end
>
> find_by_contents has two options suitable for paging:
> :first_doc (first result to retrieve) and
> :num_docs (number of results to retrieve).
>
> so to retrieve results 10 to 20, you would use
> @results = Tutorial.find_by_contents(@query,:first_doc=>10,:num_docs=>10)
>
> hth,
> Jens

--
Tom Davies

http://blog.atomgiant.com
http://gifthat.com

From joshuabates at gmail.com Fri May 5 16:11:25 2006
From: joshuabates at gmail.com (Joshua Bates)
Date: Fri, 5 May 2006 14:11:25 -0600
Subject: [Ferret-talk] Is there any working way to search multiple indexes?

I'm running from the trunk, and hitting road blocks no matter which way I
attempt to search across multiple indexes.
I tried a MultiSearcher, but I can't pass a string for the search query

  ms.search "iraq"
  TypeError: wrong argument type String (expected Data)

So I tried creating a QueryParser to pass, but I can't get the fields from
the reader

  r.get_field_names
  NoMethodError: undefined method `get_field_names' for #

and even if I could, I cant pass any fields to the QueryParser

  qp.fields = ["title"]
  NoMethodError: undefined method `fields=' for #

So I tried using the MultiReader, like acts_as_ferret does, but it won't
take any arguments

  @reader = Index::MultiReader.new(sub_readers)
  ArgumentError: wrong number of arguments (1 for 0)

Is everything I'm trying to do unimplemented in the c version, or am I just
overlooking something obvious?

thanks,
joshua

From shanti at braford.org Fri May 5 18:42:46 2006
From: shanti at braford.org (Shanti Braford)
Date: Sat, 6 May 2006 00:42:46 +0200
Subject: [Ferret-talk] How to install Ferret to get the best performance
References: <807df4c6e9aed7d59a65e2bd0b280acb@ruby-forum.com>
Message-ID: <1c95b02e0ca15aaf38ab0108a8ad67b2@ruby-forum.com>

Hi David,

Sweet!! So if I installed the gem w/o errors, it's now rocking the
latest cFerret engine that's 100x faster or so? Awesome.

Thank you fine sir,
- Shanti

(I couldn't code my way out of a wet paper bag in C. *ahem*)

David Balmain wrote:
> Hi Shanti,
>
> Actually, cferret is now used as the back end for ferret. But if you
> got an error running the install then it won't have been installed and
> you'll just be using the pure C version. There was a known issue with
> the compile on OS X which I fixed a while back but I haven't released
> it yet as I've been trying to get ferret to compile on Windows.
>
> That has been a major hassle so I might just put a release out today.
> As for method of install, it shouldn't make a difference in terms of
> performance. I'd recommend the gem install for ease of use.
>
> Cheers,
> Dave

From dbalmain.ml at gmail.com Fri May 5 20:46:03 2006
From: dbalmain.ml at gmail.com (David Balmain)
Date: Sat, 6 May 2006 09:46:03 +0900
Subject: [Ferret-talk] Is there any working way to search multiple indexes?

Hi Joshua,

The best place to check how to use the MultiSearcher is the
MultiSearcher unit tests:

  test/unit/search/tc_multi_searcher.rb
  test/unit/search/tc_multi_searcher2.rb

Create a MultiSearcher like this:

  @multi = Ferret::Search::MultiSearcher.new([IndexSearcher.new(dir1),
                                              IndexSearcher.new(dir2)])

You can have as many IndexSearchers as you like.

You actually need to pass a Query object to both IndexSearcher and
MultiSearcher. Strings are not allowed in Ferret 0.9.1. That may change
in future versions. So do something like this:

  def do_search(query)
    @query_parser ||= Ferret::QueryParser.new(['date', 'field', 'cat'],
                                              :analyzer => WhiteSpaceAnalyzer.new())
    query = @query_parser.parse(query) if (query.is_a? String)
    return @multi.search(query)
  end

Hope that helps.

Cheers,
Dave

On 5/6/06, Joshua Bates wrote:
> I'm running from the trunk, and hitting road blocks no matter which way I
> attempt
> to search across multiple indexes.
> > I tried a MultiSearcher, but I can't pass a string for the search query > > ms.search "iraq" > TypeError: wrong argument type String (expected Data) > > So I tried creating a QueryParser to pass, but I can't get the fields from > the reader > r.get_field_names > NoMethodError: undefined method `get_field_names' for > # > > and even if I could, I cant pass any fields to the QueryParser > qp.fields = ["title"] > NoMethodError: undefined method `fields=' for > # > > So I tried using the MultiReader, like acts_as_ferret does, but it won't > take any arguments > @reader = Index::MultiReader.new(sub_readers) > ArgumentError: wrong number of arguments (1 for 0) > > Is everything I'm trying to do unimplemented in the c version, or am I just > overlooking something obvious? > > thanks, > joshua > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > > From atomgiant at gmail.com Mon May 8 06:25:07 2006 From: atomgiant at gmail.com (Tom Davies) Date: Mon, 8 May 2006 06:25:07 -0400 Subject: [Ferret-talk] How to install Ferret to get the best performance In-Reply-To: <1c95b02e0ca15aaf38ab0108a8ad67b2@ruby-forum.com> References: <807df4c6e9aed7d59a65e2bd0b280acb@ruby-forum.com> <1c95b02e0ca15aaf38ab0108a8ad67b2@ruby-forum.com> Message-ID: > But if you got an error running the install then it won't have been installed and > you'll just be using the pure C version Dave, Is that a typo? Did you mean "you'll just be using the pure ruby version"? Tom On 5/5/06, Shanti Braford wrote: > Hi David, > > Sweet!! So if I installed the gem w/o errors, it's now rocking the > latest cFerret engine that's 100x faster or so? Awesome. > > Thank you fine sir, > - Shanti > > (I couldn't code my way out of a wet paper bag in C. *ahem*) > > David Balmain wrote: > > Hi Shanti, > > > > Actually, cferret is now used as the back end for ferret. 
But if you > > got an error running the install then it won't have been installed and > > you'll just be using the pure C version. There was a known issue with > > the compile on OS X which I fixed a while back but I haven't released > > it yet as I've been trying to get ferret to compile on Windows. > > > > That has been a major hassle so I might just put a release out today. > > As for method of install, it shouldn't make a difference in terms of > > performance. I'd recommend the gem install for ease of use. > > > > Cheers, > > Dave > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -- Tom Davies http://blog.atomgiant.com http://gifthat.com From dbalmain.ml at gmail.com Mon May 8 10:46:30 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 8 May 2006 23:46:30 +0900 Subject: [Ferret-talk] How to install Ferret to get the best performance In-Reply-To: References: <807df4c6e9aed7d59a65e2bd0b280acb@ruby-forum.com> <1c95b02e0ca15aaf38ab0108a8ad67b2@ruby-forum.com> Message-ID: Ah yes. If the C extensions fail to compile then Ferret will rescue by using the pure Ruby version instead. On 5/8/06, Tom Davies wrote: > > But if you got an error running the install then it won't have been installed and > > you'll just be using the pure C version > > Dave, > > Is that a typo? Did you mean "you'll just be using the pure ruby version"? > > Tom > > On 5/5/06, Shanti Braford wrote: > > Hi David, > > > > Sweet!! So if I installed the gem w/o errors, it's now rocking the > > latest cFerret engine that's 100x faster or so? Awesome. > > > > Thank you fine sir, > > - Shanti > > > > (I couldn't code my way out of a wet paper bag in C. *ahem*) > > > > David Balmain wrote: > > > Hi Shanti, > > > > > > Actually, cferret is now used as the back end for ferret. 
But if you > > > got an error running the install then it won't have been installed and > > > you'll just be using the pure C version. There was a known issue with > > > the compile on OS X which I fixed a while back but I haven't released > > > it yet as I've been trying to get ferret to compile on Windows. > > > > > > That has been a major hassle so I might just put a release out today. > > > As for method of install, it shouldn't make a difference in terms of > > > performance. I'd recommend the gem install for ease of use. > > > > > > Cheers, > > > Dave > > > > > > -- > > Posted via http://www.ruby-forum.com/. > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > > > > -- > Tom Davies > > http://blog.atomgiant.com > http://gifthat.com > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From shanti at braford.org Mon May 8 18:35:48 2006 From: shanti at braford.org (Shanti Braford) Date: Tue, 9 May 2006 00:35:48 +0200 Subject: [Ferret-talk] Index::Index.new vs. Readers and Writers Message-ID: Hey gang, A post on the Rails forum a while back had it sound like you pretty much had to use the Index Readers & Writers if you were going to be potentially accessing an index from more than one process. (i.e. multiple dispatch.fcgi's, etc) Is this still the case, or does the main Index class do that black magic behind the scenes? =) I was having trouble implementing the Readers & Writers so I thought I'd post an example stub of what I have here. Any feedback would be much appreciated. # Non-Reader/Writer Example - Main Index::Index.new only # works like a charm but haven't tried firing up a bunch to see if we get IO blocks. 
require 'ferret'

class SearchEngine
  include Ferret
  include Ferret::Document

  def self.get_index()
    index_dir = "/var/search/index"

    index = Index::Index.new(:path => index_dir,
                             :create_if_missing => true)
    return index
  end
end

# Reader/Writer Example

require 'ferret'

class SearchEngine
  include Ferret
  include Ferret::Document

  # Creates or returns an existing index for an organization
  def self.get_index(type = 'writer')
    index_dir = "/var/search/index"
    if type == 'writer'
      index = Index::IndexWriter.new(index_dir,
                                     :create_if_missing => true)
    elsif type == 'reader'
      index = Index::IndexReader.open(index_dir, false)
    end
    return index
  end
end

Thanks!! - Shanti -- Posted via http://www.ruby-forum.com/.

From dbalmain.ml at gmail.com Mon May 8 21:20:32 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 9 May 2006 10:20:32 +0900 Subject: [Ferret-talk] Index::Index.new vs. Readers and Writers In-Reply-To: References: Message-ID: Hi Shanti, When you have multiple processes accessing the index, it's not a matter of which class you use but of how many processes you have writing to the index. The recommended way to do things is to have only one process writing to the index. You can have as many index readers open as you like. The trouble is that the IndexWriter opens a commit lock on the index. If another IndexWriter comes along and tries to open the lock at the same time, it will raise an exception. The same thing goes for using the Index class, as it is really just a simple interface to the IndexWriter and IndexReader classes. One possibility is to use the Index class with :autoflush set to true. This should work most of the time as the IndexWriter class will keep trying for 5 seconds (broken in C version of 0.9.0, 0.9.1) to gain the commit lock so if it misses the first time it should eventually get it. This is an easy way to do things but it's still dangerous. I'd recommend using a single IndexWriter as described above.
That doesn't mean you have to use the IndexWriter and IndexReader classes. You can still use the Index class as long as only one Index is doing the writing. I hope that helps. Stay tuned for much better documentation on this. Dave On 5/9/06, Shanti Braford wrote: > Hey gang, > > A post on the Rails forum a while back had it sound like you pretty much > had to use the Index Readers & Writers if you were going to be > potentially accessing an index from more than one process. (i.e. > multiple dispatch.fcgi's, etc) > > Is this still the case, or does the main Index class do that black magic > behind the scenes? =) > > I was having trouble implementing the Readers & Writers so I thought I'd > post an example stub of what I have here. Any feedback would be much > appreciated. > > # Non-Reader/Writer Example - Main Index::Index.new only > # works like a charm but haven't tried firing up a bunch to see if we > get IO blocks. > > require 'ferret' > > class SearchEngine > include Ferret > include Ferret::Document > > def self.get_index() > index_dir = "/var/search/index" > > index = Index::Index.new(:path => index_dir, > :create_if_missing => true) > return index > end > end > > # Reader/Writer Example > > require 'ferret' > > class SearchEngine > include Ferret > include Ferret::Document > > > # Creates or returns an existing index for an organization > def self.get_index(type = 'writer') > index_dir = "/var/search/index" > if type == 'writer' > index = Index::IndexWriter.new(index_dir, > :create_if_missing => true) > elsif type == 'reader' > index = Index::IndexReader.open(index_dir, false) > end > return index > end > end > > > Thanks!! > > - Shanti > > -- > Posted via http://www.ruby-forum.com/. 
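Dave's advice comes down to letting exactly one process write at a time. If writes cannot all be routed through a single process, one application-level option (not something Ferret itself provides) is to serialize writers with an advisory lock file and retry for a few seconds, much as Dave describes IndexWriter retrying for its commit lock. A minimal sketch using Ruby's File#flock; the lock-file path, the 5-second default and the WriterLockTimeout class are assumptions for illustration, and it only helps if each write is committed and its writer closed before the block returns:

```ruby
# Sketch: serialize index writers across processes with an advisory lock file.
# Not part of Ferret; the lock path and timeout defaults are illustrative.
class WriterLockTimeout < StandardError; end

def with_writer_lock(lock_path, timeout: 5.0, interval: 0.1)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    # LOCK_NB makes flock return false instead of blocking, so we can retry.
    until f.flock(File::LOCK_EX | File::LOCK_NB)
      if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
        raise WriterLockTimeout, "no writer lock on #{lock_path} after #{timeout}s"
      end
      sleep interval
    end
    begin
      yield # only one lock holder at a time runs this block
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

# Hypothetical usage, where `index` is an Index opened for writing:
#   with_writer_lock("/var/search/index.write.lock") do
#     index << {:title => "hello"}
#   end
```

flock locks are advisory, so every writer has to go through the same helper, and they are not reliable on network filesystems such as NFS.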
> > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk >

From Pedro.CorteReal at iantt.pt Tue May 9 05:13:56 2006 From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real) Date: Tue, 09 May 2006 10:13:56 +0100 Subject: [Ferret-talk] Reverse sorts by score Message-ID: <1147166036.9220.3.camel@localhost.localdomain> The docs for Sort.new say:

reverse: pass true if you want the sort order to be reversed. Only works if you pass the field names.

Does this mean it's not possible to do a reverse sort by score? If it is possible, it seems to be broken, as I don't seem to be able to reverse the order of the sort. I'll write a test case if this is not a known problem. Greetings, Pedro.

From dbalmain.ml at gmail.com Tue May 9 05:41:12 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 9 May 2006 18:41:12 +0900 Subject: [Ferret-talk] Reverse sorts by score In-Reply-To: <1147166036.9220.3.camel@localhost.localdomain> References: <1147166036.9220.3.camel@localhost.localdomain> Message-ID: Hi Pedro, To sort reverse by score do this;

  s = Sort.new(SortField.new("Reverse Score", {:sort_type => SortField::SortType::SCORE, :reverse => true}))

Hope that helps. Dave On 5/9/06, Pedro Côrte-Real wrote: > The docs for Sort.new say: > > reverse: pass true if you want the sort order to be reversed. Only works > if you pass the field names. > > Does this mean it's not possible to do a reverse sort by score? If it is > it seems to be broken as I don't seem to be able to reverse the order of > the sort. I'll write a test case if this is not a known problem. > > Greetings, > > Pedro.
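To see what Dave's reversed score sort should return, here is a plain-Ruby model of score ordering. It assumes, as in Lucene (of which Ferret is a port), that a score sort normally puts the highest score first, so :reverse => true yields the lowest scores first. Hit and sort_by_score are illustrative names, not Ferret API:

```ruby
# Plain-Ruby model of score sorting; not Ferret code.
Hit = Struct.new(:doc, :score)

# Default: best score first. reverse: worst score first.
# Ties are broken by ascending doc id in both directions.
def sort_by_score(hits, reverse: false)
  hits.sort_by { |h| [reverse ? h.score : -h.score, h.doc] }
end

hits = [Hit.new(0, 0.5), Hit.new(1, 0.9), Hit.new(2, 0.1)]
sort_by_score(hits).map(&:doc)                # => [1, 0, 2]
sort_by_score(hits, reverse: true).map(&:doc) # => [2, 0, 1]
```

Unlike a field sort, where ascending order is the natural default, "reversed" for scores means worst match first.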
> > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk >

From Pedro.CorteReal at iantt.pt Tue May 9 06:05:26 2006 From: Pedro.CorteReal at iantt.pt (Pedro Côrte-Real) Date: Tue, 09 May 2006 11:05:26 +0100 Subject: [Ferret-talk] Sorting Search results In-Reply-To: References: <1145631039.9222.4.camel@localhost.localdomain> Message-ID: <1147169126.9220.6.camel@localhost.localdomain> On Tue, 2006-04-25 at 22:20 +0900, David Balmain wrote: > Thanks Pedro, > > This was a bug. It is now fixed. I think I can still see it in my search results. Are you sure it's fixed? I'm using SVN now. Greetings, Pedro.

From dbalmain.ml at gmail.com Tue May 9 07:31:51 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 9 May 2006 20:31:51 +0900 Subject: [Ferret-talk] Sorting Search results In-Reply-To: <1147169126.9220.6.camel@localhost.localdomain> References: <1145631039.9222.4.camel@localhost.localdomain> <1147169126.9220.6.camel@localhost.localdomain> Message-ID: It's working here;

  require 'ferret'
  include Ferret
  include Ferret::Index

  i = Index.new
  i << {:a => "one", :b => 1}
  i << {:a => "two two", :b => 2}
  i << {:a => "three three three", :b => 3}
  i << {:a => "four", :b => 4}

  i.search_each("one two three four") do |d, s|
    puts "#{s} => #{i[d]}"
  end
  # 0.0612355209887028 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }
  # 0.0612355209887028 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }
  # 0.0541250631213188 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }
  # 0.0530315153300762 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }

  i.search_each("one two three four", :sort => :b) do |d, s|
    puts "#{s} => #{i[d]}"
  end
  # 0.0612355209887028 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }
  # 0.0541250631213188 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }
  # 0.0530315153300762 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }
  # 0.0612355209887028 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }

  i.search_each("one two three four", {:sort => :b, :first_doc => 1, :num_docs => 2}) do |d, s|
    puts "#{s} => #{i[d]}"
  end
  # 0.0541250631213188 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }
  # 0.0530315153300762 => Document {
  #   stored/uncompressed,indexed,tokenized,
  #   stored/uncompressed,indexed,tokenized,
  # }

On 5/9/06, Pedro Côrte-Real wrote: > On Tue, 2006-04-25 at 22:20 +0900, David Balmain wrote: > > Thanks Pedro, > > > > This was a bug. It is now fixed. > > I think I can still see it in my search results. Are you sure it's > fixed? I'm using SVN now. > > Greetings, > > Pedro. > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk >

From shanti at braford.org Tue May 9 22:09:39 2006 From: shanti at braford.org (Shanti Braford) Date: Wed, 10 May 2006 04:09:39 +0200 Subject: [Ferret-talk] Index::Index.new vs. Readers and Writers In-Reply-To: <35c6a83a07d450da4631cc2b2d02458a@ruby-forum.com> References: <35c6a83a07d450da4631cc2b2d02458a@ruby-forum.com> Message-ID: <35c6a83a07d450da4631cc2b2d02458a@ruby-forum.com> Hi David, Thanks for the heads up re: index readers & writers. Just one more question: how do you search an Index in read-only mode? The :autoflush option sounds like a viable backup scenario as well, but I couldn't find anything in the docs about it.
(tried passing it into index via something like: Index::Index.new(:autoflush => true) but it didn't like that either) Cheers, - Shanti David Balmain wrote: > Hi Shanti, > > When you have multi processes accessing the index, it's not a matter > of which class you use but how many processes you have writing to the > index. The recommended way to do things is to have only one process > writing to the index. You can have as many index readers open as you > like. The trouble is that the IndexWriter opens a commit lock on the > index. If another IndexWriter comes along and tries to open the lock > at the same time it will raise an exception. The same thing goes for > using the Index class as it is just really a simple interface to the > IndexWriter and IndexReader classes. > > One possibility is to use the Index class with :autoflush set to true. > This should work most of the time as the IndexWriter class will keep > trying for 5 seconds (broken in C version of 0.9.0, 0.9.1) to gain the > commit lock so if it misses the first time it should eventually get > it. This is an easy way to do things but it's still dangerous. I'd > recommend using a single IndexWriter as described above. That doesn't > mean you have to use the IndexWriter and IndexReader classes. You can > still use the Index class as long as only one Index is doing the > writing. > > I hope that helps. Stay tuned for much better documentation on this. > > Dave -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Tue May 9 22:46:25 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 10 May 2006 11:46:25 +0900 Subject: [Ferret-talk] Index::Index.new vs. Readers and Writers In-Reply-To: <35c6a83a07d450da4631cc2b2d02458a@ruby-forum.com> References: <35c6a83a07d450da4631cc2b2d02458a@ruby-forum.com> Message-ID: Hi Shanti, It's :auto_flush, not :autoflush. Sorry for the confusion. Dave On 5/10/06, Shanti Braford wrote: > Hi David, > > Thanks for the heads up re: index readers & writers.
> > Just one more question: how do you search an Index in read-only mode? > > The :autoflush option sounds like a viable backup scenario as well, but > I couldn't find anything in the docs about it. (tried passing it into > index via something like: Index::Index.new(:autoflush => true) but it > dodn't like that either) > > Cheers, > > - Shanti > > David Balmain wrote: > > Hi Shanti, > > > > When you have multi processes accessing the index, it's not a matter > > of which class you use but how many processes you have writing to the > > index. The recommended way to do things is to have only one process > > writing to the index. You can have as many index readers open as you > > like. The trouble is that the IndexWriter opens a commit lock on the > > index. If another IndexWriter comes along and tries to open the lock > > at the same time it will raise an exception. The same thing goes for > > using the Index class as it is just really a simple interface to the > > IndexWriter and IndexReader classes. > > > > One possibility is to use the Index class with :autoflush set to true. > > This should work most of the time as the IndexWriter class will keep > > trying for 5 seconds (broken in C version of 0.9.0, 0.9.1) to gain the > > commit lock so if it misses the first time it should eventually get > > it. This is an easy way to do things but it's still dangerous. I'd > > recommend using a single IndexWriter as described above. That doesn't > > mean you have to use the IndexWriter and IndexReader classes. You can > > still use the Index class as long as only one Index is doing the > > writing. > > > > I hope that helps. Stay tuned for much better documentation on this. > > > > Dave > > > -- > Posted via http://www.ruby-forum.com/. 
> _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk >

From samuel.kvarnbrink at minervaskolan.se Wed May 10 08:17:21 2006 From: samuel.kvarnbrink at minervaskolan.se (Samuel Kvarnbrink) Date: Wed, 10 May 2006 14:17:21 +0200 Subject: [Ferret-talk] acts_as_ferret choking Message-ID: <7FB806C2-D75B-4543-9AFE-4EEA624C995F@minervaskolan.se> Hi all, I've run into a problem with Ferret on my Rails app, and I don't really have a clue why. When running the unit tests, I get the following error output:

> Exception raised:
> Class:
> Message: <"You have a nil object when you didn't expect it!\nThe
> error occured while evaluating nil.version">
> ---Backtrace---
> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/
> index_reader.rb:365:in `latest?'
> /Users/sk/Documents/svn/cms/branches/nodes/vendor/plugins/
> acts_as_ferret/lib/acts_as_ferret.rb:470:in `latest?'
> /Users/sk/Documents/svn/cms/branches/nodes/vendor/plugins/
> acts_as_ferret/lib/acts_as_ferret.rb:470:in `latest?'
> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/
> index.rb:635:in `ensure_reader_open'
> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/
> index.rb:650:in `ensure_searcher_open'
> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/
> index.rb:390:in `query_delete'
> /opt/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/
> index.rb:389:in `query_delete'
> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/
> index.rb:293:in `<<'
> /opt/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/
> index.rb:258:in `<<'
> /Users/sk/Documents/svn/cms/branches/nodes/vendor/plugins/
> acts_as_ferret/lib/acts_as_ferret.rb:407:in `ferret_update'
> (...)
The problem seems to be caused by the following code in acts_as_ferret:

@sub_readers.each { |r| return false unless r.latest? }

which, in turn, causes a NoMethodError because @segment_infos is nil. The problem (or, at least, the symptom) goes away when I add a rescue clause inside the block that gets passed to @sub_readers.each, but it feels like a bad way of solving it. Does anyone know why this happens in the first place? And what should I do to avoid it? //samuel _______________________ Samuel Kvarnbrink mail: samuel.kvarnbrink at minervaskolan.se blog: http://samuelk.info "I once had a problem. I thought: "Oh, I know: I'll just use XML!" Now I had two problems."

From weibel at gmail.com Wed May 10 09:30:34 2006 From: weibel at gmail.com (Kasper Weibel) Date: Wed, 10 May 2006 15:30:34 +0200 Subject: [Ferret-talk] acts_as_ferret choking In-Reply-To: <7FB806C2-D75B-4543-9AFE-4EEA624C995F@minervaskolan.se> References: <7FB806C2-D75B-4543-9AFE-4EEA624C995F@minervaskolan.se> Message-ID: <3e0e0477313ebe836c4dbd4b28dfbba6@ruby-forum.com> Hi Samuel, This is being discussed under ticket #6 on the aaf trac http://projects.jkraemer.net/acts_as_ferret/ticket/6 A fair number of test problems with aaf have been resolved with the newly released ferret 0.9.2 and the trunk version of aaf, but I haven't had time to check for this problem. Kasper Samuel Kvarnbrink wrote: > Hi all, > > I've ran into a problem with Ferret on my rails app, and I don't > really have a clue about why. When running the unit tests, I get the > following error output: > >> acts_as_ferret/lib/acts_as_ferret.rb:470:in `latest?' >> index.rb:293:in `<<' >> /opt/local/lib/ruby/1.8/monitor.rb:229:in `synchronize' >> /opt/local/lib/ruby/gems/1.8/gems/ferret-0.9.1/lib/ferret/index/ >> index.rb:258:in `<<' >> /Users/sk/Documents/svn/cms/branches/nodes/vendor/plugins/ >> acts_as_ferret/lib/acts_as_ferret.rb:407:in `ferret_update' >> (...)
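Samuel's workaround, a rescue inside the block passed to @sub_readers.each, can be sketched as a stand-alone check. The version below makes one assumption he does not state: a reader that blows up is reported as stale (false) rather than fresh, so the caller reopens it instead of trusting it. all_sub_readers_latest? is an illustrative name, not acts_as_ferret or Ferret API:

```ruby
# Sketch of a defensive freshness check over sub-readers, mirroring the
# rescue-in-the-block workaround; not the fix shipped in Ferret or aaf.
def all_sub_readers_latest?(sub_readers)
  # all? stops at the first false, like `return false unless r.latest?`.
  sub_readers.all? do |reader|
    begin
      reader.latest?
    rescue NoMethodError
      # A reader whose internal state is nil (e.g. nil.version) is treated
      # as stale, so the caller reopens it.
      false
    end
  end
end
```

Rescuing NoMethodError this broadly can also hide genuine bugs, so it is a stopgap rather than a substitute for the upstream fix Kasper mentions.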
> > The problem seems to be caused by the following code in acts_as_ferret: > > @sub_readers.each { |r| return false unless r.latest? } > > which, in turn, causes a NoMethodError because @segment_infos is nil. > The problem (or, at least, the symptom) goes away when I add a rescue > clause inside the block that gets passed to @sub_readers.each, but it > feels like a bad way of solving it. Anyone who knows why this happens > in the first place? And what should I do to avoid it? > > //samuel > > _______________________ > Samuel Kvarnbrink > > mail: samuel.kvarnbrink at minervaskolan.se > blog: http://samuelk.info > > > "I once had a problem. > I thought: "Oh, I know: I'll just use XML!" > > Now I had two problems." -- Posted via http://www.ruby-forum.com/. From weibel at gmail.com Wed May 10 13:19:43 2006 From: weibel at gmail.com (Kasper Weibel) Date: Wed, 10 May 2006 19:19:43 +0200 Subject: [Ferret-talk] Ferret failing to rebuild_index - occasionally unable to In-Reply-To: <00615ea77943d230bf32ee446f5a4b3d@ruby-forum.com> References: <5b52265f4407bceb632ee40d0de52946@ruby-forum.com> <00615ea77943d230bf32ee446f5a4b3d@ruby-forum.com> Message-ID: This problem seems to have been solved with ferret 0.9.2 Kasper Weibel wrote: > I think windows is locking the file for deletion until the test process > ends - or something similar. > > akb wrote: >> Most of the tests also fail on Windows with a similar error on the demo >> provided in svn. I tried adding some sleeps but that doesn't make a >> difference. >> -- Posted via http://www.ruby-forum.com/. From srackham at methods.co.nz Wed May 10 19:50:28 2006 From: srackham at methods.co.nz (Stuart Rackham) Date: Thu, 11 May 2006 01:50:28 +0200 Subject: [Ferret-talk] Gem 0.9.2: undefined symbol: mtde_create Message-ID: <01f802816d0e8bfc4028125b36330d28@ruby-forum.com> Upon opening an existing index after upgrading to the 0.9.2 ferret gem I get the following error: undefined symbol: mtde_create. 
Here's an irb transcript which recreates the error:

irb(main):011:0> index = Index::Index.new({:path => '/home/srackham/bin/ff_index'})
=> #
irb(main):012:0> index.search_each('rails')
irb: relocation error: /usr/local/lib/ruby/gems/1.8/gems/ferret-0.9.2/lib/ferret_ext.so: undefined symbol: mtde_create

The gem install went smoothly:

# gem update
:
:
make install
/usr/bin/install -c -m 0755 ferret_ext.so /usr/local/lib/ruby/gems/1.8/gems/ferret-0.9.2/lib
Successfully installed ferret-0.9.2
Installing RDoc documentation for ferret-0.9.2...
Gems: [ferret] updated

I double-checked ./ext/gem_make.out to verify the ferret_ext.so compile -- there were no errors or warnings. A bit of poking around in the 0.9.1 gem release indicates that the mtde_create() function is in termdocs.c, but there is no termdocs.c (or any other source file defining mtde_create) in the gem ferret-0.9.2/ext directory. Any ideas as to what I'm missing? Cheers, Stuart -- Posted via http://www.ruby-forum.com/.

From pcrosbynet at gmail.com Wed May 10 21:02:59 2006 From: pcrosbynet at gmail.com (patrick) Date: Thu, 11 May 2006 03:02:59 +0200 Subject: [Ferret-talk] Gem 0.9.2: undefined symbol: mtde_create In-Reply-To: <01f802816d0e8bfc4028125b36330d28@ruby-forum.com> References: <01f802816d0e8bfc4028125b36330d28@ruby-forum.com> Message-ID: I'm getting a similar error... 0.9.2 is the first version I've ever tried, though, so it's not just a problem with existing indices. Stuart Rackham wrote: > Upon opening an existing index after upgrading to the 0.9.2 ferret gem > I get the following error: undefined symbol: mtde_create. Here's an > irb transcript which recreates the error: > > irb(main):011:0> index = Index::Index.new({:path => > '/home/srackham/bin/ff_index'}) > => # > irb(main):012:0> index.search_each('rails') > irb: relocation error: > /usr/local/lib/ruby/gems/1.8/gems/ferret-0.9.2/lib/ferret_ext.so: > undefined symbol: mtde_create -- Posted via http://www.ruby-forum.com/.
From dbalmain.ml at gmail.com Wed May 10 21:08:02 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 11 May 2006 10:08:02 +0900 Subject: [Ferret-talk] Gem 0.9.2: undefined symbol: mtde_create In-Reply-To: <01f802816d0e8bfc4028125b36330d28@ruby-forum.com> References: <01f802816d0e8bfc4028125b36330d28@ruby-forum.com> Message-ID: Hi Stuart, Apologies and thanks for pointing this out. I've just 0.9.2. In case you are interested, here is how the file got left out. I have an exclusion rule on my package FileList that looks like this;

  PKG_FILES.exclude('**/*.o')

Unfortunately Rake's FileList translates this to a regular expression by using a Dir[pat] like so;

  Dir[pat].each do |p|
    ignores << p
  end
  re_str = ignores.collect { |p| "(" + p.to_s + ")" }.join("|")

Which turns into a regular expression which includes this;

  / ... |(ext/term.o)| ... /

Which of course unfortunately matches ext/termdocs.c. Cheers, Dave

On 5/11/06, Stuart Rackham wrote: > Upon opening an existing index after upgrading to the 0.9.2 ferret gem > I get the following error: undefined symbol: mtde_create. Here's an > irb transcript which recreates the error: > > irb(main):011:0> index = Index::Index.new({:path => > '/home/srackham/bin/ff_index'}) > => # > irb(main):012:0> index.search_each('rails') > irb: relocation error: > /usr/local/lib/ruby/gems/1.8/gems/ferret-0.9.2/lib/ferret_ext.so: > undefined symbol: mtde_create > > The gem install went smoothly: > > # gem update > : > : > make install > /usr/bin/install -c -m 0755 ferret_ext.so > /usr/local/lib/ruby/gems/1.8/gems/ferret-0.9.2/lib > Successfully installed ferret-0.9.2 > Installing RDoc documentation for ferret-0.9.2... > Gems: [ferret] updated > > I double checked ./ext/gem_make.out to verify the ferret_ext.so > compile -- there were no errors or warnings.
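The FileList pitfall Dave describes is easy to reproduce in plain Ruby. Joining unescaped paths into one alternation turns the "." in ext/term.o into a wildcard, and because the pattern is unanchored it also matches ext/termdocs.c. The sketch mimics that behaviour (it is not Rake's actual code) and contrasts it with escaping each path via Regexp.escape and anchoring the whole pattern; both method names are illustrative:

```ruby
# Mimics the behaviour described above: paths joined into one
# unescaped, unanchored alternation.
def naive_exclude_regexp(paths)
  Regexp.new(paths.map { |p| "(#{p})" }.join("|"))
end

# Safer variant: escape each path and require a whole-string match.
def exact_exclude_regexp(paths)
  Regexp.new("\\A(?:" + paths.map { |p| Regexp.escape(p) }.join("|") + ")\\z")
end

object_files = ["ext/term.o", "ext/index.o"]
naive = naive_exclude_regexp(object_files)
exact = exact_exclude_regexp(object_files)

naive.match?("ext/termdocs.c") # => true: "." matches "d", then "o" matches "o"
exact.match?("ext/termdocs.c") # => false
exact.match?("ext/term.o")     # => true
```

Comparing exact path strings (for example with a Set) would avoid regular expressions altogether; the escaped version is shown because it stays closest to the original design.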
> > A bit of poking around in the 0.9.1 gem realease indicates that the > mtde_create() function is in termdocs.c, but there is no termdocs.c > (or any other source file defining mtde_create) in the gem > ferret-0.9.2/ext directory. > > Any ideas as to what I'm missing? > > > Cheers, Stuart > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From rjm2 at cornell.edu Wed May 10 22:54:41 2006 From: rjm2 at cornell.edu (Richard Marisa) Date: Wed, 10 May 2006 22:54:41 -0400 Subject: [Ferret-talk] problem with solaris install Message-ID: I was trying to install ferret 0.9.2 on solaris (SunOS 5.8) which does not have a sys/dir.h nix_io.c:5:21: sys/dir.h: No such file or directory make: *** [nix_io.o] Error 1 I couldn't find an obvious way around this... any suggestions? Thanks, Rich Marisa Cornell Information Technologies Cornell University From dbalmain.ml at gmail.com Wed May 10 23:44:25 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 11 May 2006 12:44:25 +0900 Subject: [Ferret-talk] problem with solaris install In-Reply-To: References: Message-ID: Hi Richard, Can you try changing line 5 of ext/nix_io.c to #include I think that may be the equivalent on solaris. Please report back if it works. Cheers, Dave On 5/11/06, Richard Marisa wrote: > I was trying to install ferret 0.9.2 on solaris (SunOS 5.8) which > does not have a sys/dir.h > > nix_io.c:5:21: sys/dir.h: No such file or directory > make: *** [nix_io.o] Error 1 > > I couldn't find an obvious way around this... any suggestions? 
> > Thanks, > Rich Marisa > Cornell Information Technologies > Cornell University > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From dbalmain.ml at gmail.com Wed May 10 23:47:44 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 11 May 2006 12:47:44 +0900 Subject: [Ferret-talk] problem with solaris install In-Reply-To: References: Message-ID: On second thoughts, just delete that line. Hopefully that'll fix it. On 5/11/06, David Balmain wrote: > Hi Richard, > > Can you try changing line 5 of ext/nix_io.c to > > #include > > I think that may be the equivalent on solaris. Please report back if it works. > > Cheers, > Dave > > On 5/11/06, Richard Marisa wrote: > > I was trying to install ferret 0.9.2 on solaris (SunOS 5.8) which > > does not have a sys/dir.h > > > > nix_io.c:5:21: sys/dir.h: No such file or directory > > make: *** [nix_io.o] Error 1 > > > > I couldn't find an obvious way around this... any suggestions? > > > > Thanks, > > Rich Marisa > > Cornell Information Technologies > > Cornell University > > > > > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > > From shingler at gmail.com Thu May 11 11:18:13 2006 From: shingler at gmail.com (steven shingler) Date: Thu, 11 May 2006 17:18:13 +0200 Subject: [Ferret-talk] Indexing Speed? In-Reply-To: References: <2f78c4522fee00919326d127a7bacc02@ruby-forum.com> Message-ID: <1583aeddf7ecdf903df211422050096c@ruby-forum.com> Just for completeness' sake... After conversations offline with David, it turns out I have been working with the pure ruby version of ferret, without the C extensions, obviously explaining the slower performance. -- Posted via http://www.ruby-forum.com/. 
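Earlier in the thread Dave explains that Ferret falls back to the pure Ruby code when the C extension fails to compile, and that is easy to miss, as Steven found. Before benchmarking, it can help to check whether the native extension loads at all. A minimal sketch of a require-with-fallback check; the default feature name ferret_ext is taken from the ferret_ext.so file in the 0.9.2 install output above, and whether requiring it directly is safe outside Ferret's own loader is an assumption:

```ruby
# Returns true if the named native extension can be loaded, false otherwise.
# "ferret_ext" is an assumed default based on the ferret_ext.so file name.
def native_extension_loaded?(feature = "ferret_ext")
  require feature
  true
rescue LoadError
  false
end

if native_extension_loaded?
  puts "C extension loaded"
else
  puts "C extension missing: indexing will use the pure Ruby code"
end
```

This only detects a missing extension. A broken build like the 0.9.2 mtde_create one loads without complaint and fails later, as Stuart's irb transcript shows.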
From rjm2 at cornell.edu Thu May 11 12:14:59 2006 From: rjm2 at cornell.edu (Rich Marisa) Date: Thu, 11 May 2006 12:14:59 -0400 Subject: [Ferret-talk] problem with solaris install In-Reply-To: References: Message-ID: <44636303.1060404@cornell.edu> Dave, Deleting the sys/dir.h include allows the code to compile but results in an unhandled exception:

254: sudo ruby setup.rb test
Running tests...
Loaded suite test
Started
........F...../test/unit/../unit/document/../../unit/index/../../unit/store/tc_fs_store.rb:18:in `refresh': : Error occured at :122 (Exception)
Error: exception 2 not handled: No such file or directory
        from ./test/unit/../unit/document/../../unit/index/../../unit/store/tc_fs_store.rb:18:in `teardown'
        from /usr/local/lib/ruby/1.8/test/unit/testcase.rb:77:in `run'
        from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run'
        from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run'
        from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run'
        from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run'
        from /usr/local/lib/ruby/1.8/test/unit/ui/testrunnermediator.rb:44:in `run_suite'
        from /usr/local/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:65:in `start_mediator'
        from /usr/local/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:39:in `start'
        from /usr/local/lib/ruby/1.8/test/unit/ui/testrunnerutilities.rb:27:in `run'
        from /usr/local/lib/ruby/1.8/test/unit/autorunner.rb:200:in `run'
        from setup.rb:1426:in `exec_test'
        from setup.rb:1006:in `exec_test'
        from setup.rb:829:in `invoke'
        from setup.rb:776:in `invoke'
        from setup.rb:1548

There are a bunch of compiler warnings in the Solaris build as well...
here is a sample: In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, from lang.h:4, from global.h:20, from index.h:5, from search.h:8, from src/query_parser/q_parser.y:3: /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: "_FILE_OFFSET_BITS" redefined In file included from /usr/include/iso/string_iso.h:31, from /usr/include/string.h:18, from src/query_parser/q_parser.y:2: /usr/include/sys/feature_tests.h:96:1: warning: this is the location of the previous definition gcc -fPIC -g -O2 -fno-common -I. -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I. -c q_const_score.c In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, from lang.h:4, from global.h:20, from index.h:5, from search.h:8, from q_const_score.c:1: /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: "_FILE_OFFSET_BITS" redefined In file included from /opt/common/lib/gcc-lib/sparc-sun-solaris2.8/3.2.3/include/syslimits.h:27, from /opt/common/lib/gcc-lib/sparc-sun-solaris2.8/3.2.3/include/limits.h:11, from index.h:4, from search.h:8, from q_const_score.c:1: /usr/include/sys/feature_tests.h:96:1: warning: this is the location of the previous definition gcc -fPIC -g -O2 -fno-common -I. -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I. -c q_boolean.c In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, from lang.h:4, from global.h:20, from index.h:5, from search.h:8, from q_boolean.c:2: /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: "_FILE_OFFSET_BITS" redefined I found that there is a "compatibility library" of include files in /usr/ucbinclude, including dir.h, but I haven't been able to find a way to integrate it into the build without causing lots of other incompatibilities to pop up in the compilation... Thanks, -- Rich David Balmain wrote: >On second thoughts, just delete that line.
Hopefully that'll fix it. > > >On 5/11/06, David Balmain wrote: > > >>Hi Richard, >> >>Can you try changing line 5 of ext/nix_io.c to >> >>#include >> >>I think that may be the equivalent on solaris. Please report back if it works. >> >>Cheers, >>Dave >> >>On 5/11/06, Richard Marisa wrote: >> >> >>>I was trying to install ferret 0.9.2 on solaris (SunOS 5.8) which >>>does not have a sys/dir.h >>> >>>nix_io.c:5:21: sys/dir.h: No such file or directory >>>make: *** [nix_io.o] Error 1 >>> >>>I couldn't find an obvious way around this... any suggestions? >>> >>>Thanks, >>> Rich Marisa >>> Cornell Information Technologies >>> Cornell University >>> >>> >>>_______________________________________________ >>>Ferret-talk mailing list >>>Ferret-talk at rubyforge.org >>>http://rubyforge.org/mailman/listinfo/ferret-talk >>> >>> >>> > >_______________________________________________ >Ferret-talk mailing list >Ferret-talk at rubyforge.org >http://rubyforge.org/mailman/listinfo/ferret-talk > > From dbalmain.ml at gmail.com Thu May 11 14:15:49 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 12 May 2006 03:15:49 +0900 Subject: [Ferret-talk] problem with solaris install In-Reply-To: <44636303.1060404@cornell.edu> References: <44636303.1060404@cornell.edu> Message-ID: Hi Rich, Can you try running rake instead of setup.rb to run the tests? The problem is that the wrong tests are being run. Sorry, my fault. I need to fix setup.rb. > sudo rake # assuming you need sudo for ferret to create the directories. Apart from that, you should be able to ignore the warnings. I'll make sure both of these problems are fixed in the next release. Cheers, Dave On 5/12/06, Rich Marisa wrote: > Dave, > > Deleting sys/dir.h allows the code to compile but results in an > unhandled exception: > > 254: sudo ruby setup.rb test > Running tests...
> Loaded suite test > Started > ........F...../test/unit/../unit/document/../../unit/index/../../unit/store/tc_fs_store.rb:18:in > `refresh': : Error occured at :122 (Exception) > Error: exception 2 not handled: No such file or directory > from > ./test/unit/../unit/document/../../unit/index/../../unit/store/tc_fs_store.rb:18:in > `teardown' > from /usr/local/lib/ruby/1.8/test/unit/testcase.rb:77:in `run' > from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run' > from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run' > from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run' > from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run' > from > /usr/local/lib/ruby/1.8/test/unit/ui/testrunnermediator.rb:44:in `run_suite' > from > /usr/local/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:65:in > `start_mediator' > from > /usr/local/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:39:in `start' > from > /usr/local/lib/ruby/1.8/test/unit/ui/testrunnerutilities.rb:27:in `run' > from /usr/local/lib/ruby/1.8/test/unit/autorunner.rb:200:in `run' > from setup.rb:1426:in `exec_test' > from setup.rb:1006:in `exec_test' > from setup.rb:829:in `invoke' > from setup.rb:776:in `invoke' > from setup.rb:1548 > > There are a bunch of compiler warnings in the Solaris build as well... > here is a sample: > > In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, > from lang.h:4, > from global.h:20, > from index.h:5, > from search.h:8, > from src/query_parser/q_parser.y:3: > /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: > "_FILE_OFFSET_BITS" redefined > In file included from /usr/include/iso/string_iso.h:31, > from /usr/include/string.h:18, > from src/query_parser/q_parser.y:2: > /usr/include/sys/feature_tests.h:96:1: warning: this is the location of > the previous definition > gcc -fPIC -g -O2 -fno-common -I. > -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 > -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I. 
-c q_const_score.c > In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, > from lang.h:4, > from global.h:20, > from index.h:5, > from search.h:8, > from q_const_score.c:1: > /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: > "_FILE_OFFSET_BITS" redefined > In file included from > /opt/common/lib/gcc-lib/sparc-sun-solaris2.8/3.2.3/include/syslimits.h:27, > from > /opt/common/lib/gcc-lib/sparc-sun-solaris2.8/3.2.3/include/limits.h:11, > from index.h:4, > from search.h:8, > from q_const_score.c:1: > /usr/include/sys/feature_tests.h:96:1: warning: this is the location of > the previous definition > gcc -fPIC -g -O2 -fno-common -I. > -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 > -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I. -c q_boolean.c > In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, > from lang.h:4, > from global.h:20, > from index.h:5, > from search.h:8, > from q_boolean.c:2: > /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: > "_FILE_OFFSET_BITS" redefined > > > I found there is a "compatability library" of include files in > /usr/ucbinclude which has include files, including dir.h, but I haven't > been able to find a way to integrate it into the build without causing > lots of other incompatabilities to pop up in the compilation... > > Thanks, > -- Rich > > David Balmain wrote: > > >On second thoughts, just delete that line. Hopefully that'll fix it. > > > > > >On 5/11/06, David Balmain wrote: > > > > > >>Hi Richard, > >> > >>Can you try changing line 5 of ext/nix_io.c to > >> > >>#include > >> > >>I think that may be the equivalent on solaris. Please report back if it works. 
> >> > >>Cheers, > >>Dave > >> > >>On 5/11/06, Richard Marisa wrote: > >> > >> > >>>I was trying to install ferret 0.9.2 on solaris (SunOS 5.8) which > >>>does not have a sys/dir.h > >>> > >>>nix_io.c:5:21: sys/dir.h: No such file or directory > >>>make: *** [nix_io.o] Error 1 > >>> > >>>I couldn't find an obvious way around this... any suggestions? > >>> > >>>Thanks, > >>> Rich Marisa > >>> Cornell Information Technologies > >>> Cornell University > >>> > >>> > >>>_______________________________________________ > >>>Ferret-talk mailing list > >>>Ferret-talk at rubyforge.org > >>>http://rubyforge.org/mailman/listinfo/ferret-talk > >>> > >>> > >>> > > > >_______________________________________________ > >Ferret-talk mailing list > >Ferret-talk at rubyforge.org > >http://rubyforge.org/mailman/listinfo/ferret-talk > > > > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From dbalmain.ml at gmail.com Thu May 11 14:24:46 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Fri, 12 May 2006 03:24:46 +0900 Subject: [Ferret-talk] problem with solaris install In-Reply-To: <44636303.1060404@cornell.edu> References: <44636303.1060404@cornell.edu> Message-ID: On 5/12/06, Rich Marisa wrote: > There are a bunch of compiler warnings in the Solaris build as well... > here is a sample: > > In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, > from lang.h:4, > from global.h:20, > from index.h:5, > from search.h:8, > from src/query_parser/q_parser.y:3: > /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: > "_FILE_OFFSET_BITS" redefined This warning is due to the fact that _FILE_OFFSET_BITS is defined both in ruby's config.h (/usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h) and in Solaris's /usr/include/sys/feature_tests.h. Since it has nothing to do with Ferret, I'm afraid I can't do anything about it. I don't think it'll cause any problems, though.
If you got any other warnings that weren't to do with _FILE_OFFSET_BITS, please do send them. Cheers, Dave From rjm2 at cornell.edu Thu May 11 14:44:42 2006 From: rjm2 at cornell.edu (Rich Marisa) Date: Thu, 11 May 2006 14:44:42 -0400 Subject: [Ferret-talk] problem with solaris install In-Reply-To: References: <44636303.1060404@cornell.edu> Message-ID: <4463861A.7060806@cornell.edu> Hi Dave, I'm still seeing Error occured at :123 (Exception) Error: exception 2 not handled: No such file or directory when running rake. Here is the complete listing. -- Rich Marisa hawk{rjm2}298: sudo rake (in /users/rjm2/ferretstuff/ferret-0.9.2) setup.rb Rakefile TODO README MIT-LICENSE TUTORIAL CHANGELOG ext/ferret.c ext/lang.c ext/r_analysis.c ext/r_doc.c ext/r_index_io.c ext/r_qparser.c ext/r_search.c ext/r_store.c ext/r_term.c ext/extconf.rb ext/inc ext/analysis.c ext/stopwords.c ext/document.c ext/compound_io.c ext/index_rw.c ext/vector.c ext/field.c ext/term.c ext/q_parser.c ext/q_const_score.c ext/q_boolean.c ext/q_match_all.c ext/q_phrase.c ext/q_filtered_query.c ext/search.c ext/dummy.exe ext/q_fuzzy.c ext/q_wildcard.c ext/ind.c ext/q_range.c ext/q_multi_phrase.c ext/q_prefix.c ext/q_span.c ext/filter.c ext/similarity.c ext/q_term.c ext/sort.c ext/index_io.c ext/fs_store.c ext/ram_store.c ext/store.c ext/hashset.c ext/array.c ext/bitvector.c ext/helper.c ext/global.c ext/hash.c ext/except.c ext/priorityqueue.c ext/libstemmer.h ext/libstemmer.c ext/modules.h ext/utilities.c ext/api.h ext/api.c ext/header.h ext/stem_ISO_8859_1_italian.c ext/stem_UTF_8_portuguese.c ext/stem_UTF_8_portuguese.h ext/ferret.h ext/stem_UTF_8_french.c ext/stem_UTF_8_spanish.c ext/stem_UTF_8_dutch.c ext/stem_ISO_8859_1_italian.h ext/stem_UTF_8_german.c ext/stem_UTF_8_french.h ext/stem_UTF_8_spanish.h ext/tags ext/stem_ISO_8859_1_english.c ext/stem_ISO_8859_1_norwegian.c ext/stem_UTF_8_porter.c ext/stem_UTF_8_dutch.h ext/stem_UTF_8_german.h ext/stem_ISO_8859_1_english.h ext/stem_ISO_8859_1_norwegian.h 
ext/stem_UTF_8_porter.h ext/stem_ISO_8859_1_portuguese.c ext/stem_UTF_8_russian.c ext/stem_ISO_8859_1_spanish.c ext/stem_ISO_8859_1_french.c ext/stem_ISO_8859_1_portuguese.h ext/stem_ISO_8859_1_dutch.c ext/stem_UTF_8_russian.h ext/stem_KOI8_R_russian.c ext/stem_ISO_8859_1_german.c ext/stem_ISO_8859_1_spanish.h ext/stem_ISO_8859_1_french.h ext/stem_ISO_8859_1_porter.c ext/stem_ISO_8859_1_dutch.h ext/stem_UTF_8_finnish.c ext/stem_KOI8_R_russian.h ext/stem_ISO_8859_1_german.h ext/stem_ISO_8859_1_porter.h ext/stem_UTF_8_finnish.h ext/stem_UTF_8_danish.c ext/stem_UTF_8_swedish.c ext/stem_UTF_8_danish.h ext/stem_UTF_8_swedish.h ext/store.h ext/stem_ISO_8859_1_finnish.c ext/stem_UTF_8_italian.c ext/stem_ISO_8859_1_finnish.h ext/stem_UTF_8_italian.h ext/stem_ISO_8859_1_swedish.c ext/stem_ISO_8859_1_danish.c ext/stem_UTF_8_english.c ext/stem_UTF_8_norwegian.c ext/stem_ISO_8859_1_swedish.h ext/stem_ISO_8859_1_danish.h ext/stem_UTF_8_english.h ext/stem_UTF_8_norwegian.h ext/document.h ext/array.h ext/priorityqueue.h ext/hashset.h ext/helper.h ext/global.h ext/bitvector.h ext/analysis.h ext/hash.h ext/search.h ext/similarity.h ext/index.h ext/except.h ext/lang.h ext/frtio.h ext/w32_io.c ext/nix_io.c ext/inc/lang.h ext/inc/except.h lib/ferret.rb lib/rferret.rb lib/ferret/analysis.rb lib/ferret/document.rb lib/ferret/index.rb lib/ferret/search.rb lib/ferret/query_parser.rb lib/ferret/stemmers.rb lib/ferret/store.rb lib/ferret/utils.rb lib/ferret/analysis/standard_tokenizer.rb lib/ferret/analysis/word_list_loader.rb lib/ferret/analysis/tokenizers.rb lib/ferret/analysis/token.rb lib/ferret/analysis/analyzers.rb lib/ferret/analysis/token_filters.rb lib/ferret/analysis/token_stream.rb lib/ferret/document/document.rb lib/ferret/document/field.rb lib/ferret/index/fields_io.rb lib/ferret/index/index.rb lib/ferret/index/segment_merge_queue.rb lib/ferret/index/segment_term_enum.rb lib/ferret/index/term_info.rb lib/ferret/index/field_infos.rb lib/ferret/index/segment_reader.rb 
lib/ferret/index/index_file_names.rb lib/ferret/index/index_reader.rb lib/ferret/index/term_infos_io.rb lib/ferret/index/term_enum.rb lib/ferret/index/term.rb lib/ferret/index/segment_merge_info.rb lib/ferret/index/segment_merger.rb lib/ferret/index/segment_infos.rb lib/ferret/index/term_doc_enum.rb lib/ferret/index/term_buffer.rb lib/ferret/index/multi_reader.rb lib/ferret/index/term_vectors_io.rb lib/ferret/index/index_writer.rb lib/ferret/index/term_vector_offset_info.rb lib/ferret/index/segment_term_vector.rb lib/ferret/index/document_writer.rb lib/ferret/index/compound_file_io.rb lib/ferret/index/multiple_term_doc_pos_enum.rb lib/ferret/search/query.rb lib/ferret/search/non_matching_scorer.rb lib/ferret/search/top_field_docs.rb lib/ferret/search/weight.rb lib/ferret/search/field_doc.rb lib/ferret/search/hit_queue.rb lib/ferret/search/sort.rb lib/ferret/search/disjunction_sum_scorer.rb lib/ferret/search/boolean_scorer.rb lib/ferret/search/boolean_query.rb lib/ferret/search/field_sorted_hit_queue.rb lib/ferret/search/conjunction_scorer.rb lib/ferret/search/top_docs.rb lib/ferret/search/req_opt_sum_scorer.rb lib/ferret/search/boolean_clause.rb lib/ferret/search/score_doc_comparator.rb lib/ferret/search/req_excl_scorer.rb lib/ferret/search/explanation.rb lib/ferret/search/index_searcher.rb lib/ferret/search/term_scorer.rb lib/ferret/search/filter.rb lib/ferret/search/similarity.rb lib/ferret/search/fuzzy_term_enum.rb lib/ferret/search/sort_field.rb lib/ferret/search/hit_collector.rb lib/ferret/search/term_query.rb lib/ferret/search/scorer.rb lib/ferret/search/score_doc.rb lib/ferret/search/prefix_query.rb lib/ferret/search/fuzzy_query.rb lib/ferret/search/spans.rb lib/ferret/search/caching_wrapper_filter.rb lib/ferret/search/filtered_term_enum.rb lib/ferret/search/multi_term_query.rb lib/ferret/search/query_filter.rb lib/ferret/search/wildcard_query.rb lib/ferret/search/wildcard_term_enum.rb lib/ferret/search/phrase_query.rb 
lib/ferret/search/exact_phrase_scorer.rb lib/ferret/search/sloppy_phrase_scorer.rb lib/ferret/search/phrase_positions.rb lib/ferret/search/phrase_scorer.rb lib/ferret/search/range_query.rb lib/ferret/search/multi_phrase_query.rb lib/ferret/search/filtered_query.rb lib/ferret/search/sort_comparator.rb lib/ferret/search/range_filter.rb lib/ferret/search/field_cache.rb lib/ferret/search/match_all_query.rb lib/ferret/search/multi_searcher.rb lib/ferret/search/spans/near_spans_enum.rb lib/ferret/search/spans/span_first_query.rb lib/ferret/search/spans/spans_enum.rb lib/ferret/search/spans/span_not_query.rb lib/ferret/search/spans/span_near_query.rb lib/ferret/search/spans/span_or_query.rb lib/ferret/search/spans/span_term_query.rb lib/ferret/search/spans/span_query.rb lib/ferret/search/spans/span_scorer.rb lib/ferret/search/spans/span_weight.rb lib/ferret/query_parser/query_parser.tab.rb lib/ferret/stemmers/porter_stemmer.rb lib/ferret/store/index_io.rb lib/ferret/store/directory.rb lib/ferret/store/buffered_index_io.rb lib/ferret/store/fs_store.rb lib/ferret/store/ram_store.rb lib/ferret/utils/priority_queue.rb lib/ferret/utils/bit_vector.rb lib/ferret/utils/parameter.rb lib/ferret/utils/weak_key_hash.rb lib/ferret/utils/string_helper.rb lib/ferret/utils/number_tools.rb lib/ferret/utils/date_tools.rb lib/ferret/utils/thread_local.rb test/test_helper.rb test/test_all.rb test/unit/ts_document.rb test/unit/ts_index.rb test/unit/ts_store.rb test/unit/ts_analysis.rb test/unit/ts_utils.rb test/unit/ts_search.rb test/unit/ts_query_parser.rb test/unit/utils/rtc_thread.rb test/unit/utils/rtc_priority_queue.rb test/unit/utils/rtc_bit_vector.rb test/unit/utils/rtc_parameter.rb test/unit/utils/rtc_weak_key_hash.rb test/unit/utils/rtc_string_helper.rb test/unit/utils/rtc_date_tools.rb test/unit/utils/rtc_number_tools.rb test/unit/analysis/rtc_standard_tokenizer.rb test/unit/analysis/rtc_letter_tokenizer.rb test/unit/analysis/rtc_stop_filter.rb 
test/unit/analysis/rtc_stop_analyzer.rb test/unit/analysis/ctc_analyzer.rb test/unit/analysis/tc_analyzer.rb test/unit/analysis/rtc_standard_analyzer.rb test/unit/analysis/rtc_lower_case_filter.rb test/unit/analysis/rtc_porter_stem_filter.rb test/unit/analysis/rtc_lower_case_tokenizer.rb test/unit/analysis/rtc_white_space_tokenizer.rb test/unit/analysis/tc_token.rb test/unit/analysis/ctc_tokenstream.rb test/unit/analysis/rtc_white_space_analyzer.rb test/unit/analysis/rtc_word_list_loader.rb test/unit/analysis/rtc_per_field_analyzer_wrapper.rb test/unit/index/rtc_term_buffer.rb test/unit/index/rtc_fields_io.rb test/unit/index/rtc_term_vectors_io.rb test/unit/index/rtc_segment_infos.rb test/unit/index/tc_index.rb test/unit/index/rtc_compound_file_io.rb test/unit/index/rtc_field_infos.rb test/unit/index/tc_index_writer.rb test/unit/index/th_doc.rb test/unit/index/rtc_segment_term_enum.rb test/unit/index/rtc_term_info.rb test/unit/index/tc_term.rb test/unit/index/tc_term_voi.rb test/unit/index/tc_index_reader.rb test/unit/index/rtc_term_infos_io.rb test/unit/index/rtc_segment_term_docs.rb test/unit/index/rtc_segment_term_vector.rb test/unit/index/rtc_multiple_term_doc_pos_enum.rb test/unit/store/rtm_store_lock.rb test/unit/store/tm_store_lock.rb test/unit/store/tc_fs_store.rb test/unit/store/tc_ram_store.rb test/unit/store/rtm_store.rb test/unit/store/tm_store.rb test/unit/store/rtc_ram_store.rb test/unit/store/rtc_fs_store.rb test/unit/document/tc_field.rb test/unit/document/rtc_field.rb test/unit/document/tc_document.rb test/unit/query_parser/tc_query_parser.rb test/unit/search/tc_fuzzy_query.rb test/unit/search/tc_multi_searcher2.rb test/unit/search/tc_index_searcher.rb test/unit/search/tc_spans.rb test/unit/search/tc_filter.rb test/unit/search/tc_sort.rb test/unit/search/tc_sort_field.rb test/unit/search/tc_multi_searcher.rb test/unit/search/rtc_sort_field.rb test/unit/search/rtc_similarity.rb test/unit/search/tc_search_and_sort.rb 
test/longrunning/tc_numbertools.rb test/longrunning/tm_store.rb test/benchmark/tb_rw_vint.rb test/benchmark/tb_ram_store.rb test/functional/thread_safety_index_test.rb test/functional/thread_safety_test.rb test/utils/number_to_spoken.rb test/unit/analysis/data/wordfile rake_utils/code_statistics.rb Rakefile /usr/local/bin/ruby -Ilib:test/unit -r 'lib/rferret' "/usr/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader.rb" "test/unit/ts_document.rb" "test/unit/ts_index.rb" "test/unit/ts_store.rb" "test/unit/ts_analysis.rb" "test/unit/ts_utils.rb" "test/unit/ts_search.rb" "test/unit/ts_query_parser.rb" ./test/unit/../unit/index/../../../lib/rferret.rb:26: warning: already initialized constant VERSION Loaded suite /usr/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader Started ....................................................................................................................................................................................................................... Finished in 274.465615 seconds. 215 tests, 5517 assertions, 0 failures, 0 errors cd ext cd .. cp ext/inc/lang.h ext/lang.h cp ext/inc/except.h ext/except.h cd ext make gcc -fPIC -g -O2 -fno-common -I. -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I/usr/local/lib/ruby/1.8/sparc-solaris2.8 -I. 
-c nix_io.c In file included from /usr/local/lib/ruby/1.8/sparc-solaris2.8/ruby.h:24, from lang.h:4, from global.h:20, from nix_io.c:3: /usr/local/lib/ruby/1.8/sparc-solaris2.8/config.h:17:1: warning: "_FILE_OFFSET_BITS" redefined In file included from /opt/common/lib/gcc-lib/sparc-sun-solaris2.8/3.2.3/include/iso/stdlib_iso.h:39, from /usr/include/stdlib.h:18, from global.h:4, from nix_io.c:3: /usr/include/sys/feature_tests.h:96:1: warning: this is the location of the previous definition gcc -Wl,-G -L'/usr/local/lib' -Wl,-R'/usr/local/lib' -o ferret_ext.so ferret.o lang.o r_analysis.o r_doc.o r_index_io.o r_qparser.o r_search.o r_store.o r_term.o analysis.o stopwords.o document.o compound_io.o index_rw.o vector.o field.o term.o q_parser.o q_const_score.o q_boolean.o q_match_all.o q_phrase.o q_filtered_query.o search.o q_fuzzy.o q_wildcard.o ind.o q_range.o q_multi_phrase.o q_prefix.o q_span.o filter.o similarity.o q_term.o sort.o index_io.o fs_store.o ram_store.o store.o hashset.o array.o bitvector.o helper.o global.o hash.o except.o priorityqueue.o libstemmer.o utilities.o api.o stem_ISO_8859_1_italian.o stem_UTF_8_portuguese.o stem_UTF_8_french.o stem_UTF_8_spanish.o stem_UTF_8_dutch.o stem_UTF_8_german.o stem_ISO_8859_1_english.o stem_ISO_8859_1_norwegian.o stem_UTF_8_porter.o stem_ISO_8859_1_portuguese.o stem_UTF_8_russian.o stem_ISO_8859_1_spanish.o stem_ISO_8859_1_french.o stem_ISO_8859_1_dutch.o stem_KOI8_R_russian.o stem_ISO_8859_1_german.o stem_ISO_8859_1_porter.o stem_UTF_8_finnish.o stem_UTF_8_danish.o stem_UTF_8_swedish.o stem_ISO_8859_1_finnish.o stem_UTF_8_italian.o stem_ISO_8859_1_swedish.o stem_ISO_8859_1_danish.o stem_UTF_8_english.o stem_UTF_8_norwegian.o w32_io.o nix_io.o -ldl -lcrypt -lm -lc cd .. 
/usr/local/bin/ruby -Ilib:test/unit "/usr/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader.rb" "test/unit/ts_document.rb" "test/unit/ts_index.rb" "test/unit/ts_store.rb" "test/unit/ts_analysis.rb" "test/unit/ts_utils.rb" "test/unit/ts_search.rb" "test/unit/ts_query_parser.rb" Loaded suite /usr/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader Started ........F...../test/unit/../unit/document/../../unit/index/../../unit/store/tc_fs_store.rb:18:in `refresh': : Error occured at :123 (Exception) Error: exception 2 not handled: No such file or directory from ./test/unit/../unit/document/../../unit/index/../../unit/store/tc_fs_store.rb:18:in `teardown' from /usr/local/lib/ruby/1.8/test/unit/testcase.rb:77:in `run' from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run' from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run' from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:32:in `run' from /usr/local/lib/ruby/1.8/test/unit/testsuite.rb:31:in `run' from /usr/local/lib/ruby/1.8/test/unit/ui/testrunnermediator.rb:44:in `run_suite' from /usr/local/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:65:in `start_mediator' from /usr/local/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:39:in `start' from /usr/local/lib/ruby/1.8/test/unit/ui/testrunnerutilities.rb:27:in `run' from /usr/local/lib/ruby/1.8/test/unit/autorunner.rb:200:in `run' from /usr/local/lib/ruby/1.8/test/unit/autorunner.rb:13:in `run' from /usr/local/lib/ruby/1.8/test/unit.rb:285 from /usr/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader.rb:5 rake aborted! Command failed with status (1): [/usr/local/bin/ruby -Ilib:test/unit "/usr/...] 
(See full trace by running task with --trace) hawk{rjm2}299: From john.m.andrews at gmail.com Fri May 12 14:30:58 2006 From: john.m.andrews at gmail.com (John Andrews) Date: Fri, 12 May 2006 14:30:58 -0400 Subject: [Ferret-talk] Problems installing on Fedora Message-ID: <8bc6d8730605121130x798682e8v6d7bda32a2c676fd@mail.gmail.com> Hello everyone, I am new to Ferret and new to this mailing list. I am having a problem installing. sudo gem install ferret Attempting local installation of 'ferret' Local gem file not found: ferret*.gem Attempting remote installation of 'ferret' Updating Gem source index for: http://gems.rubyforge.org Building native extensions. This could take a while... can't find header files for ruby. ERROR: While executing gem ... (RuntimeError) ERROR: Failed to build gem native extension. Gem files will remain installed in /usr/lib/ruby/gems/1.8/gems/ferret-0.9.3 for inspection. ruby extconf.rb install ferret\n Results logged to /usr/lib/ruby/gems/1.8/gems/ferret-0.9.3/ext/gem_make.out After that I thought that maybe the C extensions just didn't compile, but then I tried to fire up irb... irb(main):001:0> require 'ferret' LoadError: no such file to load -- ferret from (irb):1:in `require' from (irb):1 irb(main):002:0> require 'rubygems' => true irb(main):003:0> require 'ferret' LoadError: no such file to load -- ferret from /usr/lib/site_ruby/1.8/rubygems/custom_require.rb:21:in `require__' from /usr/lib/site_ruby/1.8/rubygems/custom_require.rb:21:in `require' from (irb):3 This machine is a brand new installation of Fedora Core 5. I also just installed ruby and gem. Am I missing a library somewhere? Any help is greatly appreciated. Regards, John -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060512/b618fdd6/attachment.htm From JanPrill at blauton.de Fri May 12 17:38:46 2006 From: JanPrill at blauton.de (Jan Prill) Date: Fri, 12 May 2006 23:38:46 +0200 Subject: [Ferret-talk] Benchmark - Thanks Dave for making this gnawer this FAST!! Message-ID: Hi List, I've taken some time to run some tests on the performance of java-lucene, hyperestraier and ferret, as Dave has encouraged the ferret community to do. Quite interesting numbers. Ferret indeed deserves to be called a high-performance port!! It's MyFirstBenchmark ( http://ferret.davebalmain.com/trac/wiki/MyFirstBenchmark ) so please don't be too harsh in criticizing the method. It's just a hack and it's flawed, like every other benchmark. But it provides some numbers, and regardless of how flawed it is, one thing remains true: All of these search engines are fast enough for most of us... Regards Jan -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sat May 13 00:34:53 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 13 May 2006 13:34:53 +0900 Subject: [Ferret-talk] Problems installing on Fedora In-Reply-To: <8bc6d8730605121130x798682e8v6d7bda32a2c676fd@mail.gmail.com> References: <8bc6d8730605121130x798682e8v6d7bda32a2c676fd@mail.gmail.com> Message-ID: Hi John, You're missing the ruby header files, i.e. ruby.h. I'm guessing Fedora doesn't come with ruby-devel installed by default. (I use Ubuntu). Try installing the RPM from here: http://rpmfind.net/linux/rpm2html/search.php?query=ruby-devel Or perhaps this will do the job: sudo yum install ruby-devel Please let us know if you get it working. Cheers, Dave On 5/13/06, John Andrews wrote: > Hello everyone, > I am new to Ferret and new to this mailing list. I am having a problem > installing.
> > sudo gem install ferret > Attempting local installation of 'ferret' > Local gem file not found: ferret*.gem > Attempting remote installation of 'ferret' > Updating Gem source index for: http://gems.rubyforge.org > Building native extensions. This could take a while... > can't find header files for ruby. > ERROR: While executing gem ... (RuntimeError) > ERROR: Failed to build gem native extension. > Gem files will remain installed in > /usr/lib/ruby/gems/1.8/gems/ferret-0.9.3 for inspection. > ruby extconf.rb install ferret\n > Results logged to > /usr/lib/ruby/gems/1.8/gems/ferret-0.9.3/ext/gem_make.out > > After that I thought that maybe the c extensions just didn't compile but > then I tried to fire up irb... > irb(main):001:0> require 'ferret' > LoadError: no such file to load -- ferret > from (irb):1:in `require' > from (irb):1 > irb(main):002:0> require 'rubygems' > => true > irb(main):003:0> require 'ferret' > LoadError: no such file to load -- ferret > from > /usr/lib/site_ruby/1.8/rubygems/custom_require.rb:21:in > `require__' from > /usr/lib/site_ruby/1.8/rubygems/custom_require.rb:21:in > `require' > from (irb):3 > > This machine is a brand new installation of Fedora Core 5. I also just > installed ruby and gem. Am I missing a library somewhere? Any help is > greatly appreciated. > > Regards, > John > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > > From JanPrill at blauton.de Sat May 13 10:59:17 2006 From: JanPrill at blauton.de (Jan Prill) Date: Sat, 13 May 2006 16:59:17 +0200 Subject: [Ferret-talk] Locale not set error when trying to use C version in a r In-Reply-To: References: Message-ID: <62dfc1da4fc74a30e071809f30413125@ruby-forum.com> Carl Youngblood wrote: > I'm getting a locale not set error. Does anyone know how I should set > my locale in my rails environment so that ferret knows what to do? 
> Why isn't this a problem in the ruby version of ferret? > > Thanks, > Carl Hi, Carl, have you had any success in sorting out this issue? While I was fiddling around with ferret 0.9.3 to test for http://ferret.davebalmain.com/trac/wiki/MyFirstBenchmark I had problems with the locale setting on a gentoo box. It turned out that on my installation the C analyzer couldn't read documents containing special characters. The locale was POSIX and I had to change it to de_DE or en_US to index Project Gutenberg files. On gentoo linux I did this by running "locale -a" to list the available locales and then setting one of them with "export LANG=de_DE", "export LC_ALL=de_DE" and so on. Afterwards I changed the locale of the whole system to Unicode by editing /etc/env.d/02locale. That is how it works on gentoo; you'll need to check how it's done on your distro. ymmv... Regards Jan -- Posted via http://www.ruby-forum.com/. From akb at mutualaid.org Sun May 14 14:01:04 2006 From: akb at mutualaid.org (akb) Date: Sun, 14 May 2006 20:01:04 +0200 Subject: [Ferret-talk] Ferret failing to rebuild_index - occasionally unable to In-Reply-To: References: <5b52265f4407bceb632ee40d0de52946@ruby-forum.com> <00615ea77943d230bf32ee446f5a4b3d@ruby-forum.com> Message-ID: <9b340c28d5a2d1fd6bfa90d639e6be05@ruby-forum.com> Unfortunately I still see this problem with both 0.9.2 and 0.9.3. a Kasper Weibel wrote: > This problem seems to have been solved with ferret 0.9.2 > > Kasper Weibel wrote: >> I think windows is locking the file for deletion until the test process >> ends - or something similar. >> >> akb wrote: >>> Most of the tests also fail on Windows with a similar error on the demo >>> provided in svn. I tried adding some sleeps but that doesn't make a >>> difference. >>> -- Posted via http://www.ruby-forum.com/.
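Jan's locale steps earlier in this thread (list the installed locales with `locale -a`, then set LANG and LC_ALL) can also be scripted from Ruby before Ferret is loaded, for example near the top of a Rails environment.rb. The sketch below rests on one assumption: that the C extension reads the locale from the process environment when it starts. The helper names `choose_locale` and `installed_locales` and the preferred-locale list are illustrative only and are not part of Ferret. On many Linux systems `locale -a` prints names like `en_US.utf8`, which is why both spellings appear in the list.

```ruby
# Pick a locale from the names printed by `locale -a`.
# Order of preference: an explicitly preferred name, then any UTF-8 locale,
# then anything other than the plain C/POSIX locales. Returns nil if none fit.
def choose_locale(available, preferred = %w[en_US.UTF-8 en_US.utf8 de_DE.UTF-8 de_DE.utf8])
  preferred.find { |name| available.include?(name) } ||
    available.find { |name| name =~ /utf-?8/i } ||
    available.find { |name| name !~ /\A(C|POSIX)\z/ }
end

# List the locales installed on this machine; empty if `locale` is unavailable.
def installed_locales
  `locale -a 2>/dev/null`.split("\n").map(&:strip)
rescue SystemCallError
  []
end

# Only override LANG/LC_ALL when they are unset or still the C/POSIX default.
current = ENV['LC_ALL'] || ENV['LANG']
if current.nil? || current.empty? || %w[C POSIX].include?(current)
  if (locale = choose_locale(installed_locales))
    ENV['LANG'] = locale
    ENV['LC_ALL'] = locale
  end
end
```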
From shingler at gmail.com Mon May 15 12:08:20 2006 From: shingler at gmail.com (steven shingler) Date: Mon, 15 May 2006 18:08:20 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? Message-ID: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Hi all, I'm having problems trying to get Ferret to read an index generated by Lucene. Am I right in thinking Ferret should be able to read a Lucene-generated index with no problem? I'm using the code snippets detailed in http://www.ruby-forum.com/topic/64099#new Any advice gratefully received. Many Thanks, Steven From erik at ehatchersolutions.com Mon May 15 12:15:20 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Mon, 15 May 2006 12:15:20 -0400 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Message-ID: On May 15, 2006, at 12:08 PM, steven shingler wrote: > Am I right in thinking Ferret should be able to read a Lucene > generated > index no problem? That would be nice, but it is not currently the case because of Java's wacky "modified" UTF-8 serialization. I've seen that plain ol' ASCII text indexes will be compatible, but once you put in some higher-order characters things go askew. Erik From marvin at rectangular.com Mon May 15 19:39:48 2006 From: marvin at rectangular.com (Marvin Humphrey) Date: Mon, 15 May 2006 16:39:48 -0700 Subject: [Ferret-talk] Benchmark - Thanks Dave for making this gnawer this FAST!! In-Reply-To: References: Message-ID: <85497015-BA45-43DA-9463-BA61FC9DFABE@rectangular.com> On May 12, 2006, at 2:38 PM, Jan Prill wrote: > Hi List, > > I've taken some time and made some tests on the performance of > java-lucene, hyperestraier and ferret as Dave encourages the community > of ferret to do so. Hello, Jan... On the benchmarking page you make this request. "If you are an expert in one of these search engines then provide some information about the best optimizations." As the author of another Lucene port (KinoSearch, Perl/C), I know a fair amount about Lucene. Better still, I put together some benchmarks comparing Lucene, KinoSearch and Plucene a little while ago (www.rectangular.com/kinosearch/benchmarks.html), and I solicited the help of the Lucene developers list to tune the Lucene benchmarking app. By the end it performed around twice as well as my initial version. In order to max out Lucene's indexing speed...
* Don't use the compound file format: indexWriter.setUseCompoundFile(false);
* Set maxBufferedDocs to at least 100, and if you have the RAM, 1000: indexWriter.setMaxBufferedDocs(1000);
* Give the JVM a generous heap and run it under -server: java -Xmx500M -server MyIndexer
* Make sure that JVM startup time is not factored into the results unless you intend it to be.
All this in addition to good practices like warming up OS caches with dry runs prior to test runs, ensuring that the machine is otherwise idle, making sure that the analyzers are exactly equivalent (the fact that the search results differ is a red flag -- I'd use WhitespaceAnalyzer instead of whatever you're using), and other such steps to isolate the variables you intend to measure. Then, perform multiple iterations. > It's MyFirstBenchmark ( > http://ferret.davebalmain.com/trac/wiki/MyFirstBenchmark ) so please > don't be too cruel in criticizing the method. It's very difficult to run a good scientific experiment of any kind. In fact my current results are flawed -- I left out a call to optimize() in the Lucene benchmark, so Lucene actually performs not quite as well as the numbers on my page indicate. But I'd rather err on that side than give the engine I'm attached to a leg up. > one thing remains true: All of these search > engines are fast enough for most of us... Yes. Things are different than they were just a couple of years ago.
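Marvin's checklist (untimed dry runs first, startup time kept out of the measurement, multiple iterations) applies just as well when timing Ferret from Ruby. A minimal sketch of such a harness, assuming the code under test is passed as a block; the method name and the choice to report the median are assumptions, not part of any benchmark discussed in this thread:

```ruby
require 'benchmark'

# Hypothetical harness: untimed warm-up runs (e.g. to fill OS caches), then
# several timed iterations. Process startup and library loading happen before
# this method is called, so they are not part of the measurement.
def time_iterations(iterations: 5, warmup: 1)
  raise ArgumentError, 'need at least one timed iteration' if iterations < 1

  warmup.times { yield }
  times = Array.new(iterations) { Benchmark.realtime { yield } }
  sorted = times.sort
  mid = sorted.size / 2
  median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  { times: times, min: sorted.first, median: median }
end

# Usage (build_index and corpus_dir are hypothetical):
#   result = time_iterations(iterations: 5) { build_index(corpus_dir) }
#   puts "median #{result[:median].round(2)}s, best #{result[:min].round(2)}s"
```

Reporting the median rather than the mean keeps one slow run from dominating, and the gap between `min` and `median` hints at how noisy the machine is. For indexing benchmarks, each iteration should start from an empty index directory so that the runs are comparable.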
Marvin Humphrey Rectangular Research http://www.rectangular.com/ From jan.prill at gmail.com Tue May 16 03:02:30 2006 From: jan.prill at gmail.com (Jan Prill) Date: Tue, 16 May 2006 09:02:30 +0200 Subject: [Ferret-talk] Benchmark - Thanks Dave for making this gnawer this FAST!! In-Reply-To: <85497015-BA45-43DA-9463-BA61FC9DFABE@rectangular.com> References: <85497015-BA45-43DA-9463-BA61FC9DFABE@rectangular.com> Message-ID: <562a35c10605160002s56f7a92bsb15a380f27592d40@mail.gmail.com> Hi, Marvin, thank you very much. I will take this advice into account when I'm doing other tests. As a first step I'll add a link to your post to the ferret wiki to let people know... Regards Jan Prill From shingler at gmail.com Tue May 16 05:55:48 2006 From: shingler at gmail.com (steven shingler) Date: Tue, 16 May 2006 11:55:48 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Message-ID: <67f083291357bf5ca7e7b161c7164d03@ruby-forum.com> Hi Erik, Thanks for getting back to me. Ahh yes, I see what you mean - if I "Lucene-Index" only plain text files, Ferret can search that index fine (it seems). However, what I'm trying to do is index PDFs, using PDFBox to create the Lucene documents - but Ferret isn't at all pleased when I try to search:
NoMethodError: You have a nil object when you didn't expect it!
The error occured while evaluating nil.name
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_buffer.rb:31:in `read'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:90:in `next?'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:118:in `scan_to'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:285:in `scan_for_term_info'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:163:in `get_term_info'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_reader.rb:176:in `doc_freq'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `each'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:47:in `doc_freq'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:13:in `initialize'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `new'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `create_weight'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:113:in `initialize'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `each'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `initialize'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `new'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `create_weight'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/query.rb:51:in `weight'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:107:in `search'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:660:in `do_search'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:331:in `search_each'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `synchronize'
  c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `search_each'
  ./lib/ferret_client.rb:34:in `search_index'
  test/functional/ferret_client_test.rb:12:in `test_search_index'
This is a shame, as I thought I was onto a winner with the Lucene/Ferret combo - especially with PDFBox able to create Lucene Docs so easily. This may not actually relate to your point about higher-order chars...? Does anyone have any experience of indexing PDFs in Lucene (using PDFBox) and searching with Ferret? Or, of course, of creating Ferret index docs from PDF files in Ruby? Any ideas or advice gratefully received. Thanks, Steven From jan.prill at gmail.com Tue May 16 06:02:44 2006 From: jan.prill at gmail.com (Jan Prill) Date: Tue, 16 May 2006 12:02:44 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: <67f083291357bf5ca7e7b161c7164d03@ruby-forum.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <67f083291357bf5ca7e7b161c7164d03@ruby-forum.com> Message-ID: <562a35c10605160302r212b2f62v8741636906d54e48@mail.gmail.com> Hi, steven, first of all: would you mind providing a little more info about your environment: OS, Ferret version, Ruby version, etc.?
Second: you might be interested in the FerretFinder utility as well as RDig. You'll find links to both at the bottom of the HowTos section on the Ferret Trac: http://ferret.davebalmain.com/trac/wiki/HowTos . Both of these tools seem to use pdftotext to extract content from PDFs, but they might be of help to you anyway. Regards Jan Prill From shingler at gmail.com Tue May 16 06:07:00 2006 From: shingler at gmail.com (steven shingler) Date: Tue, 16 May 2006 12:07:00 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index?
In-Reply-To: <562a35c10605160302r212b2f62v8741636906d54e48@mail.gmail.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <67f083291357bf5ca7e7b161c7164d03@ruby-forum.com> <562a35c10605160302r212b2f62v8741636906d54e48@mail.gmail.com> Message-ID: Hi Jan, Right - sorry. I'm on Windows XP (Pro); ferret 0.9.1 (pure Ruby); ruby 1.8.2. I'll look into those links now. Many Thanks Steven From jan.prill at gmail.com Tue May 16 06:17:48 2006 From: jan.prill at gmail.com (Jan Prill) Date: Tue, 16 May 2006 12:17:48 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <67f083291357bf5ca7e7b161c7164d03@ruby-forum.com> <562a35c10605160302r212b2f62v8741636906d54e48@mail.gmail.com> Message-ID: <562a35c10605160317v265fb495g24c105e85c26f6c@mail.gmail.com> Hey Steven, do you have a Linux box available too? It would be interesting to know whether the problem persists with ferret 0.9.3. If you've got any scripts and test data for your PDFs, I could check this out for you on Linux with ferret 0.9.3 and ruby 1.8.4. Regards Jan From shingler at gmail.com Tue May 16 06:54:03 2006 From: shingler at gmail.com (steven shingler) Date: Tue, 16 May 2006 12:54:03 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index?
In-Reply-To: <562a35c10605160317v265fb495g24c105e85c26f6c@mail.gmail.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <67f083291357bf5ca7e7b161c7164d03@ruby-forum.com> <562a35c10605160302r212b2f62v8741636906d54e48@mail.gmail.com> <562a35c10605160317v265fb495g24c105e85c26f6c@mail.gmail.com> Message-ID: <4c5a15169002088c70ba759d3551ed76@ruby-forum.com> Hi Jan, Yes, I've got an Ubuntu box I can try it on - I just updated it to ferret 0.9.3 and ruby 1.8.4. I'll have a look now and report back. Many Thanks for your help. S~ P.S. The ferret_helper finder utils look very interesting. From john.m.andrews at gmail.com Tue May 16 09:37:03 2006 From: john.m.andrews at gmail.com (John Andrews) Date: Tue, 16 May 2006 09:37:03 -0400 Subject: [Ferret-talk] Problems installing on Fedora In-Reply-To: References: <8bc6d8730605121130x798682e8v6d7bda32a2c676fd@mail.gmail.com> Message-ID: <8bc6d8730605160637r6130f375r5cd28813884cb9d@mail.gmail.com> That was it, Dave. Thanks! -John On 5/13/06, David Balmain wrote: > You're missing the Ruby header files, i.e. ruby.h. I'm guessing Fedora > doesn't come with ruby-devel installed by default (I use Ubuntu). Try > installing the RPM from here: > > http://rpmfind.net/linux/rpm2html/search.php?query=ruby-devel > > Or perhaps this will do the job: > > sudo yum install ruby-devel > > Please let us know if you get it working. > > Cheers, > Dave From dbalmain.ml at gmail.com Tue May 16 10:53:57 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 16 May 2006 23:53:57 +0900 Subject: [Ferret-talk] Ferret not able to read a Lucene Index?
In-Reply-To: References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Message-ID: On 5/16/06, Erik Hatcher wrote: > > On May 15, 2006, at 12:08 PM, steven shingler wrote: > > Am I right in thinking Ferret should be able to read a Lucene > > generated > > index no problem? > > That would be nice, but it is not currently the case because of > Java's wacky "modified" UTF-8 serialization. I've seen that plain > ol' ASCII text indexes will be compatible, but once you put in some > higher order characters things go askew. Hey guys, What Erik said is exactly correct. Marvin Humphrey (author of KinoSearch, a Perl port of Lucene) has submitted a patch to Lucene so that non-Java ports of Lucene will be able to read Lucene indexes. At the moment it slows Lucene down by about 25% (I think??), so I'm going to be working with him to improve the performance of the patch so that it can one day be included in Lucene. Don't hold your breath though. It's going to take us a while to get it in there. For now, I'd recommend using pdftotext as Jan already mentioned. I'm not sure what is available on Windows, but I'm sure it would be trivial to write your own pdftotext in Java using PDFBox and then call it from Ruby. Cheers, Dave From marvin at rectangular.com Tue May 16 12:51:42 2006 From: marvin at rectangular.com (Marvin Humphrey) Date: Tue, 16 May 2006 09:51:42 -0700 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Message-ID: On May 16, 2006, at 7:53 AM, David Balmain wrote: > On 5/16/06, Erik Hatcher wrote: >> >> On May 15, 2006, at 12:08 PM, steven shingler wrote: >>> Am I right in thinking Ferret should be able to read a Lucene >>> generated >>> index no problem? >> >> That would be nice, but it is not currently the case because of >> Java's wacky "modified" UTF-8 serialization.
I've seen that plain >> ol' ASCII text indexes will be compatible, but once you put in some >> higher order characters things go askew. > > Hey guys, > > What Erik said is exactly correct. Marvin Humphrey, (author of > KinoSearch, a Perl port of Lucene) has submitted a patch to Lucene so > that non-java ports of Lucene will be able to read Lucene indexes. It > currently slows Lucene down by about 25% at the moment (I think??) Around 20% for indexing, according to my benchmarker. I don't have a benchmark for searching. Modified UTF-8 is not really what hurts the performance of my patch, nor is it actually causing the index incompatibility in this case. Modified UTF-8 is problematic for a couple of other reasons. When text contains either null bytes or Unicode code points above the Basic Multilingual Plane (values 2^16 and up, such as U+1D160 "MUSICAL SYMBOL EIGHTH NOTE"), KinoSearch and Ferret, if they write legal UTF-8, would write indexes that would cause Lucene to crash from time to time with a baffling "read past EOF" error. Therefore, to be Lucene-compatible they'd have to pre-scan all text to detect those conditions, which would impose a performance burden and require some crufty auxiliary code to turn the legal UTF-8 into Modified UTF-8. Also, non-shortest-form UTF-8 presents a theoretical security risk, and Perl is set up to issue a warning whenever a scalar that is marked as UTF-8 isn't shortest-form. That condition would occur whenever Modified UTF-8 containing null bytes or code points above the BMP was read in -- thus requiring that all incoming text be pre-scanned as well. Those are rare conditions, but it isn't realistic to just say "KinoSearch|Ferret doesn't support null bytes or characters above the BMP", because a lot of the time the source text that goes into an index isn't under the full control of the indexing/search app's author. To be fair to Java and Lucene, they are paying a price for early commitment to the Unicode standard.
Lucene's UTF-8 encoding/decoding hasn't been touched since Doug Cutting wrote it in 1998, when non-shortest-form UTF-8 was still legal and Unicode was still 16-bit. You could argue that the Unicode Consortium pulled the rug out from under its early champions by changing the spec so that existing implementations were no longer compliant. The performance problems of my patch and the crashing are actually tied to the Lucene File Format's definition of a String. A String in Lucene is the length of the string in Java chars, followed by the character data translated to Modified UTF-8. A String in KinoSearch, and if I am not mistaken in Ferret as well, is the length of the character data in bytes, followed by the character data. Those two definitions of String result in identical indexes so long as your text is pure ASCII, but as Erik noted, when you add higher-order characters to the mix, problems arise. You end up reading either too few bytes or too many, the stream gets out of sync, and whammo: 'Read past EOF'. My patch modifies Lucene to use byte counts as the prefix to its Strings. Unfortunately, there are encoding/decoding inefficiencies associated with the new way of doing things. Under Lucene's current definition of a String, you allocate an array of Java chars and then read characters into it one by one. With the new patch, you don't know how many chars you need, so you might have to re-allocate several times. There are ways to address that inefficiency, but they'd take a while to explain. > Don't hold your > breath though. It's going to take us a while to get it in there. Yeah. Modifying Lucene so that it can read both the old index format and the new one without suffering a performance degradation in either case is going to be non-trivial. I'm sympathetic to the notion that it may not be worth it and that Lucene should declare its file format private. There are a lot of issues in play.
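To make Marvin's two points concrete, here is a minimal Ruby sketch (not Lucene's or Ferret's actual code) that encodes a string the way Java's Modified UTF-8 does and computes both length prefixes he describes: Java chars (UTF-16 code units) versus bytes. The method names are hypothetical, and Lucene stores the prefix as a variable-length integer (VInt), an encoding step omitted here.

```ruby
# Hypothetical helpers illustrating the two String layouts described above.

# Three-byte form used for any 16-bit unit from U+0800 to U+FFFF, including
# each half of a UTF-16 surrogate pair.
def three_byte(unit)
  [0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)]
end

# Java-style "Modified UTF-8": NUL becomes the overlong pair C0 80, and code
# points above the BMP are split into surrogates, each encoded separately.
def modified_utf8_bytes(str)
  str.each_codepoint.flat_map do |cp|
    if cp.zero?
      [0xC0, 0x80]
    elsif cp <= 0x7F
      [cp]
    elsif cp <= 0x7FF
      [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    elsif cp <= 0xFFFF
      three_byte(cp)
    else
      v = cp - 0x10000
      three_byte(0xD800 | (v >> 10)) + three_byte(0xDC00 | (v & 0x3FF))
    end
  end
end

# Lucene-style prefix: length in Java chars, i.e. UTF-16 code units.
def java_char_count(str)
  str.each_codepoint.sum { |cp| cp > 0xFFFF ? 2 : 1 }
end

# KinoSearch/Ferret-style prefix: length of the standard UTF-8 data in bytes.
def byte_count(str)
  str.bytesize
end

# What a byte-count reader sees when handed a Lucene-style String: it takes
# `java_char_count` bytes, and anything left over is misread as the next field.
def read_char_count_as_bytes(str)
  data = modified_utf8_bytes(str)
  n = java_char_count(str)
  { taken: data.first(n), leftover: data.drop(n) }
end
```

For pure-ASCII text both encodings and both prefixes agree, which matches Erik's observation that ASCII-only indexes are compatible. A single "é" already leaves one byte behind (`read_char_count_as_bytes("caf\u00E9")[:leftover]` is `[0xA9]`), and U+1D160 counts as 2 Java chars but takes 4 bytes in standard UTF-8 and 6 in Modified UTF-8.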
No KinoSearch user has yet complained about Lucene/KinoSearch file-format compatibility. The only thing I miss is Luke -- which is significant, because Luke is really handy. How many users here care about Lucene compatibility, and why? Marvin Humphrey Rectangular Research http://www.rectangular.com/ From erik at ehatchersolutions.com Tue May 16 13:17:22 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Tue, 16 May 2006 13:17:22 -0400 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Message-ID: <57A8BD9B-FD20-4EA5-BF7C-E34FC7E95E5B@ehatchersolutions.com> On May 16, 2006, at 12:51 PM, Marvin Humphrey wrote: > How many users here care about Lucene compatibility, and why? Personally I'm putting my eggs into the Solr basket - http://incubator.apache.org/solr Solr has a ton of benefits over using raw Lucene, with its caching and configurable handling of putting new searchers online, etc. It's got plenty of room for improvement, and those improvements are in progress. I am integrating Solr into a Ruby on Rails front end as we speak, doing so crudely through a rough HTTP API; abstracting that communication layer behind a nice Rubyish DSL would be quite cool. I used to really, really want Lucene index compatibility at the file format layer along with a really fast Ruby implementation. At this point I've changed my mind, and Solr is my recommended basis for search integration into non-Java (and perhaps even Java) applications. I just wanted to toss out my thoughts since I've been mostly silent on the Ferret/KinoSearch issues. I still daydream of GCJ'd Java Lucene being the basis for cross-language integration, with PyLucene as a great example. They achieve 100% index compatibility with Java Lucene because it *is* Java Lucene. I'm still extremely pleased to see folks like Dave and Marvin digging deep into Ruby and Perl integration and starting to work together.
Very promising no matter how this ends up. I'm optimistic we'll have Lucene in Ruby one of these days in a compatible and incredibly performant way! Erik From nick.snels at gmail.com Tue May 16 15:30:01 2006 From: nick.snels at gmail.com (Nick Snels) Date: Tue, 16 May 2006 21:30:01 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: <57A8BD9B-FD20-4EA5-BF7C-E34FC7E95E5B@ehatchersolutions.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <57A8BD9B-FD20-4EA5-BF7C-E34FC7E95E5B@ehatchersolutions.com> Message-ID: <7c97ac9a32b14cc20c0b33dff1015e6a@ruby-forum.com> I don't care about the fact that Ferret isn't able to read a Lucene index. The only problem is that when the Ferret index isn't compatible with Lucene, as is the case right now (damn EOF errors), you are not able to use Luke to take a quick peek inside the index. So a port of Luke to access Ferret would be great. Ferret should be fast, have the power of Lucene searches and be easy to access from Ruby, as it is right now. If you are going to use Lucene, go all the way and stick to Java. The only problem with Ferret is that the C version isn't available on Windows (for testing purposes) yet, but that is being worked on. GCJ and SWIG sound great, but setting them up is a real pain in the ass: great for techies, but horrible for everyone else. Solr looks like a promising project; the only problem I have with it is that you need Tomcat and a JVM. This adds two more variables to your configuration that you have to control. Great if you know Java, but I'm programming in Ruby so that I don't have to program in Java or .NET, or whatever. So I prefer a Ruby-only environment for its simplicity. So Luke is a definite plus as a debugging tool. Kind regards, Nick
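Erik mentions talking to Solr through a rough HTTP API and wishing for a Rubyish layer over it. A minimal sketch of the first step of such a layer, using only Ruby's standard library: the `/select` path, the `q`/`start`/`rows` parameter names and the `http://localhost:8983/solr` base (the port used by Solr's bundled Jetty example) are assumptions to check against your own Solr setup, and both method names are hypothetical.

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: build a Solr select URL for a query string.
# Base URL and parameter names are assumptions about a default Solr setup.
def solr_select_uri(query, base: 'http://localhost:8983/solr', start: 0, rows: 10)
  uri = URI("#{base}/select")
  uri.query = URI.encode_www_form(q: query, start: start, rows: rows)
  uri
end

# Hypothetical wrapper: fetch the raw response body (XML by default);
# parsing it into Ruby objects is left out of this sketch.
def solr_search(query, **opts)
  Net::HTTP.get(solr_select_uri(query, **opts))
end

# Usage, with a Solr instance running:
#   xml = solr_search('title:ferret', rows: 5)
```

Keeping URL construction separate from the HTTP call means the query-building logic can be tested without a running Solr instance; `URI.encode_www_form` takes care of escaping characters such as `:` and spaces.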
From erik at ehatchersolutions.com Tue May 16 15:45:03 2006 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Tue, 16 May 2006 15:45:03 -0400 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: <7c97ac9a32b14cc20c0b33dff1015e6a@ruby-forum.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <57A8BD9B-FD20-4EA5-BF7C-E34FC7E95E5B@ehatchersolutions.com> <7c97ac9a32b14cc20c0b33dff1015e6a@ruby-forum.com> Message-ID: On May 16, 2006, at 3:30 PM, Nick Snels wrote: > Solr looks a promising project, only problem I have with it is that > you > need Tomcat and a JVM. This adds two more variables to your > configuration you have to control. Great if you know Java, but I'm > programming in Ruby so I don't have to program in Java or .NET, or > whatever. So I prefer a Ruby only environment for it's simplicity. A fair and expected critique of using Solr in a Ruby environment. Every language enjoys a bit of lock-in, and programmers obviously prefer to work with native APIs. It is true you need a JVM to run Solr, but it doesn't have to be Tomcat. I use Jetty. Firing up Solr in my Rails environment only required that I customize its schema.xml and solrconfig.xml files and run "java -jar start.jar". And voila, it's up and running. So while it does add an entirely new moving piece, I view it as something akin to adding a database. As long as there is a good way to communicate with it natively (a Ruby/Solr API would be well received, methinks), Solr adds no more overhead to a project's deployment than a database server does; actually less. Erik From marvin at rectangular.com Tue May 16 17:27:00 2006 From: marvin at rectangular.com (Marvin Humphrey) Date: Tue, 16 May 2006 14:27:00 -0700 Subject: [Ferret-talk] Ferret not able to read a Lucene Index?
In-Reply-To: <7c97ac9a32b14cc20c0b33dff1015e6a@ruby-forum.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <57A8BD9B-FD20-4EA5-BF7C-E34FC7E95E5B@ehatchersolutions.com> <7c97ac9a32b14cc20c0b33dff1015e6a@ruby-forum.com> Message-ID: <472CE35F-4704-479A-BAA2-93837ADBB957@rectangular.com> On May 16, 2006, at 12:30 PM, Nick Snels wrote: > I don't care about the fact that Ferret isn't able to read a Lucene > index. The only problem is that when the Ferret index isn't compatible > with Lucene as is the case right now (damn EOF errors), you are not > able > to use Luke to take a quick peek inside the index. So a port of > Luke to > access Ferret would be great. You know what... I think using Luke powered by a version of Lucene with my patch applied would allow it to read Ferret indexes. I don't have time to check this out right now. And ironically, I've made further mods to KinoSearch's file format, so it wouldn't make Luke available to KinoSearch users unless I change it back. hahaha. ":o The patch was prepared against the Lucene Subversion trunk, but it might work against 1.9.1. If it doesn't, it would be trivial to finish it and package it up. Maybe we can convince the Lucene folks to distribute it through their channels... or I can put it up at my site. Maybe Luke's author would be amenable to distributing it from his site, but I dunno about that - people might blame him rather than me or Balmain when stuff fails to work. Marvin Humphrey Rectangular Research http://www.rectangular.com/ From dbalmain.ml at gmail.com Wed May 17 03:12:00 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Wed, 17 May 2006 16:12:00 +0900 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Message-ID: On 5/17/06, Marvin Humphrey wrote: > How many users here care about Lucene compatibility, and why? Great question. Who does care, and why?
Performance used to be a very good reason but that doesn't apply anymore. Is it Java's libraries? Java does have PDFBox for example. Unfortunately Ruby doesn't yet have an equivalent but there are ways around this. The only good reason I can think of is the lack of a Luke port. Anyone care to enlighten us? Cheers, Dave From jan.prill at gmail.com Wed May 17 03:12:21 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 17 May 2006 09:12:21 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: <472CE35F-4704-479A-BAA2-93837ADBB957@rectangular.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <57A8BD9B-FD20-4EA5-BF7C-E34FC7E95E5B@ehatchersolutions.com> <7c97ac9a32b14cc20c0b33dff1015e6a@ruby-forum.com> <472CE35F-4704-479A-BAA2-93837ADBB957@rectangular.com> Message-ID: <562a35c10605170012t68889189s28303944ada64463@mail.gmail.com> hey Marvin, is there a link in this thread already? I've found http://issues.apache.org/jira/browse/LUCENE-510?page=comments#action_12378519as well as the links at the bottom of http://www.archivum.info/java-dev at lucene.apache.org/2005-09/msg00025.htmlwith google. Is there anything else? I'll definitly try this out but wanted to make sure if this is the latest development... Regards Jan On 5/16/06, Marvin Humphrey wrote: > > > On May 16, 2006, at 12:30 PM, Nick Snels wrote: > > > I don't care about the fact that Ferret isn't able to read a Lucene > > index. The only problem is that when the Ferret index isn't compatible > > with Lucene as is the case right now (damn EOF errors), you are not > > able > > to use Luke to take a quick peek inside the index. So a port of > > Luke to > > access Ferret would be great. > > You know what... I think using Luke powered by a version of Lucene > with my patch applied would allow it to read Ferret indexes. > > I don't have time to check this out right now. 
And ironically, I've > made further mods to KinoSearch's file format, so it wouldn't make > Luke available to KinoSearch users unless I change it back. hahaha. ":o > > The patch was prepared against subversion, but it might work against > 1.9.1. If it doesn't, it would be trival to finish it and package it > up. Maybe we can convince the Lucene folks to distribute it through > their channels... or I can put it up at my site. Maybe Luke's author > would be amenable to distributing it from his site, but I dunno about > that - people might blame him rather than me or Balmain when stuff > fails to work. > > Marvin Humphrey > Rectangular Research > http://www.rectangular.com/ > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From jan.prill at gmail.com Wed May 17 03:30:34 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 17 May 2006 09:30:34 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> Message-ID: <562a35c10605170030s3084611dj81c13682258f63f2@mail.gmail.com> Hi Dave, IMHO there are two things:
1. These little marketing and management issues that often have no valid reason but make a big difference:
Programmer / Freelancer: Let's use Ruby. We'll even be able to build a superfast search interface to all your great marketing docs with Ferret, Rails and Ruby.
Manager: I think we've got this; it's implemented by something called bluezeneeee.
P/F: Yes, and we might even use its indexes and perform searches with the old system while we are changing...
M: Changing what?
P/F: The system, to Ruby, Ferret...
M: WTF?
For conversations like these it would help to keep the changes in the background as much as possible...
2. Tools around Lucene: I think people will now give Marvin's patch and Luke a try, but Luke is not the only thing. Thanks to Erik for putting up Solr. I think it's a little bit of the old Java 90%/10% thingy. For 90% of webapps all the Java, Spring, Hibernate stuff is damn complex and you'll be faster with Ruby. But the 10% or less, often the big-money stuff of Fortune companies, banks etc., made the management decision for either J2EE or .NET. And for these projects the programming teams often need distributed and high-volume things, see CNET and Solr. I heard about Solr on this thread for the first time and wonder a little how it does together with Nutch / Hadoop for the distributed things, but I will do some googling on this myself. I think there is definitely a need, also in the Ruby world, for search engines and crawlers. And Nutch has some nifty features compared to RDig. Discussions about the interchangeability between Nutch and Ferret are showing that people are interested in using Lucene tools but fronting them with Ruby, Rails and Ferret. For example, I've tried to work with Ferret on a Nutch index, and luckily Ferret didn't choke on the index because there were no UTF-8 chars in there. So I could extract url, segment and docno, but then came this nfs / Hadoop thing to extract content and summaries as well, and I gave up. There also seems to be interest in and need for distributed search architectures, as the p2p efforts of hyperestraier as well as nfs / Hadoop and Solr (rsync?) are showing... Regards Jan On 5/17/06, David Balmain wrote: > > On 5/17/06, Marvin Humphrey wrote: > > How many users here care about Lucene compatibility, and why? > > Great question. Who does care, and why? Performance used to be a very > good reason but that doesn't apply anymore. Is it Java's libraries? > Java does have PDFBox for example.
Unfortunately Ruby doesn't yet have > an equivalent but there are ways around this. The only good reason I > can think of is the lack of a Luke port. Anyone care to enlighten us? > > Cheers, > Dave From shingler at gmail.com Wed May 17 06:15:33 2006 From: shingler at gmail.com (steven shingler) Date: Wed, 17 May 2006 12:15:33 +0200 Subject: [Ferret-talk] Ferret not able to read a Lucene Index? In-Reply-To: <562a35c10605170030s3084611dj81c13682258f63f2@mail.gmail.com> References: <5ca813e9e688fa4ac053b23de043b3eb@ruby-forum.com> <562a35c10605170030s3084611dj81c13682258f63f2@mail.gmail.com> Message-ID: I agree with Jan's 'real-world' scenario - it is the reason I started this thread in the first place... :) ...not so much because of management pressures, but I see merit in being able to create indexes in either Java or Ruby, then use Rails to present a query interface. It keeps one's options open - particularly with PDFBox and POI in the Java space, although I'm looking into both routes of the pdftotext/ferret_helper tools, and applying Marvin's patch - so perhaps both paths can remain open. Thanks to all though, for contributing to this very interesting thread! :) Cheers Steven -- Posted via http://www.ruby-forum.com/. From sergei at redleafsoft.com Wed May 17 16:38:50 2006 From: sergei at redleafsoft.com (Sergei Serdyuk) Date: Wed, 17 May 2006 22:38:50 +0200 Subject: [Ferret-talk] Ferret causing "out of memory" Message-ID: <9bf5877483d26fac3eb8c35732eb4881@ruby-forum.com> I am a bit puzzled. I am having issues with memory leaking away. It seems to be related to searching documents by 'id' column value.
This is what I am doing: cached_annotations = Ferret::Index::Index.new(:path => "#{RAILS_ROOT}/db/ferret/tmp/annotations", :auto_flush => false) 100000.times { |x| doc=cached_annotations[x.to_s] } Each run of the 100000.times loop eats 100MB of memory. I am on ferret 0.9.3 with C extensions. Ubuntu Linux is the OS. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Thu May 18 01:20:53 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Thu, 18 May 2006 14:20:53 +0900 Subject: [Ferret-talk] Ferret causing "out of memory" In-Reply-To: <9bf5877483d26fac3eb8c35732eb4881@ruby-forum.com> References: <9bf5877483d26fac3eb8c35732eb4881@ruby-forum.com> Message-ID: Thanks Sergei, well spotted. This will be fixed in the next version. Cheers, Dave On 5/18/06, Sergei Serdyuk wrote: > I am a bit puzzled. I am having issues with memory leaking away. It > seems to be related to searching documents by 'id' column value. > > This is what I am doing: > > cached_annotations = Ferret::Index::Index.new(:path => > "#{RAILS_ROOT}/db/ferret/tmp/annotations", :auto_flush => false) > > 100000.times { |x| doc=cached_annotations[x.to_s] } > > Every 100000.times eats 100MB of memory. > > I am on ferret 0.9.3 with C extentions. Ubuntu Linux is the OS. > > -- > Posted via http://www.ruby-forum.com/. > From gdagley at yahoo.com Thu May 18 01:25:59 2006 From: gdagley at yahoo.com (Geoffrey Dagley) Date: Thu, 18 May 2006 07:25:59 +0200 Subject: [Ferret-talk] multi_search problem Message-ID: I am running into the following error when I try to search across multiple models with multi_search(). I have rebuilt the indices and can search on each model individually using the Rails console. Here is the output from the console.
>> Post.multi_search('new', [Message, WikiPage]) ArgumentError: wrong number of arguments (1 for 0) from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:441:in `initialize' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:441:in `create_new_multi_reader' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:406:in `ensure_reader' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:397:in `initialize' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:378:in `multi_index' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:364:in `id_multi_search' from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:353:in `multi_search' from (irb):1 >> Post.find_by_contents('new') => [#> Message.find_by_contents('new') => [#> I am using the latest aaf from subversion and ferret 0.9.3 All models declare acts_as_ferret :store_class_name => true -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Thu May 18 04:09:12 2006 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 18 May 2006 10:09:12 +0200 Subject: [Ferret-talk] multi_search problem In-Reply-To: References: Message-ID: <20060518080912.GA10819@cordoba.webit.de> Hi Geoffrey, On Thu, May 18, 2006 at 07:25:59AM +0200, Geoffrey Dagley wrote: > I am running into the following error when I try to search across > multiple models with multi_search() I have rebuilt the indices and can > search on each model individually using the Rails console. > > Here is the output from the console. > >> Post.multi_search('new', [Message, WikiPage]) > ArgumentError: wrong number of arguments (1 for 0) [..] this is a known problem with Ferret's 0.9.x C-Version. 
Using either the Ruby-only version (require 'rferret') or version 0.3.2 (which, with its C extension, is way faster than Ruby-only 0.9) should fix this. I'm working on an aaf version fully compatible with Ferret 0.9.3 (C-Version) and hope to get this done soon. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From weibel at gmail.com Thu May 18 14:08:06 2006 From: weibel at gmail.com (Kasper Weibel) Date: Thu, 18 May 2006 20:08:06 +0200 Subject: [Ferret-talk] [ANN] acts_as_ferret 0.2.1 Message-ID: The svn repository for acts_as_ferret has just been tagged with version 0.2.1. This is the first version of aaf to support the 0.9.x branch of Ferret. See http://projects.jkraemer.net/acts_as_ferret/wiki
*Features*
* High-speed full-text search across the contents of any Rails model class, without any hassles. The index will be kept up to date automagically while you work with your Rails model classes as usual.
* Each model class calling acts_as_ferret gets its own Ferret index on disk, but you can search multiple models at once using the multi_search method.
* Supports Rails' single table inheritance mechanism (just declare acts_as_ferret in the base class, and be able to search across all inheriting classes; see TypoWithFerret for an example).
* Aaf is not limited to indexing the attributes of your model: you can tell it to index the result of any instance method of your model class.
* Further customization of the indexing process can be achieved by overriding the to_doc instance method in your model class, which is supposed to return the Ferret document object to be stored in the index.
* Use my_model_instance#more_like_this to retrieve objects with contents similar to my_model_instance. Great for suggesting related pages to your readers, or related products to your customers.
-- Posted via http://www.ruby-forum.com/.
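Kasper's feature list says aaf can index the return value of any instance method, not just database columns. The following is a toy, pure-Ruby sketch of that idea only, under stated assumptions: it does not use Ferret or acts_as_ferret, and `ToyIndexable`, `ToyArticle`, `indexed_fields` and `toy_document` are hypothetical names invented for illustration. Each declared field is simply called on the object with `public_send`, so attribute readers and computed methods are treated alike. It assumes Ruby 1.9 or later (for `public_send` and `each_with_object`), not the 1.8.4 used in this thread.

```ruby
# Toy illustration only: NOT acts_as_ferret's real implementation.
# A field list may name plain attributes or computed instance methods;
# this "indexer" just calls each one and stores the result as a string.
class ToyIndexable
  # Hypothetical class-level declaration, loosely modelled on :fields.
  def self.indexed_fields(*names)
    @indexed_fields = names.map(&:to_sym)
  end

  def self.fields_to_index
    @indexed_fields || []
  end

  # Build a plain Hash standing in for a Ferret document.
  def toy_document
    self.class.fields_to_index.each_with_object({}) do |field, doc|
      doc[field] = public_send(field).to_s
    end
  end
end

class ToyArticle < ToyIndexable
  indexed_fields :title, :full_text

  attr_reader :title

  def initialize(title, body_lines)
    @title = title
    @body_lines = body_lines
  end

  # Not a stored attribute: computed on demand, like a method that
  # returns the full text of a file.
  def full_text
    @body_lines.join("\n")
  end
end

doc = ToyArticle.new("Ferret intro", ["line one", "line two"]).toy_document
puts doc[:full_text]
```

In aaf itself the declaration looks like the one Jens gives later in this thread, `acts_as_ferret :fields => [ :full_text, 'title' ]`; the toy only mirrors the "call the method, store its result" idea.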
From sergei at redleafsoft.com Fri May 19 16:04:15 2006 From: sergei at redleafsoft.com (Sergei Serdyuk) Date: Fri, 19 May 2006 22:04:15 +0200 Subject: [Ferret-talk] imdex.update is 10 times slower than index.add_doc. Normal? Message-ID: <534148f527dc09b9abd066cdec66d524@ruby-forum.com> Hi, I am seeing that doc = index['mykey'] index.update 'mykey', doc is about 10 times slower than doc = Document.new doc['id'] = 'mykey' index << doc It looks like #update is _much_ slower than #<<. Is this expected? Sergei. -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Fri May 19 20:10:55 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 20 May 2006 09:10:55 +0900 Subject: [Ferret-talk] imdex.update is 10 times slower than index.add_doc. Normal? In-Reply-To: <534148f527dc09b9abd066cdec66d524@ruby-forum.com> References: <534148f527dc09b9abd066cdec66d524@ruby-forum.com> Message-ID: On 5/20/06, Sergei Serdyuk wrote: > Hi, > > I am seeing that > > doc = index['mykey'] > index.update 'mykey', doc > > is about 10 times slower than > doc = Document.new > doc['id'] = 'mykey' > index << doc > > It looks like #update is _much_ slower that #<<. Is it as expected? Hi Sergei, Yes, it is expected. When you use update it has to look up the document with the same id. It then checks each field in the document to see which fields have been changed and updates them. It deletes the old document and adds the new one. This also means that it has to open an IndexReader, then close it and open an IndexWriter. This is a lot of processing. If you want fast updates then you need to do it yourself. Just adding a document doesn't even open an IndexReader, so it is going to be faster than updating a document no matter how you do it. The fastest way to update documents is in a batch. So if you want to update 10 documents, delete all 10 together, then add the 10 updated documents together. Hope that helps, Dave > Sergei. > > -- > Posted via http://www.ruby-forum.com/.
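David's explanation boils down to this: an update is a delete (which needs an IndexReader) followed by an add (which needs an IndexWriter), and switching between the two is what costs time, so grouping all the deletes and then all the adds reduces the number of switches. The sketch below is a toy, pure-Ruby model of that cost argument only. `ToyIndex`, `update` and `update_batch` are hypothetical names, nothing here calls Ferret, and it counts reader/writer switches rather than measuring real time.

```ruby
# Toy model, NOT Ferret's API: deleting needs a "reader", adding needs a
# "writer", and each switch between them stands in for closing one handle
# and opening the other (the expensive part in David's explanation).
class ToyIndex
  attr_reader :switches, :docs

  def initialize
    @docs = {}
    @mode = nil
    @switches = 0
  end

  # One-at-a-time update: delete, then add (two switches per call).
  def update(id, doc)
    delete_all([id])
    add_all([doc])
  end

  # Batched update: one delete pass, then one add pass.
  def update_batch(docs)
    delete_all(docs.map { |d| d[:id] })
    add_all(docs)
  end

  private

  def use(mode)
    return if @mode == mode
    @switches += 1
    @mode = mode
  end

  def delete_all(ids)
    use(:reader)
    ids.each { |id| @docs.delete(id) }
  end

  def add_all(docs)
    use(:writer)
    docs.each { |d| @docs[d[:id]] = d }
  end
end

one_by_one = ToyIndex.new
10.times { |i| one_by_one.update(i.to_s, { id: i.to_s, body: "v2" }) }

batched = ToyIndex.new
batched.update_batch(Array.new(10) { |i| { id: i.to_s, body: "v2" } })

puts one_by_one.switches # 20
puts batched.switches    # 2
```

Both indexes end up holding the same 10 documents; only the number of reader/writer switches differs, which is the quantity David's advice targets.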
From srackham at methods.co.nz Sat May 20 17:36:41 2006 From: srackham at methods.co.nz (Stuart Rackham) Date: Sat, 20 May 2006 23:36:41 +0200 Subject: [Ferret-talk] [ANN] acts_as_ferret 0.2.1 In-Reply-To: References: Message-ID: Thanks Kasper -- I've been running it with Ferret 0.9.3 (C extensions), Rails 1.1.2, Ruby 1.8.4 on a Kubuntu 5.04 box for the last day and a half without any regression problems. Cheers, Stuart -- Posted via http://www.ruby-forum.com/. From tom.oristian at gmail.com Mon May 22 01:47:27 2006 From: tom.oristian at gmail.com (Tom) Date: Mon, 22 May 2006 07:47:27 +0200 Subject: [Ferret-talk] how to index the result of any instance method Message-ID: <7b42c730d49cce49b9dd38ba16909019@ruby-forum.com> Hi, One of the AAF features is to be able to index results of methods, but I haven't seen anywhere how to do this. I have a method that returns the full text of a file and I'd like this to be indexed. Can anyone out there help me out on this one? Tom -- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Mon May 22 03:37:54 2006 From: jan.prill at gmail.com (Jan Prill) Date: Mon, 22 May 2006 09:37:54 +0200 Subject: [Ferret-talk] how to index the result of any instance method In-Reply-To: <7b42c730d49cce49b9dd38ba16909019@ruby-forum.com> References: <7b42c730d49cce49b9dd38ba16909019@ruby-forum.com> Message-ID: <562a35c10605220037g4c63caaeq11d01e2a1da5f89c@mail.gmail.com> Hi, Tom, I haven't used this myself, but if I understand http://projects.jkraemer.net/acts_as_ferret/rdoc/classes/FerretMixin/Acts/ARFerret/ClassMethods.html#M000005 correctly, you are able to add symbols of your instance methods to the fields of the options hash of acts_as_ferret: "fields:names all fields to include in the index. If not given, all attributes of the class will be indexed.
You may also give symbols pointing to instance methods of your model here, i.e. to retrieve and index data from a related model. " Regards Jan On 5/22/06, Tom wrote: > > Hi, > > One of the AAF features is to be able to index results of methods, but I > haven't seen anywhere how to do this. I have a method that returns the > full text of a file and I'd like for this to be indexed. Can anyone out > there help me out on this one? > > Tom > > -- > Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon May 22 05:21:41 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 22 May 2006 11:21:41 +0200 Subject: [Ferret-talk] how to index the result of any instance method In-Reply-To: <562a35c10605220037g4c63caaeq11d01e2a1da5f89c@mail.gmail.com> References: <7b42c730d49cce49b9dd38ba16909019@ruby-forum.com> <562a35c10605220037g4c63caaeq11d01e2a1da5f89c@mail.gmail.com> Message-ID: <20060522092141.GF26544@cordoba.webit.de> Hi! On Mon, May 22, 2006 at 09:37:54AM +0200, Jan Prill wrote: > Hi, Tom, > > haven't used this myself but if I understand > http://projects.jkraemer.net/acts_as_ferret/rdoc/classes/FerretMixin/Acts/ARFerret/ClassMethods.html#M000005 correctly > you are able to add symbols of your instance methods to the fields > of the options hash of acts_as_ferret: > > "fields:names all fields to include in the index. If not given, > > all attributes of the class will be indexed. You may also give > symbols pointing to instance methods of your model here, i.e. > to retrieve and index data from a related model.
> " Right, just use the method name as a symbol in the fields list:
class MyModel < ActiveRecord::Base
  acts_as_ferret :fields => [ :full_text, 'title' ]
  def full_text
    # return content to be indexed here
  end
end
where title is a normal database-backed attribute. Cheers, Jens > On 5/22/06, Tom wrote: > > > >Hi, > > > >One of the AAF features is to be able to index results of methods, but I > >haven't seen anywhere how to do this. I have a method that returns the > >full text of a file and I'd like for this to be indexed. Can anyone out > >there help me out on this one? > > > >Tom > > > >-- > >Posted via http://www.ruby-forum.com/. -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From srackham at methods.co.nz Mon May 22 22:50:54 2006 From: srackham at methods.co.nz (Stuart Rackham) Date: Tue, 23 May 2006 04:50:54 +0200 Subject: [Ferret-talk] acts_as_ferret: dynamic index selection Message-ID: Hi, What is the best practice regarding setting the index path dynamically? My application needs to switch indexes based on the logged-in user. I put the following before_filter into my controller (the Document.index_dir method calculates the user's index path): before_filter { Document.ferret_configuration[:path] = Document.index_dir } It seems to work; the only rough edge is that the acts_as_ferret declaration immediately creates a spurious index directory rather than deferring until the index is actually accessed (and the index path is known). This also means the index directory is not automagically created.
Cheers, Stuart -- Stuart Rackham -- Posted via http://www.ruby-forum.com/. From tom.oristian at gmail.com Tue May 23 02:42:10 2006 From: tom.oristian at gmail.com (Tom) Date: Tue, 23 May 2006 08:42:10 +0200 Subject: [Ferret-talk] how to index the result of any instance method In-Reply-To: <20060522092141.GF26544@cordoba.webit.de> References: <7b42c730d49cce49b9dd38ba16909019@ruby-forum.com> <562a35c10605220037g4c63caaeq11d01e2a1da5f89c@mail.gmail.com> <20060522092141.GF26544@cordoba.webit.de> Message-ID: <19f81dea6b7169a7c199b676950569d9@ruby-forum.com> Jens Kraemer wrote: > Hi! > > On Mon, May 22, 2006 at 09:37:54AM +0200, Jan Prill wrote: >> symbols pointing to instance methods of your model here, i.e. >> to retrieve and index data from a related model. >> " > > right, just use the method name as a symbol in the fields list: > > class MyModel < ActiveRecord::Base > acts_as_ferret :fields => [ :full_text, 'title' ] > def full_text > # return content to be indexed here > end > end > > where title is a normal database backed attribute. > > Cheers, > Jens > > -- > webit! Gesellschaft für neue Medien mbH www.webit.de > Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de > Schnorrstraße 76 Tel +49 351 46766 0 > D-01069 Dresden Fax +49 351 46766 66 Thanks Jens, Now I've got the approach figured out, but I still seem to be having problems. It seems that the output of my full_text method is not actually being indexed. In fact, I've placed a breakpoint inside the method and it seems that it's never even being called.
Meanwhile, Ferret still manages to update index with every new instance of MyModel, but without the full_text value. I also placed a breakpoint in vendor/plugins/acts_as_ferret/rebuild_index.rb and it appears that IT is never called when a new model instance is created. Any thoughts? Really appreciate your help, Tom -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue May 23 03:40:30 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 May 2006 09:40:30 +0200 Subject: [Ferret-talk] how to index the result of any instance method In-Reply-To: <19f81dea6b7169a7c199b676950569d9@ruby-forum.com> References: <7b42c730d49cce49b9dd38ba16909019@ruby-forum.com> <562a35c10605220037g4c63caaeq11d01e2a1da5f89c@mail.gmail.com> <20060522092141.GF26544@cordoba.webit.de> <19f81dea6b7169a7c199b676950569d9@ruby-forum.com> Message-ID: <20060523074030.GA16446@cordoba.webit.de> On Tue, May 23, 2006 at 08:42:10AM +0200, Tom wrote: > Jens Kraemer wrote: > > Hi! > > > > On Mon, May 22, 2006 at 09:37:54AM +0200, Jan Prill wrote: > >> symbols pointing to instance methods of your model here, i.e. > >> to retrieve and index data from a related model. > >> " > > > > right, just use the method name as a symbol in the fields list: > > > > class MyModel < ActiveRecord::Base > > acts_as_ferret :fields => [ :full_text, 'title' ] > > def full_text > > # return content to be indexed here > > end > > end > > > > where title is a normal database backed attribute. > > > > Thanks Jens, > > Now I've got the approach figured out, but I seem to be having problems > yet. It seems that my full_text method is not actually being indexed. > In fact, I've placed a breakpoint inside the method and it seems that > it's never even being called. Meanwhile, Ferret still manages to update > index with every new instance of MyModel, but without the full_text > value. 
I also placed a breakpoint in vendor/plugins/acts_as_ferret/rebuild_index.rb and it appears that IT is never called when a new model instance is created. Any thoughts? What version of acts_as_ferret do you use? Could you try to upgrade from svn? rebuild_index.rb was removed some time ago as it is obsolete. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kontakt at daeltar.org Tue May 23 04:51:42 2006 From: kontakt at daeltar.org (stuff) Date: Tue, 23 May 2006 10:51:42 +0200 Subject: [Ferret-talk] Search parts of words? Message-ID: <12282a153f885d5c9158e909affe2869@ruby-forum.com> Hello, how is it possible to search for only parts of words? For example, if I have "Lazy fog was jumping over microstadio." in the index, I want it to be returned for the following search: "micro" -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Tue May 23 06:08:11 2006 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 23 May 2006 12:08:11 +0200 Subject: [Ferret-talk] Search parts of words? In-Reply-To: <12282a153f885d5c9158e909affe2869@ruby-forum.com> References: <12282a153f885d5c9158e909affe2869@ruby-forum.com> Message-ID: <20060523100811.GA17901@cordoba.webit.de> On Tue, May 23, 2006 at 10:51:42AM +0200, stuff wrote: > Hello, > > how is posibble to search only parts of words for exapmple if i have in > index "Lazy fog was jumping over microstadio." I want this one will > return for following search: "micro" Search for micro* (see http://ferret.davebalmain.com/api/classes/Ferret/QueryParser.html). If you don't use Ferret's Query Parser, you can construct a WildCardQuery for the Term 'micro' manually, too. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From john.m.andrews at gmail.com Tue May 23 14:05:17 2006 From: john.m.andrews at gmail.com (John Andrews) Date: Tue, 23 May 2006 14:05:17 -0400 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault Message-ID: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> Hi, I just installed via script/plugin from the stable svn tag (revision 54). I'm running Rails 1.1.2 and have Ferret 0.9.3 (with C extensions; no compilation problems). I put together a simple model to test it and I'm getting a segfault. The model: class Report < ActiveRecord::Base acts_as_ferret :fields => [:title, :name] end $ script/console Loading development environment. >> a = Report.new :title => "test", :name => "me" => #"me", "title"=>"test", "auto_update"=>true, "path"=>nil}, @new_record=true> >> a.save ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:466: [BUG] Segmentation fault ruby 1.8.4 (2005-12-24) [i386-linux] vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:466 reads: self.class.ferret_index << self.to_doc if @ferret_reindex By inserting logger.debug statements I have been able to determine that self.to_doc isn't the problem, so I think it must be segfaulting during the <<. Does someone know if I have set up something improperly, or is this truly a bug? Thanks, John From jan.prill at gmail.com Tue May 23 15:36:16 2006 From: jan.prill at gmail.com (Jan Prill) Date: Tue, 23 May 2006 21:36:16 +0200 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> Message-ID: <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> Hi, John, my guess is that the strings you are trying to index are not compliant with the locale of your system. This might especially be the case if you retrieve the strings you want to index through Ajax. Please have a look at:
1. http://www.ruby-forum.com/topic/64430
2. http://projects.jkraemer.net/acts_as_ferret/wiki/TypoWithFerret (at the bottom)
3. http://ferret.davebalmain.com/trac/ticket/55
Maybe you could print the strings to the log file. If they look a little strange, like query=%C3%A5&commit=search, you should definitely try different locale settings. Regards Jan On 5/23/06, John Andrews wrote: > > Hi, > I just installed via script/plugin from the stable svn tag. (Revision 54) > I'm running Rails 1.1.2 and have Ferret 0.9.3 (with C extensions. no > compilation problems) > > I put together a simple model to test it and I'm getting a segfault. > The model: > class Report < ActiveRecord::Base > acts_as_ferret :fields => [:title, :name] > end > > $ script/console > Loading development environment.
> >> a = Report.new :title => "test", :name => "me" > => #"me", "title"=>"test", > "auto_update"=>true, "path"=>nil}, @new_record=true> > >> a.save > ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:466: > [BUG] Segmentation fault > ruby 1.8.4 (2005-12-24) [i386-linux] > > vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:466 reads: > self.class.ferret_index << self.to_doc if @ferret_reindex > by inserting logger.debug statements I have been able to determine that > self.to_doc isn't the problem so I think it must be segfaulting during the > <<. > > Does someone know if I have set up something improperly, or is this truly > a bug? > Thanks, > John > From jan.prill at gmail.com Tue May 23 15:47:58 2006 From: jan.prill at gmail.com (Jan Prill) Date: Tue, 23 May 2006 21:47:58 +0200 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> Message-ID: <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> Sorry, I just realised that you had already included the request in your mail. With 'me' and 'test' there should be no locale problems. If these are the strings you are actually using, my previous mail won't be of much help... Have you tried to index something using Ferret without acts_as_ferret? I would encourage you to do so by following the short tutorial at http://ferret.davebalmain.com/api/files/TUTORIAL.html .
By doing this you've eliminated a 'single point of failure' and you'll find out if it is a problem of your installation of ferret. Regards Jan On 5/23/06, Jan Prill wrote: > > Hi, John, > > my guess is that you have problems with the strings you try to index not > being compliant with the locale of your system. This might be especially the > case if you retrieve the strings you want to index through ajax. > > Please have a look at: > > 1. http://www.ruby-forum.com/topic/64430 > 2. http://projects.jkraemer.net/acts_as_ferret/wiki/TypoWithFerret (at the > bottom) > 3. http://ferret.davebalmain.com/trac/ticket/55 > > maybe you could print the strings to the logfile. if they sound a little > strange like query=%C3%A5&commit=search you definitly should try different > locale settings. > > Regards > Jan > > > > On 5/23/06, John Andrews wrote: > > > Hi, > I just installed via script/plugin from the stable svn tag. (Revision 54) > I'm running Rails 1.1.2 and have Ferret 0.9.3 (with C extensions. no > compilation problems) > > I put together a simple model to test it and I'm getting a segfault. > The model: > class Report < ActiveRecord::Base > acts_as_ferret :fields => [:title, :name] > end > > $ script/console > Loading development environment. > >> a = Report.new :title => "test", :name => "me" > => #"me", "title"=>"test", > "auto_update"=>true, "path"=>nil}, @new_record=true> > >> a.save > ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:466: > [BUG] Segmentation fault > ruby 1.8.4 (2005-12-24) [i386-linux] > > vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:466 reads: > self.class.ferret_index << self.to_doc if @ferret_reindex > by inserting logger.debug statements I have been able to determine that > self.to_doc isn't the problem so I think it must be segfaulting during the > <<. > > Does someone know if I have set up something improperly, or is this truly > a bug? 
> Thanks, > John From john.m.andrews at gmail.com Tue May 23 16:22:32 2006 From: john.m.andrews at gmail.com (John Andrews) Date: Tue, 23 May 2006 16:22:32 -0400 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> Message-ID: <8bc6d8730605231322h2e6b2e43t9a91cfd400aaa9ea@mail.gmail.com> On 5/23/06, Jan Prill wrote: > > sorry, I just realised that you already included the request in your mail. with 'me' and 'test' there should be no locale problems. If these are the strings you are actually using my previous mail won't be of much help... That's right. I have also tried longer strings with no strange characters such as "accounts payable" and had the same result. > Have you tried to index something by using ferret without acts_as_ferret? I would encourage you to do so by following the short tutorial at http://ferret.davebalmain.com/api/files/TUTORIAL.html . By doing this you've eliminated a 'single point of failure' and you'll find out if it is a problem of your installation of ferret. I thought I had, but it must have been on my OSX machine. On the machine in question (Fedora 5) I performed the steps from the tutorial.
Similar result; this time during search instead of insert:
irb(main):002:0> require 'rubygems'
=> true
irb(main):003:0> require 'ferret'
=> true
irb(main):004:0> include Ferret
=> Object
irb(main):005:0> index = Index::Index.new
=> #
irb(main):006:0> index << "this is a new document to be indexed"
=> #
irb(main):007:0> index << ["and here", "is another", "new document", "to index"]
=> #
irb(main):008:0> index << {:title => "Programming Ruby", :content => "blah blah blah"}
=> #
irb(main):009:0> index << {:title => "Programming Java", :content => "yada yada yada"}
=> #
irb(main):010:0> index.search "document"
(irb):10: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i386-linux]
So my problem lies within ferret. Any ideas what I should try from here? Thanks for your help -John From jordan.w.frank at gmail.com Tue May 23 18:08:28 2006 From: jordan.w.frank at gmail.com (Jordan) Date: Wed, 24 May 2006 00:08:28 +0200 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: <8bc6d8730605231322h2e6b2e43t9a91cfd400aaa9ea@mail.gmail.com> References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> <8bc6d8730605231322h2e6b2e43t9a91cfd400aaa9ea@mail.gmail.com> Message-ID: I experience the same problem, although it occurs on line 227, since the indexes are being created for the first time. Strangely enough, it works perfectly fine on my OSX Laptop, but it dies on our Fedora Core Linux server. The following is the error I get.
/var/www/project/config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i386-linux]
Aborted
John Andrews wrote: > On 5/23/06, Jan Prill wrote: >> >> sorry, I just realised that you already included the request in your mail. with 'me' and 'test' there should be no locale problems.
If these are the strings you are actually using my previous mail won't be of much help... > > That's right. I have also tried longer strings with no strange > characters such as "accounts payable" and had the same result. > >> Have you tried to index something by using ferret without acts_as_ferret? I would encourage you to do so by following the short tutorial at http://ferret.davebalmain.com/api/files/TUTORIAL.html . By doing this you've eliminated a 'single point of failure' and you'll find out if it is a problem of your installation of ferret. > > I thought I had, but it must have been on my OSX machine. On the > machine in question (Fedora 5) I performed the steps from the > tutorial. Similar result; this time during search instead of insert: > irb(main):002:0> require 'rubygems' > => true > irb(main):003:0> require 'ferret' > => true > irb(main):004:0> include Ferret > => Object > irb(main):005:0> index = Index::Index.new > => # > irb(main):006:0> index << "this is a new document to be indexed" > => # > irb(main):007:0> index << ["and here", "is another", "new document", > "to index"]=> # > irb(main):008:0> index << {:title => "Programming Ruby", :content => > "blah blah blah"} > => # > irb(main):009:0> index << {:title => "Programming Java", :content => > "yada yada yada"} > => # > irb(main):010:0> index.search "document" > (irb):10: [BUG] Segmentation fault > ruby 1.8.4 (2005-12-24) [i386-linux] > > So my problem lies within ferret. Any ideas what I should try from here? > Thanks for your help > -John -- Posted via http://www.ruby-forum.com/. 
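One way to follow Jan's advice of ruling out a 'single point of failure', without losing the whole irb or console session on every crash, is to run the suspect steps in a child process and look at how that process ended. This is a minimal sketch assuming a Unix-like system with fork (so not Windows) and a current Ruby; run_isolated is a hypothetical helper, not part of Ferret or acts_as_ferret. A crash inside a C extension normally shows up as a signal rather than an exception: Ruby's own "[BUG]" handler aborts the process, which is why Jordan's output ends with "Aborted".

```ruby
# Hypothetical helper: run a block in a forked child so that a crash in a
# native extension (SIGSEGV, or SIGABRT from Ruby's "[BUG]" handler)
# cannot take down the calling process. Requires fork (Unix-like systems).
def run_isolated
  pid = fork do
    begin
      yield
      exit!(0)   # exit! skips at_exit handlers inherited from the parent
    rescue Exception => e
      warn "child raised #{e.class}: #{e.message}"
      exit!(1)
    end
  end
  _pid, status = Process.wait2(pid)
  if status.signaled?
    # Translate the signal number back to a name such as "SEGV" or "ABRT".
    name = Signal.list.key(status.termsig) || status.termsig.to_s
    { :result => :crashed, :signal => name }
  elsif status.exitstatus == 0
    { :result => :ok }
  else
    { :result => :raised, :exitstatus => status.exitstatus }
  end
end
```

For example, the tutorial steps from John's session could go inside the block: `run_isolated { require 'ferret'; i = Ferret::Index::Index.new; i << "a new document"; i.search("document") }`. A `:crashed` result with no acts_as_ferret code involved points at the Ferret build itself rather than at the plugin.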
From jan.prill at gmail.com Tue May 23 18:14:31 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 24 May 2006 00:14:31 +0200 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> <8bc6d8730605231322h2e6b2e43t9a91cfd400aaa9ea@mail.gmail.com> Message-ID: <562a35c10605231514y2da58adfs717bdf552270cd70@mail.gmail.com> And the interlinking element seems to be fedora... For both of you there are no problems on osx/freebsd. Installations on ubuntu don't report these problems and my install on gentoo is fine too. So we need to find out what's the problem with fedora.. Are people that are on redhat experiencing the same problems? Regards Jan On 5/24/06, Jordan wrote: > > I experience the same problem, although it occurs on line 227, since the > indexes are being created for the first time. Strangely enough, it works > perfectly fine on my OSX Laptop, but it dies on our Fedora Core Linux > server. The following is the error I get. > > > /var/www/project/config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:227: > [BUG] Segmentation fault > ruby 1.8.4 (2005-12-24) [i386-linux] > > Aborted > > > John Andrews wrote: > > On 5/23/06, Jan Prill wrote: > >> > >> sorry, I just realised that you already included the request in your > mail. with 'me' and 'test' there should be no locale problems. If these are > the strings you are actually using my previous mail won't be of much help... > > > > That's right. I have also tried longer strings with no strange > > characters such as "accounts payable" and had the same result. > > > >> Have you tried to index something by using ferret without > acts_as_ferret? I would encourage you to do so by following the short > tutorial at http://ferret.davebalmain.com/api/files/TUTORIAL.html . 
By > doing this you've eliminated a 'single point of failure' and you'll find out > if it is a problem of your installation of ferret. > > > > I thought I had, but it must have been on my OSX machine. On the > > machine in question (Fedora 5) I performed the steps from the > > tutorial. Similar result; this time during search instead of insert: > > irb(main):002:0> require 'rubygems' > > => true > > irb(main):003:0> require 'ferret' > > => true > > irb(main):004:0> include Ferret > > => Object > > irb(main):005:0> index = Index::Index.new > > => # > > irb(main):006:0> index << "this is a new document to be indexed" > > => # > > irb(main):007:0> index << ["and here", "is another", "new document", > > "to index"]=> # > > irb(main):008:0> index << {:title => "Programming Ruby", :content => > > "blah blah blah"} > > => # > > irb(main):009:0> index << {:title => "Programming Java", :content => > > "yada yada yada"} > > => # > > irb(main):010:0> index.search "document" > > (irb):10: [BUG] Segmentation fault > > ruby 1.8.4 (2005-12-24) [i386-linux] > > > > So my problem lies within ferret. Any ideas what I should try from here? > > Thanks for your help > > -John > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060524/18428843/attachment.htm From jordan.w.frank at gmail.com Tue May 23 21:39:42 2006 From: jordan.w.frank at gmail.com (Jordan) Date: Wed, 24 May 2006 03:39:42 +0200 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: <562a35c10605231514y2da58adfs717bdf552270cd70@mail.gmail.com> References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> <8bc6d8730605231322h2e6b2e43t9a91cfd400aaa9ea@mail.gmail.com> <562a35c10605231514y2da58adfs717bdf552270cd70@mail.gmail.com> Message-ID: <1b45d5a7fa8ce7cab615c8d30ca10b14@ruby-forum.com> Jan Prill wrote: > And the interlinking element seems to be fedora... For both of you there > are > no problems on osx/freebsd. Installations on ubuntu don't report these > problems and my install on gentoo is fine too. So we need to find out > what's > the problem with fedora.. Are people that are on redhat experiencing the > same problems? > > Regards > Jan I just tried rebuilding ferret with -Os instead of -O0 as was in the makefile that was generated (i'm sorta new to this whole ruby thing so I don't know how that gets created). And now it works fine. So after running ruby setup.rb config I went into ext/ modified Makefile, changed -O0 to -Os in the CFLAGS, and ran make, then went back up a directory and ran rake setup.rb and that installed it, and now everything is working just fine. Jordan -- Posted via http://www.ruby-forum.com/. From m-lists at bristav.se Wed May 24 02:02:26 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Wed, 24 May 2006 08:02:26 +0200 Subject: [Ferret-talk] How to compile on Windows? Message-ID: <3fb592588e82560e76b60f4a111846da@ruby-forum.com> I need the UTF-8 support and I guess I therefore need to compile the C extension. 
I'm living on Windows right now and haven't done any serious compilation of C stuff ever in Windows. What is the recommended setup? I have tried Visual Studio C++ Express 2005 (I think) and added all header-files from the Platform SDK (the first one was missing a number of header files). It starts to compile now but I get errors. What should I do to make it compile? Should I perhaps use MinGW? Regards, Marcus Andersson -- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Wed May 24 02:24:29 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 24 May 2006 08:24:29 +0200 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: <1b45d5a7fa8ce7cab615c8d30ca10b14@ruby-forum.com> References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> <8bc6d8730605231322h2e6b2e43t9a91cfd400aaa9ea@mail.gmail.com> <562a35c10605231514y2da58adfs717bdf552270cd70@mail.gmail.com> <1b45d5a7fa8ce7cab615c8d30ca10b14@ruby-forum.com> Message-ID: <562a35c10605232324v4e9e5320n55c132067203534@mail.gmail.com> @Jordan: great, thanks for pointing this out! @John: does this do the trick for you too? Jan On 5/24/06, Jordan wrote: > > Jan Prill wrote: > > And the interlinking element seems to be fedora... For both of you there > > are > > no problems on osx/freebsd. Installations on ubuntu don't report these > > problems and my install on gentoo is fine too. So we need to find out > > what's > > the problem with fedora.. Are people that are on redhat experiencing the > > same problems? > > > > Regards > > Jan > > I just tried rebuilding ferret with -Os instead of -O0 as was in the > makefile that was generated (i'm sorta new to this whole ruby thing so I > don't know how that gets created). And now it works fine.
> > So after running > ruby setup.rb config > I went into ext/ modified Makefile, changed -O0 to -Os in the CFLAGS, > and ran make, then went back up a directory and ran > rake setup.rb > and that installed it, and now everything is working just fine. > > Jordan > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060524/b54e8b84/attachment.htm From nick.snels at gmail.com Wed May 24 02:31:06 2006 From: nick.snels at gmail.com (Nick Snels) Date: Wed, 24 May 2006 08:31:06 +0200 Subject: [Ferret-talk] How to compile on Windows? In-Reply-To: <3fb592588e82560e76b60f4a111846da@ruby-forum.com> References: <3fb592588e82560e76b60f4a111846da@ruby-forum.com> Message-ID: <7e7727257ddad3968ccf3f6c90a20e65@ruby-forum.com> Hi Marcus, currently you can't compile Ferret on Windows. For the moment you are stuck with the pure Ruby version on Windows. Dave is working hard to make a Windows version of Ferret in C, so let's hope it will be finished soon. Kind regards, Nick -- Posted via http://www.ruby-forum.com/. From m-lists at bristav.se Wed May 24 03:57:13 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Wed, 24 May 2006 09:57:13 +0200 Subject: [Ferret-talk] Offline indexing with Drb, online using index in Rails? Message-ID: <3b5cd0cc30c78a3bd3922b1b3d2b350f@ruby-forum.com> I have a Rails application where I need to search. Every time an update happens to something I need to update the index for the changed and related document. As I currently have to run on Windows (and therefore can't use the native extension), indexing is sometimes quite slow and the user has to wait a couple of seconds for the save operation to return.
A thought is to put this outside the Rails application in an external service that is called through Drb. Is it possible to index with one application and use the index with another at the same time? Regards, Marcus -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed May 24 04:00:16 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 24 May 2006 10:00:16 +0200 Subject: [Ferret-talk] Offline indexing with Drb, online using index in Rails? In-Reply-To: <3b5cd0cc30c78a3bd3922b1b3d2b350f@ruby-forum.com> References: <3b5cd0cc30c78a3bd3922b1b3d2b350f@ruby-forum.com> Message-ID: <20060524080016.GD17901@cordoba.webit.de> On Wed, May 24, 2006 at 09:57:13AM +0200, Marcus Andersson wrote: > I have a Rails application where I need to search. Every time an update > happens to something I need to update the index for the changed and > related document. As I currently have to run on Windows (and therefore > can't use the native extension) it is quite slow to index sometimes and > the user have to wait a couple of seconds for the save operation to > return. > > A thought is to put this outside the Rails application in an external > service that is called through Drb. Is it possible to index with one > application and use the index with another at the same time? As long as only one process is writing to the index, that's fine. But you have to reopen searchers to see the changes made to the index. Jens -- webit!
Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From john.m.andrews at gmail.com Wed May 24 08:33:29 2006 From: john.m.andrews at gmail.com (John Andrews) Date: Wed, 24 May 2006 08:33:29 -0400 Subject: [Ferret-talk] acts_as_ferret 0.2.1 segfault In-Reply-To: <562a35c10605232324v4e9e5320n55c132067203534@mail.gmail.com> References: <8bc6d8730605231105u11c22d94v94a6ced772915b3@mail.gmail.com> <562a35c10605231236x7316e44ayf37c3bb47b0e49a@mail.gmail.com> <562a35c10605231247q692be2ees6aeb84b520e5b5b9@mail.gmail.com> <8bc6d8730605231322h2e6b2e43t9a91cfd400aaa9ea@mail.gmail.com> <562a35c10605231514y2da58adfs717bdf552270cd70@mail.gmail.com> <1b45d5a7fa8ce7cab615c8d30ca10b14@ruby-forum.com> <562a35c10605232324v4e9e5320n55c132067203534@mail.gmail.com> Message-ID: <8bc6d8730605240533l1d3e869cxc7063234e08d6d01@mail.gmail.com> I followed Jordan's steps and the tests all pass now. Thanks Jordan! Thanks Jan! On 5/24/06, Jan Prill wrote: > @Jordan: great, thanks for pointing this out! > > @John: does this the trick for you too? > > Jan > > > On 5/24/06, Jordan < jordan.w.frank at gmail.com> wrote: > > Jan Prill wrote: > > > And the interlinking element seems to be fedora... For both of you there > > > are > > > no problems on osx/freebsd. Installations on ubuntu don't report these > > > problems and my install on gentoo is fine too. So we need to find out > > > what's > > > the problem with fedora.. Are people that are on redhat experiencing the > > > same problems? > > > > > > Regards > > > Jan > > > > I just tried rebuilding ferret with -Os instead of -O0 as was in the > > makefile that was generated (i'm sorta new to this whole ruby thing so I > > don't know how that gets created). And now it works fine.
> > > > So after running > > ruby setup.rb config > > I went into ext/ modified Makefile, changed -O0 to -Os in the CFLAGS, > > and ran make, then went back up a directory and ran > > rake setup.rb > > and that installed it, and now everything is working just fine. > > > > Jordan > > > > > > -- > > Posted via http://www.ruby-forum.com/. > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > > From m-lists at bristav.se Wed May 24 08:35:20 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Wed, 24 May 2006 14:35:20 +0200 Subject: [Ferret-talk] Offline indexing with Drb, online using index in Rails? In-Reply-To: <20060524080016.GD17901@cordoba.webit.de> References: <3b5cd0cc30c78a3bd3922b1b3d2b350f@ruby-forum.com> <20060524080016.GD17901@cordoba.webit.de> Message-ID: <8c2e1a0191a7759f6292260ef61646b2@ruby-forum.com> Jens Kraemer wrote: > On Wed, May 24, 2006 at 09:57:13AM +0200, Marcus Andersson wrote: > as long as only one process is writing to the index that's fine. But you > have to reopen searchers to see the changes done to the index. > How does that work in a normal Rails application if you run in a production environment (ie cluster of Mongrels, FCGI or SCGI where you always have a number of processes running)? I mean, every instance of the Rails env might want to add something to or read from the index. /Marcus -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Wed May 24 09:55:26 2006 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 24 May 2006 15:55:26 +0200 Subject: [Ferret-talk] Offline indexing with Drb, online using index in Rails? 
In-Reply-To: <8c2e1a0191a7759f6292260ef61646b2@ruby-forum.com> References: <3b5cd0cc30c78a3bd3922b1b3d2b350f@ruby-forum.com> <20060524080016.GD17901@cordoba.webit.de> <8c2e1a0191a7759f6292260ef61646b2@ruby-forum.com> Message-ID: <20060524135526.GF17901@cordoba.webit.de> On Wed, May 24, 2006 at 02:35:20PM +0200, Marcus Andersson wrote: > Jens Kraemer wrote: > > On Wed, May 24, 2006 at 09:57:13AM +0200, Marcus Andersson wrote: > > as long as only one process is writing to the index that's fine. But you > > have to reopen searchers to see the changes done to the index. > > > How does that work in a normal Rails application if you run in a > production environment (ie cluster of Mongrels, FCGI or SCGI where you > always have a number of processes running)? I mean, every instance of > the Rails env might want to add something to or read from the index. There's a file-based locking mechanism: Ferret's Index class handles possible collisions by waiting and retrying the write after a short time. It also handles re-opening the searcher for you. If you don't use the Index class, it's up to you to detect changes to the index and re-open your searcher. There's a method named latest? somewhere; you could look into the Index class to see how it's done there. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From m-lists at bristav.se Wed May 24 11:37:56 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Wed, 24 May 2006 17:37:56 +0200 Subject: [Ferret-talk] Ferret slow after a while Message-ID: I'm building a new index from scratch based on a number of documents stored in a database, loaded using my Rails env (using Ruby Ferret 0.9x (installed today with Gem) on Windows). At first everything goes well, but after a number of documents it starts to go slower and slower until it grinds to a halt (at least it feels like it).
Am I doing something wrong? Is there some way to work around this? /Marcus Code in question:

ENV['RAILS_ENV'] ||= 'development'
puts "Environment : #{ENV['RAILS_ENV']}"

require 'config/environment.rb'

require 'ferret'

index = Ferret::Index::Index.new(:path => Node.class_index_dir, :create => true)
Node.find_all_by_type("PageNode").each { |content|
  puts "ID: #{content.id} => name: #{content.title}"
  index << content.to_doc if content.respond_to?("to_doc")
}
index.flush
index.optimize
index.close

-- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Wed May 24 12:56:19 2006 From: jan.prill at gmail.com (Jan Prill) Date: Wed, 24 May 2006 18:56:19 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: References: Message-ID: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> Hi, Marcus, by using Ferret 0.9.3 on Windows you are using the 'pure pure' Ruby version. As I read some time ago, someone (I think it was Jens Kraemer) suggested that on Windows downgrading to 0.3.2 might be a good idea, because that version comes with a native extension (not as feature-rich as cFerret of course, but a predecessor) even on Windows. Pure Ruby, as clean and wonderful as the language is, is slow compared to Java or C, and therefore pure Ruby Ferret isn't really the first choice for building up an index of a large document set. Another possibility you might want to consider while waiting for cFerret on Windows is to do the initial huge indexing batch on a Linux or OS X/FreeBSD machine, transfer the index, and perform only the ongoing updates on Windows. Regardless of what I've said before: what performance are you experiencing with your pure Ruby installation? How many records do you need to index initially? After how many records do you hit the bottleneck?
Regards Jan On 5/24/06, Marcus Andersson wrote: > > I'm building a new index from scratch based on a number of documents > stored in a database loaded using my Rails env (using Ruby Ferret 0.9x > (installed today with Gem) on Windows). At first everything goes nice > but after a number of documents it starts to go slower and slower until > it grinds to a halt (at least feels like it). > > Am I doing something wrong? Is there some way to work around this? > > /Marcus > > Code in question: > > ENV['RAILS_ENV'] ||= 'development' > puts "Environment : #{ENV['RAILS_ENV']}" > > require 'config/environment.rb' > > require 'ferret' > > index = Ferret::Index::Index.new( :path => Node.class_index_dir, :create > => true) > Node.find_all_by_type("PageNode").each { |content| > puts "ID: #{content.id} => name: #{content.title}" > index << content.to_doc if content.respond_to?("to_doc") > } > index.flush > index.optimize > index.close > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060524/c80d3b16/attachment.htm From srackham at methods.co.nz Wed May 24 18:14:09 2006 From: srackham at methods.co.nz (Stuart Rackham) Date: Thu, 25 May 2006 00:14:09 +0200 Subject: [Ferret-talk] acts_as_ferret: dynamic index selection In-Reply-To: References: Message-ID: Stuart Rackham wrote: > Hi > > What is the best practice regarding setting the index path dynamically?
> > My application needs to switch indexes based on the logged in user, I > put the following before_filter into my controller (the > Document.index_dir method calculates the user's index path): > > before_filter { Document.ferret_configuration[:path] = > Document.index_dir } > > It seems to work, the only rough edge is that the acts_as_ferret > declaration immediately creates a spurious index directory rather than > deferring until the index is actually accessed (and the index path > known), this also means the index directory is not automagically > created. P.S. The above didn't quite work; I needed to set the index directory in two places for it to function when adding to the index and searching:

In the controller:

  before_filter { Document.set_index_dir }

In the model:

  # Set the Ferret index to that of the current user's account.
  def self.set_index_dir
    # Two places are used internally by acts_as_ferret.
    Document.ferret_configuration[:path] = Document.index_dir
    Document.configuration[:index_dir] = Document.index_dir
  end

> > Cheers, Stuart > -- > Stuart Rackham -- Posted via http://www.ruby-forum.com/. From m-lists at bristav.se Thu May 25 09:22:06 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Thu, 25 May 2006 15:22:06 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> Message-ID: <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> Jan Prill wrote: > > Regardless what I've said before: What performance are you experiencing > with > your pure ruby installation? How much datasets do you need to index > initially? When (after how much datasets) are you experiencing the > bottleneck? > After doing quite a bit more testing, it seems that speed is content dependent. The content appears to be ugly test content where someone has just made random keystrokes. The content it chokes on is down at the end.
/Marcus Each document is built this way (documents may contain UTF-8 chars but I ignore that for now):

class Node < ActiveRecord::Base
  acts_as_ferret ...
end

class PageNode < Node
  def to_doc
    doc = super
    page.content_items.each { |item| item.to_doc(doc) if item.searchable? } if page
    doc
  end
end

class ContentItem
  def to_doc(doc)
    doc << Ferret::Document::Field.new(
      'content_item', self.content,
      Ferret::Document::Field::Store::NO,
      Ferret::Document::Field::Index::TOKENIZED)
  end
end

Content:

Huvudrubrik svart

ldfkgjdflkgjdflkgjdflgkdflgkdflgkjdflkgj

Huvudrubrik orange

sdlkfjsdfkljsdlfksjdflsjflskfjslkfjslkdfsd
fsd fsdfsd
fsdfsdfsddfdsdfsdf

Underrubrik svart

dfgfgdfgdfgdfgdfgdfgdfgdf
gdfgdgdfgkjhdfkjghdkjgh dkjghd kgjhd kgfjh d

Underrubrik orange

lkdfjgldfkgjdlfkgjdlfkgjdflkgdfg
dfgdfgdfgdfgdfgdfg

Styckerubrik svart
fghfhfghfhkfjglhkjfglhkfjhlkfjghlfkhjflkgh jflgkhjflgkhf
ghfghfgh
fgh
fghfghfghgfh

Styckerubrik orange
fghkfgjhlfgkjhflkghj flghkjfgl hkjfg lhkfgjhlfgkhfghfgh
-- Posted via http://www.ruby-forum.com/. From jan.prill at gmail.com Thu May 25 09:50:43 2006 From: jan.prill at gmail.com (Jan Prill) Date: Thu, 25 May 2006 15:50:43 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> Message-ID: <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> Hi, Marcus, I don't know much about the internals of Ferret, but I'm not too surprised that Ferret is choking on this 'content'. Like all full-text search engines, Ferret presumes that what it indexes is human-readable language. On random keystrokes, the checks in the stemming and analyzing (and so on) algorithms will only succeed by coincidence, which at the very least results in lengthy parsing. Is this only because it's hard to get 'real world' test content? You'll find loads of content on http://www.gutenberg.org/ for example... Regards Jan On 5/25/06, Marcus Andersson wrote: > > Jan Prill wrote: > > > > Regardless what I've said before: What performance are you experiencing > > with > > your pure ruby installation? How much datasets do you need to index > > initially? When (after how much datasets) are you experiencing the > > bottleneck? > > > After doing quite a bit more of testing it seems that speed seems to be > content dependant. The content is ugly test content it seems where > someone have just made random key strokes. > > Content that it shokes on is down at the end. > > /Marcus > > Each document is built this way (documents may contain UTF-8 chars but I > ignore that for now): > > class Node < ActiveRecord::Base > acts_as_ferret ... > end > > class PageNode < Node > def to_doc > doc = super > page.content_items.each { |item| item.to_doc(doc) if > item.searchable?
} if page > doc > end > end > > class ContentItem > def to_doc(doc) > doc << Ferret::Document::Field.new( > 'content_item', self.content, > Ferret::Document::Field::Store::NO, > Ferret::Document::Field::Index::TOKENIZED) > end > end > > Content: > > > > > > > >

Huvudrubrik > svart

ldfkgjdflkgjdflkgjdflgkdflgkdflgkjdflkgj

Huvudrubrik > orange

sdlkfjsdfkljsdlfksjdflsjflskfjslkfjslkdfsd
fsd > fsdfsd
fsdfsdfsddfdsdfsdf

Underrubrik > svart

dfgfgdfgdfgdfgdfgdfgdfgdf
gdfgdgdfgkjhdfkjghdkjgh dkjghd > kgjhd kgfjh d

Underrubrik > > orange

lkdfjgldfkgjdlfkgjdlfkgjdflkgdfg
dfgdfgdfgdfgdfgdfg

Styckerubrik > svart
fghfhfghfhkfjglhkjfglhkfjhlkfjghlfkhjflkgh > jflgkhjflgkhf
ghfghfgh
fgh
fghfghfghgfh

Styckerubrik > orange
fghkfgjhlfgkjhflkghj flghkjfgl hkjfg lhkfgjhlfgkhfghfgh
> > > > > > > -- > Posted via http://www.ruby-forum.com/. > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060525/81b01868/attachment-0001.htm From m-lists at bristav.se Thu May 25 11:15:30 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Thu, 25 May 2006 17:15:30 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> Message-ID: This is actually content from the customer's database. Most of the content in the database is real (it's in live deployment). The problem seems to be that they created a number of test pages in the beginning that are still there. How do I as a developer ensure that the content isn't of a form that Ferret chokes on? Even if I take the test data out now, I cannot guarantee that someone else won't put similar data into the database again. Then it's me, the developer, who will take the blame when search isn't working. It must be possible to either:
- somehow test the data before indexing to ensure it's not "deadly", or
- have the indexing algorithm skip a small chunk of data after a (configurable) time if it gets stuck on it
(or something like that). Would it help in this case to replace -tags with spaces (as those aren't significant anyway)? Regards Marcus ps. Thanks for the comments. -- Posted via http://www.ruby-forum.com/.
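Marcus's two ideas can be sketched in plain Ruby, independently of Ferret. The sketch below is a hypothetical pre-filter that drops very long whitespace-separated tokens (like the keyboard-mash strings in the test pages) before the text reaches the analyzer, plus a per-document time guard built on Ruby's standard timeout library. Both method names and the 30-character cut-off are assumptions, and nothing in the thread establishes that token length is what slows the pure Ruby analyzer down, so measure before relying on either.

```ruby
require 'timeout'

# Hypothetical pre-filter: drop whitespace-separated tokens longer than
# max_len characters. 30 is an arbitrary assumption; real compound words
# (e.g. Swedish "Presentationsmaterial", 21 chars) must stay under it.
def drop_long_tokens(text, max_len = 30)
  text.split.reject { |tok| tok.length > max_len }.join(' ')
end

# Hypothetical guard: returns true if the block finished within
# `seconds`, false if it was interrupted by the timeout.
def within_time_limit(seconds)
  Timeout.timeout(seconds) { yield }
  true
rescue Timeout::Error
  false
end
```

The filter is the safer of the two. Timeout can interrupt pure Ruby code such as the pure Ruby Ferret used on Windows, but it is not guaranteed to interrupt a long-running call inside a C extension, and abandoning a document half-way through `index << doc` may leave the index writer in an unknown state.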
From m-lists at bristav.se Thu May 25 11:23:07 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Thu, 25 May 2006 17:23:07 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> Message-ID: <0431a699a0cb637caf9a0197e4c1bf71@ruby-forum.com> Marcus Andersson wrote: > > Would it help in this case to replace -tags with spaces (as those > aren't significant anyway)? Answering myself here: no, it doesn't (after testing...) Marcus -- Posted via http://www.ruby-forum.com/. From m-lists at bristav.se Thu May 25 12:21:12 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Thu, 25 May 2006 18:21:12 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: <0431a699a0cb637caf9a0197e4c1bf71@ruby-forum.com> References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> <0431a699a0cb637caf9a0197e4c1bf71@ruby-forum.com> Message-ID: More testing: This document (with several fields in it) took 15 seconds to index:
Field: new item
Field: Presentationsmaterial
Field: Ppt-presentationer
Field:  
Field: new item
Field: new item
Field: new item
That's a bit long for so little content, if you ask me. I have several similar documents that take a lot of time ("new item" is an ugly default value that all content items get from the beginning, don't ask me why; does it affect indexing speed when a lot of documents contain similar tokens?). But I don't know. I'm using the Ruby version, which is supposed to be slow. Maybe the super fast C implementation would take 150ms to handle a document of this size? What affects indexing speed? Regards, Marcus -- Posted via http://www.ruby-forum.com/.
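To find out which documents are slow rather than guessing, each document can be timed on its own. This is a minimal sketch using Ruby's standard benchmark library; slowest is a hypothetical helper, and the block is whatever indexing call you want to measure.

```ruby
require 'benchmark'

# Hypothetical helper: run the block once per item, timing each call with
# Benchmark.realtime, and return the `top` slowest [item, seconds] pairs,
# slowest first.
def slowest(items, top = 5)
  timings = items.map do |item|
    [item, Benchmark.realtime { yield item }]
  end
  timings.sort_by { |_item, secs| -secs }.first(top)
end
```

In Marcus's indexing script this might be called as `slowest(Node.find_all_by_type("PageNode"), 10) { |content| index << content.to_doc }`, then printing `content.id` and the seconds for each returned pair. That would show whether the slow documents are the test pages or the ones sharing many "new item" fields.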
From jan.prill at gmail.com Thu May 25 12:30:27 2006 From: jan.prill at gmail.com (Jan Prill) Date: Thu, 25 May 2006 18:30:27 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> <0431a699a0cb637caf9a0197e4c1bf71@ruby-forum.com> Message-ID: <562a35c10605250930k21ce810cse07e97f3cb40b0bd@mail.gmail.com> Hi, Marcus, as you may read in http://ferret.davebalmain.com/trac/wiki/MyFirstBenchmark the indexing of 408MB of Project Gutenberg files took around 1 min, to give you an impression of the indexing speed. I haven't got the time right now to test the performance on a Windows box and with cFerret. Maybe someone else is able to jump in. But 15 seconds for this document is obviously strange. cheers, Jan On 5/25/06, Marcus Andersson wrote: > > More testing: > > This document (with several fields in it) took 15 seconds to index: > Field: new item > Field: Presentationsmaterial > Field: Ppt-presentationer > Field:   > Field: new item > Field: new item > Field: new item > > A bit long for that little content if you ask me. I have several similar > documents that take a lot of time ("new item" is an ugly default value > that all content items get from the beginning, don't ask me why, does it > affect indexing speed when a lot of documents contains similar tokens?). > > But, I don't know. I'm using the Ruby version. That is supposed to be > slow. Maybe the super fast C implementation should take 150ms to handle > a document of this size? What affects indexing speed? > > Regards, > Marcus > > -- > Posted via http://www.ruby-forum.com/. > -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060525/a5c08db9/attachment.htm From jan.prill at gmail.com Thu May 25 12:51:49 2006 From: jan.prill at gmail.com (Jan Prill) Date: Thu, 25 May 2006 18:51:49 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: <562a35c10605250930k21ce810cse07e97f3cb40b0bd@mail.gmail.com> References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> <0431a699a0cb637caf9a0197e4c1bf71@ruby-forum.com> <562a35c10605250930k21ce810cse07e97f3cb40b0bd@mail.gmail.com> Message-ID: <562a35c10605250951k70ee1c68qddf61d6ed35b6f1@mail.gmail.com> Hi, Marc, if it would be of any help to you and you've got the time to make some preperations you might send me a test.sql (or migration) with a little testdata and your essential AR-models. Then I may test it on a windows box and we are able to compare the results... cheers, Jan On 5/25/06, Jan Prill wrote: > > Hi, Marcus, > > as you may read in > http://ferret.davebalmain.com/trac/wiki/MyFirstBenchmark the indexing of > 408MB project gutenberg files took around 1min. To give you an impression of > the indexing speed. > > I haven't got the time right now to test the performance on a windows box > and with cFerret. Maybe anyone else is possible to jump in. but 15 seconds > for this document is obviously strange. > > cheers, > Jan > > On 5/25/06, Marcus Andersson wrote: > > > More testing: > > > > This document (with several fields in it) took 15 seconds to index: > > Field: new item > > Field: Presentationsmaterial > > Field: Ppt-presentationer > > Field:   > > Field: new item > > Field: new item > > Field: new item > > > > A bit long for that little content if you ask me. 
I have several similar > > documents that take a lot of time ("new item" is an ugly default value > > that all content items get from the beginning, don't ask me why, does it > > > > affect indexing speed when a lot of documents contains similar tokens?). > > > > But, I don't know. I'm using the Ruby version. That is supposed to be > > slow. Maybe the super fast C implementation should take 150ms to handle > > a document of this size? What affects indexing speed? > > > > Regards, > > Marcus > > > > -- > > Posted via http://www.ruby-forum.com/. > > _______________________________________________ > > Ferret-talk mailing list > > Ferret-talk at rubyforge.org > > http://rubyforge.org/mailman/listinfo/ferret-talk > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20060525/8be84348/attachment.htm From cknox at ualberta.ca Thu May 25 23:21:19 2006 From: cknox at ualberta.ca (Craig Knox) Date: Fri, 26 May 2006 05:21:19 +0200 Subject: [Ferret-talk] multi_search will not work Message-ID: <5966bdb1e327fcb21de38d15ff8b0459@ruby-forum.com> I am trying to get multi_search to search across multiple models. I have the following: class Drug < ActiveRecord::Base acts_as_ferret :store_class_name => true end class Target < ActiveRecord::Base acts_as_ferret :store_class_name => true end and I am trying to query via: @drugs = Drug.multi_search(params[:query], [Target]) But I get no results. If I go: @drugs = Drug.find_by_contents(params[:query]) I get results back no problem (same if I just query the target), but even after I delete the index and rebuild both the drug and target index I get nothing back. I don't even get any results if I go: @drugs = Drug.multi_search(params[:query]) I have tried this with the latest trunk build and the latest stable version. I am really at a loss, and was hoping someone might have some suggestions. Thanks -- Posted via http://www.ruby-forum.com/. 
From jeroenbulters at gmail.com Fri May 26 13:38:27 2006 From: jeroenbulters at gmail.com (Jeroen Bulters) Date: Fri, 26 May 2006 19:38:27 +0200 Subject: [Ferret-talk] Comparing two documents in the index Message-ID: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> I want to compare two documents in the index (i.e. retrieve the cosine similarity/score between two documents term-vector's). Is this possible using the standard Ferret functionality? Thanks in advance, Jeroen Bulters -- Posted via http://www.ruby-forum.com/. From m-lists at bristav.se Fri May 26 13:50:57 2006 From: m-lists at bristav.se (Marcus Andersson) Date: Fri, 26 May 2006 19:50:57 +0200 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: <562a35c10605250951k70ee1c68qddf61d6ed35b6f1@mail.gmail.com> References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> <0431a699a0cb637caf9a0197e4c1bf71@ruby-forum.com> <562a35c10605250930k21ce810cse07e97f3cb40b0bd@mail.gmail.com> <562a35c10605250951k70ee1c68qddf61d6ed35b6f1@mail.gmail.com> Message-ID: <9a4572b0df623e2daf8a5d306f12e985@ruby-forum.com> Jan Prill wrote: > Hi, Marc, > > if it would be of any help to you and you've got the time to make some > preperations you might send me a test.sql (or migration) with a little > testdata and your essential AR-models. Then I may test it on a windows > box > and we are able to compare the results... > > cheers, > Jan Thanks for your time. I think I wait for the windows C version though. Implemented an ugly straight db search for the time being. Regards, Marcus -- Posted via http://www.ruby-forum.com/. 
From dhwen at eml.cc Fri May 26 17:24:48 2006 From: dhwen at eml.cc (Donghui Wen) Date: Fri, 26 May 2006 23:24:48 +0200 Subject: [Ferret-talk] Could not get all matched docs Message-ID: <0cecd8fc338b58e15f68bac7eb2920a0@ruby-forum.com> Hi, I used Ferret to build an index and am trying to search it. But it looks like it always returns only 10 matching results, even though the number returned by search_each is more than 10. Did I do something wrong?
------------------
n = index.search_each("ruby") do |doc, score|
  puts index[doc]['file']
end

print n
------------------
I can only see 10 lines of output, but n is 214 in this case. Thanks in advance for your help! Donghui -- Posted via http://www.ruby-forum.com/. From dhwen at eml.cc Fri May 26 18:07:12 2006 From: dhwen at eml.cc (Donghui Wen) Date: Sat, 27 May 2006 00:07:12 +0200 Subject: [Ferret-talk] Could not get all matched docs In-Reply-To: <0cecd8fc338b58e15f68bac7eb2920a0@ruby-forum.com> References: <0cecd8fc338b58e15f68bac7eb2920a0@ruby-forum.com> Message-ID: <08a700c30e9af501b5a94b966e600514@ruby-forum.com> OK, I just figured it out: I need to add the :num_docs parameter:

index.search_each('content:"NetScreen"', :num_docs => 200)

Donghui Donghui Wen wrote: > Hi, > I used Ferret to build an index and try to search here. > But it looks like it always return only 10 matches results, > event the return number of search_each is more than 10. > Did I do something wrong? > ------------------ > > n = index.search_each("ruby") do |doc, score| > puts index[doc]['file'] > end > > print n > > ------------------ > > I can only see 10 lines' output. > But n is 214 in the case. > > Thanks in advance for your help! > > Donghui -- Posted via http://www.ruby-forum.com/.
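The same options can also be used to page through results, instead of raising :num_docs high enough to cover every hit. Below is a minimal sketch. It assumes that Ferret 0.9's `search_each` accepts a `:first_doc` offset alongside `:num_docs`. Option names differ between Ferret versions, so check the documentation for the installed release. `page_options` and `page_count` are hypothetical helpers.

```ruby
# Hypothetical helpers for paging through search_each results.
# Assumes Ferret 0.9's :first_doc (offset) and :num_docs (page size)
# options; verify the option names against your installed version.
def page_options(page, per_page = 10)
  page = page.to_i
  page = 1 if page < 1 # nil, "" or 0 fall back to the first page
  { :first_doc => (page - 1) * per_page, :num_docs => per_page }
end

# Number of pages needed to show +total_hits+ results.
def page_count(total_hits, per_page = 10)
  (total_hits + per_page - 1) / per_page
end
```

A call would look like `index.search_each(query, page_options(2)) { |doc, score| ... }`. In Donghui's case, search_each returned a total of 214, so `page_count(214)` gives 22 pages of 10 results each.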
From dbalmain.ml at gmail.com Fri May 26 19:07:38 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 27 May 2006 08:07:38 +0900 Subject: [Ferret-talk] Ferret slow after a while In-Reply-To: References: <562a35c10605240956o1ee257b7t51c503e741a26533@mail.gmail.com> <49b5953cd2277c3e3dab5ed52d1ea154@ruby-forum.com> <562a35c10605250650l1813b9ax84ad8143fea04e9@mail.gmail.com> <0431a699a0cb637caf9a0197e4c1bf71@ruby-forum.com> Message-ID: On 5/26/06, Marcus Andersson wrote: > More testing: > > This document (with several fields in it) took 15 seconds to index: > Field: new item > Field: Presentationsmaterial > Field: Ppt-presentationer > Field:   > Field: new item > Field: new item > Field: new item > > A bit long for that little content if you ask me. I have several similar > documents that take a lot of time ("new item" is an ugly default value > that all content items get from the beginning, don't ask me why, does it > affect indexing speed when a lot of documents contains similar tokens?). > > But, I don't know. I'm using the Ruby version. That is supposed to be > slow. Maybe the super fast C implementation should take 150ms to handle > a document of this size? What affects indexing speed? Hi Marcus, I just tested this here:

require 'lib/rferret.rb'
include Ferret
include Ferret::Document
include Ferret::Index

doc = Document.new
doc << Field.new(:field, "new item")
doc << Field.new(:field, "Presentationsmaterial")
doc << Field.new(:field, "Ppt-presentationer")
doc << Field.new(:field, " ")
doc << Field.new(:field, "new item")
doc << Field.new(:field, "new item")
doc << Field.new(:field, "new item")

i = Index.new(:path => "index_dir")
i << doc
i.close

dbalmain at ubuntu:~/workspace/ferret $ time ruby test.rb

real    0m0.147s
user    0m0.125s
sys     0m0.022s

This is with the pure Ruby version. If this document is taking 15 seconds then something is going wrong.
Similarly, the bad data shouldn't hurt indexing speed considerably, although it will make your index larger than usual and merging will take a little longer. Could you post a simple test case that takes a long time for you? Cheers, Dave From dbalmain.ml at gmail.com Fri May 26 19:55:36 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 27 May 2006 08:55:36 +0900 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> Message-ID: On 5/27/06, Jeroen Bulters wrote: > I want to compare two documents in the index (i.e. retrieve the cosine > similarity/score between two documents term-vector's). Is this possible > using the standard Ferret functionality? Hi Jeroen, No problem. Make sure you store term vectors when you add the field. That is:

doc.add_field(:field, "yada yada yada",
              Field::Store::NO,         # or YES
              Field::Index::TOKENIZED,  # or UNTOKENIZED
              Field::TermVector::YES)   # or anything else but NO

Then you can retrieve the term vector from an index reader like so:

term_vector = index_reader.get_term_vector(doc_num, :field)
terms = term_vector.terms # array of terms in :field in document
freqs = term_vector.freqs # array of corresponding frequencies

Hope that helps. Is that enough to get you going?
Cheers, Dave From jeroenbulters at gmail.com Sat May 27 09:09:33 2006 From: jeroenbulters at gmail.com (Jeroen Bulters) Date: Sat, 27 May 2006 15:09:33 +0200 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> Message-ID: <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> David Balmain wrote:
> doc.add_field(:field, "yada yada yada",
>               Field::Store::NO,         # or YES
>               Field::Index::TOKENIZED,  # or UNTOKENIZED
>               Field::TermVector::YES)   # or anything else but NO

I got this far:

------ BEGIN CODE SNIPPET ------
# Read weblog data
weblogs = YAML::load(File.open("weblogs.yml"))

# Walk over weblogs and save all data.
print "--- Analyzing weblogs:\n"
weblogs.each do |weblog, id|
  content = ""
  print " * Indexing weblog #{weblog}/#{id} "

  # Load the appropriate file for parsing.
  weblogdata = YAML::load(File.open("./data/#{id}"))
  weblogdata[:posts].each do |id, post|
    # Clean up content
    # by removing all UBB blocks. This will cut-out some content. I consider this
    # loss a plus :D
    content = content + "\n\n" + post[:text].gsub(/\[[^\]]+\][^\[]+\[[^\]]+\]/i, "")
    #content.gsub!(/\[[^\]]+\][^\[]+\[[^\]]+\]/i, "")
  end

  # Create a new document
  doc = Document.new
  doc.add_field(:id, weblog, Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO)
  doc.add_field(:content, content, Field::Store::NO, Field::Index::TOKENIZED, Field::TermVector::YES)

  # And add to the index.
  index << doc
  index.flush
  print "done.\n"
end
------ END CODE SNIPPET ------

I index about 23000 weblogs, with their weblog id as the document id and the content stored with term vectors. Now I want to compare two weblogs. So what you suggest is that I retrieve the term vectors for both documents and calculate the dot product of the two vectors myself; or is there a nice Ferret way to do this? Thanks in advance, Jeroen Bulters -- Posted via http://www.ruby-forum.com/.
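Computing the similarity by hand starts from the parallel `terms` and `freqs` arrays that `get_term_vector` returns, as shown in Dave's earlier reply. The cosine similarity is the dot product of the two frequency vectors divided by the product of their lengths. Below is a minimal pure-Ruby sketch. `term_freqs` and `cosine_similarity` are hypothetical helpers, and they use raw term frequencies only, with no idf weighting.

```ruby
# Hypothetical helper: turns the parallel arrays returned by
# term_vector.terms and term_vector.freqs into a term => frequency hash.
def term_freqs(terms, freqs)
  Hash[terms.zip(freqs)]
end

# Cosine similarity of two term => frequency hashes, using raw
# frequencies: sum(a[t] * b[t]) / (|a| * |b|).
# Returns 0.0 if either vector is empty.
def cosine_similarity(a, b)
  dot = a.sum { |term, freq| freq * b.fetch(term, 0) }
  norm_a = Math.sqrt(a.values.sum { |f| f * f })
  norm_b = Math.sqrt(b.values.sum { |f| f * f })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end
```

With two term vectors `tv1` and `tv2` read from the `:content` field, the call would be `cosine_similarity(term_freqs(tv1.terms, tv1.freqs), term_freqs(tv2.terms, tv2.freqs))`. Looking terms up in a hash avoids building a combined matrix of both documents. Only the terms of the first document are visited when the dot product is computed.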
From dbalmain.ml at gmail.com Sat May 27 10:55:19 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sat, 27 May 2006 23:55:19 +0900 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> Message-ID: On 5/27/06, Jeroen Bulters wrote: > I Index about 23000 weblogs with their weblog id as the document id and > the content by termvector. Now I want to compare two weblogs. So what > you suggest is that I retrieve the term-vectors for both documents and > calculate the dotproduct of the two vectors myself; or is there a nice > Ferret-way to do this? Until now I haven't really used the TermVectors, so this probably isn't the best way to do it, but here goes (this is very rough):

def cosine_similarity(index_reader, doc1, doc2)
  tv1 = index_reader.get_term_vector(doc1, :data)
  terms1 = tv1.terms
  freqs1 = tv1.freqs
  matrix = {}
  terms1.size.times {|i| matrix[terms1[i]] = [freqs1[i], 0]}

  tv2 = index_reader.get_term_vector(doc2, :data)
  terms2 = tv2.terms
  freqs2 = tv2.freqs
  terms2.size.times {|i| (matrix[terms2[i]] ||= [0])[1] = freqs2[i]}

  dot_product = matrix.values.inject(0) {|dp, (a,b)| dp += a*b}
  lengths_product = Math.sqrt(freqs1.inject(0) {|sp, f| sp += f*f} *
                              freqs2.inject(0) {|sp, f| sp += f*f})
  dot_product / lengths_product
end

I'd be interested to hear how you go with this. If performance is poor I can add something like this to the C code.
Hope this helps, Dave From jeroenbulters at gmail.com Sat May 27 11:40:58 2006 From: jeroenbulters at gmail.com (Jeroen Bulters) Date: Sat, 27 May 2006 17:40:58 +0200 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> Message-ID: David Balmain wrote: > Until now I haven't really used the TermVectors so this probably isn't > the best way to do it but here goes (this is very rough); I'm going to try this out now. I'll also try extracting all term vectors from doc1 and using them as a query on doc2 (using a BooleanQuery). They use this kind of method in "Lucene in Action" (somewhere around page 190 if I recall correctly). Thanks for your quick responses; I'll let you know how things work out. Cheers, Jeroen Bulters -- Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Sat May 27 18:36:25 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Sun, 28 May 2006 07:36:25 +0900 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> Message-ID: On 5/28/06, Jeroen Bulters wrote: > David Balmain wrote: > > Until now I haven't really used the TermVectors so this probably isn't > > the best way to do it but here goes (this is very rough); > > I'm going to try this out now. I'll also try extracting all term vectors > from doc1 and using them as a query on doc2 (using a BooleanQuery). They > use this kind of method in "Lucene in Action" (somewhere around page 190 > if I recall correctly). If it's a "More Like This" query that you are trying to write, I recommend you look at the Lucene code here; http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_0/contrib/similarity/src/java/org/apache/lucene/search/similar/MoreLikeThis.java?revision=409698&view=markup It's part of Lucene 2.0 now. 
I'll be adding MoreLikeThis Queries in the near future. Cheers, Dave From jeroenbulters at gmail.com Sun May 28 08:37:57 2006 From: jeroenbulters at gmail.com (Jeroen Bulters) Date: Sun, 28 May 2006 14:37:57 +0200 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> Message-ID: <3b79c5115689d56787a6edeb81d1daf0@ruby-forum.com> Yes it is a more like this query, but: I only want the relevance score for document B given document A as the query (so weblog:B AND all_terms_from_A) I'll look into it; thesis is due in 4 weeks so I've got loads of time :D Cheers, Jeroen Bulters -- Posted via http://www.ruby-forum.com/. From kraemer at webit.de Mon May 29 03:33:51 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 29 May 2006 09:33:51 +0200 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> Message-ID: <20060529073351.GI17901@cordoba.webit.de> On Sun, May 28, 2006 at 07:36:25AM +0900, David Balmain wrote: > On 5/28/06, Jeroen Bulters wrote: > > David Balmain wrote: > > > Until now I haven't really used the TermVectors so this probably isn't > > > the best way to do it but here goes (this is very rough); > > > > I'm going to try this out now. I'll also try extracting all term vectors > > from doc1 and using them as a query on doc2 (using a BooleanQuery). They > > use this kind of method in "Lucene in Action" (somewhere around page 190 > > if I recall correctly). 
> > If it's a "More Like This" query that you are trying to write, I > recommend you look at the Lucene code here; > > http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_0/contrib/similarity/src/java/org/apache/lucene/search/similar/MoreLikeThis.java?revision=409698&view=markup Or you could check out the port of this that lives in acts_as_ferret :-) http://projects.jkraemer.net/acts_as_ferret/browser/trunk/plugin/acts_as_ferret/lib/acts_as_ferret.rb from line 525 to around 720. > It's part of Lucene 2.0 now. I'll be adding MoreLikeThis Queries in > the near future. Dave, that's a nice idea. Should I try to prepare a patch for this based on what I did in acts_as_ferret? It would be Ruby-only, though. But as the whole More Like This thing is more or less about building a BooleanQuery, I don't think speed is an issue here. Jens -- webit! Gesellschaft für neue Medien mbH www.webit.de Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de Schnorrstraße 76 Tel +49 351 46766 0 D-01069 Dresden Fax +49 351 46766 66 From kraemer at webit.de Mon May 29 03:41:00 2006 From: kraemer at webit.de (Jens Kraemer) Date: Mon, 29 May 2006 09:41:00 +0200 Subject: [Ferret-talk] multi_search will not work In-Reply-To: <5966bdb1e327fcb21de38d15ff8b0459@ruby-forum.com> References: <5966bdb1e327fcb21de38d15ff8b0459@ruby-forum.com> Message-ID: <20060529074100.GJ17901@cordoba.webit.de> Hi Craig, could you please try to explicitly state the fields you want to index, e.g.

acts_as_ferret(:fields => [ 'content', 'extended_content' ], :store_class_name => true)

Just a guess, but this could work around the problem. Cheers, Jens On Fri, May 26, 2006 at 05:21:19AM +0200, Craig Knox wrote: > I am trying to get multi_search to search across multiple models.
I > have the following: > > class Drug < ActiveRecord::Base > acts_as_ferret :store_class_name => true > end > > class Target < ActiveRecord::Base > acts_as_ferret :store_class_name => true > end > > and I am trying to query via: > @drugs = Drug.multi_search(params[:query], [Target]) > > But I get no results. If I go: > @drugs = Drug.find_by_contents(params[:query]) > > I get results back no problem (same if I just query the target), but > even after I delete the index and rebuild both the drug and target index > I get nothing back. I don't even get any results if I go: > @drugs = Drug.multi_search(params[:query]) > > I have tried this with the latest trunk build and the latest stable > version. I am really at a loss, and was hoping someone might have some > suggestions. > > Thanks > > -- > Posted via http://www.ruby-forum.com/. From dbalmain.ml at gmail.com Mon May 29 03:54:48 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Mon, 29 May 2006 16:54:48 +0900 Subject: [Ferret-talk] Comparing two documents in the index In-Reply-To: <20060529073351.GI17901@cordoba.webit.de> References: <6d8c30df2d3e40b7eb3c2822cb78f5ab@ruby-forum.com> <22885eab86c37b76ce3354c4568cffe3@ruby-forum.com> <20060529073351.GI17901@cordoba.webit.de> Message-ID: On 5/29/06, Jens Kraemer wrote: > > > On Sun, May 28, 2006 at 07:36:25AM +0900, David Balmain wrote: > > It's part of Lucene 2.0 now. I'll be adding MoreLikeThis Queries in > > the near future. > > Dave, that's a nice idea. Should I try to prepare a patch for this based > on what I did in acts_as_ferret ? Would be ruby-only, though.
> But as the > whole more like this thing more or less is about building a BooleanQuery, > I think speed is no issue here. Hi Jens, That'd be great but not just yet. I may be making a few adjustments to the API in the coming week. I'll be sure to discuss possible changes with you guys when the time comes. Gotta run. Cheers, Dave From atomgiant at gmail.com Tue May 30 07:48:54 2006 From: atomgiant at gmail.com (Tom Davies) Date: Tue, 30 May 2006 07:48:54 -0400 Subject: [Ferret-talk] Rebuild Indexes Issue Message-ID: Hi, I have some code to rebuild my ferret indexes but occasionally some stale documents remain in the index even after rebuilding. The only way I could find around this is to manually delete the index files from the filesystem. Here is the code I have for rebuilding one of my indexes for Gifts:

# delete existing entries
INDEX.size.times {|i| INDEX.delete(i)}

gifts = Gift.find(:all)
if (gifts)
  gifts.each do |e|
    INDEX << self.to_doc
  end
end
INDEX.flush()

Does this look ok? I have verified this behavior on Ferret 0.3.2 and 0.9 on windows. Thanks, Tom Davies http://blog.atomgiant.com http://gifthat.com From dbalmain.ml at gmail.com Tue May 30 08:41:50 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 30 May 2006 21:41:50 +0900 Subject: [Ferret-talk] Rebuild Indexes Issue In-Reply-To: References: Message-ID: On 5/30/06, Tom Davies wrote: > Hi, > > I have some code to rebuild my ferret indexes but occasionally some > stale documents remain in the index even after rebuilding. The only > way I could find around this is to manually delete the index files > from the filesystem. Here is the code I have for rebuilding one of my > indexes for Gifts:
>
> # delete existing entries
> INDEX.size.times {|i| INDEX.delete(i)}
>
> gifts = Gift.find(:all)
> if (gifts)
>   gifts.each do |e|
>     INDEX << self.to_doc
>   end
> end
> INDEX.flush()
>
> Does this look ok? I have verified this behavior on Ferret 0.3.2 and > 0.9 on windows. Hi Tom, This looks fine.
Personally, I would just reopen the index with :create => true rather than deleting all the documents but it should still work. I have no idea why stale documents would be remaining in the index. Do you have more than one process writing to the index? Can you narrow this behaviour down to a simple test case? Cheers, Dave From atomgiant at gmail.com Tue May 30 09:14:10 2006 From: atomgiant at gmail.com (Tom Davies) Date: Tue, 30 May 2006 09:14:10 -0400 Subject: [Ferret-talk] Rebuild Indexes Issue In-Reply-To: References: Message-ID: Hi Dave, Only one process is writing to the index in this case. If I can narrow it down to a specific example I will try to roll it into a test case. In the mean time I will try the :create => true and perhaps this may bypass the issue altogether. The reason I was deleting the documents that way was based on one of the How Tos on your Trac wiki. I would give you a specific link but your site seems to be giving 500s at the moment. Perhaps the :create => true syntax should also be listed there? Thanks again. Tom On 5/30/06, David Balmain wrote: > On 5/30/06, Tom Davies wrote: > > Hi, > > > > I have some code to rebuild my ferret indexes but occasionally some > > stale documents remain in the index even after rebuilding. The only > > way I could find around this is to manually delete the index files > > from the filesystem. Here is the code I have for rebuilding one of my > > indexes for Gifts: > > > > # delete existing entries > > INDEX.size.times {|i| INDEX.delete(i)} > > > > gifts = Gift.find(:all) > > if (gifts) > > gifts.each do |e| > > INDEX << self.to_doc > > end > > end > > INDEX.flush() > > > > Does this look ok? I have verified this behavior on Ferret 0.3.2 and > > 0.9 on windows. > > Hi Tom, > > This looks fine. Personally, I would just reopen the index with > :create => true rather than deleting all the documents but it should > still work. I have no idea why stale documents would be remaining in > the index. 
Do you have more than one process writing to the index? Can > you narrow this behaviour down to a simple test case? > > Cheers, > Dave > -- Tom Davies http://blog.atomgiant.com http://gifthat.com From dbalmain.ml at gmail.com Tue May 30 09:17:30 2006 From: dbalmain.ml at gmail.com (David Balmain) Date: Tue, 30 May 2006 22:17:30 +0900 Subject: [Ferret-talk] Rebuild Indexes Issue In-Reply-To: References: Message-ID: On 5/30/06, Tom Davies wrote: > The reason I was deleting the documents that way was based on one of > the How Tos on your Trac wiki. I would give you a specific link but > your site seems to be giving 500s at the moment. Perhaps the :create > => true syntax should also be listed there? I've been battling with the server all day. I'll check the wiki out when I get it up again. Cheers, Dave From sunshine82 at yeah.net Wed May 31 23:53:20 2006 From: sunshine82 at yeah.net (ferret user) Date: Thu, 1 Jun 2006 05:53:20 +0200 Subject: [Ferret-talk] about WildQuery ! Message-ID: <0906c2b97b2b8ec30a59f99d91fbb94f@ruby-forum.com> When I use WildQuery, it is very slow! The query string looks like this: 'name|title:*test*', i.e. I search the fields 'name' and 'title' for terms that include the string 'test'. It works, but it is too slow. But when I use a query string like 'name|title:test*' or 'name|title:*test', it is fast. My English is poor, thanks. -- Posted via http://www.ruby-forum.com/.
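A common explanation in Lucene-style indexes is that each field's terms are kept in sorted order. A pattern with a fixed prefix, such as `test*`, can therefore seek straight to the first matching term. A pattern like `*test*` has no prefix to seek to, so it has to be checked against every term in the field, and here that happens in both `name` and `title`. The sketch below models this with a plain sorted Ruby array. It is a simplified assumption for illustration, not Ferret's actual term dictionary or query code, and `prefix_terms` and `infix_terms` are hypothetical names.

```ruby
# Simplified model (not Ferret's implementation): a field's term
# dictionary represented as a sorted array of strings.

# "test*": binary-search to the first term >= prefix, then read forward
# only while terms still start with the prefix.
def prefix_terms(sorted_terms, prefix)
  i = sorted_terms.bsearch_index { |t| t >= prefix }
  return [] if i.nil?
  matches = []
  while i < sorted_terms.size && sorted_terms[i].start_with?(prefix)
    matches << sorted_terms[i]
    i += 1
  end
  matches
end

# "*test*": the sort order does not help, so every term is examined.
def infix_terms(sorted_terms, fragment)
  sorted_terms.select { |t| t.include?(fragment) }
end
```

In this model, the cost of the prefix lookup depends on the number of matching terms, plus a logarithmic search. The cost of the infix lookup grows with the size of the whole dictionary.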