From plynchnlm at gmail.com Fri Feb 1 14:49:22 2008
From: plynchnlm at gmail.com (Paul Lynch)
Date: Fri, 1 Feb 2008 20:49:22 +0100
Subject: [Ferret-talk] Loading an index into memory?
Message-ID:

Is there a way to tell Ferret (or AAF) to read an entire index into
memory, for faster searching? I have a small index (3-4 MB) that could be
kept in memory. Or does Ferret do that automatically?

Thanks,
--Paul
--
Posted via http://www.ruby-forum.com/.

From plynchnlm at gmail.com Fri Feb 1 14:58:08 2008
From: plynchnlm at gmail.com (Paul Lynch)
Date: Fri, 1 Feb 2008 20:58:08 +0100
Subject: [Ferret-talk] find_with_ferret and HABTM assocs
In-Reply-To: <1f73733371513ebd72e234f2a9f912cd@ruby-forum.com>
References: <1f73733371513ebd72e234f2a9f912cd@ruby-forum.com>
Message-ID:

Could you be more specific about what is not working?
-- Are your association methods between Role and User being set up
   correctly? (If not, make sure your statement is
   "has_and_belongs_to_many", and not "hasandbelongstomany".)
-- Are you storing the data for role_name in the index? If so, can you
   see it there?

I'm not familiar with the find_with_ferret method (because I use
index.search_each). However, in my code I do have a case where I do
something like your role_name method, and the output gets indexed.
--Paul

Rolf Guescini wrote:
> Hi there, I have now read a lot of pages on doing find_with_ferret with
> associations, and still can't grasp how to solve my specific situation.
> I might be dense when not seeing how to apply the info I have read to my > situation, so I will try to start this topic in hope that it might also > be usefuls for others in the same situation: > > I have user objects with roles so my user object has: > hasandbelongstomany :roles > while my role object has > hasandbelongstomany :users > > I thought then that I could do a : > acts_as_ferret :fields => [:name, :role_name], :remote => false > > and: > > def role_name > self.roles.collect{ |role| role.name}.join(" ") > end > > and then do the following: > > find_with_ferret "*#{q}*", :page => page, :per_page => 10,:order => > 'name', :conditions => ["role_name = #{role}"] > > I have also tried many other constellations with multiple and include > and not getting it right because I did not see how to apply it to my > situation. > > Does anyone have a suggestion ? -- Posted via http://www.ruby-forum.com/. From john at digitalpulp.com Fri Feb 1 17:02:41 2008 From: john at digitalpulp.com (John Bachir) Date: Fri, 1 Feb 2008 17:02:41 -0500 Subject: [Ferret-talk] is anyone using AAF trunk? Message-ID: <3821F400-3708-47BA-8F32-BDBD3D882BC8@digitalpulp.com> My project is currently using AAF 0.4.3. Trunk has some compelling features, but my project is going live on Monday and I am apprehensive about switching. Is anyone using trunk in a relatively high-volume context? (A low and vague standard for evaluating production suitability, I know, but better than nothing). Thanks, John From jk at jkraemer.net Sat Feb 2 10:09:37 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Sat, 2 Feb 2008 16:09:37 +0100 Subject: [Ferret-talk] Loading an index into memory? In-Reply-To: References: Message-ID: <20080202150937.GH14330@thunder.jkraemer.net> Hi! On Fri, Feb 01, 2008 at 08:49:22PM +0100, Paul Lynch wrote: > Is there a way to tell Ferret (or AAF) to read an entire index into > memory, for faster searching? I have small index (3-4 MB) that could be > kept in memory. 
Or does Ferret do that automatically?

Ferret has an implementation of the Directory interface [1] for in-memory
storage; however, you won't gain much speed (if any) by using it, since
most operating systems will keep the index in file system buffers anyway
once it has been used the first time.

[1] http://ferret.davebalmain.com/api/classes/Ferret/Store/RAMDirectory.html

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From jk at jkraemer.net Sat Feb 2 10:15:42 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Sat, 2 Feb 2008 16:15:42 +0100
Subject: [Ferret-talk] is anyone using AAF trunk?
In-Reply-To: <3821F400-3708-47BA-8F32-BDBD3D882BC8@digitalpulp.com>
References: <3821F400-3708-47BA-8F32-BDBD3D882BC8@digitalpulp.com>
Message-ID: <20080202151542.GI14330@thunder.jkraemer.net>

Hi!

Not sure about high volume usage, but I'm using aaf trunk in a project
right now, so chances are good I'll notice bugs quite early.

That said, if I were you I wouldn't do an update just before launch if it
wasn't absolutely necessary. Why not do the update after launch and give
it some more testing until the next time you do a deployment?

Cheers,
Jens

PS: I'm going through Trac right now and plan to release 0.4.4 in a week
or so.

On Fri, Feb 01, 2008 at 05:02:41PM -0500, John Bachir wrote:
> My project is currently using AAF 0.4.3. Trunk has some compelling
> features, but my project is going live on Monday and I am
> apprehensive about switching. Is anyone using trunk in a relatively
> high-volume context? (A low and vague standard for evaluating
> production suitability, I know, but better than nothing).
>
> Thanks,
> John
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From pat at trailfire.com Sun Feb 3 16:01:10 2008
From: pat at trailfire.com (Pat Ferrel)
Date: Sun, 03 Feb 2008 13:01:10 -0800
Subject: [Ferret-talk] Ferret+Lucene Index
In-Reply-To: <20080125163357.GF507@cordoba.webit.de>
Message-ID:

I am a little new to Ruby so forgive me if there is an obvious answer.
But one way to solve my problem might be to use jruby and manipulate the
lucene index through java.

I have never done the ruby->java->ruby thing but it seems it would be
nicer if there were a ruby class interface that hides some of the
complexity of the raw java one. Has anyone done this or know of such a
ruby interface or example code?

On 1/25/08 8:33 AM, "Jens Kraemer" wrote:

> On Fri, Jan 25, 2008 at 08:12:24AM -0800, Pat Ferrel wrote:
>> Can I use an earlier version of ferret maybe? Does anyone know when the
>> formats diverged?
>
> Yeah, Versions 0.3.x should definitely work, and possibly also 0.9.x.
>
> Afair there always were some substantial problems in terms of UTF8
> character handling, so you might run into problems even with the older
> versions.
>
> Cheers,
> Jens
>
>>
>>
>> On 1/24/08 10:41 PM, "Ryan King" wrote:
>>
>>> On Jan 24, 2008, at 9:25 PM, Pat Ferrel wrote:
>>>
>>>> We use Nutch and Lucene for our heavy duty text analysis jobs but
>>>> I'm trying to use ferret to do some experiments. I understood that
>>>> Ferret used the same index format as lucene but I cannot look into a
>>>> lucene index with ferret and cannot read a ferret index with luke
>>>> (the lucene index browser). Am I doing something wrong or have the
>>>> formats diverged?
>>>
>>> The formats have diverged.
>>>
>>> -ryan

From jk at jkraemer.net Mon Feb 4 07:01:04 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Mon, 4 Feb 2008 13:01:04 +0100
Subject: [Ferret-talk] Ferret+Lucene Index
In-Reply-To:
References: <20080125163357.GF507@cordoba.webit.de>
Message-ID: <20080204120104.GG656@thunder.jkraemer.net>

On Sun, Feb 03, 2008 at 01:01:10PM -0800, Pat Ferrel wrote:
> I am a little new to Ruby so forgive if there is an obvious answer. But one
> way to solve my problem might be to use jruby and manipulate the lucene
> index through java.
>
> I have never done the ruby->java->ruby thing but it seems it would be nicer
> if there were a ruby class interface that hides some of the complexity of
> the raw java one. Has anyone done this or know of such a ruby interface or
> example code?

Yeah, a Ruby-friendly wrapper around Lucene, maybe even compatible with
Ferret's API, would for sure be a nice thing. But afaik there's no such
thing yet.
Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From john at digitalpulp.com Mon Feb 4 15:05:32 2008
From: john at digitalpulp.com (John Bachir)
Date: Mon, 4 Feb 2008 15:05:32 -0500
Subject: [Ferret-talk] strange permissions error
Message-ID:

While trying to run a rake task which accesses the ferret index, I
keep getting this error:

Permission denied - script/../config/../index/production/shared/rebuild

I have run chgrp and chmod appropriately and am fairly certain that
there is no OS-level read/write permissions issue-- or is something
trying to /execute/ that path? If so, then this must be a bug, right?
There wouldn't be a script living down in the index...

Any ideas would help.

john

From jk at jkraemer.net Mon Feb 4 17:51:54 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Mon, 4 Feb 2008 23:51:54 +0100
Subject: [Ferret-talk] strange permissions error
In-Reply-To:
References:
Message-ID: <20080204225154.GC1694@thunder.jkraemer.net>

On Mon, Feb 04, 2008 at 03:05:32PM -0500, John Bachir wrote:
> While trying to run a rake task which accesses the ferret index, I
> keep getting this error:
>
> Permission denied - script/../config/../index/production/shared/rebuild
>
> I have run chgrp and chmod appropriately and am fairly certain that
> there is no OS-level read/write permissions issue-- or is something
> trying to /execute/ that path? If so, then this must be a bug, right?

Well, you need +x flags on directories in order to be able to access
them. Maybe that's the problem?

Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From john at digitalpulp.com Tue Feb 5 19:18:40 2008
From: john at digitalpulp.com (John Bachir)
Date: Tue, 5 Feb 2008 19:18:40 -0500
Subject: [Ferret-talk] need to restart mongrel after drb server?
Message-ID: <5967015D-AB39-4189-BA71-F5CA8676DFDE@digitalpulp.com>

I just encountered a corrupted index problem after a deploy.
I blew away the index, restarted the drb server, and rebuilt the entire index. Exceptions were still trickling up to my mongrels. The problem was solved by restarting the mongrels. Does this make any sense? I thought the mongrels/AAF just access the drb server as a service, and were unaware of the state of the service behind the port? Thanks for any insight, John From dburkes at infoteria.com Wed Feb 6 12:33:00 2008 From: dburkes at infoteria.com (Danny Burkes) Date: Wed, 6 Feb 2008 09:33:00 -0800 Subject: [Ferret-talk] find_id_by_contents crashing Message-ID: <846A52A5-4573-4771-95C7-FBD19D69E488@infoteria.com> Hi- I asked this question over on Ruby Forum, but it never made it to this list, so I'm posting it directly now- === I've got an index with over 11M entries. Both server and clients are running Ferret gem 0.11.4. If I do "UserMessage.find_id_by_contents('kenn*', :limit => 49)", it works fine. If I do " UserMessage.find_id_by_contents('kenn*', :limit => 50)", it throws an exception, like this: UserMessage.find_id_by_contents 'kenn*', :limit => 50 IOError: IO Error occured at :93 in xraise Error occured in fs_store.c:293 - fsi_seek_i seeking pos -1473943740: from (druby://ferret.lingr.com:9009) /usr/lib/ruby/gems/1.8/ gems/ferret-0.11.4/lib/ferret/index.rb:411:in `[]' from (druby://ferret.lingr.com:9009) /usr/lib/ruby/gems/1.8/ gems/ferret-0.11.4/lib/ferret/index.rb:411:in `[]' from (druby://ferret.lingr.com:9009) /usr/lib/ruby/1.8/ monitor.rb:229:in `synchronize' from (druby://ferret.lingr.com:9009) /usr/lib/ruby/gems/1.8/ gems/ferret-0.11.4/lib/ferret/index.rb:403:in `[]' ... yada yada yada ... Any idea what I can do to recover from this? 
Rebuilding this index would take a LOOOOOOOOOOOOONG time :-) - Danny Burkes http://www.lingr.com/help/about#danny From jk at jkraemer.net Thu Feb 7 11:29:25 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Thu, 7 Feb 2008 17:29:25 +0100 Subject: [Ferret-talk] find_id_by_contents crashing In-Reply-To: <846A52A5-4573-4771-95C7-FBD19D69E488@infoteria.com> References: <846A52A5-4573-4771-95C7-FBD19D69E488@infoteria.com> Message-ID: <20080207162925.GG26288@thunder.jkraemer.net> Hi, I'm not sure but the negative seeking pos could be caused by an overflow of some sort. Maybe a large file problem? What does your index look like with 'ls -l'? Could you try reproducing this on a copy of your index with a small script with plain Ferret, and then retry with 0.11.6? Cheers, Jens On Wed, Feb 06, 2008 at 09:33:00AM -0800, Danny Burkes wrote: > Hi- > > I asked this question over on Ruby Forum, but it never made it to this > list, so I'm posting it directly now- > > === > > I've got an index with over 11M entries. Both server and clients are > running Ferret gem 0.11.4. > > If I do "UserMessage.find_id_by_contents('kenn*', :limit => 49)", it > works fine. If I do " UserMessage.find_id_by_contents('kenn*', :limit > => 50)", it throws an exception, like this: > > UserMessage.find_id_by_contents 'kenn*', :limit => 50 > IOError: IO Error occured at :93 in xraise > Error occured in fs_store.c:293 - fsi_seek_i > seeking pos -1473943740: > from (druby://ferret.lingr.com:9009) /usr/lib/ruby/gems/1.8/ > gems/ferret-0.11.4/lib/ferret/index.rb:411:in `[]' > from (druby://ferret.lingr.com:9009) /usr/lib/ruby/gems/1.8/ > gems/ferret-0.11.4/lib/ferret/index.rb:411:in `[]' > from (druby://ferret.lingr.com:9009) /usr/lib/ruby/1.8/ > monitor.rb:229:in `synchronize' > from (druby://ferret.lingr.com:9009) /usr/lib/ruby/gems/1.8/ > gems/ferret-0.11.4/lib/ferret/index.rb:403:in `[]' > ... yada yada yada ... > > Any idea what I can do to recover from this? 
Rebuilding this index would take a LOOOOOOOOOOOOONG time :-)
>
> - Danny Burkes
> http://www.lingr.com/help/about#danny
>

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From dburkes at infoteria.com Thu Feb 7 11:44:44 2008
From: dburkes at infoteria.com (Danny Burkes)
Date: Thu, 7 Feb 2008 08:44:44 -0800
Subject: [Ferret-talk] find_id_by_contents crashing
In-Reply-To: <20080207162925.GG26288@thunder.jkraemer.net>
References: <846A52A5-4573-4771-95C7-FBD19D69E488@infoteria.com> <20080207162925.GG26288@thunder.jkraemer.net>
Message-ID: <1862D28F-8CA1-4F5C-9669-A1211962B103@infoteria.com>

> I'm not sure but the negative seeking pos could be caused by an overflow
> of some sort. Maybe a large file problem? What does your index look like
> with 'ls -l'?
>

I've attached the listing below...

> Could you try reproducing this on a copy of your index with a small
> script with plain Ferret, and then retry with 0.11.6?
>

Will do- I'll post the results here soon.
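
A quick way to sanity-check the overflow theory is to see what a large
byte offset looks like once it has been squeezed into a signed 32-bit
integer. The sketch below is plain Ruby and does not touch Ferret.
`wrap_int32` is a made-up helper name, and the idea that the offset
passed through a 32-bit signed variable is only a hypothesis here, not
something confirmed from the Ferret source. The seek position is taken
from the error message and the file size from the `ls -l` listing posted
in this thread.

```ruby
# Hypothetical helper: reinterpret an integer as a signed 32-bit value,
# the way a C int wraps around when it overflows.
def wrap_int32(n)
  ((n + 2**31) % 2**32) - 2**31
end

reported_pos = -1_473_943_740   # "seeking pos" from the IOError
largest_cfs  = 3_861_431_389    # size in bytes of _7bgxh.cfs

# Undo the wrap: the smallest non-negative offset that would show up
# as reported_pos after truncation to 32 bits.
candidate = reported_pos + 2**32

puts candidate                              # 2821023556
puts wrap_int32(candidate) == reported_pos  # true
puts candidate > 2**31 - 1                  # true: too big for a signed 32-bit int
puts candidate < largest_cfs                # true: fits inside the big .cfs file
```

If the arithmetic applies, the failing seek was aimed at a position about
2.8 GB into a 3.9 GB file, which is consistent with (but does not prove)
a code path that lacks large file support.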
Best Regards,

Danny

===

dburkes at ferret.lingr.com:/var/ferret/production/user_message$ ls -l
total 4041936
-rw------- 1 root root 3160 2007-08-10 07:01 _2ti3z.fdt.saved
-rw------- 1 root root 8 2007-08-10 07:01 _2ti3z.fdx.saved
-rw------- 1 root root 1326528 2008-02-07 01:17 _7bgxh_5d3.del
-rw------- 1 root root 3861431389 2008-01-27 08:36 _7bgxh.cfs
-rw------- 1 root root 10067 2008-02-03 11:17 _7dutf_5.del
-rw------- 1 root root 37559822 2008-01-28 22:19 _7dutf.cfs
-rw------- 1 root root 37273935 2008-01-30 06:41 _7g8n1.cfs
-rw------- 1 root root 11371 2008-01-31 17:30 _7g8n1_ox.del
-rw------- 1 root root 11559 2008-02-02 20:57 _7imep_2.del
-rw------- 1 root root 35903751 2008-01-31 14:44 _7imep.cfs
-rw------- 1 root root 36105008 2008-02-02 04:11 _7l096.cfs
-rw------- 1 root root 10511 2008-02-03 18:26 _7l096_g.del
-rw------- 1 root root 11659 2008-02-03 20:12 _7ne15_0.del
-rw------- 1 root root 36128119 2008-02-03 13:04 _7ne15.cfs
-rw------- 1 root root 34444491 2008-02-05 03:40 _7prs3.cfs
-rw------- 1 root root 36773717 2008-02-06 13:41 _7s5pu.cfs
-rw------- 1 root root 3380507 2008-02-06 18:49 _7seal.cfs
-rw------- 1 root root 3260942 2008-02-06 23:37 _7smva.cfs
-rw------- 1 root root 54 2008-02-07 04:55 _7svfx_0.del
-rw------- 1 root root 3590095 2008-02-07 04:01 _7svfx.cfs
-rw------- 1 root root 4030326 2008-02-07 06:32 _7t40k.cfs
-rw------- 1 root root 347713 2008-02-07 06:48 _7t4vf.cfs
-rw------- 1 root root 348108 2008-02-07 07:03 _7t5qa.cfs
-rw------- 1 root root 359057 2008-02-07 07:16 _7t6l5.cfs
-rw------- 1 root root 353181 2008-02-07 07:28 _7t7g0.cfs
-rw------- 1 root root 397144 2008-02-07 07:41 _7t8av.cfs
-rw------- 1 root root 343822 2008-02-07 07:52 _7t95q.cfs
-rw------- 1 root root 348366 2008-02-07 08:04 _7ta0l.cfs
-rw------- 1 root root 355600 2008-02-07 08:14 _7tavg.cfs
-rw------- 1 root root 365283 2008-02-07 08:26 _7tbqb.cfs
-rw------- 1 root root 34225 2008-02-07 08:28 _7tbte.cfs
-rw------- 1 root root 36869 2008-02-07 08:29 _7tbwh.cfs
-rw------- 1 root root 37095 2008-02-07 08:30 _7tbzk.cfs
-rw------- 1 root root 36708 2008-02-07 08:31 _7tc2n.cfs
-rw------- 1 root root 36468 2008-02-07 08:32 _7tc5q.cfs
-rw------- 1 root root 38334 2008-02-07 08:33 _7tc8t.cfs
-rw------- 1 root root 4941 2008-02-07 08:33 _7tc94.cfs
-rw------- 1 root root 5270 2008-02-07 08:34 _7tc9f.cfs
-rw------- 1 root root 5070 2008-02-07 08:34 _7tc9q.cfs
-rw------- 1 root root 6055 2008-02-07 08:34 _7tca1.cfs
-rw------- 1 root root 5695 2008-02-07 08:34 _7tcac.cfs
-rw------- 1 root root 0 2008-02-07 07:27 ferret-write.lck
-rw------- 1 dburkes dburkes 16 2008-02-07 08:34 segments
-rw------- 1 root root 733 2008-02-07 08:34 segments_832jn

From kraemer at webit.de Thu Feb 7 12:15:54 2008
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 7 Feb 2008 18:15:54 +0100
Subject: [Ferret-talk] find_id_by_contents crashing
In-Reply-To: <1862D28F-8CA1-4F5C-9669-A1211962B103@infoteria.com>
References: <846A52A5-4573-4771-95C7-FBD19D69E488@infoteria.com> <20080207162925.GG26288@thunder.jkraemer.net> <1862D28F-8CA1-4F5C-9669-A1211962B103@infoteria.com>
Message-ID: <20080207171554.GA32289@cordoba.webit.de>

On Thu, Feb 07, 2008 at 08:44:44AM -0800, Danny Burkes wrote:
[..]
> dburkes at ferret.lingr.com:/var/ferret/production/user_message$ ls -l
> total 4041936
> -rw------- 1 root root 3160 2007-08-10 07:01 _2ti3z.fdt.saved
> -rw------- 1 root root 8 2007-08-10 07:01 _2ti3z.fdx.saved
> -rw------- 1 root root 1326528 2008-02-07 01:17 _7bgxh_5d3.del
> -rw------- 1 root root 3861431389 2008-01-27 08:36 _7bgxh.cfs
[..]

Ok, might be a large file issue. There has been at least one patch [1]
after 0.11.4 was released regarding LFS, so upgrading to 0.11.6 might
indeed help.

0.11.4 and 0.11.6 are index compatible, so no rebuild should be
necessary.

Btw, from your directory listing it looks like you're running the DRb
server as root - please *don't* do that. Given proper write access to the
index and log directories it's not necessary at all.
I usually run it as the same (non-root) user that's running the Mongrels.

Cheers,
Jens

[1] http://ferret.davebalmain.com/trac/ticket/215

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From dburkes at infoteria.com Thu Feb 7 12:39:24 2008
From: dburkes at infoteria.com (Danny Burkes)
Date: Thu, 7 Feb 2008 09:39:24 -0800
Subject: [Ferret-talk] find_id_by_contents crashing
In-Reply-To: <20080207171554.GA32289@cordoba.webit.de>
References: <846A52A5-4573-4771-95C7-FBD19D69E488@infoteria.com> <20080207162925.GG26288@thunder.jkraemer.net> <1862D28F-8CA1-4F5C-9669-A1211962B103@infoteria.com> <20080207171554.GA32289@cordoba.webit.de>
Message-ID:

> Ok, might be a large file issue. There has been at least one patch [1]
> after 0.11.4 was released regarding LFS, so upgrading to 0.11.6 might
> indeed help.
>
> 0.11.4 and 0.11.6 are index compatible, so no rebuild should be
> necessary.
>

Thanks, Jens! I upgraded to the 0.11.6 gem, and the problem went away.

Best Regards,

Danny

From epchris at gmail.com Thu Feb 21 12:35:37 2008
From: epchris at gmail.com (Chris TenHarmsel)
Date: Thu, 21 Feb 2008 17:35:37 +0000
Subject: [Ferret-talk] Experience using ferret to index log files
Message-ID:

Hi everyone,
I've been exploring using ferret for indexing large amounts of production
log files. Right now we have a homemade system for searching through the
logs that involves specifying a date/time range and then grepping through
the relevant files. This can take a long time.

My initial tests (on 2 GB of log files) have been promising. I've taken
two separate approaches:
The first is loading each line in each log file as a "document". The plus
side to this is that doing a search will get you individual log lines as
the results, which is what I want.
The downside is that indexing takes a long, long time and the index size
is very large even when not storing the contents of the lines. This
approach is not viable for indexing all of our logs.

The second approach is indexing the log files as documents. This is
relatively fast, 211 sec for 2 GB of logs, and the index size is a nice
12% of the sample size. The downside is that after figuring out which
files match your search terms, you have to crawl through each "hit"
document to find the relevant lines.

For the sake of full disclosure, at any given time we keep roughly 30
days of logs, which comes to about 800ish GB of log files. Each file is
roughly 15 MB in size before it gets rotated.

Has anyone else tackled a problem like this and can offer any ideas on
how to go about searching those logs? The best idea I can come up with
(that I haven't implemented yet to get real numbers) is to index a
certain number of log files by line, like the last 2 days, and then do
another set by file (like the last week). This would have fast results
for the more recent logs and you would just have to be patient for the
slightly older logs.

Any ideas/help?

Thanks,
Chris

From john at johnleach.co.uk Fri Feb 22 13:52:01 2008
From: john at johnleach.co.uk (John Leach)
Date: Fri, 22 Feb 2008 18:52:01 +0000
Subject: [Ferret-talk] Experience using ferret to index log files
In-Reply-To:
References:
Message-ID: <1203706321.30147.37.camel@dogen.thepride.>

Hi Chris,

I've been toying with the idea of a Ferret log indexer for my Linux
systems so this is rather interesting.

Regarding performance of the one ferret document per line, you should
look into the various tunables. An obvious one is ensuring auto_flush is
disabled, but the next likely is :max_buffered_docs.
This, by default, is set to flush to the index every 10,000 documents,
but your log file lines will be hitting that regularly. Also consider
:max_buffer_memory.

As log files will often have lots of unique but "useless" terms (such as
the timestamps) I'd recommend pre-parsing your log lines. If it's syslog
files you're indexing, parse the timestamp and convert it to
200802221816 format and add that as a separate untokenized field to the
index. Cut it down to the maximum accuracy you'll need as this will
reduce the number of unique terms in the index (maybe you'll only ever
need to find logs down to the day, not the hour and minute).

Also, disable term vectors, as this will save disk space.

I've also found using a field as the id is slooow, so avoid that (that's
usually only something done with the primary key from databases though,
so I doubt you're doing it).

Regarding performance and index size for the one ferret document per log
file: By default, ferret only indexes the first 10,000 terms of each
document so it might only be faster because it's indexing less! Ditto
for the index file size :S See the :max_field_length option.

Write your own custom stop words list to skip indexing hugely common
words - this will reduce the size of your index.

Consider writing your own Analyzer to do tokenization to reduce the
number of unique terms, for example the following line from a log file
on my system:

Feb 21 05:13:10 lion named[15722]: unexpected RCODE (SERVFAIL) resolving 'ns1.rfrjqrkfccysqlycevtyz.info/AAAA/IN': 194.168.8.100#53

I'm not sure exactly how the default analyzer would tokenize this, but
an ideal list of tokens would probably be:

lion named unexpected RCODE SERVFAIL resolving
ns1.rfrjqrkfccysqlycevtyz.info AAAA IN 194.168.8.100 53

If you still want to stick to document per log file, you can use the
term_vectors to find the offset of the match in the log file - then you
just open the log file and jump to that position (store the log
filename).
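
The pre-parsing and tokenizing advice above can be sketched in plain
Ruby, without depending on Ferret. This is only an illustration under
stated assumptions: `compact_timestamp`, `tokenize` and
`parse_syslog_line` are hypothetical helper names, the year is passed in
because classic syslog lines do not carry one, and the regular
expressions are tuned to the single example line rather than to syslog in
general. Wiring the tokenizer into a real Ferret Analyzer is left out.

```ruby
MONTHS = %w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec].freeze

# Build a YYYYMMDDHHMM string and cut it to the precision actually needed
# (12 = minute, 10 = hour, 8 = day) to keep the number of unique terms low.
def compact_timestamp(year, month, day, hour, minute, precision = 12)
  format('%04d%02d%02d%02d%02d', year, month, day, hour, minute)[0, precision]
end

# Drop the "[pid]" suffix, then keep runs of letters and digits, allowing
# inner dots so host names and IP addresses survive as single tokens.
def tokenize(message)
  message.gsub(/\[\d+\]/, ' ').scan(/[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*/)
end

# Split one syslog-style line into a compact timestamp and a token list.
# Returns nil when the line does not start with "Mon DD HH:MM:SS".
def parse_syslog_line(line, year, precision = 12)
  m = line.chomp.match(/\A(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):\d{2}\s+(.*)\z/)
  return nil unless m
  month_index = MONTHS.index(m[1])
  return nil unless month_index
  {
    timestamp: compact_timestamp(year, month_index + 1, m[2].to_i,
                                 m[3].to_i, m[4].to_i, precision),
    tokens: tokenize(m[5])
  }
end

line = "Feb 21 05:13:10 lion named[15722]: unexpected RCODE (SERVFAIL) resolving " \
       "'ns1.rfrjqrkfccysqlycevtyz.info/AAAA/IN': 194.168.8.100#53"
doc = parse_syslog_line(line, 2008)
puts doc[:timestamp]          # 200802210513
puts doc[:tokens].join(' ')
```

For the example line, the tokens come out as the "ideal" list given
above. The timestamp string would go into its own untokenized field and
the tokens into the searchable field; the process id is dropped on
purpose, since it rarely repeats and only adds unique terms.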
It does use a bit more disk space per term indexed, but useful! Also, omitting norms will save 1 byte per field per document too, a huge saving I'm sure you'll agree ;) :index => :yes_omit_norms Um, I think I'm done. The Ferret shortcut book by the Ferret author covers all this stuff - it's cheap and good: http://www.oreilly.com/catalog/9780596527853/index.html John. -- http://johnleach.co.uk http://www.brightbox.co.uk - UK/EU Ruby on Rails hosting On Thu, 2008-02-21 at 17:35 +0000, Chris TenHarmsel wrote: > Hi everyone, > I've been exploring using ferret for indexing large amounts of > production log files. Right now we have a homemade system for > searching through the logs that involves specifying a date/time range > and then grepping through the relevant files. This can take a long > time. > > My initial tests (on 2gb of log files) have been promising, I've taken > two separate approaches: > The first is loading each line in each log file as a "document". The > plus side to this is that doing a search will get you individual log > lines as the results, which is what I want. The downside is that > indexing takes a long long time and the index size is very large even > when not storing the contents of the lines. This approach is not > viable for indexing all of our logs. > > The second approach is indexing the log files as documents. This is > relatively fast, 211sec for 2gb of logs, and the index size is a nice > 12% of the sample size. The downside is that after figuring out which > files match your search terms, you have to crawl through each "hit" > document to find the relevant lines. > > For the sake of full disclosure, at any given time we keep roughly 30 > days of logs which comes to about 800ish Gb of log files. Each file > is roughly 15Mb in size before it gets rotated. > > Has anyone else tackled a problem like this and can offer any ideas on > how to go about searching those logs? 
The best idea I can come up > with (that I haven't implemented yet to get real numbers) is to index > a certain number of log files by line, like the last 2 days, and then > do another set by file (like the last week). This would have fast > results for the more recent logs and you would just have to be patient > for the slightly older logs. > > Any ideas/help? > From u.alberton at gmail.com Fri Feb 22 14:18:20 2008 From: u.alberton at gmail.com (Bira) Date: Fri, 22 Feb 2008 16:18:20 -0300 Subject: [Ferret-talk] Document scores Message-ID: We're using Ferret (but not acts_as_ferret) on a project I'm working on, and I ran into a problem with the document scores returned from searches. I consider myself a Ferret noob... I know a little about its API, having read the O'Reilly shortcut, but I couldn't find a solution to this problem there. Please allow me to explain: It started when I noticed that all of the relevance scores for each result were exactly the same. By reading the shortcut, I found out that happened because a range query (with initial and final dates) was always included in the queries passed to Ferret, and Ferret's RangeQuery always return results with identical scores, because it uses a ConstantScoreQuery internally. So far, so good - I removed this range query from the application code, as an experiment, and passed a simple string that translates into a TermQuery to it. From what I know of Ferret, it should return normal scores, but all of them came back as 0. Is this a known behavior/bug? Or did I do something wrong with the search or the indexing? I know the latter is more likely, and if needed I can try to provide some trimmed-down example code. 
-- Bira http://compexplicita.wordpress.com http://compexplicita.tumblr.com From john at johnleach.co.uk Fri Feb 22 14:29:26 2008 From: john at johnleach.co.uk (John Leach) Date: Fri, 22 Feb 2008 19:29:26 +0000 Subject: [Ferret-talk] Document scores In-Reply-To: References: Message-ID: <1203708566.30147.44.camel@dogen.thepride.> Hi Bira, this just sounds like your search is getting no hits. The ConstantScoreQuery was giving everything a minimum score but no other hits increased the score. Now you've removed the only thing that was providing a score, so it's dropped to 0. Make sure your indexing and searching is working correctly. Try the ferret-browser tool to review your index - see if it's what you expect (i.e: has the terms you're searching for). If all this is working as expect, try posting a snip of your code where you define the index, and where you do a search and we should be able to help. John. -- http://www.brightbox.co.uk - UK/EU Ruby on Rails Hosting http://johnleach.co.uk On Fri, 2008-02-22 at 16:18 -0300, Bira wrote: > We're using Ferret (but not acts_as_ferret) on a project I'm working > on, and I ran into a problem with the document scores returned from > searches. > > I consider myself a Ferret noob... I know a little about its API, > having read the O'Reilly shortcut, but I couldn't find a solution to > this problem there. Please allow me to explain: > > It started when I noticed that all of the relevance scores for each > result were exactly the same. By reading the shortcut, I found out > that happened because a range query (with initial and final dates) was > always included in the queries passed to Ferret, and Ferret's > RangeQuery always return results with identical scores, because it > uses a ConstantScoreQuery internally. > > So far, so good - I removed this range query from the application > code, as an experiment, and passed a simple string that translates > into a TermQuery to it. 
From what I know of Ferret, it should return > normal scores, but all of them came back as 0. > > Is this a known behavior/bug? Or did I do something wrong with the > search or the indexing? I know the latter is more likely, and if > needed I can try to provide some trimmed-down example code. > From epchris at gmail.com Fri Feb 22 15:09:56 2008 From: epchris at gmail.com (Chris TenHarmsel) Date: Fri, 22 Feb 2008 20:09:56 +0000 Subject: [Ferret-talk] Experience using ferret to index log files In-Reply-To: <1203706321.30147.37.camel@dogen.thepride.> References: <1203706321.30147.37.camel@dogen.thepride.> Message-ID: Note: Sorry if this was double posted, I sent it from the wrong email address before. Hi John, Thanks for the tips. Currently I'm using these tunables for my indexer: :max_buffer_memory => 204857600, :max_buffered_docs => 1000000, :merge_factor => 100000, For some reason, if I set max_buffered_docs to 1000001 or higher, Ferret segfaults, so I'm stuck at that. I wasn't aware that Ferret by default only indexes the first 10000 terms, so I will definitely have to change that for log file-level indexing Could you maybe elaborate a little more on a couple things: First, I'm not really that knowledgeable on the tokenizing that is happening. I looked through the docs and I think I understand the basics, but I'm not even sure how I would go about doing my own tokenizing to create more meaningful tokens. Is a token basically a thing that can be searched for? So if I had a token of "sometoken" and searched for "some" would it find it? From what I can tell, I would have to subclass the TokenStream class and implement "text=()" to split the input into my "tokens" and then have the "next" method just return them in order, correct? Secondly, I'm not sure what you mean by looking at the term_vector to find the position. 
If I do a search and get "Hits" ( http://ferret.davebalmain.com/api/classes/Ferret/Search/Hit.html) back, I thought all I got was the doc id and the score. Can you explain a little more on this? THanks, On Fri, Feb 22, 2008 at 6:52 PM, John Leach wrote: > Hi Chris, > > I've been toying with the idea of a Ferret log indexer for my Linux > systems so this is rather interesting. > > Regarding performance of the one ferret document per line, you should > look into the various tunables. An obvious one is ensuring auto_flush is > disabled, but the next likely is :max_buffered_docs. This, by default, > it set to flush to the index every 10,000 documents, but your log file > lines will be hitting that regularly. Also consider :max_buffer_memory. > > As log files will often have lots of unique but "useless" terms (such as > the timestamps) I'd recommend pre-parsing your log lines. If it's > syslog files you're indexing, parse the timestamp and convert it to > 200802221816 format and add that as a separate untokenized field to the > index. Cut it down to the maximum accuracy you'll need as this will > reduce the number of unique terms in the index (maybe you'll only ever > need to find logs down to the day, not the hour and minute) > > Also, disable term vectors, as this will save disk space. > > I've also found using a field as the the id is slooow, so avoid that > (that's usually only something done with the primary key from databases > though, so I doubt you're doing it) > > Regarding performance and index size for the one ferret document per log > file: By default, ferret only indexes the first 10,000 terms of each > document so it might only be faster because it's indexing less! Ditto > for the index file size :S See the :max_field_length option. > > Write your own custom stop words list to skip indexing hugely common > words - this will reduce the size of your index. 
> > Consider writing your own Analyzer to do tokenization to reduce the > number of unique terms, for example the following line from a log file > on my system: > > Feb 21 05:13:10 lion named[15722]: unexpected RCODE (SERVFAIL) resolving ' > ns1.rfrjqrkfccysqlycevtyz.info/AAAA/IN': 194.168.8.100#53 > > I'm not sure exactly how the default analyzer would tokenize this, but > an ideal list of tokens would probably be: > > lion named unexpected RCODE SERVFAIL resolving > ns1.rfrjqrkfccysqlycevtyz.info AAAA IN 194.168.8.100 53 > > If you still want to stick to document per log file, you can use the > term_vectors to find the offset of the match in the log file - then you > just open the log file and jump to that position (store the log > filename). It does use a bit more disk space per term indexed, but > useful! > > Also, omitting norms will save 1 byte per field per document too, a huge > saving I'm sure you'll agree ;) :index => :yes_omit_norms > > Um, I think I'm done. The Ferret shortcut book by the Ferret author > covers all this stuff - it's cheap and good: > > http://www.oreilly.com/catalog/9780596527853/index.html > > John. > -- > http://johnleach.co.uk > http://www.brightbox.co.uk - UK/EU Ruby on Rails hosting > > On Thu, 2008-02-21 at 17:35 +0000, Chris TenHarmsel wrote: > > Hi everyone, > > I've been exploring using ferret for indexing large amounts of > > production log files. Right now we have a homemade system for > > searching through the logs that involves specifying a date/time range > > and then grepping through the relevant files. This can take a long > > time. > > > > My initial tests (on 2gb of log files) have been promising, I've taken > > two separate approaches: > > The first is loading each line in each log file as a "document". The > > plus side to this is that doing a search will get you individual log > > lines as the results, which is what I want. 
The downside is that > > indexing takes a long long time and the index size is very large even > > when not storing the contents of the lines. This approach is not > > viable for indexing all of our logs. > > > > The second approach is indexing the log files as documents. This is > > relatively fast, 211sec for 2gb of logs, and the index size is a nice > > 12% of the sample size. The downside is that after figuring out which > > files match your search terms, you have to crawl through each "hit" > > document to find the relevant lines. > > > > For the sake of full disclosure, at any given time we keep roughly 30 > > days of logs which comes to about 800ish Gb of log files. Each file > > is roughly 15Mb in size before it gets rotated. > > > > Has anyone else tackled a problem like this and can offer any ideas on > > how to go about searching those logs? The best idea I can come up > > with (that I haven't implemented yet to get real numbers) is to index > > a certain number of log files by line, like the last 2 days, and then > > do another set by file (like the last week). This would have fast > > results for the more recent logs and you would just have to be patient > > for the slightly older logs. > > > > Any ideas/help? > > > > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20080222/2a714d3e/attachment-0001.html From john at johnleach.co.uk Sat Feb 23 08:22:31 2008 From: john at johnleach.co.uk (John Leach) Date: Sat, 23 Feb 2008 13:22:31 +0000 Subject: [Ferret-talk] Experience using ferret to index log files In-Reply-To: References: <1203706321.30147.37.camel@dogen.thepride.> Message-ID: <1203772951.13803.13.camel@dogen.thepride.> Hi Chris, On Fri, 2008-02-22 at 20:09 +0000, Chris TenHarmsel wrote: > First, I'm not really that knowledgeable on the tokenizing that is > happening. I looked through the docs and I think I understand the > basics, but I'm not even sure how I would go about doing my own > tokenizing to create more meaningful tokens. Is a token basically a > thing that can be searched for? Tokenizing is splitting the input text into words that can be searched for. Sometimes you can just split the text up by whitespace, but I'm thinking that log files might need some specific attention. > So if I had a token of "sometoken" and searched for "some" would it > find it? No. Though if you did a search for "some*", Ferret would search the available tokens (one of which would be sometoken), then do a search on the matching tokens. You might write a clever tokenizer to recognise that "sometoken" was actually two words without a space and return them as the separate tokens "some" and "token". > From what I can tell, I would have to subclass the TokenStream class > and implement "text=()" to split the input into my "tokens" and then > have the "next" method just return them in order, correct? Not sure off the top of my head, but that's about right, but then you need to make an Analyzer class that uses your new tokenizer. I have an example but I've not got time to extract it right now, sorry! > Secondly, I'm not sure what you mean by looking at the term_vector to > find the position. 
> If I do a search and get "Hits"
> (http://ferret.davebalmain.com/api/classes/Ferret/Search/Hit.html)
> back, I thought all I got was the doc id and the score. Can you
> explain a little more on this?

The term vectors store the offset of the match in the document (byte
position and length); they're often used for highlighting search
matches. I've not actually used them myself - a quick look at the API
makes it sound like they're used internally by the highlight method.

You can get to them using some methods on the index_reader, which
return TermVector objects:

    index_reader.term_vector(doc_id, field)

http://ferret.davebalmain.com/api/classes/Ferret/Index/TermVector.html

John.
-- 
http://www.brightbox.co.uk - UK/EU Ruby on Rails Hosting
http://johnleach.co.uk

From u.alberton at gmail.com Mon Feb 25 14:15:00 2008
From: u.alberton at gmail.com (Bira)
Date: Mon, 25 Feb 2008 16:15:00 -0300
Subject: [Ferret-talk] Document scores
In-Reply-To: <1203708566.30147.44.camel@dogen.thepride.>
References: <1203708566.30147.44.camel@dogen.thepride.>
Message-ID:

On Fri, Feb 22, 2008 at 4:29 PM, John Leach wrote:
> Hi Bira,

> If all this is working as expect, try posting a snip of your code where
> you define the index, and where you do a search and we should be able to
> help.
>
> John.
>

I've managed to reduce it to a simple example, which I've packed in an
11KB zip file, most of which is a sample text for indexing (an e-mail
message from the publicly available Enron archive). Does the list
accept attachments?

-- 
Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com

From jk at jkraemer.net Mon Feb 25 14:56:40 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Mon, 25 Feb 2008 20:56:40 +0100
Subject: [Ferret-talk] Document scores
In-Reply-To:
References: <1203708566.30147.44.camel@dogen.thepride.>
Message-ID: <20080225195640.GA31406@thunder.jkraemer.net>

Hi!

On Mon, Feb 25, 2008 at 04:15:00PM -0300, Bira wrote:
> On Fri, Feb 22, 2008 at 4:29 PM, John Leach wrote:
> > Hi Bira,
>
> > If all this is working as expect, try posting a snip of your code where
> > you define the index, and where you do a search and we should be able to
> > help.
> >
> > John.
> >
>
> I've managed to reduce it to a simple example, which I've packed in a
> 11KB zip file, most of which is a sample text for indexing (an e-mail
> message from the publicly available Enron archive). Does the list
> accept attachments?

not sure, just try it out :-) or upload it somewhere on the ferret wiki.

cheers,
Jens

-- 
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From u.alberton at gmail.com Tue Feb 26 08:24:36 2008
From: u.alberton at gmail.com (Bira)
Date: Tue, 26 Feb 2008 10:24:36 -0300
Subject: [Ferret-talk] Document scores
In-Reply-To: <20080225195640.GA31406@thunder.jkraemer.net>
References: <1203708566.30147.44.camel@dogen.thepride.> <20080225195640.GA31406@thunder.jkraemer.net>
Message-ID:

On Mon, Feb 25, 2008 at 4:56 PM, Jens Kraemer wrote:
> not sure, just try it out :-) or upload it somewhere on the ferret wiki.

OK :). I'm sending the example attached to this message.

There are two Ruby files (indexer.rb and searcher.rb), along with a
text file containing an e-mail from the Enron archives, which is the
indexable sample.

After extracting it to a directory, running indexer.rb will index that
single message. Running searcher.rb will perform a predefined search on
the index, and print out the result and its score.

In my local environment (Ferret 0.11.6 on Linux), a single result is
returned, as expected, and it's properly highlighted and everything.
Its score is 0. The search is a simple term query for "earnings".
-- Bira http://compexplicita.wordpress.com http://compexplicita.tumblr.com -------------- next part -------------- A non-text attachment was scrubbed... Name: minimal.tar.gz Type: application/x-gzip Size: 11705 bytes Desc: not available Url : http://rubyforge.org/pipermail/ferret-talk/attachments/20080226/32518256/attachment.gz From pdogsaz7 at gmail.com Tue Feb 26 17:03:18 2008 From: pdogsaz7 at gmail.com (Patrick) Date: Tue, 26 Feb 2008 23:03:18 +0100 Subject: [Ferret-talk] Failed to load ferret_ext.bundle on mac os x Message-ID: I have found very little information about this problem on the net, so i thought someone here might be able to help me. I am running mac os x 10.4 on a powerbook and have installed ruby 1.8.6 and was trying to get ferret to work, without any success. Here is exactly what i did and what happened: $ sudo gem install ferret Updating metadata for 125 gems from http://gems.rubyforge.org ............................................................................ ................................................. complete Building native extensions. This could take a while... Successfully installed ferret-0.11.6 1 gem installed Installing ri documentation for ferret-0.11.6... Installing RDoc documentation for ferret-0.11.6... 
$ ruby ferret_test.rb
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle: Failed to load /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle (LoadError)
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret.rb:25
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in `gem_original_require'
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in `require'
        from ferret_test.rb:2

The ferret_test.rb includes the following lines:

    require 'rubygems'
    require 'ferret'
    include Ferret

The ferret_ext.bundle file does exist, but I am guessing it might not
have been properly compiled. I opened it afterwards and it contains
some error messages that seem not to belong there.
I do have gcc (4.0.1) and make etc. installed. The Powerbook is
obviously not an Intel Mac, but in the .bundle file I found this; I am
not sure if it helps to find out what the problem is:

*ERROR: POSH double precision floating point serialization failed. Please report this to poshlib at poshlib.org!
OS:..............MacOS X
CPU:.............Intel 386+
endian:..........little
ptr size:........32-bits
64-bit ints......yes
floating point...enabled
compiler.........Gnu GCC

Please help!

Patrick
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20080226/38dc5206/attachment-0001.html

From julioody at gmail.com Tue Feb 26 18:12:35 2008
From: julioody at gmail.com (Julio Cesar Ody)
Date: Wed, 27 Feb 2008 10:12:35 +1100
Subject: [Ferret-talk] Document scores
In-Reply-To:
References: <1203708566.30147.44.camel@dogen.thepride.> <20080225195640.GA31406@thunder.jkraemer.net>
Message-ID:

In my experience with scores, I found that you *have* to establish
boosts for each field, otherwise you'll always get scores that are too
low.

Try:

- configuring boost for, say, 3 fields.
E.g.: tags => 20, title => 10, description => 15. - Adding entries to the index. - performing searches that hit each of these fields in separate so you can compare. Then check the score in the output. On Wed, Feb 27, 2008 at 12:24 AM, Bira wrote: > On Mon, Feb 25, 2008 at 4:56 PM, Jens Kraemer wrote: > > not sure, just try it out :-) or upload it somewhere on the ferret wiki. > > OK :). I'm sending the example attached to this message. > > There's two Ruby files (indexer.rb and searcher.rb), along with a text > file containing an e-mail from the Enron archives, which is the > indexable sample. > > After extracting it to a directory, running indexer.rb will index that > single message. Running searcher.rb will perform a pre-definded search > on the index, and print out the result and its score. > > In my local environment (Ferret 0.11.6 on Linux), a single result is > returned, as expected, and it's properly highlighted and everything. > Its score is 0. The search is a simple term query for "earnings". > > -- > Bira > http://compexplicita.wordpress.com > http://compexplicita.tumblr.com > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From u.alberton at gmail.com Wed Feb 27 13:04:27 2008 From: u.alberton at gmail.com (Bira) Date: Wed, 27 Feb 2008 15:04:27 -0300 Subject: [Ferret-talk] Document scores In-Reply-To: References: <1203708566.30147.44.camel@dogen.thepride.> <20080225195640.GA31406@thunder.jkraemer.net> Message-ID: On Tue, Feb 26, 2008 at 8:12 PM, Julio Cesar Ody wrote: > >From my experience with scores, I found that you *have* to establish > boosts for each field, otherwise you'll always get scores that are too > low. > > Try: > > - configuring boost for, say 3 fields. E.g.: tags => 20, title => 10, > description => 15. > - Adding entries to the index. > - performing searches that hit each of these fields in separate so you > can compare. 

I tried again, setting :default_boost to 1000 in the example, and the
score still came up as zero.

By the way, did the message containing the example arrive?

-- 
Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com

From jk at jkraemer.net Wed Feb 27 14:27:15 2008
From: jk at jkraemer.net (Jens Kraemer)
Date: Wed, 27 Feb 2008 20:27:15 +0100
Subject: [Ferret-talk] Document scores
In-Reply-To:
References: <1203708566.30147.44.camel@dogen.thepride.> <20080225195640.GA31406@thunder.jkraemer.net>
Message-ID: <20080227192714.GG31406@thunder.jkraemer.net>

Hi!

On Wed, Feb 27, 2008 at 03:04:27PM -0300, Bira wrote:
[..]
> By the way, did the message containing the example arrive?

yes it did. I tried it out and got the same result as you - score of
0.0.

Removing the :index => :omit_norms option from the FieldInfos
declaration leads to the expected result, a non-zero score.

It's not clear from the API docs if this is the expected behaviour:

  :omit_norms | Same as :yes except omit the
              | norms file. The norms file can
              | be omitted if you don't boost
              | any fields and you don't need
              | scoring based on field length.

Here's Ferret's explanation of the score computation:

0.0 = field_weight(message:earnings in 0), product of:
  3.162278 = tf(term_freq(message:earnings)=10)
  0.3068528 = idf(doc_freq=1)
  0.0 = field_norm(field=message, doc=0)

Looks like Ferret should rather not consider the zero field_norm when
computing the score in this case.
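
For what it's worth, the three factors in that explanation can be
checked by hand. The sketch below is only an illustration and rests on
assumptions not confirmed anywhere in this thread: that Ferret uses the
classic Lucene-style default similarity (tf = sqrt(term_freq) and
idf = 1 + ln(num_docs / (doc_freq + 1))) and that the index holds
exactly one document. The helper names and the field_norm of 1.0 in the
last line are hypothetical. Under those assumptions the tf and idf
values come out the same as in the explanation, and the zero field_norm
alone is what forces the product to 0.0.

```ruby
# Hand check of the factors in the score explanation above.
# ASSUMPTION: Lucene-style default similarity formulas; the thread does
# not confirm that Ferret computes tf and idf exactly this way.

# Term-frequency factor: square root of the raw term frequency.
def tf(term_freq)
  Math.sqrt(term_freq)
end

# Inverse document frequency: 1 + ln(num_docs / (doc_freq + 1)).
def idf(doc_freq, num_docs)
  1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# field_weight is the plain product of the three factors.
def field_weight(term_freq, doc_freq, num_docs, field_norm)
  tf(term_freq) * idf(doc_freq, num_docs) * field_norm
end

# Values from the thread: term_freq=10, doc_freq=1, and (assumed) a
# single document in the index.
puts format('tf  = %.6f', tf(10))
puts format('idf = %.7f', idf(1, 1))
puts format('field_weight with field_norm 0.0 = %.1f',
            field_weight(10, 1, 1, 0.0))
# Hypothetical non-zero norm, only to show the product is then non-zero.
puts format('field_weight with field_norm 1.0 = %.6f',
            field_weight(10, 1, 1, 1.0))
```

So boosting alone cannot help here: any boost is multiplied by the same
zero field_norm.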
Cheers, Jens -- Jens Kr?mer Finkenlust 14, 06449 Aschersleben, Germany VAT Id DE251962952 http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From jcanady at gmail.com Wed Feb 27 14:59:53 2008 From: jcanady at gmail.com (Jeremy Canady) Date: Wed, 27 Feb 2008 13:59:53 -0600 Subject: [Ferret-talk] Failed to load ferret_ext.bundle on mac os x In-Reply-To: References: Message-ID: <3587d9c10802271159w34e41f3byf20a915c756dc660@mail.gmail.com> Did you install gcc manually or did you get it when you installed the apple developer tools? -Jeremy On Tue, Feb 26, 2008 at 4:03 PM, Patrick wrote: > > I have found very little information about this problem on the net, so i > thought someone here might be able to help me. > > I am running mac os x 10.4 on a powerbook and have installed ruby 1.8.6 and > was trying to get ferret to work, without any success. > Here is exactly what i did and what happened: > > $ sudo gem install ferret > Updating metadata for 125 gems from http://gems.rubyforge.org > > ............................................................................................................................. > complete > Building native extensions. This could take a while... > Successfully installed ferret-0.11.6 > 1 gem installed > Installing ri documentation for ferret-0.11.6... > Installing RDoc documentation for ferret-0.11.6... 
> $ ruby ferret_test.rb > /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle: > Failed to load > /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle > (LoadError) > from > /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' > from > /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret.rb:25 > from > /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in > `gem_original_require' > from > /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in `require' > from ferret_test.rb:2 > > > The ferret_test.rb includes the following lines: > require 'rubygems' > require 'ferret' > include Ferret > > The ferret_ext.bundle file does exist but i am guessing it might not have > been properly compiled. I opened it afterwards and it contains some error > message that seem to not belong there. > I do have gcc (4.0.1) and make etc. installed. The Powerbook is obviously > not an intel mac but in the .bundle-file i found this, i am not sure if that > helps to find out what the problem is: > > *ERROR: POSH double precision floating point serialization failed. Please > report this to poshlib at poshlib.org! > OS:..............MacOS X > CPU:.............Intel 386+ > endian:..........little > ptr size:........32-bits > 64-bit ints......yes > floating point...enabled > compiler.........Gnu GCC > > Please help! > > Patrick > > > > > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From pdogsaz7 at gmail.com Wed Feb 27 15:06:21 2008 From: pdogsaz7 at gmail.com (Patrick) Date: Wed, 27 Feb 2008 21:06:21 +0100 Subject: [Ferret-talk] Failed to load ferret_ext.bundle on mac os x In-Reply-To: <3587d9c10802271159w34e41f3byf20a915c756dc660@mail.gmail.com> Message-ID: I got it with the installation of the apple dev tools. $ gcc -v Using built-in specs. 
Target: powerpc-apple-darwin8 Configured with: /var/tmp/gcc/gcc-5370~2/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=powerpc-apple-darwin8 --host=powerpc-apple-darwin8 --target=powerpc-apple-darwin8 Thread model: posix gcc version 4.0.1 (Apple Computer, Inc. build 5370) Patrick Am 27.02.2008 20:59 Uhr schrieb "Jeremy Canady" unter : > Did you install gcc manually or did you get it when you installed the > apple developer tools? > > -Jeremy > > On Tue, Feb 26, 2008 at 4:03 PM, Patrick wrote: >> >> I have found very little information about this problem on the net, so i >> thought someone here might be able to help me. >> >> I am running mac os x 10.4 on a powerbook and have installed ruby 1.8.6 and >> was trying to get ferret to work, without any success. >> Here is exactly what i did and what happened: >> >> $ sudo gem install ferret >> Updating metadata for 125 gems from http://gems.rubyforge.org >> >> ............................................................................. >> ................................................ >> complete >> Building native extensions. This could take a while... >> Successfully installed ferret-0.11.6 >> 1 gem installed >> Installing ri documentation for ferret-0.11.6... >> Installing RDoc documentation for ferret-0.11.6... 
>> $ ruby ferret_test.rb >> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle: >> Failed to load >> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle >> (LoadError) >> from >> /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' >> from >> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret.rb:25 >> from >> /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in >> `gem_original_require' >> from >> /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in `require' >> from ferret_test.rb:2 >> >> >> The ferret_test.rb includes the following lines: >> require 'rubygems' >> require 'ferret' >> include Ferret >> >> The ferret_ext.bundle file does exist but i am guessing it might not have >> been properly compiled. I opened it afterwards and it contains some error >> message that seem to not belong there. >> I do have gcc (4.0.1) and make etc. installed. The Powerbook is obviously >> not an intel mac but in the .bundle-file i found this, i am not sure if that >> helps to find out what the problem is: >> >> *ERROR: POSH double precision floating point serialization failed. Please >> report this to poshlib at poshlib.org! >> OS:..............MacOS X >> CPU:.............Intel 386+ >> endian:..........little >> ptr size:........32-bits >> 64-bit ints......yes >> floating point...enabled >> compiler.........Gnu GCC >> >> Please help! 
>> >> Patrick >> >> >> >> >> >> _______________________________________________ >> Ferret-talk mailing list >> Ferret-talk at rubyforge.org >> http://rubyforge.org/mailman/listinfo/ferret-talk >> > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From jcanady at gmail.com Wed Feb 27 16:01:21 2008 From: jcanady at gmail.com (Jeremy Canady) Date: Wed, 27 Feb 2008 15:01:21 -0600 Subject: [Ferret-talk] Failed to load ferret_ext.bundle on mac os x In-Reply-To: References: <3587d9c10802271159w34e41f3byf20a915c756dc660@mail.gmail.com> Message-ID: <3587d9c10802271301w488dc15jf24fb3f849eafe21@mail.gmail.com> I am going to assume you also have an up to date version of rubygems right? I will try to install it on one of my old eMacs in about an hour and will let you know how it goes. -Jeremy On Wed, Feb 27, 2008 at 2:06 PM, Patrick wrote: > I got it with the installation of the apple dev tools. > > $ gcc -v > Using built-in specs. > Target: powerpc-apple-darwin8 > Configured with: /var/tmp/gcc/gcc-5370~2/src/configure --disable-checking > -enable-werror --prefix=/usr --mandir=/share/man > --enable-languages=c,objc,c++,obj-c++ > --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ > --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib > --build=powerpc-apple-darwin8 --host=powerpc-apple-darwin8 > --target=powerpc-apple-darwin8 > Thread model: posix > gcc version 4.0.1 (Apple Computer, Inc. build 5370) > > Patrick > > > Am 27.02.2008 20:59 Uhr schrieb "Jeremy Canady" unter : > > > > > Did you install gcc manually or did you get it when you installed the > > apple developer tools? > > > > -Jeremy > > > > On Tue, Feb 26, 2008 at 4:03 PM, Patrick wrote: > >> > >> I have found very little information about this problem on the net, so i > >> thought someone here might be able to help me. 
> >> > >> I am running mac os x 10.4 on a powerbook and have installed ruby 1.8.6 and > >> was trying to get ferret to work, without any success. > >> Here is exactly what i did and what happened: > >> > >> $ sudo gem install ferret > >> Updating metadata for 125 gems from http://gems.rubyforge.org > >> > >> ............................................................................. > >> ................................................ > >> complete > >> Building native extensions. This could take a while... > >> Successfully installed ferret-0.11.6 > >> 1 gem installed > >> Installing ri documentation for ferret-0.11.6... > >> Installing RDoc documentation for ferret-0.11.6... > >> $ ruby ferret_test.rb > >> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle: > >> Failed to load > >> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.bundle > >> (LoadError) > >> from > >> /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' > >> from > >> /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret.rb:25 > >> from > >> /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in > >> `gem_original_require' > >> from > >> /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in `require' > >> from ferret_test.rb:2 > >> > >> > >> The ferret_test.rb includes the following lines: > >> require 'rubygems' > >> require 'ferret' > >> include Ferret > >> > >> The ferret_ext.bundle file does exist but i am guessing it might not have > >> been properly compiled. I opened it afterwards and it contains some error > >> message that seem to not belong there. > >> I do have gcc (4.0.1) and make etc. installed. The Powerbook is obviously > >> not an intel mac but in the .bundle-file i found this, i am not sure if that > >> helps to find out what the problem is: > >> > >> *ERROR: POSH double precision floating point serialization failed. Please > >> report this to poshlib at poshlib.org! 
> >> OS:..............MacOS X
> >> CPU:.............Intel 386+
> >> endian:..........little
> >> ptr size:........32-bits
> >> 64-bit ints......yes
> >> floating point...enabled
> >> compiler.........Gnu GCC
> >>
> >> Please help!
> >>
> >> Patrick
> >>
> >> _______________________________________________
> >> Ferret-talk mailing list
> >> Ferret-talk at rubyforge.org
> >> http://rubyforge.org/mailman/listinfo/ferret-talk
> >>
> > _______________________________________________
> > Ferret-talk mailing list
> > Ferret-talk at rubyforge.org
> > http://rubyforge.org/mailman/listinfo/ferret-talk
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

From u.alberton at gmail.com Thu Feb 28 07:23:59 2008
From: u.alberton at gmail.com (Bira)
Date: Thu, 28 Feb 2008 09:23:59 -0300
Subject: [Ferret-talk] Multiple Filters?
Message-ID:

First, I'd like to thank you all for your patience and help on the
"Document Scores" issue :).

Now, I have a bit of a "noob" question... Is there a way to apply
multiple filters to the same query? For example, I want to apply both a
RangeFilter and a QueryFilter, but from what I've seen in the API docs,
the :filter parameter that can be passed to the Searcher accepts only
one filter. It does mention the possibility of applying filters to each
other, but provides no examples.

Is this possible? How can it be done?

-- 
Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com

From u.alberton at gmail.com Fri Feb 29 21:19:30 2008
From: u.alberton at gmail.com (Bira)
Date: Fri, 29 Feb 2008 23:19:30 -0300
Subject: [Ferret-talk] Possible bug when creating a Ferret::Search::Sort object?
Message-ID:

I may have run across a bug in Ferret: it throws a segmentation fault
when I try to create a Sort object using the default fields (SCORE and
DOC_ID), but setting reverse to true.

Here's the minimal example:

#!/usr/bin/env ruby
require 'rubygems'
require 'ferret'

Ferret::Search::Sort.new

Ferret::Search::Sort.new(
  [ Ferret::Search::SortField::SCORE,
    Ferret::Search::SortField::DOC_ID ],
  false
)

Ferret::Search::Sort.new(
  [ Ferret::Search::SortField::SCORE_REV,
    Ferret::Search::SortField::DOC_ID_REV ],
  false
)

Ferret::Search::Sort.new(
  [ Ferret::Search::SortField::SCORE,
    Ferret::Search::SortField::DOC_ID ],
  true
)

You should get something like this when creating the last object:

$ ruby sort.rb
sort.rb:23: [BUG] Segmentation fault
ruby 1.8.6 (2007-09-24) [x86_64-linux]

Aborted

Again, this is with Ferret 0.11.6 in Linux. Is this a known problem
that's being worked on, or should I report it at the Trac tool on
ferret.davebalmain.com?

-- 
Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com