From alex at liivid.com Sat Nov 3 08:49:17 2007 From: alex at liivid.com (Alex Neth) Date: Sat, 3 Nov 2007 20:49:17 +0800 Subject: [Ferret-talk] Performance before and after optimization Message-ID: <39DF135A-146B-477D-90C8-ED1C7A309988@liivid.com> I have an index with a few hundred thousand records. The index is generally very fast, with sub 100ms responses. However if I start adding records, it gets extremely slow, up to over 2 seconds per query. This is true even if I am not currently indexing until I optimize the index. In order to work around this, I index in bulk and immediately optimize. This is not ideal for the performance of my site. Unfortunately, contrary to what Dave Balmain seems to say here: http://osdir.com/ml/lang.ruby.ferret.general/2006-08/msg00037.html , the index seems to be locked for reading during optimization. So I have two questions: 1) Why does the performance degrade so badly after adding just a few records, unless I optimize the index? Can I avoid this? 2) Can I keep a second index so that it doesn't get locked during optimization and then switch to the optimized index? Perhaps the index is not really locked and it is just using all the CPU? (I am using a single CPU server)? Thanks for any help. -Alex From hongli at plan99.net Sun Nov 4 13:22:49 2007 From: hongli at plan99.net (Hongli Lai) Date: Sun, 04 Nov 2007 19:22:49 +0100 Subject: [Ferret-talk] Searching different fields based on document permissions Message-ID: <472E0DF9.7020209@plan99.net> I'm currently writing a system that stores user-created documents. Each user belongs to a specific group, and the system supports multiple groups. The thing is, my users want to be able to hide pieces of a document from other groups. So for example, lets say Joe of team A has written this document: "Hello all, our secret plan is finally complete! We will begin our mission of world domination at 12:00 PM tomorrow." 
If Jane of team B views this document, she'll only see the text: "Hello all, our secret plan is finally complete!" Only other people in team A will be able to see the original message. So each document essentially has two versions of contents: one without private information, and one with private information. My users were very specific about this feature and want it no matter what. But this poses a problem for searching. Is it possible to tell Ferret the following? - Search all documents with the given search terms, but: * Search in the field content_without_private_information if the document does not belong to team A. * Search in the field content_with_private_information if the document belongs to team A. I've taken a quick look at the tutorial, and I've purchased the Ferret book by O'Reilly. But so far I can't seem to find anything that makes this possible. Is it possible at all? Or are there other possible alternatives? From scottd at gmail.com Sun Nov 4 19:39:31 2007 From: scottd at gmail.com (Scott Davies) Date: Sun, 4 Nov 2007 16:39:31 -0800 Subject: [Ferret-talk] Searching different fields based on document permissions In-Reply-To: <472E0DF9.7020209@plan99.net> References: <472E0DF9.7020209@plan99.net> Message-ID: <75f591160711041639j37be4fd4h4c13c41e9ee4f499@mail.gmail.com> It's trivial if you construct your query tree manually, which you'll probably have to do for your security purposes (as opposed to using one of the existing query parsers)...the first argument to TermQuery's constructor is which field to search. On 11/4/07, Hongli Lai wrote: > I'm currently writing a system that stores user-created documents. Each > user belongs to a specific group, and the system supports multiple > groups. The thing is, my users want to be able to hide pieces of a > document from other groups. So for example, lets say Joe of team A has > written this document: > "Hello all, our secret plan is finally complete! 
We will > begin our mission of world domination at 12:00 PM tomorrow." > > If Jane of team B views this document, she'll only see the text: > "Hello all, our secret plan is finally complete!" > Only other people in team A will be able to see the original message. > > So each document essentially has two versions of contents: one without > private information, and one with private information. > > My users were very specific about this feature and want it no matter > what. But this poses a problem for searching. Is it possible to tell > Ferret the following? > - Search all documents with the given search terms, but: > * Search in the field content_without_private_information if the > document does not belong to team A. > * Search in the field content_with_private_information if the > document belongs to team A. > > I've taken a quick look at the tutorial, and I've purchased the Ferret > book by O'Reilly. But so far I can't seem to find anything that makes > this possible. Is it possible at all? Or are there other possible > alternatives? > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From hongli at plan99.net Mon Nov 5 05:09:28 2007 From: hongli at plan99.net (Hongli Lai) Date: Mon, 05 Nov 2007 11:09:28 +0100 Subject: [Ferret-talk] Searching different fields based on document permissions In-Reply-To: <75f591160711041639j37be4fd4h4c13c41e9ee4f499@mail.gmail.com> References: <472E0DF9.7020209@plan99.net> <75f591160711041639j37be4fd4h4c13c41e9ee4f499@mail.gmail.com> Message-ID: <472EEBD8.4040002@plan99.net> Scott Davies wrote: > It's trivial if you construct your query tree manually, which you'll > probably have to do for your security purposes (as opposed to using > one of the existing query parsers)...the first argument to TermQuery's > constructor is which field to search. 
I found out that I'll need a query that looks like this: "(group_id:#{group_id} AND private_content:#{search_term}) OR (public_content:#{search_term})" The query parser seems to generate a BooleanQuery at the top-level. I spent several hours reading the book and the API, but I could not find a way to generate 'OR' boolean queries. The API only allows :must, :must_not and :should. How can I construct an OR query like the one above? From jk at jkraemer.net Mon Nov 5 05:23:48 2007 From: jk at jkraemer.net (Jens Kraemer) Date: Mon, 5 Nov 2007 11:23:48 +0100 Subject: [Ferret-talk] Searching different fields based on document permissions In-Reply-To: <472EEBD8.4040002@plan99.net> References: <472E0DF9.7020209@plan99.net> <75f591160711041639j37be4fd4h4c13c41e9ee4f499@mail.gmail.com> <472EEBD8.4040002@plan99.net> Message-ID: <20071105102348.GX19167@thunder.jkraemer.net> On Mon, Nov 05, 2007 at 11:09:28AM +0100, Hongli Lai wrote: > Scott Davies wrote: > > It's trivial if you construct your query tree manually, which you'll > > probably have to do for your security purposes (as opposed to using > > one of the existing query parsers)...the first argument to TermQuery's > > constructor is which field to search. > > I found out that I'll need a query that looks like this: > > "(group_id:#{group_id} AND private_content:#{search_term}) OR > (public_content:#{search_term})" > > The query parser seems to generate a BooleanQuery at the top-level. I > spent several hours reading the book and the API, but I could not find a > way to generate 'OR' boolean queries. The API only allows :must, > :must_not and :should. How can I construct an OR query like the one above? Ferret by default creates OR queries so a query string like 'term1 term2' means the same as 'term1 OR term2' . Using the API, :should is the correct modifier to create ORed boolean clauses. 
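
A minimal sketch of how the clause modifiers combine for the query discussed above. This is a toy model in plain Ruby following Lucene-style semantics, not Ferret's own code; `Term`, `Bool` and `permission_query` are hypothetical names introduced only for illustration. It assumes a document is a Hash of field names to text.

```ruby
# Toy model of boolean clause semantics (Lucene-style). NOT Ferret's code:
# Term, Bool and permission_query are hypothetical illustration names.
Term = Struct.new(:field, :value) do
  # A document (Hash of field => text) matches when the field contains the term.
  def match?(doc)
    doc.fetch(field, "").to_s.split.include?(value.to_s)
  end
end

class Bool
  def initialize
    @clauses = []
  end

  # occur is one of :must, :must_not, :should
  def add(query, occur = :should)
    @clauses << [query, occur]
    self
  end

  def match?(doc)
    must     = @clauses.select { |_, o| o == :must }.map(&:first)
    must_not = @clauses.select { |_, o| o == :must_not }.map(&:first)
    should   = @clauses.select { |_, o| o == :should }.map(&:first)

    return false if must_not.any? { |q| q.match?(doc) }
    return false unless must.all? { |q| q.match?(doc) }
    # With :must clauses present, :should clauses are optional.
    return true if must.any?
    # With only :should clauses, at least one has to match: that is the OR.
    should.any? { |q| q.match?(doc) }
  end
end

# (group_id:G AND private_content:T) OR (public_content:T)
def permission_query(group_id, term)
  private_part = Bool.new
  private_part.add(Term.new(:group_id, group_id), :must)
  private_part.add(Term.new(:private_content, term), :must)

  Bool.new
      .add(private_part, :should)
      .add(Term.new(:public_content, term), :should)
end
```

In this model the top-level OR is simply two :should clauses, and the AND is a nested boolean query with two :must clauses. With the real library the corresponding calls would presumably be BooleanQuery's clause-adding method with :should / :must and TermQuery.new(field, term); check the Ferret API docs for the exact signatures.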
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From jk at jkraemer.net Mon Nov 5 05:29:16 2007
From: jk at jkraemer.net (Jens Kraemer)
Date: Mon, 5 Nov 2007 11:29:16 +0100
Subject: [Ferret-talk] Performance before and after optimization
In-Reply-To: <39DF135A-146B-477D-90C8-ED1C7A309988@liivid.com>
References: <39DF135A-146B-477D-90C8-ED1C7A309988@liivid.com>
Message-ID: <20071105102916.GY19167@thunder.jkraemer.net>

On Sat, Nov 03, 2007 at 08:49:17PM +0800, Alex Neth wrote:
[..]
> 2) Can I keep a second index so that it doesn't get locked during
> optimization and then switch to the optimized index? Perhaps the index
> is not really locked and it is just using all the CPU? (I am using a
> single CPU server)?

If you're already indexing in batches, keeping a second read-only index for searching is a good idea. rsync is useful to keep the search-index up to date in this case.

To check if CPU usage is a problem, try lowering the optimizing process' priority and see how it goes.

Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From pjones at pmade.com Mon Nov 5 12:51:38 2007
From: pjones at pmade.com (Peter Jones)
Date: Mon, 5 Nov 2007 10:51:38 -0700
Subject: [Ferret-talk] Unified ferret_start and ferret_stop
Message-ID:

Sorry for the top posting. I posted this to the ruby-forum site on the 25th of October but it doesn't seem to have made its way to this mailing list. Here is the original post:

---
I've attached my first set of changes. The attached archive includes a README file with information about what I've changed and why.

These changes are only for Unix-like operating systems, for now. If you like the changes I've made, I'll integrate the Windows code from the various scripts in the script directory.

Let me know if you have any questions.
---

The patches were originally attached to the forum posting.
The URL to the patches is therefore: http://www.ruby-forum.com/attachment/780/patches.tar.gz

Thanks.

--
Peter Jones
pmade inc. - http://pmade.com

From pjones at pmade.com Mon Nov 5 12:46:20 2007
From: pjones at pmade.com (Peter Jones)
Date: Mon, 5 Nov 2007 10:46:20 -0700
Subject: [Ferret-talk] Partial Class Definition if Ferret Server Not Running
Message-ID: <57F9859C-C936-4E05-B0F9-F64173DC0C79@pmade.com>

When using a remote ferret server, if the ferret server is not running the acts_as_ferret class method will raise an exception. This causes the model class to only be partially defined, and therefore all use of that class in the rails application will explode until the rails process is restarted.

This stems from the fact that ensure_index_exists is called on the server just before the end of the acts_as_ferret class method. This brings up a few questions:

1) Why can't remote_index call ensure_index_exists on the fly similar to how local_index does it? Can't this be done in the server on the fly? What about rebuilding all indexes in the server using ensure_index_exists at start up time, instead of being called for each class during class definition?

2) There seems to be a lot of generic functionality in local_index that could be moved up to the abstract index, and therefore expand the functionality of the remote_index class. Are there any reasons this hasn't been done yet?

Either way, this needs to be corrected because allowing an exception to be raised during class definition is a very bad thing. I'd be more than happy to submit a patch if someone points me in the right direction regarding the correct way to resolve this (in remote_index or ferret_server).

Having the ferret_server check the indexes when it starts seems to be the correct idea, instead of having them checked once for each class in each rails process as it starts.

Thanks.

--
Peter Jones
pmade inc.
- http://pmade.com From pjones at pmade.com Mon Nov 5 16:05:37 2007 From: pjones at pmade.com (Peter Jones) Date: Mon, 5 Nov 2007 14:05:37 -0700 Subject: [Ferret-talk] Segmentation Fault in more_like_this.rb Message-ID: I've been seeing some core dumps coming from ferret_server: acts_as_ferret/lib/more_like_this.rb:170: [BUG] Segmentation fault ruby 1.8.6 (2007-03-13) [i386-freebsd6] I'm running the latest build of ferret (0.11.4-rc5). Line 170 in more_like_this.rb is: freq = reader.doc_freq(field_name, word) which is calling into the ferret C code (if I'm reading this correctly). Is there anything I can do to get you more information, or help track down this problem? Thanks. -- Peter Jones pmade inc. - http://pmade.com From ndaniels at mac.com Mon Nov 5 16:11:53 2007 From: ndaniels at mac.com (Noah M. Daniels) Date: Mon, 5 Nov 2007 16:11:53 -0500 Subject: [Ferret-talk] Strange wildcard problem Message-ID: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> Hi, Apologies for reposting this for those who read this via ruby-forum, but it didn't make it to the list before, and the list seems more active... I'm using ferret (via acts_as_ferret) in a somewhat unorthodox manner and am having a strange wildcard problem. Before anyone wonders why we're doing things this way, the answer is basically that it lets us precompute what would be expensive database queries and store the results in a simple way (ferret index) prior to pushing the static data to our production server. Basically, I've got two (for the sake of simplicity) models, both of which are indexed on a similar (but separate) non-model field. However, one of those two models does not seem to get the proper number of results for a wildcard search: First of all, there's a non-indexed model called ProductTuple that's got a supplier_id as well as a product_category_id and product_material_id as well as some other id fields that aren't really important here. 
Thus, a ProductTuple has foreign key relationships to Suppliers and ProductCategories and ProductMaterials, but for ferret purposes just think of those foreign keys as what they are - ids (e.g. integers). The first model, Supplier, is ferret-indexed on several fields, such as the supplier name and supplier country, as well as the 'ferret_product_tuples' non-model field. ferret_product_tuples simply takes all the product tuples for a supplier and concatenates their product_category_id, product_material_id, etc. with delimiters. So, for a product tuple with product_category_id 82, product_material_id 88, and undefined product_technique_id, the resulting part of the ferret_product_tuple string would look like x00082_00088_00000x (where we use 00000 to indicate null). the xs are used as anchors, essentially, as a given supplier's ferret_product_tuple string might look like 'x00082_00088_00000x x00000_00081_00013x'. Now, the ferret query that gets constructed when we do the relevant queries simply looks like: 'ferret_product_tuple:x00082_?????_?????x' and this would, in the above instance, match that supplier. Everything I've described works _perfectly_, EXCEPT... we also index product_categories on this same string. So product category #82 would have a bunch of ferret_product_tuple strings that start out x00082 and have various things in the other positions. Here's what's strange... a product_category query for 'ferret_product_tuple:x?????_?????_?????x' should return ALL product categories, right? Yet it only returns six. A product category query for 'ferret_product_tuple:x?????_00081_?????x' should return all the product categories that share product_tuples with product_material #81, but in fact returns only a small number of categories. 
Yet making the wildcard match MORE restrictive by substituting 'ferret_product_tuple:x00082_00081_?????x' into that query yields product_category #82, which is erroneously not included in the 6 results for 'ferret_product_tuple:x?????_00081_?????x'. So, have I stumbled upon a bug in the wildcard handling? My initial thought was that the different analyzer I was using for the product_category index was the culprit, but I changed that analyzer out to no effect, so I've ruled that out. Any ideas? Thanks! From bk at benjaminkrause.com Mon Nov 5 16:23:00 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Mon, 5 Nov 2007 22:23:00 +0100 Subject: [Ferret-talk] Segmentation Fault in more_like_this.rb In-Reply-To: References: Message-ID: <65C9DC2C-9D1F-4A7F-A897-A7034EB10049@benjaminkrause.com> Peter, > Is there anything I can do to get you more information, or help track > down this problem? Yes, of course.. try to break the error down to a simple test case and create a ticket at the ferret trac. There're still a few problems in ferret that needs to be addressed, and this might be one of them. Whenever David gets another chance to fix some ferret bugs, it would be great to have a test case that helps to identify the problem. Ben From kraemer at webit.de Tue Nov 6 04:06:16 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Nov 2007 10:06:16 +0100 Subject: [Ferret-talk] Strange wildcard problem In-Reply-To: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> References: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> Message-ID: <20071106090616.GD30619@cordoba.webit.de> Hi! wildcard queries have a built in upper limit of terms they search for, which by default is set to 512 (according to http://ferret.davebalmain.com/api/classes/Ferret/Search/WildcardQuery.html). 
So when you query for asdf*, Ferret expands this to all terms in your index starting with asdf, but will stop after collecting 512 terms, then go and retrieve all documents containing these 512 terms, obviously missing those that would in theory match your query, but do this by containing a matching term that wasn't retrieved in the first step. Of course you can set the max_term count to a higher value, but in the long run this isn't really a solution. If I understand you correctly, your tuple field right now has a single term for each document, and that term is different for each document. Splitting up your tuple values into several different terms could help to reduce the number of terms needed to fetch for a wild card query. Cheers, Jens On Mon, Nov 05, 2007 at 04:11:53PM -0500, Noah M. Daniels wrote: > Hi, > Apologies for reposting this for those who read this via ruby-forum, > but it didn't make it to the list before, and the list seems more > active... > I'm using ferret (via acts_as_ferret) in a somewhat unorthodox > manner and am having a strange wildcard problem. Before anyone wonders > why we're doing things this way, the answer is basically that it lets > us precompute what would be expensive database queries and store the > results in a simple way (ferret index) prior to pushing the static > data to our production server. > Basically, I've got two (for the sake of simplicity) models, both of > which are indexed on a similar (but separate) non-model field. > However, one of those two models does not seem to get the proper > number of results for a wildcard search: > First of all, there's a non-indexed model called ProductTuple that's > got a supplier_id as well as a product_category_id and > product_material_id as well as some other id fields that aren't really > important here. 
Thus, a ProductTuple has foreign key relationships to > Suppliers and ProductCategories and ProductMaterials, but for ferret > purposes just think of those foreign keys as what they are - ids (e.g. > integers). > The first model, Supplier, is ferret-indexed on several fields, such > as the supplier name and supplier country, as well as the > 'ferret_product_tuples' non-model field. ferret_product_tuples simply > takes all the product tuples for a supplier and concatenates their > product_category_id, product_material_id, etc. with delimiters. > So, for a product tuple with product_category_id 82, > product_material_id 88, and undefined product_technique_id, the > resulting part of the ferret_product_tuple string would look like > x00082_00088_00000x (where we use 00000 to indicate null). the xs are > used as anchors, essentially, as a given supplier's > ferret_product_tuple string might look like 'x00082_00088_00000x > x00000_00081_00013x'. > Now, the ferret query that gets constructed when we do the relevant > queries simply looks like: > 'ferret_product_tuple:x00082_?????_?????x' > and this would, in the above instance, match that supplier. > Everything I've described works _perfectly_, EXCEPT... > we also index product_categories on this same string. So product > category #82 would have a bunch of ferret_product_tuple strings that > start out x00082 and have various things in the other positions. > Here's what's strange... a product_category query for > 'ferret_product_tuple:x?????_?????_?????x' should return ALL product > categories, right? Yet it only returns six. A product category query > for 'ferret_product_tuple:x?????_00081_?????x' should return all the > product categories that share product_tuples with product_material > #81, but in fact returns only a small number of categories. 
Yet making
> the wildcard match MORE restrictive by substituting
> 'ferret_product_tuple:x00082_00081_?????x' into that query yields
> product_category #82, which is erroneously not included in the 6
> results for 'ferret_product_tuple:x?????_00081_?????x'.
> So, have I stumbled upon a bug in the wildcard handling? My initial
> thought was that the different analyzer I was using for the
> product_category index was the culprit, but I changed that analyzer
> out to no effect, so I've ruled that out.
> Any ideas? Thanks!

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From kraemer at webit.de Tue Nov 6 04:06:57 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 6 Nov 2007 10:06:57 +0100
Subject: [Ferret-talk] Unified ferret_start and ferret_stop
In-Reply-To:
References:
Message-ID: <20071106090657.GE30619@cordoba.webit.de>

Thanks Peter, I'll have a look at these this evening.

Jens

On Mon, Nov 05, 2007 at 10:51:38AM -0700, Peter Jones wrote:
> Sorry for the top posting. I posted this to the ruby-forum site on
> the 25th of October but it doesn't seem to have made its way to this
> mailing list. Here is the original post:
> ---
> I've attached my first set of changes. The attached archive includes a
> README file with information about what I've changed and why.
> These changes are only for Unix-like operating systems, for now. If
> you like the changes I've made, I'll integrate the Windows code from
> the various scripts in the script directory.
> Let me know if you have any questions.
> ---
> The patches were originally attached to the forum posting.
The URL to
> the patches is therefore: http://www.ruby-forum.com/attachment/780/patches.tar.gz
> Thanks.
> --
> Peter Jones
> pmade inc. - http://pmade.com

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From lebreeze at gmail.com Tue Nov 6 07:14:19 2007
From: lebreeze at gmail.com (Levent Ali)
Date: Tue, 6 Nov 2007 13:14:19 +0100
Subject: [Ferret-talk] Question about the Fail over on the ferret server
In-Reply-To:
References:
Message-ID:

I am looking to solve the same issue... Ferret only seems to be able to use 1 cpu on the machine as well and once it ramps up to near 100% it comes to a grinding halt...

--
Posted via http://www.ruby-forum.com/.

From lebreeze at gmail.com Tue Nov 6 07:33:19 2007
From: lebreeze at gmail.com (Levent Ali)
Date: Tue, 6 Nov 2007 13:33:19 +0100
Subject: [Ferret-talk] ferret / acts_as_ferret multiple server deployment
In-Reply-To:
References: <80767f27b7712e9a65f94ab6d4c09987@ruby-forum.com> <20060912183159.GB2233@cordoba.webit.de> <6e9c57140e75348102f2a5bcaf37a2ce@ruby-forum.com> <20060912211659.GA29768@cordoba.webit.de>
Message-ID: <953f6162853d2b5652798ce0c39455b7@ruby-forum.com>

David Balmain wrote:
> On 9/13/06, Jens Kraemer wrote:
>>
>> load balancing the indexing to several servers can only be done via
>> segmenting the data across those servers, and merging it when searching.
>> This seems possible but is not implemented in Ferret (yet?)
>
> The start of this is there (ie the MultiSearcher). I just need to
> implement RemoteSearcher. Don't expect it any time soon however as I'm
> a little burnt out at the moment.
I'm just going to be cleaning up > what is currently already built for the time being. > > Cheers, > Dave Any progress on RemoteSearcher? :) -- Posted via http://www.ruby-forum.com/. From ndaniels at mac.com Tue Nov 6 09:00:33 2007 From: ndaniels at mac.com (Noah Daniels) Date: Tue, 6 Nov 2007 15:00:33 +0100 Subject: [Ferret-talk] Strange wildcard problem In-Reply-To: <20071106090616.GD30619@cordoba.webit.de> References: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> <20071106090616.GD30619@cordoba.webit.de> Message-ID: <7435d282c6ea369eacfd1c1775bc341d@ruby-forum.com> Jens Kraemer wrote: > Hi! > > wildcard queries have a built in upper limit of terms they search for, > which by default is set to 512 (according to > http://ferret.davebalmain.com/api/classes/Ferret/Search/WildcardQuery.html). > > So when you query for asdf*, Ferret expands this to all terms in your > index starting with asdf, but will stop after collecting 512 terms, then > go and retrieve all documents containing these 512 terms, obviously > missing those that would in theory match your query, but do this by > containing a matching term that wasn't retrieved in the first step. > > Of course you can set the max_term count to a higher value, but in the > long run this isn't really a solution. If I understand you correctly, > your tuple field right now has a single term for each document, and that > term is different for each document. Splitting up your tuple values into > several different terms could help to reduce the number of terms needed > to fetch for a wild card query. > Interesting, thanks. Actually I can't split the tuple values up -- the requirement is to see those terms occur together in the same tuple, not just for the same document (there is a difference in this case). So, I'll try expanding the max_term count to see if that helps; otherwise I'll have to rethink the solution. -- Posted via http://www.ruby-forum.com/. 
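
The term cap Jens describes can be illustrated with a small model. This is a simplified sketch in plain Ruby, not Ferret's actual implementation: `wildcard_to_regexp` and `wildcard_search` are hypothetical helpers, the index is a toy Hash of term to document ids, and the assumption that terms are collected in sorted order until the cap is reached is mine.

```ruby
# Simplified model of wildcard expansion with a term cap. NOT Ferret's code:
# both helpers are hypothetical and exist only to illustrate the effect.

# Turn a pattern using '?' (one character) and '*' (any run) into an
# anchored Regexp.
def wildcard_to_regexp(pattern)
  body = pattern.chars.map { |ch|
    case ch
    when "?" then "."
    when "*" then ".*"
    else Regexp.escape(ch)
    end
  }.join
  Regexp.new("\\A#{body}\\z")
end

# index: Hash of term => array of document ids (a toy inverted index).
# Only the first max_terms matching terms (in sorted order) are used.
def wildcard_search(index, pattern, max_terms)
  regexp = wildcard_to_regexp(pattern)
  matching_terms = index.keys.sort.select { |term| term =~ regexp }
  matching_terms.first(max_terms).flat_map { |term| index[term] }.uniq.sort
end

index = {
  "x00001_00081_00000x" => [1],
  "x00002_00081_00000x" => [2],
  "x00003_00081_00000x" => [3],
  "x00082_00081_00000x" => [82]
}

broad  = wildcard_search(index, "x?????_00081_?????x", 3)    # cap hit before #82
narrow = wildcard_search(index, "x00082_00081_?????x", 3)    # only one term expands
roomy  = wildcard_search(index, "x?????_00081_?????x", 5000) # cap no longer bites
```

With a cap of 3, the broad pattern stops after three terms and never reaches document 82, while the more restrictive pattern expands to a single term and finds it. That mirrors the symptom in the thread, where a narrower wildcard returned a category that the broader one missed.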
From ndaniels at mac.com Tue Nov 6 11:25:56 2007 From: ndaniels at mac.com (Noah M. Daniels) Date: Tue, 6 Nov 2007 11:25:56 -0500 Subject: [Ferret-talk] Strange wildcard problem In-Reply-To: <7435d282c6ea369eacfd1c1775bc341d@ruby-forum.com> References: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> <20071106090616.GD30619@cordoba.webit.de> <7435d282c6ea369eacfd1c1775bc341d@ruby-forum.com> Message-ID: On Nov 6, 2007, at 9:00 AM, Noah Daniels wrote: > Jens Kraemer wrote: >> > > Interesting, thanks. Actually I can't split the tuple values up -- the > requirement is to see those terms occur together in the same tuple, > not > just for the same document (there is a difference in this case). So, > I'll try expanding the max_term count to see if that helps; otherwise > I'll have to rethink the solution. Jens, many thanks; upping the max_terms (max_clauses seems to be the same thing) solved the problem beautifully. However, now I'm trying to get this working with a remote ferret server (using acts_as_ferret) and not having any luck. Particularly, I can't figure out where to set max_terms (or Ferret::Search::MultiTermQuery.default_max_terms= ) such that the remote ferret server will pick it up -- including in the start script for the remote ferret server. Where can I change this option so it'll work for a remote server with AAF? thanks! From kraemer at webit.de Tue Nov 6 11:35:54 2007 From: kraemer at webit.de (Jens Kraemer) Date: Tue, 6 Nov 2007 17:35:54 +0100 Subject: [Ferret-talk] Strange wildcard problem In-Reply-To: References: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> <20071106090616.GD30619@cordoba.webit.de> <7435d282c6ea369eacfd1c1775bc341d@ruby-forum.com> Message-ID: <20071106163554.GD2040@cordoba.webit.de> On Tue, Nov 06, 2007 at 11:25:56AM -0500, Noah M. Daniels wrote: > > > On Nov 6, 2007, at 9:00 AM, Noah Daniels wrote: > > > Jens Kraemer wrote: > >> > > > > Interesting, thanks. 
Actually I can't split the tuple values up -- the
> > requirement is to see those terms occur together in the same tuple, not
> > just for the same document (there is a difference in this case). So,
> > I'll try expanding the max_term count to see if that helps; otherwise
> > I'll have to rethink the solution.
>
> Jens, many thanks; upping the max_terms (max_clauses seems to be the
> same thing) solved the problem beautifully. However, now I'm trying to
> get this working with a remote ferret server (using acts_as_ferret)
> and not having any luck. Particularly, I can't figure out where to set
> max_terms (or Ferret::Search::MultiTermQuery.default_max_terms= ) such
> that the remote ferret server will pick it up -- including in the
> start script for the remote ferret server. Where can I change this
> option so it'll work for a remote server with AAF?

Placing it at the end of acts_as_ferret's init.rb should work.

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From ndaniels at mac.com Tue Nov 6 11:41:24 2007
From: ndaniels at mac.com (Noah M. Daniels)
Date: Tue, 6 Nov 2007 11:41:24 -0500
Subject: [Ferret-talk] Strange wildcard problem
In-Reply-To: <20071106163554.GD2040@cordoba.webit.de>
References: <42B8371C-25DE-4707-926A-FC7431F40B2C@mac.com> <20071106090616.GD30619@cordoba.webit.de> <7435d282c6ea369eacfd1c1775bc341d@ruby-forum.com> <20071106163554.GD2040@cordoba.webit.de>
Message-ID: <355290B8-BD7C-45F3-A1FC-CE6D15ABDBD1@mac.com>

On Nov 6, 2007, at 11:35 AM, Jens Kraemer wrote:
> On Tue, Nov 06, 2007 at 11:25:56AM -0500, Noah M. Daniels wrote:
>>
> Placing it at the end of acts_as_ferret's init.rb should work.

Unfortunately, it doesn't seem to.
For a local index, I can just put this anywhere in code (even in a controller, or in the console) and I start getting correct results from my query: Ferret::Search::MultiTermQuery.default_max_terms = 5000 but on my staging server, where a drb ferret server is used, putting that line in the init.rb doesn't seem to do anything -- in fact, even putting it into the initialize method of the LocalIndex class doesn't help! Any ideas? thanks! From alex at liivid.com Wed Nov 7 08:55:24 2007 From: alex at liivid.com (Alex Neth) Date: Wed, 7 Nov 2007 21:55:24 +0800 Subject: [Ferret-talk] Ferret-talk Digest, Vol 25, Issue 2 In-Reply-To: References: Message-ID: > From: Jens Kraemer > Subject: Re: [Ferret-talk] Performance before and after optimization > On Sat, Nov 03, 2007 at 08:49:17PM +0800, Alex Neth wrote: > [..] >> 2) Can I keep a second index so that it doesn't get locked during >> optimization and then switch to the optimized index? Perhaps the >> index >> is not really locked and it is just using all the CPU? (I am using a >> single CPU server)? > > If you're already indexing in batches, keeping a second read-only > index for > searching is a good idea. rsync is useful to keep the search-index > up to > date in this case. > > To check if CPU usage is a problem, try lowering the optimizing > process' > priority and see how it goes. > Thanks Jens. Any suggestion on how to get a two index solution working with acts_as_ferret? I could not find an easy way to change the index location dynamically. I would love to have a "read-only" index. It seems like using rsync might be problematic though as the index might not be in a consistent state throughout the sync. I don't think it is CPU, but it is definitely locking my site for up to a minute during optimization, which is very bad. 
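
One possible way to switch to a freshly built and optimized index without searchers ever seeing a half-copied directory is to publish it through a symlink that is replaced atomically. The following is a minimal sketch under stated assumptions, not an acts_as_ferret feature: `switch_index` is a hypothetical helper, the application is assumed to open its index through the `current` symlink, and the atomicity relies on POSIX rename semantics (so this is Unix-only).

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: point link_path at new_index_dir in one atomic step.
# A temporary symlink is created first and then renamed over the old one;
# on POSIX systems rename replaces the destination atomically.
def switch_index(link_path, new_index_dir)
  tmp_link = "#{link_path}.tmp-#{Process.pid}"
  FileUtils.rm_f(tmp_link)              # clear a leftover from a failed run
  File.symlink(new_index_dir, tmp_link)
  File.rename(tmp_link, link_path)
end

# Demo in a throwaway directory; the directory names are made up.
Dir.mktmpdir do |root|
  old_dir = File.join(root, "index-1")
  new_dir = File.join(root, "index-2")
  FileUtils.mkdir_p([old_dir, new_dir])
  current = File.join(root, "current")

  switch_index(current, old_dir)        # initial publication
  switch_index(current, new_dir)        # later: publish the optimized copy
  puts File.readlink(current)
end
```

Processes that already hold the old index open keep reading the old files until they reopen the index, so the search side still needs some way to reopen after a switch. How to point acts_as_ferret at such a path is not covered here.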
From jk at jkraemer.net Wed Nov 7 15:05:52 2007
From: jk at jkraemer.net (Jens Kraemer)
Date: Wed, 7 Nov 2007 21:05:52 +0100
Subject: [Ferret-talk] Unified ferret_start and ferret_stop
In-Reply-To:
References:
Message-ID: <20071107200552.GB18363@thunder.jkraemer.net>

Hi Peter,

works like a charm and looks great :-)

Just merged this into trunk.

Cheers,
Jens

On Mon, Nov 05, 2007 at 10:51:38AM -0700, Peter Jones wrote:
> Sorry for the top posting. I posted this to the ruby-forum site on
> the 25th of October but it doesn't seem to have made its way to this
> mailing list. Here is the original post:
> ---
> I've attached my first set of changes. The attached archive includes a
> README file with information about what I've changed and why.
> These changes are only for Unix-like operating systems, for now. If
> you like the changes I've made, I'll integrate the Windows code from
> the various scripts in the script directory.
> Let me know if you have any questions.
> ---
> The patches were originally attached to the forum posting. The URL to
> the patches is therefore: http://www.ruby-forum.com/attachment/780/patches.tar.gz
> Thanks.
> --
> Peter Jones
> pmade inc. - http://pmade.com
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From mail at stuartsierra.com Wed Nov 7 17:01:45 2007
From: mail at stuartsierra.com (Stuart Sierra)
Date: Wed, 7 Nov 2007 17:01:45 -0500
Subject: [Ferret-talk] Ferret-talk Digest, Vol 25, Issue 2
In-Reply-To:
References:
Message-ID: <314ee0450711071401l97b4be8j5e298d7d24383ea3@mail.gmail.com>

On 11/7/07, Alex Neth wrote:
> Thanks Jens. Any suggestion on how to get a two index solution
> working with acts_as_ferret?

I rolled my own with methods in my model class, something like this:

  def self.setup_new_index(location)
    config = aaf_configuration[:ferret].dup
    config.update(:create => true, :auto_flush => false,
                  :field_infos => ActsAsFerret::field_infos([self]),
                  :path => location)
    index = Ferret::Index::Index.new(config)
    index.logger = Logger.new("#{location}/index.log")
    index
  end


  def self.build_new_index(location)
    index = setup_new_index(location)

    max = self.maximum(:id)
    start = self.minimum(:id)

    start.step(max, increment) do |n|
      begin
        record = self.find(n)
      rescue ActiveRecord::RecordNotFound
        next
      end
      index << record.to_doc if record and record.ferret_enabled?(true)
    end
  end

Then I have a rake task to replace the old index with the new one.

-Stuart Sierra
columbialawtech.org

From mail at stuartsierra.com Wed Nov 7 17:04:39 2007
From: mail at stuartsierra.com (Stuart Sierra)
Date: Wed, 7 Nov 2007 17:04:39 -0500
Subject: [Ferret-talk] Ferret-talk Digest, Vol 25, Issue 2
In-Reply-To: <314ee0450711071401l97b4be8j5e298d7d24383ea3@mail.gmail.com>
References: <314ee0450711071401l97b4be8j5e298d7d24383ea3@mail.gmail.com>
Message-ID: <314ee0450711071404j4048d4d4ic034ddd5d4676d80@mail.gmail.com>

> On 11/7/07, Alex Neth wrote:
> > Thanks Jens. Any suggestion on how to get a two index solution
> > working with acts_as_ferret?

On 11/7/07, Stuart Sierra wrote:
> I rolled my own with methods in my model class, something like this:

Correction: this line:

>     start.step(max, increment) do |n|

should be

>     start.upto(max) do |n|

-Stuart Sierra
columbialawtech.org

From phedre at gmail.com Fri Nov 9 14:41:17 2007
From: phedre at gmail.com (phedre)
Date: Fri, 9 Nov 2007 14:41:17 -0500
Subject: [Ferret-talk] Problem with stemming and AAF
Message-ID: <5d302a7b0711091141s433c009cpd45cb5db392a244d@mail.gmail.com>

I'm sure I'm missing something completely obvious here, so I hope
someone can point me in the right direction!

I've implemented a basic search with AAF, which works as expected; I'm
running a ferret drb server, and using will_paginate to page results.
The code in my search_controller.rb:

    search_text = params[:query] || " "
    @products = Product.find_with_ferret(search_text, :page =>
      params[:page], :per_page => #$ItemsPerPage, :limit => $ItemsPerPage,
      :offset => $offset)
    @results_pages = Product.paginate_search(search_text, :page =>
      params[:page], :per_page => $ItemsPerPage)

The next step was to implement stemming, which seemed straightforward
enough. I created the stemmed_analyzer.rb file in the lib directory,
as follows:

    require 'rubygems'
    require 'ferret'

    class StemmedAnalyzer < Ferret::Analysis::Analyzer
      include Ferret::Analysis
      def initialize(stop_words = ENGLISH_STOP_WORDS)
        @stop_words = stop_words
      end
      def token_stream(field, str)
        StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
          @stop_words))
      end
    end

And added the call to the analyzer in my model file:

    acts_as_ferret( :fields => { :name => { :boost => 1, :store => :yes },
                                 :product_number => { :boost => 2 },
                                 :description => { :boost => 0, :store => :yes },
                                 :care => { :boost => -2 },
                                 :manufacturer_name => { :boost => 1, :store => :yes },
                                 :collection_name => { :boost => 1, :store => :yes },
                                 :category_name => { :boost => 0 }
                               },
                    :remote => true,
                    :analyzer => StemmedAnalyzer.new )

Straightforward, no errors. But also no results. Searching for chairs
returns only results for that word, not chair or chairs.

I know the actual analyzer works, as when I explicitly call it as
follows, it returns the correct root words to the log files:

    search_terms = StemmedAnalyzer.new.token_stream(nil, params[:query])
    while token = search_terms.next
      puts token
    end

Like so: Search for "chairs tables" returns

    token["chair":0:6:1]
    token["tabl":7:13:1]

but the front end throws up on me with a:

    TypeError (wrong argument type DRb::DRbObject (expected Data))

I'm fully confused.
I'm sure it's something obvious that I'm just not seeing, and after
beating my head against this for two days, I'm hoping someone can point
it out to me! Or at least get me moving in the right direction.

Thanks for any help!

claudia

--
If you can't be a good example, then you'll just have to be a horrible warning.

From jk at jkraemer.net Sat Nov 10 03:36:18 2007
From: jk at jkraemer.net (Jens Kraemer)
Date: Sat, 10 Nov 2007 09:36:18 +0100
Subject: [Ferret-talk] Problem with stemming and AAF
In-Reply-To: <5d302a7b0711091141s433c009cpd45cb5db392a244d@mail.gmail.com>
References: <5d302a7b0711091141s433c009cpd45cb5db392a244d@mail.gmail.com>
Message-ID: <20071110083618.GJ2341@thunder.jkraemer.net>

Hi!

the analyzer option belongs to the set of options which aaf directly
passes on to Ferret, and therefore the call has to read:

acts_as_ferret(:fields => { },
               :remote => true,
               :ferret => {
                 :analyzer => StemmedAnalyzer
               })

Cheers,
Jens

On Fri, Nov 09, 2007 at 02:41:17PM -0500, phedre wrote:
> I'm sure I'm missing something completely obvious here, so I hope
> someone can point me in the right direction!
>
> I've implemented a basic search with AAF, which works as expected; I'm
> running a ferret drb server, and using will_paginate to page results.
> The code in my search_controller.rb:
>
> search_text = params[:query] || " "
> @products = Product.find_with_ferret(search_text, :page =>
>   params[:page], :per_page => #$ItemsPerPage, :limit => $ItemsPerPage,
>   :offset => $offset)
> @results_pages = Product.paginate_search(search_text, :page =>
>   params[:page], :per_page => $ItemsPerPage)
>
>
>
> The next step was to implement stemming, which seemed straightforward
> enough.
> I created the stemmed_analyzer.rb file in the lib directory,
> as follows:
>
> require 'rubygems'
> require 'ferret'
>
> class StemmedAnalyzer < Ferret::Analysis::Analyzer
>   include Ferret::Analysis
>   def initialize(stop_words = ENGLISH_STOP_WORDS)
>     @stop_words = stop_words
>   end
>   def token_stream(field, str)
>     StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)),
>       @stop_words))
>   end
> end
>
>
> And added the call to the analyzer in my model file:
>
> acts_as_ferret( :fields => { :name => { :boost => 1,
>                                         :store => :yes },
>                              :product_number => { :boost => 2 },
>                              :description => { :boost => 0,
>                                                :store => :yes },
>                              :care => { :boost => -2 },
>                              :manufacturer_name => { :boost => 1,
>                                                      :store => :yes },
>                              :collection_name => { :boost => 1,
>                                                    :store => :yes },
>                              :category_name => { :boost => 0 }
>                            },
>                 :remote => true,
>                 :analyzer => StemmedAnalyzer.new )

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From alex at liivid.com Sat Nov 10 03:29:48 2007
From: alex at liivid.com (Alex Neth)
Date: Sat, 10 Nov 2007 16:29:48 +0800
Subject: [Ferret-talk] Performance before and after optimization
In-Reply-To:
References:
Message-ID: <243B3673-0968-484C-AEBE-2BB66B161E55@liivid.com>

> From: Jens Kraemer
> Subject: Re: [Ferret-talk] Performance before and after optimization
> On Sat, Nov 03, 2007 at 08:49:17PM +0800, Alex Neth wrote:
> [..]
>> 2) Can I keep a second index so that it doesn't get locked during
>> optimization and then switch to the optimized index? Perhaps the index
>> is not really locked and it is just using all the CPU? (I am using a
>> single CPU server)?
>
> If you're already indexing in batches, keeping a second read-only
> index for searching is a good idea. rsync is useful to keep the
> search-index up to date in this case.
>
> To check if CPU usage is a problem, try lowering the optimizing
> process' priority and see how it goes.
>

Thanks Jens.
Any suggestion on how to get a two index solution working with
acts_as_ferret? I could not find an easy way to change the index
location dynamically.

I would love to have a "read-only" index. It seems like using rsync
might be problematic though as the index might not be in a consistent
state throughout the sync.

It's not the CPU. The index is definitely locked for reading during
optimization. With cheap disk space, I would rather use two indexes,
add new records to the "off" index, optimize it, then switch indexes -
and go back and forth like that.

From alex at liivid.com Sat Nov 10 05:41:34 2007
From: alex at liivid.com (Alex Neth)
Date: Sat, 10 Nov 2007 18:41:34 +0800
Subject: [Ferret-talk] Ferret-talk Digest, Vol 25, Issue 3
In-Reply-To:
References:
Message-ID: <33AF33E6-109F-4F76-BFC5-504E3BCD527E@liivid.com>

Thanks Stuart. I thought I had read somewhere that rebuild_index built
the index in a different location and then swapped it, but after
looking at the code (in local_index.rb) this doesn't appear to be the
case. That might explain why the ferret server crashes sometimes when a
search takes place during a reindex.

I wouldn't be doing exactly the same thing as this but this does get me
started. I'm concerned about swapping the index files on a live site
though. Seems risky so I'll probably try to update the ferret_index
member in LocalIndex. Looks like that will work.

-Alex

On Nov 10, 2007, at 4:36 PM, ferret-talk-request at rubyforge.org wrote:
>
> Message: 7
> Date: Wed, 7 Nov 2007 17:01:45 -0500
> From: "Stuart Sierra"
> Subject: Re: [Ferret-talk] Ferret-talk Digest, Vol 25, Issue 2
> To: ferret-talk at rubyforge.org
> Message-ID:
> <314ee0450711071401l97b4be8j5e298d7d24383ea3 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On 11/7/07, Alex Neth wrote:
>> Thanks Jens. Any suggestion on how to get a two index solution
>> working with acts_as_ferret?
>
> I rolled my own with methods in my model class, something like this:
>
>   def self.setup_new_index(location)
>     config = aaf_configuration[:ferret].dup
>     config.update(:create => true, :auto_flush => false,
>                   :field_infos => ActsAsFerret::field_infos([self]),
>                   :path => location)
>     index = Ferret::Index::Index.new(config)
>     index.logger = Logger.new("#{location}/index.log")
>     index
>   end
>
>
>   def self.build_new_index(location)
>     index = setup_new_index(location)
>
>     max = self.maximum(:id)
>     start = self.minimum(:id)
>
>     start.step(max, increment) do |n|
>       begin
>         record = self.find(n)
>       rescue ActiveRecord::RecordNotFound
>         next
>       end
>       index << record.to_doc if record and record.ferret_enabled?(true)
>     end
>   end
>
> Then I have a rake task to replace the old index with the new one.
>
> -Stuart Sierra
> columbialawtech.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071110/7106628d/attachment.html

From jk at jkraemer.net Sun Nov 11 08:34:49 2007
From: jk at jkraemer.net (Jens Kraemer)
Date: Sun, 11 Nov 2007 14:34:49 +0100
Subject: [Ferret-talk] Ferret-talk Digest, Vol 25, Issue 3
In-Reply-To: <33AF33E6-109F-4F76-BFC5-504E3BCD527E@liivid.com>
References: <33AF33E6-109F-4F76-BFC5-504E3BCD527E@liivid.com>
Message-ID: <20071111133449.GB15113@thunder.jkraemer.net>

On Sat, Nov 10, 2007 at 06:41:34PM +0800, Alex Neth wrote:
> Thanks Stuart. I thought I had read somewhere that rebuild_index
> built the index in a different location and then swapped it, but
> after looking at the code (in local_index.rb) this doesn't appear to
> be the case. That might explain why the ferret server crashes
> sometimes when a search takes place during a reindex.

have a look at the rebuild_index implementation in ferret_server.rb,
that's the one which is used in DRb mode.
And yes, it rebuilds the index in the background while running searches
on the old one, so the index swapping logic in there might be a good
starting point for you.

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From lists at kikobu.com Sun Nov 11 10:32:58 2007
From: lists at kikobu.com (Morten)
Date: Sun, 11 Nov 2007 16:32:58 +0100
Subject: [Ferret-talk] undefined method `add'
Message-ID:

We've been running into problems with ferret indexing lately. The
problem is intermittent and sometimes it persists. Just got this after
wiping the index and redeploying:

NoMethodError (undefined method `add' for Solution:Class):
    (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/rails/activerecord/lib/active_record/base.rb:1238:in `method_missing'
    (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/ferret_server.rb:71:in `send'
    (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/ferret_server.rb:71:in `method_missing'
    /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/remote_index.rb:31:in `<<'
    /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:73:in `ferret_create'

I'm running the latest stable version of AAF. Any tips or workarounds
much appreciated.

Morten

From lists at kikobu.com Sun Nov 11 10:09:32 2007
From: lists at kikobu.com (Morten)
Date: Sun, 11 Nov 2007 16:09:32 +0100
Subject: [Ferret-talk] Reducing dependency on remote ferret process
Message-ID:

Hi.

We use FerretDrb for search. If the ferret process is down, our entire
application comes down the moment we try to save a model which is
indexed.

Is there a way to decouple this relationship such that we can somehow
resume normal operations despite ferret being down and not index the
model?

Thanks.

Morten

From hongli at plan99.net Sun Nov 11 11:19:34 2007
From: hongli at plan99.net (Hongli Lai)
Date: Sun, 11 Nov 2007 17:19:34 +0100
Subject: [Ferret-talk] Reducing dependency on remote ferret process
In-Reply-To:
References:
Message-ID: <47372B96.8010303@plan99.net>

Morten wrote:
> Hi.
>
> We use FerretDrb for search. If the ferret process is down, our entire
> application comes down the moment we try to save a model which is indexed.
>
> Is there a way to decouple this relationship such that we can somehow
> resume normal operations despite ferret being down and not index the model?
>
> Thanks.
>
> Morten

I really don't understand your concern. I could also say "if the web
server process is down, our entire application is down" (assuming
you're talking about a web app). The FerretDrb process shouldn't be
down. If you continue even if it's down, your index will become out of
date. Depending on your data that may or may not be worse than
crashing.

From lists at kikobu.com Sun Nov 11 16:17:53 2007
From: lists at kikobu.com (Morten)
Date: Sun, 11 Nov 2007 22:17:53 +0100
Subject: [Ferret-talk] Reducing dependency on remote ferret process
In-Reply-To: <47372B96.8010303@plan99.net>
References: <47372B96.8010303@plan99.net>
Message-ID:

Hongli Lai wrote:
> Morten wrote:
>> Hi.
>>
>> We use FerretDrb for search. If the ferret process is down, our entire
>> application comes down the moment we try to save a model which is indexed.
>>
>> Is there a way to decouple this relationship such that we can somehow
>> resume normal operations despite ferret being down and not index the model?
>>
>> Thanks.
>>
>> Morten
>
> I really don't understand your concern. I could also say "if the web
> server process is down, our entire application is down" (assuming you're
> talking about a web app). The FerretDrb process shouldn't be down. If
> you continue even if it's down, your index will become out of date.
> Depending on your data that may or may not be worse than crashing.

I don't think your comparison is quite fair. Ferret is nice, but it's
not fully matured compared to Apache, MySQL and so on. At least I'm
having more stability issues with it than I've had with the other
processes that I base my work on, which is why I think my concern is
completely valid.

Br,

Morten

From bk at benjaminkrause.com Sun Nov 11 16:36:54 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Sun, 11 Nov 2007 22:36:54 +0100
Subject: [Ferret-talk] Reducing dependency on remote ferret process
In-Reply-To:
References:
Message-ID: <920060E7-4234-40F2-AF2D-5D40ECE30D61@benjaminkrause.com>

Hey ..

unfortunately, no .. not with the current construction.
However, there might be a chance to switch to a
messaging service like ap4r, so your indexing
requests don't get lost.

I think there are some considerations about re-factoring
the drb server, so maybe this dependency might be
dropped in the future..

Cheers
Ben

On 2007-11-11, at 16:09, Morten wrote:

>
> Hi.
>
> We use FerretDrb for search. If the ferret process is down, our entire
> application comes down the moment we try to save a model which is
> indexed.
>
> Is there a way to decouple this relationship such that we can somehow
> resume normal operations despite ferret being down and not index the
> model?
>
> Thanks.
>
> Morten
>
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

Gruss
Ben
---
Benjamin Krause
http://www.omdb.org/
bk at benjaminkrause.com

Rails-Schulung "Advancing with Rails" mit David A. Black
19.11.-22.11.2007, Berlin-Mitte
Details u. Anmeldung: http://www.railsschulung.de

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071111/aeb7c5d3/attachment.html

From julioody at gmail.com Sun Nov 11 17:03:58 2007
From: julioody at gmail.com (Julio Cesar Ody)
Date: Mon, 12 Nov 2007 09:03:58 +1100
Subject: [Ferret-talk] Reducing dependency on remote ferret process
In-Reply-To: <920060E7-4234-40F2-AF2D-5D40ECE30D61@benjaminkrause.com>
References: <920060E7-4234-40F2-AF2D-5D40ECE30D61@benjaminkrause.com>
Message-ID:

Let me take a wild guess on this one.

On ACTS_AS_FERRET_GEM_ROOT/lib/index.rb

    def ferret_create
      if ferret_enabled?
        logger.debug "ferret_create/update: #{self.class.name} : #{self.id}"
        self.class.aaf_index << self
      else
        ferret_enable if @ferret_disabled == :once
      end
      true # signal success to AR
    end

Try wrapping "aaf_index<<" like this

    begin
      self.class.aaf_index << self
    rescue Exception => e
      logger.warn "Error creating/updating document: #{e.inspect}\n#{e.backtrace.join("\n\t")}"
    end

Mind I'm just reading the source and writing the code here straight
away. In the event of my theory being right, this would gracefully
handle exceptions related to adding an entry to the index by dropping a
warning in the AAF log file and moving on.

I think this could be an option in config/initializers (Rails 2.0)
perhaps, as in

    config.aaf.exception_on_save = true

IMHO, of course.

On Nov 12, 2007 8:36 AM, Benjamin Krause wrote:
> Hey ..
>
> unfortunately, no .. not with the current construction.
> However, there might be a chance to switch to a
> messaging service like ap4r, so your indexing
> requests don't get lost.
>
> I think there are some considerations about re-factoring
> the drb server, so maybe this dependency might be
> dropped in the future..
>
> Cheers
> Ben
>
>
>
>
>
>
> On 2007-11-11, at 16:09, Morten wrote:
>
> Hi.
>
> We use FerretDrb for search. If the ferret process is down, our entire
> application comes down the moment we try to save a model which is indexed.
>
> Is there a way to decouple this relationship such that we can somehow
> resume normal operations despite ferret being down and not index the model?
>
> Thanks.
>
> Morten
>
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>
>
>
> Gruss
> Ben
> ---
> Benjamin Krause
> http://www.omdb.org/
> bk at benjaminkrause.com
>
> Rails-Schulung "Advancing with Rails" mit David A. Black
> 19.11.-22.11.2007, Berlin-Mitte
> Details u. Anmeldung: http://www.railsschulung.de
>
>
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

From lists at kikobu.com Sun Nov 11 17:24:43 2007
From: lists at kikobu.com (Morten)
Date: Sun, 11 Nov 2007 23:24:43 +0100
Subject: [Ferret-talk] undefined method `add'
In-Reply-To:
References:
Message-ID:

Hi,

I *think* I'm getting closer to what's going on with this problem.

Basically, the models that we're experiencing this with are subclasses
(Rails STI), such that:

class Entry < AR::Base
  acts_as_ferret
end

class Solution < Entry
end

class Notice < Entry
end

The problem may appear intermittently, because the subclassed models
have not been loaded correctly somehow, and thus confusing ferret. If
I reload the page that causes the problem a few times, things usually
begin working.

I suppose one way to do a quick fix would be to explicitly require the
models in one of the initialization files (eg. environment.rb) such that
entry gets required first, and then each of the sub-classes.

I'll see if I can reproduce this outside of production.

Br,

Morten

Morten wrote:
> We've been running into problems with ferret indexing lately. The
> problem is intermittent and sometimes it persists.
> Just got this after
> wiping the index and redeploying:
>
> NoMethodError (undefined method `add' for Solution:Class):
>     (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/rails/activerecord/lib/active_record/base.rb:1238:in `method_missing'
>     (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/ferret_server.rb:71:in `send'
>     (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/ferret_server.rb:71:in `method_missing'
>     /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/remote_index.rb:31:in `<<'
>     /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:73:in `ferret_create'
>
>
> I'm running the latest stable version of AAF. Any tips or workarounds
> much appreciated.
>
> Morten

From lists at kikobu.com Mon Nov 12 04:00:16 2007
From: lists at kikobu.com (Morten)
Date: Mon, 12 Nov 2007 10:00:16 +0100
Subject: [Ferret-talk] undefined method `add'
In-Reply-To:
References:
Message-ID:

Well, that wasn't it. It appears to happen for top level classes in the
inheritance hierarchy as well and for classes that are not even
subclassed.

It only happens once per class type immediately after restarting the
Ferret backgroundrb process, after which things begin working.

It does not help to require the classes in environment.rb

Any suggestions?

Thanks.

Morten

Morten wrote:
> Hi,
>
> I *think* I'm getting closer to what's going on with this problem.
>
> Basically, the models that we're experiencing this with are subclasses
> (Rails STI), such that:
>
> class Entry < AR::Base
>   acts_as_ferret
> end
>
> class Solution < Entry
> end
>
> class Notice < Entry
> end
>
> The problem may appear intermittently, because the subclassed models
> have not been loaded correctly somehow, and thus confusing ferret. If
> I reload the page that causes the problem a few times, things usually
> begin working.
>
> I suppose one way to do a quick fix would be to explicitly require the
> models in one of the initialization files (eg. environment.rb) such that
> entry gets required first, and then each of the sub-classes.
>
> I'll see if I can reproduce this outside of production.
>
> Br,
>
> Morten
>
>
>
>
>
> Morten wrote:
>> We've been running into problems with ferret indexing lately. The
>> problem is intermittent and sometimes it persists. Just got this after
>> wiping the index and redeploying:
>>
>> NoMethodError (undefined method `add' for Solution:Class):
>>     (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/rails/activerecord/lib/active_record/base.rb:1238:in `method_missing'
>>     (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/ferret_server.rb:71:in `send'
>>     (druby://10.1.65.87:9009) /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/ferret_server.rb:71:in `method_missing'
>>     /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/remote_index.rb:31:in `<<'
>>     /data/releases/20071111152414/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:73:in `ferret_create'
>>
>>
>> I'm running the latest stable version of AAF. Any tips or workarounds
>> much appreciated.
>>
>> Morten

From lists at kikobu.com Mon Nov 12 08:27:58 2007
From: lists at kikobu.com (Morten)
Date: Mon, 12 Nov 2007 14:27:58 +0100
Subject: [Ferret-talk] undefined method `add'
In-Reply-To:
References:
Message-ID:

On the first request, which breaks, the following gets written to the
ferret_server.log:

call index method: add with [#

References:
Message-ID:

The underlying problem is bad unmarshalling of the Ferret::Document
that gets sent to the DRb server.
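
That failure mode can be sketched with Ruby's standard drb library alone. This
is a hedged illustration, not acts_as_ferret's or Ferret's actual code:
LateLoadedDoc is a hypothetical stand-in class, and the rescue below imitates
by hand what DRb does when it receives bytes it cannot unmarshal. When the
receiving process cannot resolve a class name, the raw bytes are kept inside a
DRb::DRbUnknown, and calling reload on it retries the unmarshal once the class
is available. Whether class-loading order is really what goes wrong in the AAF
server is not established in this thread.

```ruby
require 'drb'

# Hypothetical stand-in for a class the receiving process has not loaded yet.
class LateLoadedDoc
  attr_reader :title

  def initialize(title)
    @title = title
  end
end

payload = Marshal.dump(LateLoadedDoc.new("hello"))

# Simulate a receiver where the class is not (yet) defined.
klass = LateLoadedDoc
Object.send(:remove_const, :LateLoadedDoc)

# Unmarshalling fails, so the bytes are wrapped instead of being lost.
received = begin
  Marshal.load(payload)
rescue NameError, ArgumentError => e
  DRb::DRbUnknown.new(e, payload)
end

# Later the class becomes available, and reload retries the unmarshal.
Object.const_set(:LateLoadedDoc, klass)
reloaded = received.reload

puts received.class   # the wrapper that arrived first
puts reloaded.class   # the real object after reload
```

If reload is called while the class is still missing, it hands back another
DRb::DRbUnknown rather than raising, so a caller may want to check the result
before using it.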

In ferret_server.rb:

    rescue NoMethodError
      @logger.debug "no luck, trying to call class method instead"

Using rescue NoMethodError => e and then include e.message in the debug
output, reveals:

    undefined method `to_doc' for #

I'm pretty blank as to why Ferret::Document does not get properly
unmarshalled on the initial request. If I change the add method to
attempt a DRb reload in local_index.rb (line ~139):

    def add(record)
      if record.is_a?(DRb::DRbUnknown)
        record = record.reload
        logger.warn("Reloaded DRb::DRbUnknown to #{record.class.name}")
      end

      record = record.to_doc unless Hash === record || Ferret::Document === record
      ferret_index << record
    end

Then I do indeed get a Document instance back, ie. I have a work around.

But why does this work around work? Does the unmarshalling process occur
before the relevant classes get loaded in the initial request?

I'll patch up my local AAF to use this work around, but as it does not
solve the actual root problem, I guess it's not interesting as a patch
submission.

Br,

Morten

From phedre at gmail.com Mon Nov 12 09:48:34 2007
From: phedre at gmail.com (claudia)
Date: Mon, 12 Nov 2007 09:48:34 -0500
Subject: [Ferret-talk] Problem with stemming and AAF
In-Reply-To: <17665fee0711120645td8cbbdm88c6e11916c02a53@mail.gmail.com>
References: <5d302a7b0711091141s433c009cpd45cb5db392a244d@mail.gmail.com> <20071110083618.GJ2341@thunder.jkraemer.net> <17665fee0711120645td8cbbdm88c6e11916c02a53@mail.gmail.com>
Message-ID: <17665fee0711120648i15cd2414h5135bdca3dbaa87d@mail.gmail.com>

Such a simple solution. That's what I get for spending days staring at
the silly thing.

Thanks for the help!

claudia

On 10/11/2007, Jens Kraemer wrote:
> the analyzer option belongs to the set of options which aaf directly
> passes on to Ferret, and therefore the call has to read:
> acts_as_ferret(:fields => { },
>                :remote => true,
>                :ferret => {
>                  :analyzer => StemmedAnalyzer
>                })

From kraemer at webit.de Mon Nov 12 10:28:47 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Mon, 12 Nov 2007 16:28:47 +0100
Subject: [Ferret-talk] undefined method `add' - work around
In-Reply-To:
References:
Message-ID: <20071112152847.GM10556@cordoba.webit.de>

Hi Morten,

glad you could make this work for you. I'm not sure why you're seeing
this strange behaviour, I've never seen this happen before.

Cheers,
Jens

On Mon, Nov 12, 2007 at 03:22:33PM +0100, Morten wrote:
>
> The underlying problem is bad unmarshalling of the Ferret::Document that
> gets sent to the DRb server.
>
> In ferret_server.rb:
>
>     rescue NoMethodError
>       @logger.debug "no luck, trying to call class method instead"
>
> Using rescue NoMethodError => e and then include e.message in the debug
> output, reveals:
>
>     undefined method `to_doc' for #
>
> I'm pretty blank as to why Ferret::Document does not get properly
> unmarshalled on the initial request. If I change the add method to
> attempt a DRb reload in local_index.rb (line ~139):
>
>     def add(record)
>       if record.is_a?(DRb::DRbUnknown)
>         record = record.reload
>         logger.warn("Reloaded DRb::DRbUnknown to #{record.class.name}")
>       end
>
>       record = record.to_doc unless Hash === record || Ferret::Document === record
>       ferret_index << record
>     end
>
> Then I do indeed get a Document instance back, ie. I have a work around.
>
> But why does this work around work? Does the unmarshalling process occur
> before the relevant classes get loaded in the initial request?
>
> I'll patch up my local AAF to use this work around, but as it does not
> solve the actual root problem, I guess it's not interesting as a patch
> submission.
>
> Br,
>
> Morten
>
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From alain.ravet+ferret at gmail.com Tue Nov 13 07:47:04 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Tue, 13 Nov 2007 13:47:04 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Message-ID:

Hi all,

I cannot make aaf (rev. 220) use my custom analyzer, despite following
the indications @

    http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage

To pinpoint the problem, I created a model + a simple analyzer with 2
stop words : "fax" and "gsm".

test 1 : model.rebuild_index + model.find_by_contents("fax")  # fax is a stop word.
  => I get a result when I should not.

  (note : I delete the index directory => I can see the index is
  recreated, index/develop ).

test 2 : insert a 'raise' in the token_stream() method
  => it's never thrown.

test 3 : use the standard analyzer, to exclude the 2 stop words
  => same wrong result.

    class AccessPointKind2 < ActiveRecord::Base
      set_table_name "access_point_kinds2"

      acts_as_ferret(
        {:remote => true, :fields => { :name => {:store => :yes}} } ,
        { :analyzer => Ferret::Analysis::StandardAnalyzer.new(["fax","gsm"]) }
      )
    end

Here are the model and the analyzer :

MODEL :

    class AccessPointKind2 < ActiveRecord::Base
      set_table_name "access_point_kinds2"

      acts_as_ferret(
        {:remote => true, :fields => { :name => {:store => :yes}} } ,
        {:analyzer => PlainAsciiAnalyzer.new}
      )
    end

ANALYZER
lib : plain_ascii_analyzer.rb

    class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer
      include ::Ferret::Analysis
      def token_stream(field, str)
        StopFilter.new(
          StandardTokenizer.new(str) ,
          ["fax", "gsm"]
        )
        # raise <<<----- is never executed when uncommented !!
      end
    end

In the console, I rebuild the index + search for a stop word => I get a
result, when I should not :

>> reload!; AccessPointKind2.rebuild_index ; AccessPointKind2.find_by_contents("gsm").collect &:name
Reloading...
AccessPointKind2 Columns (0.002963) SHOW FIELDS FROM access_point_kinds2
Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
Will use remote index server which should be available at druby://localhost:9010
default field list: [:name]
AccessPointKind2 Load (0.002706) SELECT * FROM access_point_kinds2 WHERE (access_point_kinds2.id in ('7','12','13','8','2'))
Query: gsm
total hits: 5, results delivered: 5
=> ["gsm", "gsm", "gsm(werk)", "gsm(priv?)", "gsm(priv?)"]
>>

I guess it's obvious, but I cannot see it.
Help.

Thanks in advance.

Alain

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071113/29bd78c4/attachment.html

From jk at jkraemer.net Wed Nov 14 04:25:37 2007
From: jk at jkraemer.net (Jens Kraemer)
Date: Wed, 14 Nov 2007 10:25:37 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
In-Reply-To:
References:
Message-ID: <20071114092537.GB3558@thunder.jkraemer.net>

Hi,

I just tried and I'm afraid I couldn't reproduce your problem here (with aaf trunk). I just committed a test case using StandardAnalyzer with your stop word list, and it works as intended. I also tried with your analyzer class from below, with the same result.

Could you please try the latest aaf from trunk to see if it fixes your problem?

Cheers,
Jens

On Tue, Nov 13, 2007 at 01:47:04PM +0100, Alain Ravet wrote:
> Hi all,
>
>
> I cannot make aaf (rev. 220) use my custom analyzer, despite following the
> indications @
>
> http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
>
>
> To pinpoint the problem, I created a model + a simple analyzer with 2 stop
> words : "fax" and "gsm".
>
> test 1 : model.rebuild_index + model.find_by_contents("fax") # fax is a
> stop word.
> => I get a result when I should not.
>
> (note : I delete the index directory => I can see the index is recreated,
> index/develop
>
> ).
>
> test 2 : insert a 'raise' in the token_stream() method => it's never thrown.
>
> test 3 : use the standard analyzer, to exclude the 2 stop words => same
> wrong result.
> class AccessPointKind2 < ActiveRecord::Base > > set_table_name "access_point_kinds2" > > acts_as_ferret( > {:remote => true, :fields => { :name => {:store => :yes}} } , > { :analyzer => > Ferret::Analysis::StandardAnalyzer.new(["fax","gsm"]) > } > ) > end > > > > > > Here are the model and the analyzer : > MODEL : > > class AccessPointKind2 < ActiveRecord::Base > set_table_name "access_point_kinds2" > > acts_as_ferret( > {:remote => true, :fields => { :name => {:store => :yes}} } , > {:analyzer => PlainAsciiAnalyzer.new} > ) > end > > > ANALYZER > lib : plain_ascii_analyzer.rb > class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer > include ::Ferret::Analysis > def token_stream(field, str) > StopFilter.new( > StandardTokenizer.new(str) , > ["fax", "gsm"] > ) > # raise <<<----- is never executed when uncommented !! > end > end > > > > In the console, I rebuild the index + search for a stop word => I get a > results, when I should not : > > > >> reload!; AccessPointKind2.rebuild_index ; > AccessPointKind2.find_by_contents("gsm").collect &:name > Reloading... > AccessPointKind2 Columns (0.002963) SHOW FIELDS FROM access_point_kinds2 > Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, > looks like we are not the server > Will use remote index server which should be available at > druby://localhost:9010 > default field list: [:name] > AccessPointKind2 Load (0.002706) SELECT * FROM access_point_kinds2 WHERE > (access_point_kinds2.id in ('7','12','13','8','2')) > Query: gsm > total hits: 5, results delivered: 5 > => ["gsm", "gsm", "gsm(werk)", "gsm(priv?)", "gsm(priv?)"] > >> > > > I guess it's obvious, but I cannot see it. > Help. > > Thanks in advance. 
>
> Alain
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From alain.ravet+ferret at gmail.com Wed Nov 14 16:51:25 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Wed, 14 Nov 2007 22:51:25 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
In-Reply-To: <20071114092537.GB3558@thunder.jkraemer.net>
References: <20071114092537.GB3558@thunder.jkraemer.net>
Message-ID:

Jens,

> I just tried and I'm afraid I couldn't reproduce your problem here (with aaf trunk).
...
> Could you please try the lates aaf from trunk to see if it fixes your problem?

Same problem after installing the latest version (262) of aaf : the custom analyzer I pass as an aaf parameter is not used.

As a quick test, I tried using the "No Stop Word" custom analyzer as documented @ http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage on a simple LUT table/model, to no avail. I tried the new syntax with the same wrong result.

Setup :
  * I've installed the latest trunk version of aaf (262)
  * killed + restarted a (new) DRb server
      $ ./script/ferret_server -e production start
  * checked the Ferret version :
      $ gem list ferret
      ==> ferret (0.11.4)

Test :
I created a record where the name is a default stop word

    >> Country.find 11
      Country Load (0.000388) SELECT * FROM countries WHERE (countries.`id` = 11)
    => #

model, way 1 :

    class Country < ActiveRecord::Base
      acts_as_ferret(
        { :fields => [:name] },
        { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }
      )
    end

model, way 2 :

    class Country < ActiveRecord::Base
      acts_as_ferret(
        :fields => [:name],
        :remote => true,
        :ferret => { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }
      )
    end

PROBLEM : in both cases it doesn't find any record where the name is 'the'

    >> reload! ; Country.*rebuild_index* ; Country.*find_by_contents*(" the")
    >> reload! ; Country.rebuild_index ; Country.find_by_contents ("the")
    Reloading...
    Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
    Will use remote index server which should be available at druby://localhost:9010
    default field list: [:name]
    Query: the
    total hits: 0, results delivered: 0
    => #

I tried with my custom analyser (from the previous message), with the same wrong result.

So, it looks like aaf is not using the custom analyzer I declared in the model. It doesn't make any sense to me.

Alain Ravet
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071114/a02e70a9/attachment.html

From alain.ravet+ferret at gmail.com Wed Nov 14 16:58:08 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Wed, 14 Nov 2007 22:58:08 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
In-Reply-To:
References: <20071114092537.GB3558@thunder.jkraemer.net>
Message-ID:

remark : some spaces were erroneously inserted before the word "the" when I formatted the email, and are not present in the real code.

So
> => #
> ..
> >> reload! ; Country.rebuild_index ; Country.find_by_contents(" the")

should read :
> => #
> ..
> >> reload! ; Country.rebuild_index ; Country.find_by_contents("the")

From alain.ravet+ferret at gmail.com Wed Nov 14 18:00:04 2007
From: alain.ravet+ferret at gmail.com (Alain Ravet)
Date: Thu, 15 Nov 2007 00:00:04 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
In-Reply-To: <20071114092537.GB3558@thunder.jkraemer.net>
References: <20071114092537.GB3558@thunder.jkraemer.net>
Message-ID:

I'm one step further :
  - Good : I now know aaf knows about/received the custom analyzer
but
  - Bad : the analyzer is not used by aaf ( : it stops on words it should not stop on)

New test : a "no stop word" analyzer, adapted from the German stemming analyser @
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage

file: model/country.rb
----------------------

    class Test2Analyzer < ::Ferret::Analysis::Analyzer
      include Ferret::Analysis

      def initialize(stop_words = [])
        @stop_words = stop_words
      end

      def token_stream(field, str)
        StemFilter.new(
          StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words),
          'de'
        )
      end
    end

    class Country < ActiveRecord::Base
      acts_as_ferret(
        :fields => [:name],
        :remote => true,
        :ferret => { :analyzer => Test2Analyzer.new([]) }
      )
    end

0°/ delete the ferret index directory

1°/ restart the
console and rebuild the index :

    ./script/console
    >> Country.rebuild_index
    Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
    Will use remote index server which should be available at druby://localhost:9010
    default field list: [:name]
    => nil

2°/ confirm that aaf knows about my "no_stop_words" custom analyzer :

    >> puts Country.aaf_index.to_yaml
    --- !ruby/object:ActsAsFerret::RemoteIndex
    config:
      :fields:
      - :name
      :mysql_fast_batches: true
      :name: countries
      :class_name: Country
      :index_dir: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
      :remote: druby://localhost:9010
      :reindex_batch_size: 1000
      :store_class_name: false
      :ferret_fields:
        :name:
          :store: :no
          :term_vector: :with_positions_offsets
          :boost: 1.0
          :index: :yes
          :highlight: :yes
      :single_index: false
      :ferret: &id001
        :key: :id
        :auto_flush: true
        :or_default: false
        :path: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
        :create_if_missing: true
        :handle_parse_errors: true
        :analyzer: !ruby/object:Test2Analyzer      <<<<----------- Good
          stop_words: []                           <<<<----------- Good
        :default_field:
        - :name
      :enabled: true
    ferret_config: *id001
    server: !ruby/object:DRb::DRbObject
      ref:
      uri: druby://localhost:9010
    => nil

3°/ confirm that there is a record with name == "the"

    >> Country.find_by_name "the"
      Country Load (0.000427) SELECT * FROM countries WHERE (countries.`name` = 'the') LIMIT 1
    => #

4°/ try to find it with "t*" through aaf
  => DOES NOT WORK (does not find Country[:name => "the"])

    >> Country.find_by_contents "t*"
    Query: t*
    total hits: 0, results delivered: 0
    => #

5°/ do the same for "f*", which matches a non stop word
  => IT WORKS (finds Country[:name => "Frankrijk"])

    >> Country.find_by_contents "f*"
      Country Load (0.000420) SELECT * FROM countries WHERE (countries.id in ('2'))
    Query: f*
    total hits: 1, results delivered: 1
    => #], total_pages1

So, aaf (rev 262)
  * associates the right custom analyzer with the model,
  * but doesn't seem to use it when
    finding_by_contents (? and rebuilding the index ??)

Alain
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071115/1c0510a6/attachment-0001.html

From hongli at plan99.net Wed Nov 14 18:24:25 2007
From: hongli at plan99.net (Hongli Lai)
Date: Thu, 15 Nov 2007 00:24:25 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
In-Reply-To:
References: <20071114092537.GB3558@thunder.jkraemer.net>
Message-ID: <473B83A9.4050509@plan99.net>

Alain Ravet wrote:
> class Country < ActiveRecord::Base
>   acts_as_ferret(
>     :fields => [:name] ,
>     :remote => true,
>     :ferret => {:analyzer => Test2Analyzer.new([]) }
>   )
> end

Try this:

    acts_as_ferret({ :fields => [:name], :remote => true },
                   { :analyzer => Test2Analyzer.new([]) })

From kraemer at webit.de Thu Nov 15 04:07:11 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Thu, 15 Nov 2007 10:07:11 +0100
Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
In-Reply-To: <473B83A9.4050509@plan99.net>
References: <20071114092537.GB3558@thunder.jkraemer.net> <473B83A9.4050509@plan99.net>
Message-ID: <20071115090711.GX10556@cordoba.webit.de>

On Thu, Nov 15, 2007 at 12:24:25AM +0100, Hongli Lai wrote:
> Alain Ravet wrote:
> > class Country < ActiveRecord::Base
> >   acts_as_ferret(
> >     :fields => [:name] ,
> >     :remote => true,
> >     :ferret => {:analyzer => Test2Analyzer.new([]) }
> >   )
> > end
>
> Try this:
>
> acts_as_ferret({ :fields => [:name], :remote => true },
>                { :analyzer => Test2Analyzer.new([]) })

this won't help, these are both valid ways to call acts_as_ferret. The :ferret syntax is the preferred one, however.

Jens

--
Jens Krämer
webit!
Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From kraemer at webit.de Thu Nov 15 04:13:18 2007 From: kraemer at webit.de (Jens Kraemer) Date: Thu, 15 Nov 2007 10:13:18 +0100 Subject: [Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes) In-Reply-To: References: <20071114092537.GB3558@thunder.jkraemer.net> Message-ID: <20071115091318.GY10556@cordoba.webit.de> Hi Alain, could you please check the index created by aaf with plain ferret and your custom analyzer to see if your queries deliver the expected results then? That way we should be able to find out if the problem is with indexing or searching through aaf. Jens On Thu, Nov 15, 2007 at 12:00:04AM +0100, Alain Ravet wrote: > I'm one step further : > - Good : I now know aaf knows about/received the custom analyzer > but > - Bad : the analyzer is not used by aaf ( : it stops on words it should > not stop on) > > New test : a "no stop word" analyzer, adapted from the german stemming > analyser @ > http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage > > > file: model/country.rb > ---------------------- > class Test2Analyzer < ::Ferret::Analysis::Analyzer > include Ferret::Analysis > def initialize(stop_words = []) > @stop_words = stop_words > end > def token_stream(field, str) > StemFilter.new(StopFilter.new(LowerCaseFilter.new( > StandardTokenizer.new(str)), @stop_words), 'de') > end > end > class Country < ActiveRecord::Base > acts_as_ferret( > :fields => [:name] , > :remote => true, > :ferret => {:analyzer => Test2Analyzer.new([]) } > ) > end > > > 0?/ delete the ferret index directory > 1?/ restart the console and rebuild the index : > > > ./script/console > >> Country.rebuild_index > Asked for a remote server ? 
true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, > looks like we are not the server > Will use remote index server which should be available at > druby://localhost:9010 > default field list: [:name] > => nil > > > 2?/ confirm that aaf knows about my "no_stop_words" custom analyzer : > > >> puts Country.aaf_index.to_yaml > --- !ruby/object:ActsAsFerret::RemoteIndex > config: > :fields: > - :name > :mysql_fast_batches: true > :name: countries > :class_name: Country > :index_dir: > /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country > :remote: druby://localhost:9010 > :reindex_batch_size: 1000 > :store_class_name: false > :ferret_fields: > :name: > :store: :no > :term_vector: :with_positions_offsets > :boost: 1.0 > :index: :yes > :highlight: :yes > :single_index: false > :ferret: &id001 > :key: :id > :auto_flush: true > :or_default: false > :path: > /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country > :create_if_missing: true > :handle_parse_errors: true > :analyzer: !ruby/object:Test2Analyzer <<<<----------- Good > stop_words: [] <<<<----------- Good > :default_field: > - :name > :enabled: true > ferret_config: *id001 > server: !ruby/object:DRb::DRbObject > ref: > uri: druby://localhost:9010 > => nil > > > > > 3?/ confirm that there is record with name == "the" > > >> Country.find_by_name "the" > Country Load (0.000427) SELECT * FROM countries WHERE (countries.`name` > = 'the') LIMIT 1 > => # > > > 4?/ try and find "t*" it with aaf > => DOES NOT WORK (does not find Country[:name => "the"]) > > >> Country.find_by_contents "t*" > Query: t* > total hits: 0, results delivered: 0 > => # @total_hits=0, @results=[], @total_pages=0> > > > 5?/ do the same for "t*", a non stop word > => IT WORKS (finds Country[:name => "Frankrijk"]) > > >> Country.find_by_contents "f*" > Country Load (0.000420) SELECT * FROM countries WHERE (countries.id in > ('2')) > Query: f* > total hits: 1, results delivered: 1 > => # @total_hits=1, @results=[#], 
total_pages1 > > > So, aaf (rev 262) > * associates the right custom analyzer with the model, > * but doesn't seem to use it when finding_by_contents (? and rebuilding the > index ??) > > > Alain > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From ssmoot at gmail.com Thu Nov 15 09:37:00 2007 From: ssmoot at gmail.com (Sam Smoot) Date: Thu, 15 Nov 2007 08:37:00 -0600 Subject: [Ferret-talk] Ferret/AAF Stability? Message-ID: Hello. I'm the author of DataMapper (http://datamapper.org), and am trying to choose what Full-Text-Indexing engine/plugin I want to include by default. I was hoping you guys could help. :-) Sphinx comes highly recommended, but without live index updates, it just doesn't seem practical for most of my work. I'm most experienced with Solr, but the whole HTTP::Request and general complexity of it is off-putting. I haven't used Ferret in an application yet, but I love what I see so far. The ability to have an in-process server in development, and the clean Ruby API are big wins for me. But I've heard a lot of scary things about corrupted indexes, even when using the DRb server. Is this just FUD? Are there any unresolved issues revolving around corrupted indexes? Can I afford to use Ferret in big applications for Fortune-500 clients? (I know that sounds... pompous really, but it's a genuine concern.) Any advice you could offer would be greatly appreciated. I've also read a few messages about serializing index requests/updates to Ferret through message-queues. Are there any decent guides/blog-posts on this topic? 

Thanks,
-Sam

From eimorton at gmail.com Thu Nov 15 10:44:39 2007
From: eimorton at gmail.com (Erik Morton)
Date: Thu, 15 Nov 2007 10:44:39 -0500
Subject: [Ferret-talk] Ferret/AAF Stability?
In-Reply-To:
References:
Message-ID: <8A33A7ED-7B45-4C3C-B130-443FE3D6D179@gmail.com>

We have several 3GB indexes with approximately 1 million documents in each of them. Here are some quick notes, feel free to reach out with other questions:

* no corruption problems that weren't our fault.
* there was an issue with large index files (> ~2GB) that was patched, but I'm honestly not sure if it is in the trunk, as the ferret trac/svn is frequently MIA (which is a concern of course)
* the code is clear and fairly easy to follow. AAF is very easy to follow.
* I've been very happy with performance of the actual indexing/searching, however you need to watch out for the processes that are actually doing the synchronization for writes. DRb is a bottleneck for us right now, though our volume isn't high enough that I'd call it a real problem yet.
* for moderately high-volume sites you'll want to consider batching index updates "offline", though for large indexes make sure that you have enough IO capacity to optimize the index. We host on EC2 and the $.1/hour instances simply do not have anywhere near the IO capacity to optimize a large index without having _every other process_ waiting for IO. I haven't tested the larger instance types yet.
* we love how easy and efficient it is to combine many indexes into one. We index tens of thousands of websites in parallel and then combine 100 or so indexes into one index very quickly.
* the mailing list is great. Jens is on top of things, very receptive to new ideas and takes *very* good care of AAF. Haven't seen Dave Balmain in a while.

Overall we are happy.
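
The "batching index updates" note in the list above could look roughly like the plain-Ruby sketch below. It makes no Ferret or acts_as_ferret calls: `IndexUpdateBatcher` and its block are hypothetical names used only for illustration, and the block stands in for whatever actually writes to the index.

```ruby
# Hypothetical sketch, not an acts_as_ferret feature: collect index updates
# and hand them to the indexer in batches instead of one write per record.
class IndexUpdateBatcher
  def initialize(batch_size, &flusher)
    @batch_size = batch_size
    @flusher = flusher   # block that performs the real (slow) index write
    @pending = []
  end

  # Queue one document; write the batch out once it is full.
  def add(doc)
    @pending << doc
    flush if @pending.size >= @batch_size
  end

  # Write out whatever is queued, if anything.
  def flush
    return if @pending.empty?
    @flusher.call(@pending)
    @pending = []
  end
end

# Stand-in "indexer" that only records which ids arrived together.
written = []
batcher = IndexUpdateBatcher.new(3) { |docs| written << docs.map { |d| d[:id] } }
(1..7).each { |id| batcher.add({ :id => id }) }
batcher.flush   # write whatever is left over
p written
```

The batch size is an arbitrary choice here; in practice it would be tuned against how long a single index write is allowed to block.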
There are times when search accuracy questions come up, and frequently the problem is that we are not effectively parsing queries or using the right analyzer for the problem at hand, so RTFM (http://www.oreilly.com/catalog/9780596527853/). That's all I can think of now... Erik On Nov 15, 2007, at 9:37 AM, Sam Smoot wrote: > Hello. I'm the author of DataMapper (http://datamapper.org), and am > trying to choose what Full-Text-Indexing engine/plugin I want to > include by default. I was hoping you guys could help. :-) > > Sphinx comes highly recommended, but without live index updates, it > just doesn't seem practical for most of my work. > > I'm most experienced with Solr, but the whole HTTP::Request and > general complexity of it is off-putting. > > I haven't used Ferret in an application yet, but I love what I see so > far. The ability to have an in-process server in development, and the > clean Ruby API are big wins for me. But I've heard a lot of scary > things about corrupted indexes, even when using the DRb server. Is > this just FUD? Are there any unresolved issues revolving around > corrupted indexes? Can I afford to use Ferret in big applications for > Fortune-500 clients? (I know that sounds... pompous really, but it's a > genuine concern.) > > Any advice you could offer would be greatly appreciated. > > I've also read a few messages about serializing index requests/updates > to Ferret through message-queues. Are there any decent > guides/blog-posts on this topic? > > Thanks, -Sam > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From bk at benjaminkrause.com Thu Nov 15 13:41:42 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Thu, 15 Nov 2007 19:41:42 +0100 Subject: [Ferret-talk] Ferret/AAF Stability? In-Reply-To: References: Message-ID: <73691E3A-A60B-4055-A610-3CCBA56C4622@benjaminkrause.com> Hey .. 

> I haven't used Ferret in an application yet, but I love what I see so
> far. The ability to have an in-process server in development, and the
> clean Ruby API are big wins for me. But I've heard a lot of scary
> things about corrupted indexes, even when using the DRb server. Is
> this just FUD? Are there any unresolved issues revolving around
> corrupted indexes? Can I afford to use Ferret in big applications for
> Fortune-500 clients? (I know that sounds... pompous really, but it's a
> genuine concern.)

We've been using Ferret on omdb.org for 14 months without any problems. There are a few things you might want to work around (Erik pointed some out). If you expect a huge number of index updates, you need to think about a few infrastructure problems, because right now AAF does not allow you to cluster indexing servers. But I know there is a solution for that :)

If you just have a huge number of search queries, there is no need to worry. I would not suggest using AAF's ferret server for searching, though, but it's quite easy to do the searching in each mongrel, so there is no concern here either.

I guess we need more information about the data you want to index to give more detailed advice.

> I've also read a few messages about serializing index requests/updates
> to Ferret through message-queues. Are there any decent
> guides/blog-posts on this topic?

Yes, that's currently being worked on, so there will be some guides later on :)

Cheers
Ben

---
Benjamin Krause
http://www.omdb.org/
bk at benjaminkrause.com

Rails training "Advancing with Rails" with David A. Black
19.11.-22.11.2007, Berlin-Mitte
Details and registration: http://www.railsschulung.de
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071115/c58d15dc/attachment.html From john at digitalpulp.com Thu Nov 15 14:00:16 2007 From: john at digitalpulp.com (John Bachir) Date: Thu, 15 Nov 2007 14:00:16 -0500 Subject: [Ferret-talk] Ferret/AAF Stability? In-Reply-To: <73691E3A-A60B-4055-A610-3CCBA56C4622@benjaminkrause.com> References: <73691E3A-A60B-4055-A610-3CCBA56C4622@benjaminkrause.com> Message-ID: <4AFD0835-1E8C-43B1-BB8C-325D011CCF6F@digitalpulp.com> On Nov 15, 2007, at 1:41 PM, Benjamin Krause wrote: > i would not suggest usings AAF's ferret server for searching, > though .. but it's quite easy to do the searching in each mongrel, so > not concern here either. I'm confused... what does "searching" mean in this context? :) John From jjm at codewell.com Thu Nov 15 13:39:46 2007 From: jjm at codewell.com (Jeff Mallatt) Date: Thu, 15 Nov 2007 13:39:46 -0500 Subject: [Ferret-talk] indexing runs out of memory Message-ID: <7.0.1.0.2.20071115133259.03837958@codewell.com> I'm using Ferret to index a whole bunch of stuff at once. Thousands of documents that produce an index which grows to about 1.25Gb. While the indexer is running, I watch the memory use of the Ruby process grow steadily until it, too, is up to about 1.25Gb -- at which point the process crashes printing: [FATAL] failed to allocate memory Does anyone else have any experience with this mode of failure? Should I not try to create the index all at once, but rather do a few documents then close the index then re-open it then do a few more? Or is a 1.25Gb index simply too big to try to create on my machine? TIA From bk at benjaminkrause.com Thu Nov 15 15:04:34 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Thu, 15 Nov 2007 21:04:34 +0100 Subject: [Ferret-talk] Ferret/AAF Stability? 
In-Reply-To: <4AFD0835-1E8C-43B1-BB8C-325D011CCF6F@digitalpulp.com>
References: <73691E3A-A60B-4055-A610-3CCBA56C4622@benjaminkrause.com> <4AFD0835-1E8C-43B1-BB8C-325D011CCF6F@digitalpulp.com>
Message-ID: <1DBFE5E9-9565-44AB-B7E7-0010848B869B@benjaminkrause.com>

John,

> On Nov 15, 2007, at 1:41 PM, Benjamin Krause wrote:
>> i would not suggest usings AAF's ferret server for searching,
>> though .. but it's quite easy to do the searching in each mongrel, so
>> not concern here either.
>
> I'm confused... what does "searching" mean in this context? :)

If you're using AAF, you should use the ferret DRb server to index your objects. However, using the ferret server means that whenever someone searches (with Model.find_by_contents), the search is forwarded to the ferret server. The ferret server processes the search request and sends the response back to the mongrel. This overhead isn't necessary, as the mongrel could use a local index to do the search; there is no need to bother the ferret server.

So, indexing (aka updating, creating, saving, whatever) should use the ferret server, but searching (using find_by_contents) will also go through the ferret server if you're using standard AAF, even though that's not really necessary and could result in a bottleneck.

Don't get me wrong: it is totally fine to use standard AAF, unless you have huge amounts of searches or live searches. I would not recommend using a custom ferret solution unless you expect a problem or already have one :)

Cheers
Ben

---
Benjamin Krause
http://www.omdb.org/
bk at benjaminkrause.com

Rails training "Advancing with Rails" with David A. Black
19.11.-22.11.2007, Berlin-Mitte
Details and registration: http://www.railsschulung.de
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071115/122a7037/attachment.html

From bk at benjaminkrause.com Thu Nov 15 15:12:44 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Thu, 15 Nov 2007 21:12:44 +0100
Subject: [Ferret-talk] indexing runs out of memory
In-Reply-To: <7.0.1.0.2.20071115133259.03837958@codewell.com>
References: <7.0.1.0.2.20071115133259.03837958@codewell.com>
Message-ID: <38BBC4CB-F693-499B-82E9-8C1F03B3A7E3@benjaminkrause.com>

Jeff,

On 2007-11-15, at 19:39, Jeff Mallatt wrote:
> [FATAL] failed to allocate memory

Yes, closing and reopening the IndexWriter might help. There have been reports on this list of Ferret indexes of 3 or more gigabytes, so I don't think this is a general problem.

Ben

From aquajags at yahoo.com Fri Nov 16 01:56:12 2007
From: aquajags at yahoo.com (Jagdish rao)
Date: Thu, 15 Nov 2007 22:56:12 -0800 (PST)
Subject: [Ferret-talk] problem with searching plurals (with apostrophe)
Message-ID: <784388.66109.qm@web60416.mail.yahoo.com>

Hello guys,

I am using the acts_as_ferret plugin (0.4.1, latest) with the ferret gem (0.11.4, latest) on Rails 1.2.5 and Ruby 1.8.6 (Ubuntu Gutsy).

I have this in my Stores model:

    acts_as_ferret :fields => { :name          => { :boost => 2, :store => :yes },
                                :short_desc    => { :boost => 1.5, :store => :yes },
                                :tag_list      => { :boost => 1 },
                                :name_for_sort => { :index => :untokenized } }

and I search using this code in my Stores controller:

    @products = Store.find_by_contents params[:q].to_s.upcase+"*"

For example, I have a store with the name "benhank's coffee outlet".

When I search for "benhank" I get the store as expected, but when I search for "benhank's" or "benhanks" I don't get any results. At the very least I should have got a result for "benhank's", which is actually what is entered in the :name field.

How can I get this done?
Please help. I have been trying to understand how to use the analyzers and tokenizers, but couldn't get through. I'm also looking at wildcard queries and fuzzy queries.

Thanks,
jags
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/ferret-talk/attachments/20071115/d387f517/attachment.html

From kraemer at webit.de Fri Nov 16 04:31:53 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Fri, 16 Nov 2007 10:31:53 +0100
Subject: [Ferret-talk] problem with searching plurals (with apostrophe)
In-Reply-To: <784388.66109.qm@web60416.mail.yahoo.com>
References: <784388.66109.qm@web60416.mail.yahoo.com>
Message-ID: <20071116093153.GE10556@cordoba.webit.de>

Hi,

your problem is pretty much analyzer and tokenization related, so you really should understand what happens there. Have you tried the Ferret Short Cut PDF book from O'Reilly?

In general you'll need a stemming analyzer to strip plural endings from words. Regarding the "'": it depends on the tokenizer you're using whether the 's ending is considered to be part of the word it follows, or 's' is interpreted as a term of its own.
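
To make the tokenizer point concrete, here is a minimal plain-Ruby sketch. It is deliberately not Ferret's API: `normalize_terms` is a made-up helper that only mimics what a tokenizer plus filter chain would do. The idea is that the same normalization (here: lowercasing and dropping a possessive 's) has to be applied to both the indexed text and the query, so that "benhank's" and "benhank" end up as the same term.

```ruby
# Hypothetical helper for illustration only; NOT part of Ferret or
# acts_as_ferret. It imitates a tokenizer followed by two simple filters.
def normalize_terms(text)
  text.downcase
      .scan(/[a-z0-9]+(?:'[a-z]+)?/)            # keep a word-internal apostrophe in one token
      .map { |token| token.sub(/'s\z/, "") }    # drop a trailing possessive 's
end

indexed = normalize_terms("benhank's coffee outlet")
query   = normalize_terms("Benhank's")

p indexed
puts((query - indexed).empty?)   # true when every query term is also an indexed term
```

Note that this only covers the apostrophe case. A query for "benhanks" would still not match, which is where a real stemming filter would come in.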
Cheers, Jens On Thu, Nov 15, 2007 at 10:56:12PM -0800, Jagdish rao wrote: > > > hello guys, > > i am using acts_as_ferret plugin(0.4.1 Latest) with ferret gem(0.11.4 Latest) > on rails 1.2.5 and ruby 1.8.6(UBUNTU Gutsy) > i have this > :Stores Model > acts_as_ferret :fields => {:name => { :boost => 2 ,:store => :yes}, > :short_desc => { :boost => 1.5,:store => > :yes }, > :tag_list => {:boost => 1 }, > :name_for_sort => {:index => :untokenized} > } > > and i search using this code in my Stores controller > > @products = Store.find_by_contents params[:q].to_s.upcase+"*" > > for e.g i have a Stores with name as > "benhank's coffee outlet" > > when i search for "benhank" i get the resultant store as expected. > but when i search with param as "benhank's" or "benhanks" --- > i dont get anyresults. > atleast i shud have got the result for search with "benhank's" > which is actually what is entered in :name field > > how can i get this done. pls help > i have been trying to understand to use the analysers and tokenisers > but couldn't get through.also looking at wildcardquery and fuzzy things > > thanks > jags > > > > > > ____________________________________________________________________________________ > Be a better sports nut! Let your teams follow you > with Yahoo Mobile. Try it now. http://mobile.yahoo.com/sports;_ylt=At9_qDKvtAbMuh1G1SQtBI7ntAcJ > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk -- Jens Kr?mer webit! 
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From scottd at gmail.com Fri Nov 16 05:56:26 2007
From: scottd at gmail.com (Scott Davies)
Date: Fri, 16 Nov 2007 02:56:26 -0800
Subject: [Ferret-talk] Multithreading / multiprocessing woes
Message-ID: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com>

I've been running some multithreaded tests on Ferret. Using a single Ferret::Index::Index inside a DRb server, it definitely behaves for me as if all readers are locked out of the index when writing is going on in that index, not just optimization -- at least when segment merging happens, which is when the writes take the longest and you can therefore least afford to lock out all reads. This is very easy to notice when you add, say, your 100,000th document to the index, and that one write takes over 5 seconds to complete because it triggers a bunch of incremental segment-merging, and all queries to the index stall in the meantime. Or when you add your millionth document, which can stall all reads for over a minute. :-(

When I try to use an IndexReader in a separate process, things are even worse. The IndexReader doesn't see any updates to the index since it was created. Not too surprising, but if I try creating a new IndexReader for every query, and have the Index in the other writing process turn on auto_flush, then the reading process crashes after a few (generally fewer than 100) queries, in one of at least two different ways selected apparently at random:

Failure Mode #1:

    script/ferret_speedtest2_reader:30:in `initialize': IO Error occured at :93 in xraise (IOError)
    Error occured in index.c:901 - sis_find_segments_file
    Error reading the segment infos.
Store listing was from script/ferret_speedtest2_reader:30:in `new' from script/ferret_speedtest2_reader:30:in `run_test_query' [Yes, there really are two blank lines after "Store listing was".] Failure Mode #2: script/ferret_speedtest2_reader:30:in `initialize': IO Error occured at :93 in xraise (IOError) Error occured in fs_store.c:127 - fs_each doing 'each' in /Users/scott/dev/ruby/timetracker/tmp/ferret_speedtest_index: from script/ferret_speedtest2_reader:30:in `new' from script/ferret_speedtest2_reader:30:in `run_test_query' Meanwhile, if I try eliminating this second failure mode by explicitly calling close on the IndexReader before I throw it away, the close immediately crashes with: script/ferret_speedtest2_reader:45: [BUG] Bus Error ruby 1.8.6 (2007-03-13) [i686-darwin8.8.5] Abort trap Given the combination of problems above, I'm at a loss to understand how to use Ferret on a live website that requires reasonably fast turnaround between a user submitting data and the user being able to search over that data, unless either (1) the site only gets a few thousand new index entries per day and the site can be taken down for a few minutes daily to optimize the index, or (2) it's OK for the entire site to periodically stall on all queries for seconds or even minutes whenever segment-merging happens to kick in. Do all Ferret users just suck it up and live with one of these limitations, or am I missing something and/or just getting "lucky" with the errors above? For reference, the system being used here is a Mac running Leopard, although I doubt that matters... 
From bk at benjaminkrause.com Fri Nov 16 07:12:34 2007
From: bk at benjaminkrause.com (Benjamin Krause)
Date: Fri, 16 Nov 2007 13:12:34 +0100
Subject: [Ferret-talk] Multithreading / multiprocessing woes
In-Reply-To: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com>
References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com>
Message-ID:

Scott,

> Do all Ferret users just suck it up and live with one of these
> limitations, or am I missing something and/or just getting "lucky"
> with the errors above?

The limitations you're talking about are known and will be fixed in the near future. The trick is to have one read-only and one write-only index. This is currently being worked on. If you need a fix right now, you need to do it yourself, but you can take a look at omdb's code and how it's done there:

http://bugs.omdb.org/browser/branches/2007.1/lib/omdb/ferret/lib/util.rb
(see the switch code)

If you don't need a fix right now, I'm sure AAF will come up with a solution for that in the near future (aka probably not this year).

On a side note, for the "too many open files" error, see:

http://ferret.davebalmain.com/api/classes/Ferret/Index/IndexWriter.html
(use_compound_file, you may have set this to false) or simply increase the number of open files. On omdb we're running with 32k :-)

rails at homer.omdb.org ~ $ ulimit -n
32768

Cheers
Ben

From pjones at pmade.com Fri Nov 16 11:24:01 2007
From: pjones at pmade.com (Peter Jones)
Date: Fri, 16 Nov 2007 09:24:01 -0700
Subject: [Ferret-talk] Reducing dependency on remote ferret process
In-Reply-To:
References:
Message-ID:

Morten,

If you're still looking at how to solve this, here is what I did. This is just a hack, but I didn't really have a choice; this coupling was killing my entire application stack.
--- act_methods.rb (revision 1534)
+++ act_methods.rb (working copy)
@@ -185,9 +185,10 @@
 end
 logger.info "default field list: #{aaf_configuration[:ferret][:default_field].inspect}"
-if options[:remote]
-  aaf_index.ensure_index_exists
-end
+# FIXME fix and send a patch to the AAF team
+# if options[:remote]
+#   aaf_index.ensure_index_exists
+# end
 end

--
Peter Jones - 303-669-2637
pmade inc. - http://pmade.com

On Nov 11, 2007, at 08:09, Morten wrote:

> We use FerretDrb for search. If the ferret process is down, our entire
> application comes down the moment we try to save a model which is
> indexed.
>
> Is there a way to decouple this relationship such that we can somehow
> resume normal operations despite ferret being down and not index the
> model?

From mail at stuartsierra.com Fri Nov 16 12:19:10 2007
From: mail at stuartsierra.com (Stuart Sierra)
Date: Fri, 16 Nov 2007 12:19:10 -0500
Subject: [Ferret-talk] Ferret/AAF Stability?
In-Reply-To:
References:
Message-ID: <314ee0450711160919y7eaad1cl16ef0bc349b09dee@mail.gmail.com>

On Nov 15, 2007 9:37 AM, Sam Smoot wrote:
> Hello. I'm the author of DataMapper (http://datamapper.org), and am
> trying to choose what Full-Text-Indexing engine/plugin I want to
> include by default. I was hoping you guys could help. :-)
>
> Sphinx comes highly recommended, but without live index updates, it
> just doesn't seem practical for most of my work.
>
> I'm most experienced with Solr, but the whole HTTP::Request and
> general complexity of it is off-putting.

For a different perspective: I'm in the middle of switching from Ferret to Solr. I like Ferret a lot, and still use it on several sites, but I had some problems with one large site:

1. the patches for large-index support are still in development;
2. each update to Ferret requires rebuilding the index;
3. Ferret doesn't yet support compressed indexes.
My other reason for switching is that Rails' ActiveRecord is not well-suited to storing large documents, which made acts_as_ferret less compelling. I was nervous about tackling Solr, but I've found it quite easy to use, and the built-in caching and multithreading make it fast. I think Ferret is adequate for most search tasks, but if (like me) you're building a dedicated search engine, Solr is currently a stronger candidate. -Stuart Sierra From scottd at gmail.com Fri Nov 16 15:35:36 2007 From: scottd at gmail.com (Scott Davies) Date: Fri, 16 Nov 2007 12:35:36 -0800 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> Message-ID: <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> Hi Ben -- Thanks much for the quick and helpful reply! Unfortunately, the solution you're using on omdb looks suspect to me, for the same reason that Alex Neth brought up a few days ago on this list: to my knowledge there's no guarantee that rsync will produce a coherent snapshot of the source directory as it was at any one particular instant in time. In fact, I don't see how rsync could both always terminate in finite time and provide such a guarantee, except on exotic filesystems that provide, say, atomic snapshots with copy-on-write capabilities. (Sigh...sometimes I miss the Google File System.) In which case you'd have to disable your site during the rsync in order to prevent corruption, which basically boils down to the "must take site offline daily for a few minutes to deal with this problem" limitation. I'm guessing the rsync is faster than an index optimization, so I guess this might at least cut down on the amount of time the site has to be down, but still...wah. Am I a fool for wondering whether it might ultimately be less painful to try an index server that runs Lucene under a JRuby process? 
From bk at benjaminkrause.com Fri Nov 16 17:40:03 2007 From: bk at benjaminkrause.com (Benjamin Krause) Date: Fri, 16 Nov 2007 23:40:03 +0100 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> Message-ID: <0F759925-5885-4389-BD81-95E28665AA90@benjaminkrause.com> Scott, we're using two directories, not one for ferret. One index is the passive index. it is not used for searches, but new indexing requests will be added to that index. so lets call it the indexing-index.
All mongrels will use the second directory, let's call it the searching-index. Both indexes are almost identical; I'll explain the differences.

All our indexing requests are queued. So whenever you want to index something, it will be placed in the queue and added to the indexing-index. After a certain number of queue items have been added to the index, we stop indexing. The queue will be halted. New requests can still be added to the queue, but nothing will be added to the indexing-index.

Now we rsync the indexing-index to all machines. Remember, searching is still done in the searching-index, which is outdated, but we don't mind about that :)

After the rsync is complete, we switch both directories, so the indexing-index becomes the searching-index and vice versa. Actually we're just switching symlinks, so this will take almost no time. And even if one of the mongrels still has a filehandle to the old index open, nothing will happen: it is still using the outdated index, but the next request will use the new index. After that, the new indexing-index will be synced from the searching-index. As the searching-index is read-only, there is no risk of corrupting something during the sync.

Now we resume processing the queue, until we've added our certain number of queue entries, or the queue is empty.

The downside is that the searching-index is outdated, but not by more than a couple of minutes (about 2 minutes on omdb). We haven't had one corrupted index since. There is no downtime whatsoever, and the rsync snapshot will always be coherent.
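
The symlink switch described above could be sketched roughly as follows. This is only an illustration under assumptions, not omdb's actual code: the helper names `repoint_symlink` and `switch_indexes` are made up, the rsync step and the queue handling are left out, and it relies on rename(2) replacing an existing symlink in one step, as it does on POSIX filesystems.

```ruby
require "fileutils"

# Hypothetical helper: repoint the symlink `link` at `target` without a
# window in which `link` does not exist. A temporary symlink is created
# next to it and then renamed over the old one.
def repoint_symlink(link, target)
  tmp = "#{link}.tmp.#{Process.pid}"
  FileUtils.rm_f(tmp)          # clear a leftover from an earlier failed run
  File.symlink(target, tmp)    # new symlink under a temporary name
  File.rename(tmp, link)       # replaces the old symlink itself
end

# Hypothetical helper: exchange the targets of the two symlinks, so the
# directory that was being written to becomes the one that is searched,
# and vice versa.
def switch_indexes(search_link, index_link)
  new_search = File.readlink(index_link)
  new_index  = File.readlink(search_link)
  repoint_symlink(search_link, new_search)
  repoint_symlink(index_link, new_index)
end
```

A process that already has files open in the old directory keeps reading them; only paths resolved after the rename see the new target, which matches the behaviour described for the mongrels.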
Cheers Ben From scottd at gmail.com Fri Nov 16 19:37:26 2007 From: scottd at gmail.com (Scott Davies) Date: Fri, 16 Nov 2007 16:37:26 -0800 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <0F759925-5885-4389-BD81-95E28665AA90@benjaminkrause.com> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <0F759925-5885-4389-BD81-95E28665AA90@benjaminkrause.com> Message-ID: <75f591160711161637p74fded32h39d58fc56f29a341@mail.gmail.com> Ben -- Thanks for the detailed explanation! Yes, that does make sense. If I understand it correctly, though, something won't show up in a search until at least one index switch happens after it's been submitted, which means we're talking about a minute or so on average (not just worst-case) from submission to search result, even if the switches are being done constantly (given that each switch takes about two minutes). For my site, I'm really hoping that most content will show up within a second or so of its submission. That simply can't happen if I'm not updating the same index I'm doing searches with. I'd be OK with the turnaround *occasionally* being a minute -- say, while an index optimization or particularly large segment merge happens. But so far it looks to me like the choices with Ferret are either: (1) The *average* time from submission to search result is on the order of minutes. However, searches are always reasonably fast. (Your approach.) (2) The average time from submission to search result is less than a second. However, the *worst-case* times can be minutes, and now all *searches* stall over those minutes as well, which is Bad. If you don't get more than a few thousand submissions per day, you can at least schedule these outages as nightly index optimizations, but you'll have the outages one way or another. (All "same index used for reading + writing" approaches.) 
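
The "a minute or so on average" estimate in option (1) can be checked with a few lines of arithmetic. The model here is an assumption, the simplest reading of the description: switches complete back to back every `cycle` seconds, and a document becomes searchable at the first switch that completes after it was submitted. The function name `average_wait` is made up for the illustration.

```ruby
# Average time until a submitted document becomes searchable, assuming
# submissions are spread evenly over one switch cycle of `cycle` seconds.
def average_wait(cycle, samples = 1_000)
  waits = (0...samples).map do |i|
    submitted_at = cycle * i / samples.to_f  # evenly spaced submission times
    cycle - submitted_at                     # time left until the switch completes
  end
  waits.sum / waits.size
end

puts average_wait(120.0)  # roughly half the cycle length
```

Under this model the average is about half the cycle, so a two-minute cycle gives about one minute. If a document submitted while an rsync is already running has to wait for the following cycle, the average would be longer.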
I don't think either of these choices is very good for the particular site I have in mind (at least if I'm being optimistic enough about its chances of "taking off" to worry about the possibility of many thousands of submissions / day). Am I correct in my summarization of the two choices with Ferret here, or have I missed something? Anyhow, thanks again! If those two options are in fact what I have, I think I'll run some tests with Lucene/JRuby to see whether that provides a third option as far as performance goes, and report back what sort of issues come up. (My guess is that it'll be moderately painful to set up and that the average throughput will be worse than Ferret's, but that an average submission-to-search-result turnaround time of a second or two will be achievable without the site necessarily going completely down for minutes every now and then. We'll see.) -- Scott
From erik at ehatchersolutions.com Fri Nov 16 16:13:15 2007 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Fri, 16 Nov 2007 16:13:15 -0500 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> Message-ID: <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> On Nov 16, 2007, at 3:35 PM, Scott Davies wrote: > Am I a fool for wondering whether it might ultimately be less painful > to try an index server that runs Lucene under a JRuby process? Or, rather, an index server that runs Solr accessed with a pure Ruby, solr-ruby, API (which works with MRI or JRuby)?
:) Erik From scottd at gmail.com Sat Nov 17 05:12:46 2007 From: scottd at gmail.com (Scott Davies) Date: Sat, 17 Nov 2007 02:12:46 -0800 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> Message-ID: <75f591160711170212q1d4d6475v3e830a64ff4c3dc2@mail.gmail.com> Hmmm...I'd first heard of Solr only a couple of days ago, and I hadn't been aware of a Ruby API to it until you mentioned it. Interesting...thanks! From jk at jkraemer.net Sat Nov 17 07:39:26 2007 From: jk at jkraemer.net (Jens Kraemer) Date: Sat, 17 Nov 2007 13:39:26 +0100 Subject: [Ferret-talk] Ferret/AAF Stability? In-Reply-To: <314ee0450711160919y7eaad1cl16ef0bc349b09dee@mail.gmail.com> References: <314ee0450711160919y7eaad1cl16ef0bc349b09dee@mail.gmail.com> Message-ID: <20071117123925.GO3558@thunder.jkraemer.net> Hi! On Fri, Nov 16, 2007 at 12:19:10PM -0500, Stuart Sierra wrote: [..] > For a different perspective: I'm in the middle of switching from > Ferret to Solr. I like Ferret a lot, and still use it on several > sites, but I had some problems with one large site: > > 1.
the patches for large-index support are still in development; Let's hope Dave reads this ;-) However there are several sites I know of with Index sizes > several GB, so they seem to be working well enough. > 2. each update to Ferret requires rebuilding the index; This for sure is annoying but I'd consider this normal for a library that has developed that fast. I think Dave has had very good reasons for each of the changes he did to the index format. Plus I don't think *every* release had a new index format ;-) > 3. Ferret doesn't yet support compressed indexes. At least from the docs it looks like it does, see http://ferret.davebalmain.com/api/classes/Ferret/Index/FieldInfo.html . I didn't ever try this out however. > My other reason for switching is that Rails' ActiveRecord is not > well-suited to storing large documents, which made acts_as_ferret less > compelling. That's a good point, and we plan to make aaf independent from active_record in the future. > I was nervous about tackling Solr, but I've found it quite easy to > use, and the built-in caching and multithreading make it fast. numbers, please :-) > I think Ferret is adequate for most search tasks, but if (like me) > you're building a dedicated search engine, Solr is currently a > stronger candidate. Well, As Solr uses Lucene internally, the mechanics and performance characteristics naturally can't be that different from Ferret. Maybe Ferret has a bug or two and a non-working inter-process locking (which doesn't matter when you think about building a dedicated search server like Solr is, since it's only one process), but the general internal handling of the index is the same, i.e. you can also only have one Writer open to a Lucene index at a time, and Searchers won't see index changes until re-opened, too. 
That said, if my application's main concern were search, I most probably wouldn't choose any pre-cooked solution like aaf or Solr, but would build exactly the thing I need from scratch, basing it either on Lucene or Ferret. But maybe that's just me ;-)

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From jk at jkraemer.net Sat Nov 17 16:50:57 2007
From: jk at jkraemer.net (Jens Kraemer)
Date: Sat, 17 Nov 2007 22:50:57 +0100
Subject: [Ferret-talk] Multithreading / multiprocessing woes
In-Reply-To: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com>
References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com>
Message-ID: <20071117215057.GP3558@thunder.jkraemer.net>

Hi!

On Fri, Nov 16, 2007 at 02:56:26AM -0800, Scott Davies wrote:
> I've been running some multithreaded tests on Ferret. Using a single
> Ferret::Index::Index inside a DRb server, it definitely behaves for me
> as if all readers are locked out of the index when writing is going on
> in that index, not just optimization -- at least when segment merging
> happens, which is when the writes take the longest and you can
> therefore least afford to lock out all reads. This is very easy to
> notice when you add, say, your 100,000th document to the index, and
> that one write takes over 5 seconds to complete because it triggers a
> bunch of incremental segment-merging, and all queries to the index
> stall in the meantime. Or when you add your millionth document, which
> can stall all reads for over a minute. :-(

Don't get me wrong, but how often do you think you'll add your millionth document to the index? And even if you really do index a million documents per week - I wouldn't exactly call it bad performance if one or two search requests *per week* take a minute to complete, while all others are completed in less than a second...
That said, the problem with blocking searches might be solvable by not using Ferret's Index class for searching/indexing, but using the lower-level APIs (Searcher and IndexWriter) and doing manual synchronization (inside *one* process). I didn't feel the need to implement this for aaf (yet ;-), since I think it's already fast enough to not be the bottleneck in most real-world usage scenarios (say, typical Rails apps using aaf for full-text search).

> When I try to use an IndexReader in a separate process, things are
> even worse. The IndexReader doesn't see any updates to the index
> since it was created. Not too surprising, but if I try creating a new
> IndexReader for every query, and have the Index in the other writing
> process turn on auto_flush, then the reading process crashes after a
> few (generally fewer than 100) queries, in one of at least two
> different ways selected apparently at random:
[..]

Stick to the one-process-per-index rule to be on the safe side.

> Given the combination of problems above, I'm at a loss to understand
> how to use Ferret on a live website that requires reasonably fast
> turnaround between a user submitting data and the user being able to
> search over that data, unless either (1) the site only gets a few
> thousand new index entries per day and the site can be taken down for
> a few minutes daily to optimize the index, or (2) it's OK for the
> entire site to periodically stall on all queries for seconds or even
> minutes whenever segment-merging happens to kick in.

I wouldn't set the limit at a few thousand new documents per day, and optimizing daily is only useful if you have lots of document deletions per day.
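
The manual synchronization idea mentioned above (a separate writer and searcher inside one process) might look something like the sketch below. It is not aaf code, and it deliberately does not call Ferret: `SplitIndex` is a made-up name, `writer` is any object responding to `add_document` and `commit`, and `open_searcher` is any callable returning an object that responds to `search`. In a Ferret setup those roles would presumably be filled by an IndexWriter and a Searcher. Whether this actually removes the stalls depends on how the C extension behaves during a segment merge, which this thread does not settle.

```ruby
require "thread"

# Hypothetical wrapper that keeps writes from holding a lock searches need.
class SplitIndex
  def initialize(writer, open_searcher)
    @writer        = writer
    @open_searcher = open_searcher
    @write_lock    = Mutex.new   # serializes writers only
    @swap_lock     = Mutex.new   # guards the searcher reference, held briefly
    @searcher      = open_searcher.call
  end

  # Add a document, commit, then publish a freshly opened searcher.
  def add(doc)
    @write_lock.synchronize do
      @writer.add_document(doc)
      @writer.commit
      fresh = @open_searcher.call
      @swap_lock.synchronize { @searcher = fresh }
    end
  end

  # Searches take the swap lock only long enough to read the reference,
  # so they never wait on @write_lock.
  def search(query)
    searcher = @swap_lock.synchronize { @searcher }
    searcher.search(query)
  end
end
```

Reopening a searcher after every single add is likely too expensive in practice; batching several adds per reopen, and closing the replaced searcher once no request is using it, are left out of the sketch.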
Cheers, Jens PS: If you happen to benchmark Solr against aaf's DRb server, be sure to let us know your findings :-) -- Jens Kr?mer http://www.jkraemer.net/ - Blog http://www.omdb.org/ - The new free film database From ndaniels at mac.com Sat Nov 17 18:27:29 2007 From: ndaniels at mac.com (Noah M. Daniels) Date: Sat, 17 Nov 2007 18:27:29 -0500 Subject: [Ferret-talk] crash while building index Message-ID: Hi, I'm trying to reindex a model (I'm using acts_as_ferret) after having added (via metaprogramming) a large number of fields (several hundred) to the index. It keeps crashing when trying to rebuild the index (the crash log is below, from ferret_server.out) but it only seems to crash on Linux (Ubuntu server 7.04, x86-64) whereas it's fine on my OS X laptop (10.5.1). This is with ferret 0.11.4 in both cases. Any thoughts? Is there a hard field limit in ferret? *** glibc detected *** ruby: realloc(): invalid next size: 0x000000000232ffc0 *** ======= Backtrace: ========= /lib/libc.so.6[0x2ae17c1a549d] /lib/libc.so.6(realloc+0x124)[0x2ae17c1a74e4] /usr/lib/libruby1.8.so.1.8(ruby_xrealloc+0x5c)[0x2ae17b5baf8c] /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret_ext.so(mp_alloc +0xb6)[0x2ae18094c886] /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ ferret_ext.so(dw_get_fld_inv+0xf7)[0x2ae1809732b7] /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret_ext.so(dw_add_doc +0x86)[0x2ae18097a146] /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret_ext.so(iw_add_doc +0x24)[0x2ae18097a284] /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ ferret_ext.so[0x2ae1809384a3] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ce] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac541] /usr/lib/libruby1.8.so.1.8[0x2ae17b5af52e] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac6f0] /usr/lib/libruby1.8.so.1.8[0x2ae17b5af52e] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac6f0] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad207] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ea] 
/usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac67f] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad8dd] /usr/lib/libruby1.8.so.1.8[0x2ae17b5acfb1] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ea] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac541] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad8dd] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ea] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac541] /usr/lib/libruby1.8.so.1.8[0x2ae17b5af52e] /usr/lib/libruby1.8.so.1.8(rb_ary_each+0x23)[0x2ae17b58a853] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ce] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac541] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad8dd] /usr/lib/libruby1.8.so.1.8[0x2ae17b5af52e] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac6f0] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ea] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac67f] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad8dd] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac44c] /usr/lib/libruby1.8.so.1.8[0x2ae17b5aaa57] /usr/lib/libruby1.8.so.1.8[0x2ae17b5af52e] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac6f0] /usr/lib/libruby1.8.so.1.8[0x2ae17b5af52e] /usr/lib/libruby1.8.so.1.8[0x2ae17b5cdd53] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ce] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac541] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad8dd] /usr/lib/libruby1.8.so.1.8[0x2ae17b5af52e] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac6f0] /usr/lib/libruby1.8.so.1.8[0x2ae17b5acfb1] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad207] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ea] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a45d8] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ac541] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ae19f] /usr/lib/libruby1.8.so.1.8[0x2ae17b5abb23] /usr/lib/libruby1.8.so.1.8[0x2ae17b5aaa57] /usr/lib/libruby1.8.so.1.8[0x2ae17b5acfb1] /usr/lib/libruby1.8.so.1.8[0x2ae17b5ad207] /usr/lib/libruby1.8.so.1.8[0x2ae17b5a40ea] 
======= Memory map: ======== 00400000-00401000 r-xp 00000000 fe:00 6148859 /usr/bin/ruby1.8 00600000-00601000 rw-p 00000000 fe:00 6148859 /usr/bin/ruby1.8 00601000-02cb4000 rw-p 00601000 00:00 0 [heap] 40000000-40001000 ---p 40000000 00:00 0 40001000-40801000 rw-p 40001000 00:00 0 2aaaaaaac000-2aaaaaaaf000 rw-p 2aaaaaaac000 00:00 0 2aaaaaaaf000-2aaaaaaea000 r--p 00000000 fe:00 6178429 /usr/lib/locale/en_US.utf8/LC_CTYPE 2aaaaaaea000-2aaaaaaf1000 r--s 00000000 fe:00 6146048 /usr/lib/gconv/gconv-modules.cache 2aaaaaaf1000-2aaaaaaf9000 r-xp 00000000 fe:00 6294630 /usr/lib/ruby/gems/1.8/gems/postgres-0.7.1/ postgres.so 2aaaaaaf9000-2aaaaacf8000 ---p 00008000 fe:00 6294630 /usr/lib/ruby/gems/1.8/gems/postgres-0.7.1/ postgres.so 2aaaaacf8000-2aaaaacf9000 rw-p 00007000 fe:00 6294630 /usr/lib/ruby/gems/1.8/gems/postgres-0.7.1/ postgres.so 2aaaaacf9000-2aaaaacfd000 rw-p 2aaaaacf9000 00:00 0 2aaaaacff000-2aaaaad1e000 r-xp 00000000 fe:00 6149409 /usr/lib/libpq.so.5.0 2aaaaad1e000-2aaaaaf1e000 ---p 0001f000 fe:00 6149409 /usr/lib/libpq.so.5.0 2aaaaaf1e000-2aaaaaf20000 rw-p 0001f000 fe:00 6149409 /usr/lib/libpq.so.5.0 2aaaaaf20000-2aaaaaf34000 r-xp 00000000 fe:00 29212699 /lib/libnsl-2.5.so 2aaaaaf34000-2aaaab134000 ---p 00014000 fe:00 29212699 /lib/libnsl-2.5.so 2aaaab134000-2aaaab136000 rw-p 00014000 fe:00 29212699 /lib/libnsl-2.5.so 2aaaab136000-2aaaab138000 rw-p 2aaaab136000 00:00 0 2aaaab138000-2aaaab1bb000 r-xp 00000000 fe:00 6148718 /usr/lib/libkrb5.so.3.2 2aaaab1bb000-2aaaab3ba000 ---p 00083000 fe:00 6148718 /usr/lib/libkrb5.so.3.2 2aaaab3ba000-2aaaab3be000 rw-p 00082000 fe:00 6148718 /usr/lib/libkrb5.so.3.2 2aaaab3be000-2aaaab3c0000 r-xp 00000000 fe:00 29212682 /lib/libcom_err.so.2.1 2aaaab3c0000-2aaaab5bf000 ---p 00002000 fe:00 29212682 /lib/libcom_err.so.2.1 2aaaab5bf000-2aaaab5c0000 rw-p 00001000 fe:00 29212682 /lib/libcom_err.so.2.1 2aaaab5c0000-2aaaab5e3000 r-xp 00000000 fe:00 6148715 /usr/lib/libk5crypto.so.3.0 2aaaab5e3000-2aaaab7e2000 ---p 00023000 fe:00 
6148715 /usr/lib/libk5crypto.so.3.0 2aaaab7e2000-2aaaab7e4000 rw-p 00022000 fe:00 6148715 /usr/lib/libk5crypto.so.3.0 2aaaab7e4000-2aaaab7f5000 r-xp 00000000 fe:00 29212708 /lib/libresolv-2.5.so 2aaaab7f5000-2aaaab9f5000 ---p 00011000 fe:00 29212708 /lib/libresolv-2.5.so 2aaaab9f5000-2aaaab9f7000 rw-p 00011000 fe:00 29212708 /lib/libresolv-2.5.so 2aaaab9f7000-2aaaab9f9000 rw-p 2aaaab9f7000 00:00 0 2aaaab9f9000-2aaaab9fd000 r-xp 00000000 fe:00 6148719 /usr/lib/libkrb5support.so.0.0 2aaaab9fd000-2aaaabbfc000 ---p 00004000 fe:00 6148719 /usr/lib/libkrb5support.so.0.0 2aaaabbfc000-2aaaabbfd000 rw-p 00003000 fe:00 6148719 /usr/lib/libkrb5support.so.0.0 2aaaabbfd000-2aaaabc04000 r-xp 00000000 fe:00 29212700 /lib/libnss_compat-2.5.so 2aaaabc04000-2aaaabe04000 ---p 00007000 fe:00 29212700 /lib/libnss_compat-2.5.so 2aaaabe04000-2aaaabe06000 rw-p 00007000 fe:00 29212700 /lib/libnss_compat-2.5.so 2aaaabe06000-2aaaabe10000 r-xp 00000000 fe:00 29212704 /lib/libnss_nis-2.5.so 2aaaabe10000-2aaaac00f000 ---p 0000a000 fe:00 29212704 /lib/libnss_nis-2.5.so 2aaaac00f000-2aaaac011000 rw-p 00009000 fe:00 29212704 /lib/libnss_nis-2.5.so 2aaaac011000-2aaaac415000 rw-p 2aaaac011000 00:00 0 2aaaac41b000-2aaaac428000 r-xp 00000000 fe:00 29212686 /lib/libgcc_s.so.1 2aaaac428000-2aaaac628000 ---p 0000d000 fe:00 29212686 /lib/libgcc_s.so.1 2aaaac628000-2aaaac629000 rw-p 0000d000 fe:00 29212686 /lib/libgcc_s.so.1 2aaab0000000-2aaab0021000 rw-p 2aaab0000000 00:00 0 2aaab0021000-2aaab4000000 ---p 2aaab0021000 00:00 0 2ae17b353000-2ae17b36f000 r-xp 00000000 fe:00 29212687 /lib/ld-2.5.so 2ae17b36f000-2ae17b3d4000 rw-p 2ae17b36f000 00:00 0 2ae17b3d5000-2ae17b488000 rw-p 2ae17b3d5000 00:00 0 2ae17b56e000-2ae17b570000 rw-p 0001b000 fe:00 29212687 /lib/ld-2.5.so 2ae17b570000-2ae17b63d000 r-xp 00000000 fe:00 6148857 /usr/lib/libruby1.8.so.1.8.5 2ae17b63d000-2ae17b83c000 ---p 000cd000 fe:00 6148857 /usr/lib/libruby1.8.so.1.8.5 2ae17b83c000-2ae17b841000 rw-p 000cc000 fe:00 6148857 
/usr/lib/libruby1.8.so.1.8.5 2ae17b841000-2ae17b85e000 rw-p 2ae17b841000 00:00 0 2ae17b85e000-2ae17b873000 r-xp 00000000 fe:00 29212707 /lib/libpthread-2.5.so 2ae17b873000-2ae17ba73000 ---p 00015000 fe:00 29212707 /lib/libpthread-2.5.so 2ae17ba73000-2ae17ba75000 rw-p 00015000 fe:00 29212707 /lib/libpthread-2.5.so 2ae17ba75000-2ae17ba79000 rw-p 2ae17ba75000 00:00 0 2ae17ba79000-2ae17ba7b000 r-xp 00000000 fe:00 29212696 /lib/libdl-2.5.so 2ae17ba7b000-2ae17bc7b000 ---p 00002000 fe:00 29212696 /lib/libdl-2.5.so 2ae17bc7b000-2ae17bc7d000 rw-p 00002000 fe:00 29212696 /lib/libdl-2.5.so 2ae17bc7d000-2ae17bc82000 r-xp 00000000 fe:00 29212695 /lib/libcrypt-2.5.so 2ae17bc82000-2ae17be81000 ---p 00005000 fe:00 29212695 /lib/libcrypt-2.5.so 2ae17be81000-2ae17be83000 rw-p 00004000 fe:00 29212695 /lib/libcrypt-2.5.so 2ae17be83000-2ae17beb2000 rw-p 2ae17be83000 00:00 0 2ae17beb2000-2ae17bf33000 r-xp 00000000 fe:00 29212697 /lib/libm-2.5.so 2ae17bf33000-2ae17c132000 ---p 00081000 fe:00 29212697 /lib/libm-2.5.so 2ae17c132000-2ae17c134000 rw-p 00080000 fe:00 29212697 /lib/libm-2.5.so 2ae17c134000-2ae17c27b000 r-xp 00000000 fe:00 29212693 /lib/libc-2.5.so 2ae17c27b000-2ae17c47b000 ---p 00147000 fe:00 29212693 /lib/libc-2.5.so 2ae17c47b000-2ae17c47e000 r--p 00147000 fe:00 29212693 /lib/libc-2.5.so 2ae17c47e000-2ae17c480000 rw-p 0014a000 fe:00 29212693 /lib/libc-2.5.so 2ae17c480000-2ae17c487000 rw-p 2ae17c480000 00:00 0 2ae17c487000-2ae17c492000 r-xp 00000000 fe:00 6209867 /usr/lib/ruby/1.8/x86_64-linux/socket.so 2ae17c492000-2ae17c691000 ---p 0000b000 fe:00 6209867 /usr/lib/ruby/1.8/x86_64-linux/socket.so 2ae17c691000-2ae17c692000 rw-p 0000a000 fe:00 6209867 /usr/lib/ruby/1.8/x86_64-linux/socket.so 2ae17c692000-2ae17c7cf000 rw-p 2ae17c692000 00:00 0 2ae17c7cf000-2ae17c7d4000 r-xp 00000000 fe:00 6209868 /usr/lib/ruby/1.8/x86_64-linux/stringio.so 2ae17c7d4000-2ae17c9d3000 ---p 00005000 fe:00 6209868 /usr/lib/ruby/1.8/x86_64-linux/stringio.so 2ae17c9d3000-2ae17c9d4000 rw-p 00004000 fe:00 
6209868 /usr/lib/ruby/1.8/x86_64-linux/stringio.so 2ae17c9d4000-2ae17c9f0000 r-xp 00000000 fe:00 6209870 /usr/lib/ruby/1.8/x86_64-linux/syck.so 2ae17c9f0000-2ae17cbef000 ---p 0001c000 fe:00 6209870 /usr/lib/ruby/1.8/x86_64-linux/syck.so 2ae17cbef000-2ae17cbf1000 rw-p 0001b000 fe:00 6209870 /usr/lib/ruby/1.8/x86_64-linux/syck.so 2ae17cbf1000-2ae17cbfa000 r-xp 00000000 fe:00 6209872 /usr/lib/ruby/1.8/x86_64-linux/zlib.so 2ae17cbfa000-2ae17cdf9000 ---p 00009000 fe:00 6209872 /usr/lib/ruby/1.8/x86_64-linux/zlib.so 2ae17cdf9000-2ae17cdfa000 rw-p 00008000 fe:00 6209872 /usr/lib/ruby/1.8/x86_64-linux/zlib.so 2ae17ce00000-2ae17ce16000 r-xp 00000000 fe:00 6146044 /usr/lib/libz.so.1.2.3 2ae17ce16000-2ae17d015000 ---p 00016000 fe:00 6146044 /usr/lib/libz.so.1.2.3 2ae17d015000-2ae17d016000 rw-p 00015000 fe:00 6146044 /usr/lib/libz.so.1.2.3 2ae17d016000-2ae17d01a000 r-xp 00000000 fe:00 6209853 /usr/lib/ruby/1.8/x86_64-linux/digest/sha2.so 2ae17d01a000-2ae17d219000 ---p 00004000 fe:00 6209853 /usr/lib/ruby/1.8/x86_64-linux/digest/sha2.so 2ae17d219000-2ae17d21a000 rw-p 00003000 fe:00 6209853 /usr/lib/ruby/1.8/x86_64-linux/digest/sha2.so 2ae17d21a000-2ae17d21c000 r-xp 00000000 fe:00 6209848 /usr/lib/ruby/1.8/x86_64-linux/digest.so 2ae17d21c000-2ae17d41b000 ---p 00002000 fe:00 6209848 /usr/lib/ruby/1.8/x86_64-linux/digest.so 2ae17d41b000-2ae17d41c000 rw-p 00001000 fe:00 6209848 /usr/lib/ruby/1.8/x86_64-linux/digest.so 2ae17d41c000-2ae17d457000 r-xp 00000000 fe:00 6212089 /usr/lib/ruby/1.8/x86_64-linux/openssl.so 2ae17d457000-2ae17d656000 ---p 0003b000 fe:00 6212089 /usr/lib/ruby/1.8/x86_64-linux/openssl.so 2ae17d656000-2ae17d659000 rw-p 0003a000 fe:00 6212089 /usr/lib/ruby/1.8/x86_64-linux/openssl.so 2ae17d65f000-2ae17d6a1000 r-xp 00000000 fe:00 6149328 /usr/lib/libssl.so.0.9.8 2ae17d6a1000-2ae17d8a1000 ---p 00042000 fe:00 6149328 /usr/lib/libssl.so.0.9.8 2ae17d8a1000-2ae17d8a7000 rw-p 00042000 fe:00 6149328 /usr/lib/libssl.so.0.9.8 2ae17d8a7000-2ae17d9fc000 r-xp 00000000 fe:00 
6149327 /usr/lib/libcrypto.so.0.9.8 2ae17d9fc000-2ae17dbfc000 ---p 00155000 fe:00 6149327 /usr/lib/libcrypto.so.0.9.8 2ae17dbfc000-2ae17dc1f000 rw-p 00155000 fe:00 6149327 /usr/lib/libcrypto.so.0.9.8 2ae17dc1f000-2ae17dc22000 rw-p 2ae17dc1f000 00:00 0 2ae17dc22000-2ae17dc23000 r-xp 00000000 fe:00 6209857 /usr/lib/ruby/1.8/x86_64-linux/fcntl.so 2ae17dc23000-2ae17de22000 ---p 00001000 fe:00 6209857 /usr/lib/ruby/1.8/x86_64-linux/fcntl.so 2ae17de22000-2ae17de23000 rw-p 00000000 fe:00 6209857 /usr/lib/ruby/1.8/x86_64-linux/fcntl.so 2ae17de23000-2ae17e05d000 rw-p 2ae17de23000 00:00 0 2ae17e05d000-2ae17e061000 r-xp 00000000 fe:00 6209869 /usr/lib/ruby/1.8/x86_64-linux/strscan.so 2ae17e061000-2ae17e261000 ---p 00004000 fe:00 6209869 /usr/lib/ruby/1.8/x86_64-linux/strscan.so 2ae17e261000-2ae17e262000 rw-p 00004000 fe:00 6209869 /usr/lib/ruby/1.8/x86_64-linux/strscan.so 2ae17e262000-2ae17e26d000 r-xp 00000000 fe:00 6209846 /usr/lib/ruby/1.8/x86_64-linux/bigdecimal.so 2ae17e26d000-2ae17e46c000 ---p 0000b000 fe:00 6209846 /usr/lib/ruby/1.8/x86_64-linux/bigdecimal.so 2ae17e46c000-2ae17e46d000 rw-p 0000a000 fe:00 6209846 /usr/lib/ruby/1.8/x86_64-linux/bigdecimal.so 2ae17e46d000-2ae17e86f000 rw-p 2ae17e46d000 00:00 0 2ae17e86f000-2ae17e8ab000 r-xp 00000000 fe:00 6209861 /usr/lib/ruby/1.8/x86_64-linux/nkf.so 2ae17e8ab000-2ae17eaab000 ---p 0003c000 fe:00 6209861 /usr/lib/ruby/1.8/x86_64-linux/nkf.so 2ae17eaab000-2ae17eaaf000 rw-p 0003c000 fe:00 6209861 /usr/lib/ruby/1.8/x86_64-linux/nkf.so 2ae17eaaf000-2ae17eab0000 rw-p 2ae17eaaf000 00:00 0 2ae17eab1000-2ae17f1e7000 rw-p 2ae17eab1000 00:00 0 2ae17f1e7000-2ae17f1e9000 r-xp 00000000 fe:00 6209856 /usr/lib/ruby/1.8/x86_64-linux/etc.so 2ae17f1e9000-2ae17f3e9000 ---p 00002000 fe:00 6209856 /usr/lib/ruby/1.8/x86_64-linux/etc.so 2ae17f3e9000-2ae17f3ea000 rw-p 00002000 fe:00 6209856 /usr/lib/ruby/1.8/x86_64-linux/etc.so 2ae17f3ea000-2ae17f3ec000 r-xp 00000000 fe:00 6209850 /usr/lib/ruby/1.8/x86_64-linux/digest/md5.so 
2ae17f3ec000-2ae17f5eb000 ---p 00002000 fe:00 6209850 /usr/lib/ruby/1.8/x86_64-linux/digest/md5.so 2ae17f5eb000-2ae17f5ec000 rw-p 00001000 fe:00 6209850 /usr/lib/ruby/1.8/x86_64-linux/digest/md5.so 2ae17f5ec000-2ae17f5ef000 r-xp 00000000 fe:00 6209864 /usr/lib/ruby/1.8/x86_64-linux/racc/cparse.so 2ae17f5ef000-2ae17f7ef000 ---p 00003000 fe:00 6209864 /usr/lib/ruby/1.8/x86_64-linux/racc/cparse.so 2ae17f7ef000-2ae17f7f0000 rw-p 00003000 fe:00 6209864 /usr/lib/ruby/1.8/x86_64-linux/racc/cparse.so 2ae17f7f0000-2ae17f7f4000 r-xp 00000000 fe:00 6209858 /usr/lib/ruby/1.8/x86_64-linux/iconv.so 2ae17f7f4000-2ae17f9f3000 ---p 00004000 fe:00 6209858 /usr/lib/ruby/1.8/x86_64-linux/iconv.so 2ae17f9f3000-2ae17f9f4000 rw-p 00003000 fe:00 6209858 /usr/lib/ruby/1.8/x86_64-linux/iconv.so 2ae17f9f4000-2ae17f9f5000 rw-p 2ae17f9f4000 00:00 0 2ae17f9f5000-2ae17f9f8000 r-xp 00000000 fe:00 6209852 /usr/lib/ruby/1.8/x86_64-linux/digest/sha1.so 2ae17f9f8000-2ae17fbf8000 ---p 00003000 fe:00 6209852 /usr/lib/ruby/1.8/x86_64-linux/digest/sha1.so 2ae17fbf8000-2ae17fbf9000 rw-p 00003000 fe:00 6209852 /usr/lib/ruby/1.8/x86_64-linux/digest/sha1.so 2ae17fbfa000-2ae1808f4000 rw-p 2ae17fbfa000 00:00 0 2ae1808f4000-2ae180997000 r-xp 00000000 fe:00 7297883 /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/ lib/ferret_ext.so 2ae180997000-2ae180b96000 ---p 000a3000 fe:00 7297883 /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/ lib/ferret_ext.so 2ae180b96000-2ae180bb7000 rw-p 000a2000 fe:00 7297883 /usr/lib/ruby/gems/1.8/gems/ferret-0.11.4/ lib/ferret_ext.so 2ae180bb7000-2ae180bb8000 rw-p 2ae180bb7000 00:00 0 2ae180bb8000-2ae180bbe000 r-xp 00000000 fe:00 6508818 /usr/lib/ruby/gems/1.8/gems/amatch-0.2.3/ ext/amatch.so 2ae180bbe000-2ae180dbd000 ---p 00006000 fe:00 6508818 /usr/lib/ruby/gems/1.8/gems/amatch-0.2.3/ ext/amatch.so 2ae180dbd000-2ae180dbe000 rw-p 00005000 fe:00 6508818 /usr/lib/ruby/gems/1.8/gems/amatch-0.2.3/ ext/amatch.so 2ae180dbe000-2ae180dc0000 r-xp 00000000 fe:00 17203639 
/var/www/webroot/panjiva.com/admin/releases/ 20071117220121/vendor/ruby_inline/.ruby_inline/Inline_String_7dae.so 2ae180dc0000-2ae180fbf000 ---p 00002000 fe:00 17203639 /var/www/webroot/panjiva.com/admin/releases/ 20071117220121/vendor/ruby_inline/.ruby_inline/Inline_String_7dae.so 2ae180fbf000-2ae180fc0000 rw-p 00001000 fe:00 17203639 /var/www/webroot/panjiva.com/admin/releases/ 20071117220121/vendor/ruby_inline/.ruby_inline/Inline_String_7dae.so 2ae180fc0000-2ae180fc8000 rw-p 2ae180fc0000 00:00 0 2ae180fc8000-2ae180fd2000 r-xp 00000000 fe:00 29212702 /lib/libnss_files-2.5.so 2ae180fd2000-2ae1811d1000 ---p 0000a000 fe:00 29212702 /lib/libnss_files-2.5.so 2ae1811d1000-2ae1811d3000 rw-p 00009000 fe:00 29212702 /lib/libnss_files-2.5.so 7fff2f6dc000-7fff2f757000 rw-p 7fff2f6dc000 00:00 0 [stack] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vdso] From jk at jkraemer.net Sun Nov 18 04:53:14 2007 From: jk at jkraemer.net (Jens Kraemer) Date: Sun, 18 Nov 2007 10:53:14 +0100 Subject: [Ferret-talk] crash while building index In-Reply-To: References: Message-ID: <20071118095314.GQ3558@thunder.jkraemer.net> On Sat, Nov 17, 2007 at 06:27:29PM -0500, Noah M. Daniels wrote: > Hi, > > I'm trying to reindex a model (I'm using acts_as_ferret) after having > added (via metaprogramming) a large number of fields (several hundred) > to the index. > > It keeps crashing when trying to rebuild the index (the crash log is > below, from ferret_server.out) but it only seems to crash on Linux > (Ubuntu server 7.04, x86-64) whereas it's fine on my OS X laptop > (10.5.1). This is with ferret 0.11.4 in both cases. > > Any thoughts? Is there a hard field limit in ferret? > > > *** glibc detected *** ruby: realloc(): invalid next size: > 0x000000000232ffc0 *** > ======= Backtrace: ========= [..] Looks strange - maybe a problem with Ubuntu's 64bit libs? Can you try to provide a simple script reproducing this behaviour? 
Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From erik at ehatchersolutions.com Sun Nov 18 05:24:15 2007
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Sun, 18 Nov 2007 05:24:15 -0500
Subject: [Ferret-talk] Ferret/AAF Stability?
In-Reply-To: <20071117123925.GO3558@thunder.jkraemer.net>
References: <314ee0450711160919y7eaad1cl16ef0bc349b09dee@mail.gmail.com> <20071117123925.GO3558@thunder.jkraemer.net>
Message-ID: <5513C9F2-E3AE-4B0F-8D48-4B0FD8E9965F@ehatchersolutions.com>

On Nov 17, 2007, at 7:39 AM, Jens Kraemer wrote:
>> I think Ferret is adequate for most search tasks, but if (like me)
>> you're building a dedicated search engine, Solr is currently a
>> stronger candidate.
>
> Well, As Solr uses Lucene internally, the mechanics and performance
> characteristics naturally can't be that different from Ferret. Maybe
> Ferret has a bug or two and a non-working inter-process locking (which
> doesn't matter when you think about building a dedicated search server
> like Solr is, since it's only one process), but the general internal
> handling of the index is the same, i.e. you can also only have one
> Writer open to a Lucene index at a time, and Searchers won't see index
> changes until re-opened, too.

That's all true. However, Solr manages all the IndexWriter/IndexSearcher stuff for you quite transparently (which I guess is comparable to Ferret + DRb, eh?). Because it is a single point of access to the index, it takes care of the single writer situation, and also handles warming IndexSearchers before coming online so that caches are built and a search on an updated index is as fast as it was before being updated.

> Having that said, if my application's main concern would be search, I
> most probably wouldn't choose any pre-cooked solution like aaf or
> Solr,
> but build exactly the thing I need from scratch, basing it either on
> Lucene or Ferret.
But maybe that's just me ;-) You'd be reinventing a lot of wheels doing that, with IndexWriter synchronization, IndexSearcher warming, caching, and much more. Erik From andreas.korth at gmail.com Sun Nov 18 10:05:23 2007 From: andreas.korth at gmail.com (Andreas Korth) Date: Sun, 18 Nov 2007 16:05:23 +0100 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> Message-ID: <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com> Hi everyone! This is a very interesting thread, because it raises the question as to whether Ferret is something you would want to use in a production environment - or not. I've been using Ferret in two applications and my experiences were quite disappointing. I chose Ferret because it's fast and it's got a Ruby API. Everything else about it is just annoying and potentially hazardous. What worries me most is the fact that Ferret is effectively an abandoned project. The original author, who is the sole owner of the code, hasn't been posting to this list for about six months. He hasn't introduced any improvements in about the same period of time and many bugs still remain unfixed. New bugs can't be submitted (let alone patches) because the project Trac is offline. There is no other component in my applications which behaves as badly as Ferret. If you don't treat it _very_ carefully it will throw segfaults as if this was an established way of indicating an error condition. The ActsAsFerret plugin _does_ treat ferret quite carefully and it's the only reason why many people are able to use Ferret at all. However, AAF is one approach and for some applications it might not be the right one. 
Especially if you want to put multiple models in one index - it's possible, but not really a flexible solution. The most sensitive point of Ferret is concurrency and many people actually use Ferret in distributed environments (which is usually a Rails app that scales across several machines). AAF introduces a DRb server to work around this problem, but with many concurrent read/ write requests, performance quickly degrades. With the advent of JRuby, a myriad of Java-based solutions is now accessible to Ruby developers, including many full-text indices. There are very mature solutions readily available for production use and many next-generation search engines currently in development. For the next application that needs full text search, I'm most definitely not going to use Ferret. I agree with Erik and give Solr a shot. I would like to encourage everyone, who is already using another full text index for Ruby/Rails to share his/her experiences on this list. Because I have the feeling that many people would like to get rid of Ferret for exactly the same reasons I've pointed out above. Andy On 16.11.2007, at 22:13, Erik Hatcher wrote: > > On Nov 16, 2007, at 3:35 PM, Scott Davies wrote: >> Am I a fool for wondering whether it might ultimately be less painful >> to try an index server that runs Lucene under a JRuby process? > > Or, rather, an index server that runs Solr accessed with a pure Ruby, > solr-ruby, API (which works with MRI or JRuby)? 
:) > > Erik > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk From casey at nerdle.com Sun Nov 18 10:24:34 2007 From: casey at nerdle.com (casey at nerdle.com) Date: Sun, 18 Nov 2007 10:24:34 -0500 (EST) Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com> Message-ID: Andy, You asked about other full text indexes for Ruby/Rails. I am using both AAF/Ferret and Sphinx in my app. I haven't had any problems with Ferret or acts_as_ferret so far. I am using the DRb server and it is being hit with 200-250,000 requests a day from dozens of clients (Mongrel instances). My index isn't huge - it is about 600 MB. I'm using Sphinx (http://www.sphinxsearch.com/) wherever I don't need realtime updates. A large portion of my site requires search indexes to be always up-to-date but in many places, I can live with an index that may be 5 minutes old. Sphinx trades realtime indexing for performance - both search and indexing speed is blazingly fast. Sphinx comes with a server component that speaks a simple protocol and there are several rails plugins available. Sphinx (and acts_as_sphinx or whatever plugin you choose) and acts_as_ferret are very different animals, but I'm very pleased with the combination. Casey On Sun, 18 Nov 2007, Andreas Korth wrote: > Hi everyone! > > This is a very interesting thread, because it raises the question as > to whether Ferret is something you would want to use in a production > environment - or not. > > I've been using Ferret in two applications and my experiences were > quite disappointing. 
I chose Ferret because it's fast and it's got a > Ruby API. Everything else about it is just annoying and potentially > hazardous. > > What worries me most is the fact that Ferret is effectively an > abandoned project. The original author, who is the sole owner of the > code, hasn't been posting to this list for about six months. He hasn't > introduced any improvements in about the same period of time and many > bugs still remain unfixed. New bugs can't be submitted (let alone > patches) because the project Trac is offline. > > There is no other component in my applications which behaves as badly > as Ferret. If you don't treat it _very_ carefully it will throw > segfaults as if this was an established way of indicating an error > condition. > > The ActsAsFerret plugin _does_ treat ferret quite carefully and it's > the only reason why many people are able to use Ferret at all. > However, AAF is one approach and for some applications it might not be > the right one. Especially if you want to put multiple models in one > index - it's possible, but not really a flexible solution. > > The most sensitive point of Ferret is concurrency and many people > actually use Ferret in distributed environments (which is usually a > Rails app that scales across several machines). AAF introduces a DRb > server to work around this problem, but with many concurrent read/ > write requests, performance quickly degrades. > > With the advent of JRuby, a myriad of Java-based solutions is now > accessible to Ruby developers, including many full-text indices. There > are very mature solutions readily available for production use and > many next-generation search engines currently in development. > > For the next application that needs full text search, I'm most > definitely not going to use Ferret. I agree with Erik and give Solr a > shot. > > I would like to encourage everyone, who is already using another full > text index for Ruby/Rails to share his/her experiences on this list. 
> Because I have the feeling that many people would like to get rid of > Ferret for exactly the same reasons I've pointed out above. > > Andy > > > On 16.11.2007, at 22:13, Erik Hatcher wrote: > >> >> On Nov 16, 2007, at 3:35 PM, Scott Davies wrote: >>> Am I a fool for wondering whether it might ultimately be less painful >>> to try an index server that runs Lucene under a JRuby process? >> >> Or, rather, an index server that runs Solr accessed with a pure Ruby, >> solr-ruby, API (which works with MRI or JRuby)? :) >> >> Erik >> >> _______________________________________________ >> Ferret-talk mailing list >> Ferret-talk at rubyforge.org >> http://rubyforge.org/mailman/listinfo/ferret-talk > > _______________________________________________ > Ferret-talk mailing list > Ferret-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/ferret-talk > From jk at jkraemer.net Sun Nov 18 12:51:04 2007 From: jk at jkraemer.net (Jens Kraemer) Date: Sun, 18 Nov 2007 18:51:04 +0100 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com> Message-ID: <20071118175104.GS3558@thunder.jkraemer.net> Hi! On Sun, Nov 18, 2007 at 10:24:34AM -0500, casey at nerdle.com wrote: > Andy, > > You asked about other full text indexes for Ruby/Rails. I am using both > AAF/Ferret and Sphinx in my app. > > I haven't had any problems with Ferret or acts_as_ferret so far. I am > using the DRb server and it is being hit with 200-250,000 requests a day > from dozens of clients (Mongrel instances). My index isn't huge - it is > about 600 MB. ah, glad to see somebody where everything just works standing up and tell the world :-) On Sun, 18 Nov 2007, Andreas Korth wrote: [..] 
> > > > What worries me most is the fact that Ferret is effectively an > > abandoned project. The original author, who is the sole owner of the > > code, hasn't been posting to this list for about six months. He hasn't > > introduced any improvements in about the same period of time and many > > bugs still remain unfixed. New bugs can't be submitted (let alone > > patches) because the project Trac is offline. Trac is online again for days, and Ferret even got a new logo :-) I wouldn't call it abandoned, it's just stabilizing. > > There is no other component in my applications which behaves as badly > > as Ferret. If you don't treat it _very_ carefully it will throw > > segfaults as if this was an established way of indicating an error > > condition. > > > > The ActsAsFerret plugin _does_ treat ferret quite carefully and it's > > the only reason why many people are able to use Ferret at all. > > However, AAF is one approach and for some applications it might not be > > the right one. Especially if you want to put multiple models in one > > index - it's possible, but not really a flexible solution. Well, even if aaf doesn't fit your needs, you might at least have a look at it if you want to know how to treat your Ferret well :-) I admit it isn't always an easy library to deal with, but with a proper set of unit tests it's entirely possible and no headache at all. Imho. > > The most sensitive point of Ferret is concurrency and many people > > actually use Ferret in distributed environments (which is usually a > > Rails app that scales across several machines). AAF introduces a DRb > > server to work around this problem, but with many concurrent read/ > > write requests, performance quickly degrades. AAf's DRb server can handle some serious load as it is now, but for sure there's much room for improvement. However I didn't receive many complaints from people actually *having* this problem in real life applications yet. 
Most of the time this is brought up as some kind of 'what if' problem.

Somebody did a speed comparison of Solr and aaf/DRb a while back, where aaf was at least as fast as Solr was, with its admittedly naive DRb server. I don't say this was a representative benchmark or anything, but these are the only numbers I know of...

So please from now on, anybody feeling to blame aaf's DRb as slow, *please* show us some numbers and the test process which led to these numbers. Ideally you'd also show us the numbers of any solution you've found to be faster solving the same problem. Thanks.

> > With the advent of JRuby, a myriad of Java-based solutions is now
> > accessible to Ruby developers, including many full-text indices. There
> > are very mature solutions readily available for production use and
> > many next-generation search engines currently in development.

For sure. I'm excited by these possibilities as well.

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From marvin at rectangular.com Sun Nov 18 12:29:31 2007
From: marvin at rectangular.com (Marvin Humphrey)
Date: Sun, 18 Nov 2007 09:29:31 -0800
Subject: [Ferret-talk] Multithreading / multiprocessing woes
In-Reply-To: <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com>
References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com>
Message-ID:

On Nov 18, 2007, at 7:05 AM, Andreas Korth wrote:
> What worries me most is the fact that Ferret is effectively an
> abandoned project. The original author, who is the sole owner of the
> code, hasn't been posting to this list for about six months. He hasn't
> introduced any improvements in about the same period of time and many
> bugs still remain unfixed.
I have a large fraction of the expertise needed to maintain the C part of the Ferret code base, FWIW. What I'm missing is significant Ruby expertise, which I wouldn't mind accumulating. :) If what's needed is C-level bug fixing, I can probably help out. > New bugs can't be submitted (let alone > patches) because the project Trac is offline. I know it's been down before, but looks like it's up to me, now. Also, I see a commit from Dave bumping the version to 0.11.5 yesterday. The C code base that I am currently working on, which has a foundation designed by Dave and I to be shared by multiple host languages, is going to wind up having Ruby bindings eventually. It will either happen as part of the Lucy project, or independently. In the meantime, perhaps I can contribute to Ferret in a caretaker/ troubleshooter role. Dave gave me commit access to the repository a while ago, and I just verified that I still have it. Marvin Humphrey Rectangular Research http://www.rectangular.com/ From andreas.korth at gmail.com Sun Nov 18 14:50:04 2007 From: andreas.korth at gmail.com (Andreas Korth) Date: Sun, 18 Nov 2007 20:50:04 +0100 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <20071118175104.GS3558@thunder.jkraemer.net> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com> <20071118175104.GS3558@thunder.jkraemer.net> Message-ID: <2868A3E0-B0C3-46D8-97F0-541888C5C979@gmail.com> On 18.11.2007, at 18:51, Jens Kraemer wrote: > Trac is online again for days, and Ferret even got a new logo :-) I > wouldn't call it abandoned, it's just stabilizing. Yes, I noticed that. I should have checked before posting. 
However, a project site that is frequently down for extended periods of time is not exactly building up trust :)

> AAf's DRb server can handle some serious load as it is now, but for
> sure
> there's much room for improvement. However I didn't receive many
> complaints from people actually *having* this problem in real life
> applications yet. Most of the time this is brought up as some kind of
> 'what if' problem.

My apologies for implying that AAF is part of the problem. It certainly isn't. I made the mistake of mixing up my concerns about Ferret with comments on AAF. What I actually meant to say is that AAF is one viable way to deal with some of Ferret's shortcomings. The fact that in the Rails community AAF is almost synonymous with Ferret speaks for your plugin and I'm not in a position to question that.

> So please from now on, anybody feeling to blame aaf's DRb as slow,
> *please* show us some numbers and the test process which led to
> these numbers.

Again, I didn't mean to blame AAF here. To be more precise: Ferret is pretty damn fast. The problem is its extremely sensitive API, which exposes problems from the C implementation to the Ruby developer. I don't know of any way to catch a segfault in Ruby, and even if I did, there's little I can do about it from Rubyland. Without transactional index updates, such behavior is intolerable, unless you can afford to rebuild your index several times a day.

This leaves us having to build another Ruby API on top of Ferret's in order to compensate for these imperfections. I wrote a custom solution with a focus on reliability. But with all the infrastructure built around Ferret (DRb server, transactions, queuing), the overall indexing performance wasn't that great anymore: remote indexing with 10 concurrent clients was 8-9 times slower than local indexing. Maybe AAF is faster, but since the implementations are different, there's no point in comparing them directly.
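
A segfault in a C extension cannot be rescued from Ruby, but it can be contained by running the risky call in a child process. The following is only a minimal sketch of that idea, assuming a Unix-like system where `fork` is available. `run_isolated` is a hypothetical helper name, not part of Ferret or acts_as_ferret, and the block's return value has to be something `Marshal` can serialize.

```ruby
# Hypothetical helper (not a Ferret or acts_as_ferret API): runs the given
# block in a forked child so that a crash in native code kills only the child.
# Returns [:ok, value], [:failed, exit_status] or [:crashed, signal_number].
def run_isolated
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    begin
      writer.write(Marshal.dump(yield)) # ship the result back to the parent
      writer.close                      # flush before the hard exit below
      exit!(0)                          # skip at_exit handlers in the child
    rescue Exception
      exit!(1)                          # ordinary Ruby error inside the block
    end
  end

  writer.close
  payload = reader.read                 # returns at EOF, i.e. when the child is done
  reader.close
  _, status = Process.wait2(pid)

  if status.signaled?
    [:crashed, status.termsig]          # child was killed by a signal
  elsif status.success?
    [:ok, Marshal.load(payload)]
  else
    [:failed, status.exitstatus]
  end
end
```

This does nothing for index consistency: if the child dies halfway through a write, the index may still need repair or a rebuild. It only keeps the calling process (for example an application server) alive and lets it notice the crash.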
Andy From andreas.korth at gmail.com Sun Nov 18 14:56:27 2007 From: andreas.korth at gmail.com (Andreas Korth) Date: Sun, 18 Nov 2007 20:56:27 +0100 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <323C44A8-07A5-4716-AC8C-3C4B7F221A83@gmail.com> Message-ID: <3DAD6E50-FDEB-4B5F-9248-CBD16BF51174@gmail.com> On 18.11.2007, at 16:24, casey at nerdle.com wrote: > I'm using Sphinx (http://www.sphinxsearch.com/) wherever I don't need > realtime updates. A large portion of my site requires search > indexes to > be always up-to-date but in many places, I can live with an index > that may > be 5 minutes old. Sphinx trades realtime indexing for performance - > both > search and indexing speed is blazingly fast. Sphinx comes with a > server > component that speaks a simple protocol and there are several rails > plugins available. Thanks, Casey. I'll take a look at Sphinx. Since I'm primarily concerned about index consistency and don't mind short delays either, it sounds like a pretty good alternative. Cheers, Andy From erik at ehatchersolutions.com Sun Nov 18 04:29:36 2007 From: erik at ehatchersolutions.com (Erik Hatcher) Date: Sun, 18 Nov 2007 04:29:36 -0500 Subject: [Ferret-talk] Multithreading / multiprocessing woes In-Reply-To: <75f591160711170212q1d4d6475v3e830a64ff4c3dc2@mail.gmail.com> References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <75f591160711170212q1d4d6475v3e830a64ff4c3dc2@mail.gmail.com> Message-ID: On Nov 17, 2007, at 5:12 AM, Scott Davies wrote: > Hmmm...I'd first heard of Solr only a couple of days ago, and I hadn't > been aware of a Ruby API to it until you mentioned it. > Interesting...thanks! 
I've honestly given fairly little of my time to Ferret, though I have tinkered with it some and it is mighty fine!

Believe you me, I don't want to steal any thunder from Ferret. And I've not compared/contrasted them much myself. Truth be told, I'm still a Java dude, and knowing that Lucene and Solr are in Java and excel at what they are designed to do, and already gulping the Apache Kool-Aid, I really dig Solr.

I've presented solr+ruby a couple of times now, once at RailsConf and then again a few weeks ago at rubyconf.

RailsConf:

rubyconf:

acts_as_solr as it exists today is sub-optimal compared to acts_as_ferret. I'm quite admittedly not much into relational databases so I have only tinkered in this area myself.

Erik

From julioody at gmail.com Sun Nov 18 19:45:31 2007
From: julioody at gmail.com (Julio Cesar Ody)
Date: Mon, 19 Nov 2007 11:45:31 +1100
Subject: [Ferret-talk] Multithreading / multiprocessing woes
In-Reply-To:
References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <75f591160711170212q1d4d6475v3e830a64ff4c3dc2@mail.gmail.com>
Message-ID:

Great. For my own curiosity, and maybe people here share some of it:

Is it possible to write your own custom analyzers for Solr? If so, how easy is it? Can one do that in Ruby or do I have to write it in Java?

I personally think that's one of the greatest things about Ferret. So far I haven't bothered looking into Sphinx or Solr precisely because, at a glance, I couldn't find a way to customize anything in detail like I can do with Ferret. I assume there is a way...

Thing is, reading through the Ferret booklet (the one from O'Reilly), you get a glimpse of how easy it is to build custom solutions using it.
So whereas it's kind of sad that the lead developer has been distant from the project in the last few months (?), I have to say, there's hardly anything matching how easy it is to work with.

On Nov 18, 2007 8:29 PM, Erik Hatcher wrote:
>
> On Nov 17, 2007, at 5:12 AM, Scott Davies wrote:
> > Hmmm...I'd first heard of Solr only a couple of days ago, and I hadn't
> > been aware of a Ruby API to it until you mentioned it.
> > Interesting...thanks!
>
> I've honestly given fairly little of my time to Ferret, though I have
> tinkered with it some and it is mighty fine!
>
> Believe you me, I don't want to steal any thunder from Ferret. And
> I've not compared/contrasted them much myself. Truth be told I'm
> still a Java dude, and knowing that Lucene and Solr are in Java,
> excel at what they are designed to do and already gulping the Apache
> cool-ade I really dig Solr.
>
> I've presented solr+ruby a couple of times now, once at RailsConf and
> then again a few weeks ago at rubyconf.
>
> RailsConf:
>
> rubyconf:
>
> acts_as_solr as it exists today is sub-optimal compared to
> acts_as_ferret. I'm quite admittedly not much into relational
> databases so I have only tinkered in this area myself.
>
> Erik

From alex at liivid.com Sun Nov 18 19:33:23 2007
From: alex at liivid.com (Alex Neth)
Date: Mon, 19 Nov 2007 08:33:23 +0800
Subject: [Ferret-talk] My AAF tweaks
In-Reply-To:
References:
Message-ID:

I have had to fix a few issues with AAF in order to get it working well for myself in a production environment. I'm using the latest "release" version, which is 0.4.1:

1) When there is no index in place, every request starts a new rebuild.
While there is some code in place to allow this to happen during testing, personally I see no reason to even test for this, although maybe it's necessary for those using multiple indexes (?). At the very least, the same index should never be built twice at the same time, I hope, so I put in rudimentary code that just locks until the index is complete on a request. The real problem is that all the rebuilds use the same FERRET_INDEX/rebuild path, which causes the DRb server to core dump and massive CPU load, as two reindexes are running and files are replaced underneath them. This may be the reason for a lot of stability complaints, as I think a lot of people just remove their index instead of calling rebuild.

2) Performance degrades when I index articles until I call optimize on the index. Optimize can take many seconds and seems to lock all access via the DRb server. I added logic to use a separate index for modifications (adds/deletes) and optimizations. It required significant hacking of the AAF plug-in. I basically have a writable index that is then copied each time to a new read-only index location, followed by changing the index_dir in AAF to the new read-only index. This prevents the slowdown during indexing, but everything still seems to lock during optimizations. I have a faster server now, so optimizations only take around 6 seconds. I may have to use a separate DRb server to do the optimizations. I am not sure where this locking occurs. I would like to see aaf take care of this locking issue somehow.

None of the above are ready to be checked in publicly but I'd be happy to send a patch if someone wants to base some work on it.

Other than these issues, aaf/ferret have been excellent, and basically "just worked". I am able to handle around 40 requests per second, including rendering a results page. I haven't finished performance testing as that is more than enough performance for me right now.
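
A minimal sketch of the copy-and-switch idea from point 2, in plain Ruby and without any AAF or Ferret calls. SnapshotPublisher, publish! and read_dir are hypothetical names made up for illustration; the actual hack changed index_dir inside the plug-in, which is not shown here. It only illustrates copying the writable index directory to a fresh location and then pointing readers at the copy:

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of acts_as_ferret): copies the writable index
# directory to a fresh snapshot directory and remembers which snapshot
# readers should currently use.
class SnapshotPublisher
  attr_reader :read_dir

  def initialize(writable_dir, snapshots_dir)
    @writable_dir = writable_dir
    @snapshots_dir = snapshots_dir
    @read_dir = nil
    FileUtils.mkdir_p(@snapshots_dir)
  end

  # Copy the current index files, then switch readers to the copy.
  # Returns the previous snapshot path (or nil) so the caller can delete it
  # once no search is still reading from it.
  def publish!
    snapshot = Dir.mktmpdir('index-', @snapshots_dir)
    # "dir/." copies the contents of the directory, not the directory itself.
    FileUtils.cp_r(File.join(@writable_dir, '.'), snapshot)
    previous = @read_dir
    @read_dir = snapshot
    previous
  end
end
```

A caller would run adds, deletes and optimize against the writable directory, call publish!, and then open searchers on read_dir. The copy must not run while a write or optimize is in progress, so some form of locking around publish! is still assumed.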
-Alex

From mail at stuartsierra.com Sun Nov 18 21:59:22 2007
From: mail at stuartsierra.com (Stuart Sierra)
Date: Sun, 18 Nov 2007 21:59:22 -0500
Subject: [Ferret-talk] Ferret/AAF Stability?
In-Reply-To: <20071117123925.GO3558@thunder.jkraemer.net>
References: <314ee0450711160919y7eaad1cl16ef0bc349b09dee@mail.gmail.com> <20071117123925.GO3558@thunder.jkraemer.net>
Message-ID: <314ee0450711181859y5711b63dje9a2e183204d4b37@mail.gmail.com>

On Nov 17, 2007 7:39 AM, Jens Kraemer wrote:
> > 3. Ferret doesn't yet support compressed indexes.
>
> At least from the docs it looks like it does, see
> http://ferret.davebalmain.com/api/classes/Ferret/Index/FieldInfo.html .
> I didn't ever try this out however.

Yes, it's in the API, but there's no code for it yet.

> > I was nervous about tackling Solr, but I've found it quite easy to
> > use, and the built-in caching and multithreading make it fast.
>
> numbers, please :-)

I make no claim that it's faster than Ferret, but it's fast enough.

> Having that said, if my application's main concern would be search, I
> most probably wouldn't choose any pre-cooked solution like aaf or Solr,
> but build exactly the thing I need from scratch, basing it either on
> Lucene or Ferret. But maybe that's just me ;-)

I'd like to do that, but I lack sufficient time and skill. :) In the meantime, I'm hoping Solr will let me offer an open search API to my users without too much extra effort on my part. We'll see how it goes; I may end up back on Ferret at some point.

-Stuart

From tvollmer at codemart.de Tue Nov 20 07:17:30 2007
From: tvollmer at codemart.de (Till Vollmer)
Date: Tue, 20 Nov 2007 13:17:30 +0100
Subject: [Ferret-talk] Compound search / grouping
Message-ID:

Hi,

Following problem:

We have a tree structure with children and a root element (recursively) stored in one table (imagine a threaded forum).

Each of the children has a title which should be indexed by ferret.
Now we want to make a search that returns only the root and searches all items.

So if one node has "expensive" and another node has "car" I want to enter "expensive car" in search and still find the root of all children (and only once!)

Also paging should work as well.

Any clues how to achieve that?

Regards
Till
--
Posted via http://www.ruby-forum.com/.

From cstrom at mdlogix.com Tue Nov 20 07:52:29 2007
From: cstrom at mdlogix.com (Chris Strom)
Date: Tue, 20 Nov 2007 07:52:29 -0500
Subject: [Ferret-talk] Compound search / grouping
In-Reply-To:
References:
Message-ID: <20071120125229.GC5312@jaynestown.users.mdlogix.com>

An instance method in the root class to the effect of children_titles_with_spaces would get you this. That method would return "expensive car" given your simple, two-node example, which would be indexable with the normal analyzer.

-Chris

From tvollmer at codemart.de Tue Nov 20 08:01:39 2007
From: tvollmer at codemart.de (Till Vollmer)
Date: Tue, 20 Nov 2007 14:01:39 +0100
Subject: [Ferret-talk] Compound search / grouping
In-Reply-To: <20071120125229.GC5312@jaynestown.users.mdlogix.com>
References: <20071120125229.GC5312@jaynestown.users.mdlogix.com>
Message-ID:

Hi,
Thank you for the clue.
Ok, like a virtual attribute. Works technically but:
Downside: How often is that called? Our tree has e.g. 200 children.
Does this mean that the children are collected on every change of one of the children (index)?
Any other ideas?
Regards
Till

Codemart GmbH
Till Vollmer
Managing Director
Tel: +49 (0)89 1213 5359
Mob: + 49 (0)160 718 7403
Fax: +49 (0)89 1892 1347
Yahoo ID: till_vollmer
Skype: till_vollmer
www.codemart.de
till.vollmer at codemart.de

From cstrom at mdlogix.com Tue Nov 20 08:36:09 2007
From: cstrom at mdlogix.com (Chris Strom)
Date: Tue, 20 Nov 2007 08:36:09 -0500
Subject: [Ferret-talk] Compound search / grouping
In-Reply-To:
References: <20071120125229.GC5312@jaynestown.users.mdlogix.com>
Message-ID: <20071120133609.GD5312@jaynestown.users.mdlogix.com>

If you are using acts_as_ferret, it will never get called.
The acts_as_ferret declaration would go on the root class. Updates to the child classes would not trigger an aaf index update in the root class.

If you want real-time index updates, you would have to add an after_save callback to the child class that forces an aaf update in the root class.

If real-time updates are not too important, then you could dump the child updates into a queue that performs bulk updates. This would minimize the number of times this method gets called.

If you're worried about 200+ SQL calls, don't perform the join in Ruby, do it via SQL using CONCAT and "Advanced Attribute" as described in AWDWR, 19.3.

-Chris

From tvollmer at codemart.de Tue Nov 20 09:08:52 2007
From: tvollmer at codemart.de (Till Vollmer)
Date: Tue, 20 Nov 2007 15:08:52 +0100
Subject: [Ferret-talk] Compound search / grouping
In-Reply-To: <20071120133609.GD5312@jaynestown.users.mdlogix.com>
References: <20071120125229.GC5312@jaynestown.users.mdlogix.com> <20071120133609.GD5312@jaynestown.users.mdlogix.com>
Message-ID:

Hi,
Thank you for your answer.

The root and the nodes are in the same table for us.
Is there no "group_by" or something for ferret? That would probably do the trick.

Regards
Till

From cstrom at mdlogix.com Tue Nov 20 09:45:57 2007
From: cstrom at mdlogix.com (Chris Strom)
Date: Tue, 20 Nov 2007 09:45:57 -0500
Subject: [Ferret-talk] Compound search / grouping
In-Reply-To:
References: <20071120125229.GC5312@jaynestown.users.mdlogix.com> <20071120133609.GD5312@jaynestown.users.mdlogix.com>
Message-ID: <20071120144557.GE5312@jaynestown.users.mdlogix.com>

If it's all the same class, then don't index children:

    def ferret_enabled?(is_rebuild = false)
      @ferret_disabled.nil? && self.root?
    end

Only root nodes would get indexed with the above method (and an appropriate root? definition) in place.

AFAIK, there is no group_by concept in aaf. At the same time, I don't think it's really necessary. The above, combined with a single method definition for children_titles_with_spaces, should get you exactly what you're looking to do.
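
The two pieces above can be sketched in plain Ruby, without ActiveRecord or acts_as_ferret loaded. Node, parent, children and descendants are hypothetical stand-ins for the real model and its tree plug-in; only ferret_enabled? and children_titles_with_spaces come from this thread, and the body of the latter is a guess at what it would do:

```ruby
# Plain-Ruby stand-in for the model (hypothetical; no Rails or aaf required).
class Node
  attr_reader :title, :parent, :children

  def initialize(title, parent = nil)
    @title = title
    @parent = parent
    @children = []
    parent.children << self if parent
  end

  # A node without a parent is the root of its tree.
  def root?
    parent.nil?
  end

  # Every node below this one, depth first.
  def descendants
    children.flat_map { |child| [child] + child.descendants }
  end

  # Virtual attribute that would be listed as an indexed field on the root.
  def children_titles_with_spaces
    descendants.map(&:title).join(' ')
  end

  # Same shape as the override quoted above: only root nodes get indexed.
  def ferret_enabled?(is_rebuild = false)
    @ferret_disabled.nil? && root?
  end
end

root  = Node.new('Thread about vehicles')
reply = Node.new('expensive', root)
Node.new('car', reply)

puts root.children_titles_with_spaces
```

Because only the root carries the aggregated text, a search for both words matches the root once, and paging then works on roots only. If the root's own title should be searchable too, it would need to be indexed as a separate field or prepended to the joined string.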
-Chris

From smaloff at veer.com Tue Nov 20 12:07:00 2007
From: smaloff at veer.com (Sheldon Maloff)
Date: Tue, 20 Nov 2007 18:07:00 +0100
Subject: [Ferret-talk] Question on Deploying a Ferret DRb server
Message-ID:

(Sorry if some people see this twice.
I originally posted this question from ruby-forum.com, but didn't realize that the Ferret forum was a mirror and that I actually wasn't a member.)

Anyway,

I've read all the documentation I could find, and read most of this forum, but I'm still a little confused about running a Ferret DRb server.

All the examples seem to be from the point of view of running the DRb server from within the context of a RoR web site. I'd like to consider the following scenario:

Server 1: front-end web server + mongrel cluster
Server 2: Ferret DRb server
Server 3: MySQL database

My question is related to Server 2. Exactly what is it that I have to deploy to that computer to have a Ferret DRb server? I understand that on Server 1, ferret_server.yml should have a production entry that points to Server 2, and that all the models on Server 1 need :remote => :true. But what lives on Server 2? Do I just deploy the models folder, the config folder and the scripts folder? Or do I deploy an entire copy of the web site code?

I haven't found the answer to that in anything I've read, so I thought I'd ask here.

Thanks,
Sheldon Maloff

From kraemer at webit.de Tue Nov 20 13:13:46 2007
From: kraemer at webit.de (Jens Kraemer)
Date: Tue, 20 Nov 2007 19:13:46 +0100
Subject: [Ferret-talk] Question on Deploying a Ferret DRb server
In-Reply-To:
References:
Message-ID: <20071120181346.GN29982@cordoba.webit.de>

Hi!

The DRb server needs at least access to your model classes. I think the easiest way is to just deploy a copy of your whole Rails app to the DRb server.

Of course you won't need your views there, but I'd find it easier to just deploy the whole app to a second place with capistrano than manually ripping the parts needed for DRb off.

Btw, you don't need :remote => true anymore with the current release of aaf. Just configure your ferret_server.yml for the production environment.

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

From smaloff at veer.com Tue Nov 20 13:31:20 2007
From: smaloff at veer.com (Sheldon Maloff)
Date: Tue, 20 Nov 2007 19:31:20 +0100
Subject: [Ferret-talk] Question on Deploying a Ferret DRb server
In-Reply-To: <20071120181346.GN29982@cordoba.webit.de>
References: <20071120181346.GN29982@cordoba.webit.de>
Message-ID:

Jens Kraemer wrote:
> Of course you won't need your views there, but I'd find it easier to
> just deploy the whole app to a second place with capistrano than
> manually ripping the parts needed for DRb off.

Thanks Jens. That's what I understood by reading everything I could.
I just wanted confirmation that that was the recommended practice.

Keep up the excellent work on AAF.

Cheers,
Sheldon Maloff

From scottd at gmail.com Wed Nov 21 14:53:51 2007
From: scottd at gmail.com (Scott Davies)
Date: Wed, 21 Nov 2007 11:53:51 -0800
Subject: [Ferret-talk] Multithreading / multiprocessing woes
In-Reply-To:
References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <75f591160711170212q1d4d6475v3e830a64ff4c3dc2@mail.gmail.com>
Message-ID: <75f591160711211153w460068acr931b7b0408368da6@mail.gmail.com>

For the record, while Lucene is pretty well-behaved as far as I can tell, DRb running under JRuby is not. When hit with multiple request streams simultaneously, DRb under JRuby 1.0.2 very quickly falls over and stops responding to all queries. DRb under JRuby 1.1b1 *almost* works, but every now and then JRuby will freak out and for a few requests things will fail in very strange ways. (Attempts to construct Java objects will fail with exceptions such as "undefined method `constructors' for nil:NilClass" or "undefined method `java_class' for Class:Class"; sometimes looking up a class will fail...)

On the plus side, I do get the impression that JRuby development is pretty active, and I see some concurrency bugs listed as high-priority for JRuby 1.1, some of which have already been patched in the trunk. My guess is that JRuby+Lucene+DRb will be a fine choice in a few months...it was actually pretty painless to set up, even with MRI Ruby RoR clients talking to a JRuby indexing server.
(I have a simple metaprogramming hack that lets the client specify a sequence of code to execute on the server side, where the specification looks *almost* like normal Ruby code; this effectively lets me easily construct gnarly Lucene query trees in MRI Ruby clients that know nothing about Lucene or Java. I actually initially came up with this hack to work around Ferret's "query trees and filters don't marshal" issue.) JRuby's not ready for serious use in scenarios with concurrency just yet, though.

Meanwhile, I'm hoping to avoid Solr because it seems (1) kind of complicated for what I'd actually get out of it in my particular application, (2) not particularly well-documented given its size, and (3) likely to get in my way when I want to do anything low-level and gnarly with Lucene. I guess I'll continue limping along with Ferret for the moment and hope the concurrency issues get worked out soonish.

Has anyone actually decided specifically to make Ferret bulletproof in the face of concurrency over the next few months, or is it probably just not going to happen? If it doesn't, I suspect Ferret will probably fall by the wayside as more Ruby people jump ship for Lucene-based solutions. Which would be a shame, because Ferret does hold a lot of promise...indexing is hard, and Ferret is *almost* a great solution. (Too bad the last 20% is usually 80% of the work...)

-- Scott

From erik at ehatchersolutions.com Wed Nov 21 15:24:51 2007
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Wed, 21 Nov 2007 15:24:51 -0500
Subject: [Ferret-talk] Multithreading / multiprocessing woes
In-Reply-To: <75f591160711211153w460068acr931b7b0408368da6@mail.gmail.com>
References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <75f591160711170212q1d4d6475v3e830a64ff4c3dc2@mail.gmail.com> <75f591160711211153w460068acr931b7b0408368da6@mail.gmail.com>
Message-ID: <40DFBD5A-8552-4639-8599-979BABA927F6@ehatchersolutions.com>

On Nov 21, 2007, at 2:53 PM, Scott Davies wrote:
> My guess is that JRuby+Lucene+DRb will be a fine choice in a few
> months...

Definitely not a bad choice. However I still implore you to give Solr another chance. More on that....

> Meanwhile, I'm hoping to avoid Solr because it seems (1) kind of
> complicated for what I'd actually get out of it in my particular
> application

How so? It's a "search server" with the same goals that I imagine you'd have for the JRuby+Lucene+DRb combination. It's not really complicated, especially with the solr-ruby library. Add documents, delete them, query for them. Leverage highlighting and more-like-this features, dismax querying, etc.

> , (2) not particularly well-documented given its size

Wow. Have you seen the Solr wiki? http://wiki.apache.org/solr - there are nooks and crannies documented on that wiki that go well beyond what I'd consider good documentation.
By all means point me to areas that aren't documented that you need to know (off list) and I'll get those taken care of.

> (3) likely to get in my way when I want to do anything low-level and
> gnarly with Lucene.

Maybe, but not much in your way. You'd have to wrap your low-level mojo inside some Solr API perhaps, but not even if we're just talking about custom analyzers or similarity implementation.

> Which would be a shame, because Ferret does hold a lot of
> promise..

hear hear! I definitely extend major kudos to Dave and the other Ferret contributors. Great stuff.

Erik

From scottd at gmail.com Wed Nov 21 17:04:51 2007
From: scottd at gmail.com (Scott Davies)
Date: Wed, 21 Nov 2007 14:04:51 -0800
Subject: [Ferret-talk] Multithreading / multiprocessing woes
In-Reply-To: <40DFBD5A-8552-4639-8599-979BABA927F6@ehatchersolutions.com>
References: <75f591160711160256i68802afcg8af5ae4e95c67636@mail.gmail.com> <75f591160711161235g692622c5p4b7fb907e9596cdb@mail.gmail.com> <8C5DFF2D-440E-4A5A-8D25-5CA965938D02@ehatchersolutions.com> <75f591160711170212q1d4d6475v3e830a64ff4c3dc2@mail.gmail.com> <75f591160711211153w460068acr931b7b0408368da6@mail.gmail.com> <40DFBD5A-8552-4639-8599-979BABA927F6@ehatchersolutions.com>
Message-ID: <75f591160711211404r68c831f5p85a107b240f1b86b@mail.gmail.com>

On Nov 21, 2007 12:24 PM, Erik Hatcher wrote:
>
> How so? It's a "search server" with the same goals that I imagine
> you'd have for the JRuby+Lucene+DRb combination.

It's a bit more than I need right out of the gate, what with the caching, replication, faceted search, etc. Of course, that might not be a problem if it uses sensible configuration defaults I can safely ignore to start with.

> It's not really complicated, especially with the solr-ruby library.
> Add documents, delete them, query for them. Leverage highlighting
> and more-like these features, dismax querying, etc.
My particular application does enough weird things that, for the most part, I'd prefer unfettered access to the low-level Lucene APIs. (For example, my application uses a lot of gnarly query trees involving filters and ranges, and I'm not sure whether those are easily transmitted through the Solr APIs. Then I have "run all of these queries against each of the documents in this specific set and tell me which document/query pairs match in one fell swoop" routines, in which case it might be a good idea to copy the documents into a temporary RAM index to run the queries against.)

> > , (2) not particularly well-documented given its size
>
> Wow. Have you seen the Solr wiki? http://wiki.apache.org/solr -
> there are nooks and crannies documented on that wiki that go well
> beyond what I'd consider good documentation.
>
> By all means point me to areas that aren't documented that you need
> to know (off list) and I'll get those taken care of.

Wikis are fine for looking up details when you already mostly know what you're doing, but they're not nearly as useful when you're in the earlier stages trying to get the big "What does this system look like and how does it work?" picture and evaluate initial plans of attack. Ferret and Lucene both have entire *books* written about them that are *excellent* for those purposes. (They're not free-as-in-beer, but are well worth the cost.) By comparison, Solr has a very simple "here is how you get a straightforward app off the ground" tutorial that says little about how Solr is actually organized, and then you're basically left staring at a Wiki page with a thousand bullet points and no clear path to big-picture enlightenment. And given the choice between (1) using a lower-level system that's been very well-documented in a well-organized explanatory fashion and (2) using a slightly higher-level system I still haven't acquired a mental "big picture" for, I generally find (1) more productive.
This isn't a criticism of Solr's documentation nearly as much as a hearty "Book-style documentation is useful, and, holy crap, Ferret and Lucene actually HAVE IT. Woohoo!", plus an added bonus testament to my own laziness. > > (3) likely to get in my way when I want to do anything low-level and > > gnarly with Lucene. > > Maybe, but not much in your way. You'd have to wrap your low-level > mojo inside some Solr API perhaps, but not even if we're just talking > about custom analyzers or similarity implementation. Yeah, my guess is that if I sit down and figure out how Solr is laid out, adding APIs to do what I want won't be too hard. Might still be kind of tedious implementing all the necessary marshaling, though. -- Scott From me at benjaminarai.com Sat Nov 24 17:30:56 2007 From: me at benjaminarai.com (Benjamin Arai) Date: Sat, 24 Nov 2007 14:30:56 -0800 Subject: [Ferret-talk] Getting a Lucene.net index readable by Ferret Message-ID: <4748A620.2070608@benjaminarai.com> Hi, What would it take to get a Lucene.net index readable by Ferret? I know that there has been discussion on this before but I am trying to figure out the actual amount of work (cost) that would be required to get this done. Any help would be greatly appreciated. Benjamin From jk at jkraemer.net Mon Nov 26 16:11:26 2007 From: jk at jkraemer.net (=?utf-8?Q?Jens_Kr=c3=a4mer?=) Date: Mon, 26 Nov 2007 22:11:26 +0100 Subject: [Ferret-talk] search not working after upgrade In-Reply-To: References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> Message-ID: <3d32bbb7d2da1727835043216042065a@ruby-forum.com> Izit Izit wrote: > Correction on my previous post. > > The correct way to do it is: > > Product.find_by_contents("*",{},:conditions =>search_conditions,:include > => [:supplier],:order =>"products.id" ) > > Leave out the :limit=>:all that is put in by default.
Exactly - I tried to make aaf a bit more clever by letting it assume :limit => :all whenever sql conditions are given, but messed it up somehow ;-) It's fixed in trunk (http://projects.jkraemer.net/acts_as_ferret/changeset/286), or just apply the attached patch. Btw, this whole thread hasn't come through to the mailing list (yet?), I discovered it by pure chance. Please subscribe to the ferret mailing list (http://rubyforge.org/mail/?group_id=1028) and post there directly to make sure your posting gets actually read. Cheers, Jens Attachments: http://www.ruby-forum.com/attachment/1044/fix_limit_all.diff -- Posted via http://www.ruby-forum.com/. From john at digitalpulp.com Tue Nov 27 21:11:35 2007 From: john at digitalpulp.com (John Bachir) Date: Tue, 27 Nov 2007 21:11:35 -0500 Subject: [Ferret-talk] flakey web-list interface (was: search not working after upgrade) In-Reply-To: <3d32bbb7d2da1727835043216042065a@ruby-forum.com> References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> Message-ID: On Nov 26, 2007, at 4:11 PM, Jens Krämer wrote: > Btw, this whole thread hasn't come through to the mailing list > (yet?), I > discovered it by pure chance. Please subscribe to the ferret mailing > list (http://rubyforge.org/mail/?group_id=1028) and post there > directly > to make sure your posting gets actually read. Jens- I see this happen a lot on rubyforge-- is it because it only brings email in from the web interface when the poster is subscribed? Or is it just flakey software? Do you have any insight into how we might be able to get rubyforge to either address or document this issue?
John From kraemer at webit.de Wed Nov 28 04:14:40 2007 From: kraemer at webit.de (Jens Kraemer) Date: Wed, 28 Nov 2007 10:14:40 +0100 Subject: [Ferret-talk] flakey web-list interface (was: search not working after upgrade) In-Reply-To: References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> Message-ID: <20071128091440.GI5751@cordoba.webit.de> On Tue, Nov 27, 2007 at 09:11:35PM -0500, John Bachir wrote: > > On Nov 26, 2007, at 4:11 PM, Jens Krämer wrote: > > Btw, this whole thread hasn't come through to the mailing list > > (yet?), I > > discovered it by pure chance. Please subscribe to the ferret mailing > > list (http://rubyforge.org/mail/?group_id=1028) and post there > > directly > > to make sure your posting gets actually read. > > Jens- > > I see this happen a lot on rubyforge-- is it because it only brings > email in from the web interface when the poster is subscribed? Or is > it just flakey software? Do you have any insight into how we might be > able to get rubyforge to either address or document this issue? I'm not sure why this happens, maybe some spam prevention kicks in, or it's the way you said, that it only accepts messages from people subscribed to the mailing list. I'll try and ask Andreas Schwarz, the creator of ruby-forum.com, about this. Cheers, Jens -- Jens Krämer webit!
Gesellschaft für neue Medien mbH Schnorrstraße 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa From tvollmer at codemart.de Wed Nov 28 04:25:35 2007 From: tvollmer at codemart.de (Till Vollmer) Date: Wed, 28 Nov 2007 10:25:35 +0100 Subject: [Ferret-talk] Ferret on Mac OS X Leopard In-Reply-To: <20071128091440.GI5751@cordoba.webit.de> References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> <20071128091440.GI5751@cordoba.webit.de> Message-ID: <9ED901E2-1E96-4614-AEF2-F67E10B75E54@codemart.de> Hello, I have some major problems installing ferret on Leopard. While I know it's already installed when you install Leopard, I want to do it manually as I am not using the installed version of ruby (since I migrated from Tiger). When the native extensions are compiled I get some linker problems. Can anyone reproduce that? I have an older version of ferret installed (which I installed while being on Tiger) and this works fine, but I want to upgrade. Regards Till From andreas.korth at gmail.com Wed Nov 28 07:36:05 2007 From: andreas.korth at gmail.com (Andreas Korth) Date: Wed, 28 Nov 2007 13:36:05 +0100 Subject: [Ferret-talk] Ferret on Mac OS X Leopard In-Reply-To: <9ED901E2-1E96-4614-AEF2-F67E10B75E54@codemart.de> References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> <20071128091440.GI5751@cordoba.webit.de> <9ED901E2-1E96-4614-AEF2-F67E10B75E54@codemart.de> Message-ID: On 28.11.2007, at 10:25, Till Vollmer wrote: > I have some major problems on installing ferret on Leopard. Interestingly, the words "major problems" and "Leopard" coincide a lot lately.
Here's my advice: - get rid of the probably buggiest piece of software that apple has ever shipped - go back to tiger and stay there for at least a couple of major updates - oh and if you made the same mistake and updated tiger instead of a fresh install, I'm very sorry for you ;) > When the native extensions are compiled I get some linker problems. If you can't resist the temptation of leopard's amazingly great feature set - make sure you install the latest ruby version as well as any other package via macports - have apple's developer tools installed in advance (includes gcc + build tools) - a talisman might probably be helpful Best of luck, Andy From ndaniels at mac.com Wed Nov 28 13:36:24 2007 From: ndaniels at mac.com (Noah M. Daniels) Date: Wed, 28 Nov 2007 13:36:24 -0500 Subject: [Ferret-talk] Ferret on Mac OS X Leopard In-Reply-To: References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> <20071128091440.GI5751@cordoba.webit.de> <9ED901E2-1E96-4614-AEF2-F67E10B75E54@codemart.de> Message-ID: <9952A4BA-F039-4738-9AF6-0186D6DCBAAF@mac.com> On Nov 28, 2007, at 7:36 AM, Andreas Korth wrote: > > On 28.11.2007, at 10:25, Till Vollmer wrote: > >> I have some major problems on installing ferret on Leopard. > > Interestingly, the words "major problems" and "Leopard" coincide a lot > lately. > > Here's my advice: > > - get rid of the probably buggiest piece of software that apple has > ever shipped > > If you can't resist the temptation of leopard's amazingly great > feature set > > - make sure you install the latest ruby version as well as any other > package via macports > - have apple's developer tools installed in advance (includes gcc + > build tools) > - a talisman might probably be helpful For what it's worth, I've had zero problems with the Apple-supplied ruby, rails, etc... I updated some gems but ferret is running great (better than on our Linux servers, in fact -- no end of problems on Ubuntu 7.04 x64). 
I would recommend sticking with the Apple-supplied ruby; for once they've gotten it right, and everything seems to work beautifully. From andreas.korth at gmail.com Thu Nov 29 07:01:21 2007 From: andreas.korth at gmail.com (Andreas Korth) Date: Thu, 29 Nov 2007 13:01:21 +0100 Subject: [Ferret-talk] Ferret on Mac OS X Leopard In-Reply-To: <9952A4BA-F039-4738-9AF6-0186D6DCBAAF@mac.com> References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> <20071128091440.GI5751@cordoba.webit.de> <9ED901E2-1E96-4614-AEF2-F67E10B75E54@codemart.de> <9952A4BA-F039-4738-9AF6-0186D6DCBAAF@mac.com> Message-ID: <4136BE39-04ED-4B11-B6C5-6472E17FDA05@gmail.com> On 28.11.2007, at 19:36, Noah M. Daniels wrote: >> Interestingly, the words "major problems" and "Leopard" coincide a >> lot >> lately. > For what it's worth, I've had zero problems with the Apple-supplied > ruby, rails, etc... Frankly, knowing that everything works well for others isn't worth much to people who _are_ having problems ;) But it appears to be a common reaction, especially in the Apple community. > I would recommend sticking with the Apple-supplied > ruby; for once they've gotten it right, and everything seems to work > beautifully. You must have a different understanding of 'getting it right'. Here's what I got when I entered 'ruby -v' or 'gems' into the console of a fresh 10.5 install: -bash: ruby: command not found After getting it to work eventually, a 'gem update --system' just wrecked the whole Ruby installation. At that point I just gave up and installed Ruby/Gems and Rails via Macports. One thing I'd really like to know is how one is supposed to update the Ruby/Rails packages which shipped with Leopard. I had no chance to check, but are they still shipping Rails 1.1.2? I bet that Apple isn't going to update Ruby during the whole lifetime of Leopard. Anything else would be a big surprise. So here goes my advice again: Use Macports.
Do not use whatever Apple ships. Cheers, Andy From ndaniels at mac.com Thu Nov 29 11:58:03 2007 From: ndaniels at mac.com (Noah M. Daniels) Date: Thu, 29 Nov 2007 11:58:03 -0500 Subject: [Ferret-talk] Ferret on Mac OS X Leopard In-Reply-To: <4136BE39-04ED-4B11-B6C5-6472E17FDA05@gmail.com> References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> <20071128091440.GI5751@cordoba.webit.de> <9ED901E2-1E96-4614-AEF2-F67E10B75E54@codemart.de> <9952A4BA-F039-4738-9AF6-0186D6DCBAAF@mac.com> <4136BE39-04ED-4B11-B6C5-6472E17FDA05@gmail.com> Message-ID: On Nov 29, 2007, at 7:01 AM, Andreas Korth wrote: > > On 28.11.2007, at 19:36, Noah M. Daniels wrote: > >>> Interestingly, the words "major problems" and "Leopard" coincide a >>> lot >>> lately. > >> For what it's worth, I've had zero problems with the Apple-supplied >> ruby, rails, etc... > > Frankly, knowing that everything works well for others isn't worth > much to people who _are_ having problems ;) > > But it appears to be a common reaction, especially in the Apple > community. Point taken, but I was responding to the poster's statement that they'd avoided the apple-supplied Ruby. > > > You must have a different understanding of 'getting it right'. > Here's what I got when I entered 'ruby -v' or 'gems' into the > console of a fresh 10.5 install: > > -bash: ruby: command not found > That's very strange, but just sounds like a path issue. > After getting it to work eventually, a 'gem update --system' just > wrecked the whole Ruby installation. At that point I just gave up > and installed Ruby/Gems and Rails via Macports. > Yes, this one is a known issue.
See these links: http://discussions.apple.com/thread.jspa?threadID=1200950&tstart=0 http://discussions.apple.com/thread.jspa?threadID=1202925&tstart=0 > One thing I'd really like to know is how one is supposed to update > the Ruby/Rails packages which shipped with Leopard. I had no chance > to check, but are they still shipping Rails 1.1.2? I bet that Apple > isn't going to update Ruby during the whole lifetime of Leopard. > Anything else would be a big surprise. > It comes with rails 1.2.3, and gem update rails updates it to 1.2.5 (well, now 1.2.6) just fine. > So here goes my advice again: Use Macports. Do not use whatever > Apple ships. I disagree in the most friendly way possible :) From marvin at rectangular.com Thu Nov 29 12:02:26 2007 From: marvin at rectangular.com (Marvin Humphrey) Date: Thu, 29 Nov 2007 09:02:26 -0800 Subject: [Ferret-talk] Ferret on Mac OS X Leopard In-Reply-To: <4136BE39-04ED-4B11-B6C5-6472E17FDA05@gmail.com> References: <3d7046ce8dad1b0bc671415dc978587a@ruby-forum.com> <3d32bbb7d2da1727835043216042065a@ruby-forum.com> <20071128091440.GI5751@cordoba.webit.de> <9ED901E2-1E96-4614-AEF2-F67E10B75E54@codemart.de> <9952A4BA-F039-4738-9AF6-0186D6DCBAAF@mac.com> <4136BE39-04ED-4B11-B6C5-6472E17FDA05@gmail.com> Message-ID: <931A7E6F-5D4E-4F92-BA5E-4296BF1F5446@rectangular.com> On Nov 29, 2007, at 4:01 AM, Andreas Korth wrote: > Here's what I got when I entered 'ruby -v' or 'gems' into the > console of a fresh 10.5 install: > > -bash: ruby: command not found Curious. Did you install the developer tools? 
Here's what I get in Terminal with a fresh install of 10.5 and XCode 3.0: /Users/marvin/ $ ruby -v ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0] /Users/marvin/ $ which ruby /usr/bin/ruby Marvin Humphrey Rectangular Research http://www.rectangular.com/ From lebreeze at gmail.com Fri Nov 30 07:48:10 2007 From: lebreeze at gmail.com (Levent Ali) Date: Fri, 30 Nov 2007 12:48:10 +0000 Subject: [Ferret-talk] Cannot install ferret gem on Leopard Message-ID: <76685bc50711300448p646c8124q6f5a42ce28ff6946@mail.gmail.com> I have 0.11.3 installed When I try 0.11.6 or 0.11.5 I get the following output Building native extensions. This could take a while... ERROR: While executing gem ... (Gem::Installer::ExtensionBuildError) ERROR: Failed to build gem native extension. ruby extconf.rb install ferret creating Makefile make gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c analysis.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c api.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c array.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c bitvector.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c compound_io.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c document.c gcc -I. 
-I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c except.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c ferret.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c filter.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c fs_store.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c global.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c hash.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c hashset.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c helper.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c index.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c libstemmer.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. 
-fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c mempool.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c multimapper.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c posh.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c priorityqueue.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_boolean.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_const_score.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_filtered_query.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_fuzzy.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_match_all.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_multi_term.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_parser.c gcc -I. 
-I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_phrase.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_prefix.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_range.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_span.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_term.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c q_wildcard.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c r_analysis.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c r_index.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c r_qparser.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c r_search.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. 
-fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c r_store.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c r_utils.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c ram_store.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c search.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c similarity.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c sort.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_danish.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_dutch.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_english.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_finnish.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. 
-fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_french.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_german.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_italian.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_norwegian.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_porter.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_portuguese.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_spanish.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_ISO_8859_1_swedish.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_KOI8_R_russian.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_danish.c gcc -I. 
-I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_dutch.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_english.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_finnish.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_french.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_german.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_italian.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_norwegian.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_porter.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_portuguese.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_russian.c gcc -I. 
-I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_spanish.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stem_UTF_8_swedish.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c stopwords.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c store.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. -fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c term_vectors.c gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I/usr/local/lib/ruby/1.8/i686-darwin8.10.3 -I. 
-fno-common -g -O2 -fno-common -pipe -fno-common -D_FILE_OFFSET_BITS=64 -c utilities.c cc -dynamic -bundle -undefined suppress -flat_namespace -L"/usr/local/lib" -o ferret_ext.bundle analysis.o api.o array.o bitvector.o compound_io.o document.o except.o ferret.o filter.o fs_store.o global.o hash.o hashset.o helper.o index.o libstemmer.o mempool.o multimapper.o posh.o priorityqueue.o q_boolean.o q_const_score.o q_filtered_query.o q_fuzzy.o q_match_all.o q_multi_term.o q_parser.o q_phrase.o q_prefix.o q_range.o q_span.o q_term.o q_wildcard.o r_analysis.o r_index.o r_qparser.o r_search.o r_store.o r_utils.o ram_store.o search.o similarity.o sort.o stem_ISO_8859_1_danish.o stem_ISO_8859_1_dutch.o stem_ISO_8859_1_english.o stem_ISO_8859_1_finnish.o stem_ISO_8859_1_french.o stem_ISO_8859_1_german.o stem_ISO_8859_1_italian.o stem_ISO_8859_1_norwegian.o stem_ISO_8859_1_porter.o stem_ISO_8859_1_portuguese.o stem_ISO_8859_1_spanish.o stem_ISO_8859_1_swedish.o stem_KOI8_R_russian.o stem_UTF_8_danish.o stem_UTF_8_dutch.o stem_UTF_8_english.o stem_UTF_8_finnish.o stem_UTF_8_french.o stem_UTF_8_german.o stem_UTF_8_italian.o stem_UTF_8_norwegian.o stem_UTF_8_porter.o stem_UTF_8_portuguese.o stem_UTF_8_russian.o stem_UTF_8_spanish.o stem_UTF_8_swedish.o stopwords.o store.o term_vectors.o utilities.o -lruby -lpthread -ldl -lobjc /usr/bin/ld: /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libpthread.dylib unknown flags (type) of section 6 (__TEXT,__dof_plockstat) in load command 0 /usr/bin/ld: /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libdl.dylib unknown flags (type) of section 6 (__TEXT,__dof_plockstat) in load command 0 /usr/bin/ld: /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libobjc.dylib load command 9 unknown cmd field /usr/bin/ld: /usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib unknown flags (type) of section 6 (__TEXT,__dof_plockstat) in load command 0 /usr/bin/ld: /usr/lib/libSystem.B.dylib unknown flags (type) of section 6 (__TEXT,__dof_plockstat) in 
load command 0 collect2: ld returned 1 exit status make: *** [ferret_ext.bundle] Error 1
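One possible reading of the log above (an assumption, not a confirmed diagnosis): the include paths point at a Ruby under /usr/local built for i686-darwin8.10.3, and gcc resolves to i686-apple-darwin8, i.e. a Tiger-era Ruby and toolchain being linked against Leopard's libraries, whereas a stock Leopard Ruby reports universal-darwin9.0. A minimal sketch for checking whether a Ruby was built for a different Darwin major version than the running kernel follows. The helper names darwin_major and built_for_other_darwin? are hypothetical, and the script assumes a modern Ruby (RbConfig and Etc.uname; Ruby 1.8 exposed the same data as Config::CONFIG and has no Etc.uname).

```ruby
require 'rbconfig'
require 'etc'

# Hypothetical helper: pull the Darwin major version out of a platform
# string such as "i686-darwin8.10.3" or "universal-darwin9.0".
# Returns nil when the string does not mention darwin.
def darwin_major(platform)
  match = platform.to_s.match(/darwin(\d+)/)
  match && match[1].to_i
end

# Hypothetical helper: true when the Ruby build platform names a different
# Darwin major version than the kernel release (e.g. "9.1.0" on Leopard).
# Returns false when either value cannot be determined.
def built_for_other_darwin?(build_platform, kernel_release)
  built = darwin_major(build_platform)
  running = kernel_release.to_s[/\A\d+/]
  return false if built.nil? || running.nil?
  built != running.to_i
end

# Compare the running interpreter's build platform with the kernel release.
build_platform = RbConfig::CONFIG['arch']
kernel_release = Etc.uname[:release]
if built_for_other_darwin?(build_platform, kernel_release)
  puts "Ruby was built for #{build_platform} but the kernel release is #{kernel_release}"
else
  puts "No Darwin version mismatch detected (#{build_platform}, kernel #{kernel_release})"
end
```

If the check reports a mismatch, rebuilding Ruby (or switching to a Ruby built on the current OS) and installing the current developer tools before retrying the gem install would be the obvious things to try; whether that resolves this particular linker error is not confirmed anywhere in the thread.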