From lyesjob at gmail.com Mon Dec 1 02:52:06 2008
From: lyesjob at gmail.com (Lyes Amazouz)
Date: Mon, 1 Dec 2008 08:52:06 +0100
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To: <1C29582C-07C7-43A1-BBDB-984587E0C3C3@ehatchersolutions.com>
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
    <54218BA4-A5E2-42A2-B6A6-C0FF7E875C35@ehatchersolutions.com>
    <60d886530811300754t47c9efb1uf3b3f4e5c18c0330@mail.gmail.com>
    <1C29582C-07C7-43A1-BBDB-984587E0C3C3@ehatchersolutions.com>
Message-ID: <60d886530811302352l3be99003t76f2ca6d2d255169@mail.gmail.com>

Hello

> But why move from Solr to Ferret?

We found that searching and indexing with Solr were too slow, so we
decided to look for an alternative. Ferret seems to be a good choice.
We tried Ferret on some examples and we found that it was better.

--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From erik at ehatchersolutions.com Mon Dec 1 03:45:35 2008
From: erik at ehatchersolutions.com (Erik Hatcher)
Date: Mon, 1 Dec 2008 03:45:35 -0500
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To: <60d886530811302352l3be99003t76f2ca6d2d255169@mail.gmail.com>
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
    <54218BA4-A5E2-42A2-B6A6-C0FF7E875C35@ehatchersolutions.com>
    <60d886530811300754t47c9efb1uf3b3f4e5c18c0330@mail.gmail.com>
    <1C29582C-07C7-43A1-BBDB-984587E0C3C3@ehatchersolutions.com>
    <60d886530811302352l3be99003t76f2ca6d2d255169@mail.gmail.com>
Message-ID: <9FFF360D-54CE-44D4-A90F-31EBCB998E09@ehatchersolutions.com>

On Dec 1, 2008, at 2:52 AM, Lyes Amazouz wrote:

> Hello
>
> > But why move from Solr to Ferret?
>
> We found that the search and the indexation with Solr was too slow,
> and we decided to find another alternative. Ferret seems to be a
> good choice. We tried Ferret on some examples and we found that it
> was better.
Thanks for the feedback. If you don't mind elaborating further, what
kind of documents are you indexing (database rows? file system files?
other?), how many documents do you have, and how are you indexing it?

Thanks,

Erik

From lyesjob at gmail.com Mon Dec 1 05:36:09 2008
From: lyesjob at gmail.com (Lyes Amazouz)
Date: Mon, 1 Dec 2008 11:36:09 +0100
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To: <9FFF360D-54CE-44D4-A90F-31EBCB998E09@ehatchersolutions.com>
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
    <54218BA4-A5E2-42A2-B6A6-C0FF7E875C35@ehatchersolutions.com>
    <60d886530811300754t47c9efb1uf3b3f4e5c18c0330@mail.gmail.com>
    <1C29582C-07C7-43A1-BBDB-984587E0C3C3@ehatchersolutions.com>
    <60d886530811302352l3be99003t76f2ca6d2d255169@mail.gmail.com>
    <9FFF360D-54CE-44D4-A90F-31EBCB998E09@ehatchersolutions.com>
Message-ID: <60d886530812010236m2d496c20ic374c8d34843ae6f@mail.gmail.com>

Hello Erik

> Thanks for the feedback. If you don't mind elaborating further, what kind
> of documents are you indexing (database rows? file system files? other?),
> how many documents do you have, and how are you indexing it?
>
> Thanks,
>
> Erik
>

Right now we are indexing file system files: HTML pages (85%), images
(10%, for which we index the meta information), PDF (2%), Word (2%) and
plain text (1%). We have 100 000 000 documents to index; 10% is already
done.

As for the last question, I didn't exactly understand what you mean by
"how are you indexing". What I can say is that before we index documents
that are not plain text (like PDF, Word and HTML), we run a content
extraction step (using pdftotext, antiword and the 'hpricot' Ruby
library). We also extract the metadata related to each document we index.
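A rough sketch of the extraction step described above, for readers who want something concrete. This is not the poster's code: the `EXTRACTORS` table and the helper names `extraction_command` and `extract_text` are hypothetical, and only the PDF and Word cases are shown (HTML stripping with hpricot is left out). It assumes the `pdftotext` and `antiword` command-line tools are installed and that both can write the extracted text to standard output.

```ruby
# Hypothetical sketch: pick an external text extractor by file extension.
# The table and helper names are illustrative assumptions, not from the thread.
EXTRACTORS = {
  ".pdf" => lambda { |path| ["pdftotext", path, "-"] }, # "-" sends the text to stdout
  ".doc" => lambda { |path| ["antiword", path] }        # antiword prints to stdout
}.freeze

# Returns the command (as an argument array) for a file,
# or nil when no extractor is registered for its extension.
def extraction_command(path)
  builder = EXTRACTORS[File.extname(path).downcase]
  builder && builder.call(path)
end

# Runs the extractor and returns its output as a string,
# or nil for file types this sketch does not handle.
def extract_text(path)
  command = extraction_command(path)
  return nil if command.nil?
  IO.popen(command) { |io| io.read }
end
```

Passing the command as an array avoids going through a shell, so file names with spaces or quotes need no escaping. The extracted string would then be handed to the indexer together with the document's metadata.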
--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From kraemer at webit.de Mon Dec 1 10:48:03 2008
From: kraemer at webit.de (Jens Krämer)
Date: Mon, 1 Dec 2008 16:48:03 +0100
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
Message-ID:

Hi!

On 30.11.2008, at 09:49, Lyes Amazouz wrote:

> Hi everybody!
>
> In our company, we want to use Ferret as the main index/search
> engine of our applications. And we are looking for some testimonies
> about how Ferret is efficient when deployed in production.
>
> * Was Ferret already deployed in production in some companies? is
> there some testimonies about that?

Yes, I use Ferret whenever I need some kind of search for a site or
application I'm working on. Usually these are full text searches for
product catalogs and/or html content - not really large scale, at most
around 10000 documents. Most recent example is www.fahrrad-xxl.de. We
also use Ferret + aaf (acts_as_ferret) in a knowledge management system
I'm working on for xscio AG (xscio.de).

> * What is the maximum number of documents we can index with ferret?
> Has some one informations about that.

I have no idea whether there is an upper limit for the number of
documents other than the maximum value a Ruby Fixnum instance can
have...

> * What is the best way to access a very huge Ferret Index? May we
> distribute it on several machines or not?

AFAIR there's no way to distribute an index across multiple machines
built into Ferret. You could do the distribution yourself of course by
clustering your data and distributing across several independent ferret
indexes.
Downside is that search result scores from different indexes aren't
directly comparable.

> By the way, can Ferret read Solr indexes as they are both clones of
> luceen?

Ferret isn't really index compatible with Lucene anymore, it uses a
slightly different index format mostly due to differences in the
representation of utf8 values, but I think there were other changes,
too. Oh, and Solr also isn't a clone of Lucene, it's a search server
that internally uses the Lucene library.

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From cgansen at gmail.com Tue Dec 2 17:18:45 2008
From: cgansen at gmail.com (Chris G.)
Date: Tue, 2 Dec 2008 23:18:45 +0100
Subject: [Ferret-talk] DrbServer incompatible with config.gem
In-Reply-To: <86f9a2b86a44e71729426fa22f12b5c6@ruby-forum.com>
References: <86f9a2b86a44e71729426fa22f12b5c6@ruby-forum.com>
Message-ID: <6b6d8131da96309b721dbba2145a4d38@ruby-forum.com>

I don't use the AAF gem, but do have a require statement in an
initializer in config/initializers that pulls in acts_as_ferret.

Any chance you can supply a more detailed back trace that sheds some
light on what class is causing trouble? Do you also have trouble
booting script/console?

-c-

Dave Anderson wrote:
> I recently upgraded to Rails 2.2.2 and refactored some of my rails code
> to use the config.gem functionality. In doing so, I noticed some odd
> behavior when trying to start the DrbServer. Essentially, the DrbServer
> will not start unless I have the "require 'acts_as_ferret'" statement
> below in environment.rb. I thought the config.gem would have been
> enough, but apparently not.
>
> -- start environment.rb --
>
> Rails::Initializer.run do |config|
>
> [...]
>
> # Requires acts_as_ferret must be here, can't figure out why
> require 'acts_as_ferret'
>
> config.gem 'acts_as_ferret', :version => '0.4.3'
> config.gem 'ferret', :version => '0.11.6'
>
> [...]
>
> end
>
> -- end environment.rb --
>
> Here's the error message I receive when the require is left out.
>
> -- error --
>
> me at machina $ script/ferret_server -e production start
> undefined method `acts_as_ferret' for #
>
> -- end snip --
>
> Any ideas if this is normal?

--
Posted via http://www.ruby-forum.com/.

From toastkid.williams at gmail.com Thu Dec 4 07:21:38 2008
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 4 Dec 2008 12:21:38 +0000
Subject: [Ferret-talk] order by scoring, pagination and AR conditions don't work together
Message-ID:

Hi all

I'm having a problem with aaf - I want to do a search and order the
results by the ferret score. I also want to paginate the results (20 per
page), and to complicate matters further I have an array of
'allowed_ids' - this is effectively the pool of resources which the
search results are limited to.

So, I want to order by score, pass in some AR conditions, and paginate
the results. Currently the pagination seems to be breaking. Here's an
example search, paginated with 1000 per page so it effectively gets all
the results (there are only 67 for this particular search).
I've collected the ids of the results just for illustration purposes:

ferret_results = ActsAsFerret::find("viola",
    [TeachingObject, LearningObject],
    # (ferret) options
    { :page => 1, :per_page => 1000 },
    # find options - need to specify conditions for each searched class individually
    { :conditions => {
        :teaching_object => ["resources.id in (?)", allowed_ids],
        :learning_object => ["resources.id in (?)", allowed_ids] } }
  ).collect(&:id)
=> [5407, 5427, 5416, 5401, 5411, 5415, 5420, 5421, 5426, 5431, 5435,
    5436, 5397, 5403, 5412, 5418, 5419, 5423, 5424, 5429, 5437, 5439,
    533, 534, 5405, 5425, 5440, 5402, 5413, 5414, 5417, 5432, 5433,
    5438, 5410, 5404, 532, 5399, 5409, 531, 5398, 5400, 5408, 5422,
    5428, 5430, 5434, 5406, 5441, 518, 524, 535, 529, 525, 526, 536,
    537, 538, 530, 528, 527, 4452, 1709, 5790]

So - if I was to bring the per_page parameter down to 10 then I'd expect
to get the first 10 ids from the list above, right? But, that's not what
happens:

ferret_results = ActsAsFerret::find("viola",
    [TeachingObject, LearningObject],
    # (ferret) options
    { :page => 1, :per_page => 10 },
    # find options - need to specify conditions for each searched class individually
    { :conditions => {
        :teaching_object => ["resources.id in (?)", allowed_ids],
        :learning_object => ["resources.id in (?)", allowed_ids] } }
  ).collect(&:id)
=> [532, 531, 518, 524, 529, 525, 526, 530, 528, 527]

Can anyone tell me if I'm doing something wrong, or a way I can work
around this?

thanks
max

From lyesjob at gmail.com Thu Dec 4 08:03:46 2008
From: lyesjob at gmail.com (Lyes Amazouz)
Date: Thu, 4 Dec 2008 14:03:46 +0100
Subject: [Ferret-talk] Need some information about Ferret
In-Reply-To:
References: <60d886530811300049i26d2ac42w8a0bc6a268a6ca2f@mail.gmail.com>
Message-ID: <60d886530812040503q6ef8fa8ap8893eb539b1be731@mail.gmail.com>

Hello Jens!
Thank you for your contribution.
> Yes, I use Ferret whenever I need some kind of search for a site or
> application I'm working on. Usually these are full text searches for product
> catalogs and/or html content - not really large scale, at most around 10000
> documents. Most recent example is www.fahrrad-xxl.de.
>

Is 100 000 your maximum number of documents? We have more than
100.000.000 documents to index. 2.800.000 are already done, but the
indexing machine is getting heavily loaded! Do you think that Ferret
will be able to index all this?

>> * What is the best way to access a very huge Ferret Index? May we
>> distribute it on several machines or not?
>
> Afair there's no way to distribute an index across multiple machines built
> into Ferret. You could do the distribution yourself of course by clustering
> your data and distributing across several independent ferret indexes.
> Downside is that search result scores from different indexes aren't directly
> comparable.

Yes, it is a good idea. But how would we merge the results that come
back from the separate indexes after a request?

--
===========
| Lyes Amazouz
| USTHB, Algiers
===========

From toastkid.williams at gmail.com Thu Dec 11 06:47:54 2008
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 11 Dec 2008 12:47:54 +0100
Subject: [Ferret-talk] All Indexes being configured on every page load (seemingly)
Message-ID:

Hi - I'm doing some DB optimisations on our site, mainly by watching the
log files (in dev mode) and seeing what db access is going on. I'm
seeing a lot of massive outputs like below - this is for one of my
ferret-indexed classes, but I have 5, and they all seem to output all
this stuff on every page that loads something from one of those tables
(just loading, not even updating).

I'm just wondering -
a) is this normal?
b) Is it really reconfiguring all my ferret indexes every time I load a
page?
c) is it necessary?
d) is it harming my site's performance?
thanks, max SQL (0.001301) SHOW TABLES configured index for class Lesson: {:user_default_field=>nil, :enabled=>true, :fields=>{:property_names=>{}, :asset_count=>{:index=>:untokenized}, :asset_paths=>{}, :resource_property_names=>{}, :description_for_sort=>{:index=>:untokenized}, :user_login=>{}, :name=>{}, :created_at_for_sort=>{:index=>:untokenized}, :officialness=>{}, :asset_names=>{}, :description=>{}, :name_for_sort=>{:index=>:untokenized}, :user_name=>{}}, :store_class_name=>true, :index_dir=>"/home/max/work/e_learning_resource/trunk/index/development/lesson", :mysql_fast_batches=>true, :name=>:lesson, :single_index=>false, :index_base_dir=>"/home/max/work/e_learning_resource/trunk/index/development/lesson", :reindex_batch_size=>1000, :registered_models=>[Lesson(id: integer, name: string, description: text, user_id: integer, created_at: datetime, privacy: integer, is_official: boolean, is_readonly: boolean, comments_allowed: boolean, hours: integer, sessions: integer, updated_at: datetime)], :ferret=>{:path=>"/home/max/work/e_learning_resource/trunk/index/development/lesson", :auto_flush=>true, :or_default=>false, :key=>[:id, :class_name], :handle_parse_errors=>true, :create_if_missing=>true, :default_field=>[:property_names, :asset_paths, :resource_property_names, :name, :user_login, :officialness, :description, :asset_names, :user_name]}, :raise_drb_errors=>false, :ferret_fields=>{:property_names=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:property_names, :boost=>1.0}, :asset_paths=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:asset_paths, :boost=>1.0}, :asset_count=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:untokenized, :via=>:asset_count, :boost=>1.0}, :resource_property_names=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:resource_property_names, :boost=>1.0}, 
:description_for_sort=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:untokenized, :via=>:description_for_sort, :boost=>1.0}, :name=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:name, :boost=>1.0}, :user_login=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:user_login, :boost=>1.0}, :created_at_for_sort=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:untokenized, :via=>:created_at_for_sort, :boost=>1.0}, :officialness=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:officialness, :boost=>1.0}, :description=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:description, :boost=>1.0}, :asset_names=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:asset_names, :boost=>1.0}, :user_name=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:yes, :via=>:user_name, :boost=>1.0}, :name_for_sort=>{:highlight=>:yes, :store=>:no, :term_vector=>:with_positions_offsets, :index=>:untokenized, :via=>:name_for_sort, :boost=>1.0}}} -- Posted via http://www.ruby-forum.com/. From jk at jkraemer.net Thu Dec 11 07:31:39 2008 From: jk at jkraemer.net (Jens Kraemer) Date: Thu, 11 Dec 2008 13:31:39 +0100 Subject: [Ferret-talk] All Indexes being configured on every page load (seemingly) In-Reply-To: References: Message-ID: <9496BE7E-8399-4059-840E-E74C984DA098@jkraemer.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi! On 11.12.2008, at 12:47, Max Williams wrote: > Hi - i'm doing some DB optimisations on our site, mainly by watching > the > log files (in dev mode) and seeing what db access is going on. 
I'm > seeing a lot of massive outputs like below - this is for one of my > ferret-indexed classes, but i have 5, and they all seem to output all > this stuff on every page that loads something from one of those tables > (just loading, not even updating). > > I'm just wondering - > a) is this normal? yes, in development mode it is. You shouldn't see this in production. > b) Is it really reconfiguring all my ferret indexes every time i > load a > page? yep. I think the biggest problem is the noisy output in this case, which really makes log files hard to read. You can easily comment out the debug statement responsible for this, it's around line 94 in act_methods.rb. > c) is it necessary? It's a side effect of Rails reloading class definitions on each request in dev mode, maybe there would be a way for aaf to work around this. However I think this would lead to unexpected behaviour i.e. if you modified some aaf option in dev mode and the change would only be picked up by restarting the server. > d) is it harming my site's performance? If you're running in production mode this will happen only once at application startup, so the answer is no. 
Cheers,
Jens

- --
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
GPG public key: http://www.jkraemer.net/static/keys/jk_jkraemer.net.key.asc
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAklBCCsACgkQgpXMPm7s2942UQCfeVWnVE9Ae8WV52akrwjCKo7H
7cUAn3urNNNdxVDvO79NXuyU+vgoDUCD
=XfXl
-----END PGP SIGNATURE-----

From toastkid.williams at gmail.com Thu Dec 11 08:54:00 2008
From: toastkid.williams at gmail.com (Max Williams)
Date: Thu, 11 Dec 2008 14:54:00 +0100
Subject: [Ferret-talk] All Indexes being configured on every page load (seemingly)
In-Reply-To: <9496BE7E-8399-4059-840E-E74C984DA098@jkraemer.net>
References: <9496BE7E-8399-4059-840E-E74C984DA098@jkraemer.net>
Message-ID: <3f088a09d0d0347103c8fcaadbc0af23@ruby-forum.com>

Ah, I see, that makes sense. I forgot about the class-reloading stuff
(even though I habitually make a change to a class and reload the page
to see it, while in dev mode).

Thanks Jens!
max
--
Posted via http://www.ruby-forum.com/.

From durante.dev at mac.com Mon Dec 29 14:52:38 2008
From: durante.dev at mac.com (Dur Dev)
Date: Mon, 29 Dec 2008 20:52:38 +0100
Subject: [Ferret-talk] Ferret::Search::TypedRangeQuery
Message-ID: <82b25c28e06ced742e7f5ba2d5bd3d19@ruby-forum.com>

I need to do typed range queries on record ids and dates. When I call
Ferret::Search::TypedRangeQuery I get the error:

NameError: uninitialized constant Ferret::Search::TypedRangeQuery

Is TypedRangeQuery supported in AAF? I am using Ferret gem 0.11.6 and
AAF 0.4.3. If it's not supported, is there any other way of
accomplishing a typed range query? I tried using a RangeQuery and
setting the following in the model:

acts_as_ferret :fields => { :id_sort => { :index => :untokenized },
                            :updated_at_sort => { :index => :untokenized }, ...},
               :ferret => { :use_typed_range_query => true }

...but this didn't work either.
The search results indicate ferret is treating the ids as strings. It
seems to do the correct thing with date ranges however.
--
Posted via http://www.ruby-forum.com/.

From durante.dev at mac.com Wed Dec 31 14:21:42 2008
From: durante.dev at mac.com (Dur Dev)
Date: Wed, 31 Dec 2008 20:21:42 +0100
Subject: [Ferret-talk] Ferret::Search::TypedRangeQuery
In-Reply-To: <82b25c28e06ced742e7f5ba2d5bd3d19@ruby-forum.com>
References: <82b25c28e06ced742e7f5ba2d5bd3d19@ruby-forum.com>
Message-ID:

Dur Dev wrote:
> I need to do typed range queries on record ids and dates. When I call
> Ferret::Search::TypedRangeQuery I get the error:
>
> NameError: uninitialized constant Ferret::Search::TypedRangeQuery
>
> Is TypedRangeQuery supported in AAF? I am using Ferret gem 0.11.6 and
> AAF 0.4.3. If its not supported, is there any other way of accomplishing
> a typed range query? I tried using a RangeQuery and setting the
> following in the model:
>
> acts_as_ferret :fields => { :id_sort => { :index => :untokenized },
> :updated_at_sort => { :index => :untokenized }, ...}, :ferret => {
> :use_typed_range_query => true }
>
> ...but this didn't work either. The search results indicate ferret is
> treating the ids as strings. It seem to do the correct thing with date
> ranges however.

Looks like it's not supported in 0.11.6. It's in "current". Nice
feature, look forward to using it when it's released.
--
Posted via http://www.ruby-forum.com/.
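
Until a release with typed range queries is available, one common workaround (not something proposed in this thread) is to index ids as fixed-width, zero-padded strings, so that plain string comparison orders them the same way as numeric comparison. The sketch below only builds the padded values and a query string. The helper names `pad_id` and `id_range_query` and the width of 10 are assumptions, and the `field:[low high]` range syntax is assumed to match Ferret's query parser. It would also explain why date ranges already behave, if the dates are stored as fixed-width strings.

```ruby
# Assumed width: must be at least as wide as the largest id you expect.
ID_WIDTH = 10

# Zero-pad a non-negative integer id so that lexicographic order
# matches numeric order.
def pad_id(id)
  format("%0#{ID_WIDTH}d", id)
end

# Build an inclusive range query string over padded ids.
# The field name is whatever untokenized field holds the padded value.
def id_range_query(field, low, high)
  "#{field}:[#{pad_id(low)} #{pad_id(high)}]"
end

# Unpadded strings sort lexicographically, which is the behaviour
# reported above: %w[9 10 100].sort gives ["10", "100", "9"].
# Padded strings keep numeric order: pad_id(9) < pad_id(10) < pad_id(100).
```

The model would need to expose the padded value through the indexed field (for example an `id_sort` method returning `pad_id(id)`), and the index has to be rebuilt after the change. Negative ids are not handled by this sketch.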