From stephane.wirtel at gmail.com Sun Jan 6 14:37:25 2008
From: stephane.wirtel at gmail.com (Stephane Wirtel)
Date: Sun, 6 Jan 2008 20:37:25 +0100
Subject: [Brug-talk] Gelukkig Nieuwjaar !, Happy New Year !, Bonne Année !

Gelukkig Nieuwjaar 2008 !
Happy New Year 2008 !
Bonne Année 2008 !

Best Regards,

Stéphane Wirtel

From Johan.Andries at ae.be Tue Jan 8 03:52:39 2008
From: Johan.Andries at ae.be (Johan Andries)
Date: Tue, 8 Jan 2008 09:52:39 +0100
Subject: [Brug-talk] SAI presentation on Ruby

Hi,

on Thursday 28/2 I'll be doing a presentation on Ruby (and JRuby/IronRuby)
for a more enterprisey audience:
http://www.sai.be/nl/eventdetail.aspx?ev_id=153 (in Dutch).

The main message will be that Ruby (with blocks, meta-programming,
DSL-like constructs, etc.) might be a better option than Java when
coding certain kinds of business logic. Much like the polyglot
programming idea in this blog post. Of course I will also mention
ActiveHibernate in this context.

Cheers,
Johan.

From alain.ravet at gmail.com Tue Jan 8 04:30:43 2008
From: alain.ravet at gmail.com (Alain Ravet)
Date: Tue, 8 Jan 2008 10:30:43 +0100
Subject: [Brug-talk] SAI presentation on Ruby

Johan,

Conference access seems limited to SAI members:

# Evening conferences are open only to SAI personal members and SAI
# corporate members, and are organised free of charge for them.

Alain

> on Thursday 28/2 I'll be doing a presentation on Ruby (and JRuby/IronRuby)
> for a more enterprisey audience:
> http://www.sai.be/nl/eventdetail.aspx?ev_id=153 (in Dutch).

From marco.pas at logicacmg.com Tue Jan 8 04:44:20 2008
From: marco.pas at logicacmg.com (Pas, Marco)
Date: Tue, 8 Jan 2008 10:44:20 +0100
Subject: [Brug-talk] SAI presentation on Ruby
Message-ID: <994309130D377342856DF83773DCD498014E7EA6@NL-EX003.groupinfra.com>

I would like to attend, but indeed it seems that it is limited to SAI
members.

Gr

-----Original Message-----
From: brug-talk-bounces at rubyforge.org
[mailto:brug-talk-bounces at rubyforge.org] On Behalf Of Alain Ravet
Sent: Tuesday 8 January 2008 10:31
To: brug-talk at rubyforge.org
Subject: Re: [Brug-talk] SAI presentation on Ruby

Johan,

Conference access seems limited to SAI members:

# Evening conferences are open only to SAI personal members and SAI
# corporate members, and are organised free of charge for them.

Alain

> on Thursday 28/2 I'll be doing a presentation on Ruby (and JRuby/IronRuby)
> for a more enterprisey audience:
> http://www.sai.be/nl/eventdetail.aspx?ev_id=153 (in Dutch).

From peter at vandenabeele.com Sun Jan 20 19:03:13 2008
From: peter at vandenabeele.com (Peter Vandenabeele)
Date: Mon, 21 Jan 2008 01:03:13 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr

We are evaluating different solutions for search in Rails. The
"logical" choice proposed is Ferret (a Lucene clone for Rails).
But now I bumped into this recent thread (started Jan 4, 2008):

http://www.ruby-forum.com/topic/137629#616449

where a number of people mention serious stability problems with
Ferret. The alternative that seems to be proposed (with fewer features
but far more stable and performant) is "sphinx". Judging from this
recent thread, Ferret seems too risky for the first version. I might
even just use a plain simple SQL "like" or "rlike" or a "fulltext
match" to start with. Actually, for Postgresql, tsearch2 gets a thumbs
up (but we settled for Mysql for now ...).

Any hints from local experiences? Thanks in advance ...

Peter

From dl at userneed.com Mon Jan 21 03:27:47 2008
From: dl at userneed.com (Denis Lamotte)
Date: Mon, 21 Jan 2008 09:27:47 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
Message-ID: <47945783.6080402@userneed.com>

Peter Vandenabeele wrote:
> We are evaluating different solutions for search in Rails. The
> "logical" choice proposed is Ferret (a Lucene clone for Rails). But
> now I bumped into this recent thread (started Jan 4, 2008):
>
> http://www.ruby-forum.com/topic/137629#616449
>
> where a number of people mention serious stability problems with
> Ferret. The alternative (with less features but far more stable and
> performant) that seems to be proposed is "sphinx". From this recent
> thread, Ferret seems too risky for the first version. I might even
> just use a plain simple SQL "like" or "rlike" or a "fulltext match" to
> start with. Actually, for Postgresql tsearch2 gets thumbs up (but we
> settled for Mysql for now ...).
>
> Any hints from local experiences? Thanks in advance ...
>

We faced the exact same question, and my brother asked Ilya
(http://www.igvita.com/) which search engine he used for aiderss.com.
This is his answer regarding sphinx vs ferret:

>Francois, yep.. I've played with sphinx before. It's blazing fast, has
>federated search and full of other very nice features.
>The only, and unfortunately a big one, problem is that it doesn't
>support incremental index updates. Updates are just as slow as
>re-indexing the entire dataset.. This doesn't bode well for dynamic
>datasets such as the one we have at AideRSS. Having said that, if you
>have a static dataset, sphinx is arguably the way to go!
>Ilya

I plan to use sphinx in our latest application, and I plan to recreate
the indexes several times a day. I will start testing the dietetic
application we made for doxys.com this week, and we expect 20,000 users
after 15 February, so I'll have more to say after that.

I'm sure you have googled it, but these links were helpful to me:

http://www.datanoise.com/articles/2007/3/23/acts_as_sphinx-plugin
http://www.slashdotdash.net/articles/2007/08/06/rails-searching-with-sphinx
http://kpumuk.info/ror-plugins/using-sphinx-search-engine-in-ruby-on-rails/

best regards
Denis

--
Lamotte Denis
User Need sprl
phone: +32(0)2 3465580
www.userneed.com

From peter at 10-forward.be Mon Jan 21 04:47:27 2008
From: peter at 10-forward.be (Peter De Berdt (10-forward))
Date: Mon, 21 Jan 2008 10:47:27 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
Message-ID: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be>

On 21 Jan 2008, at 01:03, Peter Vandenabeele wrote:

> We are evaluating different solutions for search in Rails. The
> "logical" choice proposed is Ferret (a Lucene clone for Rails).
> But now I bumped into this recent thread (started Jan 4, 2008):
>
> http://www.ruby-forum.com/topic/137629#616449
>
> where a number of people mention serious stability problems with
> Ferret. The alternative (with less features but far more stable and
> performant) that seems to be proposed is "sphinx". From this recent
> thread, Ferret seems too risky for the first version. I might even
> just use a plain simple SQL "like" or "rlike" or a "fulltext match" to
> start with. Actually, for Postgresql tsearch2 gets thumbs up (but we
> settled for Mysql for now ...).
>
> Any hints from local experiences? Thanks in advance ...

We've been using ferret in development and now in production, and I
must say our experiences are... bad.

In development mode, our indexes got corrupted quite often, giving
either bad results or just bombing the application. Sometimes
reindexing the models with a rake task solved the problems, but there
were times when manually deleting the index folder was necessary to get
it going again.

Then comes production mode. Because of concurrency issues (several
mongrels accessing/updating the index at the same time), you have to
rely on backgroundrb (which is included in the acts_as_ferret plugin).
We're using quite complex indices, with quite a few related fields
indexed with the main record, as well as multi-model searches. The
problems we face are the following:

- At irregular times, errors like this pop up:
  undefined method `to_doc' for #
  Googling around has revealed there's a hacky patch, and the plugin
  developers don't have a clue as to where the problem lies.
- The backgroundrb server just halts (i.e. the process is killed).
- The index gets corrupted anyway, for no apparent reason.

Now, the first thing you always suspect is human error, i.e. we made a
mistake somewhere ourselves and we need to fix it. Having gone through
every single line of code, I can honestly say I'm quite sure this is
not the case.
This brings it down to three possible bad players: Ferret itself,
acts_as_ferret or backgroundrb. I guess I don't have to tell you I'm
not particularly happy with this.

Over the last couple of days, I've been looking into other possible
solutions, such as acts_as_solr, acts_as_searchable (using
Hyperestraier), and the MySQL MyISAM extra table thingie of which the
name escapes me for the moment. I'm currently leaning towards solr, but
Sphinx looks mighty tempting. I might run a few tests with it to see
how well it copes. Anyone have any real-world experience?

Best regards.

Peter De Berdt
______________________
10-forward
Zwarteweg 28
B-8433 Middelkerke
Mobile : (0473) 38 35 86
info at 10-forward.be
http://www.10-forward.be
______________________

From peter at vandenabeele.com Mon Jan 21 05:19:42 2008
From: peter at vandenabeele.com (Peter Vandenabeele)
Date: Mon, 21 Jan 2008 11:19:42 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
In-Reply-To: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be>
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be>

On Jan 21, 2008 10:47 AM, Peter De Berdt (10-forward) wrote:
...
> Over the last couple of days, I've been looking into other possible
> solutions, such as acts_as_solr, acts_as_searchable (using Hyperestraier),
> and the MySQL MyISAM extra table thingie of which the name escapes me for
> the moment. I'm currently leaning towards solr, but Sphinx looks mighty
> tempting. I might run a few tests with it to see how well it copes. Anyone
> have any realworld experience?

Well ... it looks like we have a topic for a BoF (Beards of a Feather)
at the Ruby and Rails room at fosdem ... Peter also just updated the
program :-)

http://wiki.rubyist.be/wiki/show/FosdemDevroom2008AceptedPapers

It seems a few people are fighting this topic.
We could easily spend a few hours hacking away at this particular
problem ("advanced search in Rails").

I am currently looking into the postgresql + tsearch2 path. It also
seems promising (if you can do it inside the database, shouldn't that
be more performant?), and we can ask the neighbours in the postgresql
DevRoom for instant advice ...

http://fosdem.org/2008/schedule/devroom/bsdpostgresql

I am not convinced yet of the "mysql thingie" (fulltext search in the
MyISAM tables), since I see little possibility for tweaking it, and the
discussions on

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

don't add many positive vibes to my gut feeling about it (we are
running mysql 5.0 on debian stable=etch; maybe things improve in mysql
5.1 or 6?).

I currently see 2 paths:

* "in database" full text search with tsearch2 in postgresql
* Sphinx

Peter

From alain.ravet at gmail.com Mon Jan 21 06:14:26 2008
From: alain.ravet at gmail.com (Alain Ravet)
Date: Mon, 21 Jan 2008 12:14:26 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be>

> Well ... it looks like we have a topic for a BoF (Beards of a Feather) at the
> Ruby and Rails room at fosdem ...

At this stage, what could we say other than:
- "Ferret is a great tool, but in the past some people have had some
  unacceptable stability problems in some conditions."

Said that way, only a russian-roulette fan - if there are any left -
would seem justified in using it.

Some facts:
1) Ferret is fast, powerful and, with aaf, very simple to use and tune.
2) Ferret has some unique (*1) features. If you depend on them, ...
3) some "big" applications use it with success, but
4) some renowned Rails people have had problems - in the past - with
   Ferret stability, and
5) Ferret+aaf is customizable (so you can mess it up).

(*1) you could use a Java+Lucene server, but it's much more complex to
install, use and maintain.
This conversation should continue in the Ferret forum.

I've used Ferret with success, and in one case it would be very hard to
replace, but all those trouble reports are scary. I would like the
people whose problems were serious enough to make them turn away from
Ferret to tell their stories in some detail. For example:
- did they use a recent version of Ferret/AAF?
- did they use it through a (Drb) server?
- did they use it out-of-the-box, or did they customize it?
- did they use the basic features, or some of the advanced features
  (multi-indexes, etc..)
- etc..

Alain Ravet

From peter at 10-forward.be Mon Jan 21 06:34:54 2008
From: peter at 10-forward.be (Peter De Berdt (10-forward))
Date: Mon, 21 Jan 2008 12:34:54 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be>
Message-ID: <4BDA7889-FB8A-44F3-B624-CFFAA1B9E4AA@10-forward.be>

On 21 Jan 2008, at 12:14, Alain Ravet wrote:

> - did they use a recent version of Ferret/AAF?

The latest one

> - did they use it through a (Drb) server?

Yes

> - did they use it out-of-the box, or did they customize it?

Out-of-the-box, except for the hack to circumvent the to_doc error

> - did they use the basics features, or some of the advanced features
> (multi-indexes, etc..)

Depending on what you call advanced: yes, we used multi-model indices
for one. We also had to jump through some hoops to include related
records in the main record index, such as a mixin module to include
polymorphically associated addresses, phone numbers, ... (these are
linked to companies as well as people, ...).

The biggest problem with aaf and ferret in general is that the
corruption cannot be reproduced easily. If I could just say: "do this,
then that, and when you query it like so, you'll get a corrupt index",
I'd be happy to post it on the ferret forum and get it sorted out, or
even do it myself and contribute a patch, but that's simply not the
case.
We had used ferret before it went DRB backed in production and faced
the obvious corruption problems. Still, the Rails list had a discussion
where some people said (iirc, the people behind the aaf plugin) all
those problems had vanished now that they had solved the concurrency
problem. Clearly, that's not the case. It has just made corruption less
frequent for us.

I'm happy it works for you Alain, and I know you're working with huge
datasets, but for us, where records are updated, deleted, created quite
frequently, it's been hell. Still, I have to find a solution that works
for us and is 100% reliable, and sadly just supporting postgresql is
out of the question (we could have used tsearch otherwise), because we
need to provide our customers with a choice of database (and most of
them opt for mysql). If I find a solution that gives me total peace of
mind, I'll shout it out loud on every occasion I get, but currently, I
can't.

Best regards.

Peter De Berdt

From peter at 10-forward.be Mon Jan 21 08:47:07 2008
From: peter at 10-forward.be (Peter De Berdt (10-forward))
Date: Mon, 21 Jan 2008 14:47:07 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
In-Reply-To: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be>
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be>
Message-ID: <17DF0EE9-3760-478E-AC85-DA5A814CBB8C@10-forward.be>

On 21 Jan 2008, at 10:47, Peter De Berdt (10-forward) wrote:

> Over the last couple of days, I've been looking into other possible
> solutions, such as acts_as_solr, acts_as_searchable (using
> Hyperestraier), and the MySQL MyISAM extra table thingie of which
> the name escapes me for the moment.
> I'm currently leaning towards solr, but Sphinx looks mighty tempting.
> I might run a few tests with it to see how well it copes

Sadly, sphinx is out of the question: apparently it doesn't support
wildcard searches the way our customers are used to. Searches for
"pet*" don't return "Peter" records, for example. I may have missed
something, but this is a feature we can't live without.

Best regards.

Peter De Berdt

From peter at vandenabeele.com Wed Jan 23 17:19:03 2008
From: peter at vandenabeele.com (Peter Vandenabeele)
Date: Wed, 23 Jan 2008 23:19:03 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
In-Reply-To: <17DF0EE9-3760-478E-AC85-DA5A814CBB8C@10-forward.be>
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be> <17DF0EE9-3760-478E-AC85-DA5A814CBB8C@10-forward.be>

2008/1/21 Peter De Berdt (10-forward) :
> Sadly, sphinx is out of the question, it doesn't support wildcard searches
> as our customers are used to apparently, searches for "pet*" don't return
> "Peter" records for example. I may have missed something, but this is a
> feature we can't live without.

I am now working my way through the Sphinx 0.9.8 config file to put
more details in it than the default test1 / documents example, and
bumped into this. From the last line (about backwards compatibility), I
understand that this may be a new feature in the 0.9.8 version.

For Rails integration, it is clear that ultrasphinx and Thinking Sphinx
are competing for the latest and greatest version (they are very
actively maintained).

As a side note, I looked into tsearch2 in postgresql 8.2 ... seems
quite simple in itself, but no up-to-date Rails integration yet.
I see tsearch2 now as my fall-back, since it seems utterly simple to
set up: just code some custom find function (there is no external
indexer etc.; you just need to write a trigger in the postgresql
database at update/create to also fill in the ts_vector column).

Back to the * function in sphinx:

# enable_star
#
# this feature enables "star-syntax" in keywords when searching
# through indexes which were created with prefix or infix indexing
# enabled.
#
# enable_star only affects searching; so it can be changed
# without reindexing (one would need to restart searchd, though).
#
# possible values are 0 and 1.
#
# the default value is 0, which means to disable star-syntax
# and treat all keywords as prefixes or infixes respectively,
# depending on indexing-time min_prefix_len/min_infix_len settings.
#
# the value of 1 means that
# 1) star can be used at the start and/or the end of the keyword;
# 2) star will match zero or more characters.
#
# for example, assume that the index was built with infixes and
# that enable_star is 1. searching should work as follows:
#
# 1) "abcdef" query will match only those documents which contain
# the exact "abcdef" word in them;
#
# 2) "abc*" query will match those documents which contain
# any words starting with "abc" (including the documents which
# contain the exact "abc" word only);
#
# 3) "*cde*" query will match those documents which contain
# any words which have "cde" characters in any part of the word
# (including the documents which contain the exact "cde" word only).
#
# 4) "*def" query will match those documents which contain
# any words ending with "def" (including the documents which
# contain the exact "def" word only).
#
# optional, default value is 0 (to keep compatibility with 0.9.7).
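
To make the four documented cases concrete, here is a minimal sketch in plain Ruby that emulates the enable_star = 1 matching rules on an ordinary string. It is not Sphinx code and does not call any Sphinx API; the helper names star_pattern_to_regexp and star_match? are invented for illustration, and "word" is assumed to mean a run of \w characters.

```ruby
# Hypothetical helpers, not part of Sphinx: they only emulate the
# matching rules described in the enable_star comment block.

# Turn a keyword such as "abc*", "*cde*" or "*def" into a Regexp.
def star_pattern_to_regexp(keyword)
  leading  = keyword.start_with?("*")
  trailing = keyword.end_with?("*")
  core     = Regexp.escape(keyword.delete("*"))

  # A star matches zero or more word characters; on a side without a
  # star the keyword has to line up with a word boundary.
  prefix = leading  ? '\w*' : ''
  suffix = trailing ? '\w*' : ''
  Regexp.new("\\b#{prefix}#{core}#{suffix}\\b", Regexp::IGNORECASE)
end

# True when at least one word in the text satisfies the keyword.
def star_match?(keyword, text)
  !!(text =~ star_pattern_to_regexp(keyword))
end

puts star_match?("abcdef", "an abcdef word")   # exact word
puts star_match?("abc*",   "an abcdef word")   # prefix
puts star_match?("*cde*",  "an abcdef word")   # infix
puts star_match?("*def",   "an abcdef word")   # suffix
puts star_match?("pet*",   "Peter De Berdt")   # the "pet*" case from the thread
```

The sketch ignores the indexing-time limits (min_prefix_len / min_infix_len) that the comment block mentions, so a real index may refuse very short keywords that this emulation accepts.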

HTH,

Peter

From peter at vandenabeele.com Wed Jan 23 19:28:25 2008
From: peter at vandenabeele.com (Peter Vandenabeele)
Date: Thu, 24 Jan 2008 01:28:25 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be> <17DF0EE9-3760-478E-AC85-DA5A814CBB8C@10-forward.be>

On Jan 23, 2008 11:19 PM, Peter Vandenabeele wrote:
> 2008/1/21 Peter De Berdt (10-forward) :
> > Sadly, sphinx is out of the question, it doesn't support wildcard searches
> > as our customers are used to apparently, searches for "pet*" don't return
> > "Peter" records for example. I may have missed something, but this is a
> > feature we can't live without.

Test set of 6500 records with 3 fields of approx. 800 bytes/record
(4.8 MBytes) (using a few bash and linux manuals as source text).

Indexing without prefix takes 0.4 seconds.
Indexing with a prefix of e.g. 5 characters takes 1.0 seconds
(the prefix is used to index "partial" words efficiently).

No stemming is used (so only "exact" matches will work). I am still
searching for a stemmer for the Dutch language.

Search results with and without stars.

sphinx-0.9.8/test$ /usr/local/bin/search "scripting" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'el_index1': query 'scripting ': returned 82 matches of 82 total in 0.000 sec

displaying matches:
1. document=2, weight=2
        id=2

sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scriptin" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'el_index1': query 'scriptin ': returned 0 matches of 0 total in 0.000 sec

words:
1. 'scriptin': 0 documents, 0 hits

==> zero matches since the last letter is removed

peterv at debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scriptin*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'el_index1': query 'scriptin* ': returned 82 matches of 82 total in 0.000 sec

displaying matches:
1. document=2, weight=2
        id=2

==> the exact same matches are found back with the star

peterv at debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scripti*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'el_index1': query 'scripti* ': returned 82 matches of 82 total in 0.000 sec

displaying matches:
1. document=2, weight=2
        id=2

peterv at debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "script*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'el_index1': query 'script* ': returned 1000 matches of 1123 total in 0.000 sec

displaying matches:
1. document=49, weight=3
        id=49

peterv at debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scri*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'el_index1': query 'scri* ': returned 1000 matches of 1123 total in 0.000 sec

displaying matches:
1. document=49, weight=3
        id=49

peterv at debian-new:~/data/biz/allejobsinleuven/development/sphinx/sphinx-0.9.8/test$ /usr/local/bin/search "scr*" | head -9
Sphinx 0.9.8-dev (r1065)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'el_index1': query 'scr* ': returned 0 matches of 0 total in 0.000 sec

words:
1. 'scr*': 0 documents, 0 hits

==> if the amount of letters gets too small, no matches.

HTH,

Peter

From peter at 10-forward.be Thu Jan 24 03:48:51 2008
From: peter at 10-forward.be (Peter De Berdt (10-forward))
Date: Thu, 24 Jan 2008 09:48:51 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be> <17DF0EE9-3760-478E-AC85-DA5A814CBB8C@10-forward.be>
Message-ID: <1673695F-35CD-4911-95E6-82267025513C@10-forward.be>

On 23 Jan 2008, at 23:19, Peter Vandenabeele wrote:

> I am now working my way through the Sphinx 0.9.8 config file to put
> more details
> in it than the default test1 / documents example and bumped into
> this. From the
> last line (about backwards compatibility), I may understand that
> this is a new
> feature in the 0.9.8 version.

You're right, I saw the allow_star config parameter and had set it, but
with no result. I must have made a mistake somewhere.

Which brings me to a new problem: indexing virtual attributes, both on
the main record and related records.

It does happen quite often for us that certain fields are stored in
English in the database (by internal convention), but should be stored
as the combination of all available languages in the application in the
index. E.g. the database will hold a boolean "published" which will be
set to either true or false, but the index holds "gepubliceerd
published publié". I know I could use a before_save filter to save the
virtual attribute to the database and then index it that way, but it's
something I'd really rather avoid. This is just a basic example, but
there are cases that would be more complex.

Have I missed something here or is it effectively impossible to use one
of the sphinx plugins to include virtual attributes in the index?

Best regards.
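
As an illustration of the "published" example, the sketch below shows what such a virtual attribute could look like in plain Ruby: the boolean stays in the record, and the multilingual text is computed on demand. Article, PUBLISHED_TERMS and search_text are hypothetical names, a Struct stands in for the real model, and nothing here answers whether either Sphinx plugin can consume such a method, which remains the open question of the message.

```ruby
# Hypothetical translation table; the three words are the ones quoted
# in the message (Dutch, English, French).
PUBLISHED_TERMS = {
  true  => %w[gepubliceerd published publié],
  false => []
}.freeze

# Plain Ruby stand-in for a model. In a Rails application this would
# be an ActiveRecord class with a boolean "published" column.
Article = Struct.new(:title, :published) do
  # Virtual attribute: computed when asked for, never stored.
  def search_text
    ([title] + PUBLISHED_TERMS.fetch(published, [])).join(" ")
  end
end

article = Article.new("Jaarverslag", true)
puts article.search_text
```

An indexer that works from Ruby objects can call search_text directly; an indexer that reads straight from SQL cannot see it, which is why the before_save workaround comes up at all.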

Peter De Berdt

From peter at vandenabeele.com Thu Jan 24 04:22:15 2008
From: peter at vandenabeele.com (Peter Vandenabeele)
Date: Thu, 24 Jan 2008 10:22:15 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
In-Reply-To: <1673695F-35CD-4911-95E6-82267025513C@10-forward.be>
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be> <17DF0EE9-3760-478E-AC85-DA5A814CBB8C@10-forward.be> <1673695F-35CD-4911-95E6-82267025513C@10-forward.be>

On Jan 24, 2008 9:48 AM, Peter De Berdt (10-forward) wrote:

> Which brings me to a new problem: indexing virtual attributes, both on the
> main record and related records.

I will face the same problem once we get to adding tags (both synonyms
and multiple-language aspects).

> It does happen quite often for us that certain fields are stored in English
> in the database (by internal convention), but should be stored as the
> combination of all available languages in the application in the index. E.g.
> the database will hold a boolean "published" which will be set to either
> true or false, but the index holds "gepubliceerd published publié". I know I
> could use a before_save filter to save the virtual attribute to the database
> and then index it that way, but it's something I'd really rather avoid. This
> is just a basic example, but there are cases that would be more complex.
>
> Have I missed something here or is it effectively impossible to use one of
> the sphinx plugins to include virtual attributes in the index?

Could the "synonyms" feature at the level of the search engine be
useful?
For sphinx, I did not see it in the main documentation, but this thread
eventually shows the solution in sphinx:

http://www.sphinxsearch.com/forum/view.html?id=1165

For tsearch2, the synonyms concept is clearly marked in the
documentation under the section "Dictionaries":

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html

I have not tried that, but for pre-defined tags (a limited, controlled
set), I will certainly look into it (e.g. bedrijfswagen = company_car =
firmawagen ...).

BTW, does anyone have stemmers, stopword files, synonyms in Dutch?

HTH,

Peter

From peter at vandenabeele.com Mon Jan 28 16:16:50 2008
From: peter at vandenabeele.com (Peter Vandenabeele)
Date: Mon, 28 Jan 2008 22:16:50 +0100
Subject: [Brug-talk] Ferret - sphinx - Solr
References: <180FDA15-CD12-4699-A235-14C2AA534074@10-forward.be> <17DF0EE9-3760-478E-AC85-DA5A814CBB8C@10-forward.be>

On Jan 24, 2008 1:28 AM, Peter Vandenabeele wrote:
> On Jan 23, 2008 11:19 PM, Peter Vandenabeele wrote:
> > 2008/1/21 Peter De Berdt (10-forward) :
> > > Sadly, sphinx is out of the question, it doesn't support wildcard searches
> > > as our customers are used to apparently, searches for "pet*" don't return
> > > "Peter" records for example. I may have missed something, but this is a
> > > feature we can't live without.
>
> Test set of 6500 records with 3 fields of approx. 800 bytes/records (4.8 MBytes)
> (using a few bash and linux manuals as source text).

I have some more test data on multi-table indexing and searching in a
somewhat larger set (34,000 records) with a belongs_to relationship to
the first set (6,000 records). It might be useful even if the data is
not very "scientific"; my machine was also swapping to disk during the
first half of the tests, which may explain some of the 20 - 30 ms
search times that I sometimes saw.

http://www.vandenabeele.com/Ultrasphinx-performance

HTH,

Peter
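
The synonym idea raised earlier in this thread (bedrijfswagen = company_car = firmawagen) can also be sketched at the application level, as a fallback if the engine-level feature turns out not to fit. The code below is plain Ruby and is not the Sphinx or tsearch2 synonym mechanism; SYNONYM_GROUPS, SYNONYMS and expand_query are hypothetical names, and the only synonym group is the one given in the thread.

```ruby
# Hypothetical controlled vocabulary: every group lists terms that
# should be treated as the same tag.
SYNONYM_GROUPS = [
  %w[bedrijfswagen company_car firmawagen]
].freeze

# Lookup table from each term to its whole group.
SYNONYMS = SYNONYM_GROUPS.each_with_object({}) do |group, map|
  group.each { |term| map[term] = group }
end.freeze

# Expand every word of a query to all of its known synonyms; words
# outside the vocabulary pass through unchanged.
def expand_query(query)
  query.downcase.split.flat_map { |word| SYNONYMS.fetch(word, [word]) }.uniq
end

p expand_query("Firmawagen Leuven")
```

The expanded word list would then be handed to whatever search back end is in use, for example joined with that engine's OR operator. This keeps the vocabulary in one Ruby constant, at the cost of longer queries than an engine-side dictionary would need.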