From plynchnlm at gmail.com Mon Mar 2 17:17:49 2009
From: plynchnlm at gmail.com (Paul Lynch)
Date: Mon, 2 Mar 2009 17:17:49 -0500
Subject: [Ferret-talk] Wildcard trouble
In-Reply-To: <4B3E5C2D-538C-4C17-866A-816EE6483BA4@jkraemer.net>
References: <50d6c72a0901061617l14e9248s63fea7330d26cfd0@mail.gmail.com>
 <4B3E5C2D-538C-4C17-866A-816EE6483BA4@jkraemer.net>
Message-ID: <50d6c72a0903021417p573885b7n38aa744e56945d1a@mail.gmail.com>

Jens,

I think you are right. There are 932 terms matching c* in my data table.
(The rest of my query is simple-- just one or two other terms without
wildcards). I tried setting the value of default_max_terms, but it did not
seem to have any effect. (I think I was setting it correctly, because I
tried assigning a negative number and it immediately complained.) However,
now that I know what that is doing, I'm not sure I want to increase the
value.

Anyway, thank you very much for your help in sorting this out.
--Paul

On Tue, Feb 10, 2009 at 4:15 AM, Jens Kraemer wrote:

> Hi Paul,
>
> On 07.01.2009, at 01:17, Paul Lynch wrote:
>
>> Hi-- I just ran into an odd situation. If I do a search including the
>> term:
>> c* - I get 4 hits
>> ca* - I get the same 4 documents
>> co* - I get one new document, not found by c*
>>
>> Does anyone know what might be going on, or have suggestions for
>> debugging?
>>
>
> What does your full query look like? Ferret has a built-in default limit of
> 512 for the number of terms wildcard queries (and other MultiTermQueries)
> can get expanded to. Any more terms matching your criteria will be dropped
> then, keeping the most relevant 512 terms.
> You can override this value by specifying a max_terms value when
> constructing the query via the API:
>
> query = WildcardQuery.new(:field, "c*",
>                           :max_terms => 1024)
>
> you might also try monkey patching the
> Ferret::Search::MultiTermQuery::default_max_terms method to return your
> custom limit so you don't need to use the query API to construct your queries
> (i.e.
> with aaf which doesn't reliably work with query objects due to the DRb
> stuff involved).
>
> It *might* also be a bug in Ferret - if the above doesn't help, can you
> reproduce this with a simple test case?
>
> cheers,
> Jens
>
> --
> Jens Krämer
> Finkenlust 14, 06449 Aschersleben, Germany
> VAT Id DE251962952
> http://www.jkraemer.net/ - Blog
> http://www.omdb.org/ - The new free film database
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Paul Lynch
Aquilent, Inc.
National Library of Medicine (Contractor)

From plynchnlm at gmail.com Tue Mar 10 13:28:31 2009
From: plynchnlm at gmail.com (Paul Lynch)
Date: Tue, 10 Mar 2009 18:28:31 +0100
Subject: [Ferret-talk] Are document boost and sort incompatible?
Message-ID: <5cd4a26695725982244cdb906a2f03b9@ruby-forum.com>

I am using document boosting (following Jens' suggestion here:
http://www.ruby-forum.com/topic/84358) but am also trying to sort the
returned results, so I was using the :sort option. In my testing, using
a wildcard search, it appears that using the :sort option causes Ferret
to ignore document boosts and just return the first N hits it finds by
looking alphabetically. Is that the expected behavior? If I want the N
most relevant documents, do I need to sort them myself after getting
them from Ferret?

Thanks,
--Paul
--
Posted via http://www.ruby-forum.com/.

From kraemer at webit.de Tue Mar 10 13:55:45 2009
From: kraemer at webit.de (Jens Krämer)
Date: Tue, 10 Mar 2009 18:55:45 +0100
Subject: [Ferret-talk] Are document boost and sort incompatible?
In-Reply-To: <5cd4a26695725982244cdb906a2f03b9@ruby-forum.com>
References: <5cd4a26695725982244cdb906a2f03b9@ruby-forum.com>
Message-ID:

Hi Paul,

I think your assumption is correct - Ferret by default sorts by score, and
by specifying an alternative sorting you replace that default sorting with
something else.

Cheers,
Jens

On 10.03.2009, at 18:28, Paul Lynch wrote:

> I am using document boosting (following Jens' suggestion here:
> http://www.ruby-forum.com/topic/84358) but am also trying to sort the
> returned results, so I was using the :sort option. In my testing, using
> a wildcard search, it appears that using the :sort option causes Ferret
> to ignore document boosts and just return the first N hits it finds by
> looking alphabetically. Is that the expected behavior? If I want the N
> most relevant documents, do I need to sort them myself after getting
> them from Ferret?
>
> Thanks,
> --Paul
> --
> Posted via http://www.ruby-forum.com/.

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From jonifel at gmail.com Wed Mar 18 09:24:42 2009
From: jonifel at gmail.com (Jon Felsing)
Date: Wed, 18 Mar 2009 14:24:42 +0100
Subject: [Ferret-talk] Acts_as_ferret: Slow ferret_update with associated models
Message-ID: <354f24f4131145d6814186a41f984fe7@ruby-forum.com>

Hi,

I am using some custom fields for ferret indexing to include fields from
associated models. When those child models are updated, I do
ferret_update on the parent model.
Unfortunately, that takes a very long time, depending on the number of
associated child models. That is because every child model is called to
re-index the parent, instead of just the changed one.

What can I do to speed up re-indexing? I am using Rails 2.1.2, Ferret
0.11.6 and ActsAsFerret 0.4.2.

Thanks for your advice!

Best
Jon
--
Posted via http://www.ruby-forum.com/.

From zijian.huang at etrade.com Thu Mar 19 14:12:30 2009
From: zijian.huang at etrade.com (Huang, Zijian(Victor))
Date: Thu, 19 Mar 2009 14:12:30 -0400
Subject: [Ferret-talk] Indexing the local file system
Message-ID: <0BB549C6E74E24409FB20B3B1D1B66440F24B5BA@ATL1EX11.corp.etradegrp.com>

Hi, all:
   I am new to Ferret. Can anyone please tell me what I need to do to index
some text files in a local directory?

Thanks

Victor

From hgs at dmu.ac.uk Thu Mar 19 14:42:52 2009
From: hgs at dmu.ac.uk (Hugh Sasse)
Date: Thu, 19 Mar 2009 18:42:52 +0000 (GMT)
Subject: [Ferret-talk] Indexing the local file system
In-Reply-To: <0BB549C6E74E24409FB20B3B1D1B66440F24B5BA@ATL1EX11.corp.etradegrp.com>
References: <0BB549C6E74E24409FB20B3B1D1B66440F24B5BA@ATL1EX11.corp.etradegrp.com>
Message-ID:

On Thu, 19 Mar 2009, Huang, Zijian(Victor) wrote:

> Hi, all:
> I am new to Ferret. Can anyone please tell me what I need to do to index
> some text files in a local directory?

I got started with the book (not to hand) and modifying some code,
which I then put here:
http://www.cse.dmu.ac.uk/~hgs/ruby/#ff.rb
Hopefully that will do most of what you want.
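
For the question quoted above, indexing a directory of plain-text files
might look roughly like the sketch below. It is not taken from the book or
from ff.rb. The helper name index_directory and the :file and :content
field names are made up for illustration, and the commented usage assumes
the ferret gem is installed.

```ruby
# Add every plain-text file under `dir` to `index` and return how many
# files were added.
#
# `index` can be any object that accepts hash documents via `<<`; with
# Ferret that would be a Ferret::Index::Index (see the usage comment below).
# The helper name and the :file / :content field names are illustrative
# assumptions, not part of Ferret itself.
def index_directory(index, dir, pattern = "**/*.txt")
  count = 0
  Dir.glob(File.join(dir, pattern)).sort.each do |path|
    next unless File.file?(path)   # skip directories that match the pattern
    index << { :file => path, :content => File.read(path) }
    count += 1
  end
  count
end

# Usage with Ferret (paths are placeholders):
#
#   require 'ferret'
#   index = Ferret::Index::Index.new(:path => '/tmp/text_index')
#   index_directory(index, '/path/to/text/files')
#   index.search_each('content:ferret') do |doc_id, score|
#     puts "#{index[doc_id][:file]} (score #{score})"
#   end
#   index.close
```

Files that are not already plain text (PDF, Word and so on) would need to
be converted to text before being added; that step is left out here.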
>
> Thanks
>
> Victor
>

        Hugh

From jk at jkraemer.net Thu Mar 19 15:03:20 2009
From: jk at jkraemer.net (Jens Kraemer)
Date: Thu, 19 Mar 2009 20:03:20 +0100
Subject: [Ferret-talk] Indexing the local file system
In-Reply-To: <0BB549C6E74E24409FB20B3B1D1B66440F24B5BA@ATL1EX11.corp.etradegrp.com>
References: <0BB549C6E74E24409FB20B3B1D1B66440F24B5BA@ATL1EX11.corp.etradegrp.com>
Message-ID: <4176C9D3-6CF1-4CEB-AFEB-31CAC4A87DBE@jkraemer.net>

Hi!

On 19.03.2009, at 19:12, Huang, Zijian(Victor) wrote:

> Hi, all:
> I am new to Ferret. Can anyone please tell me what I need to do to
> index some text files in a local directory?
>

Have a look at either RDig (rdig.rubyforge.org) or the FerretFinder
project (http://www.methods.co.nz/ff/) - each of these projects does
what you want (and is open source). Or, even better, get the Ferret
Book (available as PDF or in print from O'Reilly) for a complete
reference including many examples (afair there's also an example that
deals with indexing files).

Basically it boils down to

- create new ferret index
- for each file:
  - get textual content (involves conversion from pdf, word or whatever
    file format to plain text)
  - create ferret document with textual content and filename
  - add ferret document to ferret index
- close ferret index

Cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
From zijian.huang at etrade.com Thu Mar 19 17:29:00 2009
From: zijian.huang at etrade.com (Huang, Zijian(Victor))
Date: Thu, 19 Mar 2009 17:29:00 -0400
Subject: [Ferret-talk] Indexing the local file system
In-Reply-To: <4176C9D3-6CF1-4CEB-AFEB-31CAC4A87DBE@jkraemer.net>
References: <0BB549C6E74E24409FB20B3B1D1B66440F24B5BA@ATL1EX11.corp.etradegrp.com>
 <4176C9D3-6CF1-4CEB-AFEB-31CAC4A87DBE@jkraemer.net>
Message-ID: <0BB549C6E74E24409FB20B3B1D1B66440F24B8C2@ATL1EX11.corp.etradegrp.com>

Great! Thanks for the help.

Vic

-----Original Message-----
From: ferret-talk-bounces at rubyforge.org
[mailto:ferret-talk-bounces at rubyforge.org] On Behalf Of Jens Kraemer
Sent: Thursday, March 19, 2009 12:03 PM
To: ferret-talk at rubyforge.org
Subject: Re: [Ferret-talk] Indexing the local file system

Hi!

On 19.03.2009, at 19:12, Huang, Zijian(Victor) wrote:

> Hi, all:
> I am new to Ferret. Can anyone please tell me what I need to do to
> index some text files in a local directory?
>

Have a look at either RDig (rdig.rubyforge.org) or the FerretFinder
project (http://www.methods.co.nz/ff/) - each of these projects does
what you want (and is open source). Or, even better, get the Ferret
Book (available as PDF or in print from O'Reilly) for a complete
reference including many examples (afair there's also an example that
deals with indexing files).
Basically it boils down to

- create new ferret index
- for each file:
  - get textual content (involves conversion from pdf, word or whatever
    file format to plain text)
  - create ferret document with textual content and filename
  - add ferret document to ferret index
- close ferret index

Cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

From zijian.huang at etrade.com Thu Mar 19 17:32:44 2009
From: zijian.huang at etrade.com (Huang, Zijian(Victor))
Date: Thu, 19 Mar 2009 17:32:44 -0400
Subject: [Ferret-talk] Crawler for Ferret
Message-ID: <0BB549C6E74E24409FB20B3B1D1B66440F24B8CD@ATL1EX11.corp.etradegrp.com>

Hi, guys:
   Can you please recommend a good crawler for Ferret? Nutch is pretty
powerful on the Java side; do we have something similar in Ruby? It
would be great if the crawler also handled incremental index updates
easily.

Thanks

Victor

From kraemer at webit.de Thu Mar 19 18:17:52 2009
From: kraemer at webit.de (Jens Krämer)
Date: Thu, 19 Mar 2009 23:17:52 +0100
Subject: [Ferret-talk] Crawler for Ferret
In-Reply-To: <0BB549C6E74E24409FB20B3B1D1B66440F24B8CD@ATL1EX11.corp.etradegrp.com>
References: <0BB549C6E74E24409FB20B3B1D1B66440F24B8CD@ATL1EX11.corp.etradegrp.com>
Message-ID: <8CDD0D63-36C0-495B-969B-30696664BBA0@webit.de>

On 19.03.2009, at 22:32, Huang, Zijian(Victor) wrote:

> Hi, guys:
> Can you please recommend a good crawler for Ferret? Nutch is
> pretty powerful on the Java side; do we have something similar
> in Ruby? It would be great if the crawler also handled incremental
> index updates easily.
>

RDig can do http crawling, but cannot really be compared with Nutch
feature- and performance-wise, as it was designed for intranet use, say
indexing the web pages of a few hosts.

Cheers,
Jens

--
Jens Krämer
webit!
Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

From timg at catalyst.net.nz Thu Mar 19 18:38:54 2009
From: timg at catalyst.net.nz (Timothy Goddard)
Date: Fri, 20 Mar 2009 11:38:54 +1300
Subject: [Ferret-talk] Crawler for Ferret
In-Reply-To: <0BB549C6E74E24409FB20B3B1D1B66440F24B8CD@ATL1EX11.corp.etradegrp.com>
References: <0BB549C6E74E24409FB20B3B1D1B66440F24B8CD@ATL1EX11.corp.etradegrp.com>
Message-ID: <200903201138.55062.timg@catalyst.net.nz>

I wrote one called Suckr.

http://goddard.net.nz/projects/suckr/

It does the crawling, including incremental update, and provides a command
line search interface. I've had some periodic stability issues with this on
the old Debian box I've been using it on myself - please test thoroughly.

It has some documentation in the README file. Please let me know if you
have any questions.

Cheers,

Tim

On Friday 20 March 2009 Huang, Zijian(Victor) wrote:
> Hi, guys:
> Can you please recommend a good crawler for Ferret? Nutch is pretty
> powerful on the Java side; do we have something similar in Ruby? It
> would be great if the crawler also handled incremental index updates
> easily.
>
> Thanks
>
> Victor
From hgs at dmu.ac.uk Fri Mar 20 08:34:37 2009
From: hgs at dmu.ac.uk (Hugh Sasse)
Date: Fri, 20 Mar 2009 12:34:37 +0000 (GMT)
Subject: [Ferret-talk] Crawler for Ferret
In-Reply-To: <0BB549C6E74E24409FB20B3B1D1B66440F24B8CD@ATL1EX11.corp.etradegrp.com>
References: <0BB549C6E74E24409FB20B3B1D1B66440F24B8CD@ATL1EX11.corp.etradegrp.com>
Message-ID:

On Thu, 19 Mar 2009, Huang, Zijian(Victor) wrote:

> Hi, guys:
> Can you please recommend a good crawler for Ferret? Nutch is pretty
> powerful on the Java side; do we have something similar in Ruby? It
> would be great if the crawler also handled incremental index updates
> easily.

And then this shows up in my news feeds:

http://www.rubyinside.com/building-a-search-engine-in-200ish-lines-of-ruby-1655.html

I've not followed the links off it, though, so YMMV.

>
> Thanks
>
> Victor
>

        Hugh

From patcito at gmail.com Mon Mar 23 13:45:11 2009
From: patcito at gmail.com (Patrick Aljord)
Date: Mon, 23 Mar 2009 12:45:11 -0500
Subject: [Ferret-talk] porting to Ruby 1.9?
Message-ID: <6b6419750903231045k189898f4u50191ee188417aa@mail.gmail.com>

Hi,

Is there anyone working on porting ferret to ruby 1.9.1?

Thanks in advance.

Pat