From leonardochen0 at gmail.com Wed Sep 1 07:47:05 2010
From: leonardochen0 at gmail.com (Leonardo Chen)
Date: Wed, 1 Sep 2010 07:47:05 -0400
Subject: [Xapian-fu-discuss] Multiple files or a single one?
Message-ID:

Hello

For the past two weeks I've been reading about acts_as_xapian to
integrate it into my website. It works well if you want something quick,
but I now realize it's not quite flexible, since it has only one
database to handle every field of information for my models.
(Besides the fact that it has a problem updating the database on
Windows, which I use as my development machine.)

I recently learned about xapian_fu, and it looks very, very promising.
I wonder if having multiple databases for my models is a good idea and
a best practice. For example, I have a catalog of all shops registered
on the website. I plan to create a database to catalog only the store
names. Another one for the store description field and keywords.
Another one for store category, and finally one with all the fields
(name, description, keywords and category).

When I process a user search, I first run through the name database. I
hope Xapian will return close spelling matches, and suggestions of
store names. If the best match doesn't get a minimum score, I try the
next database.
Is this a good idea? Or is there a better way of doing this?

How do I access the score (weight/percent) of the result items, by the way?

Thanks
Leo

From leonardochen0 at gmail.com Wed Sep 1 08:11:34 2010
From: leonardochen0 at gmail.com (Leonardo Chen)
Date: Wed, 1 Sep 2010 08:11:34 -0400
Subject: [Xapian-fu-discuss] Removing items from the DB
Message-ID:

Hello

Is there any way to remove a document from the database?
For example, let's say a document is removed from the system. Should I
reprocess all of my documents to ensure that :id is not included in
the search results?

Would it be an easy thing to remove all references to that :id in the xapian db?
Thanks
Leo

From john at johnleach.co.uk Mon Sep 13 06:23:42 2010
From: john at johnleach.co.uk (John Leach)
Date: Mon, 13 Sep 2010 11:23:42 +0100
Subject: [Xapian-fu-discuss] Removing items from the DB
In-Reply-To:
References:
Message-ID: <1284373422.25768.31.camel@dogen>

On Wed, 2010-09-01 at 08:11 -0400, Leonardo Chen wrote:
> Hello

Hi Leo,

> Any way to remove a document from the database?
> For example, let's say a document is removed from the system. Should I
> reprocess all of my documents to ensure that :id is not included in
> the search results?
>
> Would it be an easy thing to remove all references to that :id in the xapian db?

Yeah, easy, just use the documents.delete method:

http://rdoc.info/github/johnl/xapian-fu/master/XapianFu/XapianDocumentsAccessor#delete-instance_method

    xdb.documents.delete(12345)

No need to rebuild the db!

John.

From john at johnleach.co.uk Wed Sep 15 06:45:08 2010
From: john at johnleach.co.uk (John Leach)
Date: Wed, 15 Sep 2010 11:45:08 +0100
Subject: [Xapian-fu-discuss] Multiple files or a single one?
In-Reply-To:
References:
Message-ID: <1284547508.4376.39.camel@dogen>

On Wed, 2010-09-01 at 07:47 -0400, Leonardo Chen wrote:
> Hello

Hi Leo,

>
> For the past two weeks I've been reading about acts_as_xapian to
> integrate it into my website. It works well if you want something quick,
> but I now realize it's not quite flexible, since it has only one
> database to handle every field of information for my models.
> (Besides the fact that it has a problem updating the database on
> Windows, which I use as my development machine.)
>
> I recently learned about xapian_fu, and it looks very, very promising.
> I wonder if having multiple databases for my models is a good idea and
> a best practice. For example, I have a catalog of all shops registered
> on the website. I plan to create a database to catalog only the store
> names. Another one for the store description field and keywords.
> Another one for store category, and finally one with all the fields
> (name, description, keywords and category).

You can index all those fields into one database and run separate
searches, limiting your search to each field each time. So you'd do the
following searches:

    name:burgers
    description:burgers OR keywords:burgers
    category_name:burgers
    burgers

though I think you should ideally search all fields at once and let
Xapian return the best results. If you want the name field to be seen
as more important than the other fields, then you'd set the weight for
that field to be higher, and any matches in it will get a higher score
than the others.

XapianFu doesn't currently support specifying the weight on fields when
indexing, but you should be able to do it at query time (though I can't
find the syntax for that right now). I could add support for specifying
the weight on fields quite easily, if you'd like.

> When I process a user search, I first run through the name database. I
> hope Xapian will return close spelling matches, and suggestions of
> store names. If the best match doesn't get a minimum score, I try the
> next database.
> Is this a good idea? Or is there a better way of doing this?
>
> How do I access the score (weight/percent) of the result items, by the way?

the search method returns a list of XapianDoc objects loaded with
the :weight attribute:

http://rdoc.info/github/johnl/xapian-fu/master/XapianFu/XapianDoc

you can use the Xapian match set (accessible via the
XapianFu::ResultSet) to convert it to a percent:

    result_set = db.search("burgers")
    result_set.each do |result|
      puts result_set.mset.convert_to_percent(result.weight)
    end

http://xapian.org/docs/apidoc/html/classXapian_1_1MSet.html

John.
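
For anyone who still wants the tiered fallback Leo describes, but against
a single database as suggested above, here is a minimal sketch in plain
Ruby. It is an illustration only, not XapianFu's API: Result,
tiered_queries, tiered_search, FAKE_INDEX and STUB_SEARCHER are all
hypothetical names, and the stub lambda stands in for a real call such
as db.search(query). The only things taken from the thread are the four
query strings and the fact that results carry a weight. Searching all
fields at once remains the simpler option.

```ruby
# Hypothetical result type; the thread only says results carry a weight.
Result = Struct.new(:id, :weight)

# Fielded queries from the thread, most specific first.
def tiered_queries(term)
  [
    "name:#{term}",
    "description:#{term} OR keywords:#{term}",
    "category_name:#{term}",
    term
  ]
end

# Run each query through `searcher` (anything responding to #call that
# returns results sorted by descending weight). Return the first query
# whose best hit reaches min_weight, together with its results.
# Returns [nil, []] when no tier qualifies.
def tiered_search(term, min_weight, searcher)
  tiered_queries(term).each do |query|
    results = searcher.call(query)
    best = results.first
    return [query, results] if best && best.weight >= min_weight
  end
  [nil, []]
end

# Stub data standing in for a real index (assumption, for the demo only).
FAKE_INDEX = {
  "name:burgers" => [Result.new(1, 0.4)],
  "description:burgers OR keywords:burgers" =>
    [Result.new(2, 2.5), Result.new(1, 1.0)]
}.freeze

# Stub searcher; with a real database this would wrap the search call.
STUB_SEARCHER = ->(query) { FAKE_INDEX.fetch(query, []) }

query, results = tiered_search("burgers", 1.5, STUB_SEARCHER)
puts "matched by #{query.inspect}: ids #{results.map(&:id).inspect}"
```

The minimum weight of 1.5 is arbitrary. Raw weights are not comparable
across different queries, so a threshold on the percent value from the
match set may be a better fit in practice.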