[Ferret-talk] Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32

Neville Burnell Neville.Burnell at bmsoft.com.au
Thu Apr 12 20:32:24 EDT 2007

> I haven't really looked at the performance in Windows. A few questions
> here might allow me to fix this problem. Are you using the Index class
> or the IndexWriter class? What parameters are you passing to the
> indexer? I'll see what I can do but I can't promise anything.

I'm using IndexWriter.add_document(doc) 

For the purposes of the timing comparison, I'm using an empty directory
and passing :create => true plus a :field_infos hash that specifies which
fields are indexed but not stored, or vice versa.

> it shouldn't be slower for bulk updates. 

I hope I haven't misused "bulk" 

> Actually, looking at your times, it seems like you may not
> have the optimal settings for indexing as even 297 seconds seems
> like a long time to index 35,000 documents although it depends on the
> documents and where they are coming from. If you give me a little more
> information I may be able to help you speed this up.

Thanks Dave. I'm generating the index for rows from a SQL database and
in general I'm ok with the 297 secs for 35,000 docs, but a 3x hit does
hurt somewhat, particularly for larger SQL databases. 

The logic goes something like this:

Create new ferret index
Connect to SQL dbms
For t in table[1..n] do
  Prepare sql
  For row in resultset do
    IndexWriter.add_document(row)

Each row retrieved from the SQL dbms is a hash of up to 30 fields, and
some fields are longish text [3000 chars].
For a baseline, if I comment out the IndexWriter.add_document(row) call,
the SQL part of the process takes only around 12 secs, so I think most of
the time is spent in add_document.
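
To make the baseline comparison concrete, here is a rough Ruby sketch of how the two timings could be taken. It is not the actual script: StubWriter, each_row, timed_run and the table names are hypothetical stand-ins so the sketch runs without Ferret or a database. In the real run the writer would be the Ferret IndexWriter and the rows would come from the SQL result sets.

```ruby
require 'benchmark'

# Hypothetical stand-in for the Ferret IndexWriter so the sketch runs
# without the ferret gem; only add_document and close are mimicked.
class StubWriter
  attr_reader :doc_count

  def initialize
    @doc_count = 0
  end

  def add_document(doc)
    @doc_count += 1
  end

  def close; end
end

# Hypothetical stand-in for the SQL result sets: yields each row as a
# hash, with one longish text field of 3000 chars.
def each_row(tables, rows_per_table)
  tables.each do |table|
    rows_per_table.times do |i|
      yield({ :table => table, :id => i, :body => 'x' * 3000 })
    end
  end
end

# Runs the fetch loop once, indexing each row only when index_rows is
# true, and returns [elapsed_seconds, rows_seen].
def timed_run(writer, tables, rows_per_table, index_rows)
  rows = 0
  elapsed = Benchmark.realtime do
    each_row(tables, rows_per_table) do |row|
      writer.add_document(row) if index_rows
      rows += 1
    end
  end
  [elapsed, rows]
end

tables = %w[customers orders] # made-up table names
writer = StubWriter.new

fetch_only, fetched = timed_run(writer, tables, 100, false)
with_index, indexed = timed_run(writer, tables, 100, true)
writer.close

# The time attributable to add_document is the difference between the runs.
indexing_cost = with_index - fetch_only
puts format('fetch only: %.4fs, with indexing: %.4fs, add_document: %.4fs',
            fetch_only, with_index, indexing_cost)
```

With the stub writer the difference is just noise. The point is only the shape of the measurement: the same loop is run with and without the add_document call, and the two times are subtracted.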

Thanks for your help,

