[Ferret-talk] Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32
Neville.Burnell at bmsoft.com.au
Thu Apr 12 20:32:24 EDT 2007
> I haven't really looked at the performance in Windows. A few questions
> here might allow me to fix this problem. Are you using the Index class
> or the IndexWriter class? What parameters are you passing to the
> indexer? I'll see what I can do but I can't promise anything.
I'm using IndexWriter.add_document(doc)
For the purposes of the timing comparison, I'm using an empty directory,
and passing :create => true and a :field_infos hash which specifies
which fields are indexed but not stored, or vice versa.
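A minimal sketch of that kind of setup, for reference. The field names and their options are hypothetical (they are not my real schema), and build_writer is an illustrative helper rather than anything Ferret provides. It assumes the ferret gem is installed and only loads it when the helper is called.

```ruby
# Hypothetical field layout; names and options are illustrative only.
FIELD_SPECS = {
  :id    => { :store => :yes, :index => :untokenized }, # stored, indexed as a single term
  :title => { :store => :yes, :index => :yes },         # stored and indexed
  :body  => { :store => :no,  :index => :yes },         # indexed but not stored
  :raw   => { :store => :yes, :index => :no }           # stored but not indexed
}.freeze

# Illustrative helper: opens a writer on a fresh index at `path`.
# The ferret gem is required lazily, so this file loads without it.
def build_writer(path, specs = FIELD_SPECS)
  require 'ferret'
  field_infos = Ferret::Index::FieldInfos.new(:term_vector => :no)
  specs.each { |name, opts| field_infos.add_field(name, opts) }
  Ferret::Index::IndexWriter.new(:path => path,
                                 :create => true,
                                 :field_infos => field_infos)
end
```

Each document then goes in with writer.add_document(row), and writer.close is called once at the end.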
> it shouldn't be slower for bulk updates.
I hope I haven't misused "bulk".
> Actually, looking at your times, it seems like you may not
> have the optimal settings for indexing as even 297 seconds seems
> like a long time to index 35,000 documents although it depends on the
> documents and where they are coming from. If you give me a little more
> information I may be able to help you speed this up.
Thanks Dave. I'm generating the index for rows from a SQL database and
in general I'm ok with the 297 secs for 35,000 docs, but a 3x hit does
hurt somewhat, particularly for larger SQL databases.
The logic goes something like this:

  Create new ferret index
  Connect to SQL dbms
  For t in table[1..n] do
    For row in resultset do
      IndexWriter.add_document(row)
Each row retrieved from the SQL dbms is a hash of up to 30 fields, and
some fields are longish text [3000 chars].
For a baseline, if I comment out the IndexWriter.add_document(row) then
the SQL part of the process only takes around 12 secs, so most of the
work is done by add_document I think.
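The baseline comparison can be sketched roughly as follows. This is only an illustration: NullWriter, CountingWriter, index_all, time_indexing and the sample rows are all hypothetical stand-ins so that it runs without Ferret or a database. In the real run the writer would be the Ferret IndexWriter and the rows would come from the SQL result sets.

```ruby
# Stand-in writer that discards documents; equivalent to commenting out
# the add_document call to measure the row-retrieval cost on its own.
class NullWriter
  def add_document(_doc); end
end

# Stand-in writer that only counts documents (a real run would use Ferret).
class CountingWriter
  attr_reader :count

  def initialize
    @count = 0
  end

  def add_document(_doc)
    @count += 1
  end
end

# tables: hash of table name => enumerable of row hashes.
def index_all(tables, writer)
  tables.each_value do |rows|
    rows.each { |row| writer.add_document(row) }
  end
end

# Returns elapsed wall-clock seconds for one full pass over all tables.
def time_indexing(tables, writer)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  index_all(tables, writer)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical sample data standing in for SQL result sets.
sample_tables = {
  :customers => [{ :id => 1, :name => "a" }, { :id => 2, :name => "b" }],
  :orders    => [{ :id => 10, :notes => "x" * 3000 }]
}

baseline = time_indexing(sample_tables, NullWriter.new)
counter = CountingWriter.new
with_writer = time_indexing(sample_tables, counter)

puts format("baseline %.4fs, with writer %.4fs, docs %d",
            baseline, with_writer, counter.count)
```

Subtracting the baseline from the full run gives an estimate of the time spent inside add_document itself.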
Thanks for your help,