[sup-talk] sup-sync and xapian memory usage
andrew at pimlott.net
Mon Sep 7 15:26:11 EDT 2009
On Mon, Sep 07, 2009 at 02:33:06PM -0400, Rich Lane wrote:
> Xapian keeps writes buffered in memory. Try setting the environment
> variable XAPIAN_FLUSH_THRESHOLD to a smaller value (the default is 10000
> documents) and see if that helps.
Thanks--it was hard for me to find that kind of information. I first
tried setting XAPIAN_FLUSH_THRESHOLD to 1, and sup-sync ran slowly and
just kept getting slower:
## read 139m (about 7%) @ 9.2m/s. 0:00:15 elapsed, about 0:03:21 remaining
## read 1238m (about 35%) @ 3.1m/s. 0:06:36 elapsed, about 0:12:08 remaining
I stopped at this point because it was taking too long. The memory use
seemed stable, but that could have been because it was making such slow
progress. I guess xapian gets a lot slower writing as the db grows?
That's a bit discouraging. Using ferret, sup-sync only dropped from
28.1m/s to 27.3m/s during its run. For reference, when I didn't set
XAPIAN_FLUSH_THRESHOLD, I was getting 35-36m/s until it ran out of
memory.
I then set XAPIAN_FLUSH_THRESHOLD to 100 and got more reasonable
results. It started at 25.6m/s and slowed to 17.8m/s. It stabilized at
around 41M virtual memory used and finished successfully. I also note
that the memory use didn't jump during the finish-up phase ("Deleting
missing messages") as it had with ferret.
Finally, I set XAPIAN_FLUSH_THRESHOLD to 1000. It started at 34.6m/s
and dropped to 29.8m/s, stabilized at around 51M virtual memory, and
finished successfully. In this case, it stays faster than ferret, but
it still bugs me that xapian slows down while ferret doesn't.
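To put the slowdowns side by side, here is a small Python sketch that computes the relative drop in throughput from the start and end rates quoted above. The labels and the helper name relative_drop are only for illustration. The rates are the ones reported in this message, and the unfinished threshold-1 run is left out.

```python
# Start and end throughput (m/s as printed by sup-sync) for the completed runs.
runs = {
    "ferret": (28.1, 27.3),
    "xapian, threshold 100": (25.6, 17.8),
    "xapian, threshold 1000": (34.6, 29.8),
}


def relative_drop(start, end):
    """Return the fractional slowdown between the start and end rates."""
    return (start - end) / start


drops = {name: relative_drop(start, end) for name, (start, end) in runs.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:.1%} slower by the end of the run")
```

By this measure ferret loses about 3% over the run, xapian at a threshold of 100 loses about 30%, and xapian at 1000 loses about 14%.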
So I conclude... I don't know what I conclude. Letting xapian use a lot
of memory sure helps its performance. And a big sup-sync should only
have to be done rarely. So maybe just document that those on low-memory
systems should consider using XAPIAN_FLUSH_THRESHOLD during sup-sync.
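For the documentation idea, a minimal sketch of running a command with the variable set for that one process only might look like the following. The helper names flush_env and run_with_flush_threshold are made up for illustration, and the commented-out call assumes sup-sync is on the PATH and needs no further arguments.

```python
import os
import subprocess


def flush_env(threshold):
    """Return a copy of the current environment with XAPIAN_FLUSH_THRESHOLD set."""
    env = dict(os.environ)
    # Environment variables are strings, so convert the number.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def run_with_flush_threshold(command, threshold):
    """Run command with the flush threshold applied to that process only."""
    return subprocess.run(command, env=flush_env(threshold), check=True)


# Hypothetical usage, with the value that worked well in the runs above:
# run_with_flush_threshold(["sup-sync"], 1000)
```

Because the variable goes only into the child's environment, other programs using xapian keep the default threshold.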
Thanks again for your help!