[Backgroundrb-devel] Memory leak and long process problem

Dave Dupre gobigdave at gmail.com
Thu Jan 3 15:10:10 EST 2008


I use backgroundrb for many long-running tasks in my system, but I'm having
trouble with one workflow in particular.  The two large tasks involved are
importing people and updating companies.

  def import_contacts(args = nil)
    thread_pool.defer(args) do |job_id|
      begin
        job = ImportJob.find(job_id)
        job.process_job
      rescue => err
        logger.error "MscWorker#import_contacts failed! #{err.class}:
#{err}"
      end
    end
  end

  def update_company_from_vendor(args = nil)
    thread_pool.defer(args) do |company_id|
      begin
        company = Company.find(company_id)
        info = company.firm_info_from_vendor  # webservice call to vendor
        if info && info.size == 1
          details = Company.find_firm_info_details_from_vendor(info[0])  # second webservice call to vendor
          company.update_from_vendor!(details)
        end
      rescue => err
        logger.error "MscWorker#update_company_from_vendor failed! #{
err.class}: #{err}"
      end
    end
  end

Part of import_contacts will result in many ask_work calls to
update_company_from_vendor while it is processing.  Importing contacts is
heavily DB dependent, but not very CPU intensive.  If I upload two files
with > 1000 contacts each (two ask_work calls to import_contacts), things
progress along and then pause for 20-40 seconds.  There is no DB activity
during the pause, but the backgroundrb process is using most of the CPU
(98-99%).  There are no deadlock errors when things start up again, but the
pauses really slow things down.  Are you using polling somewhere?
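For context, the enqueueing inside process_job looks roughly like this
(simplified; the worker name comes from my MscWorker above, and the company
lookup is elided):

  # sketch of what process_job does per imported contact
  MiddleMan.ask_work(:worker => :msc_worker,
                     :worker_method => :update_company_from_vendor,
                     :data => company.id)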

Also, on my Mac, Activity Monitor shows only 1 thread and 1.2 GB (!!) of
memory used.  I expected to see many threads due to my use of thread_pool.

Since all of my processing code is in models, it is very easy to switch to
synchronous execution.  When I call job.process_job directly (see
import_contacts), things never pause, and the ruby process never gets over
120 MB.
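In other words, the two code paths differ only in how process_job gets
invoked:

  # asynchronous, via backgroundrb
  MiddleMan.ask_work(:worker => :msc_worker,
                     :worker_method => :import_contacts,
                     :data => job.id)

  # synchronous, straight from the app -- never pauses, stays ~120 MB
  ImportJob.find(job.id).process_job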

This all leaves me with a few questions:

1. It sure looks like there is a serious memory leak somewhere, but I don't
think it is in my code.

2. What is the recommended way to structure this processing?  Currently, I
have a single worker that my web app calls for all background tasks -- each
task runs on that worker's thread pool.  I don't have much need for status
reporting since I can get status from the database.  Should I change things
to dynamically create workers instead (see the sketch below)?

3. I should repeat that I never saw multiple threads being created, even
though update_company_from_vendor was called 1500 times during one call to
import_contacts.  update_company_from_vendor takes several seconds to
execute, so I know calls should have queued up.
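For question 2, I'm imagining something like this per upload (syntax
cribbed from the docs, untested):

  # one short-lived worker per import job, keyed by the job id
  MiddleMan.new_worker(:worker => :msc_worker, :job_key => job.id)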

Thoughts?