[Backgroundrb-devel] Memory leak and long process problem

hemant gethemant at gmail.com
Thu Jan 3 16:32:49 EST 2008

Hi Dave,

On Jan 4, 2008 1:40 AM, Dave Dupre <gobigdave at gmail.com> wrote:
> I use backgroundrb for many long tasks in my system, but I'm having issues
> with one in particular.  Two large tasks for me are importing people and
> updating companies.

So the method below is invoked from Rails using the ask_work command, right?
Are you, by any chance, passing the uploaded file itself to the worker? If
yes, preferably don't do that. Save the file somewhere (on disk or in the
db) and pass the location instead.
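
To illustrate "save the file and pass the location", here is a minimal
sketch in plain Ruby. The helper names (stash_upload, each_contact_line)
and the file naming scheme are made up for the example; they are not part
of backgroundrb or of Dave's code. The idea is only that the web side
writes the bytes out once and hands the worker a short string.

```ruby
require "tmpdir"
require "securerandom"

# Hypothetical helper for the Rails side: write the uploaded bytes to disk
# and return the path, so only a short string crosses over to the worker.
def stash_upload(data, dir = Dir.tmpdir)
  path = File.join(dir, "import-#{SecureRandom.hex(8)}.csv")
  File.open(path, "wb") { |f| f.write(data) }
  path
end

# Hypothetical helper for the worker side: stream the file line by line
# instead of holding the whole upload in memory.
def each_contact_line(path)
  File.foreach(path) { |line| yield line.chomp }
end
```

The worker would then receive the path (or a job id that points at it) as
its argument and read the file itself.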

And why do you need thread_pool there? Do you want concurrent
execution of tasks, or do you just want a worker queue?
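
By "just a worker queue" I mean something like the sketch below: one
consumer thread that runs jobs one at a time, in the order they arrive.
SerialQueue is a made-up class using only Ruby's standard Thread and
Queue; it is not how backgroundrb is implemented, it only shows the
difference from running tasks concurrently.

```ruby
require "thread"

# Illustrative only: a plain worker queue with a single consumer thread.
# Jobs never overlap; each one starts after the previous one finishes.
class SerialQueue
  def initialize
    @jobs = Queue.new
    @consumer = Thread.new do
      # Queue#pop blocks while the queue is empty, so there is no polling.
      while (job = @jobs.pop) != :stop
        job.call
      end
    end
  end

  # Add a job (a block) to the end of the queue.
  def enqueue(&block)
    @jobs << block
  end

  # Let queued jobs finish, then stop the consumer thread.
  def shutdown
    @jobs << :stop
    @consumer.join
  end
end
```

If the jobs only need to run in the background and not in parallel, this
shape is usually enough and is easier to reason about than a pool.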

>   def import_contacts(args = nil)
>     thread_pool.defer(args) do |job_id|
>       begin
>         job = ImportJob.find(job_id)
>         job.process_job
>       rescue => err
>         logger.error "MscWorker#import_contacts failed! #{err.class}: #{err}"
>       end
>     end
>   end

I have the same questions here as for the previous worker method.

>   def update_company_from_vendor(args = nil)
>     thread_pool.defer(args) do |company_id|
>       begin
>         company = Company.find(company_id)
>         info = company.firm_info_from_vendor  # webservice call to vendor
>         if info && info.size == 1
>           company.update_from_vendor!(Company.find_firm_info_details_from_vendor(info[0]))  # webservice call to vendor
>         end
>       rescue => err
>         logger.error "MscWorker#update_company_from_vendor failed! #{err.class}: #{err}"
>       end
>     end
>   end
> Part of import_contacts will result in many ask_work calls to
> update_company_from_vendor while it is processing.  Importing contacts is
> heavily db dependent, but not very code intensive.  If I upload two files
> with > 1000 contacts each (two ask_work calls to import_contacts), things
> will progress along and then pause for 20-40 seconds.  There is no DB
> activity during the pause, but the backgroundrb process is using most of CPU
> (98-99%).  There are no deadlock errors when things startup again, but it
> really slows things down.  Are you using polling somewhere?
> Also, on my Mac, Activity Monitor is only showing 1 thread and 1.2 Gig(!!)
> of memory used.  I expected to see many threads due to my use of
> thread_pool.
> Since all of my processing code is in models, it is very easy to switch to
> synchronous execution.  When I execute job.process_job (see
> import_contacts), things never pause, and the ruby process never gets over
> 120Meg in size.
> This all leaves me with two questions:
> 1. Sure looks like there is a serious memory leak someplace, but I don't
> think it is my code.
> 2. What is the recommended method for this processing.  Currently, I have a
> single worker for my web app to call for background tasks -- each task is
> implemented as a thread pool.  I don't have much need for status since I can
> get the status from the database.  Should I change things to dynamically
> create workers?
> 3. I should repeat that I never saw multiple threads being created even
> though update_company_from_vendor was called 1500 times during one call to
> import_contacts.  update_company_from_vendor takes several seconds to
> execute so I know calls should have queued up.

Ruby (1.8) uses green threads, which are scheduled inside the interpreter
rather than by the OS, so I don't think Activity Monitor will show them as
separate threads. Also, the thread pool only grows toward its pool size
as tasks pile up in the queue. If the queue is empty, only one thread
is actually created initially.
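
To make the "pool grows with the queue" behaviour concrete, here is a
rough sketch. LazyPool is a made-up class written for this mail, not the
actual backgroundrb thread pool, and the growth rule (add a thread when
there are more outstanding jobs than threads, up to max_size) is my
assumption about one reasonable way to do it.

```ruby
require "thread"

# Illustrative only: starts with one thread and adds more on demand.
class LazyPool
  def initialize(max_size)
    @max_size = max_size
    @jobs = Queue.new
    @lock = Mutex.new
    @pending = 0                  # jobs queued or currently running
    @threads = [spawn_thread]     # only one thread exists up front
  end

  # Number of threads created so far.
  def size
    @lock.synchronize { @threads.size }
  end

  # Queue a job; grow the pool if outstanding jobs outnumber threads.
  def defer(*args, &block)
    @lock.synchronize do
      @pending += 1
      if @pending > @threads.size && @threads.size < @max_size
        @threads << spawn_thread
      end
    end
    @jobs << [block, args]
  end

  # Let queued jobs finish, then stop every thread.
  def shutdown
    threads = @lock.synchronize { @threads.dup }
    threads.size.times { @jobs << :stop }
    threads.each(&:join)
  end

  private

  def spawn_thread
    Thread.new do
      while (job = @jobs.pop) != :stop
        block, args = job
        begin
          block.call(*args)
        rescue StandardError => e
          warn "job failed: #{e.class}: #{e}"
        ensure
          @lock.synchronize { @pending -= 1 }
        end
      end
    end
  end
end
```

With a pool like this, a single deferred task never needs more than the
first thread; extra threads only appear once tasks start overlapping.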

Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.


