[Backgroundrb-devel] Threadpool and queuing of tasks

Dave Dupre gobigdave at gmail.com
Wed Jan 23 09:28:12 EST 2008

I wasn't specific.  I am using the trunk version.

Yes, I want to limit the number of threads each method can use.  For
example, I have a worker with many methods that I use for several
longer-running tasks.  All methods are called from the web application, so I
use the thread_pool to queue up requests.  However, during some processes, I
may queue up a few hundred calls to a couple of methods.  These methods use a
3rd-party web service to gather information, and at a pool_size of 20 I
overload the 3rd-party server.  Through trial and error, I found that 2-3 is
about as many as I can run.  The problem now is that I have to reduce the
threads for every method in the worker or create another process.
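One way to get a per-method limit without a second process is to keep the single pool and gate only the throttled methods with a counting semaphore. This is a plain-Ruby sketch; `MethodThrottle` and the `:fetch_remote` key are hypothetical names, not part of backgroundrb.

```ruby
require 'thread'

# Hypothetical helper (not a backgroundrb API): caps how many threads a given
# method name may occupy at once, so methods sharing one pool get their own limits.
class MethodThrottle
  def initialize(limits)
    @limits = limits            # e.g. { :fetch_remote => 2 }
    @running = Hash.new(0)      # tasks currently running, per method name
    @mutex = Mutex.new
    @slot_freed = ConditionVariable.new
  end

  # Waits for a free slot for +name+, runs the block, then releases the slot.
  # Names without a configured limit run immediately. Returns the block's value.
  def throttle(name)
    limit = @limits[name]
    return yield unless limit

    @mutex.synchronize do
      @slot_freed.wait(@mutex) while @running[name] >= limit
      @running[name] += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @running[name] -= 1
        @slot_freed.broadcast   # wake waiters; each rechecks its own limit
      end
    end
  end
end
```

Inside a worker this would wrap the body of the deferred block, e.g. `thread_pool.defer(args) { |id| THROTTLE.throttle(:fetch_remote) { ... } }` with a hypothetical `THROTTLE` constant. The catch is that a task waiting for a slot still holds a pool thread, so a burst of a few hundred throttled calls can tie up most of the 20 threads while they wait; a separate small pool for those methods avoids that.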

Regarding the queue, are you saying to just do something like this:

def create(args = nil)
  # Fire everyone back up if the process is restarted
  jobs = Job.find(:all,
                  :conditions => "state = 'queued' OR state = 'processing'",
                  :order => 'created_at')
  jobs.each { |j| my_worker(j.id) }
end

def my_worker(args = nil)
  thread_pool.defer(args) do |job_id|
    # Lock the row until we update its state
    job = Job.find(job_id, :lock => true)
    # Assuming acts_as_state_machine: this changes the state and processes the job
    job.process!
  end
end

This I get, and in fact I started with an implementation just like this.
The reason I added the dispatcher is the way my app works.  My app uses
saved searches to connect users based on several properties.  The searches
are very complex and are kicked off automatically by several triggers.  The
problem with the above is that several quick triggers result in several
calls to my_worker, so an expensive operation runs multiple times.  However,
if all the triggers do is update the database to change the state to
'queued' and wake up the dispatcher, I will only have one call to my_worker
(acts_as_state_machine will only change the state to 'queued' if the current
state is 'ready').  Obviously, I could add a lot of logic to handle this,
but using the persistent database queue seems so much simpler and safer.
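The guard that makes repeated triggers harmless is the conditional transition from 'ready' to 'queued'. Below is a plain-Ruby stand-in: `SavedSearch` is hypothetical, and its Mutex plays the role of the database's row lock. In the database the equivalent would be an UPDATE restricted to rows still in 'ready', checking how many rows changed.

```ruby
require 'thread'

# Hypothetical in-memory stand-in for a saved-search row.
class SavedSearch
  attr_reader :state

  def initialize
    @state = 'ready'
    @lock = Mutex.new
  end

  # Moves 'ready' -> 'queued' and returns true only for the trigger that
  # actually queued the search; later triggers see 'queued' and do nothing.
  def enqueue!
    @lock.synchronize do
      return false unless @state == 'ready'
      @state = 'queued'
      true
    end
  end
end
```

Only the trigger whose `enqueue!` returns true needs to wake the dispatcher; the rest can return immediately.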

Another reason for the dispatcher is to allow for future scaling.  I got in
the habit of adding a dispatcher in cases like this because it makes it very
simple to have the dispatcher send tasks to multiple processing queues, often
on separate servers.  One dispatcher process can easily handle dozens of
processing servers (I've had as many as 75 processing servers without a


On Jan 23, 2008 12:30 AM, hemant <gethemant at gmail.com> wrote:

> Hi Dave,
> On Jan 23, 2008 3:13 AM, Dave Dupre <gobigdave at gmail.com> wrote:
> > I recently switched over to v1.0, and things are rolling along pretty well.
> Switch to trunk or 1.0.1. Preferably trunk; we've fixed lots of issues
> there.
> > However, one thing that has always been a little confusing to me is knowing
> > when to use thread_pool.  Since most of my bgrb workers are called from my
> > web app to process rather than being scheduled, I'm using the thread_pool
> > for every call.  Unfortunately, that means that I have to split up workers
> > by how many threads I can have.  It would be great if one worker could
> > partition a single thread pool among the methods.  I want to avoid too many
> > workers to keep the process count down.
> I don't think I follow you here. Since a worker comes with a thread pool
> of size 20, you should be good to go. But obviously Ruby green threads
> don't offer you any parallel execution.
> Thread pools have been designed to run concurrent tasks (not parallel
> ones), so it wouldn't be useful to be able to partition a thread pool
> among methods.
> In fact, I don't follow that notion at all. Do you mean you want to
> assign a number of threads to each method that you invoke from
> Rails?
> Can you clarify things a bit?
> >
> > I'm now working on a new scheme that pushes this example.  Basically, I have
> > some long-running saved searches that are triggered by various events
> > throughout the site.  All I want my site to do is update a status to say the
> > job is queued and have it picked up from there.  Here is where I run into
> > trouble, possibly because I've built too many systems like this that use
> > real queuing packages.  Here is what I want:
> >
> > Dispatch method (usually one thread is necessary):
> > 1. Find the oldest 'queued' record (make sure to find with :lock => true)
> > 1a. If none, goto step 5
> > 2. Update status to 'processing'
> > 3. Send to search method
> > 4. Repeat 1
> > 5. Done
> >
> > Search method (many threads):
> > 1. Perform the search
> > 2. Update status to 'complete'
> > 3. Done
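The two step lists above can be sketched in a single process, with an in-memory array standing in for the jobs table. Everything here is hypothetical (`JobRow`, `Dispatcher`, `perform_search`); "oldest" is approximated by the lowest id rather than `created_at`, and a Mutex stands in for `:lock => true`.

```ruby
require 'thread'

# Hypothetical stand-in for a row in the jobs table.
JobRow = Struct.new(:id, :state)

class Dispatcher
  def initialize(jobs, search_threads)
    @jobs = jobs
    @db_lock = Mutex.new        # stands in for Job.find(..., :lock => true)
    @search_queue = Queue.new   # hand-off from Dispatch to Search
    @searchers = Array.new(search_threads) do
      Thread.new do
        while (job = @search_queue.pop) != :stop
          perform_search(job)                                # Search step 1
          @db_lock.synchronize { job.state = 'complete' }    # Search step 2
        end
      end
    end
  end

  # One dispatch run, following Dispatch steps 1-5.
  def dispatch
    loop do
      job = @db_lock.synchronize do
        # Step 1: oldest 'queued' row (lowest id stands in for created_at)
        oldest = @jobs.select { |j| j.state == 'queued' }.min_by { |j| j.id }
        oldest.state = 'processing' if oldest                # Step 2
        oldest
      end
      break unless job          # Step 1a: nothing queued, go to step 5
      @search_queue << job      # Step 3, then step 4 repeats the loop
    end
  end

  # Lets the searchers drain the hand-off queue, then stops them.
  def shutdown
    @searchers.size.times { @search_queue << :stop }
    @searchers.each { |t| t.join }
  end

  private

  def perform_search(job)
    sleep 0.01  # placeholder for the real saved search
  end
end
```

Both roles live in one process here: the dispatcher is a single caller of `dispatch`, and the searchers are a fixed set of threads blocked on the hand-off queue.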
> >
> > The easy answer is to split these into two workers.  Set the pool_size of
> > Dispatch to 1, and Search to 5 or 10.  However, eating two processes (master
> > and worker) for something as simple as Dispatch seems like serious overkill
> > to me.  Since I currently run on one server, the extra processes cut into
> > the memory the main site wants.
> Again, I didn't quite follow you there, so let me rephrase what you
> want and check whether I understood it correctly.
> Basically you want to implement a queue, so when a task is submitted
> from Rails the task gets queued.
> Then a dispatch worker finds the latest (or oldest?) task, updates
> the status of the task to taken, and hands the task over to a search
> worker.
> Is that right?
> You can use the Queue class for that purpose. And you don't need to poll
> the queue, because if a Queue is empty, all the threads that are reading
> from it are automatically blocked.
> The bdrb thread pool implementation makes use of that. So, what's wrong
> with a thread pool of size 10 or 20? Add each task to that queue, and
> when a task is removed from the queue, mark its status as processing
> and just go on with processing the task. You don't need two
> processes for this.
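A minimal sketch of the pattern hemant describes, written from scratch rather than copied from bdrb's own thread pool: a fixed set of threads blocks on `Queue#pop`, so nothing polls. `SimplePool` is a hypothetical name.

```ruby
require 'thread'

# Hypothetical fixed-size pool: +size+ threads take tasks off a shared Queue
# and pass each one to +handler+. Queue#pop blocks while the queue is empty.
class SimplePool
  def initialize(size, &handler)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        loop do
          task = @queue.pop        # sleeps here until a task arrives
          break if task == :stop
          handler.call(task)
        end
      end
    end
  end

  # Enqueue a task; returns immediately.
  def <<(task)
    @queue << task
    self
  end

  # Lets queued tasks finish, then stops every thread.
  def shutdown
    @threads.size.times { @queue << :stop }
    @threads.each { |t| t.join }
  end
end
```

Usage: create the pool with a block that marks the job 'processing', does the work, then marks it 'complete', and push job ids with `pool << id`. Because `:stop` markers go to the back of the queue, `shutdown` only returns after every earlier task has been handled.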
> >
> > A related question is how to implement Dispatch without polling.  Call me
> > anal, but I feel dirty whenever I use polling, especially for something that
> > I want to be picked up immediately.  Is there a way I can trigger it to run
> > if it isn't already running?  The old bgrb had a singleton that let me do
> > something like that.
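Within one Ruby process, a ConditionVariable gives "run it if it isn't already running" without polling: the dispatch thread sleeps until kicked, and kicks that arrive mid-pass collapse into one follow-up pass. `Wakeable` is a hypothetical class, and how a Rails request would reach `kick` inside a bdrb worker is not shown here.

```ruby
require 'thread'

# Hypothetical wake-on-demand runner: a single thread sleeps on a
# ConditionVariable and runs one pass each time it is kicked. Kicks that
# arrive while a pass is running are coalesced into one follow-up pass.
class Wakeable
  def initialize(&pass)
    @pass = pass
    @mutex = Mutex.new
    @kicked = ConditionVariable.new
    @pending = false
    @stopped = false
    @thread = Thread.new { run }
  end

  # Request a pass; safe to call from any thread, any number of times.
  def kick
    @mutex.synchronize do
      @pending = true
      @kicked.signal
    end
  end

  # Finish any pending pass, then stop the thread.
  def stop
    @mutex.synchronize do
      @stopped = true
      @kicked.signal
    end
    @thread.join
  end

  private

  def run
    loop do
      @mutex.synchronize do
        # Sleep (no polling) until there is work or we are told to stop.
        @kicked.wait(@mutex) until @pending || @stopped
        return unless @pending  # stopped with nothing left to do
        @pending = false
      end
      @pass.call  # runs outside the lock, so kicks during a pass are recorded
    end
  end
end
```

The pass given to `Wakeable.new` would be the dispatch loop itself; each trigger calls `kick`, and at most one pass runs at a time.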
> >
> --
> Let them talk of their oriental summer climes of everlasting
> conservatories; give me the privilege of making my own summer with my
> own coals.
> http://gnufied.org