[Backgroundrb-devel] best approach to managing workers and getting status

hemant gethemant at gmail.com
Wed Jun 25 06:26:22 EDT 2008


On Wed, Jun 25, 2008 at 1:02 PM, Jack Nutting <jnutting at gmail.com> wrote:
> On Tue, Jun 24, 2008 at 7:26 PM, Frank Schwach <f.schwach at uea.ac.uk> wrote:
>> Jack,
>> I just found your interesting post in the archive and I would like to
>> come back to this. I need to implement something like this:
>>
>> I have some very long running tasks (several hours) that should run on a
>> remote machine and talk to the database on the Rails server. I need to
>> keep track of jobs including those that have been run in the past, so a
>> table for background jobs with their status as you describe would be the
>> best solution for me.
>>
>> I am just wondering whether backgroundrb wouldn't be a bit of an
>> overkill in the scenario you describe? In the new "Advanced Rails
>> Recipes" from the Pragmatic Programmers Bookshelf there is a recipe
>> using a simple daemonized ruby process that polls the database for
>> pending jobs and uses acts_as_state_machine to set the state of the jobs
>> (there is also a nice BackgrounDRb recipe in the book by the way).
>> I am just wondering if the daemonized process isn't easier to handle in
>> this case since you don't integrate your app with backgroundrb very
>> tightly anyway?
>>
>> I would be grateful for any suggestions because there seem to be lots of
>> possible solutions for this problem and some more or less well
>> documented plugins and I haven't used any of them before. I need a
>> simple and robust method that doesn't have too many dependencies and
>> doesn't require too much maintenance because I want to make the finished
>> app available for others to install on their local systems.
>
> This is an interesting question, Frank.  My usage of backgroundrb is
> somewhat of an edge case, and most of what I'm doing with it could
> definitely be done with a simpler system.  I initially chose
> backgroundrb for my project because it seemed to make the most sense
> at the time (for what I *thought* I needed; actual needs changed with
> further exploration of the problem space), and I was enough of a ruby
> newbie that it felt comfortable for me to have a packaged solution
> that (mostly) "just worked".  If I were starting from scratch today, I
> might make a different decision.
>
> However, it's not only inertia that keeps me using backgroundrb.  For
> one thing, backgroundrb does provide some handy things--centralized
> logging, IPC for storing runtime status info about my processes,
> etc--that would take some time for me to implement if I were rolling
> my own solutions with a daemonized script, and from my perspective
> that would be wasted time, since I have those things working today
> thanks to backgroundrb.  Another reason for me to keep it is that I
> have a few spots in my system where I'm considering using some of
> backgroundrb's other key features, like launching a short-lived
> process to handle something in response to some action happening in
> the main application
>

Well, I am working on a couple of new things in BackgrounDRb. Result
storage and retrieval is one of them, as I mentioned in earlier mails
when I solicited opinions from people who are using bdrb. You can
check out

http://github.com/gnufied/backgroundrb/commits/testcase

Here is what's on this branch of BackgrounDRb, which will become master
very soon:

1> A true clustering system for backgroundrb servers running on N
nodes. Tasks are dispatched in a round-robin manner, but you can
specify the host on which you want to execute a task:

MiddleMan.worker(:foo_worker).async_some_work(:args => "lol")

^^ will choose a server in round-robin order and run the "some_work"
method in the specified worker. You can also specify:
:host => <local or all or "10.0.0.6:11001">

which overrides the default behaviour and runs the specified method on
the local bdrb server, all bdrb servers, or the specified server.
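
A rough sketch of the dispatch rule described above, in plain Ruby. This
is not bdrb's actual code: the RoundRobinDispatcher class, the server
addresses, and the use of the symbols :local and :all for the host
option are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the dispatch rule; not BackgrounDRb's code.
class RoundRobinDispatcher
  def initialize(servers, local:)
    @servers = servers      # every bdrb server in the cluster
    @local = local          # address of the server on this machine
    @next_index = 0         # position of the next round-robin pick
  end

  # Returns the list of servers a call would be sent to.
  #   host: nil        -> next server in round-robin order
  #   host: :local     -> only the local server
  #   host: :all       -> every server
  #   host: "ip:port"  -> that server only
  def targets(host: nil)
    case host
    when nil
      server = @servers[@next_index % @servers.size]
      @next_index += 1
      [server]
    when :local then [@local]
    when :all then @servers.dup
    else
      raise ArgumentError, "unknown server #{host}" unless @servers.include?(host)
      [host]
    end
  end
end

dispatcher = RoundRobinDispatcher.new(
  ["10.0.0.5:11001", "10.0.0.6:11001", "10.0.0.7:11001"],
  local: "10.0.0.5:11001"
)
dispatcher.targets                          # next server in rotation
dispatcher.targets(host: "10.0.0.6:11001")  # pinned to one server
```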

2> Clustering is failsafe: if one bdrb node goes down, all requests are
immediately routed to the remaining servers. Once that node comes back
up, it automatically rejoins the cluster.
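
To make the failover behaviour concrete, here is a minimal sketch in
plain Ruby. How bdrb actually detects a dead node is not covered here;
the FailoverCluster class and its mark_down/mark_up methods are
hypothetical names, and the caller is assumed to report node state.

```ruby
# Hypothetical sketch: round robin that skips nodes marked as down.
class FailoverCluster
  def initialize(servers)
    @servers = servers
    @down = {}       # server => true while it is unreachable
    @cursor = 0
  end

  def mark_down(server)
    @down[server] = true
  end

  def mark_up(server)
    @down.delete(server)
  end

  # Walks the ring at most once and returns the first live server.
  def next_server
    @servers.size.times do
      candidate = @servers[@cursor % @servers.size]
      @cursor += 1
      return candidate unless @down[candidate]
    end
    raise "no bdrb servers available"
  end
end
```

A node that is marked up again is simply picked the next time the
cursor reaches it, so no separate "rejoin" step is needed in this
sketch.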

3> Results can be stored in memcache, and the register_status method
has been replaced by a "cache" object available in all workers. Hence
you can cache results with:

cache[@user.id] = some_data

in your workers and later you can retrieve results using:

MiddleMan.worker(:foo_worker).ask_result(@user.id)

I seriously recommend using memcache if you are clustering bdrb
servers. Also, the cache object's caching mechanism is completely
thread safe and hence can be used from within the thread pool or
anywhere else you want.
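
As an illustration of what a thread-safe cache[key] = value interface
can look like, here is a small sketch in plain Ruby. It is an in-memory
stand-in guarded by a Mutex, not bdrb's cache object, and it does not
talk to memcache; the ResultCache name is made up.

```ruby
# Hypothetical in-memory stand-in for a thread-safe result cache.
class ResultCache
  def initialize
    @store = {}
    @lock = Mutex.new
  end

  # cache[key] = value
  def []=(key, value)
    @lock.synchronize { @store[key] = value }
  end

  # cache[key]
  def [](key)
    @lock.synchronize { @store[key] }
  end
end

cache = ResultCache.new

# Several threads store results at once, as a thread pool would.
threads = 10.times.map do |i|
  Thread.new { cache[i] = i * i }
end
threads.each(&:join)

cache[3]  # result stored by the fourth thread
```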

4> Apart from the memory-based job queue that you can use with thread
pools, the testcase branch implements database-based job queues. So, to
enqueue a particular task:

MiddleMan.worker(:foo_worker).enq_some_task(:job_key,args)

The some_task method will be called automatically in the first
available worker, and the task will be dequeued from the database.
Also, jobs with duplicate keys are automatically rejected.
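
The queue semantics above can be sketched in plain Ruby as follows. An
in-memory array stands in for the database table, the KeyedJobQueue
class is hypothetical, and rejecting a key only while a job with that
key is still queued is an assumption; what bdrb does with a key once
its job has run is not stated here.

```ruby
# Hypothetical sketch of a job queue that rejects duplicate keys.
class KeyedJobQueue
  def initialize
    @jobs = []            # stands in for the jobs table, oldest first
    @pending_keys = {}    # keys of jobs that are still queued
    @lock = Mutex.new
  end

  # Returns true if the job was queued, false if the key is a duplicate.
  def enqueue(job_key, args)
    @lock.synchronize do
      return false if @pending_keys.key?(job_key)
      @pending_keys[job_key] = true
      @jobs << [job_key, args]
      true
    end
  end

  # Hands the oldest job to the first worker that asks, or nil if empty.
  def dequeue
    @lock.synchronize do
      job = @jobs.shift
      @pending_keys.delete(job.first) if job
      job
    end
  end
end

queue = KeyedJobQueue.new
queue.enqueue(:report_42, "first")   # accepted
queue.enqueue(:report_42, "second")  # rejected, same key still queued
```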

Note that the above things are already working on the testcase branch.
I think these features make bdrb a very compelling choice.

Some things that I will finish in a day or two:

5> Similar to worker method invocation, with each scheduled method you
can specify the host on which the task should run. For example, say you
have 5 bdrb servers and you have scheduled a billing task to run every
Sunday. You don't want the billing task to run on all the servers every
Sunday. So, by default a scheduled task will run on the server on which
it was created, but you can specify the host on which it should run.
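
The rule for scheduled tasks boils down to a small check, sketched here
in plain Ruby. Since this feature is not finished yet, treat it purely
as an illustration: the method name, its parameters, and the host
addresses are all hypothetical.

```ruby
# Hypothetical helper: should this server run a given scheduled task?
# created_on is the server where the schedule was defined; host is the
# optional override.
def run_scheduled_task_here?(this_host, created_on:, host: nil)
  target = host || created_on   # default to the creating server
  this_host == target
end

# Billing was scheduled on 10.0.0.5 with no override, so of the five
# servers only 10.0.0.5 runs it.
run_scheduled_task_here?("10.0.0.6:11001", created_on: "10.0.0.5:11001")
```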

