[Rake-devel] Announcement: Alternate -j implementation

Michael Bishop mbishop at me.com
Thu Apr 19 15:48:03 UTC 2012


Hi Hongli Lai,

This is a good question. I only learned of the Drake project last night, so I haven't had much time to investigate its implementation.

My initial analysis, based on the official Drake pull-request announcement, indicates a few differences. I may well be misinformed, so please correct me where I am.

Drake links:

  https://github.com/quix/rake/tree/mainline-merge-candidate#readme

  http://quix.github.com/rake/files/doc/parallel_rdoc.html


For the purposes of discussion, I'll refer to my submitted rake as "mjbrake" here.


Dependency differences:
-----------------------

Drake relies on an external gem named comp_tree for parallelism. mjbrake relies on no external library; like the original rake, it leverages the call stack to make sure all prerequisites are executed in order.


Code size differences:
----------------------

GitHub shows the change for mjbrake as

  "Showing 3 changed files with 74 additions and 7 deletions."

Drake's change shows 79 lines, plus 257 lines of code for comp_tree. The difference is quite possibly due to Drake having a larger feature set. mjbrake is focused only on changing multitask to limit the total number of concurrent tasks.


Implementation differences:
---------------------------

I'm not certain, but it seems that Drake doesn't ask the rakefile author to specify what can be parallelized. Instead, it asks that the dependencies be correct, and Drake works out dynamically what can run in parallel.

If this is true, I suspect the transition to a parallelized build is gentler on the user if they can build a rakefile starting with 'task' and migrate to 'multitask', as in the original rake. On the other hand, there is something elegant about not having to specify 'multitask' at all.
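
To illustrate the 'task' to 'multitask' migration described above, here is a minimal sketch. The task names and the run_build helper are hypothetical. It calls Rake::Task.define_task and Rake::MultiTask.define_task directly, which is what the 'task' and 'multitask' rakefile keywords delegate to, so that the example runs outside a rakefile.

```ruby
require 'rake'
require 'thread'

# Hypothetical helper: defines a tiny build graph and returns the sorted names
# of the prerequisites that ran. Pass Rake::Task for a serial build (the
# rakefile keyword `task`) or Rake::MultiTask for a parallel one (`multitask`).
def run_build(task_class)
  Rake::Task.clear                 # start from an empty task table
  ran = Queue.new                  # thread-safe collector for task names

  %w[compile_a compile_b compile_c].each do |name|
    Rake::Task.define_task(name) { ran << name }
  end

  # The only line that changes when migrating from `task` to `multitask`:
  task_class.define_task('build' => %w[compile_a compile_b compile_c])

  Rake::Task['build'].invoke
  Array.new(ran.size) { ran.pop }.sort
end

serial   = run_build(Rake::Task)       # prerequisites run one after another
parallel = run_build(Rake::MultiTask)  # prerequisites may run concurrently
```

Both calls run the same three prerequisites; only the declaration of 'build' differs, which is the gentle migration path referred to above.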

_ michael



On Apr 19, 2012, at 10:34 AM, Hongli Lai wrote:

> Have you also looked at http://drake.rubyforge.org? How does your
> implementation differ from his?
> 
> On Thu, Apr 19, 2012 at 1:36 PM, Michael Bishop <mbishop at me.com> wrote:
>> Hello Everyone,
>> 
>> I've recently finished an implementation of -j <max_concurrent_jobs> and
>> would be delighted if the members of this list would take a detailed look at
>> it.
>> 
>> It's in this rake branch on github:
>> 
>> 
>>  https://github.com/michaeljbishop/rake/commit/295c7a4d6d58b3e10c27b940a8259dd3e01c52f0
>>  or
>>    http://bit.ly/JMVARy
>> 
>> In addition, I've added a pull-request for the inclusion of the change to
>> the master branch (allowing of course for further changes to match the style
>> and simplicity of the original)
>> 
>>  https://github.com/jimweirich/rake/pull/112
>> 
>> My apologies if this has been hashed out many times on this list. I'm hoping
>> to offer a fresh look at the problem. Thank you for your consideration.
>> 
>> Sincerely,
>> 
>> Michael Bishop
>> 
>> 
>> ---
>> 
>> 
>> SUMMARY
>> -------
>> 
>> USER INTERFACE:
>> 
>> The user interface is simply a new -j flag that specifies the maximum number
>> of tasks that can execute simultaneously (which, as far as I've seen, has
>> been discussed here before). If the -j flag is omitted, the old behavior of
>> unlimited concurrent tasks is retained.
>> 
>> 
>> IMPLEMENTATION:
>> 
>> The implementation is inspired by Apple's Grand Central Dispatch model. The
>> core of the problem is solved via changes to the file "multi-task.rb".
>> 
>> The current implementation of MultiTask#invoke_prerequisites spawns a new
>> thread for each prerequisite and then waits to return until all the threads
>> have completed.
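
As a rough illustration of the thread-per-prerequisite behavior just described (not rake's actual source), the pattern can be sketched in plain Ruby. The helper name run_unbounded and the lambda jobs are hypothetical stand-ins for prerequisites.

```ruby
require 'thread'

# Hypothetical helper: one new thread per job, then wait for all of them.
# Nothing limits how many threads exist at once.
def run_unbounded(jobs)
  threads = jobs.map { |job| Thread.new { job.call } }
  threads.each(&:join)   # return only after every thread has completed
end

seen = Queue.new
jobs = Array.new(5) { lambda { seen << Thread.current } }
run_unbounded(jobs)      # spawns five threads for five jobs
```

With hundreds of prerequisites that each shell out to an external process, this pattern starts hundreds of processes at once.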
>> 
>> In the alternate implementation, MultiTask#invoke_prerequisites creates a
>> list of blocks inside which each prerequisite is called. Each block is then
>> added to a thread-safe Queue for processing. A class-member ThreadPool is
>> then expanded to include enough threads to consume and call the blocks, but
>> only up to the limit as passed to -j.
>> 
>> Then, rather than MultiTask#invoke_prerequisites sleeping by joining to the
>> newly spawned threads, it participates in the processing of the queued
>> blocks while it waits. This prevents deadlock situations should more threads
>> call MultiTask#invoke_prerequisites.
>> 
>> How then does MultiTask#invoke_prerequisites know when its prerequisites
>> have finished? Before enqueuing the prerequisite blocks, it wraps each
>> original prerequisite block in *another* block that keeps track of when
>> the prerequisite has completed and which thread is working on it. That
>> wrapper block is what is added to the queue.
>> 
>> MultiTask#invoke_prerequisites returns when either of two conditions is met:
>> 
>>   1 - It notices that its prerequisites have all been processed.
>>   2 - It notices there are no more blocks on the queue but its prerequisites
>>       are not yet finished. In this case, it joins those threads that are
>>       still executing its prerequisites.
>> 
>> What is attractive to me about this implementation is that the flow retains
>> the simplicity of the original: MultiTask#invoke_prerequisites sends all the
>> prerequisites to be executed then waits until they are done.
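
The flow described in this section can be sketched in plain Ruby as follows. This is not the code from the pull request: the class name BoundedPool and its methods are hypothetical, and where the description joins the threads still running its prerequisites, this sketch substitutes a counter with a condition variable. It also assumes the pool holds one thread fewer than the limit, because the calling thread does work too.

```ruby
require 'thread'

# Hypothetical sketch of a bounded pool whose callers help drain the queue.
class BoundedPool
  def initialize(limit)
    @limit   = limit        # the value that would be passed to -j
    @queue   = Queue.new    # thread-safe queue of pending blocks
    @workers = []
    @lock    = Mutex.new
  end

  # Runs every job (any callable) and returns once all of them have finished.
  def run_all(jobs)
    remaining = jobs.size
    state     = Mutex.new
    finished  = ConditionVariable.new

    jobs.each do |job|
      # Wrap each job in another block that does the bookkeeping.
      @queue << lambda do
        begin
          job.call
        ensure
          state.synchronize do
            remaining -= 1
            finished.signal
          end
        end
      end
    end
    grow_pool(jobs.size)

    # Rather than sleeping, the caller processes queued blocks while it waits.
    loop do
      return if state.synchronize { remaining.zero? }
      begin
        work = @queue.pop(true)   # non-blocking pop; raises when empty
      rescue ThreadError
        # Queue is empty but other threads still run our jobs: wait for them.
        state.synchronize { finished.wait(state) until remaining.zero? }
        return
      end
      work.call
    end
  end

  private

  # Adds worker threads as needed, never exceeding limit - 1 in total.
  def grow_pool(wanted)
    @lock.synchronize do
      room = (@limit - 1) - @workers.size
      [wanted, room].min.times do
        @workers << Thread.new { loop { @queue.pop.call } }
      end
    end
  end
end
```

Because a caller always works on the queue while it waits, a job that itself calls run_all cannot starve: in the worst case the calling thread runs its own nested jobs. Error handling is omitted; a job that raises inside a worker would end that worker thread.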
>> 
>> 
>> 
>> THE CASE FOR -J
>> ---------------
>> 
>> (Quoted from the pull-request)
>> 
>> PROBLEM SUMMARY:
>> 
>> Rake can be unusable for builds invoking large numbers of concurrent
>> external processes.
>> 
>> PROBLEM DESCRIPTION:
>> 
>> Rake makes it easy to maximize concurrency in builds with its "multitask"
>> function. When using rake to build non-ruby projects, rake quite often needs
>> to execute shell tasks to process files. Unfortunately, when executing
>> multitasks, rake spawns a new thread for each task-prerequisite. This
>> shouldn't cause problems when the build code is pure ruby (for green
>> threads), but when the tasks are executing external processes, the sheer
>> number of spawned processes can cause the machine to thrash. Additionally,
>> ruby can reach the maximum number of open files (presumably because it's
>> reading stdout for all those processes).
>> 
>> SOLUTION SUMMARY:
>> 
>> This request includes the code to add support for a "--jobs NUMBER (-j)"
>> command-line option to specify the number of simultaneous tasks to execute.
>> 
>> SOLUTION:
>> 
>> The solution creates a work queue to which blocks calling the
>> task-prerequisites are added and a thread pool to process them. To prevent
>> deadlock, the task that added the prerequisites processes items on the
>> queue (alongside the thread pool) until its prerequisites have been
>> processed.
>> 
>> To maintain backward compatibility, not passing -j reverts to the old
>> behavior of unlimited concurrent tasks.
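
A minimal sketch of how such an option might be parsed with Ruby's standard OptionParser is shown here. The helper name parse_jobs is hypothetical, and this is not the option-handling code from the pull request. A nil result stands for "no -j given", in which case the caller would keep the old unlimited behavior.

```ruby
require 'optparse'

# Hypothetical helper: returns the job limit as an Integer, or nil when
# neither -j nor --jobs appears in the argument list.
def parse_jobs(argv)
  limit = nil
  parser = OptionParser.new do |opts|
    opts.on('-j', '--jobs NUMBER', Integer,
            'Maximum number of simultaneous tasks') do |number|
      limit = number
    end
  end
  parser.parse(argv)   # non-destructive: argv itself is left unchanged
  limit
end

limit = parse_jobs(%w[-j 4])
unlimited = parse_jobs([]).nil?   # true means: fall back to the old behavior
```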
>> 
>> REQUIREMENTS:
>> 
>> The Ruby version requirements remain the same. "multi-task.rb" adds two new
>> requirements: 'thread' and 'set'
>> 
>> 
>> _______________________________________________
>> Rake-devel mailing list
>> Rake-devel at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/rake-devel
> 
> 
> 
> -- 
> Phusion | Ruby & Rails deployment, scaling and tuning solutions
> 
> Web: http://www.phusion.nl/
> E-mail: info at phusion.nl
> Chamber of commerce no: 08173483 (The Netherlands)


