[Rake-devel] Announcement: Alternate -j implementation

Hongli Lai hongli at phusion.nl
Thu Apr 19 14:34:08 UTC 2012


Have you also looked at http://drake.rubyforge.org? How does your
implementation differ from his?

On Thu, Apr 19, 2012 at 1:36 PM, Michael Bishop <mbishop at me.com> wrote:
> Hello Everyone,
>
> I've recently finished an implementation of -j <max_concurrent_jobs> and
> would be delighted if the members of this list would take a detailed look at
> it.
>
> It's in this rake branch on github:
>
>
>  https://github.com/michaeljbishop/rake/commit/295c7a4d6d58b3e10c27b940a8259dd3e01c52f0
>  or
>    http://bit.ly/JMVARy
>
> In addition, I've added a pull-request for the inclusion of the change to
> the master branch (allowing of course for further changes to match the style
> and simplicity of the original)
>
>  https://github.com/jimweirich/rake/pull/112
>
> My apologies if this has been hashed out many times on this list. I'm hoping
> to offer a fresh look at the problem. Thank you for your consideration.
>
> Sincerely,
>
> Michael Bishop
>
>
> ---
>
>
> SUMMARY
> -------
>
> USER INTERFACE:
>
> The user interface is simply a new -j flag that specifies the maximum number
> of tasks that can execute simultaneously (I've seen this discussed here
> before). If the -j flag is omitted, the old behavior of unlimited concurrent
> tasks is retained.
>
>
> IMPLEMENTATION:
>
> The implementation is inspired by Apple's Grand Central Dispatch model. The
> core of the problem is solved via changes to the file "multi-task.rb".
>
> The current implementation of MultiTask#invoke_prerequisites spawns a new
> thread for each prerequisite and then waits to return until all the threads
> have completed.
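
A minimal sketch of the thread-per-prerequisite behavior described above, for
illustration only. It is not Rake's source: the method name
invoke_all_unbounded is hypothetical, and each prerequisite is assumed to be
a callable object rather than a Rake task.

```ruby
# Hypothetical stand-in for the current behavior: one thread per
# prerequisite, then wait for all of them. Not Rake's actual code.
def invoke_all_unbounded(prerequisites)
  threads = prerequisites.map do |prereq|
    Thread.new { prereq.call } # no upper bound on concurrency
  end
  threads.each(&:join)         # return only after every thread finishes
  threads.size                 # number of threads that were spawned
end
```

With N prerequisites this starts N threads at once, which is the behavior
the -j flag is meant to cap.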
>
> In the alternate implementation, MultiTask#invoke_prerequisites creates a
> list of blocks inside which each prerequisite is called. Each block is then
> added to a thread-safe Queue for processing. A class-member ThreadPool is
> then expanded to include enough threads to consume and call the blocks, but
> only up to the limit as passed to -j.
>
> Then, rather than MultiTask#invoke_prerequisites sleeping by joining to the
> newly spawned threads, it participates in the processing of the queued
> blocks while it waits. This prevents deadlock situations should more threads
> call MultiTask#invoke_prerequisites.
>
> How then does MultiTask#invoke_prerequisites know when its prerequisites
> have finished? Before enqueuing the prerequisite blocks, it wraps each
> original prerequisite block in *another* block that records when the
> prerequisite has completed and which thread is working on it. That wrapper
> block is what is added to the queue.
>
> MultiTask#invoke_prerequisites returns in one of two cases:
>
>   1 - It notices that its prerequisites have all been processed.
>   2 - It notices there are no more blocks on the queue but its prerequisites
>       are not yet finished. In this case, it joins the threads that are still
>       executing its prerequisites and returns once they complete.
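
The mechanism described above (a shared queue, a bounded pool, bookkeeping
wrappers, and a caller that helps drain the queue) might be sketched roughly
as follows. This is not the code from the branch. The class name
BoundedRunner and its methods are hypothetical, prerequisites are assumed to
be plain callables, and two details are my own simplifications: the pool
holds max_threads - 1 workers so that the calling thread counts toward the
limit, and the caller waits on a ConditionVariable instead of joining the
specific threads still running its prerequisites.

```ruby
require 'thread'

# Hypothetical sketch of a bounded work queue in which the caller helps.
class BoundedRunner
  def initialize(max_threads)
    @max_threads = max_threads
    @queue = Queue.new        # thread-safe queue of wrapped blocks
    @threads = []
    @lock = Mutex.new
  end

  # Runs every job and returns only once all of them have finished.
  def run_all(jobs)
    pending = jobs.size
    done = ConditionVariable.new
    jobs.each do |job|
      # Bookkeeping wrapper: note completion even if the job raises.
      @queue << lambda do
        begin
          job.call
        ensure
          @lock.synchronize do
            pending -= 1
            done.broadcast if pending.zero?
          end
        end
      end
    end
    grow(jobs.size)
    # Take part in the work instead of sleeping, so nested calls cannot
    # starve the pool.
    until @lock.synchronize { pending.zero? }
      block = begin
        @queue.pop(true)      # non-blocking; raises ThreadError when empty
      rescue ThreadError
        nil
      end
      if block
        block.call
      else
        # Queue is empty but other threads still hold our jobs: wait.
        @lock.synchronize { done.wait(@lock) until pending.zero? }
      end
    end
  end

  private

  # Adds workers as needed, never exceeding max_threads - 1 in total.
  def grow(wanted)
    @lock.synchronize do
      room = @max_threads - 1 - @threads.size
      [wanted, room].min.times do
        @threads << Thread.new { loop { @queue.pop.call } }
      end
    end
  end
end
```

Calling BoundedRunner.new(4).run_all(jobs) from the main thread would then
run at most four jobs at a time, and a job that itself calls run_all on the
same runner keeps consuming queued blocks rather than blocking a worker.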
>
> What is attractive to me about this implementation is that the flow retains
> the simplicity of the original: MultiTask#invoke_prerequisites sends all the
> prerequisites to be executed then waits until they are done.
>
>
>
> THE CASE FOR -J
> ---------------
>
> (Quoted from the pull-request)
>
> PROBLEM SUMMARY:
>
> Rake can be unusable for builds invoking large numbers of concurrent
> external processes.
>
> PROBLEM DESCRIPTION:
>
> Rake makes it easy to maximize concurrency in builds with its "multitask"
> function. When using rake to build non-ruby projects, rake quite often needs
> to execute shell tasks to process files. Unfortunately, when executing
> multitasks, rake spawns a new thread for each task-prerequisite. This
> shouldn't cause problems when the build code is pure ruby (for green
> threads), but when the tasks are executing external processes, the sheer
> number of spawned processes can cause the machine to thrash. Additionally,
> ruby can reach the maximum number of open files (presumably because it's
> reading stdout for all those processes).
>
> SOLUTION SUMMARY:
>
> This request includes the code to add support for a "--jobs NUMBER (-j)"
> command-line option to specify the number of simultaneous tasks to execute.
>
> SOLUTION:
>
> The solution creates a work queue to which blocks calling the
> task-prerequisites are added and a thread pool to process them. To prevent
> deadlock, the task that added the prerequisites processes items on the
> queue (alongside the thread pool) until its prerequisites have been
> processed.
>
> To maintain backward compatibility, not passing -j reverts to the old
> behavior of unlimited concurrent tasks.
>
> REQUIREMENTS:
>
> The Ruby version requirements remain the same. "multi-task.rb" adds two new
> requirements: 'thread' and 'set'
>
>



-- 
Phusion | Ruby & Rails deployment, scaling and tuning solutions

Web: http://www.phusion.nl/
E-mail: info at phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)

