[Rake-devel] ANN: (yet another) version of -j implementation (includes -m for multitask)

Michael Bishop mbishop at me.com
Fri Apr 27 13:54:53 UTC 2012

Hello Everyone,

My last implementation of -j had a fatal flaw that could lead to a stack overflow with large numbers of tasks. I discovered this after prototyping a version with drake's feature of turning tasks into multitasks.

I have a new version with a rewritten MultiTask implementation which removes that flaw, is smaller, and fits easily into the Rake code. In addition, I've added an optional "--multitask" ("-m") flag to turn every task into a multitask (in direct homage to drake).


I've again added a pull-request for the inclusion of the change in the master branch (as always, allowing for further changes to match the style and simplicity of the original).


Questions and comments are, as always, welcome.


Michael Bishop



Rake can be unusable for builds invoking large numbers of concurrent external processes.


Rake makes it easy to maximize concurrency in builds with its "multitask" function. When rake is used to build non-Ruby projects, it quite often needs to execute shell commands to process files. Unfortunately, when executing multitasks, rake spawns a new thread for each task prerequisite. This shouldn't cause problems when the build code is pure Ruby (because of green threads), but when the tasks execute external processes, the sheer number of spawned processes can cause the machine to thrash. Additionally, Ruby can reach the maximum number of open files (presumably because it's reading stdout for all those processes).


This request includes the code to add support for a `--jobs NUMBER (-j)` command-line option to specify the number of simultaneous tasks to execute.

  * To maintain backward compatibility, not passing `-j` reverts to the old behavior of unlimited concurrent tasks.

As a nod to [drake](http://drake.rubyforge.org), a `--multitask (-m)` flag is also included which, when supplied, turns tasks into multitasks.
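To make the two flags concrete, here is a minimal sketch of how they could be parsed with Ruby's standard `OptionParser`. It is not the code from this pull request: `BuildOptions`, `parse_build_options`, and the `always_multitask` field are hypothetical names used only for illustration, and a `nil` pool size stands in for "no limit".

```ruby
require 'optparse'

# Hypothetical stand-in for Rake's option storage (names are assumptions).
BuildOptions = Struct.new(:thread_pool_size, :always_multitask)

# Parses -j/--jobs and -m/--multitask from an argument list.
def parse_build_options(argv)
  # nil models the backward-compatible default: no cap on concurrent tasks.
  options = BuildOptions.new(nil, false)

  parser = OptionParser.new do |opts|
    opts.on('-j', '--jobs NUMBER', Integer,
            'Maximum number of tasks to execute simultaneously') do |n|
      options.thread_pool_size = n
    end
    opts.on('-m', '--multitask', 'Treat every task as a multitask') do
      options.always_multitask = true
    end
  end

  parser.parse(argv)   # non-destructive: argv itself is left untouched
  options
end
```

With this sketch, `parse_build_options(%w[-j 4 -m])` yields a pool size of 4 with multitask mode on, while an empty argument list leaves the pool size at `nil`.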


Rather than spawning a new thread per prerequisite, `MultiTask` now sends its prerequisites to a `WorkerPool` object. `WorkerPool.new(n).execute_blocks` has the same semantics as `Thread.new`...`join` but caps the thread count at `n`.
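The "run everything, then join, but with at most `n` threads" idea can be sketched in a few lines. This is an illustrative helper only, not the `WorkerPool` from this pull request: `run_capped` is a hypothetical name, and it does not handle blocks that themselves submit more work.

```ruby
require 'thread'

# Hypothetical helper: runs every block and returns only when all have
# finished, using at most max_threads threads at any one time.
def run_capped(blocks, max_threads)
  queue = Queue.new
  blocks.each { |b| queue << b }

  worker_count = [max_threads, blocks.size].min
  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        block = begin
          queue.pop(true)      # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break                # nothing left to do, so this worker exits
        end
        block.call
      end
    end
  end

  workers.each(&:join)         # same blocking behavior as Thread.new...join
end

# Usage: ten short jobs, never more than three running at once.
jobs = Array.new(10) { lambda { sleep 0.01 } }
run_capped(jobs, 3)
```

A fixed set of workers pulling from a queue is the simplest way to get the cap; the price is that a block which waits on further blocks would occupy a worker while it sleeps.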

### Core Change

    # Old: one new thread per prerequisite
    threads = @prerequisites.collect { |p|
      Thread.new(p) { |r| application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    threads.each { |t| t.join }

    # New: prerequisites are handed to a shared, size-limited WorkerPool
    @@wp ||= WorkerPool.new(application.options.thread_pool_size)
    blocks = @prerequisites.collect { |r|
      lambda { application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    @@wp.execute_blocks blocks

To support `-m`, the `MultiTask` implementation has moved to `Task#invoke_prerequisites_concurrently` and is called from `MultiTask#invoke_prerequisites`. This enables concurrent behavior for `Task` when `-m` is used.
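The shape of that refactoring can be shown with toy classes. This is a sketch under assumptions, not the Rake source: the `Sketch` module, the `Options` struct, and the `always_multitask` field are hypothetical, prerequisites are plain callables, and the concurrent path uses bare threads where the pull request uses a `WorkerPool`.

```ruby
module Sketch
  # Hypothetical option holder; the field name is an assumption.
  Options = Struct.new(:always_multitask)

  class Task
    def initialize(options, prerequisites)
      @options = options
      @prerequisites = prerequisites   # callables standing in for tasks
    end

    # A plain task runs prerequisites in order unless -m was given.
    def invoke_prerequisites
      if @options.always_multitask
        invoke_prerequisites_concurrently
      else
        @prerequisites.each(&:call)
      end
    end

    # Shared concurrent implementation, now living on the base class.
    def invoke_prerequisites_concurrently
      @prerequisites.map { |p| Thread.new { p.call } }.each(&:join)
    end
  end

  class MultiTask < Task
    # A multitask is always concurrent, regardless of the flag.
    def invoke_prerequisites
      invoke_prerequisites_concurrently
    end
  end
end
```

Putting the concurrent method on the base class means the flag only has to choose between two code paths in one place.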

### Details

`WorkerPool#execute_blocks` adds the passed-in blocks to a queue, ensures there are enough threads to execute them (under the maximum), and sleeps the current thread until the blocks are processed.

This creates a few potential problems:

> What if all of the blocks then called `#execute_blocks`? Wouldn't that sleep all the threads?

Yes, it would. This is solved because `#execute_blocks` removes the current thread from the thread pool just before it sleeps and creates a new one in its place. When all the blocks are processed, the current thread is added back to the pool (adjusting for the max-size). There are always enough available threads in the thread pool for processing.

> When do the threads shut down?
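The "step out of the pool before sleeping" idea can be sketched as follows. This is a simplified illustration, not the `WorkerPool` in this pull request: `TinyPool` and its internals are hypothetical, idle workers simply exit when the queue is empty, a returning thread waits for a free slot before continuing, and error handling for blocks that raise is omitted.

```ruby
require 'thread'

# Hypothetical pool illustrating nested execute_blocks calls.
class TinyPool
  def initialize(max)
    @max = max
    @lock = Mutex.new
    @slot_free = ConditionVariable.new
    @queue = []      # pending [block, group] pairs
    @active = 0      # workers currently holding one of the @max slots
  end

  def execute_blocks(blocks)
    return if blocks.empty?
    group = { left: blocks.size, done: ConditionVariable.new }
    in_pool = Thread.current[:tiny_pool].equal?(self)
    @lock.synchronize do
      blocks.each { |b| @queue << [b, group] }
      release_slot if in_pool          # leave the pool before sleeping
      spawn_workers                    # a replacement can take the slot
      group[:done].wait(@lock) while group[:left] > 0
      if in_pool                       # rejoin, respecting the maximum
        @slot_free.wait(@lock) while @active >= @max
        @active += 1
      end
    end
  end

  private

  # Callers of the next two methods must already hold @lock.
  def release_slot
    @active -= 1
    @slot_free.signal
  end

  def spawn_workers
    [@max - @active, @queue.size].min.times do
      @active += 1
      Thread.new { work }
    end
  end

  def work
    Thread.current[:tiny_pool] = self
    loop do
      block, group = @lock.synchronize do
        if @queue.empty?
          release_slot                 # idle workers give up their slot
          nil
        else
          @queue.shift
        end
      end
      break if block.nil?
      block.call
      @lock.synchronize do
        group[:left] -= 1
        group[:done].signal if group[:left].zero?
      end
    end
  end
end

# Usage: every outer block submits more work to the same pool.
pool = TinyPool.new(2)
outer = Array.new(4) { lambda { pool.execute_blocks(Array.new(3) { lambda { sleep 0.005 } }) } }
pool.execute_blocks(outer)
```

Because a sleeping caller gives up its slot, even a pool of size one can finish nested submissions instead of deadlocking.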

`WorkerPool#execute_blocks` knows how many threads are waiting for their blocks to be processed. If, upon its awakening, it notices there are no threads waiting on blocks, it shuts down the thread pool.

### Statistics

     ---LINES--     ----LOC---
      old   new      old   new   File Name
     ----------     ----------   ----------
      598   605      477   484   lib/rake/application.rb
       16    13       11     8   lib/rake/multi_task.rb
      327   341      210   222   lib/rake/task.rb
            111             80   lib/rake/worker_pool.rb
     4264  4393     2696  2792   TOTAL
           +129            +96   SUMMARY


Tests are included for all new functionality.


The Ruby version requirements remain the same. `lib/rake/worker_pool.rb` adds two new requirements: `thread` and `set`.

