[Rake-devel] Parallel tasks with Rake

James M. Lawrence quixoticsycophant at gmail.com
Thu Sep 11 03:59:08 EDT 2008


On Thu, Sep 11, 2008 at 12:44 AM, Jim Weirich <jim.weirich at gmail.com> wrote:
> On Sep 4, 2008, at 2:33 AM, James M. Lawrence wrote:
>
> Hi James,
>
> Saw your announcement on ruby-talk and want to say good job on getting drake
> out.  Now that 0.8.2 is released, I've taken some time to look at some of
> what you've done.  It looks impressive.

Thanks.

> (1) Are you using Ruby threads or processes for the parallelism?

Threads.  CompTree (http://comptree.rubyforge.org) has an option to
fork nodes, but I haven't enabled it for Drake.  Since I expect -j to
be commonly used for compiling, forking would be redundant anyway.
Yet it's ready to go with the options [-k, --fork] in rake.rb, just
commented out.  Especially for the first release I didn't see a
compelling need to go there yet.  A single option -j felt nicer.

> (2) We should think about the semantics of the command "rake -j2 a b".
>  Are "a" and "b" executed in parallel or sequentially?  It looks like the
> code base goes with sequentially, and I think this is the right choice.  But
> it may be worth a discussion.

Yes, I intentionally decided not to put a and b under the same parent
node.  On the command line we especially think sequentially.
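
A minimal sketch of that behavior, not Drake's actual code: each
top-level target finishes before the next one starts, while the
prerequisites of a single target may run concurrently.  The method
name run_targets and the task names are hypothetical.

```ruby
# Hypothetical stand-in for "rake -j2 a b": targets run in the order given,
# and only the prerequisites of one target run in parallel with each other.
def run_targets(targets, prereqs)
  log = Queue.new # thread-safe record of the order in which work finished
  targets.each do |target|
    threads = prereqs.fetch(target, []).map do |dep|
      Thread.new { log << dep }
    end
    threads.each(&:join) # every prerequisite is done before the target runs
    log << target
  end
  Array.new(log.size) { log.pop }
end

order = run_targets(%w[a b], { "a" => %w[a1 a2], "b" => %w[b1 b2] })
puts order.inspect
```

In this sketch "a" always appears before "b1" and "b2", because "a"
and "b" never share a parent node.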

> (3) I see a lot of the files are marked "GENERATED -- DO NOT EDIT".
>  Generated from what?  Will I be able to regenerate them if they need
> changing?  Would it be better to just use CompTree as a gem?

The master files are under contrib/comp_tree.  rake pull_contrib gets
the latest comp_tree from github then repackages it under
Rake::CompTree in lib/rake/comp_tree.

I was never happy with this, btw.  I wanted Rake to contain a fork
of CompTree while at the same time not conflicting with a regular
installation of CompTree.  I experimented with dynamically evaling it
into Rake::CompTree, but my method failed to act transitively to the
sub-sub module Quix (utilities).

The other reason was that I thought having external dependencies might
give you pause about merging into the mainline.  But if you are OK
with the dependency, we could use the gem.

Also, at the time I thought the CompTree API might change enough to
cause hassles being an external dependency.  But since versioning
seems to work OK in rubygems, it's kind of a moot point.  And CompTree
may be stable enough after all.

> (4) As far as I can tell, when running with num_threads > 1, you invoke all
> the tasks and gather the task arguments.  Then you pass the task dependency
> graph off to the CompTree code to execute the code in parallel.  So all the
> code execution actually happens after ALL the invokes are done on the code,
> rather than intermingled as in standard rake.  Is my understanding correct?
>  (if so, very interesting ... I'm thinking that if it wasn't for the need
> for the task arguments, you could skip the invoke step and pass the
> dependency graph immediately to your CompTree package, yes?)

Yes, excellent detective work.  I wrestled with getting the task
arguments right on my own and finally gave up.  Actually I still don't
understand them -- they seemed to be context-dependent.  It was
amazing -- one unit test would fail and the other would succeed.  I
make a change, and now it's swapped!  The former succeeds and the
latter fails.
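
The two-phase flow described in the question can be sketched roughly
as follows.  This is an illustration under assumptions, not the
CompTree API: the names collect and execute are hypothetical, task
arguments are left out, and one thread per task stands in for a real
-j thread limit.

```ruby
# Phase 1 (single-threaded): walk the graph from the root and record every
# reachable task together with its prerequisites.
def collect(name, graph, plan = {})
  return plan if plan.key?(name)
  plan[name] = graph.fetch(name, [])
  plan[name].each { |dep| collect(dep, graph, plan) }
  plan
end

# Phase 2: one thread per task; each thread waits for its prerequisites and
# then runs its action. Threads are created only from the calling thread, so
# the memo hash needs no lock. Assumes the graph has no cycles.
def execute(root, plan, actions)
  threads = {}
  start = lambda do |name|
    threads[name] ||= begin
      deps = plan.fetch(name).map { |dep| start.call(dep) }
      Thread.new do
        deps.each(&:join)
        actions.fetch(name).call
      end
    end
  end
  start.call(root)
  threads.transform_values(&:value)
end

graph = { "app" => %w[a.o b.o], "a.o" => [], "b.o" => [] }
plan = collect("app", graph)
actions = plan.keys.to_h { |name| [name, -> { "built #{name}" }] }
puts execute("app", plan, actions).inspect
```

Nothing runs until the whole plan has been collected, which is the
separation between invoking and executing that the question points out.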

> (5) I see there is a synchronization lock in the invoke method.  Since this
> part of the code is executed by a single task (the main task), I'm not sure
> I see the need for a lock.  Am I missing something?

Calling invoke inside invoke seemed to be a problem, both practically
and theoretically.  Practically, the computation tree has already been
built, but now someone wants to build a new one with possibly
overlapping nodes.

On the theoretical side, I said in the readme: Parallelizing code
means surrendering control over the micro-management of its execution.
Manually invoking tasks inside other tasks is rather contrary to this
notion, throwing a monkey wrench into the system.

It seems the parallelizer cannot make good decisions if the user is
allowed to rearrange the furniture on a whim.  But I leave the door
open here -- I haven't fully considered invoke inside invoke.
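
As a rough illustration only, and not a description of what the lock
in Drake does: with a plain, non-reentrant Mutex around invoke, a task
that invokes another task from inside its own body is rejected rather
than silently starting a second run.  The class SketchTask is
hypothetical.

```ruby
# Hypothetical task class; Ruby's Mutex is not reentrant, so a nested call
# to invoke from the same thread raises ThreadError.
class SketchTask
  LOCK = Mutex.new

  def initialize(&body)
    @body = body
  end

  def invoke
    LOCK.synchronize { @body.call }
  end
end

inner = SketchTask.new { :inner_ran }
outer = SketchTask.new { inner.invoke } # invoke inside invoke

begin
  outer.invoke
rescue ThreadError => e
  puts "nested invoke rejected: #{e.message}"
end
```

A Monitor in place of the Mutex would allow the nested call on the
same thread, which is the door left open above.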

> (6) Have you tried running any of this under Ruby 1.9?

The CompTree unit tests will core dump 1.9.  But those unit tests
pound extremely hard on the system, running many threads with many
forks on large trees.  It crashes cygwin too.  On darwin I have to
catch EAGAIN errors, but the tests all succeed.  I designed it this
way, of course, to see if I could shake out any race conditions or
whatever multi-threaded problems that might exist.

CompTree might be OK in 1.9 for a Sunday drive.  I haven't tried it
with small Rakefile cases yet.

> That's all for now.  Again, thanks for the work you put into this.  I'll
> probably have more questions later.

And thanks for Rake.  It is a pleasure to be involved.

James M. Lawrence

