[Rake-devel] Parallel tasks with Rake

James M. Lawrence quixoticsycophant at gmail.com
Thu Sep 11 14:49:18 EDT 2008

Though it would be a privilege to see my code merged into Rake, of
course a case would have to be made for it.  What are the problems it
solves?  I am not a good person to make the case, as I have no
experience with non-trivial use of 'multitask'.  But fortunately
others here do.

After all, Drake may be a totally misguided project.  That would be OK
with me, as my primary interest was in CompTree.  The implementation
of Drake is trivial, given CompTree.

Here are Ittay's points from a previous thread:

> 1. If some top level tasks run in parallel, and each of them
> recursively runs other tasks, and one of the bottom tasks fails, it
> is impossible to stop the other tasks, short of a very ugly abort of
> all threads.
> 2. Tasks that run in parallel can't tell when another task's execution
> has failed. They may read a wrong timestamp from the failed task.

CompTree uses Thread.abort_on_exception = true.  If something goes
wrong, why shouldn't we abort all threads?  I don't yet see the issue.

That said, CompTree does give us an option.  A Rake task is translated
into a computation node which discards its result.  But we don't have
to discard the result.  We could wrap the task inside begin/rescue,
returning the Exception instance as the result.  CompTree 'computes'
the exception.

  # top-level
  result = driver.compute(root_node, :num_threads => n)
  # a failed task comes back as its Exception instance; re-raise it here
  if result.is_a? Exception
    raise result
  end
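
A minimal sketch of the "compute the exception" idea in plain Ruby.
The helper names capture_failure and run_node are hypothetical and are
not part of CompTree or Rake; the sketch only assumes that a node can
see the results of its children.

```ruby
# Hypothetical helper: run a task body and hand back any exception as
# an ordinary return value instead of letting it escape the thread.
def capture_failure
  yield
  nil                        # a successful task still discards its result
rescue StandardError => e
  e
end

# Hypothetical node wrapper: if any child already failed, pass that
# failure upward and skip this node's own action.
def run_node(child_results)
  failure = child_results.find { |r| r.is_a?(Exception) }
  return failure if failure
  capture_failure { yield }
end

ran    = []
ok     = run_node([]) { ran << :compile }
failed = run_node([]) { raise "link failed" }
top    = run_node([ok, failed]) { ran << :package }

# The top level is where the failure would be re-raised; here it is
# only reported so the example runs to completion.
puts "build failed: #{top.message}" if top.is_a?(Exception)
```

With this shape no thread has to be aborted: the failure travels up the
tree as data, parents of a failed node do no work, and unrelated
branches finish normally.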

> 3. Threads are created per prerequisite task, rather than a fixed number
> (based initially on the number of cores/cpus), which causes thrashing.

CompTree uses a fixed number of threads.

> 4. Even if a thread pool will be utilized, dependency information is
> still hard to take into account. Imagine a task has 2 prerequisites,
> where one depends on the other. Adding the tasks into a thread pool,
> they may be invoked in two different threads, but one waits on the
> other so the thread is not utilized. Maybe add a "distance" method
> which calculates how far one task is from the other in dependency
> (adding nil if not dependent), so when adding tasks to a queue, they
> are added to the queue where the current tasks have the minimal
> distance (nil being infinite).

In CompTree, whenever a node finishes its computation, the tree is
scanned for nodes waiting to be computed.  Available nodes are handed
out to the available threads.  Those threads which didn't get a node
are put to sleep.  No soup for you, and go to bed.  Repeat.

Thus CompTree operates at "max capacity" at all times.  Given N
threads, if at any time N computations are not running, it is because
the graph topology demands it (child node results are not
available).  In short, I believe it does what you want.
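
The scheduling described above can be sketched in plain Ruby.  This is
not CompTree's code: run_graph is a hypothetical name, and instead of
rescanning the tree it keeps a count of unfinished children per node,
which gives the same "hand out whatever is ready" behavior.  It assumes
every node appears as a key in the hash, and it omits error handling.

```ruby
# deps maps each node name to the names of its children (prerequisites).
# The block is called once per node, after all of its children finished.
def run_graph(deps, num_threads)
  return if deps.empty?

  waiting = {}                               # node => unfinished children
  parents = Hash.new { |h, k| h[k] = [] }    # node => nodes that need it
  deps.each do |node, children|
    waiting[node] = children.size
    children.each { |c| parents[c] << node }
  end

  ready     = Queue.new
  lock      = Mutex.new
  remaining = deps.size
  waiting.each { |node, n| ready << node if n.zero? }

  workers = Array.new(num_threads) do
    Thread.new do
      # pop puts the thread to sleep while nothing is ready;
      # nil is the signal that the whole graph is done
      while (node = ready.pop)
        yield node
        lock.synchronize do
          parents[node].each do |parent|
            waiting[parent] -= 1
            ready << parent if waiting[parent].zero?
          end
          remaining -= 1
          num_threads.times { ready << nil } if remaining.zero?
        end
      end
    end
  end
  workers.each(&:join)
end

deps  = { :a => [], :b => [:a], :c => [:a], :d => [:b, :c] }
order = []
order_lock = Mutex.new
run_graph(deps, 2) { |node| order_lock.synchronize { order << node } }
```

Here :b and :c can run at the same time once :a is done, and a thread
only sleeps when the graph has nothing ready for it.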

As you can see, I can't help myself from using graph terminology, as
my contact with Rake has been rather superficial.  A quick Rosetta stone:

  node           <-->  task
  child          <-->  prerequisite
  parent         <-->  ?? what do you call this ??
  compute        <-->  invoke
  function       <-->  @actions.each { |act| act.call }
  function.call  <-->  execute
  result         <-->  N/A
  node.name      <-->  task.name.to_sym

It is not clear whether these issues would be solved better within
Rake itself, or whether CompTree should be used to solve them.  What
are the other issues, and how would you solve them?

James M. Lawrence
