[Rake-devel] Parallel tasks with Rake

Ittay Dror ittay.dror at gmail.com
Thu Sep 11 15:33:55 EDT 2008

James M. Lawrence wrote:
> Though it would be a privilege to see my code merged into Rake, of
> course a case would have to be made for it.  What are the problems it
> solves?  I am not a good person to make the case, as I have no
> experience with non-trivial use of 'multitask'.  But fortunately
> others here do.
1. when compiling c/c++ code, usually a compiler is launched per source 
file. each runs separately from the other, so they can be parallelized. 
as c/c++ compilations are usually slower than java (for example), and 
since multi-core servers are common these days, this is an important 
2. multi-module builds can benefit in the same way, so a project with 2 
java modules can run their build in parallel.
3. it makes rake more compatible with make, which helps non-ruby people 
(c/c++ people) choose it.
4. multitask does not do a good job for two reasons:
   a. it cannot be controlled from outside (no equivalent to -j), so on 
1/2/4 cpus you get the same amount of threads (depending on the size of 
   b. it runs a thread for each prerequisite, so it can thrash the 
system, for example running 30 threads on 2 cpus (again, the main use 
case here being c/c++ with 30 source files)
5. alternatives that give better control, but are still implementations 
of task, fall short since you eventually lose control of the number of 
threads. i have a JobTask implementation that can be configured with 
how many jobs to run. now imagine i use it to compile modules in 
parallel, and i have 2 top modules each with 2 sub modules: i end up 
with 3 job tasks (top and 2 sub) each running 2 threads, so 6 threads, 
which again brings us to thrashing the system. it is hard to enforce an 
overall limit of only x threads. (currently i use my job task 
for only the "leaf" prerequisites).
6. both multitask and jobtask suffer from the fact that the threads are 
independent. usually, if there's a compilation error, i want to stop the 
build and signal the error to the user. but when there are several 
independent threads, it is hard to keep track of them all (think several 
multi-task/job-task) so as to stop them (cleanly) when one fails
   - btw, this is possible with drake, right? so if one thread fails, 
the others are signaled to quit when their current execution block 
finishes (cleanly)
7. current rake relies on recursion to execute tasks. for a large build 
this may create a deep recursion that exhausts ruby's execution 
stack (for a 20-deep execution, the ruby stack depth was ~800, which 
caused segfaults in linux even when the stack size was unlimited)
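
To make points 4 and 6 concrete, here is a minimal sketch of what a "-j"-style limit with a clean stop on failure could look like. It is not Rake's, Drake's, or my JobTask's actual code; `run_jobs` and its arguments are hypothetical names, and the jobs are assumed to be plain callables that raise on failure.

```ruby
# Hypothetical helper, not part of Rake or Drake: run `jobs` (an array of
# callables) on at most `num_threads` worker threads.
def run_jobs(jobs, num_threads)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close                       # pop returns nil once the queue is drained

  failure = nil
  lock = Mutex.new

  workers = Array.new(num_threads) do
    Thread.new do
      while (job = queue.pop)
        # Stop picking up new work once any job has failed; a job that is
        # already running is allowed to finish.
        break if lock.synchronize { failure }
        begin
          job.call
        rescue StandardError => e
          lock.synchronize { failure ||= e }
        end
      end
    end
  end

  workers.each(&:join)
  raise failure if failure          # report the first error to the caller
end

# Usage: 30 "compilations" on 2 threads, no matter how many sources exist.
sources = (1..30).map { |i| "file#{i}.c" }
run_jobs(sources.map { |src| -> { src.sub(/\.c\z/, ".o") } }, 2)
```

The thread count is set in one place by the caller, so 30 source files on 2 cpus still means 2 threads. This only works if every parallel task shares the same pool, which is exactly what is hard to do from inside a task implementation.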

> After all, Drake may be a totally misguided project.  That would be OK
> with me, as my primary interest was in CompTree.  The implementation
> of Drake is trivial, given CompTree.
> Here are Ittay's points from a previous thread,
>> 1. If some top level tasks run in parallel, and each of them
>> recursively runs other tasks, and one of the bottom tasks fail, it
>> is impossible to stop the other tasks, short of a very ugly abort of
>> all threads.
>> 2. Tasks that run in parallel can't tell when another task's execution
>> has failed. They may read a wrong timestamp from the failed task.
> CompTree uses Thread.abort_on_exception = true.  If something goes
> wrong, why shouldn't we abort all threads?  I don't yet see the issue
> here.
> That said, CompTree does give us an option.  A Rake task is translated
> into a computation node which discards its result.  But we don't have
> to discard the result.  We could wrap the task inside begin/rescue,
> returning the Exception instance as the result.  CompTree 'computes'
> the exception.
>   # top-level
>   result = driver.compute(root_node, :num_threads => n)
>   if result and result.is_a? Exception
>     raise result
>   end
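
A self-contained sketch of the "compute the exception" idea, as I understand it. It does not use CompTree: plain threads stand in for `driver.compute`, and `capture_failure` and `run_all` are hypothetical names introduced only for illustration.

```ruby
# Hypothetical wrapper: run a task body and return the exception as a value
# instead of raising it. A successful task "discards its result" (nil).
def capture_failure
  yield
  nil
rescue StandardError => e
  e
end

# Stand-in for the top level: run every body in its own thread, collect the
# results, and re-raise the first failure only after all threads finished.
def run_all(task_bodies)
  threads = task_bodies.map { |body| Thread.new { capture_failure(&body) } }
  results = threads.map(&:value)    # Thread#value joins and returns the result
  failure = results.find { |r| r.is_a?(Exception) }
  raise failure if failure
end
```

Since the exception travels as an ordinary result, no thread is aborted mid-task. The failure surfaces once, at the top level.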
>> 3. Threads are created per prerequisite task, rather than a fixed number
>> (based initially on the number of cores/cpus), which causes thrashing
> CompTree uses a fixed number of threads.
>> 4. Even if a thread pool will be utilized, dependency information is
>> still hard to take into account. Imagine a task has 2 prerequisites,
>> where one depends on the other. Adding the tasks into a thread pool,
>> they may be invoked in two different threads, but one waits on the
>> other so the thread is not utilized. Maybe add a "distance" method
>> which calculates how far one task is from the other in dependency
>> (adding nil if not dependent), so when adding tasks to a queue, they
>> are added to the queue where the current tasks have the minimal
>> distance (nil being infinite).
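
One possible reading of the "distance" method, sketched as a breadth-first search over prerequisites. The `prereqs` hash and the `distance` signature are assumptions for illustration. This version is directional: it only follows edges from a task down to its prerequisites.

```ruby
# prereqs maps each task name to the list of its direct prerequisites.
# Returns the number of prerequisite edges from `from` down to `to`,
# or nil when `to` is not reachable from `from` (the "infinite" case).
def distance(prereqs, from, to)
  return 0 if from == to
  seen = { from => 0 }
  queue = [from]
  until queue.empty?
    task = queue.shift
    prereqs.fetch(task, []).each do |dep|
      next if seen.key?(dep)
      seen[dep] = seen[task] + 1
      return seen[dep] if dep == to
      queue << dep
    end
  end
  nil
end
```

For example, with `{ link: [:obj], obj: [:src], src: [], docs: [] }` the distance from `:link` to `:src` is 2, and from `:link` to `:docs` it is nil.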
> In CompTree, whenever a node finishes its computation, the tree is
> scanned for nodes waiting to be computed.  Available nodes are handed
> out to the available threads.  Those threads which didn't get a node
> are put to sleep.  No soup for you, and go to bed.  Repeat.
> Thus CompTree operates at "max capacity" at all times.  Given N
> threads, if at any time N computations are not running, it is because
> the graph topology demands it (child node results are not
> available).  In short, I believe it does what you want.
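
A rough sketch of the scheduling described above, so we are sure we mean the same thing. This is not CompTree's code. `compute_graph` is a hypothetical name, the graph is assumed to be acyclic, and every child is assumed to appear as a key.

```ruby
# children maps each node to its child nodes (prerequisites). A node becomes
# ready when all of its children have finished; ready nodes are handed to a
# fixed number of threads, and idle threads sleep inside Queue#pop.
def compute_graph(children, num_threads, &action)
  pending = children.transform_values(&:dup)   # unfinished children per node
  ready = Queue.new
  lock = Mutex.new
  finished = []
  failure = nil

  pending.each { |node, kids| ready << node if kids.empty? }
  ready.close if pending.empty?

  workers = Array.new(num_threads) do
    Thread.new do
      while (node = ready.pop)                 # nil once the queue is closed
        begin
          action.call(node)
        rescue StandardError => e
          # First failure wins: drop queued work and wake sleeping threads.
          lock.synchronize { failure ||= e; ready.clear; ready.close }
          break
        end
        lock.synchronize do
          next if ready.closed?                # another node already failed
          finished << node
          pending.each do |parent, kids|
            ready << parent if kids.delete(node) && kids.empty?
          end
          ready.close if finished.size == pending.size
        end
      end
    end
  end

  workers.each(&:join)
  raise failure if failure
  finished                                     # nodes in completion order
end
```

With `{ app: [:liba, :libb], liba: [:core], libb: [:core], core: [] }` and 2 threads, `:core` finishes first, `:liba` and `:libb` can run side by side, and `:app` finishes last. A thread is only idle when nothing is ready.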
> As you can see, I can't help myself from using graph terminology, as
> my contact with Rake has been rather superficial.  A quick rosetta
> stone:
>   node           <-->  task
>   child          <-->  prerequisite
>   parent         <-->  ?? what do you call this ??
>   compute        <-->  invoke
>   function       <-->  @actions.each { |act| act.call }
>   function.call  <-->  execute
>   result         <-->  N/A
>   node.name      <-->  task.name.to_sym
> It is not clear whether these issues would be solved better within
> Rake itself, or whether CompTree should be used to solve them.  What
> are the other issues, and how would you solve them?
> James M. Lawrence

Ittay Dror <ittay.dror at gmail.com>
