From mbishop at me.com  Thu Apr 19 11:36:37 2012
From: mbishop at me.com (Michael Bishop)
Date: Thu, 19 Apr 2012 07:36:37 -0400
Subject: [Rake-devel] Announcement: Alternate -j implementation
Message-ID: <10955079-8F1F-45D2-8349-65DE9238C276@me.com>

Hello Everyone,

I've recently finished an implementation of -j and would be delighted if the members of this list would take a detailed look at it.

It's in this rake branch on GitHub:

  https://github.com/michaeljbishop/rake/commit/295c7a4d6d58b3e10c27b940a8259dd3e01c52f0
  or
  http://bit.ly/JMVARy

In addition, I've added a pull request for the inclusion of the change in the master branch (allowing, of course, for further changes to match the style and simplicity of the original):

  https://github.com/jimweirich/rake/pull/112

My apologies if this has been hashed out many times on this list. I'm hoping to offer a fresh look at the problem. Thank you for your consideration.

Sincerely,

Michael Bishop

---

SUMMARY
-------

USER INTERFACE:

The user interface is simply a new -j flag that specifies the maximum number of tasks that can execute simultaneously (as I've seen, this has been discussed here before). If the -j flag is omitted, the old behavior of unlimited concurrent tasks is retained.

IMPLEMENTATION:

The implementation is inspired by Apple's Grand Central Dispatch model. The core of the problem is solved via changes to the file "multi_task.rb".

The current implementation of MultiTask#invoke_prerequisites spawns a new thread for each prerequisite and then waits until all the threads have completed before returning.

In the alternate implementation, MultiTask#invoke_prerequisites creates a list of blocks, each of which invokes one prerequisite. Each block is then added to a thread-safe Queue for processing. A class-level ThreadPool is then expanded to include enough threads to consume and call the blocks, but only up to the limit passed to -j.
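A minimal Ruby sketch of the bounded-pool idea, for illustration only: `BoundedPool`, `post`, and `shutdown` are hypothetical names, not the code in the branch. Blocks go onto one thread-safe `Queue`, and a worker thread is added only while fewer than `limit` exist, so `limit` plays the role of the -j value. It deliberately leaves out the part of the design where the waiting caller also processes queued blocks.

```ruby
require "thread"

# Hypothetical sketch (not Rake's API): a shared Queue of blocks consumed by
# at most `limit` worker threads, where `limit` plays the role of -j.
class BoundedPool
  def initialize(limit)
    @limit   = limit
    @queue   = Queue.new
    @workers = []
    @lock    = Mutex.new
  end

  # Enqueue a block, adding a worker only while we are under the limit.
  def post(&block)
    @queue << block
    @lock.synchronize do
      @workers << Thread.new { run_jobs } if @workers.size < @limit
    end
  end

  # Queue one :stop marker per worker behind the real jobs, then wait.
  def shutdown
    workers = @lock.synchronize { @workers.dup }
    workers.size.times { @queue << :stop }
    workers.each(&:join)
  end

  private

  # Each worker runs jobs in FIFO order until it pops a :stop marker.
  def run_jobs
    while (job = @queue.pop) != :stop
      job.call
    end
  end
end

# Usage: ten jobs, at most three running at any moment.
stats = { running: 0, peak: 0, done: 0 }
stats_lock = Mutex.new
pool = BoundedPool.new(3)
10.times do
  pool.post do
    stats_lock.synchronize do
      stats[:running] += 1
      stats[:peak] = [stats[:peak], stats[:running]].max
    end
    sleep 0.01 # stand-in for an external process
    stats_lock.synchronize do
      stats[:running] -= 1
      stats[:done] += 1
    end
  end
end
pool.shutdown
puts stats[:peak] # never more than 3
```

Because only three worker threads ever exist, the recorded peak cannot exceed three, however many jobs are posted.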

Then, rather than sleeping by joining the newly spawned threads, MultiTask#invoke_prerequisites participates in processing the queued blocks while it waits. This prevents deadlock should more threads call MultiTask#invoke_prerequisites.

How, then, does MultiTask#invoke_prerequisites know when its prerequisites have finished? Before enqueuing the prerequisite blocks, it wraps each original prerequisite block in *another* block that does the bookkeeping: it records when the prerequisite is completed and which thread is working on it. That wrapper block is what is added to the queue.

MultiTask#invoke_prerequisites can return in one of two situations:

  1 - It notices that its prerequisites have all been processed.
  2 - It notices there are no more blocks on the queue but its prerequisites are not yet finished. In this case, it joins the threads that are still executing its prerequisites.

What is attractive to me about this implementation is that the flow retains the simplicity of the original: MultiTask#invoke_prerequisites sends all the prerequisites off to be executed, then waits until they are done.


THE CASE FOR -J
---------------

(Quoted from the pull request)

PROBLEM SUMMARY:

Rake can be unusable for builds invoking large numbers of concurrent external processes.

PROBLEM DESCRIPTION:

Rake makes it easy to maximize concurrency in builds with its "multitask" function. When rake is used to build non-Ruby projects, it quite often needs to execute shell tasks to process files. Unfortunately, when executing multitasks, rake spawns a new thread for each task prerequisite. This shouldn't cause problems when the build code is pure Ruby (for green threads), but when the tasks are executing external processes, the sheer number of spawned processes can cause the machine to thrash. Additionally, Ruby can reach the maximum number of open files (presumably because it's reading stdout for all those processes).
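The thread-per-prerequisite pattern described above is easy to reproduce in plain Ruby. In this sketch the lambdas are hypothetical stand-ins for prerequisites and the `sleep` stands in for an external process. Nothing bounds how many run at once, so the measured peak concurrency tracks the number of prerequisites, which is what becomes a problem when each one is a real process with open pipes.

```ruby
require "thread"

# Thread-per-prerequisite: start everything, then join it all.
def invoke_all_unbounded(prerequisites)
  threads = prerequisites.map { |pre| Thread.new(pre) { |job| job.call } }
  threads.each(&:join)
end

lock = Mutex.new
running = 0
peak = 0

# Fifty hypothetical prerequisites; the sleep stands in for an external process.
prerequisites = Array.new(50) do
  lambda do
    lock.synchronize do
      running += 1
      peak = [peak, running].max
    end
    sleep 0.05
    lock.synchronize { running -= 1 }
  end
end

invoke_all_unbounded(prerequisites)
puts peak # usually 50: every prerequisite is in flight at the same time
```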

SOLUTION SUMMARY:

This request includes the code to add support for a "--jobs NUMBER (-j)" command-line option to specify the number of simultaneous tasks to execute.

SOLUTION:

The solution creates a work queue, to which blocks calling the task prerequisites are added, and a thread pool to process them. To prevent deadlock, the task that added the prerequisites processes items on the queue (alongside the thread pool) until its prerequisites have been processed.

To maintain backward compatibility, not passing -j reverts to the old behavior of unlimited concurrent tasks.

REQUIREMENTS:

The Ruby version requirements remain the same. "multi_task.rb" adds two new requirements: 'thread' and 'set'.

From hongli at phusion.nl  Thu Apr 19 14:34:08 2012
From: hongli at phusion.nl (Hongli Lai)
Date: Thu, 19 Apr 2012 16:34:08 +0200
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To: <10955079-8F1F-45D2-8349-65DE9238C276@me.com>
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com>
Message-ID:

Have you also looked at http://drake.rubyforge.org? How does your implementation differ from his?

On Thu, Apr 19, 2012 at 1:36 PM, Michael Bishop wrote:
> _______________________________________________
> Rake-devel mailing list
> Rake-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/rake-devel

-- 
Phusion | Ruby & Rails deployment, scaling and tuning solutions

Web: http://www.phusion.nl/
E-mail: info at phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)

From mbishop at me.com  Thu Apr 19 15:48:03 2012
From: mbishop at me.com (Michael Bishop)
Date: Thu, 19 Apr 2012 11:48:03 -0400
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To:
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com>
Message-ID: <56811457-79E9-4CCA-A6E7-D097042703B8@me.com>

Hi Hongli Lai,

This is a good question. I only learned of the Drake project last night, so I haven't had much time to investigate its implementation. My initial analysis, based on the Drake official pull-request announcement, indicates a few differences, but please add corrections if I am misinformed. I might be.

Drake links:

  https://github.com/quix/rake/tree/mainline-merge-candidate#readme
  http://quix.github.com/rake/files/doc/parallel_rdoc.html

For the purposes of discussion, I'll refer to my submitted rake as "mjbrake" here.

Dependency differences:
-----------------------

Drake relies on an external gem named comp_tree for parallelism. mjbrake relies on no library; instead, like the original rake, it leverages the stack to make sure all the prerequisites are executed in order.

Code size differences:
----------------------

GitHub shows the change for mjbrake as "Showing 3 changed files with 74 additions and 7 deletions." Drake shows 79 lines (+257 lines of code for comp_tree). It's quite possible this is due to drake having a larger feature set; mjbrake is focused on changing multitask to limit the total number of concurrent tasks.

Implementation differences:
---------------------------

It's unclear to me, but it seems that Drake doesn't ask the rakefile author to specify what can be parallelized; instead, it asks that the dependencies be correct and figures out the parallelism dynamically. If this is true, I suspect the transition to a parallelized build might be gentler for users if they can build a rakefile starting with 'task' and migrate to 'multitask', as in the original rake. On the other hand, there is something elegant about not having to specify 'multitask' at all.

_ michael

From jim.weirich at gmail.com  Thu Apr 19 15:48:04 2012
From: jim.weirich at gmail.com (Jim Weirich)
Date: Thu, 19 Apr 2012 11:48:04 -0400
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To:
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com>
Message-ID: <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>

On Apr 19, 2012, at 10:34 AM, Hongli Lai wrote:

> Have you also looked at http://drake.rubyforge.org? How does your
> implementation differ from his?

Michael can correct me if I'm wrong, but the -j option just limits the number of concurrent tasks used to service the "multitask" task targets. Multitask is an existing rake facility that the -j option makes more usable.

Drake actually changes the semantics of the regular rake tasks so that they run in parallel.

-- 
-- Jim Weirich
-- jim.weirich at gmail.com

From jos at catnook.com  Thu Apr 19 16:21:54 2012
From: jos at catnook.com (Jos Backus)
Date: Thu, 19 Apr 2012 09:21:54 -0700
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To: <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com> <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
Message-ID:

To be honest, I prefer the drake semantics as they are closer to make's.

I would love to see rake and drake merged.

Jos
-- 
Jos Backus
jos at catnook.com

From mbishop at me.com  Thu Apr 19 16:28:24 2012
From: mbishop at me.com (Michael Bishop)
Date: Thu, 19 Apr 2012 12:28:24 -0400
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To: <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com> <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
Message-ID: <1DC7EBD8-6E5A-47C9-9980-3EEBF4CF6A9A@me.com>

On Apr 19, 2012, at 11:48 AM, Jim Weirich wrote:

>
> On Apr 19, 2012, at 10:34 AM, Hongli Lai wrote:
>
>> Have you also looked at http://drake.rubyforge.org? How does your
>> implementation differ from his?
>
> Michael can correct me if I'm wrong, but the -j option just limits the number of concurrent tasks used to service the "multitask" task targets. Multitask is an existing rake facility that the -j option makes more usable.

That is exactly right. It only affects MultiTask.

Some additional interesting tidbits:

There is a queue of blocks, a class member of MultiTask, that all MultiTask instances use to queue their prerequisites. So your thread's call to MultiTask#invoke_prerequisites may end up processing prerequisites for another thread's call to MultiTask#invoke_prerequisites. No matter, though: your call won't return until all your prerequisites are completed.

>
> Drake actually changes the semantics of the regular rake tasks so that they run in parallel.

Thank you for that clarification. I'd only assumed that, given my rather cursory look, and it's nice to have it confirmed.
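The shared-queue behavior described above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the branch's code: `HelpingQueue` and `run_all` are hypothetical names, and where the original description says the caller joins the threads still running its prerequisites, this sketch waits on a `ConditionVariable` instead. Each call wraps its blocks with a completion counter, keeps popping and running whatever is on the shared queue (possibly another caller's block) while its own count is non-zero, and only sleeps once the queue is empty. Because helping runs blocks on the waiting thread's own stack, deeply nested calls make that stack grow.

```ruby
require "thread"

# Hypothetical sketch: one queue shared by all callers, a few background
# workers, and callers that help drain the queue instead of just sleeping.
class HelpingQueue
  def initialize(worker_count)
    @queue = Queue.new
    worker_count.times do
      Thread.new { loop { @queue.pop.call } }
    end
  end

  # Runs `blocks` and returns once all of them have finished. While waiting,
  # the calling thread may run blocks that other callers enqueued.
  def run_all(blocks)
    lock = Mutex.new
    finished = ConditionVariable.new
    pending = blocks.size

    # Wrap each block with bookkeeping that counts it off when it completes.
    blocks.each do |block|
      wrapped = lambda do
        begin
          block.call
        ensure
          lock.synchronize do
            pending -= 1
            finished.broadcast
          end
        end
      end
      @queue << wrapped
    end

    until lock.synchronize { pending.zero? }
      job = try_pop
      if job
        job.call # help out; this may be someone else's block
      else
        # Queue is empty, so all our blocks were already taken by other threads.
        lock.synchronize { finished.wait(lock) until pending.zero? }
      end
    end
  end

  private

  # Non-blocking pop: Queue#pop(true) raises ThreadError when the queue is empty.
  def try_pop
    @queue.pop(true)
  rescue ThreadError
    nil
  end
end

# Usage: nested run_all calls with a single background worker still finish.
helper = HelpingQueue.new(1)
log = Queue.new
outer = %w[a b c].map do |name|
  lambda do
    inner = [1, 2].map { |i| lambda { log << "#{name}#{i}" } }
    helper.run_all(inner) # nested call from inside a queued block
    log << name
  end
end
helper.run_all(outer)
puts log.size # prints 9
```

With only one background worker, the nested calls still complete because every waiting thread keeps draining the shared queue until its own blocks have been taken. The sketch has no error handling: an exception in a block propagates on whichever thread happened to run it.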

_ michael

From watsonmw at gmail.com  Thu Apr 19 17:03:21 2012
From: watsonmw at gmail.com (Mark Watson)
Date: Thu, 19 Apr 2012 18:03:21 +0100
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To:
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com> <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
Message-ID:

> To be honest, I prefer the drake semantics as they are closer to make's.

Second that.
The option to specify parallel builds for any task is handy because the task may not derive from multitask; e.g., it could be a file task to build a C library from multiple source files that you want to build in parallel.

Personally, I've used my own patch to rake to do this. It predates drake, but I still use it because drake has an extra dependency.

https://github.com/watsonmw/rakecpp/blob/master/minusj/minusj.rb

From hongli at phusion.nl  Thu Apr 19 21:40:18 2012
From: hongli at phusion.nl (Hongli Lai)
Date: Thu, 19 Apr 2012 23:40:18 +0200
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To:
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com> <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
Message-ID:

I agree. I would really love to see a 'make -j' equivalent in Rake.

From smparkes at smparkes.net  Fri Apr 20 02:25:38 2012
From: smparkes at smparkes.net (Steven Parkes)
Date: Thu, 19 Apr 2012 19:25:38 -0700
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To:
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com> <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
Message-ID:

I went through this just this week, after getting tired of some C++ builds using only one core. I first looked at multitask but got stopped by the file dependence issue. Then I did some wider searching and stumbled across drake. I've been happy ever since.

I used make a lot in the past and always made the effort to make all the dependencies explicit (rather than implicit through dependency-list order) so that make -j would work. But a lot of Makefiles don't do this and break with make -j, just as many rakefiles would.

I can see this being a bigger issue with rake, since its expressiveness makes it a lot easier to write things that have side effects and won't parallelize.

(Just as I see people write rakefiles that don't converge, i.e., that rebuild artifacts because they haven't described the dependencies correctly.)

But in any case, for my part, with (d)rake -j, I don't go back to make anymore at all, which makes my world a little nicer.

From mbishop at me.com  Fri Apr 20 08:51:17 2012
From: mbishop at me.com (Michael Bishop)
Date: Fri, 20 Apr 2012 04:51:17 -0400
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To: <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com> <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
Message-ID:

Quick question for the list:

After reading more about drake, I wanted to summarize the differences for my own understanding.

Differences in drake from rake:

  1 - All tasks are implicitly multitask
  2 - There is a cap on the number of concurrent tasks
  3 - There is a control to create a repeatable single-threaded random invocation of tasks (--randomize[=SEED])

Is this correct?

_ michael

From watsonmw at gmail.com  Fri Apr 20 21:33:40 2012
From: watsonmw at gmail.com (Mark Watson)
Date: Fri, 20 Apr 2012 22:33:40 +0100
Subject: [Rake-devel] Announcement: Alternate -j implementation
In-Reply-To:
References: <10955079-8F1F-45D2-8349-65DE9238C276@me.com> <38408AA5-64FE-46C4-8940-4757CC5FE2BA@gmail.com>
Message-ID:

> 1 - All tasks are implicitly multitask

Yes, in the sense that all tasks can be executed in parallel with other tasks. The upside is that parallelism is orthogonal to the method ('file'/'directory'/'task') used to define the task. The downside is that some tasks may not be suitable for executing in parallel, e.g., the dependencies are wrong, or the task itself incorrectly shares some resource with other tasks (say, multiple tasks modify the same file as an intermediate step during the task execution). IMHO, making stuff work in parallel requires thinking about the problem anyway and often leads to better code.
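A minimal Ruby illustration of the shared-resource case described above, with an in-memory array standing in for a file that several parallel tasks rewrite (the object-file names and `MANIFEST_LOCK` are hypothetical). Only the read-modify-write step is wrapped in a `Mutex`, so that step is serialized while the rest of each task can still run concurrently.

```ruby
require "thread"

# One lock guards the shared artifact; each task's other work stays parallel.
MANIFEST_LOCK = Mutex.new
manifest = [] # stand-in for a file that several tasks rewrite

threads = %w[foo.o bar.o baz.o qux.o].map do |object_file|
  Thread.new do
    # ...per-file work (e.g. compiling) could run here concurrently...
    MANIFEST_LOCK.synchronize do
      contents = manifest.dup    # read the shared "file"
      contents << object_file    # modify it
      manifest.replace(contents) # write it back
    end
  end
end
threads.each(&:join)

p manifest.sort # prints ["bar.o", "baz.o", "foo.o", "qux.o"]
```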
If one needed to make something single-threaded, a mutex should do the trick. So it's not that hard to be a good -j citizen :)

> 2 - There is a cap on the number of concurrent tasks

Yes, this is whatever is set in the -j option.

> 3 - There is a control to create a repeatable single-threaded random invocation of tasks (--randomize[=SEED])

Yes, it's to help people debug rakefile dependencies, as you've probably figured :)

From mbishop at me.com  Fri Apr 27 13:54:53 2012
From: mbishop at me.com (Michael Bishop)
Date: Fri, 27 Apr 2012 09:54:53 -0400
Subject: [Rake-devel] ANN: (yet another) version of -j implementation (includes -m for multitask)
Message-ID: <8402E594-122B-4538-9041-F717AE74998A@me.com>

Hello Everyone,

My last implementation of -j had a fatal flaw that could have led to a stack overflow for large numbers of tasks. I discovered this after prototyping a version with drake's feature of turning tasks into multitasks.

I have a new version with a rewritten MultiTask implementation that removes that flaw, is smaller, and fits easily into the Rake code. In addition, I've added an optional "--multitask (-m)" flag to turn every task into a multitask (in direct homage to drake).

https://github.com/michaeljbishop/rake

I've again added a pull request for the inclusion of the change in the master branch (as always, allowing for further changes to match the style and simplicity of the original):

https://github.com/jimweirich/rake/pull/113

Questions and comments, as always, are welcome.

Sincerely,

Michael Bishop

---

## PROBLEM SUMMARY (THE CASE FOR -j and -m)

Rake can be unusable for builds invoking large numbers of concurrent external processes.

## PROBLEM DESCRIPTION

Rake makes it easy to maximize concurrency in builds with its "multitask" function. When rake is used to build non-Ruby projects, it quite often needs to execute shell tasks to process files. Unfortunately, when executing multitasks, rake spawns a new thread for each task prerequisite. This shouldn't cause problems when the build code is pure Ruby (for green threads), but when the tasks are executing external processes, the sheer number of spawned processes can cause the machine to thrash. Additionally, Ruby can reach the maximum number of open files (presumably because it's reading stdout for all those processes).

## SOLUTION SUMMARY

This request includes the code to add support for a `--jobs NUMBER (-j)` command-line option to specify the number of simultaneous tasks to execute.

* To maintain backward compatibility, not passing `-j` reverts to the old behavior of unlimited concurrent tasks.

As a nod to [drake](http://drake.rubyforge.org), a `--multitask (-m)` flag is also included which, when supplied, changes tasks into multitasks.

## SOLUTION

Rather than spawning a new thread per prerequisite, `MultiTask` now sends its prerequisites to a `WorkerPool` object. `WorkerPool.new(n).execute_blocks` has the same semantics as `Thread.new`...`join` but caps the thread count at `n`.

### Core Change

    threads = @prerequisites.collect { |p|
      Thread.new(p) { |r| application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    threads.each { |t| t.join }

...becomes...

    @@wp ||= WorkerPool.new(application.options.thread_pool_size)
    blocks = @prerequisites.collect { |r|
      lambda { application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    @@wp.execute_blocks blocks

To support `-m`, the `MultiTask` implementation has moved to `Task#invoke_prerequisites_concurrently` and is called from `MultiTask#invoke_prerequisites`. This enables concurrent behavior for `Task` when `-m` is used.

### Details

`WorkerPool#execute_blocks` adds the passed-in blocks to a queue, ensures there are enough threads to execute them (up to the maximum), and sleeps the current thread until the blocks are processed.

This creates a few potential problems:

> What if all of the blocks then called `#execute_blocks`? Wouldn't that sleep all the threads?

Yes, it would. This is solved because `#execute_blocks` removes the current thread from the thread pool just before it sleeps and creates a new one in its place. When all the blocks are processed, the current thread is added back to the pool (adjusting for the maximum size). There are always enough available threads in the thread pool for processing.

> When do the threads shut down?

`WorkerPool#execute_blocks` knows how many threads are waiting for their blocks to be processed. If, upon awakening, it notices there are no threads waiting on blocks, it shuts down the thread pool.

### Statistics

    ---LINES--  ----LOC---
     old   new   old   new  File Name
    ----------  ----------  -----------------------
     598   605   477   484  lib/rake/application.rb
      16    13    11     8  lib/rake/multi_task.rb
     327   341   210   222  lib/rake/task.rb
           111          80  lib/rake/worker_pool.rb
    4264  4393  2696  2792  TOTAL
    -----------------------------------------------
          +129         +96  SUMMARY

## TESTS

Tests are included for all new functionality.

## REQUIREMENTS

The Ruby version requirements remain the same.
`lib/rake/worker_pool.rb` adds two new requirements: `thread` and `set`.

From mbishop at me.com  Mon Apr 30 19:00:12 2012
From: mbishop at me.com (Michael Bishop)
Date: Mon, 30 Apr 2012 15:00:12 -0400
Subject: [Rake-devel] Overloading the divide symbol on String to facilitate path concatenation?
Message-ID: <46264ABA-DA77-45DE-8441-456CDB2B065F@me.com>

Hello fellow rake developers,

I started using this construction in my rake files and I wanted to get your opinion on it (though if there is a better forum, please direct me there).

Problem
-------

In my rake scripts, I like to assign directory paths to variables so I can use the variables later in my program. The problem is, when concatenating the variables, I end up with ugly combinations like:

    File.join(BUILD_D, "bin", "x86")
    File.join(CACHE_D, "scripts", SCRIPT_TEMP_NAME)

Possible Solution
-----------------

I find this hard to read, so I tried something:

    class String
      # Append a path component with File.join; a nil component is ignored.
      def /(path)
        if !path
          self
        else
          File.join(self, path)
        end
      end
    end

Now I can make paths like:

    BUILD_D/"bin"/"x86"
    CACHE_D/"scripts"/SCRIPT_TEMP_NAME

Discussion
----------

My spidey sense is on overload from my former C++ days, when I was filled with "operator overloading is bad. Always. Never ever do it. Ever... ever." But I have to say, the more I use this, the more I like it. I can't think of a situation where the operator overloading will get me into trouble, but maybe I haven't thought it through hard enough.

What do you think? Do you have strategies for making path concatenation less ugly in your scripts?

_ michael
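One way to get the same readability without reopening `String` is the standard library's `Pathname`, whose `/` method is an alias of `Pathname#+` in current Ruby releases (it may not be available in the Ruby versions this 2012 thread targeted). The sketch below reuses the constant names from the post with made-up values; where a plain String is required, `to_s` converts back.

```ruby
require "pathname"

# Hypothetical values for the constants used in the post.
BUILD_D = Pathname.new("build")
CACHE_D = Pathname.new("cache")
SCRIPT_TEMP_NAME = "generated.rb"

# Pathname#/ is an alias of Pathname#+, so no String monkey-patch is needed.
x86_dir     = BUILD_D / "bin" / "x86"
script_path = CACHE_D / "scripts" / SCRIPT_TEMP_NAME

puts x86_dir     # prints build/bin/x86
puts script_path # prints cache/scripts/generated.rb

# Unlike File.join, an absolute right-hand side replaces the left side.
puts(Pathname.new("/usr") / "/etc/passwd") # prints /etc/passwd
puts File.join("/usr", "/etc/passwd")      # prints /usr/etc/passwd
```

The last two lines show the one behavioral difference worth knowing before switching: `Pathname#+` treats an absolute component as a new root, while `File.join` simply concatenates. Refinements (introduced in Ruby 2.0) are another option; they let a `String#/` like the one above apply only in files that opt in with `using`.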