From quixoticsycophant at gmail.com Mon Jun 7 02:39:39 2010
From: quixoticsycophant at gmail.com (James M. Lawrence)
Date: Mon, 7 Jun 2010 02:39:39 -0400
Subject: [Rake-devel] Comments about parallel building
In-Reply-To: <4BF5586A.7030701@budcat.com>
References: <4BF42755.9080900@budcat.com> <4BF5586A.7030701@budcat.com>
Message-ID:

Thanks for this work, Heath. I've pushed a version of your patch and
added some tests.

You corrected an inadequate fix I made for some curious behavior in
mainline rake, mentioned in the second half of

http://rubyforge.org/pipermail/rake-devel/2008-October/000610.html

Though my solution is more efficient, it fails on a kite-shaped graph
(a diamond with a tail). However your solution is simpler, nicer and
(incidentally!) fixes the problem.

Do you have an example which shows the reason for the "if needed?"
clause after "task.execute(task_args)" in your patch? I'd be
surprised if that does anything; nodes are executed only once, and
duplicate nodes are not possible (being keyed on object_id).

While investigating this bug I found some more curiosities with
mainline rake (next email).

JL

On Thu, May 20, 2010 at 11:42 AM, Heath Kehoe wrote:
>
> On 5/19/2010 1:00 PM, Heath Kehoe wrote:
>>
>> So we're sticking with ruby 1.8 for now. As I said, perhaps drake's
>> approach will be better for 1.9, I'll try that out sometime.
>>
>> -heath
>>
>>
>
> I tried drake, and it is promising, since its compute tree model is much
> more efficient than the MultiTask model I was using, especially in null
> builds (no creating/joining of thousands of threads).
>
> However, it has a bug that prevents me from deploying it in our project;
> that bug is it sometimes fails to execute tasks during parallel builds. I
> created a simple test case and filed a bug on the drake rubyforge tracker.
>
> -h

From quixoticsycophant at gmail.com Mon Jun 7 03:03:29 2010
From: quixoticsycophant at gmail.com (James M.
Lawrence)
Date: Mon, 7 Jun 2010 03:03:29 -0400
Subject: [Rake-devel] A handful of issues
Message-ID:

==== 1 - Task#needed? applies to immediate prereqs only

file "a" => "b" do
  touch "a"
end

file "b" => "c" do
  touch "b"
end

file "c" do
  touch "c"
end

task :default do
  touch ["a", "b"]
  sleep 1
  touch "c"
  p Rake::Task["a"].needed?  # => false
  # ... but "a" is needed because "c" is newer
end

==== 2 - Task#needed? can be wrong even for immediate prereqs

file "a" => "b" do
  touch "a"
end

file "b" do
  touch "b"
end

task :default do
  touch "a"
  rm_f "b"
  p Rake::Task["a"].needed?  # => false
  # ... but "a" is needed because "b" will be newly created
end

==== 3 - Filesystem timestamp granularity

memo = []

file "a" => "b" do
  memo << "a"
  touch "a"
end

file "b" do
  memo << "b"
  touch "b"
end

task :default do
  touch "a"
  rm_f "b"
  Rake::Task["a"].invoke
  p memo  # => ["b"]
  # ... but should be ["b", "a"] since "b" is newer
end

==== 4 - Dependency relation should be transitive

memo = []

file "a" => "b" do
  memo << "a"
  touch "a"
end

file "b" => "c" do
  memo << "b"
  # does not create b
end

file "c" do
  memo << "c"
  touch "c"
end

task :default do
  touch ["a", "b"]
  rm_f "c"
  sleep 1  # filesystem timestamp workaround
  Rake::Task["a"].invoke
  p memo  # => ["c", "b"]
  # ... but "a" is needed since it (transitively) depends on "c"
end

==== 5 - Trace output is misleading

file "a" => "b" do
  touch "a"
end

file "b" => "c" do
  touch "b"
end

file "c" do
  touch "c"
end

task :default do
  touch ["a", "b"]
  sleep 1
  touch "c"
  Rake.application.options.trace = true
  Rake::Task["a"].invoke
  # =>
  # touch a b
  # touch c
  # ** Invoke a (first_time, not_needed)    <--- Liar! It is needed.
  # ** Invoke b (first_time)
  # ** Invoke c (first_time, not_needed)
  # ** Execute b
  # touch b
  # ** Execute a    <--- See?
  # touch a
end

**** Discussion

==== 1 - Task#needed? applies to immediate prereqs only

Having Task#needed? recursively check all prereqs is needlessly slow,
so the current behavior is fine.
However I think its use should be deprecated in the published API.
We could alias it to Task#locally_needed? and additionally write
Task#globally_needed?.

==== 2 - Task#needed? can be wrong even for immediate prereqs

Should be fixed if Task#needed? (or Task#locally_needed?) is
"officially" supported.

http://github.com/quix/rake/commits/mainline-file-task-needed

==== 3 - Filesystem timestamp granularity

Unless you have ext4fs, your filesystem most likely records timestamps
only to the nearest second, making it easy to create same-time files.

The fix is easy: change timestamp comparison from > to >=. It's a
pessimization: "When in doubt, rerun." It sounds safer, but what
consequences lurk?

http://github.com/quix/rake/commits/mainline-timestamp-granularity

==== 4 - Dependency relation should be transitive

Let ~> denote "eventually depends on". If a~>b and b~>c, then we want
a~>c. Thus if c executes then a should execute, as is implied by the
dependency graph. Parallel Rake has to assume transitivity, so its
behavior is different.

I think a file task which executes but does not update the file is
reneging. Regarding this as an error would make regular Rake
consistent with parallel Rake, sidestepping the transitivity issue.

==== 5 - Trace output is misleading

This is related to item 1 since "not_needed" reflects Task#needed?.
Perhaps change the output to "not_immediately_needed" or some such?

From quixoticsycophant at gmail.com Mon Jun 7 12:26:42 2010
From: quixoticsycophant at gmail.com (James M. Lawrence)
Date: Mon, 7 Jun 2010 12:26:42 -0400
Subject: [Rake-devel] New loading behavior
Message-ID:

task :default do
end

extra = "Rakefile.extra"
File.open(extra, "w") { |f| f.puts "p SPEC" }
SPEC = 99

# OK in rake-0.8.7; error in rake-0.8.99 (uninitialized constant SPEC)
load extra

# OK in rake-0.8.99; error in rake-0.8.7 (load_rakefile not present)
load_rakefile extra

Is this the expected behavior?
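
The > versus >= change proposed under item 3 of "A handful of issues" can be illustrated outside Rake. This is a minimal sketch under stated assumptions: stale_strict? and stale_pessimistic? are hypothetical helpers, not Rake methods, and the timestamps are made-up values standing in for two files touched within the same second.

```ruby
# Hypothetical helpers, not part of Rake's API. Each takes a target's
# mtime and the mtimes of its immediate prerequisites.

# Strict comparison: a prerequisite written in the same second as the
# target is not considered newer, so the target is left alone.
def stale_strict?(target_mtime, prereq_mtimes)
  prereq_mtimes.any? { |t| t > target_mtime }
end

# Pessimistic comparison ("when in doubt, rerun"): equal timestamps
# count as out of date.
def stale_pessimistic?(target_mtime, prereq_mtimes)
  prereq_mtimes.any? { |t| t >= target_mtime }
end

# Two files touched within the same second on a filesystem with
# one-second timestamp resolution (illustrative values).
a_mtime = Time.at(1_275_894_209)
b_mtime = Time.at(1_275_894_209)

p stale_strict?(a_mtime, [b_mtime])       # => false
p stale_pessimistic?(a_mtime, [b_mtime])  # => true
```

One likely consequence of the pessimistic form, offered as a guess rather than a measured result: a target built in the same second as one of its prerequisites would be rebuilt once more on the next run.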
From luislavena at gmail.com Mon Jun 7 12:40:02 2010
From: luislavena at gmail.com (Luis Lavena)
Date: Mon, 7 Jun 2010 12:40:02 -0400
Subject: [Rake-devel] New loading behavior
In-Reply-To:
References:
Message-ID:

On Mon, Jun 7, 2010 at 12:26 PM, James M. Lawrence wrote:
> task :default do
> end
>
> extra = "Rakefile.extra"
> File.open(extra, "w") { |f| f.puts "p SPEC" }
> SPEC = 99
>
> # OK in rake-0.8.7; error in rake-0.8.99 (uninitialized constant SPEC)
> load extra
>
> # OK in rake-0.8.99; error in rake-0.8.7 (load_rakefile not present)
> load_rakefile extra
>
> Is this the expected behavior?

What about Rake.import?

--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

From hkehoe at budcat.com Mon Jun 7 12:52:31 2010
From: hkehoe at budcat.com (Heath Kehoe)
Date: Mon, 07 Jun 2010 11:52:31 -0500
Subject: [Rake-devel] Comments about parallel building
In-Reply-To:
References: <4BF42755.9080900@budcat.com> <4BF5586A.7030701@budcat.com>
Message-ID: <4C0D23CF.4090804@budcat.com>

On 6/7/2010 1:39 AM, James M. Lawrence wrote:
> Do you have an example which shows the reason for the "if needed?"
> clause after "task.execute(task_args)" in your patch? I'd be
> surprised if that does anything; nodes are executed only once, and
> duplicate nodes are not possible (being keyed on object_id).
>
>

I was depending on rake's non-transitive aspect that you mention in
item #4 of your "A handful of issues" message.

Specifically, I have a custom task that syncs one directory tree from
another, then another task that indexes the files in the directory
tree. The sync task executes every time; after it executes its
timestamp method will either return Time.now or EARLY based on whether
anything changed in the destination tree (before execution, timestamp
returns Time.now to ensure that the task will execute).
The index task depends on the sync task, so it will execute only if
the sync task changed any files.

Adding 'if needed?' makes this work in drake the same as it does in
rake.

-heath

______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________

From quixoticsycophant at gmail.com Mon Jun 7 13:23:01 2010
From: quixoticsycophant at gmail.com (James M. Lawrence)
Date: Mon, 7 Jun 2010 13:23:01 -0400
Subject: [Rake-devel] New loading behavior
In-Reply-To:
References:
Message-ID:

On Mon, Jun 7, 2010 at 12:40 PM, Luis Lavena wrote:
>
> What about Rake.import?

The issue is whether breaking load was intended. Rake.import is not
interchangeable with load.

task :default do
end

extra = "Rakefile.extra"
File.open(extra, "w") { |f| f.puts "SPEC = 99" }

# need load instead
import extra

p SPEC  # => uninitialized constant

From hkehoe at budcat.com Mon Jun 7 13:36:52 2010
From: hkehoe at budcat.com (Heath Kehoe)
Date: Mon, 07 Jun 2010 12:36:52 -0500
Subject: [Rake-devel] A handful of issues
In-Reply-To:
References:
Message-ID: <4C0D2E34.4000309@budcat.com>

On 6/7/2010 2:03 AM, James M. Lawrence wrote:
> ==== 2 - Task#needed? can be wrong even for immediate prereqs
>
> Should be fixed if Task#needed? (or Task#locally_needed?) is
> "officially" supported.
>
> http://github.com/quix/rake/commits/mainline-file-task-needed
>
>

Instead of that change to out_of_date?(), I think there should be a
different return value for Task#timestamp that indicates a missing
file (instead of EARLY like it does now). Perhaps nil, or another
'special' time "LATE" which works like EARLY except LATE>time is
always true.
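
A "LATE" object of the kind suggested here could look roughly like the sketch below. It is an illustration only: LateTime and LATE are hypothetical names that do not exist in Rake, and the class is standalone Ruby modelled loosely on the idea of Rake's EARLY sentinel, with the comparison direction reversed.

```ruby
require "singleton"

# Hypothetical sentinel, not part of Rake: a single object that compares
# greater than any timestamp it is compared against.
class LateTime
  include Comparable
  include Singleton

  # Equal only to itself; later than everything else.
  def <=>(other)
    equal?(other) ? 0 : 1
  end

  def to_s
    "<LATE TIME>"
  end
end

LATE = LateTime.instance

p LATE > Time.now    # => true
p LATE > Time.at(0)  # => true
p LATE == LATE       # => true
```

The sketch only covers comparisons with LATE on the left-hand side, as in "LATE>time". Comparisons written the other way round (time < LATE) go through Time#<=> and are not addressed here.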

> ==== 3 - Filesystem timestamp granularity
>
> Unless you have ext4fs, your filesystem most likely records timestamps
> only to the nearest second, making it easy to create same-time files.
>
> The fix is easy: change timestamp comparison from > to >=. It's a
> pessimization: "When in doubt, rerun." It sounds safer, but what
> consequences lurk?
>
> http://github.com/quix/rake/commits/mainline-timestamp-granularity
>
>

My immediate feeling is this change should be safe... but like you
said, there may be unforeseen problems.

> ==== 4 - Dependency relation should be transitive
>
> Let ~> denote "eventually depends on". If a~>b and b~>c, then we want
> a~>c. Thus if c executes then a should execute, as is implied by the
> dependency graph. Parallel Rake has to assume transitivity, so its
> behavior is different.
>
> I think a file task which executes but does not update the file is
> reneging. Regarding this as an error would make regular Rake
> consistent with parallel Rake, sidestepping the transitivity issue.
>
>

I don't think it's necessarily the case that dependencies should be
strictly transitive. I think it's useful to have a task not execute
if its immediate dependencies aren't updated, even if the
dependencies' dependencies were (such as the example I mentioned in
my previous message).

-heath

From quixoticsycophant at gmail.com Mon Jun 7 17:16:49 2010
From: quixoticsycophant at gmail.com (James M.
Lawrence)
Date: Mon, 7 Jun 2010 17:16:49 -0400
Subject: [Rake-devel] A handful of issues
In-Reply-To: <4C0D2E34.4000309@budcat.com>
References: <4C0D2E34.4000309@budcat.com>
Message-ID:

On Mon, Jun 7, 2010 at 1:36 PM, Heath Kehoe wrote:
> Instead of that change to out_of_date?(), I think there should be a
> different return value for Task#timestamp that indicates a missing
> file (instead of EARLY like it does now). Perhaps nil, or another
> 'special' time "LATE" which works like EARLY except LATE>time is
> always true.
>

Yes, my original fix was to replace EARLY with an analogous LATE
object in FileTask#timestamp. Although it worked, I was hesitant to
make a seemingly fundamental change (in the opposite direction, no
less). But it's possible that EARLY has been harmlessly wrong for six
years.

> I don't think it's necessarily the case that dependencies should be
> strictly transitive. I think it's useful to have a task not execute
> if its immediate dependencies aren't updated, even if the
> dependencies' dependencies were (such as the example I mentioned in
> my previous message).
>

Until reading your last message, I hadn't seen that the first part of
your Drake patch was for non-transitivity. That's another way to
obtain consistent behavior which had not occurred to me.

Initially the "if needed?" line in your patch looks absurd because it
creates an interaction between nodes of an in-progress parallel
computation. But in this case only nodes which have finished running
are examined, so presumably their states are frozen.

Do you have this running on a large project, and does it appear to
work correctly? My change removes the old workaround, so you should
get latest with "gem install drake" or from github (though you'll have
to add the "if needed?" clause yourself, for the moment).
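
The non-transitive behavior discussed in this exchange (re-checking needed? once a node's prerequisites have finished) can be sketched with a toy model. Everything below is an assumption made for illustration: ToyTask and build are hypothetical, the model is sequential rather than parallel, and it is neither Rake's nor Drake's implementation. It mimics the sync/index example from Heath's earlier message.

```ruby
# Toy model, not Rake or Drake. A task has a name, immediate
# prerequisites, a timestamp, and an action that returns true when it
# actually changed its output.
class ToyTask
  attr_reader :name, :timestamp

  def initialize(name, prereqs = [], timestamp: 0, &action)
    @name, @prereqs, @timestamp, @action = name, prereqs, timestamp, action
    @invoked = false
  end

  # Only immediate prerequisites are consulted (non-transitive).
  # A task without prerequisites always runs, like the sync task.
  def needed?
    @prereqs.empty? || @prereqs.any? { |p| p.timestamp > @timestamp }
  end

  def invoke(clock, log)
    return if @invoked
    @invoked = true
    @prereqs.each { |p| p.invoke(clock, log) }
    # Re-check after the prerequisites have finished; their timestamps
    # are final at this point.
    return unless needed?
    log << name
    # The timestamp advances only if the action reports a change.
    @timestamp = clock.call if @action.call
  end
end

tick = 100
clock = -> { tick += 1 }

# sync always executes; index executes only if sync changed something.
def build(sync_changes, clock)
  sync  = ToyTask.new(:sync, [], timestamp: 10) { sync_changes }
  index = ToyTask.new(:index, [sync], timestamp: 20) { true }
  log = []
  index.invoke(clock, log)
  log
end

p build(false, clock)  # => [:sync]
p build(true, clock)   # => [:sync, :index]
```

With strict transitivity, index would run in both cases because sync executed; with the re-check, it runs only when sync's timestamp actually moved.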

From hkehoe at budcat.com Mon Jun 7 18:51:25 2010
From: hkehoe at budcat.com (Heath Kehoe)
Date: Mon, 07 Jun 2010 17:51:25 -0500
Subject: [Rake-devel] A handful of issues
In-Reply-To:
References: <4C0D2E34.4000309@budcat.com>
Message-ID: <4C0D77ED.6080701@budcat.com>

On 6/7/2010 4:16 PM, James M. Lawrence wrote:
>
> Yes, my original fix was to replace EARLY with an analogous LATE
> object in FileTask#timestamp. Although it worked, I was hesitant to
> make a seemingly fundamental change (in the opposite direction, no
> less). But it's possible that EARLY has been harmlessly wrong for six
> years.
>
>

There are only two significant places where FileTask#timestamp is used
(that I found) where changing EARLY to LATE would make a difference.

One is FileTask#out_of_date? which is only called from
FileTask#needed?; and when needed? is called where it really counts,
the prerequisites have all been invoked, so the file-not-found
condition should not hit.

The other place is the base Task#timestamp; which propagates the
newest timestamp of its prereqs, or Time.now if there are no prereqs.
The ramifications of that timestamp coming back 'LATE' aren't clear to
me. Nor is it clear to me why it propagates its prereqs' timestamps as
its own. We could always modify that method to substitute EARLY for
LATE to preserve the original behavior, which is what I've done.

> Do you have this running on a large project, and does it appear to
> work correctly? My change removes the old workaround, so you should
> get latest with "gem install drake" or from github (though you'll have
> to add the "if needed?" clause yourself, for the moment).
>
>

Yes, it has been running on a large project without issue (since not
long after I posted the patch to the bug report); and much more
recently, I have implemented the LATE change to FileTask#timestamp.
Here's what I've got:

module Rake
  class Task
    def timestamp
      @prerequisites.collect { |p|
        t = application[p].timestamp
        t == Rake::LATE ? Rake::EARLY : t
      }.max || Time.now
    end
  end

  class FileTask < Task
    def needed?
      t = timestamp
      t == Rake::LATE || out_of_date?(t)
    end

    def timestamp
      begin
        File.mtime(name)
      rescue Errno::ENOENT
        Rake::LATE
      end
    end
  end
end

Note, this also contains a couple of additional optimizations I added
to FileTask's methods to reduce the number of stat syscalls. (With the
original code, each call to needed? generated 2n+3 stat calls where n
is the number of file prereqs. This code generates n+1 stat calls.)

I just grabbed drake-0.8.7.0.2.4 and I'll hammer on it and report back
in a couple days.

-heath