I think I found something that looks like a fundamental problem with vlad's remote_task handling. Suppose I have a remote_task T1 that *invokes* remote_task T2 as part of its code. Also suppose I run task T1 on multiple nodes H1 and H2. The problem is that the *first* host to invoke T2 will invoke it on *all* nodes, even on those that haven't yet finished the part of T1 preceding the invocation of T2. So if T2 depends on the code running in T1, it will fail on some of the nodes.

This is not a hypothetical situation. This is what happens with vlad:update invoking vlad:update_symlinks. The problem is that vlad:update_symlinks depends on vlad:update preparing the source directories before it runs. So if the checkout and export operation finishes faster on one of the nodes than on the other, that node will invoke vlad:update_symlinks on *all* nodes, even before the source code is checked out on the other nodes.

Vlad-1.4.0 (btw, there's no appropriate 'group' for it.)
Date: 2009-08-08 14:57
Sender: No One

/me sighs

I was going to say that invoke should not cause this problem if it comes before any run/sudo commands, because the invoke should block until the invoked task finishes for all target hosts (I'm not entirely sure this is a true statement, but I don't have a setup for testing this handy). However, as you pointed out, Vitaly, the update remote task in core.rb invokes update_symlinks after a run command. That could explain why you had the problem with the slower server when you used it across multiple servers:

    remote_task :update, :roles => :app do
      symlink = false
      begin
        run [ "cd #{scm_path}",
              "#{source.checkout revision, scm_path}",
              "#{source.export revision, release_path}",
              "chmod -R g+w #{latest_release}",
              "rm -rf #{latest_release}/log #{latest_release}/public/system #{latest_release}/tmp/pids",
              "mkdir -p #{latest_release}/db #{latest_release}/tmp"
            ].join(" && ")

        # Potential problem here - some target hosts may not
        # have completed the run above before this task gets
        # invoked by the most responsive target host.
        Rake::Task['vlad:update_symlinks'].invoke

I don't have a fix off the top of my head, but here are some options to throw out:

1. Add some kind of "synch" command for remote tasks that blocks until all target hosts reach it, and insert it in the update task before the invoke command.

2. Move the update_symlinks logic out of the update_symlinks remote_task and call it as a method from the update task at this point (perhaps passing a reference to the remote task so the extracted logic can call task.run), so that as this executes for each target host, it waits until the run command has completed for that target host.
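Option 1 could look roughly like the following. This is only a thread-based simulation in plain Ruby, not vlad code: vlad has no such command as far as this thread shows, and `HostBarrier` and `simulate_update` are hypothetical names. It assumes each target host is handled by its own thread, with `sleep` standing in for the checkout/export `run`.

```ruby
# Hypothetical one-shot barrier: #wait blocks until `count` threads have called it.
class HostBarrier
  def initialize(count)
    @count   = count
    @arrived = 0
    @mutex   = Mutex.new
    @cond    = ConditionVariable.new
  end

  def wait
    @mutex.synchronize do
      @arrived += 1
      if @arrived >= @count
        @cond.broadcast                          # last host in: release everyone
      else
        @cond.wait(@mutex) until @arrived >= @count
      end
    end
  end
end

# delays maps a host name to the seconds its "run" step takes (made-up values).
# Returns the recorded events in the order they happened.
def simulate_update(delays)
  barrier = HostBarrier.new(delays.size)
  events  = Queue.new

  threads = delays.map do |host, delay|
    Thread.new do
      sleep delay                   # stands in for the checkout/export run
      events << [:prepared, host]
      barrier.wait                  # the proposed "synch" point
      events << [:symlinked, host]  # stands in for update_symlinks
    end
  end
  threads.each(&:join)

  Array.new(events.size) { events.pop }
end

events = simulate_update('H1' => 0.0, 'H2' => 0.05)
events.each { |kind, host| puts "#{host}: #{kind}" }
```

With the barrier in place, every host records `:prepared` before any host records `:symlinked`, however uneven the delays are. Without the `barrier.wait` line, the fast host would reach the symlink step while the slow one is still preparing.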
Date: 2009-08-07 06:58
Sender: Vitaly Kushner

Jeez. Yes, I did have an actual problem. Since then we upgraded the slower server and it doesn't happen anymore. The problem was exactly as described: update_symlinks failed because the 'tmp' directory had not yet been created on one of the computers (the one that hadn't reached the update_symlinks invocation *yet*).

You can ignore it all you want.
Date: 2009-08-06 18:18
Sender: Ryan Davis

Do you have an _actual_ problem or not? I don't see a reproduction. I don't see a stack trace. I see a lot of theorizing. Please provide a concrete example WITH backtrace and we can work with it. If not, I'm going to have to close this bug until someone does.
Date: 2009-08-06 14:04
Sender: Vitaly Kushner

OK, I see. You are right: when writing the example I had a messed-up mental model of the execution process. For a moment I forgot that although the commands for a remote task run on the remote hosts, the tasks themselves run on the localhost. So you are right and my example is wrong.

BUT the original bug report is correct, and you correctly summarized the core issue. And that would be it, BUT the problem is not with *my* code. The problem is with vlad:update from core.rb. It directly invokes vlad:update_symlinks, so I guess the fix will be to split it into vlad:update_code, vlad:update_symlinks, and vlad:symlink or something like that, with the proper dependencies between them.

I guess .invoke will produce the reported problem in most cases, and it is used quite a lot in the current vlad recipes. grep from the 1.4.0 codebase:

    apache.rb:27: Rake::Task['vlad:start_app'].invoke
    apache.rb:28: Rake::Task['vlad:start_web'].invoke
    apache.rb:34: Rake::Task['vlad:stop_app'].invoke
    apache.rb:35: Rake::Task['vlad:stop_web'].invoke
    core.rb:36: Rake::Task['vlad:setup_app'].invoke
    core.rb:62: Rake::Task['vlad:update_symlinks'].invoke
    core.rb:110: $ rake vlad:invoke COMMAND='uptime'".cleanup
    core.rb:112: remote_task :invoke do
    core.rb:159: Rake::Task['vlad:start'].invoke
    lighttpd.rb:75: Rake::Task['vlad:start_app'].invoke
    lighttpd.rb:76: Rake::Task['vlad:start_web'].invoke
    lighttpd.rb:82: Rake::Task['vlad:stop_app'].invoke
    lighttpd.rb:83: Rake::Task['vlad:stop_web'].invoke
    merb.rb:48: Rake::Task['vlad:stop_app'].invoke
    merb.rb:49: Rake::Task['vlad:start_app'].invoke
    nginx.rb:40: Rake::Task['vlad:start_app'].invoke
    nginx.rb:41: Rake::Task['vlad:start_web'].invoke
    nginx.rb:45: Rake::Task['vlad:stop_app'].invoke
    nginx.rb:46: Rake::Task['vlad:stop_web'].invoke
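The split suggested in this comment might be wired up along these lines. This is a sketch using plain Rake tasks rather than vlad's remote_task, so it only shows the ordering that declared dependencies give on the local side. The `demo:` task names and the `split_update_order` helper are hypothetical, and whether vlad waits for every target host between dependent remote tasks is not verified here.

```ruby
require 'rake'

# Defines three hypothetical tasks chained by dependencies and returns
# the order in which their bodies ran.
def split_update_order
  order = []

  # Stands in for the checkout/export/mkdir part of vlad:update.
  Rake::Task.define_task('demo:update_code') { order << :update_code }

  # Declared as depending on update_code instead of being invoked by it.
  Rake::Task.define_task('demo:update_symlinks' => 'demo:update_code') do
    order << :update_symlinks
  end

  # Umbrella task with no body of its own.
  Rake::Task.define_task('demo:update' => 'demo:update_symlinks')

  Rake::Task['demo:update'].invoke
  order
end

order = split_update_order
puts order.join(' -> ')
```

Rake runs a task's prerequisites to completion before the task body, so `update_code` is recorded before `update_symlinks`.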
Date: 2009-08-06 04:12
Sender: No One

Vitaly, have you actually tested your scenario where t2 would run multiple times against each target host? If you use:

    Rake::Task[ :task ].invoke

rake will only execute any task once, no matter how many times you invoke it. Now, if you use Rake::Task[ :task ].execute, that's a different story. I do not think this is Vlad-specific. The same would be true of a regular, non-remote task.

Now, in your original problem report, you were talking about something different, I think. You were saying that if you invoke a remote task from a remote task, the first target host to get through all its run/sudo commands will launch the second remote task, which will start running on all the target hosts, perhaps before the first task has completed for all the other target hosts. That is an issue if you code it that way, but rake is dependency patterned rather than imperative patterned, which is to say that the canonical rake pattern would be to declare that t2 depends on t1 rather than have t1 imperatively invoke t2. You could do this by having an invoke call to t1 at the start of t2, but the more canonical rake way would be to declare the dependency in the definition of t2:

    remote_task :t2 => :t1

I believe vlad will hold off on proceeding with t2 in that case until *all* target hosts have completed t1. If it does not, and t2 fires on *all* target hosts once the first target host gets through t1, then we do have a vlad issue, but a very solvable one.

Does that help?
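The difference between invoke and execute can be checked with plain Rake, without vlad. In this small sketch the task names and the `invoke_vs_execute_counts` helper are made up for the demonstration.

```ruby
require 'rake'

# Calls each task twice and returns how many times its body actually ran.
def invoke_vs_execute_counts
  counts = Hash.new(0)

  Rake::Task.define_task(:demo_invoke)  { counts[:invoke]  += 1 }
  Rake::Task.define_task(:demo_execute) { counts[:execute] += 1 }

  # invoke remembers that the task already ran and skips the second call.
  2.times { Rake::Task[:demo_invoke].invoke }

  # execute runs the body every time and ignores that bookkeeping.
  2.times { Rake::Task[:demo_execute].execute }

  counts
end

counts = invoke_vs_execute_counts
puts "invoke ran the body #{counts[:invoke]} time(s)"
puts "execute ran the body #{counts[:execute]} time(s)"
```

Here counts[:invoke] ends up as 1 and counts[:execute] as 2. A task can be made invokable again with Rake's `reenable`, but nothing in this thread suggests vlad does that.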
Date: 2009-07-28 01:14
Sender: Vitaly Kushner

I guess if you try to vlad:update to 2 servers, one of which is significantly faster than the other, you will get the same problem. Let's review the vlad:update process:

* checkout into scm_path
* export into release_path
* remove log, public/system and tmp/pids dirs
* create db and tmp dir
******* invoke vlad:update_symlinks

and vlad:update_symlinks will try to link shared_path/pids into latest_release/tmp/pids.

So we have a classic race condition. If :update_symlinks runs before :update has created the 'tmp' dir, the link into tmp/pids will fail due to the missing parent directory.

Note that simply changing the order of things will help, but this would be the wrong solution IMHO. The real problem is that if you have X hosts, then vlad:update_symlinks will run X times ON EACH OF THEM!

I'm not sure what the proper solution for that is. What *might* work is excluding the OTHER hosts running the SAME task as the current one from the invocation list. This still won't be the full solution. Consider the following:

    host "H1", :foo, :bar
    host "H2", :foo, :bar
    host "H3", :bar

    task :t1, :roles => :foo do
      Rake::Task[:t2].invoke
    end

    task :t2, :roles => :bar do
      ...
    end

With the current implementation, task :t2 will run TWICE on each of the 3 hosts. If we exclude the other hosts, then it will run once on H1 and H2, but it will still run twice on H3, i.e. :t1 running on H1 will invoke :t2 on H1 and H3, and :t1 running on H2 will invoke it on H2 and H3, so we get H3 twice.

In fact, I'm not sure there is a good solution for this. I think the model is fundamentally broken. The better way, I think, would be to go to the remote host only ONCE and run all the tasks locally there, i.e. the only remote_task should be :deploy, which should do the rest locally on the machine it runs on. But this might be too big a change for vlad to bear, though I plan to explore it in my own deployment script I'm working on.
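The failure mode described here, a symlink created inside a directory that does not exist yet, is easy to reproduce locally. The sketch below uses a throwaway temp directory on a POSIX system. The `shared/pids` and `release/tmp` layout and the `symlink_into_release` helper are stand-ins for shared_path/pids and latest_release/tmp, not vlad's actual code.

```ruby
require 'fileutils'
require 'tmpdir'

# Tries to link shared/pids to release/tmp/pids inside a temp dir.
# Returns :linked on success, :missing_parent if release/tmp is absent.
def symlink_into_release(create_tmp_first)
  Dir.mktmpdir do |root|
    shared_pids = File.join(root, 'shared', 'pids')
    release     = File.join(root, 'release')

    FileUtils.mkdir_p(shared_pids)
    FileUtils.mkdir_p(release)

    # The step that the slower host has not finished yet in the race.
    FileUtils.mkdir_p(File.join(release, 'tmp')) if create_tmp_first

    begin
      File.symlink(shared_pids, File.join(release, 'tmp', 'pids'))
      :linked
    rescue Errno::ENOENT
      :missing_parent
    end
  end
end

puts symlink_into_release(true)
puts symlink_into_release(false)
```

The first call succeeds because release/tmp exists. The second fails with Errno::ENOENT, which is the local equivalent of the symlink step running before 'tmp' was created.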
Date: 2009-07-27 23:53 Sender: Ryan Davis John deals with hard issues... I'm assigning to him. I will say that we can't see how to repro yet. If you could provide more details that'd be great.