
[#26730] problem with task dependencies with multiple deployment targets.

Date:
2009-07-23 23:52
Priority:
3
Submitted By:
Vitaly Kushner (vitaly)
Assigned To:
John Barnette (jbarnette)
Category:
vlad
State:
Open
Summary:
problem with task dependencies with multiple deployment targets.

Detailed description
I think I found something that looks like a fundamental problem with vlad's remote_task handling.

Suppose I have a remote_task T1 that *invokes* remote_task T2 as part of its code, and suppose
I run T1 on multiple nodes H1 and H2. The problem is that the *first* host to invoke T2 will invoke it
on *all* nodes, even on those that haven't yet finished the part of T1 preceding the invocation of T2.
So if T2 depends on the code running in T1, it will fail on some of the nodes.

This is not a hypothetical situation. This is what happens with vlad:update invoking vlad:update_symlinks. The problem
is that vlad:update_symlinks depends on vlad:update preparing the source directories before it runs. So if the
checkout and export operation finishes faster on one node than on the other, that node will invoke vlad:update_symlinks
on *all* nodes, even before the source code is checked out on the other nodes.
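
The timing argument above can be sketched as a toy model. This is only an illustration under stated assumptions: the preparation times are made-up numbers, and the model simply assumes that T2 fires for every host the moment the fastest host finishes its part of T1. It does not use or reproduce vlad's code.

```ruby
# Toy model only: hypothetical seconds each host needs to finish the
# part of T1 that precedes the invocation of T2 (checkout, export, mkdir).
prep_seconds = { "H1" => 10, "H2" => 45 }

# Assumption: T2 is triggered for all hosts as soon as the fastest host
# reaches the invoke call.
t2_fired_at = prep_seconds.values.min

# Hosts still busy with their preparation at that moment are the ones
# where T2 would run against an unprepared release directory.
unprepared = prep_seconds.select { |_host, secs| secs > t2_fired_at }.keys

puts "T2 fired at t=#{t2_fired_at}s; unprepared hosts: #{unprepared.inspect}"
```

With these numbers, H2 is the host that would see T2 run too early.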


Vlad-1.4.0 (btw, there's no appropriate 'group' for it.)



Followup

Message
Date: 2009-08-08 14:57
Sender: No  One

/me sighs

I was going to say that invoke should not cause this problem
if it comes before any run/sudo commands, because the invoke should
block until the invoked task finishes for all target hosts (I'm
not entirely sure this is a true statement, but I don't have a setup
for testing this handy). However, as you pointed out, Vitaly,
the update remote task in core.rb invokes update_symlinks after
a run command. That could explain why you had the problem with
the slower server when you used it across multiple servers:

remote_task :update, :roles => :app do
    symlink = false
    begin
      run [ "cd #{scm_path}",
            "#{source.checkout revision, scm_path}",
            "#{source.export revision, release_path}",
            "chmod -R g+w #{latest_release}",
            "rm -rf #{latest_release}/log #{latest_release}/public/system #{latest_release}/tmp/pids",
            "mkdir -p #{latest_release}/db #{latest_release}/tmp"
          ].join(" && ")

      # Potential problem here - some target hosts may not 
      # have completed the run above before this task gets 
      # invoked by the most responsive target host.
      Rake::Task['vlad:update_symlinks'].invoke

I don't have a fix off the top of my head, but some options to
throw out:

1. Some kind of "synch" command for remote tasks that blocks
each target host until all target hosts have reached it, to be
inserted in the update task before the invoke command.
2. Move the update_symlinks logic out of the update_symlinks
remote_task and call it as a method from the update task (perhaps
passing a reference to the remote task so you can then call task.run
in the extracted logic), so that as this executes for each
target host, it waits until the run command has completed for
that target host.
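
A minimal sketch of what option 1 could look like, assuming plain Ruby threads stand in for per-host execution. `HostBarrier` is a hypothetical helper written for this illustration and is not part of vlad; the host names and delays are made up.

```ruby
require 'thread'

# Hypothetical single-use barrier: every caller of #wait blocks until
# `count` callers have arrived.
class HostBarrier
  def initialize(count)
    @count   = count
    @arrived = 0
    @mutex   = Mutex.new
    @cond    = ConditionVariable.new
  end

  def wait
    @mutex.synchronize do
      @arrived += 1
      if @arrived == @count
        @cond.broadcast                  # last arrival releases everyone
      else
        @cond.wait(@mutex) until @arrived == @count
      end
    end
  end
end

# Made-up hosts with different "preparation" delays (seconds).
hosts   = { "H1" => 0.01, "H2" => 0.05 }
barrier = HostBarrier.new(hosts.size)
events  = Queue.new

threads = hosts.map do |host, delay|
  Thread.new do
    sleep delay                          # stands in for checkout/export/mkdir
    events << [:prepared, host]
    barrier.wait                         # the "synch" point before the invoke
    events << [:symlinked, host]         # stands in for update_symlinks
  end
end
threads.each(&:join)

log = []
log << events.pop until events.empty?
log.each { |kind, host| puts "#{host}: #{kind}" }
```

Because no thread passes the barrier until every host has logged `:prepared`, the symlink step cannot start on any host before the slowest host has finished its preparation.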
Date: 2009-08-07 06:58
Sender: Vitaly Kushner

jeez. Yes, I did have an actual problem. Since then we upgraded
the slower server and it doesn't happen anymore. The problem was
exactly as described: update_symlinks failed since the 'tmp'
directory was not yet created on one of the computers (the one
that didn't reach the update_symlinks invocation *yet*).

you can ignore it all you want.
Date: 2009-08-06 18:18
Sender: Ryan Davis

Do you have an _actual_ problem or not? I don't see a reproduction.
I don't see a stack trace. I see a lot of theorizing.

Please provide a concrete example WITH backtrace and we can work
with it. If not, I'm going to have to close this bug until someone
does.
Date: 2009-08-06 14:04
Sender: Vitaly Kushner

OK, I see. You are right, when writing the example I had a messed-up
mental model of the execution process. For a moment I forgot that
although the commands for the remote task run on the remote hosts,
the tasks themselves run on the localhost.

So you are right and my example is wrong. 

BUT

The original bug report is correct and you correctly summarized
the core issue. And that would be it, BUT the problem is not with
*my* code. The problem is with vlad:update from core.rb. It directly
invokes vlad:update_symlinks, so I guess the fix will be to split it
into vlad:update_code, vlad:update_symlinks, and vlad:symlink or
something like that, with the proper dependencies between them.

I guess .invoke will produce the reported problem in most cases.
And it is used quite a lot in the current vlad recipes. grep from
the 1.4.0 codebase:

apache.rb:27:    Rake::Task['vlad:start_app'].invoke
apache.rb:28:    Rake::Task['vlad:start_web'].invoke
apache.rb:34:    Rake::Task['vlad:stop_app'].invoke
apache.rb:35:    Rake::Task['vlad:stop_web'].invoke
core.rb:36:    Rake::Task['vlad:setup_app'].invoke
core.rb:62:      Rake::Task['vlad:update_symlinks'].invoke
core.rb:110:      $ rake vlad:invoke COMMAND='uptime'".cleanup
core.rb:112:  remote_task :invoke do
core.rb:159:    Rake::Task['vlad:start'].invoke
lighttpd.rb:75:    Rake::Task['vlad:start_app'].invoke
lighttpd.rb:76:    Rake::Task['vlad:start_web'].invoke
lighttpd.rb:82:    Rake::Task['vlad:stop_app'].invoke
lighttpd.rb:83:    Rake::Task['vlad:stop_web'].invoke
merb.rb:48:    Rake::Task['vlad:stop_app'].invoke
merb.rb:49:    Rake::Task['vlad:start_app'].invoke
nginx.rb:40:    Rake::Task['vlad:start_app'].invoke
nginx.rb:41:    Rake::Task['vlad:start_web'].invoke
nginx.rb:45:    Rake::Task['vlad:stop_app'].invoke
nginx.rb:46:    Rake::Task['vlad:stop_web'].invoke
Date: 2009-08-06 04:12
Sender: No  One

Vitaly, have you actually tested your scenario where t2 would
run multiple times against each target host? If you use:

Rake::Task[ :task ].invoke 

In rake, it will only execute any task once, no matter how many
times you invoke it.

Now, if you use Rake::Task[ :task ].execute, that's a different
story.
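
The difference between the two calls can be checked with plain Rake, outside of vlad. This is a small sketch; the task name `:once_only` and the counter are made up for the illustration.

```ruby
require 'rake'

runs = 0
# Hypothetical task that just counts how often its body runs.
Rake::Task.define_task(:once_only) { runs += 1 }

task = Rake::Task[:once_only]

# invoke runs the body at most once, however often it is called.
task.invoke
task.invoke
after_invoke = runs        # 1

# execute runs the body every time and skips the "already invoked" check.
task.execute
task.execute
after_execute = runs       # 3

# reenable clears the "already invoked" flag so invoke runs the body again.
task.reenable
task.invoke
after_reenable = runs      # 4

puts [after_invoke, after_execute, after_reenable].inspect
```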

I do not think this is Vlad-specific. The same would be true
of a regular, non-remote task.

Now, on your original problem report, you were talking about
something different, I think. You were saying that if you invoke
a remote task from a remote task, the first target host to get
through all its run/sudo commands first will launch the second
remote task, which will start running on all the target hosts,
perhaps before the first task has completed for all the other
target hosts.

That is an issue if you code it that way, but rake is dependency
patterned rather than imperative patterned, which is to say that
the canonical rake pattern would be to declare that t2 depends
on t1 rather than have t1 imperatively invoke t2.

You could do this by having an invoke call to t1 as the start
of t2, but the more canonical rake way would be to declare the
dependency in the definition of t2:

remote_task :t2 => :t1

I believe vlad will hold up on proceeding with t2 in that case
until *all* target hosts have completed t1. If it does not and
t2 fires on *all* target hosts once the first target host gets
through t1, then we do have a vlad issue, but a very solvable
one.
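
The declarative form can be sketched with plain Rake tasks. Note the assumption: ordinary tasks stand in for remote_task here, so this only shows the ordering that Rake's dependency mechanism gives on the local side, not how vlad coordinates multiple target hosts.

```ruby
require 'rake'

order = []

# Plain Rake tasks used as stand-ins for remote tasks (assumption).
Rake::Task.define_task(:t1) { order << :t1 }
# t2 declares t1 as a prerequisite instead of t1 invoking t2.
Rake::Task.define_task(:t2 => :t1) { order << :t2 }

Rake::Task[:t2].invoke     # runs t1 first, then t2
Rake::Task[:t1].invoke     # no effect: t1 has already been invoked

puts order.inspect
```

Rake finishes the prerequisite's body before it starts the dependent task's body, which is the property the declaration relies on.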

Does that help?
Date: 2009-07-28 01:14
Sender: Vitaly Kushner

I guess if you try to vlad:update to 2 servers, one of which is
significantly faster than the other, you will get the same problem.

Let's review the vlad:update process:

* checkout into scm_path
* export into release_path
* remove log, public/system and tmp/pids dirs
* create db and tmp dir
******* invoke vlad:update_symlinks

and vlad:update_symlinks will try to link shared_path/pids into 
latest_release/tmp/pids

So we have a classic race condition. If :update_symlinks runs
before :update has created the 'tmp' dir, the link into tmp/pids
will fail due to the missing parent directory.

Note that simply changing the order of things will help, but this
would be the wrong solution IMHO. The real problem is that if you
have X hosts then vlad:update_symlinks will run X times ON EACH OF THEM!

I'm not sure what the proper solution for that is.

What *might* work is excluding the OTHER hosts running the SAME task
as the current one from the invocation list.
This still won't be the full solution. Consider the following:

host "H1", :foo, :bar
host "H2", :foo, :bar
host "H3", :bar

task :t1, :roles => :foo do
   Rake::Task[:t2].invoke
end

task :t2, :roles => :bar do
  ...
end


With the current implementation task :t2 will run TWICE on each
of the 3 hosts. If we exclude the other hosts then it will run once
on H1 and H2, but it will still run twice on H3, i.e. :t1 running
on H1 will invoke :t2 on H1 and H3, and :t1 running on H2 will
invoke it on H2 and H3, so we get H3 twice.

In fact I'm not sure there is a good solution for this. I think
the model is fundamentally broken. The better way, I think, would be
to only go to the remote host ONCE and run all the tasks locally
there, i.e. the only remote_task should be :deploy, which should do
the rest locally on the machine it runs on. But this might be too big
a change for vlad to bear, though I plan to explore it in my own
deployment script I'm working on.
Date: 2009-07-27 23:53
Sender: Ryan Davis

John deals with hard issues... I'm assigning to him.

I will say that we can't see how to repro yet. If you could provide
more details that'd be great.


Changes:

Field          Old Value  Date              By
resolution_id  None       2009-07-27 23:53  zenspider
assigned_to    zenspider  2009-07-27 23:53  zenspider