feature request - when_ready() hook
normalperson at yhbt.net
Thu Nov 26 01:05:19 EST 2009
Suraj Kurapati <sunaku at gmail.com> wrote:
> I've been trying to achieve truly transparent zero downtime deploys
> with Unicorn and Rails for some time now (using SIGUSR2 and SIGQUIT
> strategy) and I've always hit the problem of my "last worker sends
> SIGQUIT to the old master" logic being executed way too soon.
> In particular, I tried killing the old master in:
> * before_fork() -- approx. 2 minute downtime
> * after_fork() -- approx. 2 minute downtime
> * storing the old-master-killing logic inside a lambda in after_fork()
> (for the last worker only) and later executing that lambda in Rails'
> config.after_initialize() hook -- approx. 20 second downtime
I'm looking at those times and can't help but wonder if there's
something very weird/broken with your setup. 20 seconds is already an
eternity (even with preload_app=false), but 2 minutes?!
Are you doing per-process listeners and retrying? The new ones could be
fighting for a port held by the old workers... Other than that...
I have many questions, because those times look extremely scary to me
and I wonder if such a hook would only be masking the symptoms of
a bigger problem.
What kind of software/hardware stack are you running?
(please don't say NSLU2 :)
How many workers?
How heavy is traffic on the site when you're deploying?
How long does it take for a single worker to get ready and start
Are you using preload_app? It should be faster if you do, but there
really appears to be something else wrong based on those times.
Thanks in advance.
> As you can see, the more I delayed the execution of that "killing the
> old master" logic, the closer I got to zero downtime deploys. In this
> manner, I request the addition of a when_ready() hook which is
> executed just after Unicorn prints "worker=# ready!" to its error log
> inside Unicorn::HttpServer#worker_loop().
At this stage, maybe even implementing something as middleware and
making it hook into request processing (that way you know for certain
the worker is actually responding to requests) is the way to go...
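
A minimal sketch of the middleware idea, not a tested recipe. The names
NotifyWhenServing and quit_old_master and the pid file path are
hypothetical; the only Unicorn behavior assumed is that the old master's
pid file is renamed with an ".oldbin" suffix after SIGUSR2. The callback
fires once per middleware instance, after the first request has gone
through the app:

```ruby
# Hypothetical Rack middleware: runs a callback exactly once, right after
# this middleware instance has passed its first request through the app.
class NotifyWhenServing
  def initialize(app, &on_first_response)
    @app = app
    @on_first_response = on_first_response
    @fired = false
  end

  def call(env)
    response = @app.call(env)
    unless @fired
      # No lock here: this sketch assumes one request at a time per worker.
      @fired = true
      @on_first_response.call
    end
    response
  end
end

# Hypothetical helper: send QUIT to the master recorded in old_pid_path.
# Returns true if the signal was sent, false if there was nothing to signal.
def quit_old_master(old_pid_path)
  pid = File.read(old_pid_path).to_i
  # Guard against garbage: signalling pid 0 would hit our own process group.
  return false if pid <= 0 || pid == Process.pid
  Process.kill(:QUIT, pid)
  true
rescue Errno::ENOENT, Errno::ESRCH
  # The pid file or the process is already gone; another worker got there first.
  false
end

# Possible use in config.ru (the path is an assumption):
#   use NotifyWhenServing do
#     quit_old_master("/path/to/unicorn.pid.oldbin")
#   end
```

With several workers, every worker would try this on its own first
request, so the first one to serve a request stops the old master. That
may still be earlier than you want if the other new workers are slow.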
> I am happy to implement this (with tests) and submit a patch, but I
> first wanted to know your opinion on this approach. (I should note
> that my unicorn setup does not run very close to the memory limit of
> its host; instead, the amount of free memory is more than double of
> the current unicorn memory footprint, so I can safely spawn a second
> set of Unicorn master + workers (via SIGUSR2) without worrying about
> the SIGTTOU before_fork() strategy shown in the Unicorn configuration
Given your memory availability, I wouldn't even worry about the
automated killing of the old workers.
Automatically killing old workers means you need a redeploy to roll back
changes, whereas if you SIGWINCH the old set away, you can HUP the old
master to bring them back in case the new set is having problems.
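
A rough sketch of that manual sequence, using only Process.kill and the
signals named in this thread (USR2, WINCH, HUP, QUIT). The signal_master
helper and the pid file paths are hypothetical, and WINCH is only
expected to work for a daemonized master:

```ruby
# Hypothetical helper: read a pid file and send the given signal to that pid.
# Returns the pid that was signalled.
def signal_master(pid_path, signal)
  pid = Integer(File.read(pid_path).strip)
  Process.kill(signal, pid)
  pid
end

# Possible deploy sequence (paths are assumptions; adjust to your config):
#
#   signal_master("/path/to/unicorn.pid", :USR2)          # start new master + workers
#   signal_master("/path/to/unicorn.pid.oldbin", :WINCH)  # old workers exit, old master stays
#
# If the new set looks good:
#   signal_master("/path/to/unicorn.pid.oldbin", :QUIT)   # retire the old master
#
# If the new set is having problems:
#   signal_master("/path/to/unicorn.pid.oldbin", :HUP)    # old master brings its workers back
#   signal_master("/path/to/unicorn.pid", :QUIT)          # stop the new master
```

Nothing here is automated on purpose: the old master stays around until
you decide the new set is healthy.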