feature request - when_ready() hook
sunaku at gmail.com
Thu Nov 26 14:05:08 EST 2009
On Wed, Nov 25, 2009 at 10:05 PM, Eric Wong <normalperson at yhbt.net> wrote:
> Suraj Kurapati <sunaku at gmail.com> wrote:
>> * before_fork() -- approx. 2 minute downtime
>> * after_fork() -- approx. 2 minute downtime
>> * storing the old-master-killing logic inside a lambda in after_fork()
>> (for the last worker only) and later executing that lambda in Rails'
>> config.after_initialize() hook -- approx. 20 second downtime
> I'm looking at those times and can't help but wonder if there's
> something very weird/broken with your setup... 20 seconds is already an
> eternity (even with preload_app=false), but 2 minutes?(!).
Yes, I am using preload_app=false. These delays mainly come from
establishing DB connections and loading XML datasets into the Rails
app. Our production DBs are pretty slow to give out new connections.
The startup time is much faster in development, where I use SQLite.
Please note that the reported downtimes are shocking only because they
were *visible* downtimes, where the last worker of the new Unicorn
master killed the old Unicorn master too soon. IMHO, it doesn't
matter how long it takes for the Rails app to become ready, so long as
the old Unicorn master + workers continue to exist & service requests
until the new Unicorn master + workers take over.
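For reference, the old-master-killing logic discussed here can be sketched as a small Ruby helper. It is modeled on the SIGTTOU before_fork() example that ships with Unicorn: earlier workers send TTOU to the old master, and the last worker sends QUIT. The helper name `signal_old_master` and the injectable `sender` argument are hypothetical. They exist only so the logic can be exercised without real processes. The sketch assumes the configured pid file path and relies on Unicorn renaming the old master's pid file to `<pid>.oldbin` during a SIGUSR2 upgrade.

```ruby
# Hypothetical helper (not part of Unicorn): signal the old master once new
# workers come up. During a SIGUSR2 upgrade Unicorn renames the old master's
# pid file to "<pid>.oldbin".
def signal_old_master(pid_path, worker_nr, worker_processes,
                      sender = Process.method(:kill))
  old_pid_path = "#{pid_path}.oldbin"
  return nil unless File.exist?(old_pid_path)

  # Earlier workers send TTOU (old master drops one worker); the last
  # worker sends QUIT (old master shuts down gracefully).
  sig = (worker_nr + 1) >= worker_processes ? :QUIT : :TTOU
  sender.call(sig, File.read(old_pid_path).to_i)
  sig
rescue Errno::ENOENT, Errno::ESRCH
  # The pid file vanished or the old master has already exited.
  nil
end
```

Inside a hook, it might be called as `signal_old_master(server.config[:pid], worker.nr, server.worker_processes)`, which mirrors Unicorn's example configuration. Check those accessors against the Unicorn version in use. Which hook makes the call decides how early the old master goes away, and that placement is what the timings above measure.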
> Are you doing per-process listeners and retrying? The new ones could be
> fighting for a port held by the old workers... Other than that...
No, I have one listen() call (on a UNIX socket) at the top level of my
Unicorn configuration file. Nothing fancy.
> I have many questions, because those times look extremely scary to me
> and I wonder if such a hook would only be masking the symptoms of
> a bigger problem.
> What kind of software/hardware stack are you running?
> (please don't say NSLU2 :)
The hardware is some kind of VM on a VM server farm, running CentOS 4.
The software is Ruby 1.9.1-p243 with Rails 2.3.3, running on Unicorn
0.95.1, behind Nginx, behind M$ IIS.
> How many workers?
> How heavy is traffic on the site when you're deploying?
About 15 to 20 users.
> How long does it take for a single worker to get ready and start
> serving requests?
Approximately 2 minutes.
> Are you using preload_app? It should be faster if you do, but there
> really appears to be something else wrong based on those times.
I was for a few weeks, but I stopped because the XML dataset loading
(see above) kept increasing the master's (and the new set of workers')
memory footprint by 1.5x every time Unicorn was restarted via SIGUSR2.
>> As you can see, the more I delayed the execution of that "killing the
>> old master" logic, the closer I got to zero downtime deploys. So I
>> request the addition of a when_ready() hook, executed just after
>> Unicorn prints "worker=# ready!" to its error log inside
>> Unicorn::HttpServer#worker_loop().
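A toy Ruby model makes the proposed semantics concrete. A block registered with `when_ready` runs in the worker immediately after the "worker=N ready" line is logged. `ToyServer`, `when_ready`, and `worker_ready` are hypothetical stand-ins. They are not Unicorn 0.95.1's API, which has no such hook.

```ruby
# Toy model of the *proposed* when_ready() hook. Unicorn 0.95.1 has no such
# hook; ToyServer and its methods are hypothetical names for illustration.
class ToyServer
  attr_reader :log

  def initialize
    @log = []
    @when_ready = nil
  end

  # Config-file side: register a block to run once a worker is ready.
  def when_ready(&block)
    @when_ready = block
  end

  # Worker side: stands in for the point in worker_loop that logs
  # "worker=N ready"; the hook runs only after that line is written.
  def worker_ready(worker_nr)
    @log << "worker=#{worker_nr} ready"
    @when_ready.call(self, worker_nr) if @when_ready
  end
end
```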
> At this stage, maybe even implementing something as middleware and
> making it hook into request processing (that way you know the
> worker is really responding to requests) is the way to go...
Hmm, but that would incur a penalty on each request (check if I've
already killed the old master and do it if necessary). I'm pretty
confident that killing the old master in the when_ready() hook will be
Good Enough for my setup (at most I expect to see 1-2 second
"down"time). Let me try this out and I'll tell you the results &
submit a patch.
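Eric's middleware alternative can be sketched as a small Rack middleware that fires a one-time callback after the app produces its first response. The class name `OnFirstResponse` is hypothetical. Once the callback has fired, the per-request penalty Suraj mentions shrinks to a single instance-variable check. The sketch uses no locking, on the assumption that each Unicorn worker handles one request at a time.

```ruby
# Hypothetical Rack middleware: run a one-time callback (e.g. signalling the
# old master) after the app has produced this worker's first response.
class OnFirstResponse
  def initialize(app, &callback)
    @app = app
    @callback = callback
    @fired = false
  end

  def call(env)
    response = @app.call(env)
    # After the first request this is the only extra per-request work.
    unless @fired
      @fired = true
      @callback.call
    end
    response
  end
end
```

In config.ru it might be mounted with `use OnFirstResponse do ... end`, because Rack::Builder#use passes its block through to the middleware's constructor.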
>> my Unicorn setup does not run very close to the memory limit of
>> its host; instead, the amount of free memory is more than double
>> the current Unicorn memory footprint, so I can safely spawn a second
>> set of Unicorn master + workers (via SIGUSR2) without worrying about
>> the SIGTTOU before_fork() strategy shown in the Unicorn configuration
> Given your memory availability, I wouldn't even worry about the
> automated killing of the old workers.
> Automatically killing old workers means you need a redeploy to roll back
> changes, whereas if you SIGWINCH the old set away, you can HUP the old
> master to bring them back in case the new set is having problems.
Wow, this is cool. Perhaps this strategy could be mentioned in the
Thanks for your consideration.
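The SIGUSR2/SIGWINCH/HUP procedure Eric describes can be expressed as three small Ruby helpers. The signal meanings follow Unicorn's SIGNALS documentation. The helper names and the injectable `sender` (which defaults to `Process.kill`) are hypothetical. The PIDs are assumed to come from the pid file and its `.oldbin` copy.

```ruby
# Sketch of the upgrade/rollback steps after SIGUSR2 has started a new
# master. `sender` defaults to Process.kill; pass a recorder to dry-run.
KILL = Process.method(:kill)

# Keep the old master alive but gracefully stop its workers, so only the
# new workers serve requests.
def retire_old_workers(old_master, sender = KILL)
  sender.call(:WINCH, old_master)
end

# New workers look healthy: shut the old master down for good.
def commit_upgrade(old_master, sender = KILL)
  sender.call(:QUIT, old_master)
end

# New workers misbehave: HUP makes the old master reload its config and
# restart its workers; then the new master is shut down gracefully.
def roll_back(old_master, new_master, sender = KILL)
  sender.call(:HUP, old_master)
  sender.call(:QUIT, new_master)
end
```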