pid file handling issue
mfischer at zendesk.com
Thu Oct 24 19:57:22 UTC 2013
On Thu, Oct 24, 2013 at 11:21 AM, Eric Wong <normalperson at yhbt.net> wrote:
> Right, we looked at using rename last year but I didn't think it's possible
> given we need to write the pid file before binding new listen sockets
> But perhaps we can drop the pid file late iff ENV["UNICORN_FD"] is
> detected. I'll see if that can be done w/o breaking compatibility.
My opinion is that supporting backward compatibility cases that are
clearly poorly designed, at least in open-source software, is
ill-advised. (I'm referring to the Mongrel compatibility semantics
discussed in that article.)
That aside, I don't yet understand this "need" you're referring to.
The control flow I'm proposing is as follows:
(1) Previous-generation parent (P) receives SIGUSR2.
(2) P renames unicorn.pid to unicorn.oldpid.
(3) P forks child (P'); if fork unsuccessful, P renames unicorn.oldpid
back to unicorn.pid.
(4) P' calls exec and attempts to start; creates unicorn.pid. P
watches for SIGCHLD from P'. If received, P renames unicorn.oldpid
back to unicorn.pid.
(5) P' sends SIGQUIT to P. P' unlinks unicorn.oldpid. P' is now P.
What am I missing here? This is, to my knowledge, precisely what
nginx does (http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_On_The_Fly).
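The pid file moves in steps (2) through (5) can be sketched roughly as follows. This is an illustration in Python, not Unicorn's Ruby code: the function names are hypothetical, the fork/exec and signal handling are left out, and the rollback assumes the failure paths in (3) and (4) restore unicorn.pid.

```python
import os

def begin_upgrade(pid_path, oldpid_path):
    # Step (2): P moves its pid file aside before forking P'.
    # rename(2) is atomic when both paths are on the same filesystem.
    os.rename(pid_path, oldpid_path)

def abort_upgrade(pid_path, oldpid_path):
    # Failure path of steps (3)/(4): fork failed or P saw SIGCHLD from P',
    # so P puts its pid file back (replacing any file P' left behind).
    os.replace(oldpid_path, pid_path)

def write_new_pid(pid_path, pid):
    # Step (4): P' creates a fresh pid file. O_EXCL makes the create fail
    # instead of silently overwriting an existing pid file.
    fd = os.open(pid_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(f"{pid}\n")

def finish_upgrade(oldpid_path):
    # Step (5): once P' has sent SIGQUIT to P, P' removes the old pid file.
    os.unlink(oldpid_path)
```

Because the old pid file is only ever renamed, never rewritten, a new inode appears at unicorn.pid only when P' actually gets far enough to create it.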
>> If the file's mtime or inode number changes under my proposal, that
>> means the reload must have been successful. What race condition are
>> you referring to that would render this conclusion inaccurate?
> It doesn't mean the process didn't exit/crash right after writing the PID.
That should not happen per (4) above.
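For concreteness, the mtime/inode check from the quoted proposal might look like the sketch below. It is Python for illustration only; the helper names are hypothetical, and it assumes the pid file is recreated (not rewritten in place) by the new master, as in step (4).

```python
import os

def pidfile_signature(pid_path):
    """Return (inode, mtime in ns) for the pid file, or None if it is absent."""
    try:
        st = os.stat(pid_path)
    except FileNotFoundError:
        return None
    return (st.st_ino, st.st_mtime_ns)

def reload_succeeded(before, pid_path):
    """True if a pid file exists now and differs from the one seen before."""
    after = pidfile_signature(pid_path)
    return after is not None and after != before
```

A deploy script would record pidfile_signature() before sending SIGUSR2 and call reload_succeeded() once unicorn.oldpid is gone. The contents of the file are never read.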
> But NTP syncs early in the boot process before most processes (including
> unicorn) are started. It shouldn't matter, then, right?
Truth be told, I'm not completely certain why this is an issue. My
reading of procps and the kernel suggests it should be doing the right
thing, but I tried this at first:
- Touch a timestamp file before sending P a SIGUSR2.
- Wait for oldpid to disappear
- Read the stime field from ps(1) for the remaining master process (P or P')
- If stime < mtime of the timestamp file, the new process failed. If
stime > mtime, the new process succeeded.
But for reasons unclear to me, sometimes the stime of P' (successful
reload) would predate the timestamp! This was obviously agonizing.
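A rough Python rendering of that timestamp procedure is below, as a sketch rather than the script actually used. The function names are hypothetical, and it reads the full start time via `ps -o lstart=` instead of the abbreviated stime column, which is a substitution made here for easier parsing.

```python
import os
import subprocess
import time

def touch_marker(marker_path):
    """Create or update the timestamp file; return its mtime in epoch seconds."""
    with open(marker_path, "a"):
        os.utime(marker_path, None)
    return os.stat(marker_path).st_mtime

def process_start_time(pid):
    """Start time of pid in epoch seconds, parsed from `ps -o lstart=`."""
    out = subprocess.run(
        ["ps", "-o", "lstart=", "-p", str(pid)],
        check=True, capture_output=True, text=True,
        env={**os.environ, "LC_ALL": "C"},  # force English day/month names
    ).stdout.strip()
    # lstart looks like "Thu Oct 24 19:57:22 2013" (local time, whole seconds).
    return time.mktime(time.strptime(out, "%a %b %d %H:%M:%S %Y"))

def new_master_started(start_time, marker_mtime):
    """True if the surviving master started after the marker was touched."""
    return start_time > marker_mtime
```

Note that lstart is reported in whole seconds while the marker's mtime can carry a fractional part, so this comparison cannot reliably order two events that fall within the same second.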
>> To reiterate, I'm not using the PID file in this instance to determine
>> Unicorn's PID. It could be empty, for all I care.
> OK. I assume you do the same for nginx?
With nginx we have -t; we can at least test the config file and have a
reasonable degree of certainty that it will reload properly. With
Rack apps, not so much. :)