pid file handling issue
normalperson at yhbt.net
Thu Oct 24 18:21:37 UTC 2013
Michael Fischer <mfischer at zendesk.com> wrote:
> On Wed, Oct 23, 2013 at 7:03 PM, Eric Wong <normalperson at yhbt.net> wrote:
> >> > I read and stash the value of the pid file before issuing any USR2.
> >> > Later, you can issue "kill -0 $old_pid" after sending SIGQUIT
> >> > to ensure it's dead.
> >> That's inherently racy; another process can claim the old PID in the interim.
> > Right, but raciness goes for anything regarding pid files.
> > The OS does make an effort to avoid recycling PIDs too often,
> > and going through all the PIDs in a system quickly is
> > probably rare. I haven't hit it, at least.
> That's not good enough.
> The fact that the pid file contains a pid is immaterial to me; I don't
> even need to look at it. I only care about when it was created, or
> what its inode number is, so that I can detect whether Unicorn was
> last successfully started or restarted. rename(2) is atomic per POSIX
> and is not subject to race conditions.
Right, we looked at using rename last year, but I didn't think it was
possible given we need to write the pid file before binding new listen
sockets.
But perhaps we can drop the pid file late iff ENV["UNICORN_FD"] is
detected. I'll see if that can be done w/o breaking compatibility.
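
For illustration, a minimal sketch of the rename approach discussed above. This is not unicorn's actual code: `write_pid_atomically` is a hypothetical helper, and it assumes the temporary file lives in the same directory (and so on the same filesystem) as the pid file, which rename(2) needs in order to be atomic.

```ruby
# Hypothetical helper (not part of unicorn): write the pid to a temporary
# file next to the target, then rename(2) it into place. Readers see either
# the old pid file or the new one, never a partially written file.
def write_pid_atomically(path, pid = Process.pid)
  tmp = "#{path}.#{pid}.tmp"
  # File::EXCL makes the open fail if a stale temp file already exists.
  File.open(tmp, File::WRONLY | File::CREAT | File::EXCL, 0644) do |f|
    f.write("#{pid}\n")
    f.fsync
  end
  File.rename(tmp, path)
  # Return the inode number so a caller can compare it across reloads.
  File.stat(path).ino
end
```

Because the new file is created while the old pid file still exists, the two cannot share an inode number, so a monitoring script that stashed the old inode would see it change once the rename completes.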
> >> > Checking the mtime of the pidfile is really bizarre...
> >> Perhaps (though it's a normative criticism), but on the other hand, it
> >> isn't subject to the race above.
> > It's still racy in a different way, though (file could change right
> > after checking).
> If the file's mtime or inode number changes under my proposal, that
> means the reload must have been successful. What race condition are
> you referring to that would render this conclusion inaccurate?
It doesn't mean the process didn't exit/crash right after writing the PID.
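
A rough sketch of combining the two checks, assuming the pid file is replaced via rename and does contain a pid. Both `pid_alive?` and `reload_succeeded?` are hypothetical names, not unicorn APIs, and the result is still only a snapshot: the process can die, and its PID or the old inode number can be recycled, right after the check returns.

```ruby
# Hypothetical helper: signal 0 checks for existence without delivering
# a signal, the same as "kill -0 $pid" in the shell.
def pid_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Hypothetical helper: treat a reload as successful only if the pid file
# was replaced (inode changed) AND the pid it names is currently running.
def reload_succeeded?(pid_path, old_ino)
  return false if File.stat(pid_path).ino == old_ino
  pid_alive?(Integer(File.read(pid_path).strip))
rescue Errno::ENOENT, ArgumentError
  false # pid file missing, or empty/garbled contents
end
```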
> > Having the process start time in /proc be unreliable because the server
> > has the wrong time is also in the same category of corner cases.
> This is absolutely not true. A significant minority, if not a
> majority, of servers will have at least slightly inaccurate wall
> clocks on boot. This is usually corrected during boot by an NTP sync,
> but by then the die has already been cast insofar as ps(1) output is
But NTP syncs early in the boot process before most processes (including
unicorn) are started. It shouldn't matter, then, right?
> > Also, can you check the inode of the /proc/$pid entry? Perhaps
> That's not portable.
> > PID files are horrible, really :<
> To reiterate, I'm not using the PID file in this instance to determine
> Unicorn's PID. It could be empty, for all I care.
OK. I assume you do the same for nginx?