pid file handling issue

Eric Wong normalperson at yhbt.net
Thu Oct 24 18:21:37 UTC 2013


Michael Fischer <mfischer at zendesk.com> wrote:
> On Wed, Oct 23, 2013 at 7:03 PM, Eric Wong <normalperson at yhbt.net> wrote:
> 
> >> > I read and stash the value of the pid file before issuing any USR2.
> >> > Later, you can issue "kill -0 $old_pid" after sending SIGQUIT
> >> > to ensure it's dead.
> >>
> >> That's inherently racy; another process can claim the old PID in the interim.
> >
> > Right, but raciness goes for anything regarding pid files.
> >
> > The OS does make an effort to avoid recycling PIDs too often,
> > and going through all the PIDs in a system quickly is
> > probably rare.  I haven't hit it, at least.
> 
> That's not good enough.
> 
> The fact that the pid file contains a pid is immaterial to me; I don't
> even need to look at it.  I only care about when it was created, or
> what its inode number is, so that I can detect whether Unicorn was
> last successfully started or restarted.  rename(2) is atomic per POSIX
> and is not subject to race conditions.

Right, we looked at using rename last year, but I didn't think it was
possible given we need to write the pid file before binding new listen
sockets:

  http://mid.gmane.org/20121127215146.GA23452@dcvr.yhbt.net

But perhaps we can drop the pid file late iff ENV["UNICORN_FD"] is
detected.  I'll see if that can be done w/o breaking compatibility.
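
For illustration only, the rename(2) idea discussed above could look
roughly like the Ruby sketch below.  write_pid_atomically is a
hypothetical helper, not unicorn's actual code, and the temporary file
name is an assumption; the only point is that the new file is fully
written before it replaces the old one, so a watcher sees the inode
change in a single step.

```ruby
# Hypothetical helper, not part of unicorn: publish a pid file atomically.
def write_pid_atomically(path, pid = Process.pid)
  # The temp file lives in the same directory, so rename(2) never has to
  # cross a filesystem boundary.
  tmp = "#{path}.#{pid}.tmp"
  File.open(tmp, File::WRONLY | File::CREAT | File::EXCL, 0644) do |f|
    f.write("#{pid}\n")
    f.fsync
  end
  # Atomically replaces any existing pid file; the path gets a new inode.
  File.rename(tmp, path)
  path
ensure
  # Clean up the temp file if anything failed before the rename.
  File.unlink(tmp) if tmp && File.exist?(tmp)
end
```

This does nothing about the ordering problem mentioned above (pid file
vs. binding listen sockets); it only shows the write-then-rename step.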

> >> > Checking the mtime of the pidfile is really bizarre...
> >>
> >> Perhaps (though it's a normative criticism), but on the other hand, it
> >> isn't subject to the race above.
> >
> > It's still racy in a different way, though (file could change right
> > after checking).
> 
> If the file's mtime or inode number changes under my proposal, that
> means the reload must have been successful.   What race condition are
> you referring to that would render this conclusion inaccurate?

It doesn't mean the process didn't exit/crash right after writing the PID.
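
A rough sketch of combining the two checks from this thread, assuming a
deploy script written in Ruby: watch for the pid file's inode/mtime to
change, then also confirm the pid it names is alive.  All three method
names, the timeout, and the polling interval are made up for
illustration; none of this is unicorn API, and the liveness check is
still subject to the PID-reuse race discussed earlier.

```ruby
# Hypothetical helpers for a deploy script; none of this is unicorn API.

# Identity of the pid file: inode number plus mtime.  nil if it is absent.
def pidfile_identity(path)
  st = File.stat(path)
  [st.ino, st.mtime]
rescue Errno::ENOENT
  nil
end

# True if a process with this pid exists (signal 0 only performs the check).
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Poll until the pid file is replaced, then confirm the new pid is running.
def reload_succeeded?(path, before, timeout: 30, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    now = pidfile_identity(path)
    if now && now != before
      pid = begin
        File.read(path).to_i
      rescue Errno::ENOENT
        0 # file vanished between stat and read
      end
      return pid > 0 && process_alive?(pid)
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

Usage would be along the lines of: call pidfile_identity(path) and keep
the result, send USR2 to the old master, then call
reload_succeeded?(path, saved_identity).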

> > Having the process start time in /proc be unreliable because the server
> > has the wrong time is also in the same category of corner cases.
> 
> This is absolutely not true.  A significant minority, if not a
> majority, of servers will have at least slightly inaccurate wall
> clocks on boot.  This is usually corrected during boot by an NTP sync,
> but by then the die has already been cast insofar as ps(1) output is
> concerned.

But NTP syncs early in the boot process before most processes (including
unicorn) are started.  It shouldn't matter, then, right?

> > Also, can you check the inode of the /proc/$pid entry?  Perhaps
> 
> That's not portable.
> 
> > PID files are horrible, really :<
> 
> To reiterate, I'm not using the PID file in this instance to determine
> Unicorn's PID.  It could be empty, for all I care.

OK.  I assume you do the same for nginx?

