Problem with binding UNIX listeners before checking PID

Eric Wong normalperson at
Mon Oct 4 00:17:13 EDT 2010

Jordan Ritter <jpr5 at> wrote:
> Howdy.
> I have lately been frustrated by the following use case:
> 	1. Run nginx/unicorn in production, listening on a UNIX socket
> 	with a defined pid file.  Things run well.
> 	2. Someone pushes code, unicorn restarts just fine, workers are
> 	all up and running.
> 	3. But someone is suspicious, or maybe they forget which
> 	box they're logged into, so they invoke unicorn manually.
> 	Same directory, same settings.
> 	4. It looks like the pid file check kicked in, because unicorn
> 	refuses to boot - hey, it's already running, bugger off.  great.
> 	5. BUT, this happened *after* the listener processing: the
> 	manually-invoked unicorn unlinks the real unicorn master's unix
> 	listener, so it's left dead in the water and everybody loses.
> The unicorn master doesn't know its listener is actually gone (lsof
> still shows an open unix socket fd and netstat still shows the unix
> socket, so cursory investigation is misleading), but nginx keeps
> spewing ECONNREFUSEDs because the unix socket path it's hitting now
> belongs to that accidental unicorn instance, which already decided
> not to stick around.
> I think this is effectively about a behavioral difference in
> Unicorn::SocketHelper#bind_listen around the handling of UNIX vs. TCP
> sockets (this doesn't happen with TCP sockets because there's no
> unlink/disconnect step), and the fact that HttpServer#start evaluates
> the listener config before the PID path/config.
> Now I see comments in and around HttpServer#initialize talking about races
> wrt binding to the listener and whatnot, and being newish to the codebase
> I admit I haven't yet fully absorbed all the considerations at play.
> But I think it's fair to say that killing the listener(s) (in the UNIX
> socket case) before discovering you shouldn't have run in the first place
> (from the PID file) qualifies as buggy/bad/broken behavior.

Hi Jordan,

Thanks for the detailed bug report.  I knew from experience with other
daemons that lingering UNIX sockets caused troubles for some users, but
I failed to take into account the case of a user mistakenly starting
the process twice.
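
To spell out the UNIX vs. TCP difference: bind(2) on a UNIX socket path
that already exists fails with EADDRINUSE even when nothing is listening
on it, so the path has to be unlinked before binding, and that unlink
also yanks the path out from under a listener that is still alive.
A rough sketch of the hazard (example path only, not unicorn's actual
code):

  require 'socket'

  path = "/tmp/example.sock"             # hypothetical path, for illustration
  # bind(2) raises Errno::EADDRINUSE if the path already exists, so the
  # usual idiom is to unlink first; the unlink succeeds even when another
  # process is still accepting connections on that path:
  File.unlink(path) if File.exist?(path)
  server = UNIXServer.new(path)          # bind + listen on the fresh path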

Yes, getting pid file writing/ordering "right"[1] is very tricky.

> I might suggest simply swapping their processing order in #start, but
> given the complexity of in-place restarts and other race considerations,
> I have doubts solving this would be that easy.

That wouldn't work if pid files weren't in use at all.

> Any thoughts/ideas?

A simpler check would be to use connect(2) (but not make any HTTP request)
to see if the socket is alive.  Patch coming.
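
Roughly: before unlinking an existing socket path, try connecting to it;
if the connect succeeds, a live master still owns it and we should back
off instead of unlinking.  A minimal sketch of the idea (names and error
handling are illustrative, not the final patch):

  require 'socket'

  # connect(2) liveness probe: no HTTP request is made, we only care
  # whether something is accepting on the path.
  def stale_socket?(path)
    UNIXSocket.new(path).close   # connect(2) to the path, then drop it
    false                        # connect succeeded: a live server owns it
  rescue Errno::ECONNREFUSED
    true                         # nothing accepting: leftover from a dead process
  end

  path = "/tmp/unicorn.sock"     # hypothetical listener path
  if File.socket?(path)
    if stale_socket?(path)
      File.unlink(path)          # safe to remove and rebind
    else
      abort "socket=#{path} appears to be in use by another process"
    end
  end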

[1] - I don't believe there actually is a way to always be right,
      just less bad/broken than the alternatives.
Eric Wong
