A barrage of unexplained timeouts

Eric Wong normalperson at yhbt.net
Tue Aug 20 18:49:02 UTC 2013


nick at auger.net wrote:
> "Eric Wong" <normalperson at yhbt.net> said:
> > 
> > Do you have any other requests in your logs which could be taking
> > a long time and hogging workers, but not high enough to trigger the
> > unicorn kill timeout.

> I don't *think* so.  Most requests finish <300ms.  We do have some
> more intensive code-paths, but they're administrative and called much
> less frequently.  Most of these pages complete in <3seconds.
> 
> For requests that made it to rails logging, the LAST processed request
> before the worker timed-out all completed very quickly (and no real
> pattern in terms of which page may be triggering it.)

This is really strange.  This was only really bad for a 7s period?
Has it happened again?  Anything else going on with the system at that
time?  Swapping, particularly...

And if you're inside a VM, maybe your neighbors were hogging things.
Large PUT/POST requests which require filesystem I/O are particularly
sensitive to this.

> > Is this with Unix or TCP sockets?  If it's over a LAN, maybe there's
> > still a bad switch/port/cable somewhere (that happens often to me).
> 
> TCP sockets, with nginx and unicorn running on the same box.

OK, that probably rules out a bunch of problems.
Just to be thorough, anything interesting in dmesg or syslogs?

> > With Unix sockets, I don't recall encountering recent problems under
> > Linux.  Which OS are you running?
> 
> Stock RHEL 5, kernel 2.6.18.

RHEL 5.0 or 5.x?  I can't remember /that/ far back to 5.0 (I don't think
I even tried it until 5.2), but don't recall anything being obviously
broken in those...


More information about the mongrel-unicorn mailing list