A barrage of unexplained timeouts

Eric Wong normalperson at yhbt.net
Tue Aug 20 20:42:44 UTC 2013


nick at auger.net wrote:
> "Eric Wong" <normalperson at yhbt.net> said:
> > This is really strange.  This was only really bad for a 7s period?
> 
> It was a 7 minute period.  All of the workers would become busy and
> exceed their >120s timeout.  Master would kill and re-spawn them,
> they'd start to respond to a handful of requests (anywhere from 5-50)
> after which they'd become "busy" again, and get force killed by
> master.  This pattern happened 3 times over a 7 minute period.
> 
> > Has it happened again?
> 
> No
> 
> > Anything else going on with the system at that time?  Swapping,
> > particularly...
> 
> No swap activity or high load.  Our munin graphs indicate a peak of
> web/app server disk latency around that time, although our graphs show
> many other similar peaks, without incident.

I'm stumped :<

Do you have any background threads running that could be hanging the
workers?  This is Ruby 1.8, after all, so a single blocking call is more
likely to hang the entire process.  AFAIK, some monitoring software runs
a background thread in the unicorn worker, and the OpenSSL extension may
not behave well under Ruby 1.8 if it encounters network problems.
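
One rough way to check for stray background threads is to list every
thread in the process other than the current one.  This is only a sketch:
background_thread_report is a hypothetical helper name, and the sleeping
thread below merely stands in for whatever a monitoring library might
start.  Where you call it from inside a worker is up to you.

```ruby
# Hypothetical helper (not part of unicorn): describe every thread in this
# process other than the one calling the helper.
def background_thread_report
  Thread.list.reject { |t| t == Thread.current }.map do |t|
    "#{t.inspect} status=#{t.status.inspect}"
  end
end

# Stand-in for a monitoring-style background thread (an assumption made
# purely for illustration): it sleeps until killed.
sleeper = Thread.new { sleep }

# Wait until the stand-in thread has actually gone to sleep.
sleep 0.05 until sleeper.status == "sleep"

report = background_thread_report
report.each { |line| puts line }

# Clean up the stand-in thread.
sleeper.kill
sleeper.join
```

If the report is non-empty in a worker that is only supposed to be
serving requests, those threads are the first thing I'd look at.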

Otherwise, I don't know...

