More unexplained timeouts

Eric Wong normalperson at yhbt.net
Mon Sep 30 00:06:17 UTC 2013


nick at auger.net wrote:
> We're still suffering from unexplained workers timing out.  We
> recently upgraded to the latest unicorn 4.6.3 (while still on REE
> 1.8.7) in the hopes that it would solve our issues.  Unfortunately,
> this seemed to exacerbate the problem, with timeouts happening more
> frequently, but that could be related to greater precision in timeouts
> in newer versions of unicorn.  (In our unicorn 3.6.2, a timeout set to
> 120s might not ACTUALLY timeout until 180s or more, thus allowing a
> bit more time for Ruby to finish whatever it was choking on.)

Yes, there were some fixes in 4.x to improve the timeout accuracy.

> We dropped the timeout down to 65s (to make sure it was triggered) and
> then tried to add greater logging (per
> http://permalink.gmane.org/gmane.comp.lang.ruby.unicorn.general/1269.)
> The START/FINISH approach confirms it's not an issue with our
> application code, ie:
> 
> HH:MM:SS- S/F[PID]- /PATH
> 15:21:01- START-25904- /pathA
> 15:21:01- FINISH-25904- /pathA
> 15:21:01- START-25904- /pathB
> 15:21:01- FINISH-25904- /pathB
> 15:21:01- START-25904- /pathC
> 15:21:01- FINISH-25904- /pathC
> worker=11 PID:25904 timeout (66s > 65s), killing
> reaped #<Process::Status: pid=25904,signaled(SIGKILL=9)> worker=11
> 
> For each START we always get a corresponding FINISH and then the
> worker is killed.  Additionally, our nginx logs confirm that this last
> request was sent back to the client.  No 'upstream' errors in our
> nginx log, either.
> 
> When we tried the Thread sleep approach, nothing actually appeared in
> the logs.  I imagine this means that ruby or some C extension is
> misbehaving.

Sounds like it.  1.8 and old C extensions could easily lock up the
interpreter on blocking calls.

Another problem could be using new versions of C extensions that are no
longer tested under 1.8.  I admit I haven't tested recent versions of
unicorn/kgio/raindrops on 1.8 lately, either, but I'm _fairly_ sure they
still work since they haven't changed much.

> Unfortunately, it's been impossible for us to recreate this in
> development.  

Are you running any different gems/extensions in development vs
production?

> Thoughts?
> 
> RHEL 5.6
> REE 1.8.7 2011.12
> Unicorn 4.6.3
> 16 unicorn workers on 8 cores
> No swap activity, no peaks in load

What other gems/extensions do you use?


More information about the mongrel-unicorn mailing list