Unicorn is killing our rainbows workers

Eric Wong normalperson at yhbt.net
Thu Jul 19 20:16:33 UTC 2012


Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong <normalperson at yhbt.net> wrote:
> >> > Samuel Kadolph <samuel.kadolph at jadedpixel.com> wrote:
> >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
> >> >> unicorn is killing our ThreadPool workers would help us greatly. If
> >> >> you require additional info I would be happy to provide it.
> >
> > Also, are you using "preload_app true" ?
> 
> Yes we are using preload_app true.
> 
> > I'm a bit curious how these messages are happening, too:
> > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
> > suspend/hibernation
> 
> They are strange. My current hunch is that the killings and that
> message are symptoms of the same issue, since the message always
> follows a killing.

I wonder if there's some background thread one of your gems spawns on
load that causes the master to stall.  I'm not seeing how else unicorn
could think it was in suspend/hibernation.
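For illustration, here is a rough sketch (my own, not unicorn's actual implementation; the function name and numbers are invented) of how a master loop can infer suspend/hibernation: it notices that far more wall-clock time passed between ticks than its sleep interval explains.  A stalled master looks exactly the same way, which is why a misbehaving background thread could produce that log line:

```ruby
# Hypothetical sketch of suspend/hibernation detection: if the gap
# between two loop ticks greatly exceeds the sleep interval, the
# process was likely stopped (suspend, SIGSTOP, or a stalled master),
# so worker heartbeats from before the gap cannot be trusted.
def stall_gap(last_tick, now, interval, tolerance = 2.0)
  elapsed = now - last_tick
  elapsed > interval * tolerance ? elapsed : nil
end

stall_gap(1000.0, 1151.5, 1.0) # => 151.5 (a gap like the one logged)
stall_gap(1000.0, 1001.0, 1.0) # => nil (normal tick)
```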

> > Can you tell (from Rails logs) if the to-be-killed workers are still
> > processing requests/responses the 300s before when the unicorn timeout
> > hits it?  AFAIK, Rails logs the PID of each worker processing the
> > request.
> 
> Rails doesn't log the pid, but it would seem that after upgrading to
> mysql2 0.2.18 it is no longer killing workers that are busy with
> requests.

Oops, I think I've been spoiled into thinking the Hodel3000CompliantLogger
is the default Rails logger :)

> > If anything, I'd lower the unicorn timeout to something low (maybe
> > 5-10s) since that detects hard lockups at the VM level.  Individual
> > requests in Rainbows! _are_ allowed to take longer than the unicorn
> > timeout.
> 
> We lowered the unicorn timeout to 5 seconds, but that did not stop
> the killings, though they seem to be happening less often. I have
> some of our stderr logs after setting the timeout to 5 seconds at
> https://gist.github.com/3144250.

Thanks for trying that!
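To make the timeout semantics quoted above concrete, here is a rough approximation (my sketch, not unicorn's code; the names are invented) of the heartbeat contract: the timeout bounds time *between* worker heartbeats, not the length of any single Rainbows! request, since a threaded worker keeps heartbeating while slow requests run:

```ruby
Worker = Struct.new(:pid, :last_heartbeat)

# The master kills workers whose last heartbeat is older than the
# timeout: a worker stuck in a hard VM-level lockup stops heartbeating
# and is caught quickly by a 5-10s timeout, while a worker serving a
# slow request keeps heartbeating and is left alone.
def stale_pids(workers, now, timeout)
  workers.select { |w| now - w.last_heartbeat > timeout }.map(&:pid)
end

workers = [Worker.new(100, 90.0),  # silent for 10s: hard lockup
           Worker.new(101, 99.0)]  # heartbeat 1s ago: fine
stale_pids(workers, 100.0, 5.0)    # => [100]
```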

> > Newer versions of mysql2 should avoid potential issues with
> > ThreadTimeout/Timeout (or anything that hits Thread#kill).  I think
> > mysql2 0.2.9 fixed a fairly important bug, and 0.2.18 fixed a very rare
> > (but possibly related to your issue) bug,
> 
> Upgrading mysql2 seems to have stopped unicorn from killing workers
> that are currently busy. We were stress testing it last night and
> after we upgraded to 0.2.18 we had no more 502s from the app, but this
> could be a coincidence since the killings still happen.

Alright, good to know 0.2.18 solved your problems.  Btw, have you
noticed any general connectivity issues to your MySQL server?
There were quite a few bugfixes between 0.2.6 and 0.2.18.

Anyways, I'm happy your problem seems to be fixed with the mysql2
upgrade :)

> Our ops guys say we had this problem before we were using ThreadTimeout.

OK.  That's somewhat reassuring to know (especially since the culprit
seems to be an old mysql2 gem).  I've had other users (privately) report
recursive-locking issues caused by ensure clauses (e.g. ones calling
Mutex#synchronize), a hazard I forgot to document.
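The recursive-locking hazard is easy to reproduce in plain Ruby (a minimal illustration of the failure mode, not code from anyone's app): MRI's Mutex is not reentrant, so an ensure clause that re-enters Mutex#synchronize on a lock its own thread already holds raises ThreadError.  ThreadTimeout/Thread#kill makes this worse by triggering ensure blocks at arbitrary points inside critical sections:

```ruby
LOCK = Mutex.new

def cleanup
  # Re-acquiring LOCK here is fatal if the caller already holds it:
  # Ruby's Mutex raises ThreadError on recursive locking.
  LOCK.synchronize { :cleaned }
end

def risky
  LOCK.synchronize do
    begin
      :work
    ensure
      cleanup # runs while LOCK is still held by this thread
    end
  end
end

begin
  risky
rescue ThreadError
  # reached: recursive locking was detected, not a silent deadlock
end
```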
