Unicorn is killing our rainbows workers

Eric Wong normalperson at yhbt.net
Thu Jul 19 21:31:25 UTC 2012


Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> On Thu, Jul 19, 2012 at 4:16 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> > > On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > > > Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> > > >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > > >> > Samuel Kadolph <samuel.kadolph at jadedpixel.com> wrote:
> > > >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
> > > >> >> unicorn is killing our ThreadPool workers would help us greatly. If
> > > >> >> you require additional info I would be happy to provide it.
> > > >
> > > > Also, are you using "preload_app true" ?
> > >
> > > Yes we are using preload_app true.
> > >
> > > > I'm a bit curious how these messages are happening, too:
> > > > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
> > > > suspend/hibernation
> > >
> > > They are strange. My current hunch is that the killing and that message
> > > are symptoms of the same issue, since the message always follows a killing.
> >
> > I wonder if there's some background thread one of your gems spawns on
> > load that causes the master to stall.  I'm not seeing how else unicorn
> > could think it was in suspend/hibernation.

> > Anyways, I'm happy your problem seems to be fixed with the mysql2
> > upgrade :)
> 
> Unfortunately that didn't fix the problem. We had a large sale today
> and had two 502s. We're going to try p194 next week and I'll let you
> know if that fixes it.

Are you seeing the same errors as before in stderr for those?

Can you also try disabling preload_app?

But before disabling preload_app, could you also check a few things on
a running master?

* "lsof -p <pid_of_master>"

  To see if the master is making any odd connections.

* Assuming you're on Linux, can you also check for any other threads
  the master might be running (and possibly stuck on)?

    ls /proc/<pid_of_master>/task/

  The output should be 2 directories:

    <pid_of_master>/
    <tid_of_timer_thread>/

  If you have a 3rd entry, you can confirm something in your app or one of
  your gems is spawning a background thread which could be throwing
  the master off...

> > > Our ops guys say we had this problem before we were using ThreadTimeout.
> >
> > OK.  That's somewhat reassuring to know (especially since the culprit
> > seems to be an old mysql2 gem).  I've had other users (privately) report
> > issues with recursive locking because of ensure clauses (e.g.
> > Mutex#synchronize) that I forgot to document.
> 
> We're going to try going without ThreadTimeout again to make sure
> that's not the issue.

Alright.

Btw, I also suggest having any Rails/application-level logs include the PID
and timestamp of each request.  That way you can correlate a worker kill
with when (or whether) the Rails app stopped processing requests.
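
If you're on Rails 3.2, tagged logging can handle that; a sketch (assuming
config.log_tags is available, adjust for your setup) in
config/environments/production.rb:

    # tag each request's log lines with the worker PID and a timestamp
    config.log_tags = [
      lambda { |req| "pid=#{Process.pid}" },
      lambda { |req| Time.now.utc.strftime("%FT%T.%6NZ") }
    ]

On older Rails, a custom log formatter that includes Process.pid works, too.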

