Unicorn is killing our rainbows workers
normalperson at yhbt.net
Thu Jul 19 21:31:25 UTC 2012
Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> On Thu, Jul 19, 2012 at 4:16 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> > > On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > > > Samuel Kadolph <samuel.kadolph at shopify.com> wrote:
> > > >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > > >> > Samuel Kadolph <samuel.kadolph at jadedpixel.com> wrote:
> > > >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
> > > >> >> unicorn is killing our ThreadPool workers would help us greatly. If
> > > >> >> you require additional info I would be happy to provide it.
> > > >
> > > > Also, are you using "preload_app true" ?
> > >
> > > Yes we are using preload_app true.
> > >
> > > > I'm a bit curious how these messages are happening, too:
> > > > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
> > > > suspend/hibernation
> > >
> > > They are strange. My current hunch is that the killing and that
> > > message are symptoms of the same issue, since the message always
> > > follows a killing.
> > I wonder if there's some background thread one of your gems spawns on
> > load that causes the master to stall. I'm not seeing how else unicorn
> > could think it was in suspend/hibernation.
> > Anyways, I'm happy your problem seems to be fixed with the mysql2
> > upgrade :)
> Unfortunately that didn't fix the problem. We had a large sale today
> and had 2 502s. We're going to try p194 next week and I'll let you
> know if that fixes it.
Are you seeing the same errors as before in stderr for those?
Can you also try disabling preload_app?
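For reference, disabling it is a one-line change in the unicorn config (a minimal sketch; the path, worker count, and timeout are illustrative, not from this thread):

```ruby
# config/unicorn.rb (illustrative)
worker_processes 4
preload_app false  # each worker loads the app after fork, so any
                   # load-time background threads stay out of the master
timeout 60
```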
But before disabling preload_app, could you also check a few things on
a running master?
* "lsof -p <pid_of_master>"
To see if there are any odd connections the master is making.
* Assuming you're on Linux, can you also check for any other threads
the master might be running (and possibly stuck on)? Listing
/proc/<pid_of_master>/task/ should show 2 directories.
If you see a 3rd entry, you can confirm something in your app or one
of your gems is spawning a background thread which could be throwing
the master off...
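The check above can be scripted; a sketch (`thread_count` is a made-up helper, Linux-specific with a Thread.list fallback that only works for the current process):

```ruby
# On Linux, each thread of a process appears as a directory under
# /proc/<pid>/task, so counting entries counts threads.
def thread_count(pid = Process.pid)
  task_dir = "/proc/#{pid}/task"
  if File.directory?(task_dir)
    Dir.children(task_dir).size
  else
    # Non-Linux fallback; can only inspect the current process.
    Thread.list.size
  end
end
```

Run it against the master's PID: more entries than expected means some gem spawned a background thread inside the master.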
> > > Our ops guys say we had this problem before we were using ThreadTimeout.
> > OK. That's somewhat reassuring to know (especially since the culprit
> > seems to be an old mysql2 gem). I've had other users (privately) report
> > issues with recursive locking because of ensure clauses (e.g.
> > Mutex#synchronize) that I forgot to document.
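That failure mode is easy to reproduce: Ruby's Mutex is not reentrant, so a cleanup path (e.g. in an ensure clause) that re-enters #synchronize on a lock the thread already holds raises ThreadError. A minimal made-up reproduction, not code from this thread:

```ruby
lock = Mutex.new

# Imagine this running from an ensure clause while the lock is held.
reacquire = -> { lock.synchronize { :cleanup } }

begin
  lock.synchronize do
    reacquire.call  # re-entry on a non-reentrant Mutex
  end
rescue ThreadError => e
  puts "recursive lock: #{e.message}"
end
```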
> We're going to try going without ThreadTimeout again to make sure
> that's not the issue.
Btw, I also suggest having any Rails/application-level logs include the
PID and timestamp of each request. This way you can correlate a worker
being killed with when/if the Rails app stopped processing the request.
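A plain Logger formatter along those lines (a generic stdlib sketch, not Rails' own tagged logging):

```ruby
require 'logger'
require 'time'

logger = Logger.new($stdout)
# Prefix every line with an ISO-8601 timestamp and the worker PID so
# app logs can be matched against unicorn's stderr worker-kill messages.
logger.formatter = lambda do |severity, time, _progname, msg|
  format("%s #%d %s: %s\n", time.utc.iso8601(6), Process.pid, severity, msg)
end

logger.info("request started")
```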
More information about the rainbows-talk mailing list