Unicorn is killing our rainbows workers
samuel.kadolph at shopify.com
Wed Jul 18 23:06:07 UTC 2012
On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong <normalperson at yhbt.net> wrote:
> Samuel Kadolph <samuel.kadolph at jadedpixel.com> wrote:
>> Hey rainbows-talk,
>> We have 40 servers that each run rainbows with 2 workers with 100
>> threads using ThreadPool. We're having an issue where unicorn is
>> killing the worker process. We use ThreadTimeout (set to 70 seconds)
>> and originally had the unicorn timeout set to 150 seconds, but unicorn
>> eventually killed each worker. So we bumped the timeout to 300
>> seconds; it took about 5 minutes, but then unicorn started killing
>> workers again. You can see our stderr
>> log file (timeout at 300s) at
>> https://gist.github.com/9ec96922e55a59753997. Any insight into why
>> unicorn is killing our ThreadPool workers would help us greatly. If
>> you require additional info I would be happy to provide it.
> Which Ruby version/patchlevel are you using? 1.8 and 1.9 have vastly
> different thread implementations and workarounds to deal with.
> What C extensions are you using?
> ThreadTimeout might also be conflicting with some libraries you use and
> causing deadlocks. Also, ThreadTimeout might not be a good idea with
> many common libraries which:
> 1) use the stdlib Timeout internally
> 2) rely on ensure clauses firing
> ThreadTimeout turns out to be difficult to use correctly with existing
> code, so it may not be appropriate for you. Your app should use
> localized timeouts as much as possible (using timeout mechanisms built
> into libraries you use).
> Also, please don't use private gists (especially when posting to a
> public mailing list); they require a GitHub account to clone from, and
> I'll never require (nor encourage :P) any website account
> for contributing to Rainbows!, just an email address.
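
As an illustration of the "localized timeouts" suggested above, here is a minimal sketch using the timeouts built into Net::HTTP from the Ruby standard library. The helper name build_client and the 5s/30s defaults are hypothetical and not taken from this thread; the only point is that each library call carries its own limit, well under the 70-second ThreadTimeout.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not from the thread): returns a Net::HTTP client
# whose connect and read timeouts are set on the client itself, rather
# than relying on an outer Timeout or ThreadTimeout to interrupt it.
def build_client(url, open_timeout = 5, read_timeout = 30)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port) # does not connect yet
  http.open_timeout = open_timeout         # seconds allowed to open the socket
  http.read_timeout = read_timeout         # seconds allowed per blocking read
  http
end
```

A caller would then issue requests through the returned object (for example http.get("/")) and rescue the timeout errors that Net::HTTP raises, so a slow backend fails that one request instead of stalling the thread.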
We're running Ruby 1.9.3-p125 with the performance patches at
https://gist.github.com/1688857. I listed the gems we use, and which
of them have C extensions, at https://gist.github.com/3139226.
We'll try running without ThreadTimeout. We don't think we're
hitting deadlocks, because our stress tests do not time out, but
they do get 502s when a rainbows worker is killed during a request.
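
For readers following along, the setup described at the top of this thread corresponds roughly to a configuration like the sketch below. The file names and layout are assumptions; only the numbers (2 workers, 100 threads, 300s unicorn timeout, 70s ThreadTimeout) come from the thread, and the actual config files were not posted.

```ruby
# rainbows.conf.rb (assumed file name), passed to rainbows with -c
worker_processes 2        # 2 workers per server
timeout 300               # unicorn master kills a worker it considers stuck after 300s

Rainbows! do
  use :ThreadPool         # concurrency model named in the thread
  worker_connections 100  # with ThreadPool, this is the thread count per worker
end

# config.ru (assumed): ThreadTimeout is Rack middleware, so it would be
# enabled here; removing this line is the "run without ThreadTimeout" test.
# use Rainbows::ThreadTimeout, :timeout => 70
```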