Master repeatedly killing workers due to timeouts

Eric Wong normalperson at yhbt.net
Thu Dec 8 15:23:40 EST 2011


Jonathan del Strother <maillist at steelskies.com> wrote:
> Hi,
> We're using unicorn as a Rails server on Solaris, and it's been
> running great for several months.  We've recently been having a few
> problems and I'm at a loss what might cause it.  A number of times in
> the past few days, our unicorn slaves keep timing out & the master
> keeps restarting them.  unicorn.log looks something like :

Which versions of Ruby and Unicorn are you running?

> E, [2011-12-08T18:11:32.912237 #26661] ERROR -- : worker=5 PID:15367
> timeout (61s > 60s), killing
> E, [2011-12-08T18:11:32.952041 #26661] ERROR -- : reaped
> #<Process::Status: pid 15367 SIGKILL (signal 9)> worker=5
> I, [2011-12-08T18:11:32.985925 #17824]  INFO -- : worker=5 ready

The time between the "killing" and "ready" lines is small enough
to suggest the app isn't getting killed while it's loading

> E, [2011-12-08T18:11:42.962869 #26661] ERROR -- : worker=3 PID:15499
> timeout (61s > 60s), killing

Most of these timeouts are several minutes apart.  How many requests
do you service per second/minute during these times?

<snip>

> - which seems like quite a lot of timeouts.  However, our database
> connection, network, NFS etc all seem fine.  There's no useful errors
> in the rails log files.  The master still serves occasional requests,
> but drops a high percentage of them.

Since the errors you showed us were several minutes apart,
"high percentage" suggests your app sits idle for longish periods
of time.  Is there any external dependency that doesn't like periods
of idleness?

60s is an eternity for a typical HTTP request for Rails.

> Can anyone suggest where to look into this?  I'm not even sure if that
> timeout is occurring during the initial fork of the worker process, or
> if it's timing out during a slow rails request.

It seems to be the latter based on the timestamps.

You can try auditing potential slow paths in your code, and maybe
add alerts/timeouts to something lower than 60s.  Also, take a
look at: http://unicorn.bogomips.org/Application_Timeouts.html
if you haven't already.

I've also seen inexperienced developers do things like:
"SELECT * FROM huge_table" to kill servers in production because
their test fixtures only had 30 rows in huge_table while the
production DB has millions of rows.

MySQL has the ability to log slow queries, other DBs should, too.  Can
you check to see if you don't have any of those taking a long time?


In a related note, I also released the terrible_timeout RubyGem
yesterday: http://bogomips.org/terrible_timeout/
I hate it, but at least it documents is failings :)


More information about the mongrel-unicorn mailing list