A barrage of unexplained timeouts

nick at auger.net
Wed Aug 21 13:33:58 UTC 2013

"Eric Wong" <normalperson at yhbt.net> said:
> nick at auger.net wrote:
>> "Eric Wong" <normalperson at yhbt.net> said:
>> > nick at auger.net wrote:
>> >> "Eric Wong" <normalperson at yhbt.net> said:
>> > I'm stumped :<
>> I was afraid you'd say that :(.
> Actually, another potential issue is DNS lookups timing out.  But they
> shouldn't take *that* long...
>> > Do you have any background threads running that could be hanging the
>> > workers?   This is Ruby 1.8, after all, so there's more likely to be
>> > some blocking call hanging the entire process.  AFAIK, some monitoring
>> > software runs a background thread in the unicorn worker and maybe the
>> > OpenSSL extension doesn't work as well if it encountered network
>> > problems under Ruby 1.8
>> We don't explicitly create any threads in our rails code.  We do
>> communicate with backgroundrb worker processes, although, none of the
>> strangeness today involved any routes that would hit backgroundrb
>> workers.
> I proactively audit every piece of code (including external
> libraries/gems) loaded by an app for potentially blocking calls (hits to
> the filesystem, socket calls w/o timeout/blocking).   I use strace to
> help me find that sometimes...
>> Is there any instrumentation that I could add that might help
>> debugging in the future? ($request_time and $upstream_response_time
>> are now in my nginx logs.)  We have noticed these "unexplainable
>> timeouts" before, but typically for a single worker.  If there's some
>> debugging that could be added I might be able to track it down during
>> these one-off events.
> As an experiment, can you replay traffic a few minutes leading up to and
> including that 7m period in a test setup with only one straced worker?

I've replayed this in a development environment without triggering the issue.  The lack of POST payloads and slight differences between the two environments make this difficult to test.

> Run "strace -T -f -o $FILE -p $PID_OF_WORKER" and see if there's any
> unexpected/surprising dependencies (connect() to unrecognized addresses,
> open() to networked filesystems, fcntl locks, etc...).
> You can play around with some other strace options (-v/-s SIZE/-e filters)

I'll do my best to look at the output for anything out of the ordinary.  Unfortunately, strace is a new tool for me and is a bit out of my wheelhouse.
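
For anyone else new to strace, one way to narrow down a large trace is to sort calls by the time they took.  The sketch below is only a rough illustration, not part of unicorn or strace: it assumes the file was written with the "strace -T -f -o $FILE" options quoted above, so each completed call starts with a PID and ends with "<seconds>".  The names LINE and slow_syscalls, the 1.0 second default threshold, and the example path are made up for this example.

```ruby
# Matches completed calls in "strace -T -f -o FILE" output, e.g.
#   101 connect(3, {...}, 16) = 0 <5.002113>
#   102 <... read resumed> "", 4096) = 0 <2.500000>
# Lines without a trailing "<seconds>" (unfinished calls, signals) do not match.
LINE = /\A(\d+)\s+(?:<\.\.\. (\w+) resumed>|(\w+)\().*<(\d+\.\d+)>\s*\z/

# Returns [pid, syscall, seconds] triples at or above threshold, slowest first.
# The threshold default is an arbitrary assumption; tune it to your timeouts.
def slow_syscalls(lines, threshold = 1.0)
  hits = lines.map do |line|
    next unless (m = LINE.match(line))
    [m[1].to_i, m[2] || m[3], m[4].to_f]
  end
  hits.compact
      .select { |_, _, secs| secs >= threshold }
      .sort_by { |_, _, secs| -secs }
end

# Hypothetical usage against a saved trace:
#   slow_syscalls(File.foreach("/tmp/worker.strace"), 0.5).each { |row| p row }
```

A slow connect(), read(), or fcntl() near the top of that list would be the kind of unexpected dependency described above.  Note that for a "resumed" line, the time strace reports covers the whole call, not just the part after it resumed.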
