A barrage of unexplained timeouts

Eric Wong normalperson at yhbt.net
Tue Aug 20 21:32:57 UTC 2013


nick at auger.net wrote:
> "Eric Wong" <normalperson at yhbt.net> said:
> > nick at auger.net wrote:
> >> "Eric Wong" <normalperson at yhbt.net> said:
> > I'm stumped :<
> 
> I was afraid you'd say that :(.

Actually, another potential issue is DNS lookups timing out.  But they
shouldn't take *that* long...
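
A rough way to check how long lookups take from inside the app is to
time them directly.  This is only a sketch: timed_lookup is a made-up
helper name, the 2 second limit is an arbitrary assumption, and
'localhost' stands in for whatever hostnames your app really resolves.

```ruby
require 'resolv'
require 'timeout'

# Time a single DNS lookup.  Returns [address_or_nil, elapsed_seconds].
# `limit` (seconds) is an arbitrary default, adjust it for your setup.
def timed_lookup(host, limit = 2)
  started = Time.now
  addr = begin
    Timeout.timeout(limit) { Resolv.getaddress(host) }
  rescue Timeout::Error, StandardError
    nil # lookup timed out or the resolver failed
  end
  [addr, Time.now - started]
end

addr, elapsed = timed_lookup('localhost')
puts "localhost -> #{addr.inspect} in #{'%.3f' % elapsed}s"
```

Logging the elapsed time for the hosts your app talks to (database,
memcached, external APIs) should show whether the resolver is a
plausible suspect.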

> > Do you have any background threads running that could be hanging the
> > workers?   This is Ruby 1.8, after all, so there's more likely to be
> > some blocking call hanging the entire process.  AFAIK, some monitoring
> > software runs a background thread in the unicorn worker and maybe the
> > OpenSSL extension doesn't work as well if it encountered network
> > problems under Ruby 1.8
> 
> We don't explicitly create any threads in our rails code.  We do
> communicate with backgroundrb worker processes, although, none of the
> strangeness today involved any routes that would hit backgroundrb
> workers.

I proactively audit every piece of code (including external
libraries/gems) loaded by an app for potentially blocking calls
(filesystem access, socket calls made without a timeout or in blocking
mode).  I sometimes use strace to help me find them...
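
To illustrate what a socket call with a timeout can look like, here is
a minimal sketch of a bounded connect() using connect_nonblock and
IO.select.  connect_with_timeout is a hypothetical helper, not
something from unicorn, and I have not checked it against Ruby 1.8.
Pass an IP address rather than a hostname, since resolving a name can
itself block.

```ruby
require 'socket'

# Connect to ip:port, waiting at most `limit` seconds.
# Returns the connected socket, or nil if the connect timed out.
def connect_with_timeout(ip, port, limit)
  addr = Socket.pack_sockaddr_in(port, ip)
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  begin
    sock.connect_nonblock(addr)
  rescue Errno::EINPROGRESS
    # Connect is in flight: wait until the socket is writable or we give up.
    unless IO.select(nil, [sock], nil, limit)
      sock.close
      return nil
    end
    begin
      sock.connect_nonblock(addr) # second call reports the final result
    rescue Errno::EISCONN
      # already connected, nothing more to do
    end
  end
  sock
end

# Demo against a throwaway listener on the loopback interface.
server = TCPServer.new('127.0.0.1', 0)
client = connect_with_timeout('127.0.0.1', server.addr[1], 1)
puts(client ? 'connected' : 'timed out')
client.close if client
server.close
```

Any library that calls connect(), read() or write() without a bound
like this is a candidate for hanging a worker when the network
misbehaves.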

> Is there any instrumentation that I could add that might help
> debugging in the future? ($request_time and $upstream_response_time
> are now in my nginx logs.)  We have noticed these "unexplainable
> timeouts" before, but typically for a single worker.  If there's some
> debugging that could be added I might be able to track it down during
> these one-off events.

As an experiment, can you replay traffic a few minutes leading up to and
including that 7m period in a test setup with only one straced worker?

Run "strace -T -f -o $FILE -p $PID_OF_WORKER" and see if there are any
unexpected/surprising dependencies (connect() to unrecognized addresses,
open() to networked filesystems, fcntl locks, etc...).

You can also play around with other strace options (-v, -s SIZE, -e filters).

Maybe you'll find something there.
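
Since -T appends the time spent in each syscall as <seconds> at the
end of the line, a small script can pull the slow calls out of a large
trace.  A sketch, assuming that trailing <seconds> format; the
slow_syscalls name, the 1 second threshold, and the sample lines are
all made up for illustration and are not from a real trace.

```ruby
# Matches the trailing "<seconds>" that strace -T appends to each line.
TIME_RE = /<(\d+\.\d+)>\s*\z/

# Return [seconds, line] pairs for syscalls that took at least
# `threshold` seconds, slowest first.  Lines without a trailing time
# (e.g. "<unfinished ...>") are skipped.
def slow_syscalls(lines, threshold)
  slow = []
  lines.each do |line|
    m = TIME_RE.match(line)
    next unless m
    secs = m[1].to_f
    slow << [secs, line.chomp] if secs >= threshold
  end
  slow.sort_by { |secs, _line| -secs }
end

# Invented sample lines in the shape of "strace -T -f -o FILE" output.
sample = [
  "1234 open(\"/etc/hosts\", O_RDONLY) = 5 <0.000031>\n",
  "1234 connect(6, {sa_family=AF_INET, sin_port=htons(53)}, 16) = 0 <0.000090>\n",
  "1234 poll([{fd=6, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.004812>\n",
]

slow_syscalls(sample, 1.0).each do |secs, line|
  puts "#{'%.3f' % secs}s  #{line}"
end
```

For a real trace, something like slow_syscalls(File.readlines(path), 1.0)
should work, with path pointing at the file given to -o.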

