502s with Nginx, Unicorn, and Unix Domain Sockets

Eric Wong normalperson at yhbt.net
Fri Sep 18 02:48:31 EDT 2009


Tom Preston-Werner <tom at github.com> wrote:
> I'm doing some benchmarking on our new Rackspace frontend machines (8
> core, 16GB) and running into some problems with the Unix domain socket
> setup. At high request rates (on simple pages) I'm getting a lot of
> HTTP 502 errors from Nginx. Nothing shows up in the Unicorn error log,
> but Nginx has the following in its error log:

Hi Tom,

At what request rates were you running into this?  Also how large are
your responses?  It could be the listen() backlog overflowing if Unicorn
isn't logging anything.  Anything in the system/kernel logs (doubtful,
actually)?
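
A rough way to see a backlog overflow in isolation, as a sketch only: it
assumes Linux, and the helper name and temporary socket path below are
made up for illustration (nothing here comes from Unicorn itself).  It
binds a Unix domain socket with a tiny backlog, never calls accept(), and
makes non-blocking connects until the kernel refuses one.  On Linux the
refusal should surface as Errno::EAGAIN, i.e. the same errno 11 nginx
reported; other kernels may behave differently.

```ruby
require "socket"
require "tmpdir"

# Hypothetical helper, not part of Unicorn: counts how many non-blocking
# connects succeed against a listener that never calls accept().
def connects_until_refused(backlog, max_attempts = 64)
  Dir.mktmpdir do |dir|
    addr = Socket.pack_sockaddr_un(File.join(dir, "demo.sock"))
    server = Socket.new(:UNIX, :STREAM)
    server.bind(addr)
    server.listen(backlog)

    clients = []
    connected = 0
    error = nil
    begin
      max_attempts.times do
        client = Socket.new(:UNIX, :STREAM)
        clients << client
        client.connect_nonblock(addr) # raises once the kernel queue is full
        connected += 1
      end
    rescue SystemCallError => e
      error = e.class
    ensure
      clients.each(&:close)
      server.close
    end
    [connected, error]
  end
end

connected, error = connects_until_refused(1)
puts "connected=#{connected} error=#{error.inspect}"
```

Raising the backlog argument should let more connects queue up before the
first refusal, which is the effect a larger :backlog is meant to have.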

Does increasing the listen :backlog parameter work?  Default is 1024
(which is pretty high already), maybe try a higher number along with the
net.core.netdev_max_backlog sysctl.

Is there a large discrepancy between the times your benchmark client
logs, the request time nginx logs, and whatever Rails/Rack logs for
request times for any particular request?

If the Rails/Rack logged times all seem consistently low but your
nginx/benchmark times have some weird spikes/outliers, then some requests
are probably stuck in the kernel listen backlog.
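
The comparison could be done along these lines once the two logs are
matched up per request.  This is only a sketch: the Sample struct, the
backlog_suspects name, and the 0.5 second threshold are all assumptions
for illustration, and parsing the actual log files is left out.

```ruby
# Hypothetical record pairing the time nginx logged for a request with
# the time Rails/Rack logged for the same request (both in seconds).
Sample = Struct.new(:path, :nginx_time, :app_time)

# Returns the samples where nginx saw much more time than the app did;
# that extra time was spent outside the app, possibly queued in the
# listen backlog.  min_gap is an arbitrary cutoff, tune it to your data.
def backlog_suspects(samples, min_gap = 0.5)
  samples.select { |s| s.nginx_time - s.app_time > min_gap }
end

samples = [
  Sample.new("/site/junk", 0.03, 0.02),
  Sample.new("/site/junk", 2.35, 0.02), # spike seen by nginx only
  Sample.new("/site/junk", 0.04, 0.03),
]

backlog_suspects(samples).each do |s|
  puts "#{s.path}: nginx=#{s.nginx_time}s app=#{s.app_time}s"
end
```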

How much of the 8 cores is in use on those boxes when this
starts happening?

> 2009/09/17 19:36:52 [error] 28277#0: *524824 connect() to
> unix:/data/github/current/tmp/sockets/unicorn.sock failed (11:
> Resource temporarily unavailable) while connecting to upstream,
> client: 172.17.1.5, server: github.com, request: "GET /site/junk
> HTTP/1.1", upstream:
> "http://unix:/data/github/current/tmp/sockets/unicorn.sock:/site/junk",
> host: "github.com"

Raising proxy_connect_timeout in nginx may be a workaround; what is it
set to now?  On the other hand, keeping it (and :backlog in Unicorn) low
would give better indications for failover to other hosts.

> This problem does not exist with the nginx -> haproxy -> unicorn
> setup. Thinking this might be a file descriptor problem, I upped the
> fd limit to 32768 with no luck. Then I tried upping net.core.somaxconn
> to 262144 which also had no effect. I thought I'd ask about the
> problem here to see if anyone knows a simple solution that I'm
> missing. Perhaps there is an Nginx configuration directive I need?
> Thanks. Unicorn rocks!

Definitely not a file descriptor problem (at least not inside Unicorn).

Also, I'm not sure there's a reason to keep haproxy between nginx
and Unicorn...  Maybe haproxy in front of the entire cluster of servers.

Are you already hitting higher request rates (and more consistent
times logged by client/nginx) with:

  nginx -> unicorn/unix

vs

  nginx -> unicorn/tcp(localhost)

?
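
For that comparison, both listeners can live in the same Unicorn config
file, so only the nginx upstream needs to change between runs.  A minimal
sketch follows; the TCP port and the :backlog value are placeholders I
picked for illustration, not recommendations.

```ruby
# Unicorn config fragment (sketch): one Unix domain socket listener and
# one localhost TCP listener for an A/B benchmark.

# Unix domain socket; path taken from the nginx error log quoted above.
# 2048 is an arbitrary value above the 1024 default.
listen "/data/github/current/tmp/sockets/unicorn.sock", :backlog => 2048

# TCP on localhost; port 8080 is an assumption, use whatever is free.
listen "127.0.0.1:8080", :backlog => 2048
```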

Under extremely high loads, 502s may actually be wanted since they allow
failover to a less loaded box if there's uneven balancing; but we really
need to have numbers on the request rates.

-- 
Eric Wong

