Combating nginx 499 HTTP responses during flash traffic scenario
tom.burns at jadedpixel.com
Mon Oct 29 17:44:27 UTC 2012
We're dealing with an issue with our large-scale deployment of unicorn
& nginx. The problem occurs during "flash" scenarios where we receive
an order of magnitude more traffic for up to an hour. Much of this
traffic cannot be cached and it's typical for some of our Rails
responses to take a few seconds to generate. Our preference is to
queue incoming requests instead of returning 502 if we can respond
within a reasonable amount of time.
On each of our servers the stack is nginx -> unicorn.
The main connection queue in front of Rails is the unicorn connection
queue. Our problem is that when the traffic hits, the unicorn queue
grows. When users begin hitting refresh, their abandoned requests in
the unicorn queue are still passed to the Rails application and
rendered. In this case we see a 200 HTTP response in our Rails log
and a 499 in the nginx access log. Once this starts happening the
problem can compound: app workers are busy rendering pages for clients
who have already detached, so response time grows and more users hit
refresh.
Our nginx config:
6 nginx workers, 1024 worker connections.
Our unicorn config:
70 unicorn workers, 2048 connection backlog.
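To put rough numbers on how deep that queue can get, here is a
back-of-the-envelope sketch. The worker and backlog counts are the ones
above; the 2-second average response time is only an assumption standing
in for "a few seconds", and the method name is made up for illustration.

```ruby
# Values taken from the unicorn config above.
WORKERS = 70
BACKLOG = 2048

# Rough wait for the last request in a full backlog, assuming every
# request takes avg_seconds and all workers stay busy the whole time.
# avg_seconds is an assumed figure, not a measurement.
def worst_case_wait(avg_seconds, workers: WORKERS, backlog: BACKLOG)
  (backlog.to_f / workers) * avg_seconds
end

puts worst_case_wait(2.0).round(1) # roughly 58.5 seconds
```

With a full backlog, a request can sit for close to a minute under these
assumptions, which is well past the point where a user hits refresh.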
Our goal is to not have our app process requests for clients that have
already disconnected while their connection is still queued in
unicorn. We also would prefer not to shrink our queue such that we
begin to return 502 when our queue is a few seconds deep.
We're looking at potential solutions to this problem, including:
- Modifying unicorn to select() on the client connection after reading
the request, to see if it's been closed upstream, and avoid calling
- Replacing nginx with haproxy and queuing connections there. This
goes against the nginx recommendation at
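As a minimal sketch of the select() idea in the first option, the check
could look something like the following. This is not unicorn code:
client_gone? is a hypothetical helper, and it assumes the full request
has already been read, so a readable socket that yields EOF means the
peer (nginx, in our stack) has closed the connection.

```ruby
require "socket"

# Hypothetical helper, not part of unicorn. Returns true if the peer has
# closed its end of the connection. Assumes the request has already been
# read in full, so any EOF seen here means the peer went away.
def client_gone?(sock)
  # Zero timeout: poll for readability without blocking the worker.
  readable, = IO.select([sock], nil, nil, 0)
  return false unless readable # nothing pending, peer still connected

  # Peek so that pipelined or leftover bytes are not consumed.
  data = sock.recv_nonblock(1, Socket::MSG_PEEK, exception: false)
  data.nil? || data == "" # EOF is nil or "" depending on the Ruby version
rescue Errno::ECONNRESET, Errno::EPIPE
  true
end
```

A worker would call this right before dispatching to the app and drop
the request if it returns true. One caveat: a client that half-closes
after sending its request would also look "gone" to this check, so it is
probably only safe when the peer is a proxy that does not do that.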
Any input would be appreciated.
Developer @ Shopify