502 bad gateway on nginx with recv() failed

Naresh V nareshov at gmail.com
Sat Oct 23 00:48:07 EDT 2010


On 23 October 2010 02:44, Eric Wong <normalperson at yhbt.net> wrote:
> Naresh V <nareshov at gmail.com> wrote:
>> Hi,
>>
>> I'm serving the puppetmaster application with its config.ru through
>> unicorn - proxied by nginx.
>> I'm using unix sockets, 4 workers, and 2048 backlog.
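
(For reference, the unicorn config behind this boils down to roughly the
following; /tmp/.sock is the socket path that shows up in the nginx log
below:

    worker_processes 4
    listen "/tmp/.sock", :backlog => 2048
    timeout 60    # the default; this turns out to matter below
)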
>>
>> The clients - after their typical "puppet run" - send back a report to
>> the master in YAML.
>> Some clients whose reports tend to be large (close to 2mb) get a 502
>> bad gateway error and error out.
>>
>> nginx log:
>>
>> 2010/10/22 14:20:27 [error] 19461#0: *17115 recv() failed (104:
>> Connection reset by peer) while reading response header from upstream,
>> client: 1x.yy.zz.x4, server: , request: "PUT /production/report/nagios
>> HTTP/1.1", upstream:
>> "http://unix:/tmp/.sock:/production/report/nagios", host:
>> "puppet:8140"
>
> Hi Naresh, do you see anything in the Unicorn stderr log file?

Hi Eric, I think I caught it:

E, [2010-10-22T23:03:30.207455 #10184] ERROR -- : worker=2 PID:10206
timeout (60.207392s > 60s), killing
I, [2010-10-22T23:03:31.212533 #10184]  INFO -- : reaped
#<Process::Status: pid=10206,signaled(SIGKILL=9)> worker=2
I, [2010-10-22T23:03:31.214768 #10490]  INFO -- : worker=2 spawned pid=10490
I, [2010-10-22T23:03:31.221748 #10490]  INFO -- : worker=2 ready


> Is the 2mb report part of the response or request?  Unicorn should
> have no problems accepting large requests (Rainbows! defaults the
> client_max_body_size to 1mb, just like nginx).

It's part of the PUT request, I guess - the YAML report is the body of
the PUT to /production/report/nagios shown in the nginx log.

> It could be Unicorn's internal (default 60s) timeout kicking
> in because puppet is slowly reading/generating the 2mb body.

I raised the timeout first to 120 and then to 180 seconds, and I
continued to get the 502 (with the same logs as above).
When I raised it up to 240, puppetd complained:

#-(1)> puppetd -t -v  --trace
notice: Ignoring --listen on onetime run
info: Caching catalog for nagios
info: Applying configuration version '1287807847'
notice: Finished catalog run in 25.86 seconds
/usr/lib/ruby/1.8/timeout.rb:54:in `rbuf_fill'
/usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
/usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
/usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
/usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
/usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
/usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
/usr/lib/ruby/1.8/net/http.rb:2009:in `read_new'
/usr/lib/ruby/1.8/net/http.rb:1050:in `request'
/usr/lib/ruby/1.8/net/http.rb:1037:in `request'
/usr/lib/ruby/1.8/net/http.rb:543:in `start'
/usr/lib/ruby/1.8/net/http.rb:1035:in `request'
/usr/lib/ruby/1.8/net/http.rb:857:in `put'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/rest.rb:90:in `save'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/indirection.rb:253:in `save'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector.rb:64:in `save'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:178:in `send_report'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:172:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent/locker.rb:21:in `lock'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/usr/lib/ruby/1.8/sync.rb:229:in `synchronize'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:103:in `with_client'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:37:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:171:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:171:in `controlled_run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:35:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application/agent.rb:114:in `onetime'
/usr/lib/ruby/site_ruby/1.8/puppet/application/agent.rb:88:in `run_command'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:287:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:393:in `exit_on_fail'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:287:in `run'
/usr/sbin/puppetd:4
err: Could not run Puppet configuration client: execution expired
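
(To be clear, the timeout I've been raising is unicorn's per-worker
timeout in the unicorn config, i.e. something like:

    # give slow report PUTs more time before the master SIGKILLs the
    # worker; 60s is the default that was being hit in the log above
    timeout 240

Presumably once unicorn was allowed to take longer than puppetd's own
HTTP timeout, the agent gave up first, hence the "execution expired"
above instead of a 502.)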


> Also, which version of Unicorn and nginx is this?

unicorn (1.1.4) and nginx-0.8.49-1.el5

>> I was getting the same thing earlier when I had unicorn listening on
>> TCP sockets instead of UNIX sockets. And I had a lot of connections in
>> TIME_WAIT:
>>
>> tcp        0      0 127.0.0.1:8141              127.0.0.1:54507
>>      TIME_WAIT   -
>> tcp        0      0 127.0.0.1:8141              127.0.0.1:57322
>>      TIME_WAIT   -
>>
>> Fluctuating all the way from 20 to 800. A quick restart of nginx
>> tended to bring the number down.
>
> Having many TIME_WAIT sockets is normal and expected when you're
> starting/stopping lots of TCP connections.  It's nothing to worry about
> unless you get several thousands of requests/second, and then you should
> apply tcp_tw_reuse/tcp_tw_recycle as mentioned in
> http://unicorn.bogomips.org/TUNING.html (or switch to UNIX domain
> sockets and use nginx for keepalive).
>
> --
> Eric Wong


-Naresh V.

