[Mongrel] one mongrel with hundreds of CLOSE_WAIT tcp connections

yossarian1 at gmail.com
Wed Jul 18 22:22:14 EDT 2007


Hi, I'm running into a strange issue where one mongrel will sometimes
develop hundreds of CLOSE_WAIT TCP connections, mostly to Apache (I think;
see the sample lsof output below). I didn't catch it in time, so I haven't
yet been able to put the affected mongrel into USR1 debug mode. This happens
a couple of times a day on average, at seemingly random times. The problem
goes away within a minute or two, probably after the mongrel is restarted.

I'm probably doing something crazy to cause this behavior, but I'm having
trouble figuring out exactly what the problem is. It probably has to do
with the fact that my mongrels fetch files from Amazon S3 for some
requests: we call HTTPClient.get(url) on some S3 URLs. I'm setting up
dnsmasq now, by the way, but it's not up yet.
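
For reference, here is a minimal sketch of how such a fetch could be bounded
with timeouts and have its socket closed promptly. It uses Net::HTTP from the
standard library, not the HTTPClient call we actually make, and the helper
name fetch_with_timeouts and the timeout values are assumptions for
illustration only. I don't know yet whether this has anything to do with the
CLOSE_WAIT sockets.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not from our app): fetch a URL with explicit timeouts.
# The timeout defaults are illustrative guesses, not tuned values.
def fetch_with_timeouts(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  # The block form of Net::HTTP.start closes the connection when the block
  # exits, including when the request raises.
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: open_timeout,
                  read_timeout: read_timeout) do |http|
    response = http.request(Net::HTTP::Get.new(uri.request_uri))
    response.body
  end
end
```

A socket shows up as CLOSE_WAIT when the remote end has closed the connection
and the local process has not yet closed its own side, so the point of the
block form is only to make sure our side is always closed.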

My next steps are to get the mongrel into USR1 debug mode to see which
actions are causing the problem, and to install dnsmasq and cacti. I have a
good guess at which action is responsible (probably the one that fetches the
files from S3), but I'll make sure.

If you have any thoughts or other ideas, please let me know.  Thanks a ton
for your help!


Some sample output from lsof:

lsof -i -P | grep CLOSE_ | grep mongrel

CLOSE_WAIT --mysite
mongrel_r   831    root    6u  IPv4 95162945       TCP localhost.localdomain:8011->localhost.localdomain:59311 (CLOSE_WAIT)
mongrel_r   831    root    9u  IPv4 95161753       TCP mysite.com:49269->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
mongrel_r   831    root   11u  IPv4 95162093       TCP mysite.com:49339->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
mongrel_r   831    root   14u  IPv4 95162202       TCP mysite.com:49373->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
mongrel_r   831    root   15u  IPv4 95162229       TCP mysite.com:49380->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
mongrel_r   831    root   16u  IPv4 95162319       TCP mysite.com:49399->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
mongrel_r   831    root   17u  IPv4 95162477       TCP mysite.com:49436->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
mongrel_r   831    root   19u  IPv4 95163082       TCP localhost.localdomain:8011->localhost.localdomain:59348 (CLOSE_WAIT)
mongrel_r   831    root   20u  IPv4 95163221       TCP localhost.localdomain:8011->localhost.localdomain:59387 (CLOSE_WAIT)
mongrel_r   831    root   21u  IPv4 95163360       TCP localhost.localdomain:8011->localhost.localdomain:59426 (CLOSE_WAIT)
mongrel_r   831    root   22u  IPv4 95161592       TCP mysite.com:49227->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
mongrel_r   831    root   23u  IPv4 95163507       TCP localhost.localdomain:8011->localhost.localdomain:59463 (CLOSE_WAIT)
mongrel_r   831    root   24u  IPv4 95163675       TCP localhost.localdomain:8011->localhost.localdomain:59495 (CLOSE_WAIT)
mongrel_r   831    root   25u  IPv4 95164041       TCP localhost.localdomain:8011->localhost.localdomain:59586 (CLOSE_WAIT)
mongrel_r   831    root   26u  IPv4 95164181       TCP localhost.localdomain:8011->localhost.localdomain:59618 (CLOSE_WAIT)
mongrel_r   831    root   27u  IPv4 95164293       TCP localhost.localdomain:8011->localhost.localdomain:59641 (CLOSE_WAIT)
mongrel_r   831    root   28u  IPv4 95164441       TCP localhost.localdomain:8011->localhost.localdomain:59670 (CLOSE_WAIT)
mongrel_r   831    root   29u  IPv4 95164607       TCP localhost.localdomain:8011->localhost.localdomain:59705 (CLOSE_WAIT)
mongrel_r   831    root   30u  IPv4 95164748       TCP localhost.localdomain:8011->localhost.localdomain:59746 (CLOSE_WAIT)
mongrel_r   831    root   31u  IPv4 95164895       TCP localhost.localdomain:8011->localhost.localdomain:59786 (CLOSE_WAIT)
mongrel_r   831    root   32u  IPv4 95165064       TCP localhost.localdomain:8011->localhost.localdomain:59830 (CLOSE_WAIT)


etc. This goes on for about 700 lines: the mongrel on port 8011 has roughly
700 CLOSE_WAIT TCP connections to ports in the 30-60k range (to Apache, I
believe). In this case, all of the CLOSE_WAIT connections belong to the
mongrel on port 8011. Also, any ideas what's going on with the CLOSE_WAIT
connections to Amazon S3?



lsof -i -P | grep CLOSE_ | grep mongrel | wc -l
703
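
The same count can be broken down per remote host instead of as a single
total, which would show how many of these are Apache and how many are S3. A
rough sketch, assuming the unwrapped lsof -i -P line format
(local:port->remote:port (CLOSE_WAIT)); the helper name tally_close_wait is
made up for illustration:

```ruby
# Hypothetical helper: count CLOSE_WAIT sockets per remote host in the text
# output of `lsof -i -P`. Assumes IPv4-style "host:port->host:port" endpoints.
def tally_close_wait(lsof_text)
  # Capture group 1 is the remote host (the part after "->", before its port).
  pattern = /TCP\s+\S+?:\d+->(\S+?):\d+\s+\(CLOSE_WAIT\)/
  counts = Hash.new(0)
  lsof_text.each_line do |line|
    match = pattern.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end
```

It takes the lsof output as a string and returns a hash of remote host to
count; lines in other states (LISTEN, ESTABLISHED) are ignored.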

netstat  | grep 56586       # an example port
tcp        1      0 localhost.localdomain:8011  localhost.localdomain:56586 CLOSE_WAIT
tcp        0      0 localhost.localdomain:56586 localhost.localdomain:8011  FIN_WAIT2
getnameinfo failed
getnameinfo failed