Page request roundtrip time increases substantially after a bit of use
kristopolous at yahoo.com
Mon Jan 24 16:03:25 EST 2011
--- Note ---
I'm sorry if you get this twice. I was sending this email from Yahoo, and I
guess they defaulted to HTML for the mail. My apologies if this was the case.
If not, please ignore this message. Thank you for your time.
--- Note ---
First of all, let me thank all of you for creating such a wonderful product.
Rainbows is a unique solution and is the perfect candidate to solve our complex
problems. I don't know where this current project could possibly be without
your fine work. :-)
Now about the problem. First a little background on the architecture, so you
can get the context:
I'm dealing with some code that I can't just legally paste for example (although
I can probably make a simple proof of concept if needed) ... Here's the design:
I have a cascading long-poll connection, which listens for various JSON
messages. The throughput is quite low (a few a second) and I have a policy of
falling over based on either a large amount of traffic, or a predefined amount
of time (30 seconds) lapsing.
Essentially I have a 30 second connection and at second 25, a new one opens up
... the first one closes, and that one lasts for 30 seconds, etc.
This is designed for web browsers and it's implemented through hidden iframes.
This is a problem that has persisted across Firefox 3.6/OS X and Firefox 3.6/XP
and Firefox 4.0b6/XP along with IE8/XP and Chrome/OS X and FF 3.6/Ubuntu so I
don't think that any nuance of a browser or OS could be considered culpable.
I'm also not using any middle webserver like nginx and am connecting directly to
*** The Problem ***
When serving static files along with my long polled connections, I will get a
round trip time of ten millisecond or so, usually. This is perfectly
Every now and then, however, it will be about 2.5 seconds. This will then be
followed by a bunch of the snappy millisecond level transaction times. This is
one browser with 1 persistent connection against rainbows configured with 10
worker_processes and 100 worker_connections.
Everything changes about 5-10 minutes into things.
Then every transaction takes about 2-4 seconds. Static files that are 10 bytes
in size, 2-4 seconds. Ruby code to emit "Hello World"? 2-4 seconds. Every
request. Still using just 1 browser.
After I exit all browsers and then do a netstat on the client machine to see
that the connections have closed, I can then do a curl command for a static
file; again 2-4 seconds.
On the machine running rainbows if I do a netstat, I get this:
tcp 1 0 10.10.192.12:7788 10.10.131.165:17443 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17352 CLOSE_WAIT
tcp 1 196 10.10.192.12:7788 10.10.131.165:17317 CLOSE_WAIT
tcp 1 196 10.10.192.12:7788 10.10.131.165:17310 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17437 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17366 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17410 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17447 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17357 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17449 CLOSE_WAIT
tcp 1 0 10.10.192.12:7788 10.10.131.165:17347 CLOSE_WAIT
^^^ rainbows is running on 7788 on this machine ^^^.
The reference machine, in this case, windows, claims that all the connections
Proto Local Address Foreign Address State
TCP 0.0.0.0:80 0.0.0.0:0 LISTENING
TCP 0.0.0.0:135 0.0.0.0:0 LISTENING
TCP 0.0.0.0:443 0.0.0.0:0 LISTENING
TCP 0.0.0.0:445 0.0.0.0:0 LISTENING
TCP 0.0.0.0:6060 0.0.0.0:0 LISTENING
TCP 0.0.0.0:63508 0.0.0.0:0 LISTENING
TCP 10.10.131.165:139 0.0.0.0:0 LISTENING
TCP 10.10.131.165:16841 188.8.131.52:11827 ESTABLISHED
TCP 10.10.131.165:16844 184.108.40.206:5222 ESTABLISHED
TCP 10.10.131.165:16849 10.10.10.71:5222 ESTABLISHED
TCP 10.10.131.165:16883 220.127.116.11:443 ESTABLISHED
TCP 10.10.131.165:16904 18.104.22.168:443 ESTABLISHED
TCP 10.10.131.165:16915 22.214.171.124:443 ESTABLISHED
TCP 10.10.131.165:16918 126.96.36.199:443 ESTABLISHED
TCP 10.10.131.165:16958 188.8.131.52:443 ESTABLISHED
TCP 10.10.131.165:16961 184.108.40.206:443 ESTABLISHED
TCP 10.10.131.165:17102 10.10.131.136:22 ESTABLISHED
TCP 10.10.131.165:17466 10.10.192.12:22 ESTABLISHED
TCP 10.10.131.165:17470 10.0.0.29:515 SYN_SENT
TCP 127.0.0.1:1030 0.0.0.0:0 LISTENING
TCP 192.168.56.1:139 0.0.0.0:0 LISTENING
TCP [::]:135 [::]:0 LISTENING 0
I can disconnect the windows client machine; turn it off even, and this problem
I think that somewhere in the ruby stack, the connections are not closing. If I
increase my worker_process count and prolong the long poll, then yes, I'll
survive for 15 minutes instead of 5; but the problem will still eventually occur
and I will hit the wall.
I have yet to try to test unicorn or zbatery for this style of solution because
I need the keep-alive; and although I know that unicorn put in the keep-alive
support for rainbows, I haven't really taken the time necessary to know how to
invoke it. If you think this would be instructive, I'd be happy to do so.
As far as my ruby set-up, I'm using ruby1.8 and I have the ThreadSpawn model.
We are running on a modern version of Ubuntu without any serious customization.
If you think that another configuration would do the trick or if you know how to
squash this bug, it would be
very helpful. This problem has become of great concern for us. We love
rainbows and all that it is. Thanks for the project and keep up the good work.
More information about the rainbows-talk