Page request roundtrip time increases substantially after a bit of use

chris mckenzie kristopolous at yahoo.com
Mon Jan 24 16:03:25 EST 2011


--- Note ---
I'm sorry if you get this twice.  I was sending this email from Yahoo, and I 
guess they defaulted to HTML for the mail.  My apologies if this was the case.  
If not, please ignore this message.  Thank you for your time.
--- Note ---


Hi,

First of all, let me thank all of you for creating such a wonderful product.  
Rainbows is a unique solution and is the perfect candidate to solve our complex 
problems.  I don't know where this current project could possibly be without 
your fine work.  :-)

Now about the problem.  First a little background on the architecture, so you 
can get the context:

I'm dealing with some code that I can't legally just paste here (although I can
probably make a simple proof of concept if needed).  Here's the design:

I have a cascading long-poll connection, which listens for various JSON
messages.  The throughput is quite low (a few messages a second) and I have a
policy of rolling over to a fresh connection based on either a large amount of
traffic, or a predefined amount of time (30 seconds) lapsing.

Essentially each connection lasts 30 seconds; at second 25 a new one opens up,
the first one closes, the new one lasts its 30 seconds, and so on.


This is designed for web browsers and it's implemented through hidden iframes.
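The rollover policy above can be sketched in a few lines of Ruby.  This is a
minimal sketch, not our production code; MAX_MESSAGES is hypothetical, since
the post only says "a large amount of traffic":

```ruby
# Minimal sketch of the rollover policy described above.
# MAX_MESSAGES is a hypothetical threshold; LIFETIME is the 30-second
# window from the description.
MAX_MESSAGES = 100
LIFETIME     = 30  # seconds

# A connection rolls over when either limit is hit, whichever comes first.
def should_roll_over?(opened_at, messages_sent, now = Time.now)
  messages_sent >= MAX_MESSAGES || (now - opened_at) >= LIFETIME
end

opened = Time.now
should_roll_over?(opened, 3, opened + 5)    # young and quiet => false
should_roll_over?(opened, 3, opened + 31)   # lifetime lapsed => true
should_roll_over?(opened, 150, opened + 1)  # traffic threshold hit => true
```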

This is a problem that has persisted across Firefox 3.6/OS X, Firefox 3.6/XP,
Firefox 4.0b6/XP, IE8/XP, Chrome/OS X, and Firefox 3.6/Ubuntu, so I don't
think any nuance of a particular browser or OS can be considered culpable.

I'm also not using any intermediate web server like nginx; I am connecting
directly to Rainbows!

*** The Problem ***
When serving static files alongside my long-polled connections, I will usually
get a round-trip time of ten milliseconds or so.  This is perfectly
acceptable.

Every now and then, however, a request will take about 2.5 seconds, followed
by a bunch of the snappy millisecond-level transaction times.  This is one
browser with 1 persistent connection against Rainbows! configured with 10
worker_processes and 100 worker_connections.

Everything changes about 5-10 minutes into things. 

Then every transaction takes about 2-4 seconds.  Static files that are 10
bytes in size?  2-4 seconds.  Ruby code that emits "Hello World"?  2-4
seconds.  Every request, still using just 1 browser.

After I exit all browsers and run netstat on the client machine to confirm
that the connections have closed, I can then curl a static file; again, 2-4
seconds.
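For reference, one way to capture these round-trip numbers from the command
line is curl's --write-out timing variables.  The host and port below are the
ones from this report; the file path is just a stand-in:

```shell
# Time a full request: -o /dev/null discards the body, -s silences the
# progress bar, and %{time_total} prints the measured round trip.
# Host/port are from this report; the path is a hypothetical stand-in.
curl -o /dev/null -s -w 'time_total: %{time_total}s\n' \
  http://10.10.192.12:7788/static/tiny.txt
```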

On the machine running rainbows if I do a netstat, I get this:

tcp        1      0 10.10.192.12:7788       10.10.131.165:17443     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17352     CLOSE_WAIT
tcp        1    196 10.10.192.12:7788       10.10.131.165:17317     CLOSE_WAIT
tcp        1    196 10.10.192.12:7788       10.10.131.165:17310     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17437     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17366     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17410     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17447     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17357     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17449     CLOSE_WAIT
tcp        1      0 10.10.192.12:7788       10.10.131.165:17347     CLOSE_WAIT
^^^ Rainbows! is running on port 7788 on this machine ^^^
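Those CLOSE_WAIT entries mean the client's FIN has arrived but the local
process has not yet close(2)d its end of the socket.  A quick way to keep
count while reproducing this (assuming a netstat whose sixth column is the
state, as in the output above):

```shell
# Count sockets stuck in CLOSE_WAIT on the Rainbows! port (7788 here).
# CLOSE_WAIT = the peer sent FIN, but this process never closed its fd.
netstat -tn | awk '$6 == "CLOSE_WAIT" && $4 ~ /:7788$/ {n++} END {print n+0}'
```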

The client machine (Windows, in this case) claims that all the connections
are closed:
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:443            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:445            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:6060           0.0.0.0:0              LISTENING
  TCP    0.0.0.0:63508          0.0.0.0:0              LISTENING
  TCP    10.10.131.165:139      0.0.0.0:0              LISTENING
  TCP    10.10.131.165:16841    187.39.33.180:11827    ESTABLISHED
  TCP    10.10.131.165:16844    69.63.181.105:5222     ESTABLISHED
  TCP    10.10.131.165:16849    10.10.10.71:5222       ESTABLISHED
  TCP    10.10.131.165:16883    64.12.29.50:443        ESTABLISHED
  TCP    10.10.131.165:16904    64.12.28.222:443       ESTABLISHED
  TCP    10.10.131.165:16915    64.12.165.99:443       ESTABLISHED
  TCP    10.10.131.165:16918    64.12.202.37:443       ESTABLISHED
  TCP    10.10.131.165:16958    205.188.248.151:443    ESTABLISHED
  TCP    10.10.131.165:16961    205.188.254.83:443     ESTABLISHED
  TCP    10.10.131.165:17102    10.10.131.136:22       ESTABLISHED
  TCP    10.10.131.165:17466    10.10.192.12:22        ESTABLISHED
  TCP    10.10.131.165:17470    10.0.0.29:515          SYN_SENT
  TCP    127.0.0.1:1030         0.0.0.0:0              LISTENING
  TCP    192.168.56.1:139       0.0.0.0:0              LISTENING
  TCP    [::]:135               [::]:0                 LISTENING

I can disconnect the windows client machine; turn it off even, and this problem 
persists.

I think that somewhere in the Ruby stack, the connections are not closing.  If
I increase my worker_processes count and prolong the long poll, then yes, I'll
survive for 15 minutes instead of 5; but the problem will still eventually
occur and I will hit the wall.
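As a hedged illustration of that suspicion (this is not Rainbows!' actual
code), the classic way a socket ends up stuck in CLOSE_WAIT is a handler
whose error path or early return skips the close; the ensure idiom prevents
it:

```ruby
# If the code path that should close a socket is skipped after the peer
# hangs up, the fd lingers and netstat shows CLOSE_WAIT.  Wrapping the
# work in ensure guarantees the close always runs.
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept
begin
  client.close        # the browser goes away: a FIN arrives on conn
  conn.read           # we see EOF; conn sits in CLOSE_WAIT until...
ensure
  conn.close          # ...we close our end (skip this and it lingers)
end
```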

I have yet to test Unicorn or Zbatery for this style of solution because I
need the keep-alive; and although I know that keep-alive support was put into
Rainbows!, I haven't really taken the time necessary to know how to invoke
it.  If you think this would be instructive, I'd be happy to do so.

As for my Ruby setup, I'm using Ruby 1.8 with the ThreadSpawn model.  We are
running on a modern version of Ubuntu without any serious customization.
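For completeness, the setup described above would correspond to a Rainbows!
config file along these lines.  This is a sketch, not our exact file:

```ruby
# Sketch of a rainbows.conf matching the setup described above (not the
# actual production file).  Start with: rainbows -c rainbows.conf config.ru
Rainbows! do
  use :ThreadSpawn        # one Ruby thread per client connection
  worker_connections 100  # max simultaneous clients per worker
end

worker_processes 10       # standard Unicorn-style option
```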

If you think that another configuration would do the trick, or if you know how
to squash this bug, that would be very helpful.  This problem has become a
great concern for us.  We love Rainbows! and all that it is.  Thanks for the
project, and keep up the good work.

Cheers,
~chris.


      


More information about the rainbows-talk mailing list