[Mongrel] Why Rails + mongrel_cluster + load balancing doesn't work for us and the beginning of a solution

Paul Butcher paul at paulbutcher.com
Wed Sep 20 06:18:53 EDT 2006


We have been searching for a Rails deployment architecture which works for
us for some time. We've recently moved from Apache 1.3 + FastCGI to Apache
2.2 + mod_proxy_balancer + mongrel_cluster, and it's a significant
improvement. But it still exhibits serious performance problems.
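For reference, the kind of setup under test looks roughly like this (a sketch only; the ports and balancer name are illustrative, not our exact config):

  <Proxy balancer://mongrels>
    BalancerMember http://127.0.0.1:8000
    BalancerMember http://127.0.0.1:8001
  </Proxy>
  ProxyPass / balancer://mongrels/
  ProxyPassReverse / balancer://mongrels/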

We have the beginnings of a fix that we would like to share.

To illustrate the problem, imagine a 2 element mongrel cluster running a
Rails app containing the following simple controller:

  class HomeController < ApplicationController
    def fast
      sleep 1
      render :text => "I'm fast"
    end

    def slow
      sleep 10
      render :text => "I'm slow"
    end
  end

and the following test script:

  #!/usr/bin/env ruby
  require 'net/http'
  require File.dirname(__FILE__) + '/config/boot'
  require File.dirname(__FILE__) + '/config/environment'

  end_time = 1.minute.from_now

  fast_count = 0
  slow_count = 0

  fastthread = Thread.start do
    while Time.now < end_time do
      Net::HTTP.get 'localhost', '/home/fast'
      fast_count += 1
    end
  end

  slowthread = Thread.start do
    while Time.now < end_time do
      Net::HTTP.get 'localhost', '/home/slow'
      slow_count += 1
    end
  end

  fastthread.join
  slowthread.join

  puts "Fast: #{fast_count}"
  puts "Slow: #{slow_count}"

In this scenario, there will be two requests outstanding at any time, one
"fast" and one "slow". You would expect approximately 60 fast and 6 slow
GETs to complete over the course of a minute. This is not what happens;
approximately 12 fast and 6 slow GETs complete per minute.

The reason is that mod_proxy_balancer assumes it can send multiple
concurrent requests to each mongrel, so fast requests end up queued behind
slow ones even when an idle mongrel is available.

We've experimented with various different configurations for
mod_proxy_balancer without successfully solving this issue. As far as we can
tell, all other popular load balancers (Pound, Pen, balance) behave in
roughly the same way.

This is causing us real problems. Our user interface is very time-sensitive.
For common user actions, a page refresh delay of more than a couple of
seconds is unacceptable. What we're finding is that if we have (say) a
reporting page which takes 10 seconds to display (an entirely acceptable
delay for a rarely-used report) then our users are seeing similar delays on
pages which should be virtually instantaneous (and would be, if their
requests were directed to idle servers). Worse, we're occasionally seeing
unnecessary timeouts because requests are queuing up on one server.

The real solution would be to fix Rails' inability to handle more than one
concurrent request per process. In the absence of that, however, we've
implemented (in Ruby) what might be the world's smallest load balancer. It
sends at most one request to each member of the cluster at a time. It's
called HighWire and is available on RubyForge (no gem yet - it's on the
list of things to do!):

  svn checkout svn://rubyforge.org/var/svn/highwire
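The core idea - hand each backend at most one outstanding request - can be
sketched in a few lines of Ruby. To be clear, this is an illustration of the
scheduling policy, not HighWire's actual code; the BackendPool name and the
Queue-based dispatch are our own invention for this sketch:

  require 'thread'

  # A pool that hands each backend to at most one client at a time.
  # Idle backends sit in a Queue; a request checks one out and returns
  # it only once the response is complete.
  class BackendPool
    def initialize(backends)
      @idle = Queue.new
      backends.each { |b| @idle.push(b) }
    end

    # Block until some backend is idle, then run the request against it.
    def dispatch
      backend = @idle.pop        # blocks while every backend is busy
      begin
        yield backend            # proxy the request to this backend
      ensure
        @idle.push(backend)      # mark it idle again
      end
    end
  end

  pool = BackendPool.new(['127.0.0.1:8000', '127.0.0.1:8001'])
  pool.dispatch { |backend| "forward the request to #{backend}" }

Because dispatch blocks rather than queueing behind a busy backend, a fast
request can never get stuck waiting behind a slow one while another mongrel
sits idle.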

Using this instead of mod_proxy_balancer, and running the same test script
above, we see approximately 54 fast and 6 slow requests per minute.

HighWire is very young and has a way to go. It's not had any serious
optimization or testing, and there are a bunch of things that need doing
before it can really be considered production ready. But it does work for
us, and does produce a significant performance improvement.

Please check it out and let us know what you think.

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

MSN: paul at paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: https://www.linkedin.com/in/paulbutcher
