[Mongrel] qrp followup, deployment results

Eric Wong normalperson at yhbt.net
Thu Apr 17 19:34:01 EDT 2008


I posted about qrp (http://qrp.rubyforge.org/) many weeks ago, but only
deployed it to a live site a few weeks ago (after a few bug fixes
leading up to qrp v0.4.0).

qrp was deployed late on 2008-03-27 to roughly half our servers, and
then fully deployed on 2008-03-28.  So far, the results have been
fairly decent (see below).

The only change I needed to make to Mongrel was the following
patch, which comments out two log messages that become excessively
noisy once concurrency is disabled in Mongrel:

--- a/mongrel.rb	2008-03-03 16:42:04.000000000 -0800
+++ b/mongrel.rb	2008-04-17 15:30:57.313952784 -0700
@@ -210,7 +210,7 @@
     # after the reap is done.  It only runs if there are workers to reap.
     def reap_dead_workers(reason='unknown')
       if @workers.list.length > 0
-        STDERR.puts "#{Time.now}: Reaping #{@workers.list.length} threads for slow workers because of '#{reason}'"
+        #STDERR.puts "#{Time.now}: Reaping #{@workers.list.length} threads for slow workers because of '#{reason}'"
         error_msg = "Mongrel timed out this thread: #{reason}"
         mark = Time.now
         @workers.list.each do |worker|
@@ -278,7 +278,7 @@
               worker_list = @workers.list
   
               if worker_list.length >= @num_processors
-                STDERR.puts "Server overloaded with #{worker_list.length} processors (#@num_processors max). Dropping connection."
+                #STDERR.puts "Server overloaded with #{worker_list.length} processors (#@num_processors max). Dropping connection."
                 client.close rescue nil
                 reap_dead_workers("max processors")
               else

As far as I know, we're the only Rails site running qrp in our
configuration, but it should be safe now that we're doing it ;)


Results:

  While I don't have hard numbers for average response time and standard
  deviation, they have not changed much since qrp was deployed.

  However, our metric for requests taking over 10 seconds has improved
  greatly since qrp was deployed[1].

  The date is actually shifted by one day (so the report that I received
  on 2008-03-02 was actually for the previous day's traffic).

  While a few hundredths of one percent doesn't sound like a lot, that's
  still a reasonable number of unhappy users who get bogged down.
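
  To put those fractions in perspective, here is a rough Ruby sketch that
  converts a percentage from the table into a request count.  The daily
  request volume is purely hypothetical (this post gives no real traffic
  numbers), and slow_requests is a made-up helper name.

```ruby
# Hypothetical helper: convert a ">10s" percentage (0-100 scale, as in the
# report) into an absolute number of slow requests for a given daily volume.
def slow_requests(percent, total_requests)
  (total_requests * percent / 100.0).round
end

# ASSUMPTION: 1,000,000 requests/day is an illustrative figure only,
# not the site's actual traffic.
daily = 1_000_000
puts slow_requests(0.1397, daily)  # percentage from the 2008-03-27 report
puts slow_requests(0.0425, daily)  # percentage from the 2008-03-29 report
```

  Under that assumed volume, the gap between the two reports is on the
  order of a thousand slow requests per day.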

Date       | % of requests taking >10s (0-100 scale) | histogram
2008-03-01 | 0.1192 | *****************
2008-03-02 | 0.1537 | ***********************
2008-03-03 | 0.0634 | *********
2008-03-04 | 0.1094 | ****************
2008-03-05 | 0.1241 | ******************
2008-03-06 | 0.1075 | ****************
2008-03-07 | 0.1086 | ****************
2008-03-08 | 0.1664 | ************************
2008-03-09 | 0.1647 | ************************
2008-03-10 | 0.0705 | **********
2008-03-11 | 0.1190 | *****************
2008-03-12 | 0.1754 | **************************
2008-03-13 | 0.1202 | ******************
2008-03-14 | 0.1351 | ********************
2008-03-15 | 0.1463 | *********************
2008-03-16 | 0.1468 | **********************
2008-03-17 | 0.1425 | *********************
2008-03-18 | 0.1271 | *******************
2008-03-19 | 0.1260 | ******************
2008-03-20 | 0.1209 | ******************
2008-03-21 | 0.1438 | *********************
2008-03-23 | 0.1139 | *****************
2008-03-24 | 0.0916 | *************
2008-03-25 | 0.1469 | **********************
2008-03-26 | 0.1316 | *******************
2008-03-26 | 0.1323 | *******************
2008-03-27 | 0.1397 | ********************
2008-03-28 | 0.0927 | *************                <partial qrp deployment>
2008-03-29 | 0.0425 | ******                       <full qrp deployment>
2008-03-30 | 0.0440 | ******
2008-03-31 | 0.0461 | ******
2008-04-01 | 0.0357 | *****
2008-04-02 | 0.0319 | ****
2008-04-03 | 0.0325 | ****
2008-04-04 | 0.0314 | ****
2008-04-05 | 0.0664 | *********
2008-04-05 | 0.0652 | *********
2008-04-06 | 0.0823 | ************
2008-04-07 | 0.0605 | *********
2008-04-08 | 0.0553 | ********
2008-04-09 | 0.0537 | ********
2008-04-10 | 0.1166 | *****************           <something broke this day>
2008-04-11 | 0.0512 | *******
2008-04-12 | 0.0546 | ********
2008-04-13 | 0.0619 | *********
2008-04-14 | 0.0519 | *******
2008-04-15 | 0.0421 | ******
2008-04-16 | 0.0441 | ******
2008-04-17 | 0.0409 | ******

We had some internal problems on 2008-04-10 so things went to hell that
day.
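
For anyone wanting to produce a similar report, here is a minimal Ruby
sketch.  The scale of 150 stars per percentage point (truncated) is
inferred from the bar lengths in the table, not taken from the actual
reporting script; bar, traffic_date, mean and the sample rows (a small
subset of the table) are illustrative names only.

```ruby
require 'date'

# ASSUMPTION: 150 stars per percentage point, rounded down.  Inferred from
# the table above, not from the real report generator.
STARS_PER_PERCENT = 150

# Render the histogram bar for one percentage value.
def bar(percent)
  '*' * (percent * STARS_PER_PERCENT).floor
end

# Reports are dated one day after the traffic they describe.
def traffic_date(report_date)
  Date.parse(report_date) - 1
end

# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum / values.size.to_f
end

# A few rows copied from the table (report date => % of requests >10s).
before = { '2008-03-24' => 0.0916, '2008-03-25' => 0.1469, '2008-03-27' => 0.1397 }
after  = { '2008-03-29' => 0.0425, '2008-03-30' => 0.0440, '2008-03-31' => 0.0461 }

before.merge(after).each do |date, pct|
  puts format('%s | %.4f | %s', date, pct, bar(pct))
end

puts format('mean before: %.4f, mean after: %.4f',
            mean(before.values), mean(after.values))
```

Averaging only three rows on each side is just for brevity; feeding in
the full table would give a fairer before/after comparison.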

Once again, qrp is needed for a Rails site I work on because:

  a) We unfortunately depend on a web service run by folks who suck at
     the Internet, and the tech folks like myself have little control
     over this.

  b) One of our internal backend services has some pathologically
     bad corner cases we occasionally hit.  Eliminating them
     isn't possible due to strange business requirements (and some
     of the troublesome backend code is proprietary and we can't
     improve it).

[1] yes, I realize that saying that the number of >10s responses has
dropped is like saying we've won the Special Olympics :)

-- 
Eric Wong

