[Mongrel] problems with apache 2.2 proxying to mongrel cluster

Michael Kovacs kovacs at gmail.com
Tue Jan 2 15:09:30 EST 2007

Hi all,

I've been having problems with the apache 2.2 + mod_proxy_balancer +
mongrel setup.

My setup is:

CentOS 4.3
apache 2.2.3 (compiled from source) with mod_proxy_balancer
mysql 4.1
ruby 1.8.4
mongrel 0.3.14 (I know I need to update but I think this problem is  
independent of the mongrel version)
mongrel_cluster 0.2.0
rails_machine 0.1.1

I have apache set up as per Coda's configuration on his blog posting
from several months back.

I have 4 mongrels in my cluster.

Things work fine for periods of time, but after several hours of
inactivity (I think 8 hours or so) I experience oddness where only 1
of the 4 mongrels is responding properly. I end up getting a "500
internal server error" on 3 out of 4 requests as they round robin from
mongrel to mongrel. There is nothing in the production log file nor in
the mongrel log. I've reproduced this problem on my staging box as well
as my production box.
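
One way to confirm which mongrels are failing, independent of apache, would be to request a page from each mongrel port directly. This is only a rough sketch: the host and the ports 8000-8003 are placeholders I made up for illustration (the real ones are whatever mongrel_cluster is configured with), and `probe` and `probe_all` are hypothetical helper names.

```ruby
require 'net/http'

# Hypothetical values: adjust to the address and ports the cluster really uses.
HOST  = '127.0.0.1'
PORTS = [8000, 8001, 8002, 8003]

# Request `path` from one backend and return the HTTP status code as a string,
# or a short error description if the connection itself fails.
def probe(host, port, path = '/')
  Net::HTTP.start(host, port) do |http|
    http.read_timeout = 10
    http.get(path).code
  end
rescue StandardError => e
  "error: #{e.class}"
end

# Build a { port => status } hash. An optional block replaces the real
# HTTP probe, which makes it possible to try the logic without a live cluster.
def probe_all(host, ports)
  results = {}
  ports.each do |port|
    results[port] = block_given? ? yield(host, port) : probe(host, port)
  end
  results
end

if __FILE__ == $0
  probe_all(HOST, PORTS).each { |port, status| puts "#{port}: #{status}" }
end
```

If the 500s show up when hitting the ports directly, that would suggest the problem is inside the mongrels (or behind them) rather than in the proxy balancer.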

The last time I reproduced the problem I decided to run "top" and see
what's going on when I hit the server.
Mongrel does receive every request, but mysql is only active on the 1
request that works. For requests to the other mongrels, mysql never
spikes up in CPU.

Looking at the mysql process list revealed that all of the processes
had received the "sleep" command, but one of the processes is still
working properly. I haven't played with connection timeouts other than
to set the timeout in my application's environment
(ActiveRecord::Base.verification_timeout = 14400) as well as the
mysql interactive_timeout variable, but it seems that either all the
mongrels should work or none of them should. The fact that 1 out of 4
always works is rather puzzling to me.
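
To make the timeout arithmetic concrete, here is a small sketch that converts the values into hours and checks an idle period against a server-side timeout. The 28800 figure is an assumption on my part: it is what I understand the MySQL default for wait_timeout and interactive_timeout to be, not a value I have confirmed on these boxes. The helper names are made up for illustration.

```ruby
SECONDS_PER_HOUR = 3600

# Value set in my application's environment.
VERIFICATION_TIMEOUT = 14_400

# Assumption: the MySQL default for wait_timeout / interactive_timeout.
MYSQL_DEFAULT_TIMEOUT = 28_800

# Convert a number of seconds into (fractional) hours.
def hours(seconds)
  seconds / SECONDS_PER_HOUR.to_f
end

# True if a connection idle for `idle_seconds` has outlived the server timeout,
# in which case the server is expected to have closed its end.
def server_closed?(idle_seconds, server_timeout)
  idle_seconds > server_timeout
end

puts "verification_timeout: #{hours(VERIFICATION_TIMEOUT)} hours"
puts "mysql default timeout: #{hours(MYSQL_DEFAULT_TIMEOUT)} hours"
puts "closed after 9 idle hours? #{server_closed?(9 * SECONDS_PER_HOUR, MYSQL_DEFAULT_TIMEOUT)}"
```

Under that assumption the server-side limit works out to 8 hours, which lines up with roughly when the failures start, though it still doesn't explain why one mongrel keeps working.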

Trying a 'killall -USR1 mongrel_rails' to turn debug on simply killed
the 4 mongrel processes. So now I'm running the cluster in debug mode
and am going to just let it sit there for several hours until it
happens again, and hopefully get some idea of where the breakdown is
happening. I still think it has to be a mysql connection timeout, but
again, the fact that 1 of the 4 always works doesn't lend credence to
the timeout theory.

Has anyone experienced this phenomenon themselves?

Thanks for any tips/pointers and thanks Zed for all your hard work  
with mongrel.


