[Mongrel] Mongrel 0.3.13.3 processes occasionally lock up - critical issue

kigsteronline at mac.com kigsteronline at mac.com
Wed Aug 30 18:10:16 EDT 2006


Hello Mongrel gurus!

We are about to deploy our Rails application to production. The last
couple of weeks of testing went really well, but today we are seeing a
new critical issue: the browser sometimes hangs while trying to load a
page. When this happens, one or more of the mongrel_rails processes
appear to be hung and not responding while the others still respond, so
whether a page loads depends on which process the request is routed to.
The only remedy at that point appears to be a forced kill and restart of
the Mongrel processes.

I found the following lines in mongrel.log:

Wed Aug 30 14:03:22 PDT 2006: Reaping 1 threads for slow workers because of 'shutdown'
Thread #<Thread:0xb79d382c sleep> is too old, killing.
Waiting for 1 requests to finish, could take 60 seconds.
Wed Aug 30 14:03:23 PDT 2006: Reaping 1 threads for slow workers because of 'shutdown'
Thread #<Thread:0xb78a03ac sleep> is too old, killing.

The Apache error log has this:

[Wed Aug 30 14:46:59 2006] [error] [client 64.161.139.66] proxy: Error reading from remote server returned by /bookstore, referer: http://wp01.my-secret-domain.com/create/book

Apache's SSL error log has this:

[Wed Aug 30 14:42:35 2006] [error] [client 64.161.139.66] proxy: error reading status line from remote server 127.0.0.1
[Wed Aug 30 14:42:35 2006] [error] [client 64.161.139.66] proxy: Error reading from remote server returned by /my/account/save_edit_address/80
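
Since Apache only reports that some backend failed to answer, one way to see which Mongrel is hung (as opposed to down) is to request a page from each backend port directly with a short timeout. The following is a minimal sketch, not something Mongrel provides: `probe` and `probe_cluster` are hypothetical helper names, and the port range 5000 to 5009 is assumed from the cluster config (port 5000, 10 servers).

```ruby
require 'net/http'
require 'timeout'

# Hypothetical helper: request "/" from one backend and classify the result.
#   :ok    - an HTTP response arrived (any status code)
#   :hung  - connecting or reading timed out (port open, no answer)
#   :down  - nothing is listening on the port
#   :error - any other socket-level failure
def probe(host, port, timeout_s = 5)
  http = Net::HTTP.new(host, port)
  http.open_timeout = timeout_s
  http.read_timeout = timeout_s
  http.get('/')
  :ok
rescue Timeout::Error          # covers Net::OpenTimeout and Net::ReadTimeout
  :hung
rescue Errno::ECONNREFUSED
  :down
rescue SystemCallError, IOError
  :error
end

# Hypothetical helper: probe a range of ports and return { port => status }.
def probe_cluster(host, ports, timeout_s = 5)
  ports.each_with_object({}) do |port, result|
    result[port] = probe(host, port, timeout_s)
  end
end
```

Calling `probe_cluster('127.0.0.1', 5000..5009)` on the server would give one status per backend. Note that recent Ruby versions may retry an idempotent GET once, so a hung port can take up to twice the timeout to report.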


Restarting mongrel_cluster the normal way leaves some of the hung
processes running, and the replacement processes then fail to start.
Killing the mongrel_rails processes with 'kill -9' does clear the
cluster, and after a restart it appears to work fine. This happened a
couple of times today. Has anyone seen this behavior, and is there a
recommended fix?

- Should I upgrade to the latest unreleased version of Mongrel? I am
a little worried that it is also unstable.
- We did not change anything about Rails session handling. Could
Mongrel be locking up because it is deadlocking on the session files?
- How do I tell what's really going on?
- Can I enable more Mongrel logging at a level that is acceptable for
production use (i.e. not the full debugging info)?
- Can I turn on PID and date/time logging in mongrel.log so that I
can see which process is having an issue?
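
To illustrate the kind of log line the last question is after, here is a sketch using Ruby's standard `Logger` with a custom formatter that stamps each line with the time and the writing process's PID. This shows the general technique only; it is not a Mongrel option, and `build_logger` is a hypothetical helper name.

```ruby
require 'logger'

# Hypothetical helper: a Logger whose lines carry a timestamp and the PID
# of the process that wrote them.
def build_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    stamp = time.strftime('%Y-%m-%d %H:%M:%S')
    "#{stamp} [pid #{Process.pid}] #{severity}: #{msg}\n"
  end
  logger
end
```

For example, `build_logger($stderr).warn('request is taking too long')` writes one line containing the date, time, PID, and severity, which is enough to match a log entry to a specific process in the cluster.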

Any advice on how to approach this issue is much appreciated.

Our environment:

Apache 2.2.3 + mod_proxy_balancer + mod_ssl, etc...
Mongrel 0.3.13.3
Ruby 1.8.4
Rails 1.1.6, SslRequirements, etc
Linux 2.6.9-42.ELsmp (RedHat ES4 with all up-to-date patches) on Intel

Mongrel Config:
---
cwd: /data/apps/app1/current
port: "5000"
environment: production
address: 127.0.0.1
pid_file: log/mongrel.pid
servers: 10


Apache Info:

# httpd -v

Server version: Apache/2.2.3
Server built:   Aug 22 2006 10:26:14
wp01[root]# httpd -V
Server version: Apache/2.2.3
Server built:   Aug 22 2006 10:26:14
Server's Module Magic Number: 20051115:3
Server loaded:  APR 1.2.7, APR-Util 1.2.7
Compiled using: APR 1.2.7, APR-Util 1.2.7
Architecture:   32-bit
Server MPM:     Prefork
   threaded:     no
     forked:     yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/prefork"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=128
-D HTTPD_ROOT="/usr/local/apache-2.2.3"
-D SUEXEC_BIN="/usr/local/apache-2.2.3/bin/suexec"
-D DEFAULT_PIDLOG="logs/httpd.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_LOCKFILE="logs/accept.lock"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"

Apache Module Info:

# httpd -l
Compiled in modules:
   core.c
   mod_authn_file.c
   mod_authn_default.c
   mod_authz_host.c
   mod_authz_groupfile.c
   mod_authz_user.c
   mod_authz_default.c
   mod_auth_basic.c
   mod_include.c
   mod_filter.c
   mod_log_config.c
   mod_env.c
   mod_headers.c
   mod_setenvif.c
   mod_proxy.c
   mod_proxy_connect.c
   mod_proxy_ftp.c
   mod_proxy_http.c
   mod_proxy_ajp.c
   mod_proxy_balancer.c
   mod_ssl.c
   prefork.c
   http_core.c
   mod_mime.c
   mod_status.c
   mod_autoindex.c
   mod_asis.c
   mod_cgi.c
   mod_negotiation.c
   mod_dir.c
   mod_actions.c
   mod_userdir.c
   mod_alias.c
   mod_rewrite.c
   mod_so.c


Restarting with Capistrano while a few Mongrel processes are hung:


 > cap restart
     loading configuration /usr/local/lib/ruby/gems/1.8/gems/capistrano-1.1.0/lib/capistrano/recipes/standard.rb
     loading configuration ./config/deploy.rb
     loading configuration #<Proc:0x003566d0@/usr/local/lib/ruby/gems/1.8/gems/mongrel_cluster-0.2.0/lib/mongrel_cluster/recipes.rb:1>
   * executing task restart
   * executing task restart_mongrel_cluster
   * executing task stop_mongrel_cluster
   * executing "mongrel_rails cluster::stop -C /data/apps/app1/current/config/mongrel_cluster.yml"
     servers: ["wp01.my-secret-domain.com"]
     [wp01.my-secret-domain.com] executing command
** [out :: wp01.my-secret-domain.com] Stopping 10 Mongrel servers...
     command finished
   * executing task start_mongrel_cluster
   * executing "mongrel_rails cluster::start -C /data/apps/app1/current/config/mongrel_cluster.yml"
     servers: ["wp01.my-secret-domain.com"]
     [wp01.my-secret-domain.com] executing command
** [out :: wp01.my-secret-domain.com] Starting 10 Mongrel servers...
** [out :: wp01.my-secret-domain.com] ** !!! PID file log/mongrel.5000.pid already exists.  Mongrel could be running already. Check your log/mongrel.log for errors.
** [out :: wp01.my-secret-domain.com] ** !!! PID file log/mongrel.5001.pid already exists.  Mongrel could be running already. Check your log/mongrel.log for errors.
** [out :: wp01.my-secret-domain.com] **
** [out :: wp01.my-secret-domain.com] !!! PID file log/mongrel.5004.pid already exists.  Mongrel could be running alrea
** [out :: wp01.my-secret-domain.com] dy. Check your log/mongrel.log for errors.
** [out :: wp01.my-secret-domain.com] **
** [out :: wp01.my-secret-domain.com] !!! PID file log/mongrel.5005.pid already exists.  Mongrel could be running alrea
** [out :: wp01.my-secret-domain.com] dy. Check your log/mongrel.log for errors.
** [out :: wp01.my-secret-domain.com] **
** [out :: wp01.my-secret-domain.com] !!! PID file log/mongrel.5007.pid already exists.  Mongrel could be running alrea
** [out :: wp01.my-secret-domain.com] dy. Check your log/mongrel.log for errors.
     command finished
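
The "PID file ... already exists" messages mean the start was refused because an earlier PID file was still on disk. To see which of those files still point at a live process, something like the following sketch could be run from the application directory. It assumes the log/mongrel.*.pid naming shown above; `pid_file_status` is a hypothetical helper. Signal 0 only tests whether the process exists, so it cannot tell a hung process from a healthy one.

```ruby
# Hypothetical helper: map each PID file matching the glob to
#   :alive      - a process with that PID exists
#   :stale      - no such process; the file was left behind
#   :unreadable - the file is empty or does not contain a positive integer
def pid_file_status(glob)
  Dir.glob(glob).sort.each_with_object({}) do |path, status|
    pid = File.read(path).strip
    unless pid =~ /\A[1-9]\d*\z/
      status[path] = :unreadable
      next
    end
    begin
      Process.kill(0, pid.to_i)   # signal 0: existence check, nothing is sent
      status[path] = :alive
    rescue Errno::ESRCH
      status[path] = :stale       # PID file left behind by a dead process
    rescue Errno::EPERM
      status[path] = :alive       # process exists but belongs to another user
    end
  end
end
```

For example, `pid_file_status('log/mongrel.*.pid')` returns a hash keyed by PID file path. Files reported as `:stale` are candidates for removal before the next start; `:alive` entries are the processes that did not shut down.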
