[Mongrel] Making Mongrel play well with Monit

Vishnu Gopal g.vishnu at gmail.com
Fri Sep 1 10:24:49 EDT 2006


Hi!

I run a mongrel cluster with 6 mongrels in it. I want to monitor each one
individually for process hangs and restart it when it hangs. This is the
solution I came up with.

Here's my configuration file for monit (/usr/local/etc/monitrc), trimmed to
the relevant bits:

------

#check lighttpd process
check process lighttpd with pidfile /var/run/lighttpd.pid
    start program = "/usr/local/etc/rc.d/lighttpd.sh start"
    stop program  = "/usr/local/etc/rc.d/lighttpd.sh stop"
    if totalmem > 200.0 MB for 5 cycles then restart
    group server

#check pound process
check process pound with pidfile /var/run/pound.pid
    start program = "/usr/local/etc/rc.d/pound.sh start"
    stop program  = "/usr/local/etc/rc.d/pound.sh stop"
    if totalmem > 400.0 MB for 5 cycles then restart
    if failed port 6000 protocol http
        with timeout 10 seconds
        then restart
    group server

#check mongrel processes

#6001
check process mongrel-6001 with pidfile /home/xxx/sshare/app/log/mongrel.6001.pid
    start program = "/home/xxx/scripts/mongrel_rails_start 6001"
    stop program  = "/home/xxx/scripts/mongrel_rails_stop 6001"
    if totalmem > 50.0 MB for 5 cycles then restart
    if failed port 6001 protocol http
        with timeout 10 seconds
        then restart
    group mongrel

#6002
check process mongrel-6002 with pidfile /home/xxx/sshare/app/log/mongrel.6002.pid
    start program = "/home/xxx/scripts/mongrel_rails_start 6002"
    stop program  = "/home/xxx/scripts/mongrel_rails_stop 6002"
    if totalmem > 50.0 MB for 5 cycles then restart
    if failed port 6002 protocol http
        with timeout 10 seconds
        then restart
    group mongrel

#6003
check process mongrel-6003 with pidfile /home/xxx/sshare/app/log/mongrel.6003.pid
    start program = "/home/xxx/scripts/mongrel_rails_start 6003"
    stop program  = "/home/xxx/scripts/mongrel_rails_stop 6003"
    if totalmem > 50.0 MB for 5 cycles then restart
    if failed port 6003 protocol http
        with timeout 10 seconds
        then restart
    group mongrel

#6004
check process mongrel-6004 with pidfile /home/xxx/sshare/app/log/mongrel.6004.pid
    start program = "/home/xxx/scripts/mongrel_rails_start 6004"
    stop program  = "/home/xxx/scripts/mongrel_rails_stop 6004"
    if totalmem > 50.0 MB for 5 cycles then restart
    if failed port 6004 protocol http
        with timeout 10 seconds
        then restart
    group mongrel

#6005
check process mongrel-6005 with pidfile /home/xxx/sshare/app/log/mongrel.6005.pid
    start program = "/home/xxx/scripts/mongrel_rails_start 6005"
    stop program  = "/home/xxx/scripts/mongrel_rails_stop 6005"
    if totalmem > 50.0 MB for 5 cycles then restart
    if failed port 6005 protocol http
        with timeout 10 seconds
        then restart
    group mongrel

#6006
check process mongrel-6006 with pidfile /home/xxx/sshare/app/log/mongrel.6006.pid
    start program = "/home/xxx/scripts/mongrel_rails_start 6006"
    stop program  = "/home/xxx/scripts/mongrel_rails_stop 6006"
    if totalmem > 50.0 MB for 5 cycles then restart
    if failed port 6006 protocol http
        with timeout 10 seconds
        then restart
    group mongrel

------
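
The six mongrel stanzas differ only in the port number, so they could be generated rather than maintained by hand. A minimal sketch, assuming a POSIX shell; the function names mongrel_stanza and generate_mongrel_stanzas are made up for illustration, while the paths and thresholds are copied from the config above:

```shell
# Print one monit stanza for a single mongrel port.
mongrel_stanza() {
    port="$1"
    cat <<EOF
#$port
check process mongrel-$port with pidfile /home/xxx/sshare/app/log/mongrel.$port.pid
    start program = "/home/xxx/scripts/mongrel_rails_start $port"
    stop program  = "/home/xxx/scripts/mongrel_rails_stop $port"
    if totalmem > 50.0 MB for 5 cycles then restart
    if failed port $port protocol http
        with timeout 10 seconds
        then restart
    group mongrel

EOF
}

# Print the stanzas for all six ports used in this cluster.
generate_mongrel_stanzas() {
    for p in 6001 6002 6003 6004 6005 6006; do
        mongrel_stanza "$p"
    done
}
```

Redirecting the output of generate_mongrel_stanzas to a file and pasting it into monitrc would keep the six blocks in sync when a threshold changes.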

mongrel_rails_start and mongrel_rails_stop are simple scripts like this:

mongrel_rails_start:

------

/usr/local/bin/mongrel_rails start -e production -d -p $1 \
    -P /home/xxx/sshare/app/log/mongrel.$1.pid -c /home/xxx/sshare/app

------
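
The stop script is not shown above. A minimal sketch of what a counterpart might look like, assuming `mongrel_rails stop` is given the same pid file and application directory as the start command; the function name stop_mongrel and the overridable MONGREL_RAILS and APP_DIR variables are hypothetical, added only for illustration:

```shell
#!/bin/sh
# Hypothetical counterpart to mongrel_rails_start; not taken from the original setup.
# Both variables can be overridden from the environment.
MONGREL_RAILS="${MONGREL_RAILS:-/usr/local/bin/mongrel_rails}"
APP_DIR="${APP_DIR:-/home/xxx/sshare/app}"

# Stop the mongrel whose pid file is named after the given port.
stop_mongrel() {
    port="$1"
    "$MONGREL_RAILS" stop -P "$APP_DIR/log/mongrel.$port.pid" -c "$APP_DIR"
}

# In a real script the last line would call the function with the port argument:
# stop_mongrel "$1"
```

Keeping the binary path and application directory in variables means the same script can be pointed at a test application without editing it.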

Right. When I run these from the command line, e.g.
/home/xxx/scripts/mongrel_rails_start 6001, they work fine. But when
monit tries to start or stop the service, it consistently says "execution
failed". I thought the PATH variable might not be getting set, so I played
around with that, but it still fails. I tried putting
"/usr/local/bin/mongrel_rails start" with all its parameters directly in
the monit start program line, and that didn't work. I even tried
"mongrel_rails cluster::start -c /etc/mongrel_rails/sshare.conf", and that
doesn't work either. (Note that lighttpd and pound are restarted fine, so
monit itself seems to be working.)
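
One way to test the PATH theory outside monit is to run the script with a stripped-down environment. A minimal sketch, assuming monit launches programs with an almost empty environment and a short default PATH; the exact PATH value and the helper name run_like_monit are assumptions for illustration, not taken from monit itself:

```shell
# Run a command with an empty environment except for a short PATH,
# roughly imitating a process started by a daemon rather than a login shell.
run_like_monit() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}

# Example (commented out, since it would start a real mongrel):
# run_like_monit /home/xxx/scripts/mongrel_rails_start 6001
```

If the script fails under this helper but works from a normal shell, the environment is the likely difference. This only checks the environment theory; it does not reproduce every aspect of how monit executes a program.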

I'm aware this is more of a monit question, but Zed suggested on the list
some time back that we do monitoring this way.

Is there something I'm doing wrong? Please help me out.

Vish