[Mongrel] Problems with mongrel dying

Robert Vogel robert at kitchendemocracy.org
Thu Nov 2 21:13:54 EST 2006


Hello  -

I need help.  For the past 11 days, one of the two mongrel processes on
my railsmachine VPS has been crashing intermittently - it has crashed
about 10 times, with increasing frequency in the past few days. 
Unfortunately, after many, many hours I still have not been able to
reproduce this problem in a controlled way - neither on my production
railsmachine nor on my development machine.  As far as I can tell, I
have followed these suggestions from Bradley and Zed and Zed's Mongrel
book:

lsof -i -P | grep CLOSE_WAIT shows nothing.

Neither mongrel process is pegged at 99% CPU - CPU usage is never above
5%, and usually at 0%, both while a process is crashed and while both
are running.

A memory leak seems impossible.  %MEM for both processes is never above
15%, both when crashed and when running.

-B ("Dash Bee") debug logging on my development machine shows that no
object is steadily increasing its memory consumption - garbage
collection seems to be working fine.

-B logging on my development machine also shows no leaking files.
The number of open files is stable (at 6).


Traffic is minuscule (< 100 requests / hour).

Inserting
ActiveRecord::Base.verification_timeout = 14400
in environments/production.rb had no effect.

Upgrading to pre-release mongrel had no effect:
sudo gem install mongrel --source=http://mongrel.rubyforge.org/releases

My application is butt simple, and supported by oodles of unit,
functional and integration test code. 

There is no transaction processing in the application and no opportunity
for a jammed request due to using shared resources without proper
locking.  The application does not use RMagick, does not explicitly
manipulate any files, and though I am not sure what you mean by 'shared
resources', I suspect I am not using any.  The only external libraries
(external to rails) are three gems which access geo-coding services -
but these were not in play when the processes crashed.

USR1 logging (killall -USR1 mongrel_rails) has been in effect through
the last two crashes.  The rails action which held things up was
different in each case - and is butt simple in both.  Here is the
mongrel.log in the vicinity of those two crashes:

Thu Nov 02 13:07:16 PST 2006: 0 threads sync_waiting for /, 1 still
active in Mongrel.
Thu Nov 02 13:07:19 PST 2006: 0 threads sync_waiting for /login, 1 still
active in Mongrel.
Thu Nov 02 13:07:27 PST 2006: 0 threads sync_waiting for /login, 1 still
active in Mongrel.
Thu Nov 02 13:07:33 PST 2006: 0 threads sync_waiting for
/admin/list_vote, 1 still active in Mongrel.
Thu Nov 02 13:07:42 PST 2006: 0 threads sync_waiting for
/admin/mark_reviewed, 1 still active in Mongrel.
Thu Nov 02 13:08:17 PST 2006: 0 threads sync_waiting for
/admin/mark_reviewed, 1 still active in Mongrel.
Thu Nov 02 13:08:26 PST 2006: 0 threads sync_waiting for
/admin/mark_reviewed, 3 still active in Mongrel.
Thu Nov 02 13:08:37 PST 2006: 0 threads sync_waiting for
/admin/mark_reviewed, 3 still active in Mongrel.
Thu Nov 02 13:09:08 PST 2006: 1 threads sync_waiting for
/admin/mark_reviewed, 4 still active in Mongrel.
Thu Nov 02 13:09:35 PST 2006: Error calling Dispatcher.dispatch
#<Sync_m::Err::UnknownLocker: Thread(#<Thread:0xb7234be4 aborting>) not
locked.>
/usr/lib/ruby/1.8/sync.rb:57:in `Fail'
/usr/lib/ruby/1.8/sync.rb:63:in `Fail'


and

Thu Nov 02 00:05:29 PST 2006: 0 threads sync_waiting for
/berkeley/downzoning/comments, 1 still active in Mongrel.
Thu Nov 02 00:05:37 PST 2006: 0 threads sync_waiting for
/berkeley/downzoning/comments, 1 still active in Mongrel.
Thu Nov 02 00:06:11 PST 2006: 0 threads sync_waiting for
/berkeley/downzoning/comments, 1 still active in Mongrel.
Thu Nov 02 00:07:07 PST 2006: 0 threads sync_waiting for
/berkeley/downzoning/comments, 1 still active in Mongrel.
Thu Nov 02 00:07:27 PST 2006: 0 threads sync_waiting for /email_updates,
1 still active in Mongrel.
Thu Nov 02 00:07:27 PST 2006: 0 threads sync_waiting for
/email_updates_edit, 1 still active in Mongrel.
Thu Nov 02 00:07:53 PST 2006: 0 threads sync_waiting for
/berkeley/bus_rapid_transit/page/brtqanda, 1 still active in Mongrel.
Thu Nov 02 00:08:11 PST 2006: 0 threads sync_waiting for
/email_updates_edit, 2 still active in Mongrel.
Thu Nov 02 00:08:39 PST 2006: 0 threads sync_waiting for /robots.txt, 1
still active in Mongrel.
Thu Nov 02 00:08:39 PST 2006: 1 threads sync_waiting for
/email_updates_edit, 3 still active in Mongrel.
Thu Nov 02 00:09:50 PST 2006: 0 threads sync_waiting for
/howitworks.php, 1 still active in Mongrel.
Thu Nov 02 00:09:50 PST 2006: 3 threads sync_waiting for
/email_updates_edit, 5 still active in Mongrel.
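
To pull the same information out of a longer mongrel.log, a rough Ruby sketch like the one below can list only the entries where at least one thread was sync_waiting. It assumes each USR1 entry sits on a single line in the real log file (the excerpts above were wrapped by the mail client), and the method name and the regular expression are illustrative, based only on the line format shown here.

```ruby
# Matches one USR1 debug entry of the form shown in this message, e.g.
# "Thu Nov 02 13:09:08 PST 2006: 1 threads sync_waiting for /admin/mark_reviewed, 4 still active in Mongrel."
# The pattern is an assumption derived from those excerpts only.
USR1_LINE = /\A(.+?): (\d+) threads sync_waiting for (\S+), (\d+) still active in Mongrel\./

# Returns one hash per log entry that had at least one waiting thread.
# Lines that do not match the pattern (stack traces, errors) are skipped.
def waiting_requests(log_text)
  entries = log_text.each_line.map do |line|
    m = USR1_LINE.match(line.strip)
    next unless m
    { time: m[1], waiting: m[2].to_i, path: m[3], active: m[4].to_i }
  end
  entries.compact.select { |entry| entry[:waiting] > 0 }
end
```

Something like waiting_requests(File.read("log/mongrel.log")) would then show which actions had requests queued behind them; the log path here is a guess and depends on how mongrel was started.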

So - as you can tell, I am a newbie at wit's end, hoping you guys can
1) help me fix the problem, and
2) help me implement a temporary workaround so I can stop checking every
few hours to see if I need to cap -a restart_app (which, so far, has
always worked...)
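
A stopgap for item 2 could be a small watchdog run from cron. The Ruby sketch below is one possible shape, not anything from the Mongrel documentation: it sends a HEAD request to each mongrel and reports the ports that fail to answer within a timeout. The port numbers, the timeout, and all method names are placeholders, and the restart command is passed in by the caller.

```ruby
require "net/http"
require "timeout"

# Hypothetical values: substitute the ports your mongrels listen on.
MONGREL_PORTS = [8000, 8001]
TIMEOUT_SECONDS = 10

# True if the mongrel on the given port answers a HEAD request within the
# timeout. Any HTTP response (even an error page) counts as "responding";
# a hang, a refused connection, or a network error counts as not responding.
def mongrel_responding?(port, host = "127.0.0.1", timeout = TIMEOUT_SECONDS)
  http = Net::HTTP.new(host, port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.start { |conn| conn.head("/") }
  true
rescue StandardError, Timeout::Error
  false
end

# Returns the ports for which the given check block returned false.
def hung_ports(ports)
  ports.reject { |port| yield(port) }
end

# Runs the restart command (an array such as ["cap", "-a", "restart_app"])
# if any port is unresponsive. Returns true if a restart was attempted.
def check_and_restart(ports, restart_command)
  dead = hung_ports(ports) { |port| mongrel_responding?(port) }
  return false if dead.empty?
  warn "#{Time.now}: no response on port(s) #{dead.join(', ')}; restarting"
  system(*restart_command)
  true
end
```

A cron job could call check_and_restart(MONGREL_PORTS, ["cap", "-a", "restart_app"]) every few minutes; that command is simply the one mentioned above and would need to be replaced by whatever restarts the mongrels from the machine the script runs on. This only hides the symptom and does nothing about the underlying blocking.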

Thanks for your careful attention.

Cheers
Robert Vogel



> -------- Original Message --------
> Subject: Re: [Mongrel] Problems with mongrel dying
> From: "Zed A. Shaw" <zedshaw at zedshaw.com>
> Date: Tue, October 31, 2006 2:36 pm
> To: mongrel-users at rubyforge.org
> 
> On Tue, 31 Oct 2006 12:48:02 -0700
> Robert Vogel <robert at kitchendemocracy.org> wrote:
> 
> > Hi
> > 
> > One of the two mongrel processes has died in the middle of the night
> > four times in the past 9 days, and I need help debugging this.
> > 
> > Each time the symptoms are the same:
> 
> Really, quick, but upgrade to the pre-release and then tell me if you still get these:
> 
> sudo gem install mongrel --source=http://mongrel.rubyforge.org/releases
> 
> If it does not fix the problem (remember, it's random so let it run in production for a while), then turn on USR1 logging and watch for the rails action that is blocking things:
> 
> sudo killall -USR1 mongrel_rails
> 
> Otherwise, keep in mind that many many people use Mongrel without blocking problems, so you need to rule out anything non-standard you're using that can cause problems.  RMagick, frequent DNS calls, working with files or shared resources, are all main culprits.
> 
> -- 
> Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
> http://www.zedshaw.com/
> http://safari.oreilly.com/0321483502 -- The Mongrel Book
> http://mongrel.rubyforge.org/
> http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.
> _______________________________________________
> Mongrel-users mailing list
> Mongrel-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-users


