[Mongrel] Mongrel woes fixed

Jacob Atzen jacob at jacobatzen.dk
Tue Oct 3 17:13:26 EDT 2006

On Sun, Oct 01, 2006 at 04:39:07PM -0700, Zed A. Shaw wrote:
> > The second problem stems from the fact that Mongrel uses the
> > Thread#abort_on_exception. I'm not sure why this is even in there, as
> > the documentation says:
> > 
> >         When set to true, causes all threads (including the main
> >         program) to abort if an exception is raised in thr. The process
> >         will effectively exit(0).
> > 
> > The patch simply removes the abort_on_exception from mongrel.rb. After
> > applying this patch I have been unable to make Mongrel crash.
> > 
> The abort was put in there to catch exceptions that are "leaking"
> through and not being caught.  I'm curious what exception was being
> thrown that *none* of the damn begin/rescue blocks catches.  We
> seriously need a begin/rescue-every-damn-thing-no-matter-what
> construct that actually works.

Some further digging gives another pointer towards the problem. I am
seeing some backtraces that look like this (beware, the line numbers
might be a little off):

/usr/local/lib/ruby/gems/1.8/gems/mongrel- `run': Mongrel timed out this thread: max processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError)
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel- `run'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel- `run'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel- `run'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel- `run'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel- `run'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel- `run'
        from /usr/local/lib/ruby/gems/1.8/gems/mongrel-
        from /usr/local/bin/mongrel_rails:18

Line 646 of mongrel.rb is the w.raise() call and line 704 is:

        thread = Thread.new(client) { |c|

My reading of this is that the thread is somehow put aside as soon as
it's created, that is, _before_ the process_client() call, so when the
thread is then killed off there's nothing to catch the exception.
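
A minimal sketch of that race, not Mongrel's actual code: SketchTimeout
is a hypothetical stand-in for Mongrel::TimeoutError (deriving it from
Exception is an assumption), and raise_into_worker is a made-up helper.
It parks a worker either before or inside its begin/rescue block and
then raises into it, the way the w.raise() call does:

```ruby
# Hypothetical stand-in for Mongrel::TimeoutError; deriving it from
# Exception is an assumption made for this sketch.
class SketchTimeout < Exception; end

# Raises SketchTimeout into a worker that is parked either inside or
# outside its begin/rescue block. Returns :handled or :leaked.
def raise_into_worker(inside_rescue)
  ready = Queue.new
  worker = Thread.new do
    # Keep stderr quiet when the thread dies (needs Ruby 2.5 or later).
    Thread.current.report_on_exception = false
    unless inside_rescue
      ready << :parked_outside
      sleep # parked before any handler exists
    end
    begin
      ready << :parked_inside
      sleep # stands in for the process_client() work
    rescue SketchTimeout
      :handled
    end
  end
  ready.pop # wait until the worker has reached its parking spot
  worker.raise(SketchTimeout, "timed out")
  begin
    worker.value # re-raises whatever killed the worker
  rescue SketchTimeout
    :leaked
  end
end
```

When the worker is parked before its begin block, the exception has no
handler to land in and escapes the thread, which would match the
backtrace above.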

I am also seeing exceptions being raised at line 646 before this one,
but they have different backtraces, typically involving sync.rb, and
seem to be handled properly. As far as I can tell they are all handled
by the "Error calling Dispatcher.dispatch" rescue block.

Another interesting note is that after an exception is raised at this
exact point and with the above backtrace, all threads go into the
aborting state. And when they're all killed off, Mongrel dies. It's
actually quite clear, as the following log excerpt shows:

% grep -ni timeout log/mongrel.14.log
127011:Tue Oct 03 21:57:28 CEST 2006: Error calling Dispatcher.dispatch #<Mongrel::TimeoutError: Mongrel timed out this thread: max processors#<Thread:0x8fe760c run>> from #<Thread:0x8fe760c run>
203806:/usr/local/lib/ruby/gems/1.8/gems/mongrel- `run': Mongrel timed out this thread: max processors#<Thread:0x8f28d9c run> (Mongrel::TimeoutError)

% grep -n "aborting" log/mongrel.14.log | head -n 3
203815:[sync_synchronize] Unlocking #<Thread:0x8f3a72c aborting>
203816:[sync_unlock:1] Thread.critical = true #<Thread:0x8f3a72c aborting>
203817:[sync_unlock:4] Thread.critical = false #<Thread:0x8f3a72c aborting>

I have tried adding a begin/rescue/end around the process_client() call
like this:

    rescue Object

And it seems to be doing the job. At least I haven't been able to make
Mongrel crash yet, with the little testing I've done so far. I have
still seen the error message show up a couple of times, though.
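
For readers following along, here is a rough sketch of the shape of
that change. It is not the actual patch: process_client here is a
hypothetical stub, and SketchTimeout is a stand-in that assumes the
timeout error derives from Exception rather than StandardError. A
plain rescue only matches StandardError, while rescue Object matches
anything that can be raised:

```ruby
# Hypothetical stand-in for the timeout error (assumed to derive from
# Exception, not StandardError).
class SketchTimeout < Exception; end

# Hypothetical stub for the real process_client(); it always times out.
def process_client(client)
  raise SketchTimeout, "timed out while serving #{client}"
end

# A plain rescue only matches StandardError and its subclasses, so the
# SketchTimeout passes straight through it.
def serve_with_plain_rescue(client)
  process_client(client)
  :ok
rescue StandardError
  :rescued
end

# rescue Object matches every exception, since all exceptions are Objects.
def serve_with_rescue_object(client)
  Thread.new(client) { |c|
    begin
      process_client(c)
      :ok
    rescue Object => e
      "rescued #{e.class}"
    end
  }.value
end
```

Catching everything this way also swallows things like Interrupt and
SystemExit raised inside the block, so it is a blunt tool.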

What are your thoughts on this? Does it make sense?

> Now, I believe the problem you'll have is when this exception leaks
> through that thread will become dead and eventually you'll fill up and
> Mongrel will die anyway.  If you can, try to find out what exception
> causes this so I can have the server kill off its threads properly by
> handling this yet another annoying random exception.

Actually, from what I've seen so far, the threads seem to be killed off
properly, if somewhat slowly. I'm printing the process list in every run
of the sleeper thread, and from what I can see Mongrel will clean out
the threads even when the exception leaks through. From my understanding,
when an exception gets to the top level of the thread, the thread is
simply killed off. Feel free to correct me if I'm wrong.
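
A small sketch of that behaviour in plain Ruby, with the abort flag
off. observe_dead_worker is a made-up helper name, and nothing here is
Mongrel code:

```ruby
# Starts a worker that dies from an unhandled exception and reports
# what the rest of the process can observe afterwards.
def observe_dead_worker
  worker = Thread.new do
    # Keep stderr quiet when the thread dies (needs Ruby 2.5 or later).
    Thread.current.report_on_exception = false
    # false is the default; set here only to make the assumption explicit.
    Thread.current.abort_on_exception = false
    raise "unhandled in worker"
  end
  begin
    worker.join # join re-raises the exception that killed the worker
  rescue RuntimeError => e
    joined_error = e.message
  end
  { alive: worker.alive?, status: worker.status, error: joined_error }
end
```

Only the worker dies. The caller keeps running and sees the exception
only if it joins the thread, and Thread#status returns nil for a thread
that terminated with an exception.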

It does seem that some threads keep lingering in Mongrel even after all
requests have died. But this seems to be the case whether the abort flag
is set or not. It looks as if some enter an eternal sleeping state. I
don't think it's much of a problem stability-wise though, as they'll
just get killed off whenever the reaper gets to work.
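
To make the reaper idea concrete, here is a rough sketch of how such a
pass could look. It is an illustration under assumptions, not Mongrel's
implementation: reap_old_workers, SketchTimeout, the :started_on key
and the max_age parameter are all hypothetical names.

```ruby
# Hypothetical stand-in for the timeout error raised into old workers.
class SketchTimeout < Exception; end

# Raises SketchTimeout in every live thread of `group` whose
# :started_on timestamp is more than max_age seconds before `now`.
# Returns the number of threads it raised into.
def reap_old_workers(group, max_age, now = Time.now)
  reaped = 0
  group.list.each do |w| # ThreadGroup#list only returns live threads
    started = w[:started_on]
    next unless started # skip workers that carry no timestamp
    next unless now - started > max_age
    w.raise(SketchTimeout, "timed out this thread: #{w.inspect}")
    reaped += 1
  end
  reaped
end
```

A sleeping worker is woken up by the raise, so a pass like this would
also clear out threads stuck in an eternal sleep, provided they carry
a timestamp.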

> This might not be needed as the ruby-core guys finally started taking
> a serious look at how array works and we can probably switch back to
> Mutex in the near future.

I have no idea how Mutex handles locking, but sync seems to be doing a
whole lot of work over and over again. It seems to me it's really meant
for smaller locking queues.

> Thanks again Jacob, if you can answer the questions I had so I can
> work on a fix that doesn't involve updating ruby.  Also an explanation
> as to why you were having these problems will help people decide if
> they should apply the patches too.

I'm still not sure why I'm the only one seeing these problems. Maybe
others are seeing them too and just aren't aware of them. Maybe they
only show up when the Mongrels get severely loaded. Maybe I'm simply
the only one butchering poor Mongrels for fun in my spare time? ;-)

As for the sync issues, the jury is still out on that one. I will
need to get a deeper understanding of how sync works to get to the
bottom of it, but I'll continue poking around.

- Jacob Atzen
