[Mongrel] Mongrel woes fixed

Zed A. Shaw zedshaw at zedshaw.com
Sun Oct 1 19:39:07 EDT 2006


On Sun, 1 Oct 2006 21:38:12 +0200
Jacob Atzen <jacob at jacobatzen.dk> wrote:

> Hello all,
> 
> For the past couple of weeks I have been spending some time debugging a
> couple of issues I was having with Mongrel when I put load on it. I have
> seen two distinct issues:
> 
> 1. Mongrel stopped responding as if in an endless loop.
> 2. Mongrel crashed when severely loaded.
> 

Cool, glad you took the time to figure this out.

> I believe I have resolved these two issues and have attached patches
> which show the resolution (simple as it is). An explanation of the
> patches is given below.
> 
> The first problem is handled by the patch to sync.rb from the standard
> library. What is happening here is that when sync_unlock is called
> Thread.critical is set to true. Now if the thread is not the
> sync_ex_locker an exception is thrown without Thread.critical being set
> to false. This in turn resulted in a situation where the
> mongrel_sleeper_thread (configurator.rb:270) was the only thread getting
> back on the cpu and Thread.critical stayed true. The patch simply
> ensures that Thread.critical is set to false upon leaving sync.rb.
> 

Ok, is there a way to fix this without having people backpatch their Ruby?  Also, why were you the only one having this problem?  I'd like to know how the error is triggered, if you could explain it.
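
The failure mode Jacob describes (a flag set on entry and never cleared when an exception is raised) can be sketched roughly as follows. This is not the actual sync.rb code or the attached patch. Thread.critical was a Ruby 1.8 feature and is not available in current Ruby, so the FakeCritical module, NotOwnerError, and both unlock methods below are hypothetical stand-ins that only illustrate the control flow and the ensure-based fix.

```ruby
# Hypothetical stand-in for the Ruby 1.8 Thread.critical flag.
# Nothing here is taken from sync.rb; it only mirrors the shape of the bug.
module FakeCritical
  class << self
    attr_accessor :critical
  end
  self.critical = false
end

# Hypothetical error raised when a non-owner tries to unlock.
class NotOwnerError < StandardError; end

# Leaky shape: the flag is set, then the raise skips the reset.
def unlock_leaky(owner, caller_id)
  FakeCritical.critical = true
  raise NotOwnerError, "#{caller_id} does not hold the lock" unless owner == caller_id
  FakeCritical.critical = false
  :unlocked
end

# Patched shape: ensure clears the flag on every exit path,
# including the one where the exception is raised.
def unlock_safe(owner, caller_id)
  FakeCritical.critical = true
  raise NotOwnerError, "#{caller_id} does not hold the lock" unless owner == caller_id
  :unlocked
ensure
  FakeCritical.critical = false
end
```

With the leaky version, a failed unlock leaves the flag set for whoever runs next. With the ensure version, the flag is always cleared before control leaves the method.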

> I am not sure if this is really the correct way to handle this issue
> though. As some famous programmers have been known to say "select()
> ain't broken" so I'm not really sure what to think of this.
>

An interesting side note: when you mix threads and select in Ruby, the interpreter can randomly decide it's in deadlock without much explanation.  Damned if you do, damned if you don't.
 
> The second problem stems from the fact that Mongrel uses the
> Thread#abort_on_exception. I'm not sure why this is even in there, as
> the documentation says:
> 
>         When set to true, causes all threads (including the main
>         program) to abort if an exception is raised in thr. The process
>         will effectively exit(0).
> 
> The patch simply removes the abort_on_exception from mongrel.rb. After
> applying this patch I have been unable to make Mongrel crash.
> 

The abort was put in there to catch exceptions that are "leaking" through and not being caught.  I'm curious what exception was being thrown that *none* of the damn begin/rescue blocks catches.  We seriously need a begin/rescue-every-damn-thing-no-matter-what construct that actually works.

Now, I believe the problem you'll have is that when this exception leaks through, that thread will die, and eventually you'll fill up with dead threads and Mongrel will die anyway.  If you can, try to find out what exception causes this so I can have the server kill off its threads properly by handling yet another annoying random exception.
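
One way an exception can slip past begin/rescue blocks is that a bare rescue only catches StandardError and its subclasses. Whether that is what happened in Jacob's case is not known from this thread. The sketch below shows a thread wrapper that rescues Exception and records what it caught, so the offending exception class can be identified. The guarded_thread helper and the errors queue are hypothetical and are not part of Mongrel.

```ruby
require 'thread'

# Hypothetical helper, not part of Mongrel: runs the block in a thread and
# records anything that escapes it instead of letting the thread die silently.
def guarded_thread(errors)
  Thread.new do
    begin
      yield
    rescue Exception => e # a bare `rescue` would only catch StandardError
      errors << e
    end
  end
end

errors = Queue.new

# NotImplementedError descends from ScriptError, not StandardError,
# so a bare rescue inside the block would not have caught it.
worker = guarded_thread(errors) { raise NotImplementedError, "leaked" }
worker.join

until errors.empty?
  e = errors.pop
  STDERR.puts "thread died with #{e.class}: #{e.message}"
end
```

Logging the class of whatever lands in the queue is enough to tell which exception needs explicit handling. Rescuing Exception this broadly is meant for diagnosis and would normally be narrowed once the culprit is known.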

> Finally I have provided a debug patch for the Sync library which simply
> adds a lot of debug output to STDERR. I believe it might be of use in
> future performance optimizations as there seems to be a lot of work
> happening in managing the queued up clients.

This might not be needed, as the ruby-core guys have finally started taking a serious look at how Array works, and we can probably switch back to Mutex in the near future.

Thanks again, Jacob.  If you can answer the questions I had, I can work on a fix that doesn't involve updating Ruby.  Also, an explanation of why you were having these problems will help people decide whether they should apply the patches too.

-- 
Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 -- Come get help.

