[Mongrel] random cpu spikes, EBADF errors

Steve Midgley public at misuse.org
Mon Oct 29 22:34:40 EDT 2007


Hi Evan,

You are doing a really great job supporting so many people on this list 
- thank you! I'm learning a lot just listening.

I've been thinking that what you're asking below (and what Robert 
Mela has been pushing everyone on) essentially points out that there 
might need to be at least two approaches to Mongrel queuing:

1) Mongrel queues all requests (current model)
2) Load balancer / webserver (or even IP stack) queues most requests

I think Mongrel right now is designed solely for the case where Mongrel 
is supposed to queue all requests? Robert Mela seems to want an 
environment where Mongrel queues only some or no requests b/c he seems 
to have a way to get Apache + mod_proxy_magic_bullet to queue and 
re-try failed requests from mongrels.

I wonder if it makes sense to create a mode for Mongrel where it queues 
only a few (or no) requests and the load balancer or webserver (or even 
IP stack) is designed to queue the majority of backlogged requests? 
Could this be a user-configurable setting (:queue_length => 2.requests 
or whatever)?

In the event that the queue length grows bigger than this limit, 
Mongrel responds with a 503. If the calling agent understands 503, it 
would be able to try other mongrels in the cluster until it finds one 
that is free. If they are all busy, it would just keep knocking on 
doors until one frees up.
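
To make the "knocking on doors" behaviour concrete, here is a rough Ruby sketch of the calling agent's side. Everything in it is hypothetical: RetryingBalancer, dispatch and max_rounds are made-up names, not Mongrel or mod_proxy code, and each backend is just a lambda that returns a status code. The sketch also stops after a fixed number of passes rather than knocking forever, which is an assumption on top of the idea above.

```ruby
# Hypothetical sketch: none of these names exist in Mongrel or Apache.
# A backend is any object responding to #call(request) that returns an
# HTTP status code (Integer); 503 stands for "my queue is full".
class RetryingBalancer
  def initialize(backends, max_rounds: 3)
    @backends = backends
    @max_rounds = max_rounds # assumption: give up after this many passes
  end

  # Try each backend in turn and move on whenever one answers 503.
  # Returns [status, index_of_backend_that_answered], or [503, nil] if
  # every backend stayed busy for max_rounds full passes.
  def dispatch(request)
    @max_rounds.times do
      @backends.each_with_index do |backend, i|
        status = backend.call(request)
        return [status, i] unless status == 503
      end
    end
    [503, nil]
  end
end

busy = ->(_request) { 503 }
free = ->(_request) { 200 }
balancer = RetryingBalancer.new([busy, busy, free])
p balancer.dispatch("GET /") # prints [200, 2]
```

A real balancer would presumably wait a little between passes; the sketch leaves that out to keep the control flow visible.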

This approach could make things worse in extreme load environments b/c 
now you have backed up mongrels and a pile of re-requests hammering on 
the door to all the mongrels as well. But that's a worst case scenario 
(e.g. slashdotting) that is going to break SOMETHING SOMEWHERE anyway. 
So why not have it melt-down at the interface between the webservers 
and the mongrel cluster instead of inside the mongrels (what's the 
difference)?

The benefit of this alternate mode of operation would be that free 
mongrels get called more often and overloaded mongrels get skipped more 
often, which creates a much smoother user experience on the front-end, 
generally speaking (this approach improves performance of moderately 
loaded websites at the expense of punishing heavily loaded ones - who 
should probably add more mongrels/hardware anyway).

The only changes to Mongrel code would be to allow a configurable queue 
length on a per-mongrel basis (maybe already in there?) and a setting 
to cause Mongrels to accept and return 503 instead of accepting and 
closing the connection? Defaults would remain the same as they are now.
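
As a rough illustration of the mongrel side, the per-mongrel limit could look something like the Ruby sketch below. QueueLimiter, handle and the queue_length argument are hypothetical names used only for illustration; this is not how Mongrel is actually structured internally, and the limit here counts requests being processed as well as those waiting.

```ruby
# Hypothetical sketch: QueueLimiter is not part of Mongrel.
# It admits at most queue_length requests at a time (waiting or being
# processed) and answers 503 straight away to anything beyond that.
class QueueLimiter
  def initialize(queue_length)
    @queue_length = queue_length
    @in_flight = 0
    @lock = Mutex.new
  end

  # Runs the block and returns [200, block_result] when there is room,
  # or [503, nil] without running the block when the limit is reached.
  def handle
    admitted = @lock.synchronize do
      if @in_flight < @queue_length
        @in_flight += 1
        true
      else
        false
      end
    end
    return [503, nil] unless admitted

    begin
      [200, yield]
    ensure
      # Free the slot even if the block raised.
      @lock.synchronize { @in_flight -= 1 }
    end
  end
end

limiter = QueueLimiter.new(2)
p limiter.handle { "hello" } # prints [200, "hello"]
```

Keeping the current behaviour as the default would then just mean leaving this check switched off unless a limit is configured.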

Would such a dual mode of operation for mongrels make sense for some 
users or am I just completely barking up the wrong tree here? Apologies 
if this is a distraction from the real issue you are discussing.

Best,

Steve

At 06:27 PM 10/29/2007, you wrote:
>Date: Mon, 29 Oct 2007 20:02:32 -0400
>From: "Evan Weaver" <evan at cloudbur.st>
>Subject: Re: [Mongrel] random cpu spikes, EBADF errors
>To: mongrel-users at rubyforge.org
>Message-ID:
>         <b6f68fc60710291702x604374c4xaa27af4920dd2de7 at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>It's a Mongrel-configured limit to avoid queuing an impossibly long
>number of requests in an overloaded situation. So we can return
>whatever we want.
>
>I think the issue might be, if you can only handle 500 requests p/s,
>and you are getting 600, if Mongrel closes the connection, at least
>those 500 will get served, but if Mongrel returns 503, the web server
>will say "hey, error" and try on the next mongrel, which won't help
>clear the request queue. The requests will still queue, just at a
>higher level, and no one will end up getting a request served in a sane
>amount of time.
>
>Evan

>On Oct 29, 2007 7:55 PM, Will Green <will at hotgazpacho.com> wrote:
> > Evan, I hear you! I know you have the best interests of Mongrel in mind.
> >
> > X-SendFile is just a header, right?  If so, yeah, it could be moved to core.
> >
> > If we're talking the Ruby Sendfile, then I think that should NOT be in
> > core. I recall many people having issues (i.e. it doesn't work) with that.
> >
> > Regarding the closing of the socket without notice, is that something
> > that Ruby does, or is it that a resource limit was reached, and this
> > handle was chosen by the OS to be closed? If the former, an HTTP 503
> > response is appropriate. If the latter, seems to me that another Mongrel
> > should be employed in a cluster configuration, or the app code examined
> > to see if it might be the source of the problem.
> >


