[Mongrel] random cpu spikes, EBADF errors
public at misuse.org
Mon Oct 29 22:34:40 EDT 2007
You are doing a really great job supporting so many people on this list
- thank you! I'm learning a lot just listening.
I've been thinking that what you're asking below (and what Robert
Mela has been pushing everyone on) essentially points to a need for
at least two modes of running Mongrel:
1) Mongrel queues all requests (current model)
2) Load balancer / webserver (or even IP stack) queues most requests
I think Mongrel right now is designed solely for the case where Mongrel
is supposed to queue all requests? Robert Mela seems to want an
environment where Mongrel queues only some or no requests b/c he seems
to have a way to get Apache + mod_proxy_magic_bullet to queue and
re-try failed requests from mongrels.
I wonder if it makes sense to create a mode for Mongrel where it queues
only a few (or no) requests and the load balancer or webserver (or even
ip stack) is designed to queue the majority of backlogged requests?
Could this be a user-configurable setting (:queue_length => 2.requests)?
In the event that the queue length grows bigger than this limit,
Mongrel responds with a 503. If the calling agent understands 503, it
would be able to try other mongrels in the cluster until it finds one that is
free. If they are all busy it would just keep knocking on doors until
one frees up.
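As a minimal sketch of the retry behaviour described above, the following models it in plain Ruby with no networking. FakeBackend, dispatch and max_rounds are hypothetical names made up for illustration; none of them come from Mongrel or from any load balancer. The model also assumes queues never drain, so it only shows how a caller skips a full mongrel and eventually gives up.

```ruby
# Hypothetical stand-in for one mongrel: it accepts requests until its
# queue is full, then answers 503. This is NOT Mongrel's real API.
class FakeBackend
  attr_reader :name, :queued

  def initialize(name, queue_length)
    @name = name
    @queue_length = queue_length
    @queued = 0
  end

  # Returns an HTTP-style status code for the request.
  def call(_request)
    return 503 if @queued >= @queue_length
    @queued += 1
    200
  end
end

# Knock on each backend in turn. Give up after max_rounds passes over
# the whole cluster so the caller cannot retry forever.
def dispatch(backends, request, max_rounds = 3)
  max_rounds.times do
    backends.each do |backend|
      return backend.name if backend.call(request) == 200
    end
  end
  nil # every backend stayed busy
end

backends = [FakeBackend.new("mongrel-8000", 2),
            FakeBackend.new("mongrel-8001", 2)]
results = 5.times.map { |i| dispatch(backends, "req-#{i}") }
```

With two backends that each queue two requests, the first four requests are placed and the fifth comes back nil, which is the point where a real front end would have to queue the request itself or report an error.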
This approach could make things worse in extreme load environments b/c
now you have backed up mongrels and a pile of re-requests hammering on
the door to all the mongrels as well. But that's a worst case scenario
(e.g. slashdotting) that is going to break SOMETHING SOMEWHERE anyway.
So why not have it melt-down at the interface between the webservers
and the mongrel cluster instead of inside the mongrels (what's the
The benefit of this alternate mode of operation would be that free
mongrels get called more often and overloaded mongrels get skipped more
often, which creates a much smoother user experience on the front-end,
generally speaking (this approach improves performance of moderately
loaded websites at the expense of punishing heavily loaded ones - who
should probably add more mongrels/hardware anyway).
The only changes to Mongrel code would be to allow a configurable queue
length on a per-mongrel basis (maybe already in there?) and a setting
to cause Mongrels to accept and return 503 instead of accepting and
closing the connection? Defaults would remain the same as they are now.
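The two settings proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: AdmissionPolicy, its on_overflow argument and the :respond_503 / :close symbols are hypothetical and are not existing Mongrel options, and the sketch does not model Mongrel's actual defaults beyond closing the connection on overflow.

```ruby
# Hypothetical per-mongrel policy; the names are invented for this sketch
# and are not part of Mongrel.
class AdmissionPolicy
  def initialize(queue_length, on_overflow = :close)
    @queue_length = queue_length
    @on_overflow = on_overflow
  end

  # waiting: number of requests already queued on this mongrel.
  # Returns :accept, :close or :respond_503.
  def decide(waiting)
    return :accept if waiting < @queue_length
    @on_overflow == :respond_503 ? :respond_503 : :close
  end
end

# Minimal raw response a server could write before closing the socket.
SERVICE_UNAVAILABLE = "HTTP/1.1 503 Service Unavailable\r\n" \
                      "Connection: close\r\n" \
                      "Content-Length: 0\r\n\r\n"

closing_policy = AdmissionPolicy.new(2)               # overflow closes
polite_policy  = AdmissionPolicy.new(2, :respond_503) # overflow sends 503
```

Keeping :close as the fallback means a deployment that never sets the new option would see no change in behaviour, while a front end that understands 503 could opt in per mongrel.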
Would such a dual mode of operation for mongrels make sense for some
users or am I just completely barking up the wrong tree here? Apologies
if this is a distraction from the real issue you are discussing.
At 06:27 PM 10/29/2007, you wrote:
>Date: Mon, 29 Oct 2007 20:02:32 -0400
>From: "Evan Weaver" <evan at cloudbur.st>
>Subject: Re: [Mongrel] random cpu spikes, EBADF errors
>To: mongrel-users at rubyforge.org
> <b6f68fc60710291702x604374c4xaa27af4920dd2de7 at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>It's a Mongrel-configured limit to avoid queuing an impossibly long
>number of requests in an overloaded situation. So we can return
>whatever we want.
>I think the issue might be, if you can only handle 500 requests p/s,
>and you are getting 600, if Mongrel closes the connection, at least
>those 500 will get served, but if Mongrel returns 503, the web server
>will say "hey, error" and try on the next mongrel, which won't help
>clear the request queue. The requests will still queue, just at a
>higher level, and no one will end up getting a request served in a sane
>amount of time.
>On Oct 29, 2007 7:55 PM, Will Green <will at hotgazpacho.com> wrote:
> > Evan, I hear you! I know you have the best interests of Mongrel in
> > X-SendFile is just a header, right? If so, yeah, it could be moved
> > to core.
> > If we're talking the Ruby Sendfile, then I think that should NOT be
> > in core. I recall many people having issues (i.e. it doesn't work)
> > with that.
> > Regarding the closing of the socket without notice, is that
> > something that Ruby does, or is it that a resource limit was
> > reached, and this handle was chosen by the OS to be closed? If the
> > former, an HTTP 503 response is appropriate. If the latter, seems to
> > me that another Mongrel should be employed in a cluster
> > configuration, or the app code examined to see if it might be the
> > source of the problem.