[Mongrel] Design flaw? - num_processors, accept/close

Brian Williams bwillenator at gmail.com
Mon Oct 15 19:43:34 EDT 2007


We recently ran into exactly this issue.  Some Rails requests were making
external requests that were taking 5 minutes (networking issues out of our
control).  If your request got queued behind one of these stuck mongrels,
the experience was terrible.  I experimented with adjusting the
mod_proxy_balancer settings to try to get it to fail over to the next mongrel
(I had hoped that min, max, and smax could all be set to one, forcing only
one connection to a mongrel at a time), but this didn't seem to work.
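
For what it's worth, the balancer config I was experimenting with looked
roughly like this (ports made up, and I'm not sure the min/max/smax
connection-pool settings even behave the way I'd hoped -- my understanding
is they are tracked per Apache child process, not globally):

   <Proxy balancer://mongrels>
     BalancerMember http://127.0.0.1:8000 min=1 max=1 smax=1
     BalancerMember http://127.0.0.1:8001 min=1 max=1 smax=1
   </Proxy>
   ProxyPass / balancer://mongrels/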

Solution - I stuck lighttpd in between.  Lighttpd has a proxying algorithm
that does exactly this - round robin, but to the worker with the lightest load.
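
The relevant bit of the lighttpd config is roughly this (hosts/ports made up;
I believe "fair" is the load-based balancing mode in lighttpd's mod_proxy,
but double-check the docs):

   server.modules += ( "mod_proxy" )
   proxy.balance  = "fair"
   proxy.server   = ( "" =>
     ( ( "host" => "127.0.0.1", "port" => 8000 ),
       ( "host" => "127.0.0.1", "port" => 8001 ) ) )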

I'd love to hear that there's a way to do this with mod_proxy_balancer, but I
couldn't get it to work.

--Brian

On 10/15/07, Evan Weaver <evan at cloudbur.st> wrote:
>
> Ah, no, they are only marked non-operational until the retry timeout has
> elapsed. So I guess if you had extremely small timeouts in both Apache and
> Mongrel it would work OK.
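>
> Something like this is roughly what I mean, though it's untested and the
> parameter names are from memory (check the mod_proxy docs):
>
>    BalancerMember http://127.0.0.1:8000 retry=2 timeout=2
>
> where retry is how long a failed member stays out of rotation before being
> tried again, and timeout is how long Apache waits on the backend.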
>
> Someone else respond, because clearly I don't know what I'm talking about.
> :)
>
> Evan
>
> On 10/15/07, Evan Weaver <evan at cloudbur.st> wrote:
> > Oh, I misunderstood your code.
> >
> > I don't think mod_proxy_balancer gracefully moves on so perhaps you
> > are right. On the other hand, I thought when a worker timed out it got
> > removed from the pool permanently. I can't seem to verify that one way
> > or the other in the Apache docs, though.
> >
> > Evan
> >
> > On 10/15/07, Robert Mela <rob at robmela.com> wrote:
> > > But it is precisely because of mod_proxy_balancer's round-robin
> > > algorithm that I think the fix *would* work.  If we give
> > > mod_proxy_balancer the option of timing out on connect, it will
> > > iterate to the next mongrel instance in the pool.
> > >
> > > Of course, I should look at Evented Mongrel and Swiftiply.
> > >
> > > But still, my original question remains.  I think that Mongrel would
> > > play much more nicely with mod_proxy_balancer out-of-the-box if it
> > > refused to call accept() until worker_list.length has been reduced
> > > below num_processors.  I personally prefer that to request queuing,
> > > and certainly to "accept then drop without warning".
> > >
> > > The wildcard, of course, is what mod_proxy_balancer does in the drop
> > > without warning case -- if it gracefully moves on to the next Mongrel
> > > server in its balancer pool, then all is well, and I'm making a fuss
> > > about nothing.
> > >
> > > Here's an armchair scenario to better illustrate why I think a fix would
> > > work.  Again, I need to test to ensure that mod_proxy_balancer doesn't
> > > already handle the situation gracefully --
> > >
> > > Consider:
> > >
> > > - A pool of 10 mongrels behind mod_proxy_balancer.
> > > - One mongrel, say #5, gets a request that takes one minute to run
> > >   (e.g., a complex report).
> > > - The system as a whole gets 10 requests per second.
> > >
> > > What happens (I think) with the current code and mod_proxy_balancer:
> > >
> > >  - Mongrel instance #5 will continue receiving a new request every second.
> > >  - Over the one-minute period, 10% of requests will either be
> > >      - queued and unnecessarily delayed (num_processors > 60), or
> > >      - picked up and dropped without warning (num_processors == 1).
> > >    (With round-robin, #5 sees 1 of the 10 requests arriving each second,
> > >    so 60 of the 600 requests in that minute, i.e. 10%.)
> > >
> > > What should happen if Mongrel does not invoke accept() when all workers
> > > are busy:
> > >
> > >  - Mongrel instance #5 will continue getting new *connection requests*
> > >    every second
> > >  - mod_proxy_balancer's connect() will time out
> > >  - mod_proxy_balancer will continue cycling through the pool until it
> > >    finds an available Mongrel instance
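> > >
> > > (For this to kick in promptly, the proxy side would presumably also need
> > > a short connect timeout on each member -- untested, and I'd want to
> > > confirm connectiontimeout is the right parameter name, but something like:
> > >
> > >    BalancerMember http://127.0.0.1:8005 connectiontimeout=1 retry=1 )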
> > >
> > >
> > > Again, if all is well under the current scenario -- if Apache
> > > mod_proxy_balancer gracefully moves on to another Mongrel instance after
> > > the accept/drop -- then I've just made a big fuss over a really dumb
> > > question...
> > >
> > >
> > > Evan Weaver wrote:
> > > > Mod_proxy_balancer is just a weighted round-robin, and doesn't
> > > > consider actual worker load, so I don't think this will help you.
> > > > Have you looked at Evented Mongrel?
> > > >
> > > > Evan
> > > >
> > > > On 10/15/07, Robert Mela <rob at robmela.com> wrote:
> > > >
> > > >> Rails instances themselves are almost always single-threaded, whereas
> > > >> Mongrel, and its acceptor, are multithreaded.
> > > >>
> > > >> In a situation with long-running Rails pages this presents a problem
> > > >> for mod_proxy_balancer.
> > > >>
> > > >> If num_processors is greater than 1 (default: 950), then Mongrel will
> > > >> gladly accept incoming requests and queue them if its Rails instance is
> > > >> currently busy.  So even though there are non-busy Mongrel instances,
> > > >> a busy one can accept a new request and queue it behind a long-running
> > > >> request.
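> > > >>
> > > >> (For reference: if I remember the flags right, num_processors is what
> > > >> mongrel_rails exposes as -n / --num-procs, so a cluster member can be
> > > >> capped with something like
> > > >>
> > > >>    mongrel_rails start -e production -p 8000 -n 1
> > > >>
> > > >> though as the next paragraph notes, 1 brings its own problems.)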
> > > >>
> > > >> I tried setting num_processors to 1.  But it looks like this is less
> > > >> than ideal -- I need to dig into mod_proxy_balancer to be sure.  But at
> > > >> first glance, it appears this replaces the queuing problem with a proxy
> > > >> error.  That's because Mongrel still accepts the incoming request --
> > > >> only to close the new socket immediately if Rails is busy.
> > > >>
> > > >> Once again, I do need to set up a test and see exactly how
> > > >> mod_proxy_balancer handles this... but...
> > > >>
> > > >> If I understand the problem correctly, then one solution might be
> > > >> moving lines 721 through 734 into a loop, possibly in its own method,
> > > >> which does something like this:
> > > >>
> > > >> def myaccept
> > > >>   while true
> > > >>     # Check first to see if we can handle the request; let the client
> > > >>     # worry about connect timeouts.
> > > >>     return @socket.accept if @workers.list.length < @num_processors
> > > >>
> > > >>     # Otherwise wait until reaping brings the worker count back down.
> > > >>     while reap_dead_workers("max processors") >= @num_processors
> > > >>       sleep @loop_throttle   # new tunable, e.g. 0.1 seconds
> > > >>     end
> > > >>   end
> > > >> end
> > > >>
> > > >>
> > > >>
> > > >>     720       @acceptor = Thread.new do
> > > >>     721         while true
> > > >>     722           begin
> > > >>     723             client = @socket.accept
> > > >>     724
> > > >>     725             if $tcp_cork_opts
> > > >>     726               client.setsockopt(*$tcp_cork_opts) rescue nil
> > > >>     727             end
> > > >>     728
> > > >>     729             worker_list = @workers.list
> > > >>     730
> > > >>     731             if worker_list.length >= @num_processors
> > > >>     732               STDERR.puts "Server overloaded with #{worker_list.length} processors (#@num_processors max). Dropping connection."
> > > >>     733               client.close rescue Object
> > > >>     734               reap_dead_workers("max processors")
> > > >>     735             else
> > > >>     736               thread = Thread.new(client) {|c| process_client(c) }
> > > >>     737               thread[:started_on] = Time.now
> > > >>     738               @workers.add(thread)
> > > >>     739
> > > >>     740               sleep @timeout/100 if @timeout > 0
> > > >>     741             end
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> > --
> > Evan Weaver
> > Cloudburst, LLC
> >
>
>
> --
> Evan Weaver
> Cloudburst, LLC
> _______________________________________________
> Mongrel-users mailing list
> Mongrel-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-users
>

