[Mongrel] scaling unicorn

Eric Wong normalperson at yhbt.net
Mon Jun 21 20:16:32 EDT 2010

snacktime <snacktime at gmail.com> wrote:
> Interested in some feedback on this (does it sound right?), or maybe
> this might be of interest to others.

Hi Chris,

I think you meant to post this to the mongrel-unicorn at rubyforge.org
list, not mongrel-users at rubyforge.org :>

> We are launching a new Facebook app in a couple of weeks and we did some
> load testing over the weekend on our unicorn web cluster.  The servers
> are 8 way Xeons with 24GB RAM.  Our app ended up being primarily CPU
> bound.  So far the sweet spot for the number of unicorns seems to be
> around 40.  This seemed to yield the most requests per second without
> overloading the server or hitting memory bandwidth issues.  The
> backlog is at the somaxconn default of 128, I'm still not sure if we
> will bump that up or not.

The default backlog we try to specify is actually 1024 (same as
Mongrel).  But it's always a murky value anyways, as it's
kernel/sysctl-dependent.  With Unix domain sockets, some folks use
crazy values like 2048 to look better on synthetic benchmarks :)
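
For reference, a minimal sketch of where the backlog gets set in a
Unicorn config file.  The socket path and worker count are placeholders,
not taken from your setup.  Keep in mind the kernel may silently cap
whatever is requested (on Linux, at net.core.somaxconn), so raising it
here alone may change nothing.

```ruby
# unicorn.conf.rb (sketch; path and numbers are placeholders)
worker_processes 40

# :backlog is handed to listen(2); Unicorn asks for 1024 by default.
# The kernel may cap it (on Linux, at net.core.somaxconn).
listen "/tmp/.unicorn.sock", :backlog => 1024
```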

> Increasing the number of unicorns beyond a
> certain point resulted in a noticeable drop in the requests per second
> the server could handle.   I'm pretty sure the cause is the box
> running out of memory bandwidth.  The load average and resource usage
> in general (except for memory) would keep going down but so did the
> requests per second.  At 80 unicorns the requests per second dropped
> by more than half.  I'm going to disable hyperthreading and rerun some
> of the tests to see what impact that has.

That's "8 way xeon" _before_ hyperthreading, right?  Which family of
Xeons are you using, the Pentium4-based crap or the awesome new ones?

How much memory is each Unicorn worker using for your app?

40 workers for 8 physical cores sounds reasonable.  Depending on the
app, I think the reasonable range is anywhere from 2-8 workers per
physical core.  More if you're (unfortunately) limited by external
network calls, but since you claim to be CPU bound, less.
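
To put numbers on that rule of thumb, here is a quick sketch in Ruby.
worker_range is a made-up helper name, and the 2-8 multipliers are just
the rough range above, not a measured limit.

```ruby
# Rough worker-count range from the "2-8 workers per physical core"
# rule of thumb.  worker_range is a hypothetical helper, not part of
# Unicorn.
def worker_range(physical_cores, per_core_min = 2, per_core_max = 8)
  raise ArgumentError, "need at least one core" if physical_cores < 1
  (physical_cores * per_core_min)..(physical_cores * per_core_max)
end

range = worker_range(8)
puts "8 cores: #{range.first}-#{range.last} workers"
puts "40 workers in range? #{range.cover?(40)}"
```

For a CPU-bound app, the lower end of whatever this prints is probably
the place to start testing.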

Do you have actual performance numbers you're able to share?
Mean/median request times/rates would be very useful.  If your requests
run very quickly, you may be limited by contention with the accept()
syscall on the listen socket, too.

I assume you're using nginx as the proxy; is this with Unix domain
sockets or TCP sockets?  Unix domain sockets should give a small
performance boost over TCP if it's all on the same box.

With TCP, you should also check to see you have enough local ports
available if you're hitting extremely high (and probably unrealistic :)
request rates.
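
One crude way to check on Linux, sketched in Ruby.  The helper names are
made up, and the 60 second TIME_WAIT figure and the "one port per
connection until TIME_WAIT expires" model are assumptions, so treat the
result as a ballpark at best.

```ruby
# Count usable local (ephemeral) ports from the two numbers Linux keeps
# in /proc/sys/net/ipv4/ip_local_port_range.  Hypothetical helper names.
def local_port_count(range_text)
  low, high = range_text.split.map { |n| Integer(n) }
  raise ArgumentError, "expected two port numbers" if low.nil? || high.nil?
  high - low + 1
end

# Very rough ceiling on new connections per second to one ip:port,
# assuming each closed connection ties up its local port in TIME_WAIT
# for time_wait_secs (60 is an assumption, not a measured value).
def rough_connect_rate(port_count, time_wait_secs = 60)
  port_count / time_wait_secs
end

path = "/proc/sys/net/ipv4/ip_local_port_range"
if File.readable?(path)
  ports = local_port_count(File.read(path))
  puts "#{ports} local ports, roughly #{rough_connect_rate(ports)} new connections/sec"
end
```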

Eric Wong
