murdering high-memory workers and auto-scaling

Eric Wong normalperson at
Fri Mar 2 01:07:52 UTC 2012

Ben Somers <somers.ben at> wrote:
> Two ideas, one more controversial than the other.

Neither is really controversial.

> First: auto-killing bloated workers. My current app has some memory
> leakage that wasn't really visible on our older passenger setup, since
> the auto-scaling meant that bloated workers got killed periodically.
> In a perfect world, we'd find and patch all of the leaks, but in the
> meantime (and as a safety net) I'd like to get the bloated workers
> auto-killed. It looks like it'd be simple to add in a bloated-worker
> check at the same point when we check for timeout violations, and it
> could be hidden behind a config setting. Alternately, I could write
> this in a separate script.
> Pros: might be a useful built-in feature, looks easy to implement the killing
> Cons: Getting the memory usage might actually be surprisingly
> difficult. Comparing to passenger's memory management code, where they
> actually use platform-specific system calls, and we might get a
> sizeable quantity of code that we don't want dirtying up the unicorn
> internals. Also, some methods of checking appear to have performance
> risks.

You can try something like the following middleware (totally untested,
but I've done similar things here and there).  I don't know about
non-Linux, but I suspect /proc/#$$/* is likely to have something
similar there:

class MemCheckLinux < Struct.new(:app)
  def call(env)
    # a faster, but less-readable version may use /proc/#$$/stat or
    # /proc/#$$/statm but those aren't as human-friendly as
    # /proc/#$$/status
    if /VmRSS:\s+(\d+)\s/ =~ File.read("/proc/#$$/status")
      # gracefully kill ourselves if we exceed ~100M (VmRSS is in kB)
      Process.kill(:QUIT, $$) if $1.to_i > 100_000
    end
    # pass the request on to the wrapped app; QUIT lets it finish first
    app.call(env)
  end
end

use MemCheckLinux
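
For the /proc/#$$/statm variant mentioned in the comment above, a rough
sketch might look like the following. The helper name rss_kb_from_statm
is made up for illustration. It assumes the Linux statm layout, where the
second field is the resident set size in pages, and takes the page size
from Etc.sysconf unless one is passed in.

```ruby
require "etc"

# Hypothetical helper: turn the contents of /proc/<pid>/statm into RSS in kB.
# Assumes the Linux layout, where field 2 is the resident set size in pages.
def rss_kb_from_statm(statm, page_size = Etc.sysconf(Etc::SC_PAGESIZE))
  resident_pages = statm.split[1].to_i
  resident_pages * page_size / 1024
end

# On Linux, inside the middleware:
#   rss_kb = rss_kb_from_statm(File.read("/proc/#$$/statm"))
#   Process.kill(:QUIT, $$) if rss_kb > 100_000
```

This avoids the regexp match, at the cost of a less self-explanatory
file format.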

Sadly, setrlimit(:RLIMIT_AS) only causes SIGSEGV to get raised,
and Ruby controls that signal for itself.  I sometimes use
setrlimit(:RLIMIT_CPU) + trap(:XCPU) to kill runaway processes.
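
A minimal sketch of the RLIMIT_CPU + XCPU combination, using Ruby's
Process.setrlimit and trap. The helper name limit_cpu_time, the grace
period and the exit status are assumptions for illustration, not
anything unicorn provides.

```ruby
# Hypothetical helper (not part of unicorn): cap this process's CPU time.
def limit_cpu_time(soft_seconds, grace_seconds = 5)
  # SIGXCPU arrives at the soft limit; the kernel sends SIGKILL at the
  # hard limit, so keep some room between the two.
  Process.setrlimit(:CPU, soft_seconds, soft_seconds + grace_seconds)
  # A runaway request never returns, so exit right away instead of
  # waiting for a graceful SIGQUIT shutdown.
  trap(:XCPU) { exit!(1) }
end
```

Something like this could be called from an after_fork hook so each
worker gets its own limit. Note the limit counts CPU time over the whole
life of the process, not per request, so it needs to be generous enough
for a long-lived worker. An unprivileged process also can't raise the
hard limit again once it has been lowered.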

Apache has long had a similar parameter where you could just
tell a worker to gracefully die after X number of requests.  That'd
also be trivial to implement with middleware using SIGQUIT.
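
Roughly along these lines (equally untested in production; the class
name MaxRequests and its limit argument are made up for this sketch).
It relies on each unicorn worker handling one request at a time, so a
plain counter is enough.

```ruby
# Hypothetical Rack middleware: ask the worker to shut down gracefully
# once it has served `limit` requests.
class MaxRequests < Struct.new(:app, :limit)
  def call(env)
    @served = (@served || 0) + 1
    # SIGQUIT lets the worker finish the current request before exiting
    Process.kill(:QUIT, $$) if @served >= limit
    app.call(env)
  end
end

# In config.ru (the number is arbitrary):
#   use MaxRequests, 1000
```

The master notices the worker exiting and starts a fresh one in its
place.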

> Second: in my use case, I have webservers running as VMs, sharing a
> physical box with backend utility servers. The util servers run lots
> of very CPU- and memory-hungry jobs, mostly at night; the webservers
> handle requests, mostly in the daytime. Currently, most of these
> webservers are running passenger, which is very polite about not using
> more resources than it needs to handle requests. Unicorn, by contrast
> (and by design) is very resource-greedy, what with the "scale to what
> you can theoretically handle" strategy. If I spin down my number of
> unicorn workers when they're not needed, I free up resources for my
> util servers, which is important. TTOU and TTIN signals give me a
> (very nice) means to write an auto-scaling module outside of unicorn,
> but it might be nice to have it included as an optional component. (I
> expect this will get voted down, as I expect the dev team is not
> interested in it).

If you can make an effort to support it when it breaks, I wouldn't mind
including a script in the examples/ section or as an optional module.
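
As a starting point, the signalling half of such a script could be as
small as the sketch below. The helper names scale_signals and
scale_unicorn are hypothetical, and working out the current and target
worker counts (the hard part) is left to the caller. It only assumes
what the master already does: TTIN adds one worker and TTOU removes one.

```ruby
# Hypothetical helper: which signals move the master from `current`
# workers to `target` workers? One TTIN or TTOU per worker of difference.
def scale_signals(current, target)
  diff = target - current
  Array.new(diff.abs, diff > 0 ? :TTIN : :TTOU)
end

# Hypothetical helper: send those signals to the unicorn master process.
def scale_unicorn(master_pid, current, target)
  scale_signals(current, target).each do |sig|
    Process.kill(sig, master_pid)
    sleep 0.1 # brief pause between signals
  end
end
```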

It's definitely not going to ever be the default.  Auto-scaling is hard
(if not impossible) to get right.  In my experience, it always gets
things wrong by default or gets configured wrong, making things more
difficult to fix.

Dedicated servers will always be the primary target of unicorn.
(And unicorn of course only scales to server + backend resources,
 nginx handles scaling to client connections :)

> Happy to work on implementing these myself, just wanted to poll to see
> if it'd be worth developing them as part of unicorn proper rather than
> standalone scripts.

If you're willing to help support users of these scripts/modules,
I'd have no reservations about distributing them along with unicorn.
