From somers.ben at gmail.com Thu Mar 1 23:52:07 2012
From: somers.ben at gmail.com (Ben Somers)
Date: Thu, 1 Mar 2012 15:52:07 -0800
Subject: murdering high-memory workers and auto-scaling
Message-ID:

Two ideas, one more controversial than the other.

First: auto-killing bloated workers. My current app has some memory leakage that wasn't really visible on our older passenger setup, since the auto-scaling meant that bloated workers got killed periodically. In a perfect world, we'd find and patch all of the leaks, but in the meantime (and as a safety net) I'd like to get the bloated workers auto-killed. It looks like it'd be simple to add in a bloated-worker check at the same point when we check for timeout violations, and it could be hidden behind a config setting. Alternately, I could write this in a separate script.

Pros: might be a useful built-in feature, looks easy to implement the killing
Cons: Getting the memory usage might actually be surprisingly difficult. Comparing to passenger's memory management code, where they actually use platform-specific system calls, and we might get a sizeable quantity of code that we don't want dirtying up the unicorn internals. Also, some methods of checking appear to have performance risks.

Second: in my use case, I have webservers running as VMs, sharing a physical box with backend utility servers. The util servers run lots of very CPU- and memory-hungry jobs, mostly at night; the webservers handle requests, mostly in the daytime. Currently, most of these webservers are running passenger, which is very polite about not using more resources than it needs to handle requests. Unicorn, by contrast (and by design) is very resource-greedy, what with the "scale to what you can theoretically handle" strategy. If I spin down my number of unicorn workers when they're not needed, I free up resources for my util servers, which is important.
TTOU and TTIN signals give me a (very nice) means to write an auto-scaling module outside of unicorn, but it might be nice to have it included as an optional component. (I expect this will get voted down, as I expect the dev team is not interested in it).

Happy to work on implementing these myself, just wanted to poll to see if it'd be worth developing them as part of unicorn proper rather than standalone scripts.

-ben

From cliftonk at gmail.com Fri Mar 2 00:12:20 2012
From: cliftonk at gmail.com (Clifton King)
Date: Thu, 1 Mar 2012 18:12:20 -0600
Subject: murdering high-memory workers and auto-scaling
In-Reply-To:
References:
Message-ID: <7AADD66C-3140-4D06-8D1D-FE015C649BE5@gmail.com>

We use the following at the bottom of a God config. I believe it's in an example somewhere. Bloated worker gets sent a QUIT and it will finish processing the request and exits gracefully. We never really use TTOU and TTIN since the # of workers used is basically determined by the ram in the machine in question.

```ruby
# unicorn workers
unicorn_worker_memory_limit = 220_000

Thread.new do
  loop do
    begin
      workers = `ps -e -www -o pid,rss,command | grep '[u]nicorn_rails worker'`
      workers.split("\n").each do |line|
        parts = line.split(' ')
        if parts[1].to_i > unicorn_worker_memory_limit
          # tell the worker to die after it finishes serving its request
          ::Process.kill('QUIT', parts[0].to_i)
        end
      end
    rescue Object
      # don't die ever once we've tested this
      nil
    end
    sleep 30
  end
end
```

Clifton

On Mar 1, 2012, at 5:52 PM, Ben Somers wrote:

> Two ideas, one more controversial than the other.
> First: auto-killing bloated workers. My current app has some memory
> leakage that wasn't really visible on our older passenger setup, since
> the auto-scaling meant that bloated workers got killed periodically.
> In a perfect world, we'd find and patch all of the leaks, but in the
> meantime (and as a safety net) I'd like to get the bloated workers
> auto-killed.
> It looks like it'd be simple to add in a bloated-worker
> check at the same point when we check for timeout violations, and it
> could be hidden behind a config setting. Alternately, I could write
> this in a separate script.
>
> Pros: might be a useful built-in feature, looks easy to implement the killing
> Cons: Getting the memory usage might actually be surprisingly
> difficult. Comparing to passenger's memory management code, where they
> actually use platform-specific system calls, and we might get a
> sizeable quantity of code that we don't want dirtying up the unicorn
> internals. Also, some methods of checking appear to have performance
> risks.
>
> Second: in my use case, I have webservers running as VMs, sharing a
> physical box with backend utility servers. The util servers run lots
> of very CPU- and memory-hungry jobs, mostly at night; the webservers
> handle requests, mostly in the daytime. Currently, most of these
> webservers are running passenger, which is very polite about not using
> more resources than it needs to handle requests. Unicorn, by contrast
> (and by design) is very resource-greedy, what with the "scale to what
> you can theoretically handle" strategy. If I spin down my number of
> unicorn workers when they're not needed, I free up resources for my
> util servers, which is important. TTOU and TTIN signals give me a
> (very nice) means to write an auto-scaling module outside of unicorn,
> but it might be nice to have it included as an optional component. (I
> expect this will get voted down, as I expect the dev team is not
> interested in it).
>
> Happy to work on implementing these myself, just wanted to poll to see
> if it'd be worth developing them as part of unicorn proper rather than
> standalone scripts.
>
> -ben
> _______________________________________________
> Unicorn mailing list - mongrel-unicorn at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-unicorn
> Do not quote signatures (like this one) or top post when replying

From normalperson at yhbt.net Fri Mar 2 01:07:52 2012
From: normalperson at yhbt.net (Eric Wong)
Date: Thu, 1 Mar 2012 17:07:52 -0800
Subject: murdering high-memory workers and auto-scaling
In-Reply-To:
References:
Message-ID: <20120302010752.GA12687@dcvr.yhbt.net>

Ben Somers wrote:
> Two ideas, one more controversial than the other.

Neither is really controversial.

> First: auto-killing bloated workers. My current app has some memory
> leakage that wasn't really visible on our older passenger setup, since
> the auto-scaling meant that bloated workers got killed periodically.
> In a perfect world, we'd find and patch all of the leaks, but in the
> meantime (and as a safety net) I'd like to get the bloated workers
> auto-killed. It looks like it'd be simple to add in a bloated-worker
> check at the same point when we check for timeout violations, and it
> could be hidden behind a config setting. Alternately, I could write
> this in a separate script.
>
> Pros: might be a useful built-in feature, looks easy to implement the killing
> Cons: Getting the memory usage might actually be surprisingly
> difficult. Comparing to passenger's memory management code, where they
> actually use platform-specific system calls, and we might get a
> sizeable quantity of code that we don't want dirtying up the unicorn
> internals. Also, some methods of checking appear to have performance
> risks.

You can try something like the following middleware (totally untested, but I've done similar things here and there). I don't know about non-Linux, but I suspect /proc/#$$/* is likely to have something similar...
    class MemCheckLinux < Struct.new(:app)
      def call(env)
        # a faster, but less-readable version may use /proc/#$$/stat or
        # /proc/#$$/statm but those aren't as human-friendly as
        # /proc/#$$/status
        if /VmRSS:\s+(\d+)\s/ =~ File.read("/proc/#$$/status")
          # gracefully kill ourselves if we exceed ~100M
          Process.kill(:QUIT, $$) if $1.to_i > 100_000
        end
        app.call(env)
      end
    end

    use MemCheckLinux
    run Rack::Lobster.new

Sadly, setrlimit(:RLIMIT_AS) only causes SIGSEGV to get raised, and Ruby controls that signal for itself. I sometimes use setrlimit(:RLIMIT_CPU) + trap(:XCPU) to kill runaway processes.

Apache has long had a similar parameter where you could just tell a worker to gracefully die after X number of requests. That'd also be trivial to implement with middleware using SIGQUIT.

> Second: in my use case, I have webservers running as VMs, sharing a
> physical box with backend utility servers. The util servers run lots
> of very CPU- and memory-hungry jobs, mostly at night; the webservers
> handle requests, mostly in the daytime. Currently, most of these
> webservers are running passenger, which is very polite about not using
> more resources than it needs to handle requests. Unicorn, by contrast
> (and by design) is very resource-greedy, what with the "scale to what
> you can theoretically handle" strategy. If I spin down my number of
> unicorn workers when they're not needed, I free up resources for my
> util servers, which is important. TTOU and TTIN signals give me a
> (very nice) means to write an auto-scaling module outside of unicorn,
> but it might be nice to have it included as an optional component. (I
> expect this will get voted down, as I expect the dev team is not
> interested in it).

If you can make an effort to support it when it breaks, I wouldn't mind including a script in the examples/ section or as an optional module. It's definitely not going to ever be the default. Auto-scaling is hard (if not impossible) to get right.
In my experience, it always gets things wrong by default or gets configured wrong, making things more difficult to fix.

Dedicated servers will always be the primary target of unicorn. (And unicorn of course only scales to server + backend resources, nginx handles scaling to client connections :)

> Happy to work on implementing these myself, just wanted to poll to see
> if it'd be worth developing them as part of unicorn proper rather than
> standalone scripts.

If you're willing to help support users of these scripts/modules, I'd have no reservations about distributing them along with unicorn.

From normalperson at yhbt.net Fri Mar 2 01:20:16 2012
From: normalperson at yhbt.net (Eric Wong)
Date: Thu, 1 Mar 2012 17:20:16 -0800
Subject: murdering high-memory workers and auto-scaling
In-Reply-To:
References:
Message-ID: <20120302012016.GB12687@dcvr.yhbt.net>

Ben Somers wrote:
> First: auto-killing bloated workers. My current app has some memory
> leakage that wasn't really visible on our older passenger setup, since

Btw, you reported issues with memory usage on an Ubuntu system a few months ago, is this the same system? Are you using stock malloc() or tcmalloc()? (tcmalloc comes standard with REE afaik and never releases memory to the kernel).

For glibc malloc (ptmalloc) users I mentioned MALLOC_MMAP_THRESHOLD_, but forgot about MALLOC_ARENA_MAX. Since MRI is mostly single-threaded (especially the memory allocation portions), I'm tending to think the per-thread optimizations in glibc malloc do not help in most cases and will only lead to internal fragmentation.

So perhaps setting MALLOC_ARENA_MAX=1 (perhaps along with mmap threshold) in the environment will reduce fragmentation and memory usage.
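
A minimal sketch of the suggestion above, assuming a glibc-based Linux host: glibc reads these tunables when the process starts, so one option is a small launcher script that sets them and then execs unicorn. The helper names (malloc_env, launch_unicorn), the command line, and the config path are hypothetical; only the MALLOC_ARENA_MAX and MALLOC_MMAP_THRESHOLD_ variable names come from glibc.

```ruby
# Hypothetical helper: builds the environment overrides for a unicorn launch.
# The defaults are illustrative, not recommendations.
def malloc_env(arena_max: 1, mmap_threshold: nil)
  env = { "MALLOC_ARENA_MAX" => arena_max.to_s }
  # Only add the mmap threshold when the caller asks for one.
  env["MALLOC_MMAP_THRESHOLD_"] = mmap_threshold.to_s if mmap_threshold
  env
end

# Replaces the current process with unicorn, passing the tunables along.
# The command line below is an assumption; adjust it to the deployment.
def launch_unicorn(env = malloc_env)
  exec(env, "unicorn", "-c", "config/unicorn.rb")
end

# In a real launcher script the last line would simply be:
#   launch_unicorn
```

Setting the variables in an init script or process supervisor would work just as well; the point is only that they must be in place before the Ruby process starts.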
It doesn't look like the MALLOC_ARENA_* environment variables are in the manpages, yet: http://udrepper.livejournal.com/20948.html

From seamus at abshere.net Wed Mar 7 16:49:56 2012
From: seamus at abshere.net (Seamus Abshere)
Date: Wed, 07 Mar 2012 10:49:56 -0600
Subject: Running unicorn gracefully on Heroku
Message-ID: <4F5791B4.9040102@abshere.net>

hi,

I just posted a StackOverflow question about running unicorn gracefully on Heroku...

http://stackoverflow.com/questions/9605703/how-can-i-tell-unicorn-to-understand-herokus-signals

Would this group be able to provide any wisdom? Thank you.

Best,
Seamus

PS. Please CC: me on any responses!

--
Seamus Abshere

From normalperson at yhbt.net Wed Mar 7 20:22:12 2012
From: normalperson at yhbt.net (Eric Wong)
Date: Wed, 7 Mar 2012 12:22:12 -0800
Subject: Running unicorn gracefully on Heroku
In-Reply-To: <4F5791B4.9040102@abshere.net>
References: <4F5791B4.9040102@abshere.net>
Message-ID: <20120307202212.GA11637@dcvr.yhbt.net>

Seamus Abshere wrote:
> I just posted a StackOverflow question about running unicorn
> gracefully on Heroku...
>
> http://stackoverflow.com/questions/9605703/how-can-i-tell-unicorn-to-understand-herokus-signals

Ugh, can you just send/copy the question over here next time so we don't have to follow a link and parse out? Thanks.

> Would this group be able to provide any wisdom?

It's best to talk to the Heroku folks about this since it's their service. Maybe some of them hang out on this ML...

> PS. Please CC: me on any responses!

Done, thanks for the heads up.
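
For readers following the question above, here is one approach sketched as an untested unicorn config fragment. Unicorn treats TERM as a quick shutdown and QUIT as a graceful one, so the master can re-deliver an incoming TERM to itself as QUIT while the workers ignore TERM and wait for the master to stop them. This assumes the platform stops processes with TERM and sends it to every process; the question itself is not reproduced in this thread, and a later message mentions INT instead, so substitute whichever signal is actually sent. before_fork and after_fork are unicorn's own configuration hooks; nothing here is a Heroku API.

```ruby
# config/unicorn.rb (sketch; the TERM assumption is not confirmed in this thread)

before_fork do |server, worker|
  # Runs in the master before each fork: turn a quick-shutdown TERM
  # into a graceful QUIT addressed to the master itself.
  Signal.trap("TERM") { Process.kill("QUIT", Process.pid) }
end

after_fork do |server, worker|
  # Runs in each new worker: ignore TERM so the worker is stopped by the
  # master's QUIT handling after it finishes its current request.
  Signal.trap("TERM") { }
end
```

Whether the platform leaves enough time for in-flight requests to finish before it kills the process is a separate question for the platform's documentation.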
From ajsharp at gmail.com Wed Mar 7 21:27:58 2012
From: ajsharp at gmail.com (Alex Sharp)
Date: Wed, 7 Mar 2012 13:27:58 -0800
Subject: Running unicorn gracefully on Heroku
In-Reply-To: <20120307202212.GA11637@dcvr.yhbt.net>
References: <4F5791B4.9040102@abshere.net> <20120307202212.GA11637@dcvr.yhbt.net>
Message-ID:

This guy seems to have figured it out: http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/

--
Alex Sharp
Zaarly, Inc | @ajsharp | github.com/ajsharp | alexjsharp.com

From jamie at jamiedubs.com Thu Mar 8 05:58:24 2012
From: jamie at jamiedubs.com (Jamie Wilkinson)
Date: Wed, 7 Mar 2012 21:58:24 -0800
Subject: Running unicorn gracefully on Heroku
In-Reply-To:
References: <4F5791B4.9040102@abshere.net> <20120307202212.GA11637@dcvr.yhbt.net>
Message-ID: <0F44E152-42AA-446B-938E-4A5ED999D797@jamiedubs.com>

On Mar 7, 2012, at 1:27 PM, Alex Sharp wrote:
> This guy seems to have figured it out: http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/

Yeah this works great, I've used it for all my Heroku rails apps without issue for a while now. You just put unicorn in your bundle, make a Procfile like

    web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb

and a config/unicorn.rb like

    worker_processes 2
    timeout 5 # Heroku timeout is 5s

Related: like this post[1] I'm also interested in a smarter way to trap Heroku's SIGINT restarts into a unicorn USR2-style restart, so we can actually take advantage of unicorn. At high concurrency Heroku's "kill world" restarts cause a lot of errors.

[1]

From Jeffrey.Yeung at polycom.com Fri Mar 9 21:48:26 2012
From: Jeffrey.Yeung at polycom.com (Yeung, Jeffrey)
Date: Fri, 9 Mar 2012 13:48:26 -0800
Subject: Unicorn_rails ignores USR2 signal
Message-ID:

Hi,

I have a problem with one of my Rails servers where the daemonized unicorn_rails master seems to completely ignore the USR2 signal (only).

First, about the environment. I've searched for solutions to this problem for several days now, and my situation appears to differ from most in that my environment is not using RVM, bundler, Capistrano, or other sandboxing tools. It's a fairly plain jane Rails deployment.
The OS is 64-bit Ubuntu 10.04 server.

    $ uname -a
    Linux 2.6.32-37-server #81-Ubuntu SMP Fri Dec 2 20:49:12 UTC 2011 x86_64 GNU/Linux

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 10.04.4 LTS
    Release: 10.04
    Codename: lucid

    $ ruby -v
    ruby 1.9.1p376 (2009-12-07 revision 26041) [x86_64-linux]

The Rails gem version is 2.3.14. Unicorn gem version 4.2.0. The Unicorn config includes "preload_app true". We use the same Unicorn configs on similar servers and for some yet unknown reason, this one is seeing a problem with USR2.

I have done my best to debug it using strace and the stderr logs. The unicorn_rails master process handles HUP, QUIT, and USR1 signals as expected, no problems. However, the USR2 signal is being completely ignored. I added a before_exec block in the Unicorn conf with some puts statements, but this hook never gets called (unlike the before_fork and after_fork hooks, which are working fine for me).

    before_exec do |server|
      $stderr.puts("DEBUG before_exec")
      $stderr.puts(ENV.inspect)
      $stderr.puts(Unicorn::HttpServer::START_CTX.inspect)
    end

While monitoring the master with strace, there are no calls observed when sending a kill -USR2. Nothing is logged to STDERR when sending this signal, either. This has me completely stymied with the lack of logging info. Does anyone have any clues that can point me in the right direction?

-Jeff

From normalperson at yhbt.net Fri Mar 9 22:24:12 2012
From: normalperson at yhbt.net (Eric Wong)
Date: Fri, 9 Mar 2012 22:24:12 +0000
Subject: Unicorn_rails ignores USR2 signal
In-Reply-To:
References:
Message-ID: <20120309222412.GA21753@dcvr.yhbt.net>

"Yeung, Jeffrey" wrote:
> $ ruby -v
> ruby 1.9.1p376 (2009-12-07 revision 26041) [x86_64-linux]
>
> The Rails gem version is 2.3.14. Unicorn gem version 4.2.0.

> I have done my best to debug it using strace and the stderr logs. The
> unicorn_rails master process handles HUP, QUIT, and USR1 signals as
> expected, no problems.
> However, the USR2 signal is being completely
> ignored.

> While monitoring the master with strace, there are no calls observed
> when sending a kill -USR2. Nothing is logged to STDERR when sending
> this signal, either.

Really strange, especially since other signals seem to work...

Since you're on Ruby (>= 1.9.0 && < 1.9.3), are you stracing with -f? If you're not already, try using:

    strace -f -e '!futex'

1.9.x versions use a dedicated timer thread to accept signals, so you won't see system signal handlers in the VM without "strace -f". The "-e '!futex'" filters out the noise from the polling wakeup in <1.9.3 versions of Ruby.

Can you try Ruby 1.9.3? The signal handling got completely reworked (for the better) and you won't need "-e '!futex'"

From Jeffrey.Yeung at polycom.com Fri Mar 9 22:39:38 2012
From: Jeffrey.Yeung at polycom.com (Yeung, Jeffrey)
Date: Fri, 9 Mar 2012 14:39:38 -0800
Subject: Unicorn_rails ignores USR2 signal
In-Reply-To: <20120309222412.GA21753@dcvr.yhbt.net>
References: <20120309222412.GA21753@dcvr.yhbt.net>
Message-ID:

Thanks Eric. New strace capture as follows:

    $ sudo strace -f -e '!futex' -p 14255
    Process 14255 attached with 12 threads - interrupt to quit
    [pid 14322] restart_syscall(<... resuming interrupted call ...>
    [pid 14271] restart_syscall(<... resuming interrupted call ...>
    [pid 14267] accept(12,
    [pid 14264] restart_syscall(<... resuming interrupted call ...>
    [pid 14255] select(6, [5], NULL, NULL, {42, 86504}
    [pid 14322] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
    [pid 14271] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
    [pid 14264] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
    [pid 14262] --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
    [pid 14262] rt_sigreturn(0x20) = 202

The last two lines do not say much for the event.
:(

From normalperson at yhbt.net Sat Mar 10 00:02:39 2012
From: normalperson at yhbt.net (Eric Wong)
Date: Sat, 10 Mar 2012 00:02:39 +0000
Subject: Unicorn_rails ignores USR2 signal
In-Reply-To:
References: <20120309222412.GA21753@dcvr.yhbt.net>
Message-ID: <20120310000239.GA27195@dcvr.yhbt.net>

"Yeung, Jeffrey" wrote:
> Thanks Eric. New strace capture as follows:
>
> $ sudo strace -f -e '!futex' -p 14255
> Process 14255 attached with 12 threads - interrupt to quit

From that, you have 5 worker processes? For debugging this, it can cut down on noise to only use one worker process, too. You can check if SIGTTOU works, too :)

Also, can you reproduce this on a freshly-started master?
Or has the master been running and handling other signals for a while?

The most common cause of USR2 failures is due to an executable or library being moved/replaced/upgraded away (sometimes due to Capistrano), but that should definitely get logged and doesn't seem to be the case for you.

> [pid 14322] restart_syscall(<... resuming interrupted call ...>
> [pid 14271] restart_syscall(<... resuming interrupted call ...>
> [pid 14267] accept(12,
> [pid 14264] restart_syscall(<... resuming interrupted call ...>
> [pid 14255] select(6, [5], NULL, NULL, {42, 86504}
> [pid 14322] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 14271] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 14264] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 14262] --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
> [pid 14262] rt_sigreturn(0x20) = 202
>
> The last two lines do not say much for the event. :(

Anything more after that? What happens when you send a previously working signal (USR1, HUP) after sending a failed USR2 to that process?

From Jeffrey.Yeung at polycom.com Sat Mar 10 01:07:47 2012
From: Jeffrey.Yeung at polycom.com (Yeung, Jeffrey)
Date: Fri, 9 Mar 2012 17:07:47 -0800
Subject: Unicorn_rails ignores USR2 signal
In-Reply-To: <20120310000239.GA27195@dcvr.yhbt.net>
References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net>
Message-ID:

I have it configured for 4 workers. I just turned it down to 1 worker and tested both TTIN and TTOU. They both work, creating and killing workers, respectively. The following is the strace capture from a TTOU signal, followed by a USR2 signal. Yes, that is the only output from the USR2 signal. Behavior doesn't change if the master is freshly started, or has been running for a while. One other thing I did not mention earlier is that the .oldbin file never gets created on the USR2 signal, but that's probably obvious already.

[pid 14463] --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
[pid 14463] rt_sigreturn(0x16) = -1 EINTR (Interrupted system call)
[pid 14483] rt_sigprocmask(SIG_SETMASK, ~[SEGV VTALRM RTMIN RT_1], NULL, 8) = 0
[pid 14483] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 14483] tgkill(14456, 14456, SIGVTALRM) = 0
[pid 14456] <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
[pid 14456] --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
[pid 14456] rt_sigreturn(0x1a) = -1 EINTR (Interrupted system call)
[pid 14456] fcntl(6, F_GETFL) = 0x801 (flags O_WRONLY|O_NONBLOCK)
[pid 14456] write(6, ".", 1) = 1
[pid 14456] select(6, [5], NULL, NULL, {30, 755188}) = 1 (in [5], left {30, 755184})
[pid 14456] fcntl(5, F_GETFL) = 0x800 (flags O_RDONLY|O_NONBLOCK)
[pid 14456] read(5, ".", 11) = 1
[pid 14456] wait4(-1, 0x7fffcb5560ec, WNOHANG, NULL) = 0
[pid 14456] wait4(-1, 0x7fffcb5560ec, WNOHANG, NULL) = 0
[pid 14456] kill(14482, SIGQUIT
[pid 14482] <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
[pid 14456] <...
kill resumed> ) = 0 [pid 14482] --- SIGQUIT (Quit) @ 0 (0) --- [pid 14456] select(6, [5], NULL, NULL, {83, 0} [pid 14482] rt_sigreturn(0x3) = -1 EINTR (Interrupted system call) [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] 
sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield() = 0 [pid 14482] sched_yield( [pid 14485] rt_sigprocmask(SIG_SETMASK, ~[SEGV VTALRM RTMIN RT_1], NULL, 8) = 0 [pid 14485] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 [pid 14482] <... sched_yield resumed> ) = 0 [pid 14482] close(3) = 0 [pid 14482] select(6, [3], NULL, [4 5], {22, 883208}) = -1 EBADF (Bad file descriptor) [pid 14482] rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT KILL USR1 SEGV USR2 TERM CHLD STOP TTIN TTOU VTALRM WINCH RTMIN RT_1], NULL, 8) = 0 [pid 14482] rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT KILL USR1 SEGV USR2 TERM CHLD STOP TTIN TTOU VTALRM WINCH RTMIN RT_1], NULL, 8) = 0 [pid 14482] rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT KILL USR1 SEGV USR2 TERM CHLD STOP TTIN TTOU VTALRM WINCH RTMIN RT_1], NULL, 8) = 0 [pid 14482] getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0 [pid 14485] _exit(0) = ? 
Process 14485 detached
[pid 14482] rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7f10515428f0}, {0x48fc00, [], SA_RESTORER|SA_SIGINFO, 0x7f10515428f0}, 8) = 0
[pid 14482] rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER|SA_SIGINFO, 0x7f10515428f0}, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7f10515428f0}, 8) = 0
[pid 14482] munmap(0x7f10517b4000, 4096) = 0
[pid 14482] close(4) = 0
[pid 14482] close(5) = 0
[pid 14482] close(7) = 0
[pid 14482] close(12) = 0
[pid 14482] unlink("/tmp/.java_pid14456") = -1 EPERM (Operation not permitted)
[pid 14482] exit_group(0) = ?
Process 14482 detached
[pid 14463] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 14463] rt_sigreturn(0x11) = -1 EINTR (Interrupted system call)
[pid 14483] rt_sigprocmask(SIG_SETMASK, ~[SEGV VTALRM RTMIN RT_1], NULL, 8) = 0
[pid 14483] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 14483] tgkill(14456, 14456, SIGVTALRM) = 0
[pid 14456] <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
[pid 14456] --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
[pid 14456] rt_sigreturn(0x1a) = -1 EINTR (Interrupted system call)
[pid 14456] fcntl(6, F_GETFL) = 0x801 (flags O_WRONLY|O_NONBLOCK)
[pid 14456] write(6, ".", 1) = 1
[pid 14456] select(6, [5], NULL, NULL, {82, 908414}) = 1 (in [5], left {82, 908410})
[pid 14456] fcntl(5, F_GETFL) = 0x800 (flags O_RDONLY|O_NONBLOCK)
[pid 14456] read(5, ".", 11) = 1
[pid 14456] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 14482
[pid 14456] write(2, "reaped #
[pid 14463] --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
[pid 14463] rt_sigreturn(0x20) = 202

From normalperson at yhbt.net Sat Mar 10 01:30:51 2012
From: normalperson at yhbt.net (Eric Wong)
Date: Fri, 9 Mar 2012 17:30:51 -0800
Subject: Unicorn_rails ignores USR2 signal
In-Reply-To:
References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net>
Message-ID: <20120310013051.GA32091@dcvr.yhbt.net>

"Yeung, Jeffrey" wrote:
> I have it configured for 4 workers. I just turned it down to 1 worker
> and tested both TTIN and TTOU. They both work, creating and killing
> workers, respectively.
The following is the strace capture from an > TTOU signal, followed by a USR2 signal. Yes, that is the only output > from the USR2 signal. Behavior doesn't change if the master is > freshly started, or has been running for a while. One other thing I > did not mention earlier is that the .oldbin file never gets > created on the USR2 signal, but that's probably obvious already. I also asked for sending another signal after USR2, can you try that? However, there's another possibility I hadn't considered, what if you disable preload_app? Your app or some libs it uses may be intercepting USR2 for something it does. Maybe this patch can work, but this may also silently break your application/lib, too... diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index 7d2c623..1b9d693 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -128,8 +128,7 @@ class Unicorn::HttpServer # setup signal handlers before writing pid file in case people get # trigger happy and send signals as soon as the pid file exists. # Note that signals don't actually get handled until the #join method - QUEUE_SIGS.each { |sig| trap(sig) { SIG_QUEUE << sig; awaken_master } } - trap(:CHLD) { awaken_master } + master_siginit self.pid = config[:pid] self.master_pid = $$ @@ -689,6 +688,9 @@ class Unicorn::HttpServer Gem.refresh end self.app = app.call + + # override signal handlers the app may have set + master_siginit if preload_app end end @@ -736,4 +738,9 @@ class Unicorn::HttpServer config_listeners.each { |addr| listen(addr) } raise ArgumentError, "no listeners" if LISTENERS.empty? 
end + + def master_siginit + QUEUE_SIGS.each { |sig| trap(sig) { SIG_QUEUE << sig; awaken_master } } + trap(:CHLD) { awaken_master } + end end From normalperson at yhbt.net Mon Mar 12 21:21:19 2012 From: normalperson at yhbt.net (Eric Wong) Date: Mon, 12 Mar 2012 21:21:19 +0000 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: <20120310013051.GA32091@dcvr.yhbt.net> References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> Message-ID: <20120312212119.GA26451@dcvr.yhbt.net> Eric Wong wrote: > However, there's another possibility I hadn't considered, what if you > disable preload_app? Your app or some libs it uses may be intercepting > USR2 for something it does. Ping? Was this it? From Jeffrey.Yeung at polycom.com Mon Mar 12 22:39:24 2012 From: Jeffrey.Yeung at polycom.com (Yeung, Jeffrey) Date: Mon, 12 Mar 2012 15:39:24 -0700 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: <20120312212119.GA26451@dcvr.yhbt.net> References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> <20120312212119.GA26451@dcvr.yhbt.net> Message-ID: Hi Eric, Sorry for the delay. It looks like disabling preload_app did the trick. A new master was created after sending the USR2. Now the $$$ question is, what in the world is intercepting the signal? :S And this is a little late now, but here's the strace of a USR2 signal followed by a TTOU signal, with preload_app true (latter signal working okay in this case): [pid 14542] --- SIGUSR2 (User defined signal 2) @ 0 (0) --- [pid 14542] rt_sigreturn(0x20) = 202 [pid 14535] <... 
select resumed> ) = 0 (Timeout) [pid 14535] wait4(-1, 0x7fff6bcccb9c, WNOHANG, NULL) = 0 [pid 14535] select(6, [5], NULL, NULL, {60, 0} [pid 14542] --- SIGTTOU (Stopped (tty output)) @ 0 (0) --- [pid 14542] rt_sigreturn(0x16) = -1 EINTR (Interrupted system call) [pid 14553] rt_sigprocmask(SIG_SETMASK, ~[SEGV VTALRM RTMIN RT_1], NULL, 8) = 0 [pid 14553] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 [pid 14553] tgkill(14535, 14535, SIGVTALRM) = 0 [pid 14535] <... select resumed> ) = ? ERESTARTNOHAND (To be restarted) [pid 14535] --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- [pid 14535] rt_sigreturn(0x1a) = -1 EINTR (Interrupted system call) [pid 14535] fcntl(6, F_GETFL) = 0x1 (flags O_WRONLY) [pid 14535] fcntl(6, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 [pid 14535] write(6, ".", 1) = 1 [pid 14535] select(6, [5], NULL, NULL, {59, 939700}) = 1 (in [5], left {59, 939696}) [pid 14535] fcntl(5, F_GETFL) = 0 (flags O_RDONLY) [pid 14535] fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 [pid 14535] read(5, ".", 11) = 1 [pid 14535] wait4(-1, 0x7fff6bcccb9c, WNOHANG, NULL) = 0 [pid 14535] wait4(-1, 0x7fff6bcccb9c, WNOHANG, NULL) = 0 [pid 14535] kill(14552, SIGQUIT) = 0 [pid 14535] select(6, [5], NULL, NULL, {60, 0} [pid 14542] --- SIGCHLD (Child exited) @ 0 (0) --- [pid 14542] rt_sigreturn(0x11) = -1 EINTR (Interrupted system call) [pid 14553] rt_sigprocmask(SIG_SETMASK, ~[SEGV VTALRM RTMIN RT_1], NULL, 8) = 0 [pid 14553] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 [pid 14553] tgkill(14535, 14535, SIGVTALRM) = 0 [pid 14535] <... select resumed> ) = ? 
ERESTARTNOHAND (To be restarted) [pid 14535] --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- [pid 14535] rt_sigreturn(0x1a) = -1 EINTR (Interrupted system call) [pid 14535] fcntl(6, F_GETFL) = 0x801 (flags O_WRONLY|O_NONBLOCK) [pid 14535] write(6, ".", 1) = 1 [pid 14535] select(6, [5], NULL, NULL, {59, 880004}) = 1 (in [5], left {59, 880001}) [pid 14535] fcntl(5, F_GETFL) = 0x800 (flags O_RDONLY|O_NONBLOCK) [pid 14535] read(5, ".", 11) = 1 [pid 14535] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 14552 [pid 14535] write(2, "reaped # wrote: > However, there's another possibility I hadn't considered, what if you > disable preload_app? Your app or some libs it uses may be > intercepting > USR2 for something it does. Ping? Was this it? From normalperson at yhbt.net Mon Mar 12 22:44:19 2012 From: normalperson at yhbt.net (Eric Wong) Date: Mon, 12 Mar 2012 15:44:19 -0700 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> <20120312212119.GA26451@dcvr.yhbt.net> Message-ID: <20120312224419.GA2942@dcvr.yhbt.net> "Yeung, Jeffrey" wrote: > Sorry for the delay. It looks like disabling preload_app did the > trick. A new master was created after sending the USR2. Now the $$$ > question is, what in the world is intercepting the signal? :S Good to know, I'd just grep the installation directories for all your Ruby libs + gems for USR2. I haven't seen this problem before, but it'd be good to document the conflict, at least. 
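The grep suggested above can also be done from Ruby, which helps when the gem directories are not easy to enumerate by hand. The following is only a minimal sketch: `find_signal_mentions` is a hypothetical helper (not part of unicorn or RubyGems), it scans `*.rb` files only, and a plain substring match will also report comments and unrelated strings that happen to contain "USR2".

```ruby
require 'tmpdir'

# Hypothetical helper (not part of unicorn or RubyGems): return
# "file:lineno: text" for every line under +dirs+ that mentions +name+.
def find_signal_mentions(dirs, name = 'USR2')
  hits = []
  dirs.each do |dir|
    Dir.glob(File.join(dir, '**', '*.rb')).each do |file|
      next unless File.file?(file)
      # read as binary so files with odd encodings do not raise
      File.binread(file).each_line.with_index(1) do |line, lineno|
        next unless line.include?(name)
        text = line.strip.force_encoding(Encoding::UTF_8).scrub('?')
        hits << "#{file}:#{lineno}: #{text}"
      end
    end
  end
  hits
end

# Small demonstration against a throwaway directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'lib.rb'), "x = 1\ntrap('USR2') { puts 'hi' }\n")
  puts find_signal_mentions([dir])
end
```

To scan a real installation, one could pass `Gem.path` (the RubyGems search directories) and the application's own lib directories instead of the temporary directory. Handlers installed from C extensions or via a computed signal name would not show up in such a search.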
From normalperson at yhbt.net Tue Mar 20 19:57:48 2012 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 20 Mar 2012 19:57:48 +0000 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: <20120312224419.GA2942@dcvr.yhbt.net> References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> <20120312212119.GA26451@dcvr.yhbt.net> <20120312224419.GA2942@dcvr.yhbt.net> Message-ID: <20120320195748.GA1187@dcvr.yhbt.net> Eric Wong wrote: > "Yeung, Jeffrey" wrote: > > Sorry for the delay. It looks like disabling preload_app did the > > trick. A new master was created after sending the USR2. Now the $$$ > > question is, what in the world is intercepting the signal? :S > > Good to know, I'd just grep the installation directories for all your > Ruby libs + gems for USR2. I haven't seen this problem before, but it'd > be good to document the conflict, at least. Btw, did you ever figure out what was causing the conflict? Pushing this out to git://bogomips.org/unicorn.git From 1e13ffee3469997286e65e0563b6433e7744388a Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Tue, 20 Mar 2012 19:51:35 +0000 Subject: [PATCH] KNOWN_ISSUES: document signal conflicts in libs/apps Jeffrey Yeung confirmed this issue on the mailing list. ref: --- KNOWN_ISSUES | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/KNOWN_ISSUES b/KNOWN_ISSUES index f323c68..38263e7 100644 --- a/KNOWN_ISSUES +++ b/KNOWN_ISSUES @@ -3,6 +3,11 @@ Occasionally odd {issues}[link:ISSUES.html] arise without a transparent or acceptable solution. Those issues are documented here. +* Some libraries/applications may install signal handlers which conflict + with signal handlers unicorn uses. Leaving "preload_app false" + (the default) will allow unicorn to always override existing signal + handlers.
+ * Issues with FreeBSD jails can be worked around as documented by Tatsuya Ono: http://mid.gmane.org/CAHBuKRj09FdxAgzsefJWotexw-7JYZGJMtgUp_dhjPz9VbKD6Q at mail.gmail.com From normalperson at yhbt.net Tue Mar 20 19:59:58 2012 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 20 Mar 2012 19:59:58 +0000 Subject: Suggestion for improvement of examples/nginx.conf In-Reply-To: <20120228210136.GA2808@dcvr.yhbt.net> References: <20120228210136.GA2808@dcvr.yhbt.net> Message-ID: <20120320195958.GB1187@dcvr.yhbt.net> Eric Wong wrote: > Eike Herzbach wrote: > > Hi, > > > > what do you think about that: > > > > diff --git a/examples/nginx.conf b/examples/nginx.conf > > index cc1038a..5ef43f3 100644 > > --- a/examples/nginx.conf > > +++ b/examples/nginx.conf > > @@ -55,7 +55,7 @@ http { > > # faster or not than doing compression via nginx. It's easier > > # to configure it all in one place here for static files and also > > # to disable gzip for clients who don't get gzip/deflate right. > > - # There are other other gzip settings that may be needed used to deal with > > + # There are other gzip settings that may be needed used to deal with > > Obviously correct, will apply. Thanks. Pushed: http://bogomips.org/unicorn.git/patch/?id=0daedd92d3e896a9fcd301bbb58e85bb54a939ee > > - # enable this if and only if you use HTTPS, this helps Rack > > + # enable this if you use HTTPS, this helps Rack > > # set the proper protocol for doing redirects: > > - # proxy_set_header X-Forwarded-Proto https; > > + # proxy_set_header X-Forwarded-Proto $scheme; > > > > # pass the Host: header from the client right along so redirects > > # can be set properly within the Rack application > > > > I haven't found an easy way to only set the header if the scheme is > > https, but I don't see any problems in sending the header for all > > requests either. > > Setting it for all requests uses an extra hash slot and leads to extra > method dispatches in Rack (and possibly code elsewhere). 
I suppose > it doesn't matter for most setups, though. > > How about this? > > --- a/examples/nginx.conf > +++ b/examples/nginx.conf > @@ -120,9 +120,9 @@ http { > # http://en.wikipedia.org/wiki/X-Forwarded-For > proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; > > - # enable this if and only if you use HTTPS, this helps Rack > - # set the proper protocol for doing redirects: > - # proxy_set_header X-Forwarded-Proto https; > + # enable this if you forward HTTPS traffic to unicorn, > + # this helps Rack set the proper URL scheme for doing redirects: > + # proxy_set_header X-Forwarded-Proto $scheme; > > # pass the Host: header from the client right along so redirects > # can be set properly within the Rack application Also pushed this: http://bogomips.org/unicorn.git/patch/?id=9fc5c24920726d3c10bc9f39d8e97686b93cbbe0 From normalperson at yhbt.net Tue Mar 20 20:10:10 2012 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 20 Mar 2012 20:10:10 +0000 Subject: [PATCH] Start the server if another user has a PID matching our stale pidfile. In-Reply-To: <20120229172559.GA26211@dcvr.yhbt.net> References: <20120229172559.GA26211@dcvr.yhbt.net> Message-ID: <20120320201010.GA6857@dcvr.yhbt.net> Eric Wong wrote: > Graham Bleach wrote: > > If unicorn doesn't get terminated cleanly (for example if the machine > > has its power interrupted) and the pid in the pidfile gets used by > > another process, the current unicorn code will exit and not start a > > server. This tiny patch fixes that behaviour. > > Thanks! Acked-by: Eric Wong > > and pushed to master on git://bogomips.org/unicorn.git Btw, I also pushed this to be a little more informative: From d0e7d8d770275654024887a05d9e986589ba358c Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Tue, 20 Mar 2012 20:05:59 +0000 Subject: [PATCH] log EPERM errors from invalid pid files In some cases, EPERM may indicate a real configuration problem, but it can also just mean the pid file is stale.
--- lib/unicorn/http_server.rb | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index 0c2af5d..ede6264 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -656,7 +656,10 @@ class Unicorn::HttpServer wpid <= 0 and return Process.kill(0, wpid) wpid - rescue Errno::ESRCH, Errno::ENOENT, Errno::EPERM + rescue Errno::EPERM + logger.info "pid=#{path} possibly stale, got EPERM signalling PID:#{wpid}" + nil + rescue Errno::ESRCH, Errno::ENOENT # don't unlink stale pid files, racy without non-portable locking... end -- From Jeffrey.Yeung at polycom.com Tue Mar 20 23:09:34 2012 From: Jeffrey.Yeung at polycom.com (Yeung, Jeffrey) Date: Tue, 20 Mar 2012 16:09:34 -0700 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: <20120320195748.GA1187@dcvr.yhbt.net> References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> <20120312212119.GA26451@dcvr.yhbt.net> <20120312224419.GA2942@dcvr.yhbt.net> <20120320195748.GA1187@dcvr.yhbt.net> Message-ID: Eric, I have been unable to narrow down the cause of the conflict so far. The list of Ruby gems (and gem versions) on the affected deployment are identical to the ones on another deployment where Unicorn is upgrading just fine (with preloaded app). Grep'ing for USR2 in the gem installations did not reveal anything, unfortunately. Since then, I haven't been able to spend further time investigating. Not sure where else to look, really, but I'm open to further suggestions. -Jeff -----Original Message----- From: Eric Wong [mailto:normalperson at yhbt.net] Sent: Tuesday, March 20, 2012 12:58 PM To: unicorn list Cc: Yeung, Jeffrey Subject: Re: Unicorn_rails ignores USR2 signal Eric Wong wrote: > "Yeung, Jeffrey" wrote: > > Sorry for the delay. It looks like disabling preload_app did the > > trick. A new master was created after sending the USR2. 
Now the > > $$$ question is, what in the world is intercepting the signal? :S > > Good to know, I'd just grep the installation directories for all your > Ruby libs + gems for USR2. I haven't seen this problem before, but > it'd be good to document the conflict, at least. Btw, did you ever figure out what was causing the conflict? Pushing this out to git://bogomips.org/unicorn.git From 1e13ffee3469997286e65e0563b6433e7744388a Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Tue, 20 Mar 2012 19:51:35 +0000 Subject: [PATCH] KNOWN_ISSUES: document signal conflicts in libs/apps Jeffrey Yeung confirmed this issue on the mailing list. ref: --- KNOWN_ISSUES | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/KNOWN_ISSUES b/KNOWN_ISSUES index f323c68..38263e7 100644 --- a/KNOWN_ISSUES +++ b/KNOWN_ISSUES @@ -3,6 +3,11 @@ Occasionally odd {issues}[link:ISSUES.html] arise without a transparent or acceptable solution. Those issues are documented here. +* Some libraries/applications may install signal handlers which conflict + with signal handlers unicorn uses. Leaving "preload_app false" + (the default) will allow unicorn to always override existing signal + handlers. + * Issues with FreeBSD jails can be worked around as documented by Tatsuya Ono: http://mid.gmane.org/CAHBuKRj09FdxAgzsefJWotexw-7JYZGJMtgUp_dhjPz9VbKD6Q at mail.gmail.com From dbenhur at whitepages.com Wed Mar 21 02:27:41 2012 From: dbenhur at whitepages.com (Devin Ben-Hur) Date: Tue, 20 Mar 2012 19:27:41 -0700 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> <20120312212119.GA26451@dcvr.yhbt.net> <20120312224419.GA2942@dcvr.yhbt.net> <20120320195748.GA1187@dcvr.yhbt.net> Message-ID: <4F693C9D.7020702@whitepages.com> On 3/20/12 4:09 PM, Yeung, Jeffrey wrote: > I have been unable to narrow down the cause of the conflict so far.
The list of Ruby gems (and gem versions) on the affected deployment are identical to the ones on another deployment where Unicorn is upgrading just fine (with preloaded app). Grep'ing for USR2 in the gem installations did not reveal anything, unfortunately. Since then, I haven't been able to spend further time investigating. Not sure where else to look, really, but I'm open to further suggestions.

Jeffrey, To uncover the culprit, you might try monkey-patching Kernel.trap and Signal.trap so it logs the nearest few entries in caller when it's called with USR2. Put something like this really early in your app bootstrap:

    # NOTE: bare trap(...) calls go through the private Kernel#trap
    # instance method and are not intercepted by this patch.
    [Kernel, Signal].each do |klass|
      class << klass
        alias :orig_trap :trap
        def trap(*args, &block)
          sig = args.first
          # match "USR2", "SIGUSR2", :USR2, or this platform's signal number
          if sig.to_s =~ /USR2\z/i || sig == Signal.list['USR2']
            $stderr.puts "Caught someone trapping USR2 caller is:", caller.first(5)
          end
          orig_trap(*args, &block)
        end
      end
    end

From normalperson at yhbt.net Thu Mar 22 06:30:12 2012 From: normalperson at yhbt.net (Eric Wong) Date: Thu, 22 Mar 2012 06:30:12 +0000 Subject: unicorn 4.2.1 release soon? Message-ID: <20120322063011.GA9718@dcvr.yhbt.net>

There's not much going on in unicorn.git nowadays; everything's stabilized nicely over the past few years. This release would mainly be for Graham's EPERM fix/workaround for stale pid files and some documentation fixes/improvements. Shortlog below:

Eric Wong (4):
      examples/nginx.conf: remove redundant word
      examples/nginx.conf: use $scheme instead of hard-coded "https"
      KNOWN_ISSUES: document signal conflicts in libs/apps
      log EPERM errors from invalid pid files

Graham Bleach (1):
      Start the server if another user has a PID matching our stale pidfile.

$ git clone git://bogomips.org/unicorn.git

I'm also going on vacation soon, so I'll have almost no Internet access for a few weeks.
Help each other out when I'm away, thanks :) From bb at xnull.de Thu Mar 22 07:14:09 2012 From: bb at xnull.de (Benedikt Böhm) Date: Thu, 22 Mar 2012 08:14:09 +0100 Subject: [PATCH] make stderr_path/stdout_path support IO objects directly Message-ID: <1332400449-25478-1-git-send-email-bb@xnull.de> --- lib/unicorn/configurator.rb | 2 +- lib/unicorn/http_server.rb | 6 +++++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/unicorn/configurator.rb b/lib/unicorn/configurator.rb index 89cbf5c..c9b6816 100644 --- a/lib/unicorn/configurator.rb +++ b/lib/unicorn/configurator.rb @@ -559,7 +559,7 @@ private def set_path(var, path) #:nodoc: case path - when NilClass, String + when NilClass, String, IO set[var] = path else raise ArgumentError diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index 0c2af5d..bd71fbf 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -698,7 +698,11 @@ class Unicorn::HttpServer end def redirect_io(io, path) - File.open(path, 'ab') { |fp| io.reopen(fp) } if path + if path.is_a?(IO) + io.reopen(path) + elsif path + io.reopen(path, 'ab') + end io.sync = true end -- 1.7.4.5 From normalperson at yhbt.net Thu Mar 22 08:00:37 2012 From: normalperson at yhbt.net (Eric Wong) Date: Thu, 22 Mar 2012 08:00:37 +0000 Subject: [PATCH] make stderr_path/stdout_path support IO objects directly In-Reply-To: <1332400449-25478-1-git-send-email-bb@xnull.de> References: <1332400449-25478-1-git-send-email-bb@xnull.de> Message-ID: <20120322080037.GA11064@dcvr.yhbt.net> Benedikt Böhm wrote: Hello, can you explain in the commit message why this functionality is useful to have? SIGUSR1 log file reopening only works because it relies on knowing File#path on File objects. > --- > lib/unicorn/configurator.rb | 2 +- > lib/unicorn/http_server.rb | 6 +++++- > 2 files changed, 6 insertions(+), 2 deletions(-) Test cases for new functionality would be nice, too.
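To illustrate the concern about File#path raised above: a log opened from a path can be reopened after rotation because the path is known, while a bare IO such as a pipe offers nothing to reopen. The sketch below is not unicorn's implementation; `reopen_log` is a hypothetical helper written only to show the idea, and it assumes the File was opened from a regular path.

```ruby
require 'tmpdir'

# Hypothetical helper (not unicorn's code): reopen +io+ at its original
# path, as a USR1-style handler might. Returns false for non-File IOs,
# which carry no path to reopen.
def reopen_log(io)
  return false unless io.is_a?(File)
  io.reopen(io.path, 'a')
  io.sync = true
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.log')
  log = File.open(path, 'a')
  log.sync = true
  log.puts 'before rotation'

  File.rename(path, "#{path}.1") # roughly what a log rotator does
  reopen_log(log)                # roughly what a USR1 handler does
  log.puts 'after rotation'
  log.close

  puts File.read("#{path}.1")
  puts File.read(path)
end

rd, wr = IO.pipe
puts "pipe reopened? #{reopen_log(wr)}"
```

With an IO object passed in directly, the second branch is the only one available, so rotation would have to be handled by whatever owns the other end of that IO.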
From normalperson at yhbt.net Mon Mar 26 21:45:35 2012 From: normalperson at yhbt.net (Eric Wong) Date: Mon, 26 Mar 2012 21:45:35 +0000 Subject: [ANN] unicorn 4.2.1 - minor fix and doc updates Message-ID: <20120326214535.GA29291@dcvr.yhbt.net> Changes: * Stale pid files are detected if a pid is recycled by processes belonging to another user, thanks to Graham Bleach. * nginx example config updates thanks to Eike Herzbach. * KNOWN_ISSUES now documents issues with apps/libs that install conflicting signal handlers. * http://unicorn.bogomips.org/ * mongrel-unicorn at rubyforge.org * git://bogomips.org/unicorn.git * http://unicorn.bogomips.org/NEWS.atom.xml From Jeffrey.Yeung at polycom.com Fri Mar 30 22:16:38 2012 From: Jeffrey.Yeung at polycom.com (Yeung, Jeffrey) Date: Fri, 30 Mar 2012 15:16:38 -0700 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: <20120320195748.GA1187@dcvr.yhbt.net> References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> <20120312212119.GA26451@dcvr.yhbt.net> <20120312224419.GA2942@dcvr.yhbt.net> <20120320195748.GA1187@dcvr.yhbt.net> Message-ID: I have little to report, unfortunately. I still have not identified what is trapping USR2 in my application, if something is indeed doing that. I've attempted to do a bit more investigation work in my spare time this week, and the results are still rather mysterious. First, I applied your patch to http_server.rb that you suggested in this thread back on March 9th (quoted below), which re-overrides the signal handlers in build_app!(). The patch did work. Okay, that seems to reconfirm that the app is doing something with the USR2 signal handler...
Then I modified the master_siginit() method a bit further to provide more info:

    def master_siginit
      QUEUE_SIGS.each { |sig|
        old_handler = trap(sig) { SIG_QUEUE << sig; awaken_master }
        logger.info("old handler for #{sig.to_s}: #{old_handler.inspect}")
      }
      trap(:CHLD) { awaken_master }
    end

And the output was a bit surprising. Here is the Unicorn stderr log when starting up the service. master_siginit() is called twice, as expected...

I, [2012-03-30T14:57:51.301839 #15828] INFO -- : listening on addr=0.0.0.0:8080 fd=3
I, [2012-03-30T14:57:51.302341 #15828] INFO -- : old handler for WINCH: nil
I, [2012-03-30T14:57:51.302435 #15828] INFO -- : old handler for QUIT: "DEFAULT"
I, [2012-03-30T14:57:51.302505 #15828] INFO -- : old handler for INT: "DEFAULT"
I, [2012-03-30T14:57:51.302573 #15828] INFO -- : old handler for TERM: "DEFAULT"
I, [2012-03-30T14:57:51.302641 #15828] INFO -- : old handler for USR1: "DEFAULT"
I, [2012-03-30T14:57:51.302709 #15828] INFO -- : old handler for USR2: "DEFAULT"
I, [2012-03-30T14:57:51.302786 #15828] INFO -- : old handler for HUP: "DEFAULT"
I, [2012-03-30T14:57:51.302853 #15828] INFO -- : old handler for TTIN: nil
I, [2012-03-30T14:57:51.302917 #15828] INFO -- : old handler for TTOU: nil
I, [2012-03-30T14:57:51.303502 #15828] INFO -- : Refreshing Gem list
old handler for WINCH: #
old handler for QUIT: #
old handler for INT: #
old handler for TERM: #
old handler for USR1: #
old handler for USR2: #
old handler for HUP: #
old handler for TTIN: #
old handler for TTOU: #
master process ready
worker=0 ready

Which to me, looks like the USR2 handler was unchanged in the second master_siginit() call. Yet, if I remove that second master_siginit() call from there, USR2 ceases to work.
:S -Jeff -----Original Message----- diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index 7d2c623..1b9d693 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -128,8 +128,7 @@ class Unicorn::HttpServer # setup signal handlers before writing pid file in case people get # trigger happy and send signals as soon as the pid file exists. # Note that signals don't actually get handled until the #join method - QUEUE_SIGS.each { |sig| trap(sig) { SIG_QUEUE << sig; awaken_master } } - trap(:CHLD) { awaken_master } + master_siginit self.pid = config[:pid] self.master_pid = $$ @@ -689,6 +688,9 @@ class Unicorn::HttpServer Gem.refresh end self.app = app.call + + # override signal handlers the app may have set + master_siginit if preload_app end end @@ -736,4 +738,9 @@ class Unicorn::HttpServer config_listeners.each { |addr| listen(addr) } raise ArgumentError, "no listeners" if LISTENERS.empty? end + + def master_siginit + QUEUE_SIGS.each { |sig| trap(sig) { SIG_QUEUE << sig; awaken_master } } + trap(:CHLD) { awaken_master } + end end From ajsharp at gmail.com Fri Mar 30 22:51:01 2012 From: ajsharp at gmail.com (Alex Sharp) Date: Fri, 30 Mar 2012 15:51:01 -0700 Subject: Unicorn_rails ignores USR2 signal In-Reply-To: References: <20120309222412.GA21753@dcvr.yhbt.net> <20120310000239.GA27195@dcvr.yhbt.net> <20120310013051.GA32091@dcvr.yhbt.net> <20120312212119.GA26451@dcvr.yhbt.net> <20120312224419.GA2942@dcvr.yhbt.net> <20120320195748.GA1187@dcvr.yhbt.net> Message-ID: <1060864B953E4C13B9B2C6E89B6C9BA7@gmail.com> Hey Jeffrey, we had a similar problem about 8 months ago. What version of ruby (including patch number) are you using? I think you said you're not using anything special in your deployment environment, right? You're not using god, by chance to do the restarts, are you? Also, how are you doing your deployments, if not with cap? -- Alex Sharp Zaarly, Inc | @ajsharp | github.com/ajsharp | alexjsharp.com