From james at imaj.es Tue Aug 2 16:09:12 2011
From: james at imaj.es (James Cox)
Date: Tue, 2 Aug 2011 16:09:12 -0400
Subject: Strange quit behavior
Message-ID:

Hey,

So here are some tasks for managing unicorn:

https://gist.github.com/1121076

I've found that it's very unreliable for quitting / terminating
unicorn and restarting with new code. When doing rapid deployments
particularly, i've found that i have to go in and kill -9 the master
process, and start again.

any thoughts on why it seems ineffective?

Thanks.

PS: here's the unicorn config:

https://gist.github.com/0dd07c5ad00c56d161c7

James

From ajsharp at gmail.com Tue Aug 2 16:34:52 2011
From: ajsharp at gmail.com (Alex Sharp)
Date: Tue, 2 Aug 2011 13:34:52 -0700
Subject: Strange quit behavior
In-Reply-To:
References:
Message-ID:

> I've found that it's very unreliable for quitting / terminating
> unicorn and restarting with new code. When doing rapid deployments
> particularly, i've found that i have to go in and kill -9 the master
> process, and start again.

We've noticed some of this behavior as well. We've seen the new master
start spinning and consume 100% cpu (according to top). The old master
and workers are still running and working, but the new master just
hangs, and we have to kill -9.

Our solution was to add the following to our unicorn config, which
*seems* to have solved the problem:

    Unicorn::HttpServer::START_CTX[0] = "#{path_to_app}/shared/bundle/ruby/1.9.1/bin/unicorn_rails"

This is outlined a bit here: http://unicorn.bogomips.org/Sandbox.html.

I'm not really sure why this seems to have fixed the problem -- all I
can tell you is that we haven't seen it since adding this line to the
config.
- Alex

From cliftonk at gmail.com Tue Aug 2 16:45:52 2011
From: cliftonk at gmail.com (cliftonk at gmail.com)
Date: Tue, 2 Aug 2011 15:45:52 -0500
Subject: Strange quit behavior
In-Reply-To:
References:
Message-ID: <176653B1-D116-4261-88FD-890EE0AA5307@gmail.com>

Your config has:

    sig = (worker.nr + 1) >= server.worker_processes ? :TERM : :TTOU
    Process.kill(sig, File.read(old_pid).to_i)

I don't have such a conditional in mine, and I use a QUIT signal
(although I don't see why TERM isn't working for you).

    Process.kill("QUIT", File.read(old_pid).to_i)

You might also want to try logging a puts File.read(old_pid) to the
unicorn.stdout.log to ensure the old_pid file is being correctly
written to (and in the right path).

From normalperson at yhbt.net Tue Aug 2 17:53:35 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Tue, 2 Aug 2011 14:53:35 -0700
Subject: Strange quit behavior
In-Reply-To:
References:
Message-ID: <20110802215335.GA11745@dcvr.yhbt.net>

James Cox wrote:
> Hey,
>
> So here are some tasks for managing unicorn:
>
> https://gist.github.com/1121076

Can we ignore the :restart task? It's a bit fragile since it doesn't
wait for the old process to die (SIGTERM means
kill-as-quickly-as-possible, but given a loaded system it can still take
some time).

> I've found that it's very unreliable for quitting / terminating
> unicorn and restarting with new code. When doing rapid deployments
> particularly, i've found that i have to go in and kill -9 the master
> process, and start again.

If you SIGQUIT/SIGTERM before the app is loaded, the signal could
be ignored. This behavior should probably change...

> any thoughts on why it seems ineffective?
>
> Thanks.

Which version of Unicorn are you using? I recall some minor tweaks
to avoid/minimize race conditions over the years so maybe some
are fixed.
> PS: here's the unicorn config:
>
> https://gist.github.com/0dd07c5ad00c56d161c7

Avoid the top piece of the before_fork hook to TTOU workers, it's
needlessly complex for most deployments unless you're really
memory-constrained.

--
Eric Wong

From normalperson at yhbt.net Tue Aug 2 17:54:12 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Tue, 2 Aug 2011 14:54:12 -0700
Subject: Strange quit behavior
In-Reply-To:
References:
Message-ID: <20110802215412.GA12725@dcvr.yhbt.net>

Alex Sharp wrote:
> > I've found that it's very unreliable for quitting / terminating
> > unicorn and restarting with new code. When doing rapid deployments
> > particularly, i've found that i have to go in and kill -9 the master
> > process, and start again.
>
> We've noticed some of this behavior as well. We've seen the new master
> start spinning and consume 100% cpu (according to top). The old master
> and workers are still running and working, but the new master just
> hangs, and we have to kill -9.

Can you try to strace (or equivalent) the old master to see what's using
100% CPU?

--
Eric Wong

From james at imaj.es Tue Aug 2 18:46:44 2011
From: james at imaj.es (James Cox)
Date: Tue, 2 Aug 2011 18:46:44 -0400
Subject: Strange quit behavior
In-Reply-To: <20110802215335.GA11745@dcvr.yhbt.net>
References: <20110802215335.GA11745@dcvr.yhbt.net>
Message-ID:

On Tue, Aug 2, 2011 at 17:53, Eric Wong wrote:
> James Cox wrote:
>> Hey,
>>
>> So here are some tasks for managing unicorn:
>>
>> https://gist.github.com/1121076
>
> Can we ignore the :restart task? It's a bit fragile since it doesn't
> wait for the old process to die (SIGTERM means
> kill-as-quickly-as-possible, but given a loaded system it can still take
> some time).

We mostly restart (what surprise). What pattern works best here for
that? (speed of deploy is important, but definitely assume a long boot
time)

>
>> I've found that it's very unreliable for quitting / terminating
>> unicorn and restarting with new code. When doing rapid deployments
>> particularly, i've found that i have to go in and kill -9 the master
>> process, and start again.
>
> If you SIGQUIT/SIGTERM before the app is loaded, the signal could
> be ignored. This behavior should probably change...
>
>> any thoughts on why it seems ineffective?
>>
>> Thanks.
>
> Which version of Unicorn are you using? I recall some minor tweaks
> to avoid/minimize race conditions over the years so maybe some
> are fixed.
>

gem 'unicorn' - so whatever seems up to date. My lock says 4.0.1

>> PS: here's the unicorn config:
>>
>> https://gist.github.com/0dd07c5ad00c56d161c7
>
> Avoid the top piece of the before_fork hook to TTOU workers, it's
> needlessly complex for most deployments unless you're really
> memory-constrained.
>

So what should that look like? all but that nr-workers stuff? can i
just remove the before fork? and what would you say is a super good
unicorn config to start from?

thanks!
james

From normalperson at yhbt.net Tue Aug 2 19:08:51 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Tue, 2 Aug 2011 16:08:51 -0700
Subject: Strange quit behavior
In-Reply-To:
References: <20110802215335.GA11745@dcvr.yhbt.net>
Message-ID: <20110802230851.GA19989@dcvr.yhbt.net>

James Cox wrote:
> On Tue, Aug 2, 2011 at 17:53, Eric Wong wrote:
> > James Cox wrote:
> >> Hey,
> >>
> >> So here are some tasks for managing unicorn:
> >>
> >> https://gist.github.com/1121076
> >
> > Can we ignore the :restart task? It's a bit fragile since it doesn't
> > wait for the old process to die (SIGTERM means
> > kill-as-quickly-as-possible, but given a loaded system it can still take
> > some time).
>
> We mostly restart (what surprise). What pattern works best here for
> that? (speed of deploy is important, but definitely assume a long boot
> time)

Yeah, if you do a full shutdown, I would definitely wait a little for
the old process to shut down (wait for pid to disappear), first.
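A minimal sketch of the "wait for pid to disappear" step, assuming the
old master's pid has already been read from its pid file. The helper
name wait_for_exit and its timeout/interval defaults are hypothetical
and not part of unicorn:

```ruby
# Hypothetical helper (not part of unicorn): poll until the process
# with the given pid no longer exists, or until the timeout expires.
# Returns true if the process disappeared, false on timeout.
def wait_for_exit(pid, timeout: 30, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      Process.kill(0, pid) # signal 0 only checks that the pid exists
    rescue Errno::ESRCH
      return true          # no such process: it is gone
    rescue Errno::EPERM
      # the process exists but belongs to another user; keep waiting
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

A deploy task could send TERM or QUIT to the old master, call
wait_for_exit(old_pid), and only start the new master when it returns
true. Note that an unreaped child of the calling process still counts
as existing, so this is meant for pids read from a pid file.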
However, full shutdown means you drop connections, so I would use
USR2 followed by QUIT (on same pid, USR2 will replace the current pid
file).

See http://unicorn.bogomips.org/SIGNALS.html

> So what should that look like? all but that nr-workers stuff? can i
> just remove the before fork? and what would you say is a super good
> unicorn config to start from?

Yeah, skip the before_fork and also after_fork. Those are mainly for
disconnect/reconnect of ActiveRecord and anything else that might need a
network connection.

I try to have as little-as-possible in my unicorn config.

--
Eric Wong

From ajsharp at gmail.com Tue Aug 2 19:49:11 2011
From: ajsharp at gmail.com (Alex Sharp)
Date: Tue, 2 Aug 2011 16:49:11 -0700
Subject: Strange quit behavior
In-Reply-To: <20110802230851.GA19989@dcvr.yhbt.net>
References: <20110802215335.GA11745@dcvr.yhbt.net> <20110802230851.GA19989@dcvr.yhbt.net>
Message-ID:

On Tue, Aug 2, 2011 at 4:08 PM, Eric Wong wrote:
> James Cox wrote:
>> So what should that look like? all but that nr-workers stuff? can i
>> just remove the before fork? and what would you say is a super good
>> unicorn config to start from?
>
> Yeah, skip the before_fork and also after_fork. Those are mainly for
> disconnect/reconnect of ActiveRecord and anything else that might need a
> network connection.
FWIW, we use the before_fork hook to automatically kill the old master
by sending it QUIT:

    before_fork do |server, worker|
      old_pid = '/var/www/api/shared/pids/unicorn.pid.oldbin'
      if File.exists?(old_pid) && server.pid != old_pid
        begin
          Process.kill("QUIT", File.read(old_pid).to_i)
        rescue Errno::ENOENT, Errno::ESRCH
          # another newly forked workers has already killed it
        end
      end
    end

- alex

From normalperson at yhbt.net Tue Aug 2 20:34:49 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Wed, 3 Aug 2011 00:34:49 +0000
Subject: Strange quit behavior
In-Reply-To: <20110802215335.GA11745@dcvr.yhbt.net>
References: <20110802215335.GA11745@dcvr.yhbt.net>
Message-ID: <20110803003449.GA8373@dcvr.yhbt.net>

Eric Wong wrote:
> If you SIGQUIT/SIGTERM before the app is loaded, the signal could
> be ignored. This behavior should probably change...

I pushed the following to git://bogomips.org/unicorn

>From 406b8b0e2ed6e5be34d8ec3cd4b16048233c2856 Mon Sep 17 00:00:00 2001
From: Eric Wong
Date: Tue, 2 Aug 2011 23:52:14 +0000
Subject: [PATCH] trap death signals in the worker sooner

This helps close a race condition preventing shutdown if
loading the application (preload_app=false) takes a long
time and the user decides to kil workers instead.
---
 lib/unicorn/http_server.rb |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index ad5d6f0..565f132 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -535,12 +535,17 @@ class Unicorn::HttpServer
       handle_error(client, e)
   end
 
+  EXIT_SIGS = [ :QUIT, :TERM, :INT ]
+  WORKER_QUEUE_SIGS = QUEUE_SIGS - EXIT_SIGS
+
   # gets rid of stuff the worker has no business keeping track of
   # to free some resources and drops all sig handlers.
   # traps for USR1, USR2, and HUP may be set in the after_fork Proc
   # by the user.
   def init_worker_process(worker)
-    QUEUE_SIGS.each { |sig| trap(sig, nil) }
+    # we'll re-trap :QUIT later for graceful shutdown iff we accept clients
+    EXIT_SIGS.each { |sig| trap(sig) { exit!(0) } }
+    WORKER_QUEUE_SIGS.each { |sig| trap(sig, nil) }
     trap(:CHLD, 'DEFAULT')
     SIG_QUEUE.clear
     proc_name "worker[#{worker.nr}]"
@@ -578,7 +583,6 @@ class Unicorn::HttpServer
     # closing anything we IO.select on will raise EBADF
     trap(:USR1) { nr = -65536; SELF_PIPE[0].close rescue nil }
     trap(:QUIT) { worker = nil; LISTENERS.each { |s| s.close rescue nil }.clear }
-    [:TERM, :INT].each { |sig| trap(sig) { exit!(0) } } # instant shutdown
     logger.info "worker=#{worker.nr} ready"
 
     begin
--
Eric Wong

From normalperson at yhbt.net Tue Aug 2 21:36:33 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Wed, 3 Aug 2011 01:36:33 +0000
Subject: Strange quit behavior
In-Reply-To: <20110803003449.GA8373@dcvr.yhbt.net>
References: <20110802215335.GA11745@dcvr.yhbt.net> <20110803003449.GA8373@dcvr.yhbt.net>
Message-ID: <20110803013633.GA9079@dcvr.yhbt.net>

Eric Wong wrote:
> Eric Wong wrote:
> > If you SIGQUIT/SIGTERM before the app is loaded, the signal could
> > be ignored. This behavior should probably change...
>
> I pushed the following to git://bogomips.org/unicorn
>
> From 406b8b0e2ed6e5be34d8ec3cd4b16048233c2856 Mon Sep 17 00:00:00 2001

Also pushed to rubygems.org for easier testing:

    gem install --pre unicorn -v 4.0.1.4.g406b

This will probably be unicorn 4.0.2 shortly.

From ononoma at gmail.com Wed Aug 3 12:41:03 2011
From: ononoma at gmail.com (Tatsuya Ono)
Date: Wed, 3 Aug 2011 17:41:03 +0100
Subject: Unicorn in the jail
Message-ID:

I ran into a problem when sending USR2 signal to master process which
is listening a TCP port. It shows the following error in log.

    err: adding listener failed addr=0.0.0.0:3010 (in use)

I found similar error reported on this mailing list and yes, the host
I got the error was in the FreeBSD jail.
http://rubyforge.org/pipermail/mongrel-unicorn/2009-December/000205.html
http://rubyforge.org/pipermail/mongrel-unicorn/2009-December/000212.html

I dug into the problem and found some interesting thing. I want to
share it and the solution here.

* Why does it happen in the Jail?

When you bind 0.0.0.0 address to the socket on a Jail host, Jail
doesn't bind 0.0.0.0. Instead of it, it binds to the IP address of the
host such as 10.100.1.50, 192.168.1.50, etc.

Then the problem happens in Unicorn::HttpServer#inherit_listeners!
when it gets signal like USR2.

https://github.com/defunkt/unicorn/blob/406b8b0e2ed6e5be34d8ec3cd4b16048233c2856/lib/unicorn/http_server.rb#L703-731

Here you get 10.100.1.50 (for example) as inherited address from
ENV['UNICORN_FD'] but 0.0.0.0. So Unicorn tries to bind to 0.0.0.0
(again) at line 728. Then it fails because it is actually in use.

* How to solve the issue?

Thankfully unicorn.conf is ruby script so that I could write a logic
for Jail hosts without specifying IP address of those hosts (which we
want to avoid since we deploy it multiple servers). I share the
configuration file.

https://gist.github.com/1122965

Then Unicorn flies in the jail happily.

Tatsuya Ono

From ajsharp at gmail.com Wed Aug 3 13:54:35 2011
From: ajsharp at gmail.com (Alex Sharp)
Date: Wed, 3 Aug 2011 10:54:35 -0700
Subject: Question about HUP + USR2 + QUIT
Message-ID:

Currently, on deploys we're sending HUP to the "old" master to reload
the config, and then we send it a USR2 to reload the app code. Is this
redundant -- i.e. does USR2 reload the unicorn config, or just the
application code? I am under the impression that USR2 forks a new
master from the old, and only reloads app code, and not the unicorn
config, but I just wanted to double check. Thanks.
- alex

From normalperson at yhbt.net Wed Aug 3 14:03:21 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Wed, 3 Aug 2011 18:03:21 +0000
Subject: Unicorn in the jail
In-Reply-To:
References:
Message-ID: <20110803180321.GA24722@dcvr.yhbt.net>

Thanks! I've pushed out the following patch and updated the site:

>From ec8a8f32d257290aac377f1c7b1c496e1df75f73 Mon Sep 17 00:00:00 2001
From: Eric Wong
Date: Wed, 3 Aug 2011 11:00:28 -0700
Subject: [PATCH] KNOWN_ISSUES: add link to FreeBSD jail workaround notes

Thanks to Tatsuya Ono on the unicorn mailing list.
---
 KNOWN_ISSUES |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/KNOWN_ISSUES b/KNOWN_ISSUES
index 2bd4151..f323c68 100644
--- a/KNOWN_ISSUES
+++ b/KNOWN_ISSUES
@@ -3,6 +3,9 @@
 Occasionally odd {issues}[link:ISSUES.html] arise without a transparent or
 acceptable solution.  Those issues are documented here.
 
+* Issues with FreeBSD jails can be worked around as documented by Tatsuya Ono:
+  http://mid.gmane.org/CAHBuKRj09FdxAgzsefJWotexw-7JYZGJMtgUp_dhjPz9VbKD6Q at mail.gmail.com
+
 * PRNGs (pseudo-random number generators) loaded before forking
   (e.g. "preload_app true") may need to have their internal state
   reset in the after_fork hook.  Starting with \Unicorn 3.6.1, we
--
Eric Wong

From normalperson at yhbt.net Wed Aug 3 15:22:42 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Wed, 3 Aug 2011 12:22:42 -0700
Subject: Question about HUP + USR2 + QUIT
In-Reply-To:
References:
Message-ID: <20110803192242.GA14121@dcvr.yhbt.net>

Alex Sharp wrote:
> Currently, on deploys we're sending HUP to the "old" master to reload
> the config, and then we send it a USR2 to reload the app code. Is this
> redundant -- i.e. does USR2 reload the unicorn config, or just the
> application code? I am under the impression that USR2 forks a new
> master from the old, and only reloads app code, and not the unicorn
> config, but I just wanted to double check. Thanks.
It's redundant to send HUP before USR2 most of the time. USR2 does a
fork+exec, so it replaces the forked process with a new one and rereads
the config file.

The only reason I can think of to send HUP before USR2 is to update
the path to Unicorn before fork+exec, normally USR2 defaults to using
the same path it started with:

This can be useful for changing between new versions/installations
of Ruby:

    Unicorn::HttpServer::START_CTX[0] = "/home/ew/ruby-1.9.3/bin/unicorn"
    Unicorn::HttpServer::START_CTX[0] = "/home/ew/ruby-1.8.7/bin/unicorn"
    Unicorn::HttpServer::START_CTX[0] = "/home/ew/ruby-trunk/bin/unicorn"
    ..

--
Eric Wong

From ononoma at gmail.com Wed Aug 3 17:57:45 2011
From: ononoma at gmail.com (Tatsuya Ono)
Date: Wed, 3 Aug 2011 22:57:45 +0100
Subject: Unicorn in the jail
In-Reply-To: <20110803180321.GA24722@dcvr.yhbt.net>
References: <20110803180321.GA24722@dcvr.yhbt.net>
Message-ID:

Thank you, Eric!

Tatsuya

On 3 August 2011 19:03, Eric Wong wrote:
>
> Thanks! I've pushed out the following patch and updated the site:
>
> >From ec8a8f32d257290aac377f1c7b1c496e1df75f73 Mon Sep 17 00:00:00 2001
> From: Eric Wong
> Date: Wed, 3 Aug 2011 11:00:28 -0700
> Subject: [PATCH] KNOWN_ISSUES: add link to FreeBSD jail workaround notes
>
> Thanks to Tatsuya Ono on the unicorn mailing list.
> ---
>  KNOWN_ISSUES |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/KNOWN_ISSUES b/KNOWN_ISSUES
> index 2bd4151..f323c68 100644
> --- a/KNOWN_ISSUES
> +++ b/KNOWN_ISSUES
> @@ -3,6 +3,9 @@
>  Occasionally odd {issues}[link:ISSUES.html] arise without a transparent or
>  acceptable solution.  Those issues are documented here.
>
> +* Issues with FreeBSD jails can be worked around as documented by Tatsuya Ono:
> +  http://mid.gmane.org/CAHBuKRj09FdxAgzsefJWotexw-7JYZGJMtgUp_dhjPz9VbKD6Q at mail.gmail.com
> +
>  * PRNGs (pseudo-random number generators) loaded before forking
>    (e.g. "preload_app true") may need to have their internal state
>    reset in the after_fork hook.  Starting with \Unicorn 3.6.1, we
> --
> Eric Wong
> _______________________________________________
> Unicorn mailing list - mongrel-unicorn at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-unicorn
> Do not quote signatures (like this one) or top post when replying

From ajsharp at gmail.com Wed Aug 3 18:22:29 2011
From: ajsharp at gmail.com (Alex Sharp)
Date: Wed, 3 Aug 2011 15:22:29 -0700
Subject: Question about HUP + USR2 + QUIT
In-Reply-To: <20110803192242.GA14121@dcvr.yhbt.net>
References: <20110803192242.GA14121@dcvr.yhbt.net>
Message-ID:

Fantastic, thanks for clearing this up.

- Alex

From ajsharp at gmail.com Fri Aug 5 00:09:25 2011
From: ajsharp at gmail.com (Alex Sharp)
Date: Thu, 4 Aug 2011 21:09:25 -0700
Subject: Strange quit behavior
In-Reply-To: <20110802215412.GA12725@dcvr.yhbt.net>
References: <20110802215412.GA12725@dcvr.yhbt.net>
Message-ID:

On Tue, Aug 2, 2011 at 2:54 PM, Eric Wong wrote:
> Can you try to strace (or equivalent) the old master to see what's using
> 100% CPU?
>

All I see is a whole lot of this:

    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0

- alex

From ajsharp at gmail.com Fri Aug 5 00:12:52 2011
From: ajsharp at gmail.com (Alex Sharp)
Date: Thu, 4 Aug 2011 21:12:52 -0700
Subject: Strange quit behavior
In-Reply-To:
References: <20110802215412.GA12725@dcvr.yhbt.net>
Message-ID:

On Thu, Aug 4, 2011 at 9:09 PM, Alex Sharp wrote:
> On Tue, Aug 2, 2011 at 2:54 PM, Eric Wong wrote:
>> Can you try to strace (or equivalent) the old master to see what's using
>> 100% CPU?
>>
>
> All I see is a whole lot of this:
>
> sched_yield()                           = 0
> sched_yield()                           = 0
> sched_yield()                           = 0
> sched_yield()                           = 0
> sched_yield()                           = 0
> sched_yield()                           = 0
> sched_yield()                           = 0
> sched_yield()                           = 0

Actually, my fault -- the last email was the output of new master.
Running strace on the old master shows the following:

    select(4, [3], NULL, NULL, {13, 466686}) = 0 (Timeout)
    wait4(-1, 0x7fff57e7bfcc, WNOHANG, NULL) = 0
    clock_gettime(CLOCK_REALTIME, {1312517345, 425411398}) = 0
    fstat(9, {st_mode=S_IFREG, st_size=0, ...}) = 0
    clock_gettime(CLOCK_REALTIME, {1312517345, 425625251}) = 0
    fstat(11, {st_mode=S_IFREG, st_size=0, ...}) = 0
    clock_gettime(CLOCK_REALTIME, {1312517345, 425779281}) = 0
    fstat(12, {st_mode=S_IFREG, st_size=0, ...}) = 0
    clock_gettime(CLOCK_REALTIME, {1312517345, 425927762}) = 0

The first line was when the master was idle, and then I threw a few
requests at it.

- alex

From normalperson at yhbt.net Fri Aug 5 04:07:29 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Fri, 5 Aug 2011 08:07:29 +0000
Subject: Strange quit behavior
In-Reply-To:
References: <20110802215412.GA12725@dcvr.yhbt.net>
Message-ID: <20110805080729.GA6602@dcvr.yhbt.net>

Alex Sharp wrote:
> On Thu, Aug 4, 2011 at 9:09 PM, Alex Sharp wrote:
> > On Tue, Aug 2, 2011 at 2:54 PM, Eric Wong wrote:
> >> Can you try to strace (or equivalent) the old master to see what's using
> >> 100% CPU?
> >>
>
> Actually, my fault -- the last email was the output of new master.
> Running strace on the old master shows the following:

Is this Unicorn 3.x? Which 3.x version exactly? Maybe give 4.0.1 or
the 4.0.2 git prerelease a try, too.
> select(4, [3], NULL, NULL, {13, 466686}) = 0 (Timeout)
> wait4(-1, 0x7fff57e7bfcc, WNOHANG, NULL) = 0
> clock_gettime(CLOCK_REALTIME, {1312517345, 425411398}) = 0
> fstat(9, {st_mode=S_IFREG, st_size=0, ...}) = 0
> clock_gettime(CLOCK_REALTIME, {1312517345, 425625251}) = 0
> fstat(11, {st_mode=S_IFREG, st_size=0, ...}) = 0
> clock_gettime(CLOCK_REALTIME, {1312517345, 425779281}) = 0
> fstat(12, {st_mode=S_IFREG, st_size=0, ...}) = 0
> clock_gettime(CLOCK_REALTIME, {1312517345, 425927762}) = 0

Can I get more? (until the next select() call, at least). I'd like to
see the timeout argument that gets passed to the next select().

If you see select() with very small timeout, use "strace -v" to get the
st_ctime from fstat()s... I could have a math bug in
murder_lazy_workers (please read/review the logic in that method, I
haven't noticed issues myself).

I made some tweaks to the master sleep strategy within the past year to
reduce wakeups during idle hours. This is intended to go with Ruby
1.9.3 which will have /much/ better thread wakeup strategy that reduces
power consumption during idle times.

> The first line was when the master was idle, and then I threw a few
> requests at it.

Are all workers responding as expected and not dying?

--
Eric Wong

From jstorimer at gmail.com Mon Aug 8 12:19:17 2011
From: jstorimer at gmail.com (Jesse Storimer)
Date: Mon, 8 Aug 2011 12:19:17 -0400
Subject: What happens when a client terminates a connection?
Message-ID: <20110808161915.GA49811@jessebook-2.local>

I've been trying to understand what happens in Unicorn when a client
terminates a connection, and nginx logs a 499 response code.

In my debugging this can happen if the client is on a flaky connection,
or if they double-click a form submit button, the first request is
terminated and nginx logs a 499 response code.

It seems that in this case the Rails app actually aborts the request,
wherever it is in the course of it.
The issue I ran into is that my app made a destructive request to an
external service in the context of a request, but the client
disconnected before the app was able to respond. So the external
service returned its response but the request was aborted before the
app was able to commit its transaction to the database, confusion
ensued.

Can you confirm that this is actually what happens in Unicorn when the
client disconnects? I'm not seeing anything in the logs to indicate the
actual behaviour.

In dealing with this I'm thinking about turning on
proxy_ignore_client_abort
(http://wiki.nginx.org/HttpProxyModule#proxy_ignore_client_abort) so
that requests that make it to the Rails app aren't aborted. Does anyone
have experience with this? I can see it causing its own sorts of
confusion.

Jesse

From normalperson at yhbt.net Mon Aug 8 15:28:24 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Mon, 8 Aug 2011 19:28:24 +0000
Subject: What happens when a client terminates a connection?
In-Reply-To: <20110808161915.GA49811@jessebook-2.local>
References: <20110808161915.GA49811@jessebook-2.local>
Message-ID: <20110808192824.GA5759@dcvr.yhbt.net>

Jesse Storimer wrote:
> I've been trying to understand what happens in Unicorn when a client
> terminates a connection, and nginx logs a 499 response code.

We'd have to read the nginx sources to answer what nginx does, but
of course I can answer what Unicorn does off the top of my head.

> In my debugging this can happen if the client is on a flaky connection,
> or if they double-click a form submit button, the first request is
> terminated and nginx logs a 499 response code.

(the snipped paragraph deserves independent observation/attention)

> Can you confirm that this is actually what happens in Unicorn when the
> client disconnects? I'm not seeing anything in the logs to indicate the
> actual behaviour.

It depends on when exactly the client (nginx) disconnect is detected.
Unicorn has 4 distinct states :

1) reading headers, if a client disconnects before it has written _all_
   of its request headers, the Rack app will never be called.

   Since no application logic fired at this point.

2) inside Rack dispatch (rack.input reading)
   This will abort the Rack application dispatch if your client
   disconnects before _all_ of the request body is sent. Unlike
   most servers, Unicorn lazily reads any request bodies.

   You can catch exceptions from env["rack.input"].{read,gets,each}
   to detect this.

   The Unicorn::PrereadInput middleware can minimize the time window for
   this state by reading the request body ASAP.

   You can also ignore this if your app isn't handling
   requests with bodies (POST/PUT), but since you mentioned form
   input...

3) inside Rack dispatch (after rack.input reading)
   Your app has no way of knowing your client disconnected at
   this stage. You can hack Unicorn to IO.select in a separate
   thread, but there'll always be exposed windows leading up to
   4) so it's not worth it...

4) writing the response: Unicorn will abort whenever a socket
   error is detected. Keep in mind that every single part of the
   Rack response array can be dynamically generated by the app.
   Your application can still be "running" even though the Rack app
   has returned its response for Unicorn to start writing.

   Clients/Rack middleware can be written to detect this in the
   response body "close" method by checking if body.each completed.

> In dealing with this I'm thinking about turning on
> proxy_ignore_client_abort
> (http://wiki.nginx.org/HttpProxyModule#proxy_ignore_client_abort) so
> that requests that make it to the Rails
> app aren't aborted. Does anyone have experience with this? I can see it
> causing its own sorts of confusion.

I've never used it.

--
Eric Wong

From normalperson at yhbt.net Mon Aug 8 16:13:27 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Mon, 8 Aug 2011 20:13:27 +0000
Subject: What happens when a client terminates a connection?
In-Reply-To: <20110808161915.GA49811@jessebook-2.local>
References: <20110808161915.GA49811@jessebook-2.local>
Message-ID: <20110808201327.GB7188@dcvr.yhbt.net>

Jesse Storimer wrote:
> It seems that in this case the Rails app actually aborts the request,
> wherever it is in the course of it. The issue I ran into is that my app
> made a destructive request to an external service in
> the context of a request, but the client disconnected before the app was
> able to respond. So the external service returned its response but the
> request was aborted before the app was able
> to commit its transaction to the database, confusion ensued.

You should make this request to the external service and wait for this
response inside state 3) as described in my other reply[1].

If you're affected by state 2), you should minimize the time in state 2)
by using PrereadInput middleware to jump to state 3) (and you'll avoid
making the external request entirely if the preread input fails due to
the client aborting).

Your goal is to minimize the time spent processing a non-idempotent
request/parts-of-the-request and isolate where a client can fail (and
maximize the time where they can gracefully recover). If possible, the
request you make to the external service should be idempotent, but it
doesn't seem to be...

I haven't encountered this problem myself in many years, but here's what
I did in the past for similar problems:

Call the external service asynchronously and look at the various
background job/queue libraries available to handle this. The client
should probably auto-refresh on a page (idempotently) until the async
request is complete.

I would even hold off on starting your external request until the
client has hit the auto-refresh page. This way you know the client has
started the refresh cycle and you can fake the idempotency of the
external request on your app side.
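A minimal sketch of the preread idea behind moving from state 2) to
state 3), in plain Ruby with no unicorn dependency. PrereadSketch is a
hypothetical name used only for illustration, this is not unicorn's
own implementation, and it assumes a client abort surfaces as an
exception raised from rack.input reads:

```ruby
require "stringio"

# Hypothetical middleware (illustration only): drain rack.input before
# dispatching, so a client abort during the upload raises here and the
# app, including any external request it makes, is never reached.
class PrereadSketch
  def initialize(app)
    @app = app
  end

  def call(env)
    input = env["rack.input"]
    if input.respond_to?(:rewind)
      buf = String.new
      # read until EOF; an aborted upload is assumed to raise from read
      true while input.read(16_384, buf)
      input.rewind # let the app read the body again from the start
    end
    @app.call(env)
  end
end

# Usage with a stand-in app and an in-memory body:
app = lambda { |env| [200, {}, [env["rack.input"].read]] }
status, _headers, body = PrereadSketch.new(app).call("rack.input" => StringIO.new("a=1"))
```

With unicorn itself, the bundled Unicorn::PrereadInput middleware
mentioned above would be used in config.ru instead of a hand-rolled
class like this one.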
[1] - http://mid.gmane.org/20110808192824.GA5759 at dcvr.yhbt.net

--
Eric Wong

From jstorimer at gmail.com Mon Aug 8 17:17:50 2011
From: jstorimer at gmail.com (Jesse Storimer)
Date: Mon, 8 Aug 2011 17:17:50 -0400
Subject: What happens when a client terminates a connection?
In-Reply-To: <20110808193252.GA7188@dcvr.yhbt.net>
References: <20110808193252.GA7188@dcvr.yhbt.net>
Message-ID: <20110808211748.GB49840@jessebook-2.local>

On Mon, Aug 08, 2011 at 07:32:52PM +0000, Eric Wong wrote:
> (not sure if you're subscribed or not, actually, I don't see you in
> mailman...)
>
> ----- Forwarded message from Eric Wong -----
>
> From: Eric Wong
> To: unicorn list
> Subject: Re: What happens when a client terminates a connection?
>
> Jesse Storimer wrote:
> > I've been trying to understand what happens in Unicorn when a client
> > terminates a connection, and nginx logs a 499 response code.
>
> We'd have to read the nginx sources to answer what nginx does, but
> of course I can answer what Unicorn does off the top of my head.
>
> > In my debugging this can happen if the client is on a flaky connection,
> > or if they double-click a form submit button, the first request is
> > terminated and nginx logs a 499 response code.
>
> (the snipped paragraph deserves independent observation/attention)
>
> > Can you confirm that this is actually what happens in Unicorn when the
> > client disconnects? I'm not seeing anything in the logs to indicate the
> > actual behaviour.
>
> It depends on when exactly the client (nginx) disconnect is detected.
> Unicorn has 4 distinct states :
>
> 1) reading headers, if a client disconnects before it has written _all_
>    of its request headers, the Rack app will never be called.
>
>    Since no application logic fired at this point.
>
> 2) inside Rack dispatch (rack.input reading)
>    This will abort the Rack application dispatch if your client
>    disconnects before _all_ of the request body is sent. Unlike
>    most servers, Unicorn lazily reads any request bodies.
>
>    You can catch exceptions from env["rack.input"].{read,gets,each}
>    to detect this.
>
>    The Unicorn::PrereadInput middleware can minimize the time window for
>    this state by reading the request body ASAP.
>
>    You can also ignore this if your app isn't handling
>    requests with bodies (POST/PUT), but since you mentioned form
>    input...
>
> 3) inside Rack dispatch (after rack.input reading)
>    Your app has no way of knowing your client disconnected at
>    this stage. You can hack Unicorn to IO.select in a separate
>    thread, but there'll always be exposed windows leading up to
>    4) so it's not worth it...
>
> 4) writing the response: Unicorn will abort whenever a socket
>    error is detected. Keep in mind that every single part of the
>    Rack response array can be dynamically generated by the app.
>    Your application can still be "running" even though the Rack app
>    has returned its response for Unicorn to start writing.
>
>    Clients/Rack middleware can be written to detect this in the
>    response body "close" method by checking if body.each completed.
>
> > In dealing with this I'm thinking about turning on
> > proxy_ignore_client_abort
> > (http://wiki.nginx.org/HttpProxyModule#proxy_ignore_client_abort) so
> > that requests that make it to the Rails
> > app aren't aborted. Does anyone have experience with this? I can see it
> > causing its own sorts of confusion.
>
> I've never used it.
>
> --
> Eric Wong
>
> ----- End forwarded message -----
>
> --
> Eric Wong

Thanks for that explanation. Just so I understand, once the Rack application
enters 3) then it should be unaffected by a client disconnect, or any
socket error? I'll definitely give PrereadInput a try in that case.

When you say that Unicorn lazily reads request bodies, do you mean that
my Rails application might already be in the middle of processing the
request but Unicorn is still reading from the client socket?
At that point won't Rails have read all of the request body? Or does that only apply if I stick a middleware at the front of the stack that does all the heavy lifting? From normalperson at yhbt.net Mon Aug 8 17:47:29 2011 From: normalperson at yhbt.net (Eric Wong) Date: Mon, 8 Aug 2011 14:47:29 -0700 Subject: What happens when a client terminates a connection? In-Reply-To: <20110808211748.GB49840@jessebook-2.local> References: <20110808193252.GA7188@dcvr.yhbt.net> <20110808211748.GB49840@jessebook-2.local> Message-ID: <20110808214729.GA4419@dcvr.yhbt.net> Jesse Storimer wrote: > Eric Wong wrote: > > Unicorn has 4 distinct states : > > 3) inside Rack dispatch (after rack.input reading) > > Your app has no way of knowing your client disconnected at > > this stage. You can hack Unicorn to IO.select in a separate > > thread, but there'll always be exposed windows leading up to > > 4) so it's not worth it... > > > > 4) writing the response: Unicorn will abort whenever a socket > > error is detected. Keep in mind that every single part of the > > Rack response array can be dynamically generated by the app. > > Your application can still be "running" even though the Rack app > > has returned its response for Unicorn to start writing. > > > > Clients/Rack middleware can be written to detect this in the > > response body "close" method by checking if body.each completed. > > Thanks for that explanation. Just so I understand, once the Rack application > enters 3) then it should be unaffected by a client disconnect, or any > socket error? I'll definitely give PrereadInput a try in that case. Yes, however I don't think I made it clear that 3) will /always/ transition to state 4). So you'll be able to use body.close to detect a client write failure. > When you say that Unicorn lazily reads request bodies, do you mean that > my Rails application might already be in the middle of processing the > request but Unicorn is still reading from the client socket? Yes. 
We use http://unicorn.bogomips.org/Unicorn::TeeInput.html by default so it gives the Rack app a chance to reject a client if it sees something it doesnt like > At that point won't Rails have read all of the request body? Or does > that only apply if I stick a middleware at the front of the stack that > does all the heavy lifting? I'm not certain about middlewares. Rails may attempt to read all input ASAP anyways depending on the request Content-Type/Encoding. Check the Rails/Rack sources for your versions of Rails/Rack to be sure. -- Eric Wong From joe at tanga.com Fri Aug 12 00:39:25 2011 From: joe at tanga.com (Joe Van Dyk) Date: Thu, 11 Aug 2011 21:39:25 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn Message-ID: Has anyone seen anything like this before? I can get it to happen all the time if I issue a HEAD request, but it only happens very intermittently on GET requests. I'm using Ruby 1.9.2p180. Any ideas on where to start debugging? 204.93.223.151, 10.195.114.81 - - [11/Aug/2011 21:03:50] "GET / HTTP/1.0" 200 37902 0.5316 app error: Content-Length header was 37902, but should be 0 (Rack::Lint::LintError) /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:19:in `assert' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:501:in `verify_content_length' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:525:in `each' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_response.rb:41:in `http_response_write' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:526:in `process_client' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:585:in `worker_loop' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/newrelic_rpm-3.0.1/lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:12:in `call' 
/mnt/data/tanga/current/bundler/ruby/1.9.1/gems/newrelic_rpm-3.0.1/lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:12:in `block (4 levels) in ' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:475:in `spawn_missing_workers' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:135:in `start' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/bin/unicorn:121:in `' /data/tanga/current/sites/tanga/bin/unicorn:16:in `load' /data/tanga/current/sites/tanga/bin/unicorn:16:in `
' From normalperson at yhbt.net Fri Aug 12 01:42:52 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 12 Aug 2011 05:42:52 +0000 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: Message-ID: <20110812054252.GA30507@dcvr.yhbt.net> Joe Van Dyk wrote: > Has anyone seen anything like this before? I can get it to happen all > the time if I issue a HEAD request, but it only happens very > intermittently on GET requests. > > I'm using Ruby 1.9.2p180. > > Any ideas on where to start debugging? What web framework and other middlewares are you running? Are you using Rack::Head to generate HEAD responses or something else? > 204.93.223.151, 10.195.114.81 - - [11/Aug/2011 21:03:50] "GET / > HTTP/1.0" 200 37902 0.5316 > app error: Content-Length header was 37902, but should be 0 > (Rack::Lint::LintError) > /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:19:in > `assert' > /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:501:in > `verify_content_length' Looking at the 1.2.3 rack/lint.rb code, it should've set @head_request to true when env["REQUEST_METHOD"] == "HEAD" (rack/lint.rb line 56). Do you happen to have any middlewares that might rewrite REQUEST_METHOD? I would edit rack/lint.rb and put some print statements to show the value of @head_request and env["REQUEST_METHOD"] -- Eric Wong From joe at tanga.com Fri Aug 12 14:09:36 2011 From: joe at tanga.com (Joe Van Dyk) Date: Fri, 12 Aug 2011 11:09:36 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: <20110812054252.GA30507@dcvr.yhbt.net> References: <20110812054252.GA30507@dcvr.yhbt.net> Message-ID: On Thu, Aug 11, 2011 at 10:42 PM, Eric Wong wrote: > Joe Van Dyk wrote: >> Has anyone seen anything like this before? ?I can get it to happen all >> the time if I issue a HEAD request, but it only happens very >> intermittently on GET requests. >> >> I'm using Ruby 1.9.2p180. 
>> >> Any ideas on where to start debugging? > > What web framework and other middlewares are you running? ?Are you using Rack::Head to > generate HEAD responses or something else? Rails 3.0. I'm not doing anything special with HEAD requests. I actually don't care about the HEAD requests -- you'll note that I'm doing a GET request below. The Content-Length is 37902 (which is should be), but apparently there's nothing in the response body? >> 204.93.223.151, 10.195.114.81 - - [11/Aug/2011 21:03:50] "GET / >> HTTP/1.0" 200 37902 0.5316 >> app error: Content-Length header was 37902, but should be 0 >> (Rack::Lint::LintError) >> /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:19:in >> `assert' >> /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:501:in >> `verify_content_length' > > Looking at the 1.2.3 rack/lint.rb code, it should've set @head_request to true > when env["REQUEST_METHOD"] == "HEAD" (rack/lint.rb line 56). > Do you happen to have any middlewares that might rewrite REQUEST_METHOD? > > I would edit rack/lint.rb and put some print statements to show the > value of @head_request and env["REQUEST_METHOD"] > > -- > Eric Wong > _______________________________________________ > Unicorn mailing list - mongrel-unicorn at rubyforge.org > http://rubyforge.org/mailman/listinfo/mongrel-unicorn > Do not quote signatures (like this one) or top post when replying > From joe at tanga.com Fri Aug 12 15:22:32 2011 From: joe at tanga.com (Joe Van Dyk) Date: Fri, 12 Aug 2011 12:22:32 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> Message-ID: On Fri, Aug 12, 2011 at 11:09 AM, Joe Van Dyk wrote: > On Thu, Aug 11, 2011 at 10:42 PM, Eric Wong wrote: >> Joe Van Dyk wrote: >>> Has anyone seen anything like this before? 
?I can get it to happen all >>> the time if I issue a HEAD request, but it only happens very >>> intermittently on GET requests. >>> >>> I'm using Ruby 1.9.2p180. >>> >>> Any ideas on where to start debugging? >> >> What web framework and other middlewares are you running? ?Are you using Rack::Head to >> generate HEAD responses or something else? > > Rails 3.0. ?I'm not doing anything special with HEAD requests. ?I > actually don't care about the HEAD requests -- you'll note that I'm > doing a GET request below. ?The Content-Length is 37902 (which is > should be), but apparently there's nothing in the response body? Here's another example, the body is apparently empty, but the content-length is set on a 302 request. Unfortunately, I can't reproduce the problem consistently. :( I'd expect the body to not be 0 here. 216.14.95.14, 10.195.114.81 - - [12/Aug/2011 12:00:46] "GET /deals/current/good_till_gone HTTP/1.0" 302 195 0.0148 app error: Content-Length header was 195, but should be 0 (Rack::Lint::LintError) /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:19:in `assert' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:501:in `verify_content_length' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:525:in `each' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_response.rb:41:in `http_response_write' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:526:in `process_client' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:585:in `worker_loop' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/newrelic_rpm-3.0.1/lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:12:in `call' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/newrelic_rpm-3.0.1/lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:12:in `block (4 levels) in ' 
/mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:475:in `spawn_missing_workers' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/lib/unicorn/http_server.rb:135:in `start' /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/unicorn-4.0.1/bin/unicorn:121:in `' /data/tanga/current/sites/tanga/bin/unicorn:16:in `load' /data/tanga/current/sites/tanga/bin/unicorn:16:in `
' > >>> 204.93.223.151, 10.195.114.81 - - [11/Aug/2011 21:03:50] "GET / >>> HTTP/1.0" 200 37902 0.5316 >>> app error: Content-Length header was 37902, but should be 0 >>> (Rack::Lint::LintError) >>> /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:19:in >>> `assert' >>> /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:501:in >>> `verify_content_length' >> >> Looking at the 1.2.3 rack/lint.rb code, it should've set @head_request to true >> when env["REQUEST_METHOD"] == "HEAD" (rack/lint.rb line 56). >> Do you happen to have any middlewares that might rewrite REQUEST_METHOD? >> >> I would edit rack/lint.rb and put some print statements to show the >> value of @head_request and env["REQUEST_METHOD"] >> >> -- >> Eric Wong >> _______________________________________________ >> Unicorn mailing list - mongrel-unicorn at rubyforge.org >> http://rubyforge.org/mailman/listinfo/mongrel-unicorn >> Do not quote signatures (like this one) or top post when replying >> > From normalperson at yhbt.net Fri Aug 12 18:36:15 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 12 Aug 2011 15:36:15 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> Message-ID: <20110812223615.GA30885@dcvr.yhbt.net> Joe Van Dyk wrote: > /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/newrelic_rpm-3.0.1/lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:12:in > `call' Does disabling newrelic help? I heard it might not work with Unicorn 4.x, yet. Maybe someone else has more experience with this, I don't rely on non-Free APIs or services like new relic, ever. 
-- Eric Wong From joe at tanga.com Sat Aug 13 00:09:34 2011 From: joe at tanga.com (Joe Van Dyk) Date: Fri, 12 Aug 2011 21:09:34 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: <20110812054252.GA30507@dcvr.yhbt.net> References: <20110812054252.GA30507@dcvr.yhbt.net> Message-ID: On Thu, Aug 11, 2011 at 10:42 PM, Eric Wong wrote: > Joe Van Dyk wrote: >> Has anyone seen anything like this before? ?I can get it to happen all >> the time if I issue a HEAD request, but it only happens very >> intermittently on GET requests. >> >> I'm using Ruby 1.9.2p180. >> >> Any ideas on where to start debugging? > > What web framework and other middlewares are you running? ?Are you using Rack::Head to > generate HEAD responses or something else? Does unicorn use Rack::Lint by default? What about Rack::ShowBacktrace? From normalperson at yhbt.net Sat Aug 13 00:51:55 2011 From: normalperson at yhbt.net (Eric Wong) Date: Sat, 13 Aug 2011 04:51:55 +0000 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> Message-ID: <20110813045155.GA11550@dcvr.yhbt.net> Joe Van Dyk wrote: > On Thu, Aug 11, 2011 at 10:42 PM, Eric Wong wrote: > > Joe Van Dyk wrote: > >> Has anyone seen anything like this before? ?I can get it to happen all > >> the time if I issue a HEAD request, but it only happens very > >> intermittently on GET requests. > >> > >> I'm using Ruby 1.9.2p180. > >> > >> Any ideas on where to start debugging? > > > > What web framework and other middlewares are you running? ?Are you using Rack::Head to > > generate HEAD responses or something else? > > Does unicorn use Rack::Lint by default? What about Rack::ShowBacktrace? Yes with RACK_ENV=development, it loads Rack::CommonLogger, Rack::ShowExceptions and Rack::Lint. This stack should match what "rackup" loads in development. Btw, did you notice my other reply? 
I forgot to Cc: you and I'm not sure if you're subscribed or not (if via gmane): http://mid.gmane.org/20110812223615.GA30885 at dcvr.yhbt.net -- Eric Wong From joe at tanga.com Sat Aug 13 03:24:33 2011 From: joe at tanga.com (Joe Van Dyk) Date: Sat, 13 Aug 2011 00:24:33 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: <20110813045155.GA11550@dcvr.yhbt.net> References: <20110812054252.GA30507@dcvr.yhbt.net> <20110813045155.GA11550@dcvr.yhbt.net> Message-ID: On Fri, Aug 12, 2011 at 9:51 PM, Eric Wong wrote: > Joe Van Dyk wrote: >> On Thu, Aug 11, 2011 at 10:42 PM, Eric Wong wrote: >> > Joe Van Dyk wrote: >> >> Has anyone seen anything like this before? ?I can get it to happen all >> >> the time if I issue a HEAD request, but it only happens very >> >> intermittently on GET requests. >> >> >> >> I'm using Ruby 1.9.2p180. >> >> >> >> Any ideas on where to start debugging? >> > >> > What web framework and other middlewares are you running? ?Are you using Rack::Head to >> > generate HEAD responses or something else? >> >> Does unicorn use Rack::Lint by default? ?What about Rack::ShowBacktrace? > > Yes with RACK_ENV=development, it loads Rack::CommonLogger, > Rack::ShowExceptions and Rack::Lint. ?This stack should match what > "rackup" loads in development. > > Btw, did you notice my other reply? ?I forgot to Cc: you and I'm not > sure if you're subscribed or not (if via gmane): > ?http://mid.gmane.org/20110812223615.GA30885 at dcvr.yhbt.net I thought I subscribed to the mailing list last night, but I'm not getting any emails. I just tried to subscribe (by email and by web) and still haven't received the confirmation email. Disabling newrelic had no effect on the bug. 
Joe From joe at tanga.com Sat Aug 13 03:29:05 2011 From: joe at tanga.com (Joe Van Dyk) Date: Sat, 13 Aug 2011 00:29:05 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: <20110813045155.GA11550@dcvr.yhbt.net> References: <20110812054252.GA30507@dcvr.yhbt.net> <20110813045155.GA11550@dcvr.yhbt.net> Message-ID: On Fri, Aug 12, 2011 at 9:51 PM, Eric Wong wrote: > Joe Van Dyk wrote: >> On Thu, Aug 11, 2011 at 10:42 PM, Eric Wong wrote: >> > Joe Van Dyk wrote: >> >> Has anyone seen anything like this before? ?I can get it to happen all >> >> the time if I issue a HEAD request, but it only happens very >> >> intermittently on GET requests. >> >> >> >> I'm using Ruby 1.9.2p180. >> >> >> >> Any ideas on where to start debugging? >> > >> > What web framework and other middlewares are you running? ?Are you using Rack::Head to >> > generate HEAD responses or something else? >> >> Does unicorn use Rack::Lint by default? ?What about Rack::ShowBacktrace? > > Yes with RACK_ENV=development, it loads Rack::CommonLogger, > Rack::ShowExceptions and Rack::Lint. ?This stack should match what > "rackup" loads in development. Hm -- RACK_ENV is set to 'production'. I don't see Rack::Lint or Rack::ShowExceptions when I do 'rake middleware' on the servers that exhibit the bug. Joe From normalperson at yhbt.net Sat Aug 13 04:17:03 2011 From: normalperson at yhbt.net (Eric Wong) Date: Sat, 13 Aug 2011 08:17:03 +0000 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> <20110813045155.GA11550@dcvr.yhbt.net> Message-ID: <20110813081703.GA13967@dcvr.yhbt.net> Joe Van Dyk wrote: > On Fri, Aug 12, 2011 at 9:51 PM, Eric Wong wrote: > > Joe Van Dyk wrote: > >> On Thu, Aug 11, 2011 at 10:42 PM, Eric Wong wrote: > >> > What web framework and other middlewares are you running? ?Are you using Rack::Head to > >> > generate HEAD responses or something else? 
> >> > >> Does unicorn use Rack::Lint by default? ?What about Rack::ShowBacktrace? > > > > Yes with RACK_ENV=development, it loads Rack::CommonLogger, > > Rack::ShowExceptions and Rack::Lint. ?This stack should match what > > "rackup" loads in development. > > Hm -- RACK_ENV is set to 'production'. I don't see Rack::Lint or > Rack::ShowExceptions when I do 'rake middleware' on the servers that > exhibit the bug. "rake middleware" won't show what the Rack server (Unicorn in this case) loads automatically. If you can, share what "rake middleware" shows[1] Also, are you sure it's RACK_ENV or could it be RAILS_ENV? If you're sure it's RACK_ENV=production, then there's no way it should load Rack::Lint behind your back... Can you share your command-line for launching unicorn? Are you using "unicorn" or "unicorn_rails"? The latter was designed for Rails 1/2.x, the former may work better with Rails 3 but really both should work... [1] Ideally, you'd be able to share the full source + configs of your app, but I'm not going to push you to violate NDAs and such. If you can put up an isolated test case that reproduces this problem reliably, that'd be awesome, too. -- Eric Wong From normalperson at yhbt.net Sat Aug 13 04:29:16 2011 From: normalperson at yhbt.net (Eric Wong) Date: Sat, 13 Aug 2011 08:29:16 +0000 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> <20110813045155.GA11550@dcvr.yhbt.net> Message-ID: <20110813082916.GB13967@dcvr.yhbt.net> Joe Van Dyk wrote: > I thought I subscribed to the mailing list last night, but I'm not > getting any emails. I just tried to subscribe (by email and by web) > and still haven't received the confirmation email. I just saw your subscription. I don't like requiring subscriptions to post, though. I'm used to lists where it's customary/expected to Cc: all recipients, but the Ruby world seems to favor[1] subscription-required mailing lists, however... 
(ruby-talk/ruby-core/...)

> Disabling newrelic had no effect on the bug.

Also, is there anything else you use that might hook directly into
Unicorn internals? Something really feels "off" about the errors you're
getting from Rack::Lint.

You're not running anything super-exotic and I'm sure you're not the
only Rails 3.0 + Ruby 1.9.2 user around. Do you have any other
Rails/Rack applications, or have you tried other application servers
(WEBrick/Passenger/Thin/...)?

[1] - Sure it seems to favor non-Free operating systems and services,
      too, but I draw the line there. I am, after all, a Free
      Software-only hippie :>

-- 
Eric Wong
From serg at podtynnyi.com Tue Aug 16 11:45:09 2011
From: serg at podtynnyi.com (Serg Podtynnyi)
Date: Tue, 16 Aug 2011 19:45:09 +0400
Subject: Unicorn logging in production env
Message-ID:

Hi All, as I can see on
https://github.com/defunkt/unicorn/blob/master/lib/unicorn.rb#L53
the Rack::CommonLogger used only in development env and in weird env
called "deployment".
Any chance to add "production" to this case?

Serg Podtynnyi
From normalperson at yhbt.net Tue Aug 16 13:30:34 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Tue, 16 Aug 2011 17:30:34 +0000
Subject: Unicorn logging in production env
In-Reply-To:
References:
Message-ID: <20110816173034.GA4785@dcvr.yhbt.net>

Serg Podtynnyi wrote:
> Hi All, as I can see on
> https://github.com/defunkt/unicorn/blob/master/lib/unicorn.rb#L53
> the Rack::CommonLogger used only in development env and in weird env
> called "deployment".
> Any chance to add "production" to this case?

Not unless Rack does it, Rack doesn't recognize "production" in any
special way.

If you really want it, you can add Rack::CommonLogger to config.ru
yourself.

Since I was never satisfied with the Rack::CommonLogger logging format,
I wrote clogger[1] which offers nginx-like formatting options.
[1] http://clogger.rubyforge.org/ - configurable logger for Rack -- Eric Wong From jfieber at adobe.com Tue Aug 16 18:17:36 2011 From: jfieber at adobe.com (John Fieber) Date: Tue, 16 Aug 2011 15:17:36 -0700 Subject: Unicorn logging in production env In-Reply-To: References: Message-ID: <6010D71F-43CE-4D71-82C5-3C2AB2FD897B@adobe.com> On Aug 16, 2011, at 8:45 AM, Serg Podtynnyi wrote: > Hi All, as I can see on > https://github.com/defunkt/unicorn/blob/master/lib/unicorn.rb#L53 > the Rack::CommonLogger used only in development env and in weird evn > called "deployment". > Any chance to add "production" to this case? My wish would be to remove all of the magically added middlewares. Leave the middleware assembly to config.ru. Failing that, my next wish would be to have the magic opt-in. Failing that, my next wish would be for an easy opt-out. It is maddening to debug an error in your app that causes the ShowExceptions middleware to crash, masking your actual problem. -john From normalperson at yhbt.net Tue Aug 16 20:45:03 2011 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 16 Aug 2011 17:45:03 -0700 Subject: Unicorn logging in production env In-Reply-To: <6010D71F-43CE-4D71-82C5-3C2AB2FD897B@adobe.com> References: <6010D71F-43CE-4D71-82C5-3C2AB2FD897B@adobe.com> Message-ID: <20110817004503.GA2793@dcvr.yhbt.net> John Fieber wrote: > On Aug 16, 2011, at 8:45 AM, Serg Podtynnyi wrote: > > Hi All, as I can see on > > https://github.com/defunkt/unicorn/blob/master/lib/unicorn.rb#L53 > > the Rack::CommonLogger used only in development env and in weird evn > > called "deployment". > > Any chance to add "production" to this case? > > My wish would be to remove all of the magically added middlewares. > Leave the middleware assembly to config.ru. If you can convince Rack upstream, Unicorn will follow. I dislike magic middlewares, too, but I try to follow Rack to minimize the differences for people using unicorn (and also switching between "rackup" and unicorn). 
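For anyone who would rather spell the middleware stack out than have it
picked by RACK_ENV, a config.ru along these lines is one way to do it.
This is only a sketch: YourApp is a placeholder for the real Rack app,
and it assumes the server is started with a RACK_ENV value that adds
nothing automatically (such as "none").

```ruby
# config.ru (sketch; YourApp is a placeholder, not a real class)
require "rack"

# Log every request explicitly, whatever RACK_ENV is set to.
use Rack::CommonLogger, $stderr

run YourApp.new
```

With the stack written out like this, switching between "rackup" and
unicorn should not change which middlewares are loaded.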
> Failing that, my next wish would be to have the magic opt-in. Failing
> that, my next wish would be for an easy opt-out. It is maddening to
> debug an error in your app that causes the ShowExceptions middleware
> to crash, masking your actual problem.

I always set RACK_ENV=none for any services I run so nothing gets added
automatically.

-- 
Eric Wong
From ajsharp at gmail.com Wed Aug 17 00:26:28 2011
From: ajsharp at gmail.com (Alex Sharp)
Date: Tue, 16 Aug 2011 21:26:28 -0700
Subject: Strange quit behavior
In-Reply-To: <20110805080729.GA6602@dcvr.yhbt.net>
References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net>
Message-ID:

On Fri, Aug 5, 2011 at 1:07 AM, Eric Wong wrote:
> Is this Unicorn 3.x? Which 3.x version exactly? Maybe give 4.0.1
> or the 4.0.2 git prerelease a try, too.

We're using version 3.6.2 with ruby 1.9.2-p180 on ubuntu 11.x. I know
the kernel on this version of ubuntu has a known signal trapping bug,
but I don't think that's what's happening here. The processes respond
after I send the restart signal to them again (USR2 + QUIT).

> Can I get more? (until the next select() call, at least). I'd like to
> see the timeout argument that gets passed to the next select().

Here's some more output. This is from the old master though, not a new
master that is pegging the CPU. In this instance it's almost like the
old master just ignores the signal.
select(4, [3], NULL, NULL, {8, 129984}) = 0 (Timeout) wait4(-1, 0x7fff16b82e4c, WNOHANG, NULL) = 0 clock_gettime(CLOCK_REALTIME, {1313554942, 259408364}) = 0 fstat(12, {st_dev=makedev(202, 1), st_ino=20373, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/12-23:14:20, st_mtime=2011/08/12-23:14:20, st_ctime=2011/08/17-04:22:21}) = 0 clock_gettime(CLOCK_REALTIME, {1313554942, 259775504}) = 0 fstat(13, {st_dev=makedev(202, 1), st_ino=20381, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/12-23:14:20, st_mtime=2011/08/12-23:14:20, st_ctime=2011/08/17-04:22:17}) = 0 clock_gettime(CLOCK_REALTIME, {1313554942, 259936816}) = 0 fstat(14, {st_dev=makedev(202, 1), st_ino=20382, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/12-23:14:21, st_mtime=2011/08/12-23:14:21, st_ctime=2011/08/17-04:22:19}) = 0 clock_gettime(CLOCK_REALTIME, {1313554942, 260086380}) = 0 fstat(15, {st_dev=makedev(202, 1), st_ino=20185, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/12-23:14:21, st_mtime=2011/08/12-23:14:21, st_ctime=2011/08/17-04:22:17}) = 0 clock_gettime(CLOCK_REALTIME, {1313554942, 260235797}) = 0 fstat(16, {st_dev=makedev(202, 1), st_ino=20255, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/12-23:14:21, st_mtime=2011/08/12-23:14:21, st_ctime=2011/08/17-04:22:19}) = 0 clock_gettime(CLOCK_REALTIME, {1313554942, 260384849}) = 0 fstat(17, {st_dev=makedev(202, 1), st_ino=20383, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/12-23:14:21, st_mtime=2011/08/12-23:14:21, st_ctime=2011/08/17-04:22:19}) = 0 clock_gettime(CLOCK_REALTIME, {1313554942, 260534792}) = 0 fstat(18, {st_dev=makedev(202, 1), 
st_ino=20384, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/12-23:14:21, st_mtime=2011/08/12-23:14:21, st_ctime=2011/08/17-04:22:19}) = 0
clock_gettime(CLOCK_REALTIME, {1313554942, 260684278}) = 0
fstat(10, {st_dev=makedev(202, 1), st_ino=20196, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/16-16:33:46, st_mtime=2011/08/16-16:33:46, st_ctime=2011/08/17-04:22:19}) = 0
clock_gettime(CLOCK_REALTIME, {1313554942, 260833725}) = 0
select(4, [3], NULL, NULL, {8, 976047}

> If you see select() with very small timeout, use "strace -v" to get the
> st_ctime from fstat()s...
>
> I could have a math bug in murder_lazy_workers (please read/review the
> logic in that method, I haven't noticed issues myself).
>
> I made some tweaks to the master sleep strategy within the past year to
> reduce wakeups during idle hours. This is intended to go with Ruby
> 1.9.3 which will have /much/ better thread wakeup strategy that reduces
> power consumption during idle times.
>
>> The first line was when the master was idle, and then I threw a few
>> requests at it.
>
> Are all workers responding as expected and not dying?

The old workers appear to be serving requests.

- Alex
From serg at podtynnyi.com Wed Aug 17 05:21:39 2011
From: serg at podtynnyi.com (Serg Podtynnyi)
Date: Wed, 17 Aug 2011 13:21:39 +0400
Subject: Unicorn logging in production env
In-Reply-To: <20110816173034.GA4785@dcvr.yhbt.net>
References: <20110816173034.GA4785@dcvr.yhbt.net>
Message-ID:

On Tue, Aug 16, 2011 at 9:30 PM, Eric Wong wrote:
> Serg Podtynnyi wrote:
>> Hi All, as I can see on
>> https://github.com/defunkt/unicorn/blob/master/lib/unicorn.rb#L53
>> the Rack::CommonLogger used only in development env and in weird env
>> called "deployment".
>> Any chance to add "production" to this case?
>
> Not unless Rack does it, Rack doesn't recognize "production" in any
> special way.
>
> If you really want it, you can add Rack::CommonLogger to config.ru
> yourself.
>
> Since I was never satisfied with the Rack::CommonLogger logging format,
> I wrote clogger[1] which offers nginx-like formatting options.
>
> [1] http://clogger.rubyforge.org/ - configurable logger for Rack
>

I am asking about these default values because by convention the
production env is 99% "production", but in the code I see "deployment",
so most of the people that are using unicorn have bogus logs on
production. Thanks for the clogger suggestion.

PS
Any idea of how to have some kind of useful information when the master
process kills workers by timeout?
We just monkey patched the Unicorn::HttpParser read method like this
https://gist.github.com/1151164 to get more info. Is there a better
way to do this?
From normalperson at yhbt.net Wed Aug 17 05:22:52 2011
From: normalperson at yhbt.net (Eric Wong)
Date: Wed, 17 Aug 2011 09:22:52 +0000
Subject: Strange quit behavior
In-Reply-To:
References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net>
Message-ID: <20110817092252.GA7186@dcvr.yhbt.net>

Alex Sharp wrote:
> On Fri, Aug 5, 2011 at 1:07 AM, Eric Wong wrote:
> > Can I get more? (until the next select() call, at least). I'd like to
> > see the timeout argument that gets passed to the next select().
>
> Here's some more output. This is from the old master though, not a new
> master that is pegging the CPU. In this instance it's almost like the
> old master just ignores the signal.

Wait, weren't you trying to track down a problem with the master that's
pegging the CPU? Isn't the CPU usage the problem here?

I was also thinking the CPU usage in the new master was caused by
constant worker respawning because bundler wasn't loaded correctly
somehow....

> select(4, [3], NULL, NULL, {8, 129984}) = 0 (Timeout)

> select(4, [3], NULL, NULL, {8, 976047}

OK, so /this/ master is confirmed to be sleeping with a reasonable
timeout and not pegging the CPU...
If you want to actually track down whether or not a signal is being delivered, use "strace -f -e '!futex'" since Ruby 1.9 has a dedicated signal handling thread (at the OS level). (You only need the -e '!futex' part to filter out the noise from the signal handling thread in <= 1.9.2, 1.9.3 is much quieter :) Below is a proposed patch (to unicorn.git) which may help debug issues in the signal -> handler master path (but only once it enters the Ruby runtime). I'm a hesitant to commit it since it worthless if the Ruby process is stuck because of some bad C extension. That's the most common cause of stuck/unresponsive processes I've seen. Subject: [PATCH] http_server: add debug statements for master sig handlers This should help users track down what's going on as soon as Ruby can process the signal. It's still not going to be useful if the Ruby process is stuck because of a bug in a C extension or Ruby itself, though. Unicorn always defers signals in the master process to serialize signal handling since some of the important actions (e.g. HUP, USR1, USR2) are NOT reentrant. --- lib/unicorn/http_server.rb | 13 +++++++++++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index aa8212e..b45d6d6 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -126,8 +126,17 @@ class Unicorn::HttpServer # setup signal handlers before writing pid file in case people get # trigger happy and send signals as soon as the pid file exists. 
# Note that signals don't actually get handled until the #join method - QUEUE_SIGS.each { |sig| trap(sig) { SIG_QUEUE << sig; awaken_master } } - trap(:CHLD) { awaken_master } + QUEUE_SIGS.each do |sig| + trap(sig) do + @logger.debug("received SIG#{sig}") + SIG_QUEUE << sig + awaken_master + end + end + trap(:CHLD) do + @logger.debug("received SIGCHLD") + awaken_master + end self.pid = config[:pid] self.master_pid = $$ -- Eric Wong From normalperson at yhbt.net Wed Aug 17 05:58:35 2011 From: normalperson at yhbt.net (Eric Wong) Date: Wed, 17 Aug 2011 09:58:35 +0000 Subject: Unicorn logging in production env In-Reply-To: References: <20110816173034.GA4785@dcvr.yhbt.net> Message-ID: <20110817095835.GA13605@dcvr.yhbt.net> Serg Podtynnyi wrote: > I am asking about this default values because by the convention > production env is 99% "production", but in the code I see > "deployment", so most of the people that are using unicorn have bogus > logs on production. Thanks for clogger suggestion. Yeah, as I explained in the other post, I've always tried to base as many defaults off Rack defaults so people have less trouble switching from/to Unicorn. I'm quite happy with how clogger works, hopefully we find more happy users :) (I don't like relying on unicorn to promote clogger, though. clogger should be able to stand on its own since it works with any Rack server). > Any idea of how to have some kind of useful information when master > process kills workers by timeout? It's not possible to know what a worker is doing since SIGKILL isn't trappable. The master killing workers should only be a last resort that'll let you limp by until you can have a proper fix. The rest of the app should have local timeouts for all non-deterministic calls. > We just monkey patched Unicorn::HttpParser read method like this > https://gist.github.com/1151164 to get more info. Is there a better > way to do this? 
There isn't any difference between that and having Rack middleware at the top of your stack (clogger logs at the end, and if you have things like $body_bytes_sent it even wraps the response body):

class LogBefore < Struct.new(:app)
  def call(env)
    # log the path before dispatching (assumes env["rack.logger"] is set)
    env["rack.logger"].debug env["REQUEST_PATH"]
    app.call(env) # pass the request down the stack and return its response
  end
end

--------- config.ru ---------
use LogBefore
use ...
use ...
use ...
run YourApp.new
-----------------------------
-- Eric Wong From normalperson at yhbt.net Wed Aug 17 06:04:17 2011 From: normalperson at yhbt.net (Eric Wong) Date: Wed, 17 Aug 2011 10:04:17 +0000 Subject: Unicorn logging in production env In-Reply-To: <20110817095835.GA13605@dcvr.yhbt.net> References: <20110816173034.GA4785@dcvr.yhbt.net> <20110817095835.GA13605@dcvr.yhbt.net> Message-ID: <20110817100417.GB13605@dcvr.yhbt.net> Eric Wong wrote: > The the rest of the app should have local timeouts for all > non-deterministic calls. If you're lazy, maybe the following middleware works, too. I haven't ever needed it, but wrote (and later rewrote) it as a proof-of-concept: http://bogomips.org/rainbows.git/tree/lib/rainbows/thread_timeout.rb It should work with bare Unicorn if you remove the "Rainbows" references and the negative threshold handling in "initialize". If somebody wants I can split it into its own gem (should work with WEBrick, Mongrel, Passenger, and maybe Thin, too). Unlike the Timeout library in Ruby stdlib, this doesn't allow nested timeout {} calls so it's simpler/possibly-safer in this regard... 
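For readers wondering what a "local timeout" on a non-deterministic call can look like, here is a minimal sketch using Net::HTTP from the Ruby standard library. The helper name, the host name and the timeout values are invented for illustration and do not come from unicorn or Rainbows!; treat it as one possible shape, not a recommendation from this thread.

```ruby
require "net/http"

# Hypothetical helper (not part of unicorn): build an HTTP client whose
# connect and read phases give up on their own, so a slow backend raises
# inside the worker well before the master's kill timeout is reached.
def http_with_local_timeouts(host, port = 80, open_timeout = 2, read_timeout = 5)
  http = Net::HTTP.new(host, port)  # only builds the object, no connection yet
  http.open_timeout = open_timeout  # seconds allowed for establishing the connection
  http.read_timeout = read_timeout  # seconds allowed for each read from the socket
  http
end

# Placeholder host; nothing is contacted until a request is made.
client = http_with_local_timeouts("backend.example.invalid")
```

The same idea applies to database drivers and other network clients, most of which expose their own timeout settings; the values should stay comfortably below the unicorn worker timeout.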
I even posted a request-for-review for the middleware on ruby-talk but didn't get any responses: http://mid.gmane.org/20110421232615.GA27468 at dcvr.yhbt.net -- Eric Wong From normalperson at yhbt.net Wed Aug 17 16:13:23 2011 From: normalperson at yhbt.net (Eric Wong) Date: Wed, 17 Aug 2011 13:13:23 -0700 Subject: Strange quit behavior In-Reply-To: <20110817092252.GA7186@dcvr.yhbt.net> References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> Message-ID: <20110817201323.GA24581@dcvr.yhbt.net> Eric Wong wrote: > Below is a proposed patch (to unicorn.git) which may help debug issues > in the signal -> handler master path (but only once it enters the Ruby > runtime). I'm a hesitant to commit it since it worthless if the Ruby > process is stuck because of some bad C extension. That's the most > common cause of stuck/unresponsive processes I've seen. I think that was a bad patch: adding signal handler debugging at the Ruby layer leads to the false assumption that the interpreter/VM is in a good state. If you need to debug signal handlers, something is already broken and tracing syscalls is the most reliable way to go. Ruby (and any other high-level language) signal handling is not straightforward[1]. Here's how things work in Matz Ruby 1.9.x[2]:

you             C timer thread                Ruby Thread(s)
-------------------------------------------------------------------
                traps signals                 ignores most signals
                sleeps                        runs Ruby...
kill -USR2 ...
                receives signal (async)
                runs (system) sighandler[1]
                wakes up from sleep
                signals Ruby Thread(s)
                                              *hopefully wakes up*
                                              runs Ruby sighandler

The "*hopefully wakes up*" part is the part most likely to fail as a result of a bad C extension or Ruby bug. PS. In Ruby 1.9.3, the timer thread uses the "self-pipe" sighandler implementation that the unicorn master process always used. This allows Ruby 1.9.3 to conserve power on idle processes. 
In 1.9.2, the timer thread signal handler just polls in 10ms intervals to check if any signals were received. This is why "strace -f" is noisy and I recommend "-e '!futex'" for 1.9.2. PPS. Unicorn still uses the "self-pipe" signal handler in Ruby-land because Ruby signal handlers can be entered reentrantly and so must only execute reentrant-safe code. Without the self-pipe to serialize the signal handler dispatch, the Ruby signal handler execution can nest and overlap execution with itself. This means if USR2 is sent multiple times in short succession, you could spawn multiple new unicorn masters. [1] - See "man 7 signal" in Linux manpages or POSIX specs for the small list of safe functions that may be called in system-level sighandlers. Ruby-level signal handlers can't run inside system-level signal handlers for this reason. [2] - I think any high-level language that implements signal handlers AND native threads must do something similar. The only valid variation I can think of is to execute the high-level language code inside the timer thread, but that requires the coders of the high-level language to have thread-safety (not just reentrancy) in mind when writing signal handlers even if the rest of their code uses no threads. -- Eric Wong From ajsharp at gmail.com Thu Aug 18 19:13:19 2011 From: ajsharp at gmail.com (Alex Sharp) Date: Thu, 18 Aug 2011 16:13:19 -0700 Subject: Strange quit behavior In-Reply-To: <20110817201323.GA24581@dcvr.yhbt.net> References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> Message-ID: Ok, thanks a lot for the patch attempt and subsequent in-depth explanation. I know this thread is long-lived and a bit confusing, mostly because I've seen two different sets of scenarios that yield the same result, which is that unicorn does not restart. 
The first scenario (and the one that started this thread) was that unicorn actually receives the signal, spawns a new master, and that process pegs the cpu and spins into eternity. The second scenario is that the old master seems to completely ignore the USR2 signal the first time, and when sent again, properly restarts. In both scenarios the old master and workers continue to serve the old code. I thought that setting the Unicorn::HttpServer::START_CTX[0] in my unicorn config had solved the first scenario (pegged cpu), but I found a new occurrence of it today, and I now have some new strace output for this scenario (strace -v): select(4, [3], NULL, NULL, {60, 727229}) = 0 (Timeout) wait4(-1, 0x7ffffd70f72c, WNOHANG, NULL) = 0 clock_gettime(CLOCK_REALTIME, {1313708081, 720757662}) = 0 fstat(8, {st_dev=makedev(202, 1), st_ino=39806, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10, st_ctime=2011/08/18-22:54:06}) = 0 clock_gettime(CLOCK_REALTIME, {1313708081, 721131305}) = 0 fstat(10, {st_dev=makedev(202, 1), st_ino=45370, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10, st_ctime=2011/08/18-22:54:05}) = 0 clock_gettime(CLOCK_REALTIME, {1313708081, 721290800}) = 0 select(4, [3], NULL, NULL, {565, 34005}) = 0 (Timeout) wait4(-1, 0x7ffffd70f72c, WNOHANG, NULL) = 0 clock_gettime(CLOCK_REALTIME, {1313708646, 853870540}) = 0 fstat(8, {st_dev=makedev(202, 1), st_ino=39806, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10, st_ctime=2011/08/18-22:59:06}) = 0 clock_gettime(CLOCK_REALTIME, {1313708646, 854102750}) = 0 fstat(10, {st_dev=makedev(202, 1), st_ino=45370, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, 
st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10, st_ctime=2011/08/18-23:04:05}) = 0 clock_gettime(CLOCK_REALTIME, {1313708646, 854260655}) = 0 select(4, [3], NULL, NULL, {598, 630876} With respect to the second scenario (ignoring signals), I'm going to recommend we table that issue for now, as we're currently running on a version of ubuntu (11.10) which has a known signal trapping bug with ruby 1.9.2-p180, and downgrading to 10.04 may solve that problem. Granted, when I've observed this in the past with other libraries, the process becomes completely unresponsive, whereas the behavior in unicorn is more intermittent. Either way, we are in the process of downgrading our servers to ubuntu 10.04, so let's not waste anymore time trying to debug something that may well be occurring due to a kernel bug. If we continue to see this after the downgrade to 10.04, I'll report back and we can keep digging. Again, my apologies for the confusion between the two scenarios, and thanks for all your help. - Alex From normalperson at yhbt.net Thu Aug 18 21:53:12 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 19 Aug 2011 01:53:12 +0000 Subject: Strange quit behavior In-Reply-To: References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> Message-ID: <20110819015312.GA29005@dcvr.yhbt.net> Alex Sharp wrote: > Ok, thanks a lot for the for the patch attempt and subsequent in-depth > explanation. I know this thread is long-lived and a bit confusing, > mostly because I've seen two different sets of scenarios that yield > the same result, which is that unicorn does not restart. 
No problem, I'm here to help and mutt makes it easy to follow long threads :> > I thought that setting the Unicorn::HttpServer::START_CTX[0] in my > unicorn config had solved the first scenario (pegged cpu), but I found > a new occurrence of it today, and I now have some new strace output > for this scenario (strace -v): > > select(4, [3], NULL, NULL, {60, 727229}) = 0 (Timeout) > wait4(-1, 0x7ffffd70f72c, WNOHANG, NULL) = 0 > clock_gettime(CLOCK_REALTIME, {1313708081, 720757662}) = 0 > fstat(8, {st_dev=makedev(202, 1), st_ino=39806, st_mode=S_IFREG, > st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, > st_size=0, st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10, > st_ctime=2011/08/18-22:54:06}) = 0 > clock_gettime(CLOCK_REALTIME, {1313708081, 721131305}) = 0 > fstat(10, {st_dev=makedev(202, 1), st_ino=45370, st_mode=S_IFREG, > st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, > st_size=0, st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10, > st_ctime=2011/08/18-22:54:05}) = 0 > clock_gettime(CLOCK_REALTIME, {1313708081, 721290800}) = 0 > select(4, [3], NULL, NULL, {565, 34005}) = 0 (Timeout) OK, this looks good from the system call level in the main thread of the master process: I can see that from select() timing out despite being called with long intervals. I don't see workers dying, either... Can you add "-f -e '!futex'" to the strace invocation so we see all the threads? You can also add '-T' to get timing information to see how long each syscall takes or '-tt' to get timestamps in strace if you think it's useful. Since the main thread doesn't appear to be doing much, it's probably a child thread somewhere... Also, I assume you're using preload_app? If you are, do you see this issue with preload_app=false? I suspect there's some background thread that could be running in the master process taking up all the CPU time. Unicorn itself never spawns background threads. 
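One way to check for such a background thread is to list the threads that are alive in the process, for example from a before_fork hook while preload_app is enabled. The following is only a sketch using core Ruby; the helper name is hypothetical and nothing in it is unicorn API.

```ruby
# Hypothetical helper: return every live thread except the main one.
# When called in the master after the app has been preloaded, anything
# listed here was started by the application or one of its gems.
def background_threads
  Thread.list.reject { |t| t == Thread.main }
end

# Demonstration: start a sleeping thread and observe that it is reported.
worker = Thread.new { sleep }
found = background_threads
worker.kill
worker.join
```

Printing `background_threads.map(&:inspect)` to the unicorn stderr log shows where each thread was created on recent Ruby versions, which can help identify the gem responsible.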
Basically anything you can tell us about the app, the configuration, and which gems/libraries it uses would be useful. > With respect to the second scenario (ignoring signals), I'm going to > recommend we table that issue for now, as we're currently running on a > version of ubuntu (11.10) which has a known signal trapping bug with > ruby 1.9.2-p180, and downgrading to 10.04 may solve that problem. > Granted, when I've observed this in the past with other libraries, the > process becomes completely unresponsive, whereas the behavior in > unicorn is more intermittent. Either way, we are in the process of > downgrading our servers to ubuntu 10.04, so let's not waste anymore > time trying to debug something that may well be occurring due to a > kernel bug. If we continue to see this after the downgrade to 10.04, > I'll report back and we can keep digging. OK. I wouldn't rule out the CPU usage as unrelated to the signaling issue, though. The Ruby 1.9 timer thread could be going crazy because it can't signal or receive signals properly... -- Eric Wong From normalperson at yhbt.net Thu Aug 18 22:23:16 2011 From: normalperson at yhbt.net (Eric Wong) Date: Thu, 18 Aug 2011 19:23:16 -0700 Subject: unicorn BYOE for bundler users Message-ID: <20110819022316.GA2951@dcvr.yhbt.net> BYOE = Bring Your Own Executable I just realized this, maybe it's useful and worth trying out if "bundle exec unicorn" is giving you problems. unicorn would be specified in your Gemfile:

-------------------------- 8< ---------------------------------
#!/home/ew/r/trunk/bin/ruby
# replace the above shebang with wherever your Ruby executable is
# installed.
require "rubygems"
require "bundler/setup"

# Gem.bin_path(gem_name, executable_name, *version_requirements)
load Gem.bin_path("unicorn", "unicorn")
---------------------------------------------------------------

The above file should be executable and be checked-in with your application source tree (e.g. 
"script/unicorn-byoe") This also allows you to switch between different Ruby installs by modifying the shebang. Isolate users can do something similar... -- Eric Wong From ajsharp at gmail.com Fri Aug 19 05:42:43 2011 From: ajsharp at gmail.com (Alex Sharp) Date: Fri, 19 Aug 2011 02:42:43 -0700 Subject: Strange quit behavior In-Reply-To: <20110819015312.GA29005@dcvr.yhbt.net> References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> <20110819015312.GA29005@dcvr.yhbt.net> Message-ID: On Thu, Aug 18, 2011 at 6:53 PM, Eric Wong wrote: > ?I can see that from select() timing out despite being called with > ?long intervals. ?I don't see workers dying, either... That's correct, the old workers are staying alive as well. > Can you add "-f -e '!futex'" to the strace invocation so we see > all the threads? ?You can also add '-T' to get timing information > to see how long each syscall takes or '-tt' to get timestamps > in strace if you think it's useful. > > Also, I assume you're using preload_app? ?If you are, do you see this > issue with preload_app=false? ?I suspect there's some background thread > that could be running in the master process taking up all the CPU time. > Unicorn itself never spawns background threads. Yes, we're using preload_app=true. I haven't tried it with preload_app=false -- if I get to it tomorrow I'll report back here. We *are* using newrelic, which operates in a background thread. I've just found this ticket in the newrelic support forums: https://support.newrelic.com/help/discussions/support/7692-newrelic_rpm-gem-with-unicorn-40. I'll re-run strace with the suggested flags above and report back. > Basically anything you can tell use about the app, the configuration, > and which gems/libraries would be useful. Gemfile: https://gist.github.com/05a9445471ad7edfdcb7 > OK. 
I wouldn't rule out the CPU usage as unrelated to the signaling > issue, though. The Ruby 1.9 timer thread could be going crazy > because it can't signal or receive signals properly... -- alex sharp github.com/ajsharp twitter.com/ajsharp alexjsharp.com From normalperson at yhbt.net Fri Aug 19 12:51:01 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 19 Aug 2011 16:51:01 +0000 Subject: Strange quit behavior In-Reply-To: References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> <20110819015312.GA29005@dcvr.yhbt.net> Message-ID: <20110819165101.GA15542@dcvr.yhbt.net> Alex Sharp wrote: > On Thu, Aug 18, 2011 at 6:53 PM, Eric Wong wrote: > > Basically anything you can tell use about the app, the configuration, > > and which gems/libraries would be useful. > > Gemfile: https://gist.github.com/05a9445471ad7edfdcb7 OK, out of the few gems I recognize, I don't see anything else besides newrelic_rpm that might run a background thread. I would scan through the rest of them to be sure (and maybe other folks here can give you pointers). Personally, I never depend on anything before I've had a chance to scan through the code and know what it does, first. -- Eric Wong From normalperson at yhbt.net Fri Aug 19 17:07:49 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 19 Aug 2011 21:07:49 +0000 Subject: [PATCH] Rack::Chunked and ContentLength middlewares by default Message-ID: <20110819210749.GA6809@dcvr.yhbt.net> This was prompted by the recent thread in http://thread.gmane.org/gmane.comp.lang.ruby.unicorn.general/1106 starting with Message-ID: CAL3dLFrkDix=L-STUnrxy7Wuc4wDZOb05NLbG6HABvFJNGmnmQ at mail.gmail.com Rainbows! 
users have also run into this problem a few times, too: AANLkTimSuK7-ihgCa00D-fot4U-FbZt2diQFndyN4uit at mail.gmail.com 2B157204-E5C6-4F5D-98A9-E2A79F9F9765 at christophsturm.com >From 1077961a3f8933c65d39c7e6c9ed6ff3b6b53647 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Fri, 19 Aug 2011 20:47:29 +0000 Subject: [PATCH] Rack::Chunked and ContentLength middlewares by default This is needed to match the behavior of Rack::Server for RACK_ENV=(deployment|development), actually. This won't affect users of other RACK_ENV values. This change has minor performance consequences, so users negatively affected should set RACK_ENV to "none" instead for full control of their middleware stack. This mainly affects Rainbows!/Zbatery users since they have persistent connections and /need/ Content-Length or Transfer-Encoding:chunked headers. --- lib/unicorn.rb | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/lib/unicorn.rb b/lib/unicorn.rb index bb5409b..b882ce3 100644 --- a/lib/unicorn.rb +++ b/lib/unicorn.rb @@ -50,9 +50,14 @@ module Unicorn pp({ :inner_app => inner_app }) if $DEBUG # return value, matches rackup defaults based on env + # Unicorn does not support persistent connections, but Rainbows! + # and Zbatery both do. Users accustomed to the Rack::Server default + # middlewares will need ContentLength/Chunked middlewares. 
case ENV["RACK_ENV"] when "development" Rack::Builder.new do + use Rack::ContentLength + use Rack::Chunked use Rack::CommonLogger, $stderr use Rack::ShowExceptions use Rack::Lint @@ -60,6 +65,8 @@ module Unicorn end.to_app when "deployment" Rack::Builder.new do + use Rack::ContentLength + use Rack::Chunked use Rack::CommonLogger, $stderr run inner_app end.to_app -- Eric Wong From normalperson at yhbt.net Fri Aug 19 18:18:44 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 19 Aug 2011 22:18:44 +0000 Subject: Strange quit behavior In-Reply-To: <20110803003449.GA8373@dcvr.yhbt.net> References: <20110802215335.GA11745@dcvr.yhbt.net> <20110803003449.GA8373@dcvr.yhbt.net> Message-ID: <20110819221844.GA519@dcvr.yhbt.net> Eric Wong wrote: > Eric Wong wrote: > > If you SIGQUIT/SIGTERM before the app is loaded, the signal could > > be ignored. This behavior should probably change... > > I pushed the following to git://bogomips.org/unicorn > > From 406b8b0e2ed6e5be34d8ec3cd4b16048233c2856 Mon Sep 17 00:00:00 2001 > From: Eric Wong > Date: Tue, 2 Aug 2011 23:52:14 +0000 > Subject: [PATCH] trap death signals in the worker sooner > > This helps close a race condition preventing shutdown if > loading the application (preload_app=false) takes a long > time and the user decides to kil workers instead. The following completely eliminates the race condition: >From 8de6ab371c1623669b86a5dfa8703c8fd539011f Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Fri, 19 Aug 2011 22:13:04 +0000 Subject: [PATCH] close race if an exit signal hits the worker before trap The signal handler from the master is still active and will push the pending signal to SIG_QUEUE if a worker receives a signal immediately after forking. 
--- lib/unicorn/http_server.rb | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index aa8212e..4f516c9 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -549,6 +549,7 @@ class Unicorn::HttpServer def init_worker_process(worker) # we'll re-trap :QUIT later for graceful shutdown iff we accept clients EXIT_SIGS.each { |sig| trap(sig) { exit!(0) } } + exit!(0) if (SIG_QUEUE & EXIT_SIGS)[0] WORKER_QUEUE_SIGS.each { |sig| trap(sig, nil) } trap(:CHLD, 'DEFAULT') SIG_QUEUE.clear -- Eric Wong From ajsharp at gmail.com Fri Aug 19 18:46:28 2011 From: ajsharp at gmail.com (Alex Sharp) Date: Fri, 19 Aug 2011 15:46:28 -0700 Subject: USR2 + QUIT sometimes interrupts socket connections Message-ID: I've seen this happen a few times during deploys/restarts, and I just got a fresh backtrace, so I thought I'd bring this up. The exception message I get is: Mongo::ConnectionFailure: Failed to connect to host xxx.yyy.zzz.com and port 12345: Interrupted system call - connect(2) Basically, it looks like unicorn is interrupting connect(2) calls when an underlying library is making a socket connection. Backtrace: https://gist.github.com/1158216 Mongo pool.rb: https://github.com/mongodb/mongo-ruby-driver/blob/1.3.1/lib/mongo/util/pool.rb I don't know if this is a unicorn thing or a Mongo implementation problem, but I'm pretty sure I've seen this happen before with Net::HTTP. I'll try to locate that error. 
-- alex sharp github.com/ajsharp twitter.com/ajsharp alexjsharp.com From normalperson at yhbt.net Fri Aug 19 19:47:37 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 19 Aug 2011 16:47:37 -0700 Subject: USR2 + QUIT sometimes interrupts socket connections In-Reply-To: References: Message-ID: <20110819234737.GA20869@dcvr.yhbt.net> Alex Sharp wrote: > The exception message I get is: Mongo::ConnectionFailure: Failed to > connect to host xxx.yyy.zzz.com and port 12345: Interrupted system > call - connect(2) > I don't know if this is a unicorn thing or a Mongo implementation > problem, but I'm pretty sure I've seen this happen before with > Net::HTTP. I'll try to locate that error. This is a bug fixed in Ruby 1.9.2-p290 (r31829 in the ruby_1_9_2 branch). -- Eric Wong From cliftonk at gmail.com Fri Aug 19 19:49:41 2011 From: cliftonk at gmail.com (cliftonk at gmail.com) Date: Fri, 19 Aug 2011 18:49:41 -0500 Subject: USR2 + QUIT sometimes interrupts socket connections In-Reply-To: References: Message-ID: <95034CB8-FFB2-4A77-A87F-E9DD47F36A11@gmail.com> On Aug 19, 2011, at 5:46 PM, Alex Sharp wrote: > The exception message I get is: Mongo::ConnectionFailure: Failed to > connect to host xxx.yyy.zzz.com and port 12345: Interrupted system > call - connect(2) I've been experiencing the same issue in production with the fauna/memcached gem, which is backed by the libmemcached C library. From ajsharp at gmail.com Fri Aug 19 20:04:03 2011 From: ajsharp at gmail.com (Alex Sharp) Date: Fri, 19 Aug 2011 17:04:03 -0700 Subject: USR2 + QUIT sometimes interrupts socket connections In-Reply-To: <20110819234737.GA20869@dcvr.yhbt.net> References: <20110819234737.GA20869@dcvr.yhbt.net> Message-ID: > This is a bug fixed in Ruby 1.9.2-p290 (r31829 in the ruby_1_9_2 branch). Great, thanks much. 
- alex From normalperson at yhbt.net Fri Aug 19 20:11:08 2011 From: normalperson at yhbt.net (Eric Wong) Date: Fri, 19 Aug 2011 17:11:08 -0700 Subject: USR2 + QUIT sometimes interrupts socket connections In-Reply-To: <95034CB8-FFB2-4A77-A87F-E9DD47F36A11@gmail.com> References: <95034CB8-FFB2-4A77-A87F-E9DD47F36A11@gmail.com> Message-ID: <20110820001108.GA22342@dcvr.yhbt.net> cliftonk at gmail.com wrote: > On Aug 19, 2011, at 5:46 PM, Alex Sharp wrote: > > The exception message I get is: Mongo::ConnectionFailure: Failed to > > connect to host xxx.yyy.zzz.com and port 12345: Interrupted system > > call - connect(2) > > I've been experiencing the same issue in production with > fauna/memcached gem which is backed by the libmemcached C library. Unlike Mongo and Net::HTTP, the memcached gem doesn't use the standard Ruby socket library. Can you show us a backtrace (and probably Cc: Evan Weaver (fauna)) so we can get it fixed? Any process accepting signals can have system calls fail with EINTR, and buggy libraries don't handle it properly. The Ruby standard library is mostly good at gracefully handling EINTR, and I've gotten all instances I've encountered fixed in MRI 1.9.x. -- Eric Wong From normalperson at yhbt.net Fri Aug 19 20:42:22 2011 From: normalperson at yhbt.net (Eric Wong) Date: Sat, 20 Aug 2011 00:42:22 +0000 Subject: [ANN] unicorn 4.1.0 - small updates and fixes Message-ID: <20110820004222.GA17262@dcvr.yhbt.net> Changes: * Rack::Chunked and Rack::ContentLength middlewares are loaded by default for RACK_ENV=(development|deployment) users to match Rack::Server behavior. As before, use RACK_ENV=none if you want fine-grained control of your middleware. This should also help users of Rainbows! and Zbatery. 
* CTL characters are now rejected from HTTP header values
* Exception messages are now filtered for [:cntrl:] characters since application/middleware authors may forget to do so
* Workers will now terminate properly if a SIGQUIT/SIGTERM/SIGINT is received during worker process initialization.
* close-on-exec is explicitly disabled to future-proof against Ruby 2.0 changes [ruby-core:38140]

* http://unicorn.bogomips.org/
* mongrel-unicorn at rubyforge.org
* git://bogomips.org/unicorn.git
* http://unicorn.bogomips.org/NEWS.atom.xml
-- Eric Wong From normalperson at yhbt.net Sat Aug 20 03:37:17 2011 From: normalperson at yhbt.net (Eric Wong) Date: Sat, 20 Aug 2011 07:37:17 +0000 Subject: USR2 + QUIT sometimes interrupts socket connections In-Reply-To: <20110820001108.GA22342@dcvr.yhbt.net> References: <95034CB8-FFB2-4A77-A87F-E9DD47F36A11@gmail.com> <20110820001108.GA22342@dcvr.yhbt.net> Message-ID: <20110820073717.GA25799@dcvr.yhbt.net> Eric Wong wrote: > cliftonk at gmail.com wrote: > > On Aug 19, 2011, at 5:46 PM, Alex Sharp wrote: > > > The exception message I get is: Mongo::ConnectionFailure: Failed to > > > connect to host xxx.yyy.zzz.com and port 12345: Interrupted system > > > call - connect(2) > > > > I've been experiencing the same issue in production with > > fauna/memcached gem which is backed by the libmemcached C library. > > Unlike Mongo and Net::HTTP, the memcached gem doesn't use the standard > Ruby socket library. Can you show us a backtrace (and probably Cc: Evan > Weaver (fauna)) so we can get it fixed? Evan said (privately to me) to open a ticket for memcached so he or Brandon can fix it. I guess https://github.com/fauna/memcached/issues ? 
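As background for the EINTR discussion in this thread, "handling EINTR gracefully" usually means repeating the interrupted call. Below is a minimal Ruby sketch of that pattern; the helper name is hypothetical, and it is not what the memcached gem, the Mongo driver or MRI actually do. It also only suits calls that are safe to repeat; connect(2) in particular has its own rules after an interruption, and the fix mentioned in this thread was upgrading Ruby.

```ruby
# Hypothetical helper: re-run a block when a signal interrupts a system
# call (Errno::EINTR), giving up after max_attempts tries.
def retry_on_eintr(max_attempts = 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue Errno::EINTR
    retry if attempts < max_attempts
    raise # still interrupted after max_attempts: let the caller see it
  end
end

# Demonstration with a block that is "interrupted" twice before succeeding.
calls = 0
result = retry_on_eintr do
  calls += 1
  raise Errno::EINTR if calls < 3
  :connected
end
```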
-- Eric Wong From ibc at aliax.net Sat Aug 20 15:45:17 2011 From: ibc at aliax.net (Iñaki Baz Castillo) Date: Sat, 20 Aug 2011 21:45:17 +0200 Subject: adding GPLv3 to unicorn license In-Reply-To: <20110428005727.GA19557@dcvr.yhbt.net> References: <20110428005727.GA19557@dcvr.yhbt.net> Message-ID: 2011/4/28 Eric Wong : > Hello, I would like to extend the unicorn license from > (GPLv2|Ruby terms) to (GPLv2|GPLv3|Ruby terms). > > Zed and Evan are still the authors of much of the HTTP parser code in > unicorn and some of the tests, so I'm asking you guys first. I would > also like to license kcar[1] in the same way since it's a fork of > the unicorn parser (for client-side usage). > > Adding GPLv3 would allow AGPLv3-only applications to be bundled with > unicorn and distributed. This would be useful for people distributing > complete devices running AGPLv3 apps on unicorn. > > I'm a supporter of strong copyleft licenses and I want unicorn to > be distributable with applications with strong copyleft licenses. > > Furthermore, I (or whoever replaces me as project leader when I die) > would like to have the option of adding compatibility for newer versions > of the GPL (as published by the FSF). I'm /not/ asking for a "GPLv2 or > later" clause since that would give the FSF too much power for GPLv4. > > I'm not at all interested in adding licenses other than future versions > of the GPL. Even though I support AGPLv3 for user-facing applications, > I do /not/ want infrastructure like unicorn to be under the terms of > something like the AGPLv3. > > I'll email the other contributors who have more trivial contributions if > you guys agree. Thanks! Hi, I'm fine with it, fully. Cheers. 
-- Iñaki Baz Castillo From joe at tanga.com Mon Aug 22 16:13:50 2011 From: joe at tanga.com (Joe Van Dyk) Date: Mon, 22 Aug 2011 20:13:50 +0000 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: <20110812054252.GA30507@dcvr.yhbt.net> References: <20110812054252.GA30507@dcvr.yhbt.net> Message-ID: On Fri, Aug 12, 2011 at 5:42 AM, Eric Wong wrote: > Joe Van Dyk wrote: >> Has anyone seen anything like this before? I can get it to happen all >> the time if I issue a HEAD request, but it only happens very >> intermittently on GET requests. >> >> I'm using Ruby 1.9.2p180. >> >> Any ideas on where to start debugging? > > What web framework and other middlewares are you running? Are you using Rack::Head to > generate HEAD responses or something else? > >> 204.93.223.151, 10.195.114.81 - - [11/Aug/2011 21:03:50] "GET / >> HTTP/1.0" 200 37902 0.5316 >> app error: Content-Length header was 37902, but should be 0 >> (Rack::Lint::LintError) >> /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:19:in >> `assert' >> /mnt/data/tanga/current/bundler/ruby/1.9.1/gems/rack-1.2.3/lib/rack/lint.rb:501:in >> `verify_content_length' > > Looking at the 1.2.3 rack/lint.rb code, it should've set @head_request to true > when env["REQUEST_METHOD"] == "HEAD" (rack/lint.rb line 56). > Do you happen to have any middlewares that might rewrite REQUEST_METHOD? > > I would edit rack/lint.rb and put some print statements to show the > value of @head_request and env["REQUEST_METHOD"] Narrowed this down a little bit more. Nginx is receiving a HEAD request, unicorn is logging a GET. Somewhere along the chain, the http method is getting mangled. (I'm seeing a lot of these errors because newrelic is sending HEAD requests to check if the site is up.) 
From normalperson at yhbt.net Mon Aug 22 16:23:27 2011 From: normalperson at yhbt.net (Eric Wong) Date: Mon, 22 Aug 2011 20:23:27 +0000 Subject: Wanted: OobGC testers on ruby_1_9_3 branch Message-ID: <20110822202327.GB2874@dcvr.yhbt.net> Hello, if you're currently using OobGC on 1.9.2, please try the latest ruby_1_9_3 branch and report back. Do NOT test the 1.9.3preview1 release, there was a GC performance regression fixed since 1.9.3preview1. I'm curious to know if OobGC still helps with the new lazy-sweep GC in 1.9.3. If OobGC still helps your app, I expect it to help less than it did with 1.9.2. I'll understand if you're concerned about stability, but fwiw, I'm happier with Ruby 1.9.3 than I am with 1.9.2. Obviously, please report any 1.9.3 bugs you find to ruby-core. -- Eric Wong From normalperson at yhbt.net Mon Aug 22 16:38:46 2011 From: normalperson at yhbt.net (Eric Wong) Date: Mon, 22 Aug 2011 20:38:46 +0000 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> Message-ID: <20110822203846.GA16913@dcvr.yhbt.net> Joe Van Dyk wrote: > Narrowed this down a little bit more. > > Nginx is receiving a HEAD request, unicorn is logging a GET. > Somewhere along the chain, the http method is getting mangled. That's not good. I'm pretty sure all versions of nginx send HEAD requests as-is to Unicorn, so something in your Rack middleware stack is rewriting HEAD => GET. You can strace a Unicorn worker to confirm it receives a HEAD and not a GET at the socket level. Do it on a server that's not receiving any other traffic and use one worker process so you're always stracing the correct worker process. 
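Once strace confirms that the worker really receives a HEAD, the position in the middleware stack where it turns into a GET can be narrowed down by logging REQUEST_METHOD at several points in config.ru. The following is a minimal sketch; the class name, the labels and the fake rewriting middleware are all hypothetical and only serve to demonstrate the idea without a running server.

```ruby
require "stringio"

# Hypothetical middleware: record the request method seen at this position
# in the stack, then pass the request on unchanged.
class MethodTrace
  def initialize(app, label, out = $stderr)
    @app, @label, @out = app, label, out
  end

  def call(env)
    @out.puts "#{@label}: REQUEST_METHOD=#{env['REQUEST_METHOD']}"
    @app.call(env)
  end
end

# Demonstration: a stand-in "middleware" that rewrites HEAD to GET,
# with one tracer above it and one below it.
log = StringIO.new
app = lambda { |env| [200, { "Content-Type" => "text/plain" }, []] }
below = MethodTrace.new(app, "below", log)
rewriter = lambda { |env| below.call(env.merge("REQUEST_METHOD" => "GET")) }
above = MethodTrace.new(rewriter, "above", log)

status, _headers, _body = above.call({ "REQUEST_METHOD" => "HEAD" })
```

In a real config.ru the tracer would be added with lines such as `use MethodTrace, "top"` between the existing `use` statements; the first label that reports GET for a HEAD request sits just below the middleware doing the rewriting.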
-- Eric Wong From ajsharp at gmail.com Mon Aug 22 22:59:36 2011 From: ajsharp at gmail.com (Alex Sharp) Date: Mon, 22 Aug 2011 19:59:36 -0700 Subject: Strange quit behavior In-Reply-To: References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> <20110819015312.GA29005@dcvr.yhbt.net> Message-ID: >> Can you add "-f -e '!futex'" to the strace invocation so we see >> all the threads? You can also add '-T' to get timing information >> to see how long each syscall takes or '-tt' to get timestamps >> in strace if you think it's useful. Here's the result of strace with suggested flags on the old master. (To clarify, this is still the spinning CPU issue) $ sudo strace -v -f -e '!futex' -p 18862 Process 18862 attached with 2 threads - interrupt to quit [pid 22170] restart_syscall(<... resuming interrupted call ...> [pid 18862] select(4, [3], NULL, NULL, {3, 546106}) = 0 (Timeout) [pid 18862] wait4(-1, 0x7fffbb907d3c, WNOHANG, NULL) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067031, 459472514}) = 0 [pid 18862] fstat(7, {st_dev=makedev(202, 1), st_ino=31695, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, st_ctime=2011/08/23-02:37:06}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067031, 459813102}) = 0 [pid 18862] fstat(11, {st_dev=makedev(202, 1), st_ino=31696, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, st_ctime=2011/08/23-02:37:07}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067031, 459999675}) = 0 [pid 18862] fstat(13, {st_dev=makedev(202, 1), st_ino=31697, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, 
st_ctime=2011/08/23-02:37:07}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067031, 460168995}) = 0 [pid 18862] fstat(14, {st_dev=makedev(202, 1), st_ino=31698, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, st_ctime=2011/08/23-02:37:07}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067031, 460343593}) = 0 [pid 18862] select(4, [3], NULL, NULL, {6, 255959}) = 0 (Timeout) [pid 18862] wait4(-1, 0x7fffbb907d3c, WNOHANG, NULL) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067037, 739352608}) = 0 [pid 18862] fstat(7, {st_dev=makedev(202, 1), st_ino=31695, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, st_ctime=2011/08/23-02:37:16}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067037, 739570659}) = 0 [pid 18862] fstat(11, {st_dev=makedev(202, 1), st_ino=31696, st_mode=S_IFREG, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, st_ctime=2011/08/23-02:37:17}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067037, 739768755}) = 0 [pid 18862] fstat(13, {st_dev=makedev(202, 1), st_ino=31697, st_mode=S_IFREG|01, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, st_ctime=2011/08/23-02:37:17}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067037, 739951254}) = 0 [pid 18862] fstat(14, {st_dev=makedev(202, 1), st_ino=31698, st_mode=S_IFREG|01, st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2011/08/23-00:09:05, st_mtime=2011/08/23-00:09:05, st_ctime=2011/08/23-02:37:17}) = 0 [pid 18862] clock_gettime(CLOCK_REALTIME, {1314067037, 740130965}) = 0 [pid 18862] select(4, [3], NULL, NULL, {9, 996190}^C >> Also, I assume you're 
using preload_app? If you are, do you see this >> issue with preload_app=false? I suspect there's some background thread >> that could be running in the master process taking up all the CPU time. >> Unicorn itself never spawns background threads. I went ahead and ran strace with the same flags on the *new* master, and saw a bunch of output that looked bundler-related: https://gist.github.com/138344b5b19ec6ba1a4c Even more bizarre, eventually the process started successfully :-/ Is it possible this had something to do with strace de-taching? You can see this in the unicorn.stderr.log file I included in the gist. Check out these two lines in particular, which occur 25 minutes apart: I, [2011-08-23T02:15:08.396868 #22169] INFO -- : Refreshing Gem list I, [2011-08-23T02:40:16.621210 #22925] INFO -- : worker=1 spawned pid=22925 Question: If I turn on debug flags in unicorn, will I get much of this output in the unicorn.stderr log? - Alex From normalperson at yhbt.net Tue Aug 23 03:12:05 2011 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 23 Aug 2011 07:12:05 +0000 Subject: Strange quit behavior In-Reply-To: References: <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> <20110819015312.GA29005@dcvr.yhbt.net> Message-ID: <20110823071205.GA27074@dcvr.yhbt.net> Alex Sharp wrote: > >> Can you add "-f -e '!futex'" to the strace invocation so we see > >> all the threads? You can also add '-T' to get timing information > >> to see how long each syscall takes or '-tt' to get timestamps > >> in strace if you think it's useful. > > Here's the result of strace with suggested flags on the old master. > (To clarify, this is still the spinning CPU issue) > > $ sudo strace -v -f -e '!futex' -p 18862 > Process 18862 attached with 2 threads - interrupt to quit > [pid 22170] restart_syscall(<... 
resuming interrupted call ...> > [pid 18862] select(4, [3], NULL, NULL, {3, 546106}) = 0 (Timeout) 22170 is presumably the timer thread. Did you send any signals to 18862 while you were making this strace? If you did send a signal to 18862 while stracing, it's worrying it didn't show up in your strace. If you didn't send a signal, then try sending a signal. > >> Also, I assume you're using preload_app? If you are, do you see this > >> issue with preload_app=false? I suspect there's some background thread > >> that could be running in the master process taking up all the CPU time. > >> Unicorn itself never spawns background threads. > > I went ahead and ran strace with the same flags on the *new* master, > and saw a bunch of output that looked bundler-related: > https://gist.github.com/138344b5b19ec6ba1a4c > > Even more bizarre, eventually the process started successfully :-/ Is > it possible this had something to do with strace de-taching? That looks like pretty normal "require" behavior. strace would slow down requires, a lot. So this was with preload_app=true? While you're debugging problems, I suggest keeping preload_app=false and worker_problems=1 to minimize the variables. > You can see this in the unicorn.stderr.log file I included in the > gist. Check out these two lines in particular, which occur 25 minutes > apart: > > I, [2011-08-23T02:15:08.396868 #22169] INFO -- : Refreshing Gem list > I, [2011-08-23T02:40:16.621210 #22925] INFO -- : worker=1 spawned pid=22925 Wow, it takes 25 minutes to load your application? strace makes the application /much/ slower, so I can actually believe it takes that long. Loading large applications is very slow under Ruby 1.9.2, there's some pathological load performance issues fixed in 1.9.3. So you're saying /without/ strace, CPU usage _stays_ at 100% and _never_ goes down? > Question: If I turn on debug flags in unicorn, will I get much of this > output in the unicorn.stderr log? 
You'll get every exception that's thrown even if it's rescued. Unicorn 3.x itself doesn't throw that many exceptions. Generally, can you reproduce this behavior on a plain (empty) Rails application with no extra gems? -- Eric Wong From ajsharp at gmail.com Tue Aug 23 12:49:23 2011 From: ajsharp at gmail.com (Alex Sharp) Date: Tue, 23 Aug 2011 09:49:23 -0700 Subject: Strange quit behavior In-Reply-To: <20110823071205.GA27074@dcvr.yhbt.net> References: <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> <20110819015312.GA29005@dcvr.yhbt.net> <20110823071205.GA27074@dcvr.yhbt.net> Message-ID: <359EE87E6F2A4003BF4BCCCED94B44E2@gmail.com> On Tuesday, August 23, 2011 at 12:12 AM, Eric Wong wrote: > Did you send any signals to 18862 while you were making this strace? Not while, no. I had already sent a USR2 signal to this process (the old master), and then I attached strace to it. I'll try sending another USR2 signal next time while strace is attached. > > I went ahead and ran strace with the same flags on the *new* master, > > and saw a bunch of output that looked bundler-related: > > https://gist.github.com/138344b5b19ec6ba1a4c > > > > Even more bizarre, eventually the process started successfully :-/ Is > > it possible this had something to do with strace de-taching? > > That looks like pretty normal "require" behavior. strace would slow > down requires, a lot. > > So this was with preload_app=true? While you're debugging problems, > I suggest keeping preload_app=false and worker_problems=1 to minimize > the variables. Ok, I'll change those and report back. I'm guessing you meant worker_processes (not problems)? > > > You can see this in the unicorn.stderr.log file I included in the > > gist. 
Check out these two lines in particular, which occur 25 minutes > > apart: > > > > I, [2011-08-23T02:15:08.396868 #22169] INFO -- : Refreshing Gem list > > I, [2011-08-23T02:40:16.621210 #22925] INFO -- : worker=1 spawned pid=22925 > > Wow, it takes 25 minutes to load your application? strace makes the > application /much/ slower, so I can actually believe it takes that long. No, my mistake. Loading the application only takes about 10 seconds, and I only had strace attached to this process for a few seconds (less than 10). My point here was to show that the new master just spun for a good 25 minutes (presumably trying to load files over and over again), and then, seemingly out of nowhere, the new master came up and spawned the new workers. Next time I'll try to get attached with strace earlier and record more output. > > Loading large applications is very slow under Ruby 1.9.2, there's some > pathological load performance issues fixed in 1.9.3. > Yep, I've read about those, and I've seen Xavier's patch, but I don't think that's the issue here (though, it appears that's why the files attempting to be loaded in the strace output do not exist). Under normal circumstances, loading the app takes about 10 seconds and doesn't peg the cpu while doing it. > > So you're saying /without/ strace, CPU usage _stays_ at 100% and _never_ > goes down? Correct. > > > Generally, can you reproduce this behavior on a plain (empty) Rails > application with no extra gems? > Good idea, I'll try that next. - Alex From joe at tanga.com Tue Aug 23 14:22:01 2011 From: joe at tanga.com (Joe Van Dyk) Date: Tue, 23 Aug 2011 11:22:01 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: <20110822203846.GA16913@dcvr.yhbt.net> References: <20110812054252.GA30507@dcvr.yhbt.net> <20110822203846.GA16913@dcvr.yhbt.net> Message-ID: On Mon, Aug 22, 2011 at 1:38 PM, Eric Wong wrote: > Joe Van Dyk wrote: >> Narrowed this down a little bit more. 
>> >> Nginx is receiving a HEAD request, unicorn is logging a GET. >> Somewhere along the chain, the http method is getting mangled. > > That's not good. I'm pretty sure all versions of nginx send HEAD > requests as-is to Unicorn, so something in your Rack middleware stack is > rewriting HEAD => GET. > > You can strace a Unicorn worker to confirm it receives a HEAD and not a > GET at the socket level. Do it on a server that's not receiving any > other traffic and use one worker process so you're always stracing the > correct worker process. I started a new Rails application at https://github.com/joevandyk/unicorn-head-requests When I send unicorn a HEAD request, it logs it as a GET. This shows up in the unicorn log: 127.0.0.1 - - [23/Aug/2011 11:15:38] "GET / HTTP/1.1" 200 - 0.0231 But this is what shows up in Rails: Started HEAD "/" for 127.0.0.1 at 2011-08-23 11:15:58 -0700 Processing by WelcomeController#index as From joe at tanga.com Tue Aug 23 14:29:04 2011 From: joe at tanga.com (Joe Van Dyk) Date: Tue, 23 Aug 2011 11:29:04 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> <20110822203846.GA16913@dcvr.yhbt.net> Message-ID: On Tue, Aug 23, 2011 at 11:22 AM, Joe Van Dyk wrote: > On Mon, Aug 22, 2011 at 1:38 PM, Eric Wong wrote: >> Joe Van Dyk wrote: >>> Narrowed this down a little bit more. >>> >>> Nginx is receiving a HEAD request, unicorn is logging a GET. >>> Somewhere along the chain, the http method is getting mangled. >> >> That's not good. I'm pretty sure all versions of nginx send HEAD >> requests as-is to Unicorn, so something in your Rack middleware stack is >> rewriting HEAD => GET. >> >> You can strace a Unicorn worker to confirm it receives a HEAD and not a >> GET at the socket level. Do it on a server that's not receiving any >> other traffic and use one worker process so you're always stracing the >> correct worker process. 
> > I started a new Rails application at > https://github.com/joevandyk/unicorn-head-requests > > When I send unicorn a HEAD request, it logs it as a GET. This shows > up in the unicorn log: > 127.0.0.1 - - [23/Aug/2011 11:15:38] "GET / HTTP/1.1" 200 - 0.0231 > > But this is what shows up in Rails: > Started HEAD "/" for 127.0.0.1 at 2011-08-23 11:15:58 -0700 > Processing by WelcomeController#index as I tracked down the exception to a bug in a Rack middleware. But -- unicorn should be logging HEAD requests as a HEAD request (and not a GET), right? Joe From joe at tanga.com Tue Aug 23 15:54:00 2011 From: joe at tanga.com (Joe Van Dyk) Date: Tue, 23 Aug 2011 12:54:00 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> <20110822203846.GA16913@dcvr.yhbt.net> Message-ID: On Tue, Aug 23, 2011 at 11:29 AM, Joe Van Dyk wrote: > On Tue, Aug 23, 2011 at 11:22 AM, Joe Van Dyk wrote: >> On Mon, Aug 22, 2011 at 1:38 PM, Eric Wong wrote: >>> Joe Van Dyk wrote: >>>> Narrowed this down a little bit more. >>>> >>>> Nginx is receiving a HEAD request, unicorn is logging a GET. >>>> Somewhere along the chain, the http method is getting mangled. >>> >>> That's not good. I'm pretty sure all versions of nginx send HEAD >>> requests as-is to Unicorn, so something in your Rack middleware stack is >>> rewriting HEAD => GET. >>> >>> You can strace a Unicorn worker to confirm it receives a HEAD and not a >>> GET at the socket level. Do it on a server that's not receiving any >>> other traffic and use one worker process so you're always stracing the >>> correct worker process. >> >> I started a new Rails application at >> https://github.com/joevandyk/unicorn-head-requests >> >> When I send unicorn a HEAD request, it logs it as a GET. 
This shows >> up in the unicorn log: >> 127.0.0.1 - - [23/Aug/2011 11:15:38] "GET / HTTP/1.1" 200 - 0.0231 >> >> But this is what shows up in Rails: >> Started HEAD "/" for 127.0.0.1 at 2011-08-23 11:15:58 -0700 >> Processing by WelcomeController#index as > > I tracked down the exception to a bug in a Rack middleware. But -- > unicorn should be logging HEAD requests as a HEAD request (and not a > GET), right? Ok -- this officially isn't a unicorn bug. :) Something in Rails (or something Rails uses). With a HEAD request, env["REQUEST_METHOD"] inside Rack::Lint is 'GET'. Only happens with Rails (tried it in 3.0 and 3.1). Doesn't happen with Sinatra. Joe From normalperson at yhbt.net Tue Aug 23 16:00:03 2011 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 23 Aug 2011 13:00:03 -0700 Subject: Rack content-length Rack::Lint::LintErrors errors with unicorn In-Reply-To: References: <20110812054252.GA30507@dcvr.yhbt.net> <20110822203846.GA16913@dcvr.yhbt.net> Message-ID: <20110823200003.GB10484@dcvr.yhbt.net> Joe Van Dyk wrote: > Ok -- this officially isn't a unicorn bug. :) Something in Rails (or > something Rails uses). Alright, good to know. I was just running bundler on your app when I saw this message :) -- Eric Wong From normalperson at yhbt.net Tue Aug 23 16:23:03 2011 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 23 Aug 2011 20:23:03 +0000 Subject: Strange quit behavior In-Reply-To: <359EE87E6F2A4003BF4BCCCED94B44E2@gmail.com> References: <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> <20110817201323.GA24581@dcvr.yhbt.net> <20110819015312.GA29005@dcvr.yhbt.net> <20110823071205.GA27074@dcvr.yhbt.net> <359EE87E6F2A4003BF4BCCCED94B44E2@gmail.com> Message-ID: <20110823202302.GA7028@dcvr.yhbt.net> Alex Sharp wrote: > On Tuesday, August 23, 2011 at 12:12 AM, Eric Wong wrote: > > So this was with preload_app=true? 
While you're debugging problems, > > I suggest keeping preload_app=false and worker_problems=1 to minimize > > the variables. > Ok, I'll change those and report back. I'm guessing you meant > worker_processes (not problems)? LOL yes, it was late last night :) > > Generally, can you reproduce this behavior on a plain (empty) Rails > > application with no extra gems? > > > Good idea, I'll try that next. I rarely run into problems, but doing things like this (starting with an empty app and Unicorn as default-as-possible (worker_processes=1, preload_app=false)) should be standard procedure for troubleshooting. These things are so second nature to me that I forget to mention them :< I'm also the type who reads the code of /every/ library I introduce into the app, so I know (or can very quickly find out) exactly what an app is doing. -- Eric Wong From normalperson at yhbt.net Tue Aug 23 22:56:22 2011 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 23 Aug 2011 19:56:22 -0700 Subject: adding GPLv3 to unicorn license In-Reply-To: <20110516220341.GE18848@dcvr.yhbt.net> References: <20110428005727.GA19557@dcvr.yhbt.net> <20110516215459.GD18848@dcvr.yhbt.net> <20110516220341.GE18848@dcvr.yhbt.net> Message-ID: <20110824025622.GA8433@dcvr.yhbt.net> Eric Wong wrote: > I would like to add the GPLv3 to the existing Unicorn license, making it > (Ruby terms|GPLv2|GPLv3). Furthermore, I'd like the project leader of > Unicorn (currently me) to be given the option to add future versions of > the GPL (as published by the Free Software Foundation) to the license. > > I'm /not/ asking for a "GPLv2 or later" clause since that would give the > FSF too much power for GPLv4. > > The original request to Zed Shaw and Evan Weaver is here: > http://mid.gmane.org/20110428005727.GA19557 at dcvr.yhbt.net > > Besides myself, Zed Shaw and Evan Weaver have the largest amount of code > in Unicorn, and both have OK-ed the addition of the GPLv3, so I hope you > do the same. 
I've received affirmative responses from everybody except Wayne Larsen and Ian Ownbey, neither of which were reachable when I last tried. Ian's changes are all from Mongrel and almost entirely rewritten/removed from what I can tell with: 1) git log -p --author=Ian 2) git ls-files | xargs -n1 git blame | grep -i ownbey Wayne's change was a one-liner assignment and rewritten almost immediately by me (even though the idea is important). http://mid.gmane.org/C4216E7A-A91B-4712-93EF-0687668E9ABE at larsen.st I think it's safe to proceed with adding GPLv3 to Unicorn... (GPLv2 and Ruby license terms will remain options, too) -- Eric Wong From normalperson at yhbt.net Wed Aug 24 21:02:59 2011 From: normalperson at yhbt.net (Eric Wong) Date: Wed, 24 Aug 2011 18:02:59 -0700 Subject: [PATCH] doc: add Application Timeouts document Message-ID: <20110825010259.GA7356@dcvr.yhbt.net> Hopefully this leads to fewer worker processes being killed. --- I just pushed this out to the website: http://unicorn.bogomips.org/Application_Timeouts.html Comments/feedback/corrections greatly appreciated. .document | 1 + Application_Timeouts | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 78 insertions(+), 0 deletions(-) create mode 100644 Application_Timeouts diff --git a/.document b/.document index a3d7605..4092597 100644 --- a/.document +++ b/.document @@ -26,3 +26,4 @@ unicorn_rails_1 ISSUES Sandbox Links +Application_Timeouts diff --git a/Application_Timeouts b/Application_Timeouts new file mode 100644 index 0000000..5f0370d --- /dev/null +++ b/Application_Timeouts @@ -0,0 +1,77 @@ += Application Timeouts + +This article focuses on _application_ setup for Rack applications, but +can be expanded to all applications that connect to external resources +and expect short response times. + +This article is not specific to \Unicorn, but exists to discourage +the overuse of the built-in +{timeout}[link:Unicorn/Configurator.html#method-i-timeout] directive +in \Unicorn. 
+ +== ALL External Resources Are Considered Unreliable + +Network reliability can _never_ be guaranteed. Network failures cannot +be detected reliably by the client (Rack application) in a reasonable +timeframe, not even on a LAN. + +Thus, application authors must configure timeouts when interacting with +external resources. + +Most database adapters allow configurable timeouts. + +Net::HTTP and Net::SMTP in the Ruby standard library allow +configurable timeouts. + +Even for things as fast as {memcached}[http://memcached.org/], +{dalli}[http://rubygems.org/gems/dalli], +{memcached}[http://rubygems.org/gems/memcached] and +{memcache-client}[http://rubygems.org/gems/memcache-client] RubyGems all +offer configurable timeouts. + +Consult the relevant documentation for the libraries you use on +how to configure these timeouts. + +== Rolling Your Own Socket Code + +Use non-blocking I/O and IO.select with a timeout to wait on sockets. + +== Timeout module in the Ruby standard library + +Ruby offers a Timeout module in its standard library. It has several +caveats and is not always reliable: + +* /Some/ Ruby C extensions are not interrupted/timed-out gracefully by + this module (report these bugs to extension authors, please) but + pure-Ruby components should be. + +* Long-running tasks may run inside `ensure' clauses after timeout + fires, causing the timeout to be ineffective. + +The Timeout module is a second-to-last-resort solution, timeouts using +IO.select (or similar) are more reliable. If you depend on libraries +that do not offer timeouts when connecting to external resources, kindly +ask those library authors to provide configurable timeouts. + +=== A Note About Filesystems + +Most operations to regular files on POSIX filesystems are NOT +interruptable. Thus, the "timeout" module in the Ruby standard library +can not reliably timeout systems with massive amounts of iowait. 
+ +If your app relies on the filesystem, ensure all the data your +application works with is small enough to fit in the kernel page cache. +Otherwise increase the amount of physical memory you have to match, or +employ a fast, low-latency storage system (solid state). + +Volumes mounted over NFS (and thus a potentially unreliable network) +must be mounted with timeouts and applications must be prepared to +handle network/server failures. + +== The Last Line Of Defense + +The {timeout}[link:Unicorn/Configurator.html#method-i-timeout] mechanism +in \Unicorn is an extreme solution that should be avoided whenever +possible. It will help catch bugs in your application where and when +your application forgets to use timeouts, but it is expensive as it +kills and respawns a worker process. -- Eric Wong From normalperson at yhbt.net Thu Aug 25 17:49:28 2011 From: normalperson at yhbt.net (Eric Wong) Date: Thu, 25 Aug 2011 21:49:28 +0000 Subject: [ANN] unicorn 4.1.1 - fix last-resort timeout accuracy Message-ID: <20110825214928.GA15629@dcvr.yhbt.net> Changes: The last-resort timeout mechanism was inaccurate and often delayed in activation since the 2.0.0 release. It is now fixed and remains power-efficient in idle situations, especially with the wakeup reduction in MRI 1.9.3+. There is also a new document on application timeouts intended to discourage the reliance on this last-resort mechanism. 
It is visible on the web at: http://unicorn.bogomips.org/Application_Timeouts.html * http://unicorn.bogomips.org/ * mongrel-unicorn at rubyforge.org * git://bogomips.org/unicorn.git * http://unicorn.bogomips.org/NEWS.atom.xml -- Eric Wong From ajsharp at gmail.com Thu Aug 25 17:50:53 2011 From: ajsharp at gmail.com (Alex Sharp) Date: Thu, 25 Aug 2011 14:50:53 -0700 Subject: [ANN] unicorn 4.1.1 - fix last-resort timeout accuracy In-Reply-To: <20110825214928.GA15629@dcvr.yhbt.net> References: <20110825214928.GA15629@dcvr.yhbt.net> Message-ID: <3A815B9BD04446549F2F2AE082A6E3C9@gmail.com> Just wanted to say that all of this is extremely informative information. Thanks for putting it out there. - alex From jlevitt at tiptap.com Sun Aug 28 09:35:13 2011 From: jlevitt at tiptap.com (Jay Levitt) Date: Sun, 28 Aug 2011 09:35:13 -0400 Subject: Unicorn for Rails development mode? Message-ID: I'm tired of the bugginess of Webrick, so I want to upgrade to something modern for developing Rails apps on my Mac, and I'm a bit out of the loop. Is Unicorn a drop-in replacement these days in development mode? Does it have advantages or disadvantages compared to Thin? I don't even know what capabilities I should care about, but some obvious ones are: - Doesn't stop logging after a while (hi, Webrick) - Logs to STDOUT or STDERR - Reloads everything that ought to be reloaded on each request - Works well with 1.9.2 Any advice? I'd think this would be a FAQ, but I haven't found any discussions on the topic. From normalperson at yhbt.net Sun Aug 28 19:14:34 2011 From: normalperson at yhbt.net (Eric Wong) Date: Sun, 28 Aug 2011 23:14:34 +0000 Subject: Unicorn for Rails development mode? In-Reply-To: References: Message-ID: <20110828231434.GA9021@dcvr.yhbt.net> Jay Levitt wrote: > I'm tired of the bugginess of Webrick, so I want to upgrade to > something modern for developing Rails apps on my Mac, and I'm a bit > out of the loop. 
Is Unicorn a drop-in replacement these days in > development mode? Does it have advantages or disadvantages compared to > Thin? I don't even know what capabilities I should care about, but > some obvious ones are: unicorn should be a drop-in replacement, you shouldn't need a unicorn-specific config file for development. The "unicorn_rails"[1] command was designed for Rails 1.x/2.x users, while "unicorn" is a better fit for Rails 3 (and all Rack frameworks), but either works for Rails 3. Thin is a great server and I've had no issues with Webrick for development, either. You don't have to worry about slow clients hitting Unicorn in development, so you can forgo nginx for development. > - Doesn't stop logging after a while (hi, Webrick) This is not a problem I've heard of with either server. It /may/ be an issue with the Rails buffered logger implementation, though. The Ruby standard library Logger used by Unicorn does no buffering. > - Logs to STDOUT or STDERR This is the default, and Unicorn explicitly disables output buffering on both of these. > - Reloads everything that ought to be reloaded on each request Set RAILS_ENV=development in your environment. > - Works well with 1.9.2 Yes, I regularly test against 1.8.7, 1.9.2, 1.9.3dev and trunk. > Any advice? I'd think this would be a FAQ, but I haven't found any > discussions on the topic. I don't do much Rails development, but everything /should/ work fine. I will never officially support non-Free systems, but from what I've heard it works fine on the ones named after fruits. Others on this list can hopefully chime in, too. [1] - There's some confusion that was the result of "unicorn_rails" being an automatic compatibility layer for old Rails. If I could do it all over again, I'd leave "unicorn_rails" out and force folks to set up the compatibility layer themselves to learn how Rack works. 
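For readers who have not seen "how Rack works" from the footnote above, here is a minimal sketch of the interface that "unicorn" talks to. HelloApp is a hypothetical name used only for illustration; the only thing assumed is the standard Rack contract of an object responding to call(env) and returning a status, headers and body triplet.

```ruby
# Hypothetical example app; any object with #call(env) that returns
# [status, headers, body] is a Rack application.
class HelloApp
  def call(env)
    body = "Hello from #{env['REQUEST_METHOD']} #{env['PATH_INFO']}\n"
    headers = {
      # Capitalized names match the Rack 1.x era of this thread;
      # newer Rack releases expect lowercase header names.
      "Content-Type" => "text/plain",
      "Content-Length" => body.bytesize.to_s,
    }
    [200, headers, [body]] # the body must respond to #each
  end
end
```

Saved alongside a config.ru containing `run HelloApp.new`, this should be servable by running "unicorn" from the same directory, with no unicorn-specific config file.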
-- Eric Wong From normalperson at yhbt.net Mon Aug 29 16:08:58 2011 From: normalperson at yhbt.net (Eric Wong) Date: Mon, 29 Aug 2011 20:08:58 +0000 Subject: adding GPLv3 to unicorn license In-Reply-To: <20110824025622.GA8433@dcvr.yhbt.net> References: <20110428005727.GA19557@dcvr.yhbt.net> <20110516215459.GD18848@dcvr.yhbt.net> <20110516220341.GE18848@dcvr.yhbt.net> <20110824025622.GA8433@dcvr.yhbt.net> Message-ID: <20110829200858.GA22601@dcvr.yhbt.net> Eric Wong wrote: > I think it's safe to proceed with adding GPLv3 to Unicorn... > > (GPLv2 and Ruby license terms will remain options, too) Pushed out: http://bogomips.org/unicorn.git/patch/?id=cd22c59563 From normalperson at yhbt.net Tue Aug 30 20:33:02 2011 From: normalperson at yhbt.net (Eric Wong) Date: Tue, 30 Aug 2011 17:33:02 -0700 Subject: Strange quit behavior In-Reply-To: <20110817092252.GA7186@dcvr.yhbt.net> References: <20110802215412.GA12725@dcvr.yhbt.net> <20110805080729.GA6602@dcvr.yhbt.net> <20110817092252.GA7186@dcvr.yhbt.net> Message-ID: <20110831003302.GA7447@dcvr.yhbt.net> Eric Wong wrote: > + trap(sig) do > + @logger.debug("received SIG#{sig}") I'm even more glad I didn't apply this patch to Unicorn. I completely forgot Logger uses a mutex internally (even though it doesn't need to when writing to a POSIX-compliant file system). Rainbows! has a similar issue I fixed/worked around: http://mid.gmane.org/20110830233232.GA19633 at dcvr.yhbt.net -- Eric Wong
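A minimal sketch of one way around the Logger-in-trap problem described above: keep the trap block down to queueing the signal name and poking a pipe, and do the logging later from normal (non-trap) context. The names SIG_QUEUE, SELF_PIPE, install_handlers and drain_signals are assumptions chosen for illustration, and this is not presented as the code in the patch or in Rainbows!.

```ruby
require "logger"

# Illustrative names only; nothing here is taken from the patch above.
SIG_QUEUE = []       # signal names recorded by the trap handlers
SELF_PIPE = IO.pipe  # [reader, writer], used to wake up the main loop

# Install handlers that avoid Logger (and its mutex) entirely.
def install_handlers(signals)
  signals.each do |sig|
    trap(sig) do
      SIG_QUEUE << sig
      # Wake the main loop; a full pipe is harmless, so skip exceptions.
      SELF_PIPE[1].write_nonblock(".", exception: false)
    end
  end
end

# Called from the main loop, outside trap context, where logging is safe.
# Waits up to `timeout` seconds for a wakeup, then logs queued signals.
def drain_signals(logger, timeout = 1)
  if IO.select([SELF_PIPE[0]], nil, nil, timeout)
    SELF_PIPE[0].read_nonblock(16, exception: false) # discard wakeup bytes
  end
  handled = []
  while (sig = SIG_QUEUE.shift)
    logger.debug("received SIG#{sig}")
    handled << sig
  end
  handled
end
```

A main loop would call install_handlers(%w[QUIT USR1 USR2]) once at startup and then drain_signals(logger) on each iteration, acting on the returned names.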