Support for Soft Timeout in Unicorn

Pierre Baillet oct at fotonauts.com
Thu Jun 3 14:06:02 EDT 2010


Hello,

On Thu, Jun 3, 2010 at 7:37 PM, Eric Wong <normalperson at yhbt.net> wrote:
>
> Hi,
>
> HTML attachments are wasteful and thus rejected from the mailing list.
> On the other hand, it actually helps to include the patch itself
> (inline) so it's readable without a (human) context switch :)

Indeed,
sorry for the HTML attachment; I have no idea where it came from. As
for the patch, here it is. It simply handles SIGABRT in a specific way
in the worker so that the application can terminate properly. Note the
FIXME comment I added in the murder_lazy_workers method: a worker that
has been idle for longer than soft_timeout is killed just like one that
is blocking. As a consequence, Unicorn will restart all its workers
whenever traffic on the server is very low.
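
To make the soft/hard decision in the patch below easier to follow, here is a
minimal standalone sketch of the same comparison. The method name
timeout_action and its return symbols are hypothetical and do not appear in
the patch; only the comparisons mirror the patched murder_lazy_workers.

```ruby
# Hypothetical helper, not part of the patch: decides what the master would
# do with a worker whose heartbeat file was last touched idle_seconds ago.
def timeout_action(idle_seconds, soft_timeout, timeout)
  if idle_seconds <= soft_timeout && idle_seconds <= timeout
    :none       # recently active, leave the worker alone
  elsif idle_seconds < timeout
    :soft_kill  # the patch sends SIGABRT in this case
  else
    :hard_kill  # the patch sends SIGKILL in this case
  end
end

timeout_action(5, 10, 60)   # => :none
timeout_action(30, 10, 60)  # => :soft_kill
timeout_action(90, 10, 60)  # => :hard_kill
timeout_action(61, 60, 60)  # => :hard_kill (defaults: no soft kill window)
```

The only input is the elapsed time, so the check cannot tell an idle worker
from a blocked one, which is the limitation the FIXME describes.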

diff --git a/lib/unicorn.rb b/lib/unicorn.rb
index a363014..855f26a 100644
--- a/lib/unicorn.rb
+++ b/lib/unicorn.rb
@@ -84,7 +84,7 @@ module Unicorn
   # Listener sockets are started in the master process and shared with
   # forked worker children.

-  class HttpServer < Struct.new(:app, :timeout, :worker_processes,
+  class HttpServer < Struct.new(:app, :soft_timeout, :timeout, :worker_processes,
                                 :before_fork, :after_fork, :before_exec,
                                 :logger, :pid, :listener_opts, :preload_app,
                                 :reexec_pid, :orig_app, :init_listeners,
@@ -393,7 +393,7 @@ module Unicorn
           when nil
             # avoid murdering workers after our master process (or the
             # machine) comes out of suspend/hibernation
-            if (last_check + timeout) >= (last_check = Time.now)
+            if (last_check + soft_timeout) >= (last_check = Time.now)
               murder_lazy_workers
             else
               # wait for workers to wakeup on suspend
@@ -581,10 +581,20 @@ module Unicorn
         stat = worker.tmp.stat
         # skip workers that disable fchmod or have never fchmod-ed
         stat.mode == 0100600 and next
-        (diff = (Time.now - stat.ctime)) <= timeout and next
-        logger.error "worker=#{worker.nr} PID:#{wpid} timeout " \
+        # FIXME: if the worker has not been working for soft_timeout, it will be
+        # killed even if it is not blocking
+        (diff = (Time.now - stat.ctime)) <= soft_timeout and
+          diff <= timeout and next
+        # lazy since less than timeout, attempt soft kill
+        if diff < timeout
+          logger.error "worker=#{worker.nr} PID:#{wpid} soft timeout " \
+                     "(#{diff}s > #{soft_timeout}s), killing softly"
+          kill_worker(:ABRT, wpid)
+        else
+         logger.error "worker=#{worker.nr} PID:#{wpid} hard timeout " \
                      "(#{diff}s > #{timeout}s), killing"
-        kill_worker(:KILL, wpid) # take no prisoners for timeout violations
+          kill_worker(:KILL, wpid) # take no prisoners for timeout violations
+        end
       end
     end

@@ -657,6 +667,12 @@ module Unicorn
       proc_name "worker[#{worker.nr}]"
       START_CTX.clear
       init_self_pipe!
+
+      # try to handle SIGABRT correctly
+      trap('ABRT') do
+        raise SignalException, "SIGABRT"
+      end
+
       WORKERS.values.each { |other| other.tmp.close rescue nil }
       WORKERS.clear
       LISTENERS.each { |sock| sock.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC) }
diff --git a/lib/unicorn/configurator.rb b/lib/unicorn/configurator.rb
index 64a25e3..6efb0c5 100644
--- a/lib/unicorn/configurator.rb
+++ b/lib/unicorn/configurator.rb
@@ -14,6 +14,8 @@ module Unicorn

     # Default settings for Unicorn
     DEFAULTS = {
+      # soft timeout defaults to the hard timeout, which effectively disables it
+      :soft_timeout => 60,
       :timeout => 60,
       :logger => Logger.new($stderr),
       :worker_processes => 1,
@@ -129,6 +131,23 @@ module Unicorn

     # sets the timeout of worker processes to +seconds+.  Workers
     # handling the request/app.call/response cycle taking longer than
+    # this time period will be softly killed (via SIGABRT).  This
+    # timeout is enforced by the master process itself and not subject
+    # to the scheduling limitations by the worker process.  Due to the
+    # low-complexity, low-overhead implementation, timeouts of less
+    # than 3.0 seconds can be considered inaccurate and unsafe.
+    # SIGABRT is trapped by the worker, which raises an exception, offering
+    # a way to log the stack trace in your Rails application.
+
+    def soft_timeout(seconds)
+      Numeric === seconds or raise ArgumentError,
+                                  "not numeric: soft_timeout=#{seconds.inspect}"
+      seconds >= 3 or raise ArgumentError,
+                                  "too low: soft_timeout=#{seconds.inspect}"
+      set[:soft_timeout] = seconds
+    end
+    # sets the timeout of worker processes to +seconds+.  Workers
+    # handling the request/app.call/response cycle taking longer than
     # this time period will be forcibly killed (via SIGKILL).  This
     # timeout is enforced by the master process itself and not subject
     # to the scheduling limitations by the worker process.  Due the
@@ -159,6 +178,7 @@ module Unicorn
       set[:timeout] = seconds
     end

+
     # sets the current number of worker_processes to +nr+.  Each worker
     # process will serve exactly one client at a time.  You can
     # increment or decrement this value at runtime by sending SIGTTIN


Cheers,
--
Pierre Baillet <oct at fotonauts.com>
http://www.fotopedia.com/

