[Backgroundrb-devel] magical disappearing background processes!

Jack Nutting jnutting at gmail.com
Fri Oct 10 08:12:44 EDT 2008


Hi all,

I've been having trouble for a long time with backgroundrb processes
that suddenly vanish without a trace.  What happens is that at some
point I discover that all the backgroundrb processes are suddenly
gone.  Nothing special is seen in any of the log files.  This has
happened intermittently for a long time, and I was hoping that
upgrading to 1.0.4 would somehow help me out, but I seem to encounter
the same problem.

It happens infrequently, sometimes two or three times a week, sometimes
not at all for several weeks.  Yesterday it actually happened twice in
ten minutes during a period when the server was heavily loaded, but
that's unusual.  Usually when it happens the server is not under a
heavy load.

Yesterday when it happened, I had the good fortune to have a "top" log
running in a terminal window, so I'm able to present some more data.
top was displaying all threads, so most of the processes show up twice
or more.

I have 5 background workers running, each apparently has 2 threads,
plus log_worker with 1 thread and script/backgroundrb with 2 threads.
My architecture is set up so that only "master" is started
automatically when backgroundrb starts up, and it in turn starts the
rest.

I'm pasting in data for all the backgroundrb processes.  Sorry for the
terrible formatting, but I can't really think of a better way to
present all this.

Here's what it normally looks like while everything is up and running.
This is the last "normal" state I found before it started going
haywire:

top - 15:11:13 up 5 days,  5:05,  3 users,  load average: 3.10, 3.09, 3.02
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17508 deploy    15   0 49300  35m 2688 S 11.8  1.7   7:54.65 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
17504 deploy    15   0 49648  35m 2688 S  8.2  1.7   8:01.64 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
14141 deploy    15   0 20796  17m 1612 S  0.3  0.8   2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy    15   0 48232  34m 2556 S  0.3  1.7   5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17523 deploy    17   0  132m 115m 3316 R  0.3  5.6   6:43.89 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
14102 deploy    17   0 48320  31m 1364 R  0.0  1.5   3:08.97 ruby /home/deploy/mbargo/script/backgroundrb start
14144 deploy    15   0 48320  31m 1364 S  0.0  1.5   0:45.35 ruby /home/deploy/mbargo/script/backgroundrb start
17446 deploy    15   0 48232  34m 2556 S  0.0  1.7   0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17486 deploy    15   0 59500  41m 3500 S  0.0  2.0  11:45.15 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
22300 deploy    15   0 59500  41m 3500 S  0.0  2.0   0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
23636 deploy    15   0 49648  35m 2688 S  0.0  1.7   0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
24042 deploy    15   0 49300  35m 2688 S  0.0  1.7   0:43.58 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
24053 deploy    15   0  132m 115m 3316 S  0.0  5.6   0:43.70 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/

Next snapshot, 3 seconds later.  script/backgroundrb is gone, and so
are both threads of campaign_starter and of one of the two
mblox_sender workers (18:16).  The workers that remain still show 2
threads each.

top - 15:11:16 up 5 days,  5:05,  3 users,  load average: 3.10, 3.09, 3.02
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17504 deploy    15   0 49648  35m 2688 S 12.6  1.7   8:02.02 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
17486 deploy    17   0 59500  41m 3500 R  0.3  2.0  11:45.16 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14141 deploy    15   0 20796  17m 1612 S  0.0  0.8   2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy    15   0 48232  34m 2556 S  0.0  1.7   5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17446 deploy    15   0 48232  34m 2556 S  0.0  1.7   0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
22300 deploy    15   0 59500  41m 3500 S  0.0  2.0   0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
23636 deploy    15   0 49648  35m 2688 S  0.0  1.7   0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar

Next, 3 seconds after that,  all I have left is master (still 2
threads) and log_worker:

top - 15:11:19 up 5 days,  5:05,  3 users,  load average: 2.85, 3.03, 3.01
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14141 deploy    15   0 20796  17m 1612 S  0.0  0.8   2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy    15   0 48232  34m 2556 S  0.0  1.7   5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17446 deploy    15   0 48232  34m 2556 S  0.0  1.7   0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri

At the next snapshot, all backgroundrb processes are gone.
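
Counting rows per worker by eye is error-prone, so here is a rough
sketch of how the snapshots could be tallied mechanically.  It is only
an illustration: the function name, the regular expressions, and the
idea that the leading "11:10" style prefix identifies one worker
instance are my assumptions, not anything backgroundrb or packet
documents.  The sample data is the last snapshot from this message,
and the parser accepts rows whether or not a mail client has wrapped
them.

```python
import re
from collections import Counter

# A top row starts with a thread ID, a user name, then the PR and NI columns.
ROW_START = re.compile(r"^\s*(\d+)\s+\S+\s+\d+\s+-?\d+\s")
# packet_worker_runner arguments look like "11:10:master:4:/home/...".
# Assumption: the "11:10:master" part is unique per worker instance.
WORKER_ARG = re.compile(r"\b(\d+:\d+:\w+):\d+:")

SNAPSHOT_3 = """
14141 deploy    15   0 20796  17m 1612 S  0.0  0.8   2:48.59
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy    15   0 48232  34m 2556 S  0.0  1.7   5:10.90
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17446 deploy    15   0 48232  34m 2556 S  0.0  1.7   0:43.62
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
"""


def threads_per_worker(snapshot):
    """Count top rows (one row per thread) for each worker in a snapshot."""
    rows = []
    for line in snapshot.splitlines():
        if ROW_START.match(line):
            rows.append(line)                   # a new row begins here
        elif rows:
            rows[-1] += " " + line.strip()      # wrapped continuation line
    counts = Counter()
    for row in rows:
        match = WORKER_ARG.search(row)
        if match:
            counts[match.group(1)] += 1
        elif "script/backgroundrb" in row:
            counts["script/backgroundrb"] += 1
    return counts


if __name__ == "__main__":
    for worker, threads in threads_per_worker(SNAPSHOT_3).items():
        print(worker, threads)
```

Subtracting the Counter for a later snapshot from the Counter for an
earlier one (before - after) leaves only the rows that disappeared in
between, which makes the order of the deaths easier to read off.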

This is running on Ubuntu 7.10, backgroundrb 1.0.4.  I'm nowhere near
maxing out system memory, and there are no memory or other limits set
on user processes as far as I can tell.  If anyone has any ideas about
what might cause this, or how to dig deeper, please let me know!  I'm
nearly at my wits' end.
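
Since "no limits as far as I can tell" is an inference, it could be
checked from inside a process instead.  A minimal sketch, assuming
Python is available on the server and that the script is run as the
deploy user through the same startup path as backgroundrb (resource
limits are inherited from the parent process, so an interactive shell
may report different values).  The helper names and the choice of
limits to print are mine.

```python
import resource

# Limits that, if set low, could plausibly stop a long-running worker.
LIMIT_NAMES = ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_CPU",
               "RLIMIT_NOFILE", "RLIMIT_NPROC")


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        if hasattr(resource, name):             # not every OS has them all
            soft, hard = resource.getrlimit(getattr(resource, name))
            limits[name] = (describe(soft), describe(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```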

-- 
// jack
// http://www.nuthole.com

