[Backgroundrb-devel] magical disappearing background processes!
Jack Nutting
jnutting at gmail.com
Fri Oct 10 08:12:44 EDT 2008
Hi all,
I've been having trouble for a long time with backgroundrb processes
that suddenly vanish without a trace. At some point I discover that
all the backgroundrb processes are gone, and nothing unusual shows up
in any of the log files. This has happened intermittently for a long
time. I was hoping that upgrading to 1.0.4 would help, but I still
see the same problem.
It happens infrequently: sometimes two or three times a week, sometimes
not at all for several weeks. Yesterday it actually happened twice in
ten minutes during a period when the server was heavily loaded, but
that's unusual. Usually the server is not under heavy load when it
happens.
Yesterday when it happened, I was lucky enough to have "top" running
in a terminal window, so I can present some more data. top was
displaying all threads, so most of the processes show up two or more
times.
I have 5 background workers running, each apparently with 2 threads,
plus log_worker with 1 thread and script/backgroundrb with 2 threads.
My architecture is set up so that only "master" is started
automatically when backgroundrb starts up, and it in turn starts the
rest.
I'm pasting in data for all the backgroundrb processes. Sorry for the
terrible formatting, but I can't think of a better way to present all
this.
Here's what it normally looks like while everything is up and running.
This is the last "normal" state I found before it started going
haywire:
top - 15:11:13 up 5 days, 5:05, 3 users, load average: 3.10, 3.09, 3.02
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17508 deploy 15 0 49300 35m 2688 S 11.8 1.7 7:54.65
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
17504 deploy 15 0 49648 35m 2688 S 8.2 1.7 8:01.64
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
14141 deploy 15 0 20796 17m 1612 S 0.3 0.8 2:48.59
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy 15 0 48232 34m 2556 S 0.3 1.7 5:10.90
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17523 deploy 17 0 132m 115m 3316 R 0.3 5.6 6:43.89
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
14102 deploy 17 0 48320 31m 1364 R 0.0 1.5 3:08.97 ruby
/home/deploy/mbargo/script/backgroundrb start
14144 deploy 15 0 48320 31m 1364 S 0.0 1.5 0:45.35 ruby
/home/deploy/mbargo/script/backgroundrb start
17446 deploy 15 0 48232 34m 2556 S 0.0 1.7 0:43.62
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17486 deploy 15 0 59500 41m 3500 S 0.0 2.0 11:45.15
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
22300 deploy 15 0 59500 41m 3500 S 0.0 2.0 0:45.27
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
23636 deploy 15 0 49648 35m 2688 S 0.0 1.7 0:45.68
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
24042 deploy 15 0 49300 35m 2688 S 0.0 1.7 0:43.58
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
24053 deploy 15 0 132m 115m 3316 S 0.0 5.6 0:43.70
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
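Next snapshot, 3 seconds later. script/backgroundrb is gone, along with
campaign_starter and one of the two mblox_sender workers (18:16). The
remaining workers (master, receiver, and the 16:14 mblox_sender) still
show 2 threads each.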
top - 15:11:16 up 5 days, 5:05, 3 users, load average: 3.10, 3.09, 3.02
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17504 deploy 15 0 49648 35m 2688 S 12.6 1.7 8:02.02
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
17486 deploy 17 0 59500 41m 3500 R 0.3 2.0 11:45.16
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14141 deploy 15 0 20796 17m 1612 S 0.0 0.8 2:48.59
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy 15 0 48232 34m 2556 S 0.0 1.7 5:10.90
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17446 deploy 15 0 48232 34m 2556 S 0.0 1.7 0:43.62
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
22300 deploy 15 0 59500 41m 3500 S 0.0 2.0 0:45.27
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
23636 deploy 15 0 49648 35m 2688 S 0.0 1.7 0:45.68
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
Three seconds after that, all I have left is master (still with 2
threads) and log_worker:
top - 15:11:19 up 5 days, 5:05, 3 users, load average: 2.85, 3.03, 3.01
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14141 deploy 15 0 20796 17m 1612 S 0.0 0.8 2:48.59
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
14147 deploy 15 0 48232 34m 2556 S 0.0 1.7 5:10.90
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
17446 deploy 15 0 48232 34m 2556 S 0.0 1.7 0:43.62
/usr/bin/ruby1.8 /usr/bin/packet_worker_runner
11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
At the next snapshot, all backgroundrb processes are gone.
This is running on Ubuntu 7.10, backgroundrb 1.0.4. I'm nowhere near
maxing out system memory, and there are no memory or other limits set
on user processes as far as I can tell. If anyone has any ideas about
what might cause this, or how to dig deeper, please let me know! I'm
nearly at my wits' end.
--
// jack
// http://www.nuthole.com