[Backgroundrb-devel] magical disappearing background processes!

hemant kumar gethemant at gmail.com
Fri Oct 10 08:34:24 EDT 2008


Are you running two copies of the BackgrounDRb server on the same machine?
I see two server instances in your top output.
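
One way to check is to list threads together with the PID that owns them,
since top was displaying all threads and rows with different IDs can belong
to a single process. The following is a minimal sketch, assuming a Linux
procps `ps -eLf` whose columns are UID PID PPID LWP C NLWP STIME TTY TIME CMD.
The helper name `threads_per_process` and the SAMPLE text are hypothetical
illustrations, not output taken from this report.

```ruby
# Group `ps -eLf` rows by PID so that threads of one process are not
# mistaken for separate processes.
# Assumed column order: UID PID PPID LWP C NLWP STIME TTY TIME CMD
def threads_per_process(ps_output, pattern)
  counts = Hash.new(0)
  ps_output.each_line do |line|
    fields = line.split(" ", 10)         # 10th field (CMD) may contain spaces
    next unless fields.length == 10
    next unless fields[1] =~ /\A\d+\z/   # skips the header row
    next unless fields[9].include?(pattern)
    counts[fields[1].to_i] += 1          # one row per thread (LWP)
  end
  counts
end

# Made-up sample: PID 1000 has two threads, PID 2000 has one.
SAMPLE = <<~PS
  UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
  deploy    1000     1  1000  0    2 10:00 ?        00:00:01 ruby script/backgroundrb start
  deploy    1000     1  1001  0    2 10:00 ?        00:00:00 ruby script/backgroundrb start
  deploy    2000     1  2000  0    1 10:05 ?        00:00:00 ruby script/backgroundrb start
PS

counts = threads_per_process(SAMPLE, "script/backgroundrb")
puts "server processes: #{counts.size}"
```

On a live machine the first argument could be the output of `ps -eLf`
captured with backticks. More than one PID in the result would point to more
than one server process, while a single PID with a count of 2 would point to
one process with two threads.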


On Fri, 2008-10-10 at 14:12 +0200, Jack Nutting wrote:
> Hi all,
> 
> I've been having trouble for a long time with backgroundrb processes
> that suddenly vanish without a trace.  What happens is that at some
> point I discover that all the backgroundrb processes are suddenly
> gone.  Nothing special is seen in any of the log files.  This has
> happened intermittently for a long time, and I was hoping that
> upgrading to 1.0.4 would somehow help me out, but I seem to encounter
> the same problem.
> 
> It happens infrequently, sometimes two or three times a week, sometimes
> not at all for several weeks.  Yesterday it actually happened twice in
> ten minutes during a period when the server was heavily loaded, but
> that's unusual.  Usually when it happens the server is not under a
> heavy load.
> 
> Yesterday when it happened, I had the good fortune of having a "top" log
> running in a terminal window, so I'm able to present some more data.
> top was displaying all threads, so most of the processes show up twice
> or more.
> 
> I have 5 background workers running, each apparently has 2 threads,
> plus log_worker with 1 thread and script/backgroundrb with 2 threads.
> My architecture is set up so that only "master" is started
> automatically when backgroundrb starts up, and it in turn starts the
> rest.
> 
> I'm pasting in data for all the backgroundrb processes, sorry for the
> terrible formatting but I can't really think of a better way to
> present this all.
> 
> Here's what it normally looks like while everything is up and running.
> This is the last "normal" state I found before it started going
> haywire:
> 
> top - 15:11:13 up 5 days,  5:05,  3 users,  load average: 3.10, 3.09, 3.02
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 17508 deploy    15   0 49300  35m 2688 S 11.8  1.7   7:54.65 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 17504 deploy    15   0 49648  35m 2688 S  8.2  1.7   8:01.64 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 14141 deploy    15   0 20796  17m 1612 S  0.3  0.8   2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14147 deploy    15   0 48232  34m 2556 S  0.3  1.7   5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17523 deploy    17   0  132m 115m 3316 R  0.3  5.6   6:43.89 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
> 14102 deploy    17   0 48320  31m 1364 R  0.0  1.5   3:08.97 ruby /home/deploy/mbargo/script/backgroundrb start
> 14144 deploy    15   0 48320  31m 1364 S  0.0  1.5   0:45.35 ruby /home/deploy/mbargo/script/backgroundrb start
> 17446 deploy    15   0 48232  34m 2556 S  0.0  1.7   0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17486 deploy    15   0 59500  41m 3500 S  0.0  2.0  11:45.15 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 22300 deploy    15   0 59500  41m 3500 S  0.0  2.0   0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 23636 deploy    15   0 49648  35m 2688 S  0.0  1.7   0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 24042 deploy    15   0 49300  35m 2688 S  0.0  1.7   0:43.58 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 18:16:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 24053 deploy    15   0  132m 115m 3316 S  0.0  5.6   0:43.70 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 20:18:campaign_starter:39:/home/deploy/mbargo/lib/workers:/home/deploy/
> 
> Next snapshot, 3 seconds later.  script/backgroundrb is gone, and each
> of my workers (except for master) is down to 1 thread.
> 
> top - 15:11:16 up 5 days,  5:05,  3 users,  load average: 3.10, 3.09, 3.02
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 17504 deploy    15   0 49648  35m 2688 S 12.6  1.7   8:02.02 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 17486 deploy    17   0 59500  41m 3500 R  0.3  2.0  11:45.16 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14141 deploy    15   0 20796  17m 1612 S  0.0  0.8   2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14147 deploy    15   0 48232  34m 2556 S  0.0  1.7   5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17446 deploy    15   0 48232  34m 2556 S  0.0  1.7   0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 22300 deploy    15   0 59500  41m 3500 S  0.0  2.0   0:45.27 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 14:13:receiver:39:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 23636 deploy    15   0 49648  35m 2688 S  0.0  1.7   0:45.68 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 16:14:mblox_sender:94:/home/deploy/mbargo/lib/workers:/home/deploy/mbar
> 
> Next, 3 seconds after that,  all I have left is master (still 2
> threads) and log_worker:
> 
> top - 15:11:19 up 5 days,  5:05,  3 users,  load average: 2.85, 3.03, 3.01
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 14141 deploy    15   0 20796  17m 1612 S  0.0  0.8   2:48.59 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 8:7:log_worker:17:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/s
> 14147 deploy    15   0 48232  34m 2556 S  0.0  1.7   5:10.90 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 17446 deploy    15   0 48232  34m 2556 S  0.0  1.7   0:43.62 /usr/bin/ruby1.8 /usr/bin/packet_worker_runner 11:10:master:4:/home/deploy/mbargo/lib/workers:/home/deploy/mbargo/scri
> 
> At the next snapshot, all backgroundrb processes are gone.
> 
> This is running on Ubuntu 7.10, backgroundrb 1.0.4.  I'm nowhere near
> maxing out system memory, and there are no memory or other limits set
> on user processes as far as I can tell.  If anyone has any ideas about
> what might cause this, or how to dig deeper, please let me know!  I'm
> nearly at my wits' end.
> 


