Discussion Forums: help


By: Nicolas Fouché
RE: When a worker crashes, another should bor
2008-08-11 11:19
Here is a case where the respawn does work, when a worker takes too much memory:

[INFO] #18593 2008-08-09 12:01:06.301185 <WORKER-18593> Respawning and taking worker status WORKER OVER MAX MEM AT: 1870 MAX: 500
[INFO] #18593 2008-08-09 12:01:06.348230 <WORKER-18593> Worker 18593 stopping...
[WARN] #18593 2008-08-09 12:01:09.136042 <WORKER-18593> WORKER 18593 SCRIPT CAUGHT RESPAWN. RESTARTING WORKER OVER MAX MEM AT: 1870 MAX: 500
[ERROR] #18590 2008-08-09 12:01:13.730352 <MANAGER> Expected 3 workers. 2 running. Starting 2
[WARN] #18590 2008-08-09 12:01:13.865986 <MANAGER> Adding 2 WORKERS. Task Workers: 1, Master Workers: 0 Master & Task Workers: 1
[WARN] #18590 2008-08-09 12:01:15.454696 <MANAGER> Adding Worker #1 PID: 6275 QUEUE: 0, WORKER_TYPE?:task
[WARN] #18590 2008-08-09 12:01:17.624901 <MANAGER> Adding Worker #2 PID: 6276 QUEUE: 0, WORKER_TYPE?:any
[INFO] #18590 2008-08-09 12:01:17.639913 <MANAGER> Worker Distribution [{:master=>0, :any=>1, :task=>1}]
[WARN] #18590 2008-08-09 12:01:34.679756 <MANAGER> Checking started workers, 2 out of 4 after the 1th try...
[WARN] #18590 2008-08-09 12:01:36.760381 <MANAGER> Checking started workers, 2 out of 4 after the 2th try...
[WARN] #18590 2008-08-09 12:01:38.798540 <MANAGER> Checking started workers, 2 out of 4 after the 3th try...
[WARN] #18590 2008-08-09 12:01:40.806493 <MANAGER> Checking started workers, 2 out of 4 after the 4th try...
[WARN] #18590 2008-08-09 12:01:42.817604 <MANAGER> Checking started workers, 2 out of 4 after the 5th try...
[WARN] #18590 2008-08-09 12:01:44.885882 <MANAGER> Checking started workers, 2 out of 4 after the 6th try...
[LOG] #6275 2008-08-09 12:01:45.163029 <WORKER-6275> STARTING WORKER @ VER:1 type:task QUEUE_ID:0
[WARN] #18590 2008-08-09 12:01:46.896196 <MANAGER> Checking started workers, 3 out of 4 after the 7th try...
[LOG] #6276 2008-08-09 12:01:47.788750 <WORKER-6276> STARTING WORKER @ VER:1 type:any QUEUE_ID:0
[WARN] #18590 2008-08-09 12:01:47.999068 <MANAGER> Checking started workers, 3 out of 4 after the 8th try...
[LOG] #18590 2008-08-09 12:01:48.42099 <MANAGER> FINISHED STARTING ALL 4 WORKERS
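To spot these memory-triggered respawns in a large log file, something like the following could help. It is only a sketch based on the line format pasted above: the regular expression and the `parse_over_max_mem` helper are my own names, not part of Skynet, and the log does not state the unit of the AT/MAX figures, so they are returned as plain integers.

```ruby
# Hypothetical pattern, derived only from the log lines shown in this post.
OVER_MAX_MEM = /<WORKER-(\d+)>.*WORKER OVER MAX MEM AT: (\d+) MAX: (\d+)/

# Returns the worker PID and the two memory figures from a log line,
# or nil when the line is not an "over max mem" message.
def parse_over_max_mem(line)
  m = OVER_MAX_MEM.match(line)
  return nil unless m

  { pid: m[1].to_i, used: m[2].to_i, max: m[3].to_i }
end

line = "[INFO] #18593 2008-08-09 12:01:06.301185 <WORKER-18593> " \
       "Respawning and taking worker status WORKER OVER MAX MEM AT: 1870 MAX: 500"
p parse_over_max_mem(line)
```

Lines that do not mention the memory limit (for example the MANAGER lines) simply return nil, so the helper can be mapped over a whole file.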


By: Adam Pisoni
RE: When a worker crashes, another should bor
2008-08-08 18:19
Hmm... I wonder if you found a bug. I was just looking at the skynet_manager code and I can see where it's supposed to start new workers if some die. I'll have to look more deeply into why it's not getting to that code.
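
For reference, the behaviour being discussed (notice that a worker is gone, then start enough replacements to get back to the expected count) can be sketched roughly like this. This is not the actual skynet_manager code: `process_alive?` and `reconcile_workers` are hypothetical names used for illustration, and the real manager may track its workers quite differently.

```ruby
# Hypothetical helper, not Skynet's API: true if a process with this PID exists.
# Signal 0 sends nothing; it only checks that the process can be signalled.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Hypothetical reconcile step: keep only the PIDs that are still running
# and report how many workers are missing compared to the expected count.
def reconcile_workers(pids, expected, alive: method(:process_alive?))
  running = pids.select { |pid| alive.call(pid) }
  { running: running, to_start: [expected - running.size, 0].max }
end

# The current process is certainly alive, so nothing needs to be started here.
p reconcile_workers([Process.pid], 1)
```

The `alive:` argument is only there so the liveness check can be swapped out (for a test, or for a check against a queue instead of the process table). A manager loop would call something like this on each pass and spawn `to_start` new workers.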

By: Nicolas Fouché
When a worker crashes, another should born ?
2008-08-08 16:32
Hi,

If the Skynet code of a worker crashes, or if my code launched by the worker throws an uncaught exception (or, in the worst case, triggers a low-level bus error), the worker dies and no other worker is launched to replace it.

I thought Skynet was able to start a new worker if one dies.

Here is the log:

[DEBUG] #16808 2008-08-08 15:20:25.584563 <MANAGER> Checking on 3 workers...
[ERROR] #16808 2008-08-08 15:20:25.584925 <MANAGER> Worker 16813 was in queue and but was not running. Removing from queue.
[DEBUG] #16810 2008-08-08 15:20:27.945438 <MYSQLMQ> MISS PTYPE task OLDTEMP: 1 NEWTEMP: 1
[DEBUG] #16810 2008-08-08 15:20:27.945613 <MYSQLMQ> EMPTY QUEUE 1 SLEEPING: 60 / 27
[DEBUG] #16808 2008-08-08 15:20:30.594508 <MANAGER> Checking on 2 workers...

Nicolas

PS: thanks for this great tool :)