[Backgroundrb-devel] Workers sleeping habits

Erik Morton eimorton at gmail.com
Fri Sep 28 10:39:19 EDT 2007


I'm having some issues with Workers dying after a period of several  
hours.

Each worker runs a loop that asks Amazon SQS for work to do. If there
is a message in the queue, the work is completed (image processing,
etc.); if there is no message, the worker sleeps for a fixed number of
seconds (sleep 10, for example). I've noticed that the workers
frequently exhibit one of two bad behaviors: A) they stop asking for
requests but still exist as a process; or B) they die completely (no
more process) with no errors reported in either log file.
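
For reference, the loop described above looks roughly like the following. This is only a minimal sketch, not the actual worker code: poll_once, FakeQueue, and the log array are hypothetical stand-ins, and the real workers talk to SQS and the BackgrounDRb logger instead.

```ruby
# Hypothetical sketch of one pass through the polling loop. `queue` stands in
# for the SQS client and only needs to respond to next_message (nil if empty).
def poll_once(queue, log, idle_seconds)
  log << "next_message"
  message = queue.next_message
  if message
    log << "processing #{message}"
    yield message                      # do the actual work (image processing, etc.)
  else
    log << "No message. Going to sleep."
    sleep idle_seconds                 # the call that appears to oversleep
    log << "Done sleeping."
  end
end

# Stand-in queue for illustration only: hands out messages until it is empty.
class FakeQueue
  def initialize(messages)
    @messages = messages.dup
  end

  def next_message
    @messages.shift
  end
end
```

In the real worker this would be wrapped in `loop { poll_once(queue, log, 10) { |m| ... } }`.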

I made a simple DeathWorker last night to try to find out exactly  
*when* death occurs. The worker will log when it asks for a message,  
when it goes to sleep, and when it wakes up. Like so:

09/27/2007 13:23:05 (7673) DeathWorker: SQSMiddleMan.next_message (:death_worker)
09/27/2007 13:23:05 (7673) DeathWorker: No message. Going to sleep.
09/27/2007 13:23:34 (7673) DeathWorker: Done sleeping.

The above log entries show the normal course of operation for the
DeathWorker: look for a message, almost immediately report that there
is no message, and go to sleep for 10 seconds. Then wake up and log
that it is awake. As you can see, 29 seconds passed between the
"Going to sleep" entry and the "Done sleeping" entry, not 10. Is it
possible that the log synchronization that occurs through the logging
worker causes the delay?
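
One way to tell the two apart might be to time the sleep inside the worker itself and put the measured duration in the log message, so that any latency in the logging worker cannot affect the number. A minimal sketch, assuming a Ruby version that provides Process.clock_gettime; timed_sleep and sleep_report are hypothetical helper names, not part of BackgrounDRb:

```ruby
# Sleep for `seconds` and return how long the call actually took, measured
# with the monotonic clock so wall-clock adjustments cannot skew the result.
def timed_sleep(seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  sleep seconds
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical helper: build a log line that carries both the requested and
# the measured duration, independent of when the line reaches the log file.
def sleep_report(requested, actual)
  format("Done sleeping. requested=%.1fs actual=%.1fs", requested, actual)
end
```

If the reported actual value stays close to 10 while the log timestamps drift apart, the delay is in logging; if actual itself grows, the sleep call is really taking that long.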

This happened later in the night:

09/27/2007 13:23:38 (7673) DeathWorker: No message. Going to sleep.
09/27/2007 13:27:15 (7673) DeathWorker: Done sleeping.

Almost four minutes of sleep when I call sleep 10. Interesting.

Later in the night:

09/27/2007 13:50:13 (7673) DeathWorker: No message. Going to sleep.
09/27/2007 19:29:36 (7673) DeathWorker: Done sleeping.

Wow! Almost 6 hours of sleeping!
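
The gaps can also be pulled out of the log mechanically. The sketch below assumes the line format shown in the entries above (month/day/year timestamp, pid in parentheses, "DeathWorker:" prefix); LINE and sleep_gaps are names made up for this example.

```ruby
require "time"

# Matches lines like: 09/27/2007 13:23:05 (7673) DeathWorker: <message>
LINE = %r{\A(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \(\d+\) DeathWorker: (.*)\z}

# Return the seconds elapsed between each "Going to sleep" entry and the
# "Done sleeping" entry that follows it.
def sleep_gaps(lines)
  gaps = []
  asleep_at = nil
  lines.each do |line|
    match = LINE.match(line.strip)
    next unless match
    at = Time.strptime(match[1], "%m/%d/%Y %H:%M:%S")
    if match[2].include?("Going to sleep")
      asleep_at = at
    elsif match[2].include?("Done sleeping") && asleep_at
      gaps << (at - asleep_at).round
      asleep_at = nil
    end
  end
  gaps
end

log = [
  "09/27/2007 13:23:05 (7673) DeathWorker: No message. Going to sleep.",
  "09/27/2007 13:23:34 (7673) DeathWorker: Done sleeping.",
  "09/27/2007 13:23:38 (7673) DeathWorker: No message. Going to sleep.",
  "09/27/2007 13:27:15 (7673) DeathWorker: Done sleeping.",
  "09/27/2007 13:50:13 (7673) DeathWorker: No message. Going to sleep.",
  "09/27/2007 19:29:36 (7673) DeathWorker: Done sleeping."
]
p sleep_gaps(log)  # prints [29, 217, 20363]
```

That is 29 seconds, 3 minutes 37 seconds, and 5 hours 39 minutes 23 seconds for three calls to sleep 10.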

After that nap the worker ran for another 10 minutes or so and then
the process actually died, with no errors reported in the log. Any
idea what is going on? How can I debug this issue? Every time I try
to attach to the oversleeping process with GDB, it segfaults. Thanks
in advance!

Erik

