[#5267] exception loss

Date:
2006-08-02 16:00
Priority:
2
Submitted By:
Stephen Anderson (sja)
Assigned To:
Shyouhei Urabe (shyouhei)
Category:
Language / Runtime / Core Libraries
State:
Open
Platform:
 
Summary:
exception loss

Detailed description
Ruby processes intermittently fail to terminate when signalled. The trap declared in these Ruby scripts simply raises
an exception, and the exception is somehow lost. A simple demonstration setup consists of an exception handler script
(handleException.rb) that creates a thread whose job is to wake up every 10s and count to 1 million. The testcase is
driven by another script (testException.rb) whose job is to fork handleException.rb, wait approximately ten seconds,
then kill the subprocess. If the process dies within 2s the testcase passes. Otherwise the testcase fails and the teardown
tries to kill the subprocess again.
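
For readers without the attachment, the setup described above might look roughly like the sketch below. It is not the attached code: the names HANDLER_SOURCE and run_once and the wait/grace parameters are made up for illustration, SIGTERM is assumed as the signal, and it uses Process.spawn on a current Ruby rather than the fork used by the original 1.8.4 scripts.

```ruby
require 'rbconfig'
require 'timeout'

# Hypothetical stand-in for handleException.rb: the trap only raises,
# and a worker thread wakes every 10s to count to one million.
HANDLER_SOURCE = <<~'RUBY'
  trap('TERM') { raise SystemExit }
  worker = Thread.new do
    loop do
      sleep 10
      1_000_000.times { }
    end
  end
  worker.join
RUBY

# Hypothetical stand-in for one repetition of testException.rb.
# Returns true if the child dies within `grace` seconds of the signal.
def run_once(wait: 10, grace: 2)
  pid = Process.spawn(RbConfig.ruby, '-e', HANDLER_SOURCE)
  sleep wait
  Process.kill('TERM', pid)
  Timeout.timeout(grace) { Process.wait(pid) }
  true
rescue Timeout::Error
  # Teardown: the child ignored the first signal, so kill it for good.
  Process.kill('KILL', pid)
  Process.wait(pid)
  false
end
```

A full run would then be something like `4000.times { |i| abort "failed at repetition #{i}" unless run_once }`, which takes roughly 40,000 seconds with the default wait.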

I have seen this testcase complete successfully once (success is defined as 4000 repetitions). However, it was just luck,
as subsequent runs of exactly the same code failed after a few hundred repetitions.

Some of the things we have tried to fix this problem:
- Change the exception raised to Interrupt instead of SystemExit.
- Make the signal handler a critical section.
- Move the signal handler declaration into a class.

We have never observed the signal being lost.

This is raised as a high priority problem #458 on our bug tracking system as our daemon processes in a clustered environment
cannot reliably be killed.

Followup

Message
Date: 2007-06-29 18:49
Sender: Roger Pack

So the problem for this one is that exceptions are thrown and then...never
caught by the rescue clause? Is it possible that they 'bypass'
the rescue clause somehow, and kill the thread? I have had that
happen to me recently on Windows, and am wondering if it is the
same bug or a different one.  To make threads 'more complainy'
when they die I use
Thread.abort_on_exception = true # if a thread dies, tell me :)

GL :)
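
A small illustration of what that setting does, assuming a current Ruby (the method name thread_failure_surfaces? is made up for this sketch, and it is expected to be called from the main thread): with Thread.abort_on_exception = true, an exception that kills a thread is re-raised in the main thread instead of staying hidden until the thread is joined.

```ruby
# Hypothetical helper: returns true if an exception raised inside a
# worker thread reaches the main thread.
def thread_failure_surfaces?
  previous = Thread.abort_on_exception
  Thread.abort_on_exception = true
  Thread.new do
    Thread.current.report_on_exception = false  # keep stderr quiet
    raise 'worker died'
  end
  sleep 1   # the main thread is interrupted here by the re-raised error
  false     # only reached if the exception never arrived
rescue RuntimeError => e
  e.message == 'worker died'
ensure
  Thread.abort_on_exception = previous  # restore the global setting
end
```

This only makes a dying thread visible; it would not by itself explain an exception that is raised from a trap and then lost.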
Date: 2007-06-28 22:51
Sender: Roger Pack

What might be happening is that the exception is indeed generated,
but then is somehow getting 'sent' to the wrong thread.  I will
post my code on this happening in my own experience sometime.
Date: 2007-06-13 03:06
Sender: Roger Pack

I was referring to the topmost few bugs listed
in http://rubyforge.org/pipermail/mongrel-users/2006-May/000302.html
when I mentioned Ruby on Rails.
Date: 2007-06-13 02:57
Sender: Roger Pack

For me I had a problem where 'sometimes' a known class just...wasn't
there when it should have been (using threads).  I.e. it says
"ReXML::Document class unknown" when it is included
at the top of the same script.  Weirder than weird.  I know also
that when attempting to run multi-threaded Ruby on Rails sometimes
methods 'disappeared' from classes, which is part of the reason it
cannot yet be multithreaded.  They might be related to the same
problem.  GL!  It's so intermittent that it's hard to find!
Date: 2006-08-15 17:31
Sender: Stephen Anderson

I have reproduced this bug on FreeBSD 6.1-RELEASE SMP, Dell 
PowerEdge 2950 Dual XEON CPU, RAID-5, 4G Ram running ruby 
1.8.4 (2005-12-24) [i386-freebsd6], in 178 repetitions.

Ryan, can I offer you access to one of our servers to debug 
this?
Date: 2006-08-15 00:49
Sender: Ryan Davis

I don't run linux so no, I haven't been able to repro. Either
someone else is going to have to repro this, or you're going
to have to debug it and offer a patch. Sorry. The bug is intermittent
enough and possibly platform dependent that it isn't that high
of a priority for the next release.
Date: 2006-08-14 21:01
Sender: Stephen Anderson

I just reproduced this in 19 repetitions on a less powerful 
Single CPU Dell PowerEdge 350 P3 1GHz 1G Ram, 120G IDE HD, 
running Linux 2.6.12, ruby 1.8.4.
Date: 2006-08-14 20:41
Sender: Stephen Anderson

Sorry, I haven't come up with a faster way of reproducing 
this problem - it is critical but intermittent. I usually 
start it off at the end of the day and check results in the 
morning. I have seen it fail a mere 100 or so repetitions 
into the test. Have you had the chance to try it on Linux 
2.6?
Date: 2006-08-14 19:42
Sender: Ryan Davis

I was unable to reproduce this problem, and really... 40000 second
tests aren't preferred. Is there any way you can get this to
repro quicker? Might it be a platform specific bug?
Date: 2006-08-13 03:18
Sender: Ryan Davis

I'm running this now on osx. I'm not seeing a problem yet and
it is going to be a bitch to reproduce... running this for an
arbitrary 40,000 seconds (half a day) seems to be a waste. Is
there any other way to reproduce this?

Are you sure it isn't an OS specific bug?
Date: 2006-08-02 16:07
Sender: Stephen Anderson

Hardware/OS: Dell 2850/Raid-5/dual XEON CPU/4G Ram running 
Knoppix 4.0.2/Linux 2.6.12 ruby 1.8.4

Attached Files:

Name Description Download
testException.zip Zip of testException and handleException scripts Download

Changes:

Field        Old Value                             Date              By
assigned_to  zenspider                             2007-06-13 05:05  zenspider
category_id  Misc / Other Standard Library         2007-05-29 21:46  zenspider
category_id  Language / Runtime / Core Libraries   2007-05-29 15:54  zenspider
category_id  None                                  2006-08-14 19:42  zenspider
assigned_to  none                                  2006-08-13 03:18  zenspider
priority     3                                     2006-08-13 03:18  zenspider
File Added   717: testException.zip                2006-08-02 16:00  sja