After investigating some waitpid problems, I was able to reproduce the bug in a simple test case. The script changes
the system clock, so note your current time or be ready to re-run your NTP client when executing it:
-- test.rb (run as root) --
pid = fork { exec 'sleep 100000' }
t = Thread.new do
  puts "Waiting pid #{pid}"
  Process.waitpid(pid)
  puts "Pid gone"
end
sleep 2
system "ps -p#{pid}"    # child is still running
system 'date -s 0000'   # set the system clock backwards
system "kill #{pid}"
sleep 1
system "ps -p#{pid}"    # child should be gone, but shows up as <defunct>
puts 'Should have printed "Pid gone"'
t.join
---------------
The critical line is "date -s 0000". If you remove it, everything is fine. If you keep it, the second
"ps -p#{pid}" shows a <defunct> process and "Pid gone" is never printed.
From a short code review, it seems that the thread delay is stored as an absolute time (eval.c:10904). When the
system time is set backwards, the thread won't run again until the clock catches up with the time at which it was
due to wake. In our case, the C-level waitpid with WNOHANG is never called, so the child stays defunct.
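
To illustrate the suspected mechanism, here is a minimal sketch of a scheduler entry that stores an absolute
wall-clock wake-up time. It is a model only, not the code from eval.c: FakeClock, SleepingThread and the 0.06 s
delay are hypothetical names and values chosen for the example.

```ruby
# Illustration only: FakeClock and SleepingThread are hypothetical names,
# not structures taken from eval.c.
class FakeClock
  attr_accessor :now

  def initialize(now)
    @now = now
  end
end

class SleepingThread
  def initialize(clock, delay)
    @clock = clock
    # The wake-up time is stored as an absolute wall-clock value.
    @wake_at = clock.now + delay
  end

  # The thread is only resumed once the clock reaches @wake_at.
  def runnable?
    @clock.now >= @wake_at
  end

  def seconds_until_runnable
    [@wake_at - @clock.now, 0.0].max
  end
end

clock = FakeClock.new(1_000_000.0)
waiter = SleepingThread.new(clock, 0.06)   # 0.06 s is an arbitrary delay

clock.now -= 3600.0                        # wall clock set back one hour
stalled = !waiter.runnable?
remaining = waiter.seconds_until_runnable

puts "stalled: #{stalled}, runnable again in about #{remaining.round(2)} s"
```

In this model, a thread that was due to wake almost immediately is instead delayed by roughly the size of the
backwards clock jump, which matches the behaviour seen in test.rb.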
I don't know about other platforms, but Linux has a monotonic clock that is not affected by NTP or manual time
changes, and it could be used to fix this kind of problem.
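
As a rough sketch of the idea, the following computes a deadline from the monotonic clock instead of the wall
clock. It assumes a much newer Ruby (2.1 or later), where Process.clock_gettime and Process::CLOCK_MONOTONIC exist;
neither is available in 1.8.6, where an actual fix would have to live in the interpreter's C code.
monotonic_now and wait_monotonic are hypothetical helper names.

```ruby
# Assumes Ruby >= 2.1; Process.clock_gettime does not exist in Ruby 1.8.6.
# monotonic_now and wait_monotonic are hypothetical helpers for illustration.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Wait for `delay` seconds, measuring progress against the monotonic clock,
# so that setting the system time backwards cannot push the deadline out.
def wait_monotonic(delay)
  deadline = monotonic_now + delay
  loop do
    remaining = deadline - monotonic_now
    break if remaining <= 0
    sleep([remaining, 0.01].min)
  end
end

started = monotonic_now
wait_monotonic(0.05)
elapsed = monotonic_now - started

puts "waited #{elapsed.round(3)} s on the monotonic clock"
```

Because the deadline is never compared against the wall-clock time, a "date -s" call during the wait has no
effect on when the loop finishes.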
# ruby -v
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]