Discussion Forums: help
By: Adam Pisoni
RE: skynet worker contention issues? 2008-06-17 19:56
So, execute is returning nil. That's strange indeed. There's no rescue block around that, so it's not throwing an exception. I'm trying to think of a context in which execute would return nil. You'd think it would always return a result record. I've never seen that before in any other skynet install. Have you tried googling it? Seems like an AR problem. What version of AR are you using?
adam
By: Ann Lewis
skynet worker contention issues? 2008-06-17 18:38
I've been seeing some strange worker errors on long-running jobs. Workers start dying when the skynet manager tries to respawn them. They die with these exceptions:
[FATAL] #20635 2008-06-17 11:12:25.319132 <WORKER-20635> WORKER 20635 DYING NoMethodError undefined method `all_hashes' for nil:NilClass
/home/deploy/staging/dimwit/vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:482:in `select'
/home/deploy/staging/dimwit/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all_without_query_cache'
/home/deploy/staging/dimwit/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb:55:in `select_all'
/home/deploy/staging/dimwit/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:13:in `select_one'
/home/deploy/staging/dimwit/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:19:in `select_value'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/lib/skynet/message_queue_adapters/mysql.rb:296:in `get_worker_version'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/../lib/skynet/skynet_message_queue.rb:64:in `get_worker_version'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/../lib/skynet/skynet_worker.rb:76:in `new_version_respawn?'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/../lib/skynet/skynet_worker.rb:268:in `start'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/../lib/skynet/skynet_worker.rb:197:in `loop'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/../lib/skynet/skynet_worker.rb:197:in `start'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/../lib/skynet/skynet_worker.rb:423:in `start'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/../lib/skynet/skynet_launcher.rb:25:in `start'
/home/deploy/staging/dimwit/vendor/gems/skynet-0.9.3/bin/skynet:75
[ERROR] #20633 2008-06-17 11:12:28.424291 <MANAGER> Worker 20635 was in queue and but was not running. Removing from queue.
When I open up active_record/connection_adapters/mysql_adapter.rb, I see:
    def select(sql, name = nil)
      @connection.query_with_result = true
      result = execute(sql, name)
      rows = result.all_hashes   # <-- line 482
      result.free
      rows
    end
The sql result is nil, which seems to be due to a sql timeout.
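To make the failure mode concrete, here is a minimal, self-contained sketch of the same select shape with an explicit nil check. It does not touch ActiveRecord or skynet: FakeAdapter, FakeResult, NilResultError and the table name in the query are all hypothetical stand-ins, and the guard is only one possible way to surface the problem, not a fix for whatever makes execute return nil.

```ruby
# Hypothetical stand-ins for illustration only; none of these classes
# exist in ActiveRecord or skynet.
class NilResultError < StandardError; end

# Mimics a result object that responds to all_hashes and free.
class FakeResult
  def initialize(rows)
    @rows = rows
  end

  def all_hashes
    @rows
  end

  def free
    @rows = nil
  end
end

class FakeAdapter
  # results: values that successive execute calls will return, in order.
  def initialize(results)
    @results = results
  end

  def execute(sql, name = nil)
    @results.shift
  end

  # Same shape as the select method quoted above, plus a nil check so the
  # failure names the query instead of raising NoMethodError on nil.
  def select(sql, name = nil)
    result = execute(sql, name)
    raise NilResultError, "execute returned nil for: #{sql}" if result.nil?
    rows = result.all_hashes
    result.free
    rows
  end
end

# First call simulates the nil result, second call a normal result.
adapter = FakeAdapter.new([nil, FakeResult.new([{ "version" => "3" }])])

begin
  adapter.select("SELECT version FROM some_table") # table name is made up
rescue NilResultError => e
  puts "caught: #{e.message}"
end

puts adapter.select("SELECT version FROM some_table").inspect
```

A check like this would not stop the workers from dying, but the log would show which query came back empty rather than a NoMethodError deep inside the adapter.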
MySQL contention? At first I thought I could be causing this by running with a large number of mappers and reducers, so I dialed my mappers and reducers down to 1 apiece, and I still see these errors. Could it be coming from contention between the skynet workers themselves? Has anyone else seen this problem?