[Backgroundrb-devel] Intermittent "can't convert Float into Hash" and results.rb

Mason Hale masonhale at gmail.com
Wed Jan 10 14:39:11 EST 2007


I've done some more work on this and have created a test case that reliably
throws errors, although the errors themselves are not consistent.
About 1 out of every 4 times, I get the "can't convert Symbol to Hash" error
in server/lib/backgroundrb/results.rb:40 in 'merge!'.

I created the following worker class in
{RAILS_ROOT}/lib/workers/results_test_worker.rb

# This class repeatedly writes values to the results, to
# test the results process
class ResultsTestWorker < BackgrounDRb::Worker::RailsBase

  def do_work(args)
    logger.info "Started ResultsTestWorker"
    results[:started_at] = Time.now
    args ||= {}
    limit = args[:limit] || 10_000

    logger.info "Limit is #{limit}"

    limit.times do |i|
      results[:last_update] = Time.now
      results[:counter] = i
    end
    stop_time = Time.now
    logger.info "Stopped ResultsTestWorker at #{stop_time}"
    results[:stopped_at] = stop_time
    self.delete
  end

end
ResultsTestWorker.register

Then in {RAILS_ROOT}/test/unit/drb_results_test.rb  I have:

require File.dirname(__FILE__) + '/../test_helper'

class DrbResultsTest < Test::Unit::TestCase

  def setup
    # start backgroundrb server
    `../../script/backgroundrb start`
    sleep 5 # give it time to startup
  end

  def teardown
    # stop backgroundrb server
    `../../script/backgroundrb stop`
  end

  def test_results
    limit = 10
    keys = []
    4.times do |i|
      job_key = "#{self.class.name}_#{i}"
      keys << job_key
      MiddleMan.new_worker(:class => :results_test_worker,
                           :job_key => job_key,
                           :args => {:limit => limit})
    end

    sleep 2 # wait for workers to finish

    keys.each_with_index do |k, i|
      assert_not_nil MiddleMan[k], "checking job_key #{k} on iteration #{i}"
      assert_not_nil MiddleMan[k].object, "checking object on iteration #{i}"
      assert_not_nil MiddleMan[k].object.results, "checking results on iteration #{i}"
      assert_equal(limit - 1, MiddleMan[k].object.results.to_hash[:counter],
                   "checking counter on iteration #{i}")
    end

  end
end

This test does the following:
- Spawns 4 results_test_worker processes that each write several values to
the ResultsWorker in parallel. Increasing the limit value increases the odds
of these processes writing results at the same time, but I've found that a
limit of 10 works pretty well.
- Waits a couple of seconds for the workers to finish (is there a better way
to determine whether the processes are all done?).
- Then tries to access the results for each job_key, specifically to
ensure that the counter value is equal to limit - 1.
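
One possible alternative to the fixed sleep is to poll for a completion
condition with a timeout. The sketch below is plain Ruby and does not use any
BackgrounDRb API; wait_until is a hypothetical helper name, and the condition
passed in the block (here a simple flag set by a thread) is only a stand-in
for whatever check fits the test, such as looking for the :stopped_at result.

```ruby
# Hypothetical helper (not part of BackgrounDRb): call the block repeatedly
# until it returns a truthy value or the timeout (in seconds) elapses.
# Returns true if the condition was met, false if the wait timed out.
def wait_until(timeout = 10, interval = 0.1)
  deadline = Time.now + timeout
  until yield
    return false if Time.now >= deadline
    sleep interval
  end
  true
end

# Stand-in for a worker: a thread that sets a flag after a short delay.
done = false
worker = Thread.new { sleep 0.2; done = true }

finished = wait_until(2) { done }
worker.join
puts "worker finished in time: #{finished}"
```

In the test above, the block would check each job_key instead of a local
flag, and a false return value would be treated as a failure.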

NOTE: I've never gotten this test to complete successfully. In addition to
the "can't convert Symbol to Hash" error,
I've seen the following:
- The [:counter] value is much lower than the expected value. If limit is
10,000 this value might be 246 when 9,999 was expected.
- The job_key is not recognized: the call to MiddleMan[k] returns nil. When
this occurs, I can usually see in the backgroundrb.log
   that fewer than 4 workers were actually created. I can see this by
counting the number of "Started ResultsTestWorker"
   messages in the log.
- The job_key is resolved, but the call to MiddleMan[k].object.results
returns nil
- The call to MiddleMan.new_worker hangs and never returns

I'm sharing this code so that others can try it out. It's a bit of a hack to
get some testing working (starting and stopping the BackgrounDRb server on
each test, having a test worker class in lib/workers, etc.), but it is
self-contained and replicates the real-world environment of my code running
in Rails. If you have suggestions for improving the testing approach, I'm all
ears.

I'm also interested in feedback on the code itself. Maybe I'm not working
with the MiddleMan object correctly. I have to admit I'm still wrapping my
head around DRb.

Resolving this issue is critical to my project so I will continue trying to
track things down. I'll start by adding a mutex to the Results#[]= method.
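
A minimal sketch of the kind of mutex meant here, assuming the results store
is essentially a hash of per-job hashes that several workers update through
[]=. SynchronizedResults and run_results_demo are hypothetical names for
illustration only; this is not the actual code in results.rb, and threads
stand in for the worker processes.

```ruby
require 'thread'

# Hypothetical stand-in for the results store, not BackgrounDRb's own class.
class SynchronizedResults
  def initialize
    @store = {}
    @lock = Mutex.new
  end

  # Merge one key/value pair into a job's results while holding the lock,
  # so concurrent writers cannot interleave inside the update.
  def []=(job_key, pair)
    @lock.synchronize do
      (@store[job_key] ||= {}).merge!(pair)
    end
  end

  # Return a copy of a job's results (an empty hash for an unknown job).
  def [](job_key)
    @lock.synchronize { (@store[job_key] || {}).dup }
  end
end

# Several threads write counters for their own job keys, mirroring the
# loop in ResultsTestWorker#do_work.
def run_results_demo(workers = 4, limit = 10)
  results = SynchronizedResults.new
  threads = (0...workers).map do |w|
    Thread.new do
      limit.times { |i| results["job_#{w}"] = { :counter => i } }
    end
  end
  threads.each { |t| t.join }
  results
end

results = run_results_demo
puts results["job_0"].inspect
```

The point of the design is that the read-modify-write inside []= happens
entirely under one lock, so no writer ever sees a half-updated entry.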

Mason


On 1/10/07, skaar <skaar at waste.org> wrote:
>
> It might be that we have to introduce a mutex in the results worker
> where this happens. I'll try to get this reproduced sometime this
> weekend.
>
> /skaar
>
>
>