I've done some more work on this and have created a test case that reliably throws errors, although the errors themselves are not consistent.<br>About 1 out of every 4 times, I get the "can't convert Symbol to Hash" error in server/lib/backgroundrb/results.rb:40 in 'merge!'.
<br><br>I created the following worker class in {RAILS_ROOT}/lib/workers/results_test_worker.rb:<br><br># This class repeatedly writes values to the results, to<br># test the results process<br>class ResultsTestWorker < BackgrounDRb::Worker::RailsBase
<br> <br> def do_work(args)<br> logger.info "Started ResultsTestWorker"<br> results[:started_at] = Time.now<br> args ||= {}<br> limit = args[:limit] || 10_000<br>
<br> logger.info "Limit is #{limit}"<br> <br> limit.times do |i|<br> results[:last_update] = Time.now<br> results[:counter] = i<br> end<br> stop_time = Time.now
<br> logger.info "Stopped ResultsTestWorker at #{stop_time}"<br> results[:stopped_at] = stop_time<br> self.delete<br> end<br><br>end<br>ResultsTestWorker.register<br><br>
Then in {RAILS_ROOT}/test/unit/drb_results_test.rb I have:<br><br>require File.dirname(__FILE__) + '/../test_helper'<br><br>class DrbResultsTest < Test::Unit::TestCase<br><br> def setup<br> # start backgroundrb server
<br> `../../script/backgroundrb start`<br> sleep 5 # give it time to startup<br> end<br> <br> def teardown<br> # stop backgroundrb server<br> `../../script/backgroundrb stop`<br> end<br> <br> def test_results
<br> limit = 10<br> keys = []<br> 4.times do |i|<br> job_key = "#{self.class.name}_#{i}"<br> keys << job_key<br> MiddleMan.new_worker(:class => :results_test_worker, :job_key => job_key, :args => {:limit => limit})
<br> end<br> <br> sleep 2 # wait for workers to finish<br> <br> keys.each_with_index do |k, i|<br> assert_not_nil MiddleMan[k], "checking job_key #{k} on iteration #{i}"<br> assert_not_nil MiddleMan[k].object, "checking object on iteration #{i}"
<br> assert_not_nil MiddleMan[k].object.results, "checking results on iteration #{i}"<br> assert_equal(limit - 1, MiddleMan[k].object.results.to_hash[:counter], "checking counter on iteration #{i}")
<br> end<br> <br> end<br>end<br><br>This test does the following:<br>- Spawns 4 results_test_worker processes that each write several values to the ResultsWorker in parallel.<br>Increasing the limit value increases the odds of these processes trying to write results at the
<br>same time, but I've found that a limit of 10 works pretty well.<br>- Waits a couple of seconds for the workers to finish (is there a better way to determine whether the processes are all done?).<br>- Then tries to access the results for each job_key, specifically to ensure that the counter value is equal to limit - 1.
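<br><br>On the question of waiting for the workers: one alternative to a fixed sleep is to poll with a timeout. This is only a rough sketch. The wait_until helper is a hypothetical name, not part of BackgrounDRb, and the usage shown in the comment assumes that a finished worker's results (including :stopped_at) can still be read through MiddleMan, which I have not verified.

```ruby
# Hypothetical helper (not part of BackgrounDRb): poll until the block
# returns a truthy value or the timeout (in seconds) expires.
# Returns the block's value on success, or nil on timeout.
def wait_until(timeout = 10, interval = 0.1)
  deadline = Time.now + timeout
  loop do
    value = yield
    return value if value
    return nil if Time.now >= deadline
    sleep interval
  end
end

# Possible use in the test, in place of `sleep 2` (assumption: the
# results of a finished worker are still readable via MiddleMan):
#
#   done = wait_until(30) do
#     keys.all? do |k|
#       worker = MiddleMan[k]
#       worker && worker.object.results.to_hash[:stopped_at]
#     end
#   end
#   assert done, "workers did not finish within 30 seconds"
```

The test would then fail with an explicit timeout message when the workers never finish, which is easier to tell apart from the other failures than a wrong counter value after a fixed two-second wait.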
<br><br>NOTE: I've never gotten this test to complete successfully. In addition to the "can't convert Symbol to Hash" error,<br>I've seen the following:<br>- The [:counter] value is much lower than the expected value. If limit is 10,000 this value might be 246 when 9,999 was expected.
<br>- The job_key is not recognized, the call to MiddleMan[k] returns nil. When this occurs, I can usually see in the backgroundrb.log<br> that fewer than 4 workers were actually created. I can see this by counting the number of "Started ResultsTestWorker"
<br> messages in the log.<br>- The job_key is resolved, but the call to MiddleMan[k].object.results returns nil<br>- The call to MiddleMan.new_worker hangs and never returns<br><br>I'm sharing this code so that others can try it out. It's a bit of a hack to get some testing working (starting and stopping the BackgrounDRb server on each test, having a test worker class in lib/workers, etc.), but it is self-contained, and it replicates the real-world environment of my code running in Rails. If you have suggestions for improving the testing approach, I'm all ears.
<br><br>I'm also interested in feedback on the code itself. Maybe I'm not working with the MiddleMan object correctly. I have to admit I'm still wrapping my head around DRb.<br><br>Resolving this issue is critical to my project, so I will continue trying to track things down. I'll start by adding a mutex to the Results#[]= method.
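<br><br>To illustrate the mutex idea, here is a rough sketch. SynchronizedResults is a hypothetical stand-in that assumes the results live in a plain Hash; it is not the actual BackgrounDRb Results class, and the real fix in results.rb may need to look different.

```ruby
require 'thread'

# Hypothetical stand-in for a results store; NOT the real BackgrounDRb
# Results class. It only shows writes being serialized with a Mutex.
class SynchronizedResults
  def initialize
    @data  = {}
    @mutex = Mutex.new
  end

  # Serialize writes so concurrent callers cannot interleave.
  def []=(key, value)
    @mutex.synchronize { @data[key] = value }
  end

  def [](key)
    @mutex.synchronize { @data[key] }
  end

  # Return a copy so callers never hold the hash while it is being updated.
  def to_hash
    @mutex.synchronize { @data.dup }
  end
end

# Four writers updating the same store, loosely mirroring the test above.
results = SynchronizedResults.new
threads = (0...4).map do |n|
  Thread.new do
    100.times { |i| results["counter_#{n}"] = i }
  end
end
threads.each { |t| t.join }
```

This only protects threads inside one process. If the results are shared between processes over DRb, the lock would have to live in the process that owns the hash.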
<br><br>Mason<br><br><br><div><span class="gmail_quote">On 1/10/07, <b class="gmail_sendername">skaar</b> <<a href="mailto:skaar@waste.org">skaar@waste.org</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
It might be that we have to introduce a mutex in the results worker<br>where this happens. I'll try to get this reproduced sometime this<br>weekend.<br><br>/skaar<br><br><br></blockquote></div><br>