[Backgroundrb-devel] Problems sending large results with backgroundrb

Mike Evans mike at metaswitch.com
Sat May 24 10:49:48 EDT 2008


Hemant

I fixed a minor bug so that the code now gets the right file name, but
the object's file is still failing to load.

The fix is to change the regular expression used to process the
Marshal.load exception in MasterWorker::load_data from

      if error_msg =~ /^undefined.+([A-Z]\w+)/ 

to

      if error_msg =~ /^undefined.+ ([A-Z]\w+)/

The extra space forces it to capture the whole of the last word in the
error message, not just the part from the last capital letter onward.
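
The exception text in this case is along the lines of "undefined
class/module SearchDn", so in irb the two patterns capture:

      error_msg = "undefined class/module SearchDn"
      error_msg =~ /^undefined.+([A-Z]\w+)/;  $1   # => "Dn"
      error_msg =~ /^undefined.+ ([A-Z]\w+)/; $1   # => "SearchDn"

The "Dn" capture is why the trace below ends up with "no such file to
load -- dn".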

I suspect the issue I'm now seeing is because the MasterWorker class
doesn't load the Rails environment.  Any thoughts on how to fix this?
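
One workaround I'm considering (untested, and assuming the master knows
the Rails root - I'm using a hypothetical RAILS_HOME constant below) is
to pull in the Rails environment before load_data tries the require,
something like:

      # hypothetical sketch, not the current plugin code: load the app's
      # environment so classes like SearchDn resolve in the master
      unless defined?(RAILS_ENV)
        require File.join(RAILS_HOME, "config", "environment")
      end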

Mike

-----Original Message-----
From: backgroundrb-devel-bounces at rubyforge.org
[mailto:backgroundrb-devel-bounces at rubyforge.org] On Behalf Of Mike
Evans
Sent: 24 May 2008 14:52
To: hemant
Cc: backgroundrb-devel at rubyforge.org
Subject: Re: [Backgroundrb-devel] Problems sending large results with backgroundrb

Hemant

I'm not sure why we didn't hit that problem in original testing, but we
have hit it in later testing.

I've tried upgrading to the latest packet and backgroundrb from git, but
I'm now having problems with the initial start_worker.  I'm trying to
start the worker passing it a Ruby object of type SearchDn (which is
declared in app/model/search_dn.rb), but I'm hitting the exception
below.  Previously I was running with :lazy_load set to false, but this
doesn't seem to make any difference - has this feature been retired in
this version of the code?

Any thoughts?

Mike

/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': no such file to load -- dn (MissingSourceFile)
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
        from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support/dependencies.rb:495:in `require'
        from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support/dependencies.rb:342:in `new_constants_in'
        from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support/dependencies.rb:495:in `require'
        from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/lib/master_worker.rb:60:in `load_data'
        from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/lib/master_worker.rb:32:in `receive_data'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_parser.rb:30:in `call'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_parser.rb:30:in `extract'
         ... 9 levels...
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_master.rb:21:in `run'
        from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/lib/master_worker.rb:188:in `initialize'
        from ../script/backgroundrb:42:in `new'
        from ../script/backgroundrb:42
/usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_nbio.rb:25:in `read_data': Packet::DisconnectError (Packet::DisconnectError)
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_worker.rb:49:in `handle_internal_messages'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_core.rb:179:in `handle_read_event'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_core.rb:177:in `each'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_core.rb:177:in `handle_read_event'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_core.rb:133:in `start_reactor'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_core.rb:126:in `loop'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_core.rb:126:in `start_reactor'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_worker.rb:21:in `start_worker'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner:38:in `load_worker'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner:26:in `initialize'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner:47:in `new'
        from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner:47
        from /usr/local/bin/packet_worker_runner:16:in `load'
        from /usr/local/bin/packet_worker_runner:16




-----Original Message-----
From: hemant [mailto:gethemant at gmail.com]
Sent: 21 May 2008 08:56
To: Mike Evans
Cc: backgroundrb-devel at rubyforge.org
Subject: Re: [Backgroundrb-devel] Problems sending large results with
backgroundrb

Yeah, that too. But I wonder how you solved the following two problems.

Take a look at this code:

      def handle_write_event(p_ready_fds)
        p_ready_fds.each do |sock_fd|
          fileno = sock_fd.fileno
          if UNIXSocket === sock_fd && internal_scheduled_write[fileno] # we have a problem here
            write_and_schedule(sock_fd)
          elsif extern_opts = connection_completion_awaited[fileno]
            complete_connection(sock_fd,extern_opts)
          elsif handler_instance = write_scheduled[fileno] # I was drunk while writing following line
            handler_instance.write_scheduled(sock_fd)
          end
        end
      end

The problem is, as you say, if some data in a MetaPimp is left
unwritten, it won't get written in subsequent writes, because
outbound_data belongs to the MetaPimp class, not the main reactor
class, and hence it should be:

      def handle_write_event(p_ready_fds)
        p_ready_fds.each do |sock_fd|
          fileno = sock_fd.fileno
          if UNIXSocket === sock_fd && (internal_instance = internal_scheduled_write[fileno])
            internal_instance.write_and_schedule(sock_fd)
          elsif extern_opts = connection_completion_awaited[fileno]
            complete_connection(sock_fd,extern_opts)
          elsif handler_instance = write_scheduled[fileno]
            handler_instance.write_and_schedule(sock_fd)
          end
        end
      end

Also, I have included your changes in packet git. So, if you can give
the backgroundrb git a shot, I would appreciate it (please back up your
older plugin and config files).

On Wed, May 21, 2008 at 12:44 PM, Mike Evans <mike at metaswitch.com>
wrote:
> Hemant
>
> I got to the bottom of the other problem last night.  The issue was 
> with the NbioHelper::write_and_schedule method deleting entries from 
> the outbound_data array while iterating through it.  This can end up 
> with data getting out of order.  I fixed it by changing the
> outbound_data.delete_at(index) statement to outbound_data[index] = 
> nil, and then compacting the array at the end of the iteration.
>
>    # write the data in socket buffer and schedule the thing
>    def write_and_schedule sock
>      outbound_data.each_with_index do |t_data,index|
>        leftover = write_once(t_data,sock)
>        if leftover.empty?
>          outbound_data[index] = nil
>        else
>          outbound_data[index] = leftover
>          reactor.schedule_write(sock)
>          break
>        end
>      end
>      outbound_data.compact!
>      reactor.cancel_write(sock) if outbound_data.empty?
>    end
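>
> For what it's worth, the underlying Array behaviour is easy to see in
> isolation (just an illustration, not packet code):
>
>    queue = ["a", "b", "c"]
>    queue.each_with_index { |t_data, index| queue.delete_at(index) }
>    queue   # => ["b"] - "b" was skipped, so its data would only go out
>            # on a later pass, after entries queued behind it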
>
> Mike
>
> -----Original Message-----
> From: hemant [mailto:gethemant at gmail.com]
> Sent: 21 May 2008 05:36
> To: Mike Evans
> Cc: backgroundrb-devel at rubyforge.org
> Subject: Re: [Backgroundrb-devel] Problems sending large results with 
> backgroundrb
>
> You can test the git version of backgroundrb with the git version of
> packet (which incorporates the latest changes). The procedure is as
> follows:
>
> clone the packet git repo:
>
> git clone git://github.com/gnufied/packet.git
> cd packet;rake gem
> cd pkg; sudo gem install --local packet-0.1.6.gem
>
> Go to the vendor directory of your Rails app and remove or back up the
> older version of the backgroundrb plugin, and back up the related
> config file as well.
>
> from vendor directory:
>
> git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb
> cd RAILS_ROOT                <<assuming older script and config file have been backed up>>
> rake backgroundrb:setup      <<modify config/backgroundrb.yml according to your needs>>
> ./script/backgroundrb start  <<let me know how it goes and if this fixes your problem>>
>
>
> On Wed, May 21, 2008 at 9:42 AM, hemant <gethemant at gmail.com> wrote:
>> On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
>>> I'm working on an application that does extensive database searching.
>>> These searches can take a long time, so we have been working on moving
>>> the searches to a backgroundrb worker task so we can provide a sexy
>>> AJAX progress bar, and populate the search results as they are
>>> available.  All of this seems to work fine until the size of the
>>> search results gets sufficiently large, when we start to hit
>>> exceptions in backgroundrb (most likely in the packet layer).  We are
>>> using packet-0.5.1 and backgroundrb from the latest svn mirror.
>>>
>>> We have found and fixed one problem in the packet sender.  This is
>>> triggered when the non-blocking send in NbioHelper::send_once cannot
>>> send the entire buffer, resulting in an exception in the line
>>>
>>>       write_scheduled[fileno] ||= connections[fileno].instance
>>>
>>> in Core::schedule_write because connections[fileno] is nil.  I can't
>>> claim to fully understand the code, but I think there are two
>>> problems here.
>>>
>>> The main issue seems to be that when Core::handle_write_event calls
>>> write_and_schedule to schedule the write, it doesn't clear out
>>> internal_scheduled_write[fileno].  It looks like the code is
>>> expecting the cancel_write call at the end of write_and_schedule to
>>> clear it out, but this doesn't happen if there is enough queued data
>>> to cause the non-blocking write to only partially succeed again.  In
>>> this case, Core::schedule_write is called again, and because
>>> internal_scheduled_write[fileno] has not been cleared out, the code
>>> drops through to the second if test, then hits the above exception.
>>> We fixed this by adding the line
>>>
>>>       internal_scheduled_write.delete(fileno)
>>>
>>> immediately before the call to write_and_schedule in
>>> Core::handle_write_event.
>>>
>>> The secondary issue is that the connections[fileno] structure is not
>>> getting populated for this connection - I'm guessing because it is an
>>> internal socket rather than a network socket, but I couldn't be sure.
>>> We changed the second if test in Core::schedule_write to
>>>
>>>       elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
>>>
>>> to firewall against this, but we are not sure if this is the right
>>> fix.
>>
>> That was surely a bug and I fixed it like this:
>>
>>      def schedule_write(t_sock,internal_instance = nil)
>>        fileno = t_sock.fileno
>>        if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
>>          write_ios << t_sock
>>          internal_scheduled_write[t_sock.fileno] ||= internal_instance
>>        elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
>>          write_ios << t_sock
>>          write_scheduled[fileno] ||= connections[fileno].instance
>>        end
>>      end
>>
>> Also, I fixed an issue with marshalling larger data across the channel.
>> Thanks for reporting this. I have been terribly busy with things in the
>> office and in my personal life, and hence my work on BackgrounDRb has
>> been on hiatus for a while. Unfortunately, you can't use the trunk
>> packet code, which is available from:
>>
>> git clone git://github.com/gnufied/packet.git
>>
>> directly with the svn mirror of backgroundrb, since packet now uses
>> fork and exec to run workers, reducing their memory usage. However, in
>> a day or two I will update the git repository of BackgrounDRb to make
>> use of the latest packet version. In the meanwhile, you can try
>> backporting the relevant packet changes to the version you are using
>> and see if that fixes your problem.
>>
>>>
>>> We are now hitting problems in the Packet::MetaPimp module receiving
>>> the data, usually an exception in the Marshal.load call in
>>> MetaPimp::receive_data.  We suspect this is caused by the packet code
>>> corrupting the data somewhere, probably because we are sending such
>>> large arrays of results (the repro I am working on at the moment is
>>> trying to marshal over 200k of data).  We've been trying to put extra
>>> diagnostics in the code so we can see what is happening, but if we
>>> add puts statements to the code we only seem to get output from the
>>> end of the connection that hits an exception, and so far our attempts
>>> to make logger objects available throughout the code have failed.  We
>>> therefore thought we would ask for help - either to see whether this
>>> is a known problem, or whether there is a recommended way to add
>>> diagnostics to the packet code.
>>>
>>> I'm also open to ideas as to better ways to solve the problem!
>>>
>>> Thanks in advance,
>>>
>>> Mike
>>>
>>>
>>> _______________________________________________
>>> Backgroundrb-devel mailing list
>>> Backgroundrb-devel at rubyforge.org
>>> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
>>>
>>
>>
>>
>> --
>> Let them talk of their oriental summer climes of everlasting 
>> conservatories; give me the privilege of making my own summer with my
>> own coals.
>>
>> http://gnufied.org
>>
>
>
>
> --
> Let them talk of their oriental summer climes of everlasting 
> conservatories; give me the privilege of making my own summer with my 
> own coals.
>
> http://gnufied.org
>



--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org
_______________________________________________
Backgroundrb-devel mailing list
Backgroundrb-devel at rubyforge.org
http://rubyforge.org/mailman/listinfo/backgroundrb-devel

