[Backgroundrb-devel] Problems sending large results with backgroundrb

Mike Evans mike at metaswitch.com
Wed May 21 03:14:18 EDT 2008


Hemant

I got to the bottom of the other problem last night.  The issue was that
NbioHelper::write_and_schedule deletes entries from the outbound_data
array while iterating over it, which can cause data to be sent out of
order.  I fixed it by changing the outbound_data.delete_at(index)
statement to outbound_data[index] = nil, and then compacting the array
at the end of the iteration.

    # write the data in socket buffer and schedule the thing
    def write_and_schedule sock
      outbound_data.each_with_index do |t_data,index|
        leftover = write_once(t_data,sock)
        if leftover.empty?
          outbound_data[index] = nil
        else
          outbound_data[index] = leftover
          reactor.schedule_write(sock)
          break
        end
      end
      outbound_data.compact!
      reactor.cancel_write(sock) if outbound_data.empty?
    end
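To see why deleting during iteration misbehaves, here is a minimal standalone demonstration (independent of backgroundrb) of how delete_at inside each_with_index shifts later entries so the iterator skips them:

```ruby
# Mutating an array while each_with_index walks it: delete_at shifts
# the remaining elements left, so the iterator skips the next one.
visited = []
chunks = ["chunk-a", "chunk-b", "chunk-c"]
chunks.each_with_index do |chunk, index|
  visited << chunk
  chunks.delete_at(index)   # the pattern the original code used
end
# "chunk-b" was never visited; with queued socket buffers this means
# a skipped chunk is written on a later pass, after data queued
# behind it -- hence the out-of-order sends.
puts visited.inspect   # => ["chunk-a", "chunk-c"]
```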

Mike 

-----Original Message-----
From: hemant [mailto:gethemant at gmail.com] 
Sent: 21 May 2008 05:36
To: Mike Evans
Cc: backgroundrb-devel at rubyforge.org
Subject: Re: [Backgroundrb-devel] Problems sending large results with
backgroundrb

You can test the git version of backgroundrb with the git version of
packet (which incorporates the latest changes).  The procedure is as
follows:

clone the packet git repo:

git clone git://github.com/gnufied/packet.git
cd packet; rake gem
cd pkg; sudo gem install --local packet-0.1.6.gem

Go to the vendor directory of your Rails application, remove or back up
the older version of the backgroundrb plugin, and back up the related
config file as well.

from vendor directory:

git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb
cd RAILS_ROOT   <<assuming older script and config file has been backed up>>
rake backgroundrb:setup   <<modify config/backgroundrb.yml according to your needs>>
./script/backgroundrb start   <<Let me know how it goes and if this fixes your problem>>


On Wed, May 21, 2008 at 9:42 AM, hemant <gethemant at gmail.com> wrote:
> On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
>> I'm working on an application that does extensive database searching.
>> These searches can take a long time, so we have been working on
>> moving the searches to a backgroundrb worker task so we can provide a
>> sexy AJAX progress bar, and populate the search results as they are
>> available.  All of this seems to work fine until the size of the
>> search results gets sufficiently large, when we start to hit
>> exceptions in backgroundrb (most likely in the packet layer).  We are
>> using packet-0.5.1 and backgroundrb from the latest svn mirror.
>>
>> We have found and fixed one problem in the packet sender.  This is 
>> triggered when the non-blocking send in NbioHelper::send_once cannot 
>> send the entire buffer, resulting in an exception in the line
>>
>>       write_scheduled[fileno] ||= connections[fileno].instance
>>
>> in Core::schedule_write because connections[fileno] is nil.  I can't 
>> claim to fully understand the code, but I think there are two
>> problems here.
>>
>> The main issue seems to be that when Core::handle_write_event calls 
>> write_and_schedule to schedule the write, it doesn't clear out 
>> internal_scheduled_write[fileno].  It looks like the code is 
>> expecting the cancel_write call at the end of write_and_schedule to 
>> clear it out, but this doesn't happen if there is enough queued data 
>> to cause the non-blocking write to only partially succeed again.  In 
>> this case, Core::schedule_write is called again, and because 
>> internal_schedule_write[fileno] has not been cleared out, the code 
>> drops through to the second if test, then hits the above exception.  
>> We fixed this by adding the line
>>
>> internal_scheduled_write.delete(fileno)
>>
>> immediately before the call to write_and_schedule in 
>> Core::handle_write_event.
>>
>> The secondary issue is that the connections[fileno] structure is not
>> getting populated for this connection - I'm guessing because it is an
>> internal socket rather than a network socket, but I couldn't be sure.
>> We changed the second if test in Core::schedule_write to
>>
>>       elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
>>
>> to firewall against this, but we are not sure if this is the right
>> fix.
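The fall-through described above can be reproduced in miniature.  The following is a hypothetical, simplified model of Core::schedule_write's bookkeeping hashes (the names mirror packet's, but this is a standalone sketch, not the library's code), showing how a stale internal_scheduled_write entry sends control to the elsif branch where connections[fileno] is nil:

```ruby
# Hypothetical model of the stale-entry bug; standalone sketch only.
internal_scheduled_write = {}
write_scheduled = {}
connections = {}   # internal worker pipes never get an entry here

schedule_write = lambda do |fileno|
  if internal_scheduled_write[fileno].nil?
    internal_scheduled_write[fileno] = :internal   # normal path
  elsif write_scheduled[fileno].nil?
    # a stale entry above drops us here; connections[fileno] is nil
    write_scheduled[fileno] = connections[fileno].instance
  end
end

schedule_write.call(9)            # first partial write: entry recorded
error = begin
  schedule_write.call(9)          # entry never cleared -> wrong branch
  nil
rescue NoMethodError => e
  e
end
puts error.class                  # the exception reported above

# The reported fix: clear the entry before rescheduling.
internal_scheduled_write.delete(9)
schedule_write.call(9)            # takes the internal branch again
```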
>
> That was surely a bug, and I fixed it like this:
>
>      def schedule_write(t_sock, internal_instance = nil)
>        fileno = t_sock.fileno
>        if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
>          write_ios << t_sock
>          internal_scheduled_write[t_sock.fileno] ||= internal_instance
>        elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
>          write_ios << t_sock
>          write_scheduled[fileno] ||= connections[fileno].instance
>        end
>      end
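The fix above keys off the socket's class: Ruby's case-equality operator (===) on a class is equivalent to is_a?, which is how the first branch recognizes internal UNIX-domain worker pipes.  A quick standalone illustration:

```ruby
require "socket"

# Class === object behaves like object.is_a?(Class); schedule_write
# uses this to route internal worker pipes and network sockets to
# different bookkeeping tables.
internal, _peer = UNIXSocket.pair                # internal worker pipe
server = TCPServer.new("127.0.0.1", 0)           # network socket, ephemeral port

puts UNIXSocket === internal   # true  -> internal_scheduled_write branch
puts UNIXSocket === server     # false -> write_scheduled branch
```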
>
> Also, I fixed an issue with marshalling larger data across the channel.
> Thanks for reporting this.  I have been terribly busy with things in
> the office and in my personal life, and hence my work on BackgrounDRb
> has been on hiatus for a while.  Unfortunately, you can't use the
> trunk packet code, which is available from:
>
> git clone git://github.com/gnufied/packet.git
>
> directly with the svn mirror of backgroundrb, since packet now uses
> fork and exec to run workers, thereby reducing their memory usage.
> However, in a day or two I will update the git repository of
> BackgrounDRb to make use of the latest packet version.  In the
> meanwhile, you can try backporting the relevant packet changes to the
> version you are using and see if that fixes your problem.
>
>>
>> We are now hitting problems in the Packet::MetaPimp module receiving
>> the data, usually an exception in the Marshal.load call in
>> MetaPimp::receive_data.  We suspect this is caused by the packet code
>> corrupting the data somewhere, probably because we are sending such
>> large arrays of results (the repro I am working on at the moment is
>> trying to marshal over 200k of data).  We've been trying to put extra
>> diagnostics in the code so we can see what is happening, but if we
>> edit puts statements into the code we only seem to get output from
>> the end of the connection that hits an exception, and so far our
>> attempts to make logger objects available throughout the code have
>> failed.  We therefore thought we would ask for help
>> - either to see whether this is a known problem, or whether there is
>> a recommended way to add diagnostics to the packet code.
>>
>> I'm also open to ideas as to better ways to solve the problem!
>>
>> Thanks in advance,
>>
>> Mike
>>
>>
>> _______________________________________________
>> Backgroundrb-devel mailing list
>> Backgroundrb-devel at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
>>
>
>
>
> --
> Let them talk of their oriental summer climes of everlasting 
> conservatories; give me the privilege of making my own summer with my 
> own coals.
>
> http://gnufied.org
>



--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org

