[Backgroundrb-devel] Problems sending large results with backgroundrb

hemant gethemant at gmail.com
Wed May 21 03:56:16 EDT 2008


Yeah, that too. But I wonder how you solved the following two problems.

Take a look at this code:

      def handle_write_event(p_ready_fds)
        p_ready_fds.each do |sock_fd|
          fileno = sock_fd.fileno
          if UNIXSocket === sock_fd && internal_scheduled_write[fileno]
            # we have a problem here
            write_and_schedule(sock_fd)
          elsif extern_opts = connection_completion_awaited[fileno]
            complete_connection(sock_fd,extern_opts)
          elsif handler_instance = write_scheduled[fileno]
            # I was drunk while writing the following line
            handler_instance.write_scheduled(sock_fd)
          end
        end
      end

The problem is, as you said: if some data in a MetaPimp is left
unwritten, it won't get written in subsequent writes, because
outbound_data belongs to the MetaPimp class, not to the main reactor
class. Hence, it should be:

      def handle_write_event(p_ready_fds)
        p_ready_fds.each do |sock_fd|
          fileno = sock_fd.fileno
          if UNIXSocket === sock_fd &&
              (internal_instance = internal_scheduled_write[fileno])
            internal_instance.write_and_schedule(sock_fd)
          elsif extern_opts = connection_completion_awaited[fileno]
            complete_connection(sock_fd,extern_opts)
          elsif handler_instance = write_scheduled[fileno]
            handler_instance.write_and_schedule(sock_fd)
          end
        end
      end
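
For context, write_once (used by write_and_schedule, quoted below)
follows the usual partial-write contract: write whatever the kernel
accepts right now and return the leftover string. Roughly like this
sketch (not the actual packet source):

      def write_once(t_data, sock)
        written = sock.write_nonblock(t_data)
        t_data[written..-1] || ""   # remainder, or "" if fully written
      rescue Errno::EAGAIN
        t_data                      # socket not writable; keep everything
      end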

Also, I have included your changes in the packet git repository. So if
you can give the backgroundrb git version a shot, I would appreciate it
(please back up your older plugin and config files first).

On Wed, May 21, 2008 at 12:44 PM, Mike Evans <mike at metaswitch.com> wrote:
> Hemant
>
> I got to the bottom of the other problem last night.  The issue was with
> the NbioHelper::write_and_schedule method deleting entries from the
> outbound_data array while iterating through it.  This can end up with
> data getting out of order.  I fixed it by changing the
> outbound_data.delete_at(index) statement to outbound_data[index] = nil,
> and then compacting the array at the end of the iteration.
>
>    # write the data in the socket buffer and schedule the thing
>    def write_and_schedule sock
>      outbound_data.each_with_index do |t_data,index|
>        leftover = write_once(t_data,sock)
>        if leftover.empty?
>          outbound_data[index] = nil
>        else
>          outbound_data[index] = leftover
>          reactor.schedule_write(sock)
>          break
>        end
>      end
>      outbound_data.compact!
>      reactor.cancel_write(sock) if outbound_data.empty?
>    end
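>
> For reference, here is a minimal illustration of why delete_at inside
> the iteration reorders data (hypothetical chunks, not packet code):
> deleting shifts later elements down, so each_with_index skips one,
> leaving it to be written after chunks that came later.
>
>      queue = ["chunk1", "chunk2", "chunk3"]
>      queue.each_with_index do |data, index|
>        queue.delete_at(index)   # "chunk2" slides into the freed slot
>      end
>      p queue   # => ["chunk2"] -- skipped, so it would be sent last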
>
> Mike
>
> -----Original Message-----
> From: hemant [mailto:gethemant at gmail.com]
> Sent: 21 May 2008 05:36
> To: Mike Evans
> Cc: backgroundrb-devel at rubyforge.org
> Subject: Re: [Backgroundrb-devel] Problems sending large results with
> backgroundrb
>
> You can test the git version of backgroundrb with the git version of
> packet (which incorporates the latest changes). The procedure is as
> follows:
>
> clone the packet git repo:
>
> git clone git://github.com/gnufied/packet.git
> cd packet;rake gem
> cd pkg; sudo gem install --local packet-0.1.6.gem
>
> Go to the vendor directory of your Rails application, remove or back
> up the older version of the backgroundrb plugin, and back up the
> related config file as well.
>
> from vendor directory:
>
> git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb
> cd RAILS_ROOT  <<assuming the older script and config file have been backed up>>
> rake backgroundrb:setup  <<modify config/backgroundrb.yml according to your needs>>
> ./script/backgroundrb start  <<let me know how it goes and whether this fixes your problem>>
>
>
> On Wed, May 21, 2008 at 9:42 AM, hemant <gethemant at gmail.com> wrote:
>> On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
>>> I'm working on an application that does extensive database searching.
>>> These searches can take a long time, so we have been working on
>>> moving the searches to a backgroundrb worker task so we can provide a
>>> sexy AJAX progress bar, and populate the search results as they are
>>> available.  All of this seems to work fine until the size of the
>>> search results gets sufficiently large, when we start to hit
>>> exceptions in backgroundrb (most likely in the packet layer).  We are
>>> using packet-0.5.1 and backgroundrb from the latest svn mirror.
>>>
>>> We have found and fixed one problem in the packet sender.  This is
>>> triggered when the non-blocking send in NbioHelper::send_once cannot
>>> send the entire buffer, resulting in an exception in the line
>>>
>>>       write_scheduled[fileno] ||= connections[fileno].instance
>>>
>>> in Core::schedule_write because connections[fileno] is nil.  I can't
>>> claim to fully understand the code, but I think there are two
>>> problems here.
>>>
>>> The main issue seems to be that when Core::handle_write_event calls
>>> write_and_schedule to schedule the write, it doesn't clear out
>>> internal_scheduled_write[fileno].  It looks like the code is
>>> expecting the cancel_write call at the end of write_and_schedule to
>>> clear it out, but this doesn't happen if there is enough queued data
>>> to cause the non-blocking write to only partially succeed again.  In
>>> this case, Core::schedule_write is called again, and because
>>> internal_scheduled_write[fileno] has not been cleared out, the code
>>> drops through to the second if test, then hits the above exception.
>>> We fixed this by adding the line
>>>
>>>       internal_scheduled_write.delete(fileno)
>>>
>>> immediately before the call to write_and_schedule in
>>> Core::handle_write_event.
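>>>
>>> With that change, the first branch of Core::handle_write_event looks
>>> like this in our local patch (a sketch; the rest of the method is
>>> unchanged):
>>>
>>>       if UNIXSocket === sock_fd && internal_scheduled_write[fileno]
>>>         # clear the entry so a partial write can re-schedule cleanly
>>>         internal_scheduled_write.delete(fileno)
>>>         write_and_schedule(sock_fd)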
>>>
>>> The secondary issue is that the connections[fileno] structure is not
>>> getting populated for this connection - I'm guessing because it is an
>>> internal socket rather than a network socket, but I couldn't be sure.
>>> We changed the second if test in Core::schedule_write to
>>>
>>>       elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
>>>
>>> to firewall against this, but we are not sure if this is the right
>>> fix.
>>
>> That was surely a bug, and I fixed it like this:
>>
>>      def schedule_write(t_sock, internal_instance = nil)
>>        fileno = t_sock.fileno
>>        if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
>>          write_ios << t_sock
>>          internal_scheduled_write[fileno] ||= internal_instance
>>        elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
>>          write_ios << t_sock
>>          write_scheduled[fileno] ||= connections[fileno].instance
>>        end
>>      end
>>
>> Also, I fixed the issue with marshalling larger data across the
>> channel. Thanks for reporting this. I have been terribly busy with
>> things in the office and in my personal life, and hence my work on
>> BackgrounDRb has been on hiatus for a while. Unfortunately, you can't
>> use the trunk packet code, which is available from:
>>
>> git clone git://github.com/gnufied/packet.git
>>
>> directly with the svn mirror of backgroundrb, since packet now uses
>> fork and exec to run workers, thereby reducing their memory usage.
>> However, in a day or two I will update the git repository of
>> BackgrounDRb to make use of the latest packet version. In the
>> meanwhile, you can try backporting the relevant packet changes to the
>> version you are using and see if that fixes your problem.
>>
>>>
>>> We are now hitting problems in the Packet::MetaPimp module receiving
>>> the data, usually an exception in the Marshal.load call in
>>> MetaPimp::receive_data.  We suspect this is caused by the packet code
>>> corrupting the data somewhere, probably because we are sending such
>>> large arrays of results (the repro I am working on at the moment is
>>> trying to marshal over 200k of data).  We've been trying to put extra
>>> diagnostics in the code so we can see what is happening, but if we
>>> add puts statements to the code we only seem to get output from the
>>> end of the connection that hits an exception, and so far our attempts
>>> to make logger objects available throughout the code have failed.  We
>>> therefore thought we would ask for help - either to see whether this
>>> is a known problem, or whether there is a recommended way to add
>>> diagnostics to the packet code.
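>>>
>>> For instance, the kind of thing we were attempting (PacketDebug is a
>>> hypothetical name of ours, not an existing packet API):
>>>
>>>   require 'logger'
>>>
>>>   # one module-level logger reachable from anywhere in the code
>>>   module PacketDebug
>>>     LOG = Logger.new("/tmp/packet_debug.log")
>>>   end
>>>
>>>   # then, at interesting points such as MetaPimp#receive_data:
>>>   PacketDebug::LOG.info("received #{data.length} bytes")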
>>>
>>> I'm also open to ideas as to better ways to solve the problem!
>>>
>>> Thanks in advance,
>>>
>>> Mike
>>>
>>>
>>> _______________________________________________
>>> Backgroundrb-devel mailing list
>>> Backgroundrb-devel at rubyforge.org
>>> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
>>>
>>
>>
>>
>> --
>> Let them talk of their oriental summer climes of everlasting
>> conservatories; give me the privilege of making my own summer with my
>> own coals.
>>
>> http://gnufied.org
>>
>
>
>
> --
> Let them talk of their oriental summer climes of everlasting
> conservatories; give me the privilege of making my own summer with my
> own coals.
>
> http://gnufied.org
>



-- 
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org

