[Backgroundrb-devel] Problems sending large results with backgroundrb

hemant gethemant at gmail.com
Wed May 21 00:12:23 EDT 2008

On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
> I'm working on an application that does extensive database searching.  These
> searches can take a long time, so we have been working on moving the
> searches to a backgroundrb worker task so we can provide a sexy AJAX
> progress bar, and populate the search results as they are available.  All of
> this seems to work fine until the size of the search results gets
> sufficiently large, when we start to hit exceptions in backgroundrb (most
> likely in the packet layer).  We are using packet-0.5.1 and backgroundrb
> from the latest svn mirror.
> We have found and fixed one problem in the packet sender.  This is triggered
> when the non-blocking send in NbioHelper::send_once cannot send the entire
> buffer, resulting in an exception in the line
>       write_scheduled[fileno] ||= connections[fileno].instance
> in Core::schedule_write because connections[fileno] is nil.  I can't claim
> to fully understand the code, but I think there are two problems here.
> The main issue seems to be that when Core::handle_write_event calls
> write_and_schedule to schedule the write, it doesn't clear out
> internal_scheduled_write[fileno].  It looks like the code is expecting the
> cancel_write call at the end of write_and_schedule to clear it out, but this
> doesn't happen if there is enough queued data to cause the non-blocking
> write to only partially succeed again.  In this case, Core::schedule_write
> is called again, and because internal_scheduled_write[fileno] has not been
> cleared out, the code drops through to the second if test, then hits the
> above exception.  We fixed this by adding the line
> internal_scheduled_write.delete(fileno)
> immediately before the call to write_and_schedule in
> Core::handle_write_event.
> The secondary issue is that the connections[fileno] structure is not getting
> populated for this connection - I'm guessing because it is an internal
> socket rather than a network socket, but I couldn't be sure.  We changed the
> second if test in Core::schedule_write to
>       elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
> to firewall against this, but we are not sure if this is the right fix.

That was indeed a bug, and I fixed it like this:

      def schedule_write(t_sock, internal_instance = nil)
        fileno = t_sock.fileno
        if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
          write_ios << t_sock
          internal_scheduled_write[fileno] ||= internal_instance
        elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
          write_ios << t_sock
          write_scheduled[fileno] ||= connections[fileno].instance
        end
      end
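The partial-write handling can be pictured with a small standalone sketch. This is illustrative only (OutgoingBuffer is not a packet class): the point is that a non-blocking write may drain only part of the buffer, and the socket must stay scheduled for write events until the remainder is flushed.

```ruby
# Illustrative sketch, not packet's actual API: a minimal outgoing
# buffer mirroring the write-and-reschedule logic discussed above.
class OutgoingBuffer
  def initialize(io)
    @io = io
    @buffer = +""
  end

  # Returns true when the buffer drained completely (the write watcher
  # can be cancelled); false when bytes remain and the socket must
  # stay scheduled for another write event.
  def write_and_schedule(data = nil)
    @buffer << data if data
    written = @io.write_nonblock(@buffer)
    @buffer = @buffer[written..-1] || +""
    @buffer.empty?
  rescue IO::WaitWritable
    # The kernel buffer is full; nothing was written, keep scheduling.
    false
  end
end
```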

Also, I fixed an issue with marshalling larger data across the channel.
Thanks for reporting this. I have been terribly busy with things at
the office and in my personal life, so my work on BackgrounDRb has
been on hiatus for a while. Unfortunately, you can't use the trunk
packet code, which is available from:

git clone git://github.com/gnufied/packet.git

directly with the svn mirror of backgroundrb, since packet now uses
fork and exec to run workers, which reduces their memory usage.
However, in a day or two I will update the BackgrounDRb git repository
to make use of the latest packet version. In the meantime, you can try
backporting the relevant packet changes to the version you are using
and see if that fixes your problem.
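The marshalling fix comes down to framing: Marshal.load must never see a truncated payload. A minimal sketch of the usual length-prefix technique (frame/deframe are illustrative names; this is not necessarily packet's exact wire format):

```ruby
# Prefix each marshalled payload with its length as a 32-bit
# big-endian integer, so the receiver knows exactly how many bytes
# to accumulate before attempting Marshal.load.
def frame(object)
  payload = Marshal.dump(object)
  [payload.bytesize].pack("N") + payload
end

# Returns [object, remaining_buffer]; object is nil until a complete
# frame has arrived, so partial reads are safe.
def deframe(buffer)
  return [nil, buffer] if buffer.bytesize < 4
  length = buffer.unpack1("N")
  return [nil, buffer] if buffer.bytesize < 4 + length
  object = Marshal.load(buffer.byteslice(4, length))
  [object, buffer.byteslice(4 + length, buffer.bytesize)]
end
```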

> We are now hitting problems in the Packet::MetaPimp module receiving the
> data, usually an exception in the Marshal.load call in
> MetaPimp::receive_data.  We suspect this is caused by the packet code
> corrupting the data somewhere, probably because we are sending such large
> arrays of results (the repro I am working on at the moment is trying to
> marshal over 200k of data).  We've been trying to put extra diagnostics in
> the code so we can see what is happening, but if we edit puts statements
> into the code we only seem to get output from the end of the connection that
> hits an exception and so far our attempts to make logger objects available
> throughout the code have failed.  We therefore thought we would ask for help
> - either to see whether this is a known problem, or whether there is a
> recommended way to add diagnostics to the packet code.
> I'm also open to ideas as to better ways to solve the problem!
> Thanks in advance,
> Mike
> _______________________________________________
> Backgroundrb-devel mailing list
> Backgroundrb-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
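On the diagnostics question: once the backgroundrb server daemonizes, stdout is detached from your terminal, which is why puts calls inside packet seem to produce no output. A crude but reliable workaround (debug_log here is a hypothetical helper, not part of packet) is to append directly to a file:

```ruby
# Illustrative helper: append timestamped diagnostics to a file, which
# survives daemonization since it never relies on stdout/stderr.
def debug_log(message, path = "/tmp/packet_debug.log")
  File.open(path, "a") do |f|
    f.sync = true
    f.puts("#{Process.pid} #{Time.now} #{message}")
  end
end
```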

Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

