[Backgroundrb-devel] adding results from threads to a collection and returning it

hemant gethemant at gmail.com
Tue Jun 10 23:35:33 EDT 2008

On Wed, Jun 11, 2008 at 5:26 AM, Neil Mock <neilmock at gmail.com> wrote:
> Forgive me if this has been addressed somewhere, but I have searched and
> can't come up with anything.
> I am basically trying to distribute several web page scraping tasks among
> different threads, and have the results from each added to an Array which is
> ultimately returned by the backgroundrb worker.  Here is an example of what
> I'm trying to do in a worker method:
>     pages = Array.new
>     pages_to_scrape.each do |url|
>       thread_pool.defer(url) do |url|
>         begin
>           # model object performs the scraping
>           page = ScrapedPage.new(url)
>           pages << page
>         rescue
>           logger.info "page scrape failed"
>         end
>       end
>     end
>     return pages
> From monitoring the backgroundrb logs, it appears that all of the pages are
> completed successfully in the threads.  However, the array that is returned
> is empty.  This is to be expected I suppose because the threads don't
> complete before the array is returned, but my question is: how can I make
> the worker wait to return the array only when all of the threads are
> complete?

Actually, you are doing a couple of things wrong. First, you are
accessing a variable created outside thread_pool from inside the
pool, so the code is not thread safe, which can cause anything
from deadlocks to random crashes.

Thread pools are for running concurrent tasks in the background without
any reporting; they are meant for fire-and-forget kinds of work. However,
I am contemplating a change in the behaviour of thread pools which will
perhaps enable what you want, so unless your need is dire, please
don't use thread pools as in the above snippet.
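
For illustration only, here is a minimal sketch of the general pattern
(guard the shared array with a lock, then wait for every thread before
returning). It uses plain Ruby Thread and Mutex, not backgroundrb's
thread_pool, and it is not the planned thread pool change mentioned
above. ScrapedPage here is a hypothetical stand-in for the model in the
question, and scrape_all is an invented helper name.

```ruby
require "thread" # needed for Mutex on older Rubies; harmless on current ones

# Hypothetical stand-in for the ScrapedPage model from the question.
ScrapedPage = Struct.new(:url)

# Scrape each URL in its own thread and return the collected pages
# only after every thread has finished.
def scrape_all(urls)
  pages = []
  lock = Mutex.new # guards every write to the shared array

  threads = urls.map do |url|
    # Pass url as an argument so each thread gets its own copy.
    Thread.new(url) do |u|
      begin
        page = ScrapedPage.new(u)
        lock.synchronize { pages << page }
      rescue StandardError => e
        warn "page scrape failed for #{u}: #{e.message}"
      end
    end
  end

  threads.each(&:join) # block until all threads are done
  pages
end

pages = scrape_all(["http://example.com/a", "http://example.com/b"])
puts pages.size
```

The order of the returned pages depends on thread scheduling, so sort
the result if the order matters. Whether blocking like this is
appropriate inside a backgroundrb worker is a separate question that
this sketch does not answer.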
