streaming input for large requests

John Leach john at brightbox.co.uk
Wed Aug 11 18:41:32 EDT 2010


On Tue, 2010-08-10 at 16:25 -0700, Eric Wong wrote: 
> John Leach <john at brightbox.co.uk> wrote:
> > Hi,
> > 
> > I'm looking to be able to get access to the request body as it is
> > available on the socket, so I can process uploads on the fly, as they
> > stream in.
> 
> Hi John,
> 
> Cool!  If you need some example code, you should check out upr,
> http://upr.bogomips.org/  Sadly the demo machine is down, but the one
> application I helped somebody write (on a private LAN somewhere) still
> works well :)
> 
> There's also the test/examples in the Rainbows! source tree:
> 
>   t/sha1*.ru
>   t/content-md5.ru
> 
> > But to be rewindable, I'm assuming they're being stored somewhere?  I'd
> > like to be able to handle huge request bodies bit by bit without having
> > them written to disk (or worse, stored in ram!).  Is there some way to
> > do this?
> 
> Yes, we store uploads to an unlinked temporary file if the body is
> larger than 112 Kbytes (this threshold was established by Mongrel back
> in the day).
> 
> Rack currently requires rewindability, but this requirement will
> most likely be optional in Rack 2.x, and we'll update our code
> to match, then.
> 
> Meanwhile, you can either:
> 
> 1. Write a module to disable writes to tmp for the Unicorn::TeeInput
>    class (or monkey patch it).
> 
> 2. Redirect the temporary file to /dev/null. This only works if
>    Rack::Lint (or anything else that wraps env["rack.input"]) is
>    not loaded:
> 
>     input = env["rack.input"]
>     if input.respond_to?(:tmp)
>       tmp = input.tmp
>       # StringIO is used for bodies <112K, can't reopen those
>       tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'wb')
>     end
> 
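
The /dev/null redirection quoted above can be tried outside a server. This is only a minimal sketch: an ordinary File in a scratch directory stands in for the server's temporary file, and bytes_on_disk_after_discard is a hypothetical helper name, not part of Unicorn or Rainbows!. It assumes a Unix-like system where /dev/null exists.

```ruby
require 'tmpdir'

# Hypothetical helper (not from Unicorn or Rainbows!): write `bytes` bytes
# through a handle that has been redirected to /dev/null, then report how
# many bytes ended up in the file the handle originally pointed at.
def bytes_on_disk_after_discard(bytes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'upload.tmp')
    tmp = File.open(path, 'w+b')     # stand-in for the server's temp file
    tmp.reopen('/dev/null', 'wb')    # same call as in the quoted snippet
    tmp.write('x' * bytes)           # these bytes go to /dev/null
    tmp.flush
    tmp.close
    File.size(path)                  # size of the original file on disk
  end
end

puts bytes_on_disk_after_discard(1_000_000)
```

If the redirection took effect, the original file stays empty however much is written through the handle.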

I knocked together a little test app to do this as you suggested, and it
works a treat:

http://gist.github.com/519915

Thanks again Eric!

John.

#
# This little test app generates SHA1 hashes for HTTP uploads on the
# fly, without storing them on disk.
# By John Leach <john at johnleach.co.uk> (with help from Eric Wong)
#
# Start the server like this:
#
#  rainbows -c rainbows.conf.rb rainbows-sha1.ru
#
# I've been testing this with Revactor, which requires Ruby 1.9.
#
# Use with the following rainbows.conf.rb:
#
#  ENV['RACK_ENV'] = nil # we don't want lint to be loaded
#  worker_processes 2
#  Rainbows! do
#   use :Revactor
#   client_max_body_size nil
#  end
#
# You can upload files like this:
#
#  curl -v -T /path/to/a/file/to/upload http://localhost:8080/
#
# You can upload infinite data to test concurrency like this:
#
#  dd if=/dev/zero bs=16k | curl -v -T - http://localhost:8080/test.bin
#
# Spawn as many of these as you like :) You'll notice regular debug
# output from the server telling you the upload progress of each
# concurrent upload.
#
# If all is well, your disk space should not decrease during the
# uploads and the ram usage of the server should not balloon.

bs = ENV['bs'] ? ENV['bs'].to_i : 16384
require 'digest/sha1'
use Rack::CommonLogger
use Rack::ShowExceptions
use Rack::ContentLength

app = lambda do |env|

  # Tell clients that send "Expect: 100-continue" we're happy to accept
  /\A100-continue\z/i =~ env['HTTP_EXPECT'] and
    return [ 100, {}, [] ]

  input = env["rack.input"]
 
  if input.respond_to?(:tmp)
    tmp = input.tmp
    # Hack to prevent request being written to disk
    tmp.respond_to?(:reopen) and tmp.reopen('/dev/null', 'w+')
  end

  digest = Digest::SHA1.new

  recv_bytes = 0
  last_time = Time.now.to_i
  last_recv_bytes = 0
  req_id = rand(0xffff)

  while buf = input.read(bs)
    recv_bytes += buf.size
    digest.update(buf)
    if (recv_bytes / bs) % 10000 == 9999
      time_diff = Time.now.to_i - last_time + 1
      recv_bytes_diff = recv_bytes - last_recv_bytes
      speed = (recv_bytes_diff / time_diff) / 1024
      recv_meg = recv_bytes / 1024 / 1024
      msg = "req #{req_id}: #{recv_meg}M so far, (#{speed}k/s)\n"
      env['rack.errors'].write msg
      last_time = Time.now.to_i
      last_recv_bytes = recv_bytes
    end
  end
  
  [ 200, {
      'Content-Type' => 'text/plain', 
      'SHA1' => digest.hexdigest, 
      'Received-Bytes' => recv_bytes.to_s
    }, [''] ]

end
run app
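
The core of the app above, the chunked read loop feeding the digest, can be exercised without a server. This is a minimal sketch, assuming a StringIO stands in for env["rack.input"]. streaming_sha1 is a hypothetical helper name, not something from the gist or from Rainbows!.

```ruby
require 'digest/sha1'
require 'stringio'

# Hypothetical helper: hash an IO-like object in fixed-size chunks, so
# only one chunk needs to be held in memory at a time.
# Returns [hex digest, number of bytes read].
def streaming_sha1(input, chunk_size = 16_384)
  digest = Digest::SHA1.new
  received = 0
  # read(n) returns nil at end of input, which ends the loop
  while (buf = input.read(chunk_size))
    received += buf.bytesize
    digest.update(buf)
  end
  [digest.hexdigest, received]
end

data = 'a' * 100_000
hex, received = streaming_sha1(StringIO.new(data))
puts "#{received} bytes, chunked digest matches one-shot digest: " \
     "#{hex == Digest::SHA1.hexdigest(data)}"
```

Feeding the digest chunk by chunk gives the same result as hashing the whole body at once, so the chunk size only affects memory use and the number of read calls.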



