[Mechanize-users] Response To Form Submission Hanging

Andrew Stewart boss at airbladesoftware.com
Mon Dec 11 05:15:40 EST 2006

Hi Aaron,

On 8 Dec 2006, at 23:20, Aaron Patterson wrote:

> How long exactly is the page?  In the megabytes?

Just under 3.2 MB.

> I've seen stack traces similar to this when you get read timeouts.  If
> the server is slow, that may be your problem...

I wouldn't call the server fast, but it seems good enough.  When I do
what the script does myself in the browser, the page comes back in
under a minute.
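One way to tell a slow server from a genuine hang is to time the call against an explicit limit, much as the script's "about to submit" / "submitted at" lines do. The helper below is a hypothetical sketch for a current Ruby (the name `timed` is made up and is not part of Mechanize or net/http): it returns the block's result plus the elapsed seconds, and raises `Timeout::Error` if the limit is exceeded.

```ruby
require 'timeout'

# Hypothetical helper (not part of Mechanize): runs the block, returns
# [result, elapsed_seconds], and raises Timeout::Error if the block runs
# longer than +limit+ seconds.
def timed(limit)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = Timeout.timeout(limit) { yield }
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

value, seconds = timed(5) { 6 * 7 }
puts "got #{value} after #{seconds.round(3)}s"

begin
  timed(0.05) { sleep 1 }
rescue Timeout::Error
  puts 'gave up after 0.05s'
end
```

In the script this could wrap the hanging call, e.g. `page, secs = timed(180) { agent.submit(form) }`; a limit comfortably above the roughly one minute the browser needs separates "slow" from "stuck".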

> Also, there is sort of a bug in Ruby's net/http where large files
> take a long time to download and max out your CPU.  Try adding the
> following snippet and see if that helps.  Note that it will only work
> for Ruby versions > 1.8.2.

Thanks for the snippet.  I'm on Ruby 1.8.4 and unfortunately the  
snippet didn't appear to make any difference.
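Aaron's snippet fills the read buffer with `read_nonblock(65536)`, i.e. up to 64 KB per call. The sketch below only counts how many read calls it takes to drain a page of about 3.2 MB (3,200,000 bytes is an assumed round figure); the 1 KB comparison size is an assumption about the stock Ruby 1.8 reader, not something stated in this thread. A StringIO stands in for the socket, so this says nothing about network time, only how often the reading loop runs.

```ruby
require 'stringio'

# Count the read calls needed to drain +io+ when each call asks for at
# most +chunk_size+ bytes. StringIO#read returns nil at EOF, ending the loop.
def count_reads(io, chunk_size)
  reads = 0
  reads += 1 while io.read(chunk_size)
  reads
end

page_size = 3_200_000 # assumed round figure for "just under 3.2 MB"
small = count_reads(StringIO.new('x' * page_size), 1024)   # assumed stock chunk size
large = count_reads(StringIO.new('x' * page_size), 65_536) # chunk size used in the patch
puts "#{small} reads vs #{large} reads" # prints "3125 reads vs 49 reads"
```

If the per-call overhead (for instance a timeout wrapper around each read) is what burns the CPU, cutting the number of calls by a factor of about 60 is the point of the patch; since it made no difference here, the time is probably going somewhere else.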

> If you can send a short sample that will reproduce the problem, I can
> help more.

I've pasted my code below.  The part which hangs is the final line  
using Mechanize: "page = agent.submit(form)".

require 'rubygems'
require 'mechanize'
require 'logger'

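# Aaron's suggested patch: fill the read buffer with large (64 KB)
# non-blocking reads, waiting up to @read_timeout seconds for data.
class Net::InternetMessageIO #:nodoc:
  alias :old_rbuf_fill :rbuf_fill
  def rbuf_fill
    @rbuf << @io.read_nonblock(65536)
  rescue Errno::EWOULDBLOCK
    # Nothing to read yet: wait for the socket to become readable.
    if IO.select([@io], nil, nil, @read_timeout)
      @rbuf << @io.read_nonblock(65536)
    else
      raise Timeout::TimeoutError
    end
  end
end
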
class SangerParser < WWW::Mechanize::Page
  def initialize(uri = nil, response = nil, body = nil, code = nil)
    # Ditch any runs of table data which contain only, and at least 3,
    # "<td>" or "</td>" tags
    body.gsub!(/(<\/?td>){3,}/, '')
    # Ditch any title attributes which equal nothing at all
    body.gsub!(/ title=>/i, '>')
    super(uri, response, body, code)
  end
end

logger = Logger.new('mech.log')
logger.level = Logger::DEBUG

agent = WWW::Mechanize.new { |a| a.log = logger }
agent.pluggable_parser.html = SangerParser
agent.user_agent_alias = 'Mac Safari'
agent.read_timeout = 120  # Does this make any difference?

# Tissue selection screen
url = 'http://www.sanger.ac.uk/cgi-bin/genetics/CGP/genotyping/lohmap'
page = agent.get(url)
# Just choose one tissue for now
#tissues = page.forms[1].fields.name('site_1').options[1..-1].map { |o| o.value }
tissues = %w( Colorectal )
tissues.each { |tissue|
   # chrom:    chromosome to display
   # site_1:   tissue to display
   # freqval:  number of consecutive homozygous markers to colour (1-6)
   # failmark: extend homozygous markup (yes = 1, no = 0)
   # sort:     sort the cell lines (by loss [high to low] = 1, by cell line [A-Z] = 0)
   options = {
     'site_1'   => tissue,
     'chrom'    => '1',
     'freqval'  => '1',
     'failmark' => '0',
     'sort'     => '1'
   }
   # Post directly without going via the tissue selection form
   page = agent.post(url, options.merge('stage' => 'display'))

   # LOH data selection screen
   form = page.forms[1]
   form.checkboxes.each { |c| c.checked = true }
   options.merge('stage' => 'refine').each { |k, v| form.add_field!(k, v) }
   puts "about to submit at #{Time.now}"
   page = agent.submit(form)
   puts "submitted at #{Time.now}"

   # Write out results page to disk
   File.open("data/#{tissue.downcase}.html", 'w') { |f| f << page.body }
}
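For checking the parser's clean-up on its own, the two `gsub!` calls in SangerParser can be pulled into a plain method and run on a string without fetching the page. The method name `clean_sanger_html` and the sample HTML are made up for illustration; the regular expressions are copied unchanged from the parser.

```ruby
# Hypothetical stand-alone version of SangerParser's two clean-ups.
def clean_sanger_html(body)
  body = body.dup # leave the caller's string untouched
  # Drop runs of at least three consecutive bare <td> / </td> tags.
  body.gsub!(/(<\/?td>){3,}/, '')
  # Drop empty title attributes, e.g. <a title=> becomes <a>.
  body.gsub!(/ title=>/i, '>')
  body
end

sample = '<tr><td>A</td><td></td><td></td><td></td><td><a title=>B</a></td></tr>'
puts clean_sanger_html(sample) # prints <tr><td>A<a>B</a></td></tr>
```

Note that a run of three or more tags can include the closing tag of a cell that did hold data, so neighbouring values end up in one cell (as A and B do here); it may be worth checking that against the real page.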

Thanks for your help,
Andy Stewart
