[Mechanize-users] Response To Form Submission Hanging
Andrew Stewart
boss at airbladesoftware.com
Mon Dec 11 05:15:40 EST 2006
Hi Aaron,
On 8 Dec 2006, at 23:20, Aaron Patterson wrote:
> How long exactly is the page? In the mega-bytes?
Just under 3.2 MB.
> I've seen stack traces similar to this when you get read timeouts. If
> the server is slow, that may be your problem.....
I wouldn't call the server fast but it seems good enough. When I do
what the script does myself in the browser, the page comes back in
under a minute.
> Also, there is sort of a bug in ruby net/http where large files will
> take a long time to download and max out your CPU. Try adding the
> following snippet and see if that helps. Note that it will only work
> for ruby versions > 1.8.2.
Thanks for the snippet. I'm on Ruby 1.8.4 and unfortunately the
snippet didn't appear to make any difference.
> If you can send a short sample that will reproduce the problem, I can
> help more.
I've pasted my code below. The part which hangs is the final line
using Mechanize: "page = agent.submit(form)".
----
require 'rubygems'
require 'mechanize'
require 'logger'
class Net::InternetMessageIO #:nodoc:
  alias :old_rbuf_fill :rbuf_fill
  def rbuf_fill
    begin
      @rbuf << @io.read_nonblock(65536)
    rescue Errno::EWOULDBLOCK
      if IO.select([@io], nil, nil, @read_timeout)
        @rbuf << @io.read_nonblock(65536)
      else
        raise Timeout::TimeoutError
      end
    end
  end
end
class SangerParser < WWW::Mechanize::Page
  def initialize(uri = nil, response = nil, body = nil, code = nil)
    # Ditch any runs of 3 or more consecutive "<td>" or "</td>" tags
    # (i.e. table data containing nothing but empty cells)
    body.gsub!(/(<\/?td>){3,}/, '')
    # Ditch any title attributes which equal nothing at all
    body.gsub!(/ title=>/i, '>')
    super(uri, response, body, code)
  end
end
logger = Logger.new('mech.log')
logger.level = Logger::DEBUG
agent = WWW::Mechanize.new { |a| a.log = logger }
agent.pluggable_parser.html = SangerParser
agent.user_agent_alias = 'Mac Safari'
agent.read_timeout = 120 # Does this make any difference?
# Tissue selection screen
url = 'http://www.sanger.ac.uk/cgi-bin/genetics/CGP/genotyping/lohmap'
page = agent.get(url)
# Just choose one tissue for now
#tissues = page.forms[1].fields.name('site_1').options[1..-1].map { |o| o.value }
tissues = %w( Colorectal )
tissues.each { |tissue|
  # chrom: chromosome to display
  # site_1: tissue to display
  # freqval: number of consecutive homozygous markers to colour (1-6)
  # failmark: extend homozygous markup (yes = 1, no = 0)
  # sort: sort the cell lines (by loss [high to low] = 1, by cell line [A-Z] = 0)
  options = {
    'site_1' => tissue,
    'chrom' => '1',
    'freqval' => '1',
    'failmark' => '0',
    'sort' => '1'
  }
  # Post directly without going via the tissue selection form
  page = agent.post(url, options.merge('stage' => 'display'))
  # LOH data selection screen
  form = page.forms[1]
  form.checkboxes.each { |c| c.checked = true }
  options.merge('stage' => 'refine').each { |k, v| form.add_field!(k, v) }
  puts "about to submit at #{Time.now}"
  page = agent.submit(form)
  puts "submitted at #{Time.now}"
  # Write out results page to disk
  open("data/#{tissue.downcase}.html", 'w') { |f| f << page.body }
}
----
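
For anyone reading along on a newer Ruby: the rbuf_fill override in the script boils down to "try a nonblocking read, and if nothing is there yet, wait on select for up to the read timeout". Here is a minimal standalone sketch of that pattern. It assumes a Ruby recent enough to have IO::WaitReadable (so not 1.8.4), read_with_timeout is a made-up helper name rather than anything from net/http or Mechanize, and the pipe only stands in for the socket.

```ruby
require 'timeout'

# Hypothetical helper (not part of net/http or Mechanize): read up to
# maxlen bytes from io, waiting at most timeout seconds for data to arrive.
def read_with_timeout(io, timeout, maxlen = 65_536)
  io.read_nonblock(maxlen)
rescue IO::WaitReadable
  # Nothing buffered yet: block in select until the descriptor becomes
  # readable, or give up once the timeout expires.
  raise Timeout::Error, "no data within #{timeout}s" unless IO.select([io], nil, nil, timeout)
  retry
end

# A pipe stands in for the HTTP socket in this sketch.
reader, writer = IO.pipe
writer.write('hello')
puts read_with_timeout(reader, 1)

begin
  # The pipe is now empty but the writer is still open, so this waits
  # for the timeout and then raises.
  read_with_timeout(reader, 0.1)
rescue Timeout::Error => e
  puts "timed out: #{e.message}"
end
```

Unlike the override in the script, this version retries the read after select returns instead of reading exactly once, which is a design choice of the sketch and not something the original snippet does.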
Thanks for your help,
Andy Stewart