From sigmund at synflood.homelinux.org Sat Apr 3 01:57:47 2010
From: sigmund at synflood.homelinux.org (Sigmund Scheinbar)
Date: Sat, 03 Apr 2010 07:57:47 +0200
Subject: [Mechanize-users] Gigabyte File Download
Message-ID: <4BB6D8DB.40101@synflood.homelinux.org>

Hi there!

Is it possible to download a multi-gigabyte file (.avi) without loading it
into memory (using mechanize)? How do I write it directly to disk?

Cheers,
Sigmund

From justinbrinkerhoff at gmail.com Tue Apr 13 02:07:46 2010
From: justinbrinkerhoff at gmail.com (Justin Brinkerhoff)
Date: Tue, 13 Apr 2010 00:07:46 -0600
Subject: [Mechanize-users] Form submit doesn't behave correctly, when form action returns results on same page.
Message-ID:

Hi,

I am stumped on how to proceed with this problem. I am building an
application to scrape Bible passages from the website biblegateway.com so
that I can export the retrieved data to a file. I am just trying to get the
behavior correct before I write the Ruby script. Here is what I do.

I fire up an irb console:

    irb

I load the required Ruby libraries:

    require 'rubygems'
    require 'mechanize'

I then create a new object of the Mechanize class:

    agent = Mechanize.new  # Calling WWW::Mechanize.new throws a warning message

I tell it what page to scrape:

    agent.get("http://www.biblegateway.com/passage")

I tell it to use the last form on the page, which is the one I am working
with:

    form = agent.page.forms.last

I find the names of the fields and set their values:

    form.search1 = "John 3:16"
    form.version1 = "NKJV"

Those are all the options needed to get the results, so I then submit the
form:

    form.submit

Now technically speaking, the form does in fact submit. That's not the
problem. The problem is that Mechanize is designed to render the results
from a new page to a new Mechanize::Page object.
But the way they have their website set up, the same page is rendered with
the results loaded into it, and the form uses a GET method instead of a
POST method, so the URL ends up looking like:

    http://www.biblegateway.com/passage/?search=John%203:16&version=NKJV

So what I need to know is: what do I need to do to render the same page in
a "get fashion", so to speak? The documentation is very difficult to pick
apart, and I haven't had much luck with Google...

Thank you in advance for the help.

From ross at roscommonhq.com Wed Apr 28 03:49:41 2010
From: ross at roscommonhq.com (Ross Cameron)
Date: Wed, 28 Apr 2010 17:49:41 +1000
Subject: [Mechanize-users] Adding a missing link to a Mechanize::Page
Message-ID: <4BD7E895.1060005@roscommonhq.com>

Some time ago, Jeremy Woertink posted a helpful solution for adding missing
fields to a Mechanize::Form, viz.,

    @new_field = WWW::Mechanize::Form::Field.new("url", "http://www.justprofessionals.com") # shameless plug :)
    page.forms.first.fields << @new_field
    @new_page = page.forms.first.submit

which worked a treat in fixing forms with badly formed or missing fields.

New problem: how do you add a link to a Mechanize::Page?

I've been researching the documentation(?) and the source code and playing
around with it, but haven't been able to crack this one.

Can someone please point me in the right direction?

Regards
Ross

From jeremywoertink at gmail.com Thu Apr 29 00:59:55 2010
From: jeremywoertink at gmail.com (Jeremy Woertink)
Date: Wed, 28 Apr 2010 21:59:55 -0700
Subject: [Mechanize-users] Adding a missing link to a Mechanize::Page
In-Reply-To: <4BD7E895.1060005@roscommonhq.com>
References: <4BD7E895.1060005@roscommonhq.com>
Message-ID:

hmm.
well,

I would create a new Nokogiri::XML::Node for an anchor tag, then create
a new Mechanize::Page::Link, which takes a node, mech and page, then

    page.links << new_mechanize_link

I think that might work for you.

~Jeremy

On Wed, Apr 28, 2010 at 12:49 AM, Ross Cameron wrote:
> Some time ago, Jeremy Woertink posted a helpful solution to add missing
> fields to a Mechanize::Form, viz,
>
> @new_field = WWW::Mechanize::Form::Field.new("url", "http://www.justprofessionals.com") # shameless plug :)
> page.forms.first.fields << @new_field
> @new_page = page.forms.first.submit
>
> which worked a treat in fixing forms with badly formed or missing fields.
>
> New Problem: how do you add a link to a Mechanize::Page
>
> I've been researching the documentation(?) and the source code and
> playing around with it but haven't been able to crack this one.
>
> Can someone please point me in the right direction.
>
> Regards
> Ross
>
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users

From ross at roscommonhq.com Thu Apr 29 01:41:39 2010
From: ross at roscommonhq.com (Ross Cameron)
Date: Thu, 29 Apr 2010 15:41:39 +1000
Subject: [Mechanize-users] Adding a missing link to a Mechanize::Page
In-Reply-To:
References: <4BD7E895.1060005@roscommonhq.com>
Message-ID: <4BD91C13.10905@roscommonhq.com>

Jeremy

Thanks for that.

I had created a Nokogiri::XML::Node OK for the anchor tag, intending to add
the constructed link as before (page.links << new_mechanize_link).

It's the mech & page parameters of the Mechanize::Page::Link where things
got interesting. I couldn't sort them out.

Hoping you can provide some illumination.
Regards
Ross

------------------------------------------------------------------------
Ross Cameron | Director
Roscommon Pty Ltd | ABN 85 099 499 840
p: +61 2 9016 4133 | m: +61 4 3312 9087 | f: +61 2 9420 4525 | w: www.roscommonhq.com | AIM: rossppc

Jeremy Woertink wrote:
> hmm. well,
>
> I would create a new Nokogiri::XML::Node of an anchor tag, then create
> a new Mechanize::Page::Link which takes a node, mech and page then
>
> page.links << new_mechanize_link
>
> I think that might work for you.
>
> ~Jeremy