From rdpoor at gmail.com Mon May 2 22:37:06 2011
From: rdpoor at gmail.com (Robert Poor)
Date: Mon, 2 May 2011 19:37:06 -0700
Subject: [Mechanize-users] parsing and submitting a form from a page with missing content type?
Message-ID:

[cross posted on Nokogiri and Mechanize lists]

I have a page that is missing a valid content type, but it appears to contain (mostly?) well formed HTML. My goal is to agent.click() on a button in the form, but I don't know the best way to get there. The first lines read:

...

When I coerce Mechanize to use Mechanize::Page as the parser via:

    @agent.pluggable_parser.default = Mechanize::Page

I get a Mechanize::ContentTypeError (okay, so maybe it isn't so well formed). But Nokogiri is able to parse it without error.

So: what's the best way (or easiest way) to submit the form from the loaded page? I see two options:

* xpath to the appropriate form using Nokogiri and then (somehow) pass the form to Mechanize to submit. But I'm not sure of the syntax for that.
* write my own pluggable parser that somehow knows how to deal with the page. I'm even foggier on how that would work.

Suggestions or examples are welcome...

From rob_gar_esp at hotmail.com Thu May 12 12:55:03 2011
From: rob_gar_esp at hotmail.com (Rob GB)
Date: Thu, 12 May 2011 11:55:03 -0500
Subject: [Mechanize-users] retrieve metadata
Message-ID:

Hi all,

When I request a URL I get a refresh URL in the headers:

\

When I ask to mechanize to list the metadata I do this:

    pp $page.meta

I get:

[#\
> > When I ask to mechanize to list the metadata I do this:
> >
> > pp $page.meta
> >
> > I get:
> >
> > [# "" > "http://localhost/html/Splash.action?splash=">]
> >
> > How can I cleanly retrieve the refresh URL?
> >
> > Thanks!
> >
>
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users

From rob_gar_esp at hotmail.com Fri May 13 02:42:22 2011
From: rob_gar_esp at hotmail.com (Rob GB)
Date: Fri, 13 May 2011 01:42:22 -0500
Subject: [Mechanize-users] retrieve metadata
In-Reply-To: <4DCCBDED.1090802@sixserv.org>
References: , <4DCCBDED.1090802@sixserv.org>
Message-ID:

Thumbs up :) Thanks!

> Date: Fri, 13 May 2011 07:13:17 +0200
> From: apoc at sixserv.org
> To: mechanize-users at rubyforge.org
> Subject: Re: [Mechanize-users] retrieve metadata
>
> maybe $page.meta.first.href is what you're looking for?
>
> bye
> apoc
>
> On 05/12/2011 06:55 PM, Rob GB wrote:
> >
> > Hi all,
> >
> > When I request a URL I get a refresh URL in the headers:
> >
> > \
> >
> > When I ask to mechanize to list the metadata I do this:
> >
> > pp $page.meta
> >
> > I get:
> >
> > [#
> ""
> > "http://localhost/html/Splash.action?splash=">]
> >
> > How can I cleanly retrieve the refresh URL?
> >
> > Thanks!

From aross at opencongress.org Tue May 17 12:51:55 2011
From: aross at opencongress.org (Andrew Ross)
Date: Tue, 17 May 2011 09:51:55 -0700
Subject: [Mechanize-users] Rebuild Browser State Later?
Message-ID:

Hi all,

I'm looking for a way to completely save and rebuild the browser state to use for a later form submission. Since a Mechanize browser instance can't be serialized, this has proved difficult.
I can see how you can easily store and re-load the cookie jar, but I also need to have the DOM in the exact state it was in previously.

Does anyone know of an easy way to do this with Mechanize?

Thanks,
Andy
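
One possible direction for the question above, offered only as a sketch: rather than serializing the browser object, record the data needed to replay the form submission (the page URL, the form's action and method, and the field values) and store that as JSON. The FormSnapshot name, its members, and the example values are all hypothetical and not part of Mechanize; with Mechanize the values would presumably be read from the form's action, method, and fields. The cookie jar would still be saved separately, as the post mentions. This captures field values only, not arbitrary DOM state.

```ruby
require "json"

# Hypothetical container (not part of Mechanize): the data needed to
# replay one form submission in a later process.
FormSnapshot = Struct.new(:page_url, :action, :http_method, :fields) do
  # Serialize the snapshot to a JSON string.
  def dump
    JSON.generate(to_h)
  end

  # Rebuild a snapshot from a string produced by #dump.
  def self.restore(text)
    data = JSON.parse(text)
    new(data["page_url"], data["action"], data["http_method"], data["fields"])
  end
end

# Example values are made up for illustration.
snapshot = FormSnapshot.new(
  "http://example.com/signup",
  "/signup/submit",
  "POST",
  { "email" => "andy@example.com", "csrf_token" => "abc123" }
)

saved = snapshot.dump                    # store this string in a file or database
restored = FormSnapshot.restore(saved)   # later, possibly in another process
puts restored.fields["email"]
```

On restore, the idea would be to load the saved cookies, fetch page_url again, and reapply the stored field values before submitting. Whether the server accepts a replayed token or session is an assumption that has to be checked per site.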