From astahl at hi5.com Thu Jul 8 18:50:38 2010
From: astahl at hi5.com (Alex Stahl)
Date: Thu, 8 Jul 2010 15:50:38 -0700
Subject: [Mechanize-users] How to access value of URL in a Page object?
Message-ID: <1278629438.1888.73.camel@awstahl-t61>

Hiya mechanize-users -

I'm using mechanize to click a log out button on a website, which returns a Mechanize::Page object upon completion. When either 'puts'ing or 'inspect'ing this page object, I can see that the first hash (? it looks like a hash) contained within the object is the URL of the response page. There is no accessor for the URL, and trying to access as a member var fails. How can I programmatically get to that value?

(Some assumptions - I'm already logged in when I call .get, and let's assume here that I always get the "logged out" page assigned to res).

Sample code:

req = Mechanize.new
base = "http://www.example.com"
res = req.click(req.get(base).link_with(:text => /log out/))
p res

Included in the output:
{url #}

How do I get that url value? These don't work:
res.url
res(:url)

And can I do so w/out having to resort to a search or an iterator?

Thanks,
Alex

From astahl at hi5.com Thu Jul 8 19:35:20 2010
From: astahl at hi5.com (Alex Stahl)
Date: Thu, 8 Jul 2010 16:35:20 -0700
Subject: [Mechanize-users] How to access value of URL in a Page object?
In-Reply-To: <1278629438.1888.73.camel@awstahl-t61>
References: <1278629438.1888.73.camel@awstahl-t61>
Message-ID: <1278632120.1888.74.camel@awstahl-t61>

res.uri
From takto78 at gmail.com Fri Jul 9 02:36:01 2010
From: takto78 at gmail.com (Takashi Ogura)
Date: Fri, 9 Jul 2010 15:36:01 +0900
Subject: [Mechanize-users] How to access value of URL in a Page object?
In-Reply-To: <1278632120.1888.74.camel@awstahl-t61>
References: <1278629438.1888.73.camel@awstahl-t61> <1278632120.1888.74.camel@awstahl-t61>
Message-ID:

Hi

If you want to get the URL value as a character string, you can use the to_s method.

req = Mechanize.new
base = "http://www.example.com"
res = req.click(req.get(base).link_with(:text => /log out/))
p res.uri.to_s #=> "http://www.example.com/login"

regards,
Takashi Ogura
From rdpoor at gmail.com Sat Jul 10 11:47:09 2010
From: rdpoor at gmail.com (Robert Poor)
Date: Sat, 10 Jul 2010 08:47:09 -0700
Subject: [Mechanize-users] parsing a page from a string rather than from get()?
Message-ID:

I'm pretty sure this is easy, but haven't been able to figure it out from the docs or examples: If I already have an HTML page as a string, can I ask Mechanize to parse it into a Mechanize::Page object?

As a contrived example:

>> page = agent.get("http://www.google.com")
=> #
>> raw_html = page.body
=> "

Now: how can I ask Mechanize to parse raw_html back into a Mechanize::Page structure?

Thanks in advance.

References:
Message-ID:

Mechanize::Page.new(nil,{'content-type'=>'text/html'},raw_html,200,Mechanize.new)

(For the last argument, you could also use the page's mechanize object, if you still have it around. Actually, for all the arguments you could use the page's, if you still have it around.)

On Sat, Jul 10, 2010 at 8:47 AM, Robert Poor wrote:
> I'm pretty sure this is easy, but haven't been able to figure it out from
> the docs or examples: If I already have an HTML page as a string, can I ask
> Mechanize to parse it into a Mechanize::Page object?
From leondu at gmail.com Wed Jul 21 11:57:48 2010
From: leondu at gmail.com (Leon Du)
Date: Wed, 21 Jul 2010 23:57:48 +0800
Subject: [Mechanize-users] weird url question when using Mechanize
Message-ID:

I am getting a page whose URL contains special characters:

> url =
> http://www.example.comr/details-kontakt-1274/[theater]-Dimbeldu-Puppentheater-Kinderschminken-Maerchen-.html

When I open it directly with Mechanize:

> agent.get(url)

I get:

> URI::InvalidURIError: bad URI(is not URI?):
> http://www.example.comr/details-kontakt-1274/[theater]-Dimbeldu-Puppentheater-Kinderschminken-Maerchen-.html
> from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:436:in `split'
> from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:485:in `parse'

and if I escape the URL with

> agent.get(CGI.escape(url))

then I get:

> RuntimeError: need absolute URL
> from /Library/Ruby/Gems/1.8/gems/mechanize-1.0.0/lib/mechanize/chain/uri_resolver.rb:52:in `handle'
> from /Library/Ruby/Gems/1.8/gems/mechanize-1.0.0/lib/mechanize/chain.rb:24:in `handle'
> from /Library/Ruby/Gems/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:509:in `fetch_page'
> from /Library/Ruby/Gems/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:259:in `get'

Could you please shed some light on this?

--
Regards
Leon Du
From alex at blackkettle.org Wed Jul 21 12:38:13 2010
From: alex at blackkettle.org (Alex Young)
Date: Wed, 21 Jul 2010 17:38:13 +0100
Subject: [Mechanize-users] weird url question when using Mechanize
In-Reply-To:
References:
Message-ID: <1279730293.25257.3.camel@redlodge>

On Wed, 2010-07-21 at 23:57 +0800, Leon Du wrote:
> http://www.example.comr/details-kontakt-1274/[theater]-Dimbeldu-Puppentheater-Kinderschminken-Maerchen-.html

[ and ] aren't strictly legal characters in an href. I use this as a pre-filter:

href.gsub(%r{[^%A-Za-z0-9:@\-._~!$&'\/\(\)*+,;=]}){|m| "%%%x" % m[0] }

--
Alex

From leondu at gmail.com Thu Jul 22 03:24:29 2010
From: leondu at gmail.com (Leon Du)
Date: Thu, 22 Jul 2010 15:24:29 +0800
Subject: [Mechanize-users] weird url question when using Mechanize
In-Reply-To: <1279730293.25257.3.camel@redlodge>
References: <1279730293.25257.3.camel@redlodge>
Message-ID:

thanks Alex, that actually works :)

but still my question is why the CGI.escape doesn't work?
shouldn't it do the same?

--
Regards
Leon Du
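A note on the pre-filter quoted in this thread: it appears to rely on Ruby 1.8 behaviour, where `m[0]` on a string yields an integer character code. On Ruby 1.9 and later `m[0]` is a one-character string, so the `"%%%x" % m[0]` format call fails. What follows is only a sketch of an equivalent filter for newer Rubies, assuming the same set of safe characters. The names `UNSAFE_HREF_CHARS` and `prefilter_href` are made up for illustration, and the zero-padded uppercase `%02X` is a substitution for the original `%x`.

```ruby
require "uri"

# Hypothetical names. The character class is the one from the filter
# quoted in this thread; anything outside it gets percent-encoded.
UNSAFE_HREF_CHARS = %r{[^%A-Za-z0-9:@\-._~!$&'/()*+,;=]}

def prefilter_href(href)
  href.gsub(UNSAFE_HREF_CHARS) do |m|
    # Encode every byte of the match, so multi-byte characters work too.
    m.bytes.map { |b| format("%%%02X", b) }.join
  end
end

url = "http://www.example.com/details/[theater]-Puppentheater.html"
escaped = prefilter_href(url)
puts escaped                  # http://www.example.com/details/%5Btheater%5D-Puppentheater.html
puts URI.parse(escaped).path  # /details/%5Btheater%5D-Puppentheater.html
```

Because `?` and `#` are not in the safe set, this sketch would also encode a query string or fragment delimiter, so it is only suited to URLs like the one in this thread that consist of a scheme, host and path.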
From alex at blackkettle.org Thu Jul 22 04:10:05 2010
From: alex at blackkettle.org (Alex Young)
Date: Thu, 22 Jul 2010 09:10:05 +0100
Subject: [Mechanize-users] weird url question when using Mechanize
In-Reply-To:
References: <1279730293.25257.3.camel@redlodge>
Message-ID: <1279786205.2138.6.camel@redlodge>

On Thu, 2010-07-22 at 15:24 +0800, Leon Du wrote:
> thanks Alex, that actually works :)
>
> but still my question is why the CGI.escape doesn't work?
> shouldn't it do the same?

The list of characters that CGI.escape escapes includes ':' and '/', so it turns "http://" into "http%3A%2F%2F". URI.parse doesn't recognise that as the start of an absolute URI.

--
Alex

From leondu at gmail.com Thu Jul 22 04:46:00 2010
From: leondu at gmail.com (Leon Du)
Date: Thu, 22 Jul 2010 16:46:00 +0800
Subject: [Mechanize-users] weird url question when using Mechanize
In-Reply-To: <1279786205.2138.6.camel@redlodge>
References: <1279730293.25257.3.camel@redlodge> <1279786205.2138.6.camel@redlodge>
Message-ID:

yes, you are right, that is the problem,
CGI escape just won't work together with URI.parse,
isn't this a bug?
--
Regards
Leon Du
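The effect described in this thread, CGI.escape encoding ':' and '/' so that the result is no longer an absolute URI, can be reproduced with the standard library alone. This is a minimal sketch; the URL is a placeholder rather than the one from the original report.

```ruby
require "cgi"
require "uri"

url = "http://www.example.com/index.html"

escaped = CGI.escape(url)
puts escaped                 # http%3A%2F%2Fwww.example.com%2Findex.html

# With the scheme separator encoded, the string still parses, but only
# as a relative reference: there is no scheme any more.
p URI.parse(escaped).scheme  # nil
p URI.parse(url).scheme      # "http"
```

A URI without a scheme is what a client has to treat as relative, which is consistent with the "need absolute URL" error reported above.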
From alex at blackkettle.org Thu Jul 22 06:02:32 2010
From: alex at blackkettle.org (Alex Young)
Date: Thu, 22 Jul 2010 11:02:32 +0100
Subject: [Mechanize-users] weird url question when using Mechanize
In-Reply-To:
References: <1279730293.25257.3.camel@redlodge> <1279786205.2138.6.camel@redlodge>
Message-ID: <1279792952.4820.2.camel@redlodge>

On Thu, 2010-07-22 at 16:46 +0800, Leon Du wrote:
> yes, you are right, that is the problem,
> CGI escape just won't work together with URI.parse,
> isn't this a bug?

No. CGI.escape is intended for encoding strings *within* a URL so that you can, for instance, include path separators in your data string without the URL handlers at either end mistaking them for *actual* path separators. It's not supposed to handle encoding the URL itself.

--
Alex
From leondu at gmail.com Thu Jul 22 22:02:54 2010
From: leondu at gmail.com (Leon Du)
Date: Fri, 23 Jul 2010 10:02:54 +0800
Subject: [Mechanize-users] weird url question when using Mechanize
In-Reply-To: <1279792952.4820.2.camel@redlodge>
References: <1279730293.25257.3.camel@redlodge> <1279786205.2138.6.camel@redlodge> <1279792952.4820.2.camel@redlodge>
Message-ID:

thanks for the explanation :)
--
Regards
Leon Du
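The intended use of CGI.escape explained in this thread, encoding a data string that goes *within* a URL, might look like the sketch below. The search URL, the `q` parameter and the `term` value are all hypothetical and only serve to show a data value that contains URL delimiters.

```ruby
require "cgi"
require "uri"

# Hypothetical data value that happens to contain URL delimiters.
term = "puppet/theater & more"

# Only the data is escaped; the URL structure around it is left alone.
url = "http://www.example.com/search?q=" + CGI.escape(term)
puts url       # http://www.example.com/search?q=puppet%2Ftheater+%26+more

uri = URI.parse(url)
puts uri.path  # /search
puts CGI.unescape(uri.query.sub(/\Aq=/, ""))  # puppet/theater & more
```

The '/' and '&' inside the value survive the round trip because they were encoded before being placed in the URL, while the real path separator and query delimiter stay readable to the parser.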