From jack at groundbreakingsoftware.com Fri Mar 1 00:26:00 2013
From: jack at groundbreakingsoftware.com (Jack Royal-Gordon)
Date: Thu, 28 Feb 2013 16:26:00 -0800
Subject: [Mechanize-users] Stupid(?) Cookie Question
Message-ID:

I could not find the answer to this question in the Mechanize docs (although I did find that WWW::Mechanize in Perl does do this): Are cookies received in response to one request automatically sent by the Mechanize agent with the next request made through the same agent instance (and to the same domain)?

In other words, if I do agent.get('www.somewebsite.com'), receive a response, and then do agent.get('www.somewebsite.com/anotherpage'), will the cookies that were received with the response to the first request be sent with the second request?

It makes sense that Mechanize would do so, since that would simulate the effect of running those two requests in a browser, but I thought I'd confirm since I'm having a problem with a website giving me '500's and I know that the website uses cookies.

From godfreykfc at gmail.com Fri Mar 1 02:11:29 2013
From: godfreykfc at gmail.com (Godfrey Chan)
Date: Thu, 28 Feb 2013 18:11:29 -0800 (PST)
Subject: [Mechanize-users] Stupid(?) Cookie Question
In-Reply-To:
References:
Message-ID: <1362103888674.4ed0e204@Nodemailer>

Yes, in the README:

"The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, and can follow links and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history."

On Thu, Feb 28, 2013 at 6:09 PM, Jack Royal-Gordon wrote:
> I could not find the answer to this question in the Mechanize docs (although I did find that WWW:Mechanize in PERL does do this): Are cookies received from one request handled by the Mechanize Agent automatically sent with the next request to the same instance of the agent (and same domain)?
> In other words, if I do agent.get('www.somewebsite.com'), receive a response, and then do agent.get('www.somewebsite.com/anotherpage'), will the cookies that were received with the response to the first request be sent with the second request?
> It makes sense that Mechanize would do so, since that would simulate the effect of running those two requests in a browser, but I thought I'd confirm since I'm having a problem with a website giving me '500's and I know that the website uses cookies.

From jack at groundbreakingsoftware.com Thu Mar 21 05:13:06 2013
From: jack at groundbreakingsoftware.com (Jack Royal-Gordon)
Date: Wed, 20 Mar 2013 22:13:06 -0700
Subject: [Mechanize-users] Problems parsing page encoded in Shift-JIS
Message-ID:

I'm posting this question to both mailing lists as I'm not sure whether it's a Mechanize problem or a Nokogiri problem.

I'm using Nokogiri and Mechanize to load and parse a web page encoded in Shift-JIS. I have an HTML construct like:

... ...

In case it's relevant, the response header has "Content-Type: text/html; charset=Shift_JIS". I'm trying to parse this with the following Ruby code:

    page = mechanize_agent.get(url)
    list = page.search("ul#test li").map { |item| item.search("a").map { |a| a.content }.join(" > ") }

I'd expect this to return ["abc > def > ghi", "abc > def > ghi", "abc > def > ghi"] but it returns ["abc > def > ghi123 > abc > def > ghi123 > abc > def > ghi123"].

However, if I save page.body, re-parse it with page = Nokogiri::parse(saved_body), and repeat the code, then it behaves as expected.

This is a simplified example. The actual HTML is (you can get this at "http://www.amazon.co.jp/dp/B006QP63LI"):

and the result was:

Kindle? > Kindle?? > Romance > Historical 1542? ? ?? > Romance > Historical 2627? ? ?? > Literature & Fiction > Genre Fiction > Historical

but when the page was reloaded, I got the expected result:

["Kindle?X?g?A > Kindle?{ > Kindle?m?? > Romance > Historical", "?m?? > Romance > Historical", "?m?? > Literature & Fiction > Genre Fiction > Historical"]
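
One thing that may be worth checking here is whether the body bytes decode cleanly when they are labelled with the charset from the Content-Type header. The sketch below uses only Ruby's built-in String encoding methods, so it does not reproduce or diagnose the Mechanize/Nokogiri behaviour itself. The helper name decode_body and the sample markup are made up for illustration, and raw_body merely stands in for something like page.body.

```ruby
# Hypothetical helper (not part of Mechanize or Nokogiri): label raw bytes
# with the charset announced by the server, then transcode them to UTF-8.
def decode_body(raw, charset)
  raw.dup.force_encoding(charset).encode("UTF-8")
end

# Made-up sample markup. The \u escapes are Japanese characters, written as
# escapes so that this file itself stays plain ASCII.
sample = "<li><a>\u672C</a> <a>\u6587\u5B66</a></li>"

# Stand-in for a raw response body: Shift_JIS bytes with no encoding label.
raw_body = sample.encode("Shift_JIS").force_encoding("ASCII-8BIT")

# Decoding with the declared charset recovers the original text.
utf8_body = decode_body(raw_body, "Shift_JIS")
puts utf8_body == sample

# Labelling the same bytes as UTF-8 yields an invalid string, which is
# one way mis-decoded text can end up looking like the garbled output above.
puts raw_body.dup.force_encoding("UTF-8").valid_encoding?
```

If the transcoded string looks right, passing it (rather than the raw bytes) to the parser is one way to narrow down which layer mishandles the encoding.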