From schapht at gmail.com Mon Jul 2 12:58:29 2007 From: schapht at gmail.com (Mat Schaffer) Date: Mon, 2 Jul 2007 12:58:29 -0400 Subject: [Mechanize-users] Basic auth bug in 0.6.9 Message-ID: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> I haven't spent the time to get a proper test case for this yet, but there appears to be a bug in the basic_auth code for mechanize 0.6.9. I've attached a CSV (from Charles) that illustrates the problem. Basically when running with basic_auth, there's a failed request that's followed up by a successful request. That last POST is a agent.submit(form) which gets repeated as a GET. I'll email again if I get any thing more conclusive. Thanks, Mat -------------- next part -------------- A non-text attachment was scrubbed... Name: basic auth bug.csv Type: application/octet-stream Size: 917 bytes Desc: not available Url : http://rubyforge.org/pipermail/mechanize-users/attachments/20070702/0d50bbc6/attachment.obj From aaron at tenderlovemaking.com Mon Jul 2 12:10:50 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Mon, 2 Jul 2007 09:10:50 -0700 Subject: [Mechanize-users] Basic auth bug in 0.6.9 In-Reply-To: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> References: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> Message-ID: <20070702161050.GA17423@mac-mini.lan> Hey Mat, On Mon, Jul 02, 2007 at 12:58:29PM -0400, Mat Schaffer wrote: > I haven't spent the time to get a proper test case for this yet, but > there appears to be a bug in the basic_auth code for mechanize > 0.6.9. I've attached a CSV (from Charles) that illustrates the problem. > > Basically when running with basic_auth, there's a failed request > that's followed up by a successful request. That last POST is a > agent.submit(form) which gets repeated as a GET. I think I know what the problem is.... It looks like I forgot to repeat the request with what was original method. So if you sent a post, then got a 401, mechanize will re-request with a GET. 
For now, you should downgrade to 0.6.8 and I'll fix this bug. I'm sorry everyone! :-( -- Aaron Patterson http://tenderlovemaking.com/ From schapht at gmail.com Mon Jul 2 15:00:35 2007 From: schapht at gmail.com (Mat Schaffer) Date: Mon, 2 Jul 2007 15:00:35 -0400 Subject: [Mechanize-users] Basic auth bug in 0.6.9 In-Reply-To: <20070702161050.GA17423@mac-mini.lan> References: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> <20070702161050.GA17423@mac-mini.lan> Message-ID: <3C5CC81C-B5D9-483F-9390-02982AD2D57E@gmail.com> On Jul 2, 2007, at 12:10 PM, Aaron Patterson wrote: > I think I know what the problem is.... It looks like I forgot to > repeat > the request with what was original method. So if you sent a post, > then got a 401, mechanize will re-request with a GET. > > For now, you should downgrade to 0.6.8 and I'll fix this bug. I'm > sorry > everyone! :-( No problem at all! Thanks for the quick reply. But I'm a little concerned about the duplicate requests. I'm guessing you're doing it for security because if you assume that you should send the basic auth every time, you risk the user sending their auth string to an unintended site. Have you considered an optional third argument to basic_auth that would be a base path on which to send the information? i.e., basic_auth(user, pass, 'http://www.mysite.com/blah') causes all requests that start with 'http://www.mysite.com/blah' to send authorization. Extra requests can eat a lot of time if you're doing a lot of operations, so it'd be nice to have the option to reduce the number. 
Thanks again, Mat From aaron at tenderlovemaking.com Mon Jul 2 13:38:32 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Mon, 2 Jul 2007 10:38:32 -0700 Subject: [Mechanize-users] Basic auth bug in 0.6.9 In-Reply-To: <3C5CC81C-B5D9-483F-9390-02982AD2D57E@gmail.com> References: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> <20070702161050.GA17423@mac-mini.lan> <3C5CC81C-B5D9-483F-9390-02982AD2D57E@gmail.com> Message-ID: <20070702173832.GA17687@mac-mini.lan> On Mon, Jul 02, 2007 at 03:00:35PM -0400, Mat Schaffer wrote: > On Jul 2, 2007, at 12:10 PM, Aaron Patterson wrote: > > I think I know what the problem is.... It looks like I forgot to > > repeat > > the request with what was original method. So if you sent a post, > > then got a 401, mechanize will re-request with a GET. > > > > For now, you should downgrade to 0.6.8 and I'll fix this bug. I'm > > sorry > > everyone! :-( > > No problem at all! Thanks for the quick reply. > > But I'm a little concerned about the duplicate requests. I'm > guessing you're doing it for security because if you assume that you > should send the basic auth every time, you risk the user sending > their auth string to an unintended site. Not exactly. > > Have you considered an optional third argument to basic_auth that > would be a base path on which to send the information? i.e., > basic_auth(user, pass, 'http://www.mysite.com/blah') causes all > requests that start with 'http://www.mysite.com/blah' to send > authorization. > > Extra requests can eat a lot of time if you're doing a lot of > operations, so it'd be nice to have the option to reduce the number. You shouldn't get duplicate requests after the first 401 request. Basically the reason I want it to get the 401 is to determine if the site requires basic auth or digest auth. Once mechanize determines which scheme to use, it caches that setting for subsequent requests to that server. 
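[Editorial sketch, not part of the original message.] The behaviour described above (use the first 401 to learn which scheme the server wants, then remember it for later requests to that server) could look roughly like the following. This is a minimal illustration using only Ruby's standard library; `AuthSchemeCache` and its methods are hypothetical names and are not Mechanize's actual implementation. It assumes the scheme is the first token of the `WWW-Authenticate` header value and that "server" means host plus port.

```ruby
require 'uri'

# Hypothetical helper (not a Mechanize class): remembers which auth scheme
# a server asked for, keyed by host and port rather than by full URL.
class AuthSchemeCache
  def initialize
    @schemes = {}
  end

  # Record the scheme named at the start of a 401 response's
  # WWW-Authenticate header value, e.g. 'Basic realm="Private"'.
  def remember(url, www_authenticate)
    scheme = www_authenticate.to_s[/\A\s*(\w+)/, 1]
    return nil unless scheme
    @schemes[key_for(url)] = scheme.downcase.to_sym
  end

  # Returns :basic, :digest, ... or nil if this server has not been seen.
  def scheme_for(url)
    @schemes[key_for(url)]
  end

  private

  def key_for(url)
    uri = URI.parse(url)
    [uri.host, uri.port]
  end
end

cache = AuthSchemeCache.new
cache.remember('http://www.example.com/blah/a', 'Basic realm="Private"')
puts cache.scheme_for('http://www.example.com/blah/b').inspect  # same server, different path
puts cache.scheme_for('http://other.example.com/').inspect      # server never seen
```

Keying on host and port means a second page on the same server reuses the remembered scheme, so only the first request to a server pays for the extra 401 round trip.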
-- Aaron Patterson http://tenderlovemaking.com/ From schapht at gmail.com Mon Jul 2 17:49:31 2007 From: schapht at gmail.com (Mat Schaffer) Date: Mon, 2 Jul 2007 17:49:31 -0400 Subject: [Mechanize-users] Basic auth bug in 0.6.9 In-Reply-To: <20070702173832.GA17687@mac-mini.lan> References: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> <20070702161050.GA17423@mac-mini.lan> <3C5CC81C-B5D9-483F-9390-02982AD2D57E@gmail.com> <20070702173832.GA17687@mac-mini.lan> Message-ID: <48709895-C89A-4A32-BD24-AE04F8CEED20@gmail.com> On Jul 2, 2007, at 1:38 PM, Aaron Patterson wrote: > You shouldn't get duplicate requests after the first 401 request. > Basically the reason I want it to get the 401 is to determine if the > site requires basic auth or digest auth. Once mechanize determines > which scheme to use, it caches that setting for subsequent requests to > that server. Hrm... perhaps I have something wrong in my script then. The CSV I sent you should show that it makes two requests for each page. This is all using the same agent, so I think there may be a problem there as well. If I get some time, I'll try to write up a simple case and send it along. Props if you beat me to it though :) -Mat From aaron at tenderlovemaking.com Mon Jul 2 15:57:14 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Mon, 2 Jul 2007 12:57:14 -0700 Subject: [Mechanize-users] Basic auth bug in 0.6.9 In-Reply-To: <48709895-C89A-4A32-BD24-AE04F8CEED20@gmail.com> References: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> <20070702161050.GA17423@mac-mini.lan> <3C5CC81C-B5D9-483F-9390-02982AD2D57E@gmail.com> <20070702173832.GA17687@mac-mini.lan> <48709895-C89A-4A32-BD24-AE04F8CEED20@gmail.com> Message-ID: <20070702195714.GA18098@mac-mini.lan> On Mon, Jul 02, 2007 at 05:49:31PM -0400, Mat Schaffer wrote: > > On Jul 2, 2007, at 1:38 PM, Aaron Patterson wrote: > > You shouldn't get duplicate requests after the first 401 request. 
> > Basically the reason I want it to get the 401 is to determine if the > > site requires basic auth or digest auth. Once mechanize determines > > which scheme to use, it caches that setting for subsequent requests to > > that server. > > Hrm... perhaps I have something wrong in my script then. The CSV I > sent you should show that it makes two requests for each page. This > is all using the same agent, so I think there may be a problem there > as well. If I get some time, I'll try to write up a simple case and > send it along. Props if you beat me to it though :) I see what the problem is.... I'm caching the auth stuff based on URL. Since each of those URL's change, it tries to re-auth. I'll just have it cache based on domain name, and that would take care of this problem. -- Aaron Patterson http://tenderlovemaking.com/
From schapht at gmail.com Mon Jul 2 18:24:34 2007 From: schapht at gmail.com (Mat Schaffer) Date: Mon, 2 Jul 2007 18:24:34 -0400 Subject: [Mechanize-users] Basic auth bug in 0.6.9 In-Reply-To: <20070702195714.GA18098@mac-mini.lan> References: <400AFABB-436A-4408-A765-2F57C83D581A@gmail.com> <20070702161050.GA17423@mac-mini.lan> <3C5CC81C-B5D9-483F-9390-02982AD2D57E@gmail.com> <20070702173832.GA17687@mac-mini.lan> <48709895-C89A-4A32-BD24-AE04F8CEED20@gmail.com> <20070702195714.GA18098@mac-mini.lan> Message-ID: <38AA2EA4-150E-4855-8B39-6CF4294C89A3@gmail.com> On Jul 2, 2007, at 3:57 PM, Aaron Patterson wrote: > I see what the problem is.... I'm caching the auth stuff based on > URL. > Since each of those URL's change, it tries to re-auth. I'll just have > it cache based on domain name, and that would take care of this > problem. Yeah, that makes sense. Come to think of it, I wonder how firefox handles it if I came in on a similar path to my script (one deep url followed by in-site links). Seems like the server has to return a base-url for the authentication or something. I'll email you if I find anything conclusive. Thanks again, Mat From jeffrey.mclurkin at gmail.com Wed Jul 11 22:47:58 2007 From: jeffrey.mclurkin at gmail.com (jeffrey mclurkin) Date: Wed, 11 Jul 2007 19:47:58 -0700 Subject: [Mechanize-users] Basic_auth questions Message-ID: <72edc0310707111947p6fa2593el67d9c7ec79077e5a@mail.gmail.com> How do you make get the page when using basic_auth? Below is code, I am getting a 401 error. I am not sure if the basic _auth comes before the agent.get. I will appreciate any help. require 'mechanize' require 'logger' agent = WWW::Mechanize.new {|a| a.log = Logger.new(STDERR) } page = agent.get('https://brewx.qualcomm.com/developer' ) agent.basic_auth('username', 'password') page = agent.get('https://brewx.qualcomm.com/developer' ) Thanks, Jeff -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070711/edff91ce/attachment.html From aaron at tenderlovemaking.com Wed Jul 11 22:36:51 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Wed, 11 Jul 2007 19:36:51 -0700 Subject: [Mechanize-users] Basic_auth questions In-Reply-To: <72edc0310707111947p6fa2593el67d9c7ec79077e5a@mail.gmail.com> References: <72edc0310707111947p6fa2593el67d9c7ec79077e5a@mail.gmail.com> Message-ID: <20070712023651.GA3167@mac-mini.lan> On Wed, Jul 11, 2007 at 07:47:58PM -0700, jeffrey mclurkin wrote: > How do you make get the page when using basic_auth? Below is code, I am > getting a 401 error. I am not sure if the basic _auth comes before the > agent.get. I will appreciate any help. Make sure to do it before the get. Hope that helps! -- Aaron Patterson http://tenderlovemaking.com/ From ericp at activestate.com Thu Jul 12 14:49:36 2007 From: ericp at activestate.com (Eric Promislow) Date: Thu, 12 Jul 2007 11:49:36 -0700 Subject: [Mechanize-users] WWW::Mechanize::Link.inspect needs some TLC Message-ID: <469677C0.8060809@activestate.com> The problem: users trying to debug Mechanize apps with Komodo are finding the debugger times out once it's loaded a web page. They don't run into this in the ruby-debug debugger, or running in normal mode. The reason: Komodo's debugger is graphical, which means that whenever it hits a breakpoint it automatically shows the contents of each local variable. It has a limit on how much data it will retrieve, but it currently doesn't guard an object from loading too much data (which it should, but that's a separate bug). I can duplicate the cause of this crash in ruby-debug as well. 
Here's a sample session, with this code: require 'rubygems' require 'mechanize' require 'logger' agent = WWW::Mechanize.new { |a| a.log = Logger.new("mech.log") } agent.user_agent_alias = 'Mac Safari' page = agent.get("http://www.google.com/") search_form = page.forms.name("f").first search_form.fields.name("q").value = "bratislava tournament" search_results = agent.submit(search_form) puts search_results.body Given this ruby-debug session: bugs $ rdebug mechanize01.rb ./mechanize01.rb:1 require 'rubygems' (rdb:1) b 11 Set breakpoint 1 at mechanize01.rb:11 (rdb:1) c Breakpoint 1 at mechanize01.rb:11 ./mechanize01.rb:11 puts search_results.body (rdb:1) p page.links[0].inspect.size 1521039 # That's way too big, since the page is a simple google results page: (rdb:1) p page.body.size 3441 Using mechanize/inspect.rb: (rdb:1) p page.links[0].pretty_inspect "#\n" (rdb:1) p page.links[0].pretty_inspect.size 138 (rdb:1) p page.inspect.size 1480219 (rdb:1) p page.pretty_inspect.size 2172 With this change: --- mechanize/inspect.rb~ 2007-07-12 10:55:20.375000000 -0700 +++ mechanize/inspect.rb 2007-07-12 11:42:58.203125000 -0700 @@ -40,6 +40,7 @@ } } end + alias :inspect :pretty_inspect end class Link @@ -49,6 +50,7 @@ q.breakable; q.pp href } end + alias :inspect :pretty_inspect end class Form lib $ I get these much better results: (rdb:1) p page.links[0].inspect.size 138 (rdb:1) p page.body.size 3441 (rdb:1) p page.inspect.size 2172 Is this patch reasonable or have I missed something? 
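[Editorial sketch, not part of the original message.] The effect of the patch can be reproduced outside Mechanize with a small stand-in class. `DemoLink` below is hypothetical; it only mimics an object that holds a reference to a large page body. The `pretty_print` body follows the shape visible in the diff context above (`q.breakable; q.pp href`), and the alias is the one-line change the patch proposes.

```ruby
require 'pp'

# Hypothetical stand-in for a link object that keeps a reference to a
# large parsed page. Without a custom inspect, Ruby's default inspect
# would dump every instance variable, including the page body.
class DemoLink
  def initialize(href, text, page_body)
    @href = href
    @text = text
    @page_body = page_body
  end

  # Print only the fields that are useful when debugging.
  def pretty_print(q)
    q.object_group(self) {
      q.breakable; q.pp @text
      q.breakable; q.pp @href
    }
  end

  # The change from the patch: make inspect reuse the compact output.
  alias :inspect :pretty_inspect
end

link = DemoLink.new('http://example.com/about', 'About', 'x' * 1_000_000)
puts link.inspect
puts link.inspect.size  # small, and independent of the million-character body
```

Because `pretty_inspect` is built on `pretty_print`, which is overridden here, the alias does not recurse back into `inspect`.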
Thanks, Eric From jeffrey.mclurkin at gmail.com Thu Jul 12 15:49:22 2007 From: jeffrey.mclurkin at gmail.com (jeffrey mclurkin) Date: Thu, 12 Jul 2007 12:49:22 -0700 Subject: [Mechanize-users] Basic_auth questions In-Reply-To: <20070712023651.GA3167@mac-mini.lan> References: <72edc0310707111947p6fa2593el67d9c7ec79077e5a@mail.gmail.com> <20070712023651.GA3167@mac-mini.lan> Message-ID: <72edc0310707121249k23e2a8d1vfe902509091145d0@mail.gmail.com> Thanks for response I am still having some problems now its the 403. Manually I can login with no probems, but there is a popup page to enter username and password. How do I process the username popup? require 'mechanize' require 'logger' CP_LOGGER = Logger.new( 'cp.log' ) CP_LOGGER.level = Logger::INFO agent = WWW::Mechanize.new {|a| a.log = Logger.new(STDERR) } agent.basic_auth('username', 'password') page = agent.get('https://' ) >ruby mechanize_altell.rb I, [2007-07-12T12:42:11.718000 #400] INFO -- : Net::HTTP::Get: /developer/ D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: accept-language => en-us,en;q0.5 D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: connection => keep-alive D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: accept => */* D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: accept-encoding => gzip,identity D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: user-agent => WWW-Mechanize/0.6.8 (http://rubyforge.org/projects/mechanize/) D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: authorization => Basic anl1QGdvdHZuZXR3b3Jrcy5jb206Z290dnJvY2tz D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7 D, [2007-07-12T12:42:13.375000 #400] DEBUG -- : request-header: keep-alive => 300 D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : Read 0 bytes D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : Read 726 bytes D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : 
response-header: connection => Keep-Alive D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : response-header: content-type => text/html; charset=iso-8859-1 D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : response-header: date => Thu, 12 Jul 2007 19:42:20 GMT D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : response-header: server => IBM_HTTP_Server D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : response-header: content-length => 726 D, [2007-07-12T12:42:13.515000 #400] DEBUG -- : response-header: keep-alive => timeout=10, max=100 I, [2007-07-12T12:42:13.531000 #400] INFO -- : status: 403 c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.8/lib/mechanize.rb:172:in `get': 403 => Net::HTTPForbidden (WWW::Mechanize::ResponseCodeError) from mechanize_altell.rb:13 >Exit code: 1 On 7/11/07, Aaron Patterson wrote: > > On Wed, Jul 11, 2007 at 07:47:58PM -0700, jeffrey mclurkin wrote: > > How do you make get the page when using basic_auth? Below is code, I am > > getting a 401 error. I am not sure if the basic _auth comes before the > > agent.get. I will appreciate any help. > > Make sure to do it before the get. > > Hope that helps! > > -- > Aaron Patterson > http://tenderlovemaking.com/ > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070712/2823715a/attachment-0001.html From jeffrey.mclurkin at gmail.com Thu Jul 12 23:14:35 2007 From: jeffrey.mclurkin at gmail.com (jeffrey mclurkin) Date: Thu, 12 Jul 2007 20:14:35 -0700 Subject: [Mechanize-users] How do you handle pop ups? Message-ID: <72edc0310707122014le34e40agd6e4c14b41abd578@mail.gmail.com> When I click a link to download a file, a pop up window comes up to save the file. Is there a way to enter a file name and click the submit button with mechanize? 
Thanks, Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070712/558f7018/attachment.html From aaron.patterson at gmail.com Fri Jul 13 11:50:15 2007 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Fri, 13 Jul 2007 08:50:15 -0700 Subject: [Mechanize-users] How do you handle pop ups? In-Reply-To: <72edc0310707122014le34e40agd6e4c14b41abd578@mail.gmail.com> References: <72edc0310707122014le34e40agd6e4c14b41abd578@mail.gmail.com> Message-ID: <6959e1680707130850h6a11d249ie8054944e68d3706@mail.gmail.com> On 7/12/07, jeffrey mclurkin wrote: > > When I click a link to download a file, a pop up window comes up to save > the file. Is there a way to enter a file name and click the submit button > with mechanize? Mechanize normally won't automatically save the file for you. If you "click" the file you want to save, then call "save_as" on it, passing in a string, it will save the file as the string you passed in. For example: page = agent.get('some_url') file = page.click('some_link') file.save_as('my_file.txt') -- Aaron Patterson http://tenderlovemaking.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070713/6510bbad/attachment.html From aaron.patterson at gmail.com Fri Jul 13 11:54:39 2007 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Fri, 13 Jul 2007 08:54:39 -0700 Subject: [Mechanize-users] Basic_auth questions In-Reply-To: <72edc0310707121249k23e2a8d1vfe902509091145d0@mail.gmail.com> References: <72edc0310707111947p6fa2593el67d9c7ec79077e5a@mail.gmail.com> <20070712023651.GA3167@mac-mini.lan> <72edc0310707121249k23e2a8d1vfe902509091145d0@mail.gmail.com> Message-ID: <6959e1680707130854p340102acx5f539fd6bf3c6444@mail.gmail.com> On 7/12/07, jeffrey mclurkin wrote: > > Thanks for response I am still having some problems now its the 403. 
> Manually I can login with no probems, but there is a popup page to enter > username and password. How do I process the username popup? Is it actually a popup? Or is it a username/password prompt? If its just a username/password prompt, setting the username and password through the basic_auth method will do the trick. If it is actually a popup, you'll have to fill out the popup form. -- Aaron Patterson http://tenderlovemaking.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070713/6813df85/attachment.html From aaron.patterson at gmail.com Fri Jul 13 12:13:18 2007 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Fri, 13 Jul 2007 09:13:18 -0700 Subject: [Mechanize-users] WWW::Mechanize::Link.inspect needs some TLC In-Reply-To: <469677C0.8060809@activestate.com> References: <469677C0.8060809@activestate.com> Message-ID: <6959e1680707130913s6dee5402l89b530c88da1f0e1@mail.gmail.com> On 7/12/07, Eric Promislow wrote: > > The problem: users trying to debug Mechanize apps with Komodo > are finding the debugger times out once it's loaded a web > page. They don't run into this in the ruby-debug debugger, > or running in normal mode. > > The reason: Komodo's debugger is graphical, which means that > whenever it hits a breakpoint it automatically shows the > contents of each local variable. It has a limit on how > much data it will retrieve, but it currently doesn't guard > an object from loading too much data (which it should, but > that's a separate bug). > > I can duplicate the cause of this crash in ruby-debug as > well. 
Here's a sample session, with this code: > > require 'rubygems' > require 'mechanize' > require 'logger' > > agent = WWW::Mechanize.new { |a| a.log = Logger.new("mech.log") } > agent.user_agent_alias = 'Mac Safari' > page = agent.get("http://www.google.com/") > search_form = page.forms.name("f").first > search_form.fields.name("q").value = "bratislava tournament" > search_results = agent.submit(search_form) > puts search_results.body > > Given this ruby-debug session: > > bugs $ rdebug mechanize01.rb > ./mechanize01.rb:1 require 'rubygems' > (rdb:1) b 11 > Set breakpoint 1 at mechanize01.rb:11 > (rdb:1) c > Breakpoint 1 at mechanize01.rb:11 > ./mechanize01.rb:11 puts search_results.body > (rdb:1) p page.links[0].inspect.size > 1521039 > # That's way too big, since the page is a simple google results page: > (rdb:1) p page.body.size > 3441 > > Using mechanize/inspect.rb: > > (rdb:1) p page.links[0].pretty_inspect > "# http://www.google.ca/ig%3Fhl%3Den&usg=AFQjCNH9TTed08sJL_DKraFsuSMDFvW1gw\ > ">\n" > (rdb:1) p page.links[0].pretty_inspect.size > 138 > (rdb:1) p page.inspect.size > 1480219 > (rdb:1) p page.pretty_inspect.size > 2172 > > With this change: > --- mechanize/inspect.rb~ 2007-07-12 10:55:20.375000000 -0700 > +++ mechanize/inspect.rb 2007-07-12 11:42:58.203125000 -0700 > @@ -40,6 +40,7 @@ > } > } > end > + alias :inspect :pretty_inspect > end > > class Link > @@ -49,6 +50,7 @@ > q.breakable; q.pp href > } > end > + alias :inspect :pretty_inspect > end > > class Form > lib $ > > I get these much better results: > > (rdb:1) p page.links[0].inspect.size > 138 > (rdb:1) p page.body.size > 3441 > (rdb:1) p page.inspect.size > 2172 > > Is this patch reasonable or have I missed something? This patch seems reasonable. I'll apply it and make sure the tests pass. :-) -- Aaron Patterson http://tenderlovemaking.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070713/92713504/attachment.html From barjunk at attglobal.net Fri Jul 13 14:57:32 2007 From: barjunk at attglobal.net (barsalou) Date: Fri, 13 Jul 2007 10:57:32 -0800 Subject: [Mechanize-users] How do you handle pop ups? In-Reply-To: <6959e1680707130850h6a11d249ie8054944e68d3706@mail.gmail.com> References: <72edc0310707122014le34e40agd6e4c14b41abd578@mail.gmail.com> <6959e1680707130850h6a11d249ie8054944e68d3706@mail.gmail.com> Message-ID: <20070713105732.m9hnovgi0ww04gcg@lcgalaska.com> Quoting Aaron Patterson : > On 7/12/07, jeffrey mclurkin wrote: >> >> When I click a link to download a file, a pop up window comes up to save >> the file. Is there a way to enter a file name and click the submit button >> with mechanize? > > > Mechanize normally won't automatically save the file for you. If you > "click" the file you want to save, then call "save_as" on it, passing in a > string, it will save the file as the string you passed in. For example: > > page = agent.get('some_url') > file = page.click('some_link') > file.save_as('my_file.txt') > > This looks like a great example to have in your set of examples. Mike B. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From hazynrg at gmail.com Sun Jul 15 09:35:37 2007 From: hazynrg at gmail.com (hazynrg) Date: Sun, 15 Jul 2007 15:35:37 +0200 Subject: [Mechanize-users] rejected form not handled Message-ID: <5c4f3a5c0707150635m55c8e06ar8413c291b9f9ed68@mail.gmail.com> Hello, I have a login form on /login.php which POSTs to /dorf1.php when access is granted and to /login.php when it is denied. 
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new()
login = agent.get("http://server/login.php")
form = login.forms.action("dorf1.php")
form.fields[2].value = "wronguser" # login
form.fields[3].value = "wrongpass" # password
dorf1 = form.submit()
dorf1.uri
# => #

But the page we got was "login.php" (a bit altered: "access denied", etc.)
So the URI of the page returned by form.submit() isn't updated if there is a redirect, please fix :)
Best regards.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070715/f02ca66b/attachment.html

From schapht at gmail.com Mon Jul 16 10:20:36 2007
From: schapht at gmail.com (Mat Schaffer)
Date: Mon, 16 Jul 2007 10:20:36 -0400
Subject: [Mechanize-users] rejected form not handled
In-Reply-To: <5c4f3a5c0707150635m55c8e06ar8413c291b9f9ed68@mail.gmail.com>
References: <5c4f3a5c0707150635m55c8e06ar8413c291b9f9ed68@mail.gmail.com>
Message-ID:

On Jul 15, 2007, at 9:35 AM, hazynrg wrote:
> Hello,
> I have a login form on /login.php which POSTs to /dorf1.php when
> access is granted and to /login.php when it is denied.
>
> require 'rubygems'
> require 'mechanize'
> agent = WWW::Mechanize.new()
> login = agent.get("http://server/login.php")
> form = login.forms.action("dorf1.php")
> form.fields[2].value = "wronguser" # login
> form.fields[3].value = "wrongpass" # password
> dorf1 = form.submit()
> dorf1.uri
> # => #
>
> But the page we got was "login.php" (a bit altered: "access
> denied", etc.)
> So the URI of the page returned by form.submit() isn't updated if
> there is a redirect, please fix :)
> Best regards.

Mechanize doesn't support JavaScript, so if you're changing the target based on that, it won't work. If you're using an HTML meta tag to redirect from dorf1.php back to login.php on failure, I don't think that would work either. But a 302 (header("Location: login.php")) should work.
-Mat -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070716/f6122411/attachment-0001.html From hazynrg at gmail.com Mon Jul 16 12:13:22 2007 From: hazynrg at gmail.com (hazynrg) Date: Mon, 16 Jul 2007 18:13:22 +0200 Subject: [Mechanize-users] rejected form not handled In-Reply-To: References: <5c4f3a5c0707150635m55c8e06ar8413c291b9f9ed68@mail.gmail.com> Message-ID: <5c4f3a5c0707160913p4323cb30ld4570bc22d2d84a@mail.gmail.com> Okay, i now understand. Should i get the "302" in dorf1.code? i have 200 there :( Do you have any idea on the way to check that the access have been granted (or not)? 2007/7/16, Mat Schaffer : > > On Jul 15, 2007, at 9:35 AM, hazynrg wrote: > > Hello, > I have a login form on /login.php which POSTs to /dorf1.php when access is > granted and to /login.php when it is denied. > > require 'rubygems' > require 'mechanize' > agent = WWW:Mechanize.new () > login = agent.get("http://server/login.php") > form = login.forms.action("dorf1.php") > form.fields[2].value = "wronguser" # login > form.fields [3].value = "wrongpass" # password > dorf1 = form.submit() > dorf1.uri > # => # > > But the page we got was " login.php" (a bit altered: "access denied", > etc.) > So the URI of the page returned by form.submit() isn't updated if there is > a redirect, please fix :) > Best regards. > > > Mechanize doesn't support javascript, so if you're changing target based > on that, it won't work. If you're using a html meta tag to redirect from > dorf1.php back to login.php on failure, I don't think that would work > either. But a 302 (header("Location: login.php")) should work. > -Mat > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070716/dbed7b41/attachment.html

From schapht at gmail.com Mon Jul 16 12:59:39 2007
From: schapht at gmail.com (Mat Schaffer)
Date: Mon, 16 Jul 2007 12:59:39 -0400
Subject: [Mechanize-users] rejected form not handled
In-Reply-To: <5c4f3a5c0707160913p4323cb30ld4570bc22d2d84a@mail.gmail.com>
References: <5c4f3a5c0707150635m55c8e06ar8413c291b9f9ed68@mail.gmail.com> <5c4f3a5c0707160913p4323cb30ld4570bc22d2d84a@mail.gmail.com>
Message-ID: <7E347210-25BD-4A81-9EFA-87B0D4DFEFCF@gmail.com>

At the HTTP level, access granted (or not) is determined by the return codes (403 being a common access denied message). Mechanize can access this via Page#code:

agent.get(url).code == "403"

But if your "access denied" message is on an HTML page that gets returned with a 200 code, consider using Page#search to look for the message (assuming you have some class tags that could identify it):

agent.get(url).search("div.error").find { |e| e.inner_html =~ /denied/i }
# find the first div of class "error" that has "denied" in its contents

Or something like that anyway.... I'm sure there's a prettier way to do that search. Check out http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics for some more info on the search syntax.
-Mat

On Jul 16, 2007, at 12:13 PM, hazynrg wrote:
> Okay, i now understand.
> Should i get the "302" in dorf1.code? i have 200 there :(
> Do you have any idea on the way to check that the access have been
> granted (or not)?
>
> 2007/7/16, Mat Schaffer <schapht at gmail.com>:
> On Jul 15, 2007, at 9:35 AM, hazynrg wrote:
>
>> Hello,
>> I have a login form on /login.php which POSTs to /dorf1.php when
>> access is granted and to /login.php when it is denied.
>> >> require 'rubygems' >> require 'mechanize' >> agent = WWW:Mechanize.new () >> login = agent.get(" http://server/login.php") >> form = login.forms.action("dorf1.php") >> form.fields[2].value = "wronguser" # login >> form.fields [3].value = "wrongpass" # password >> dorf1 = form.submit () >> dorf1.uri >> # => # >> >> But the page we got was " login.php" (a bit altered: "access >> denied", etc.) >> So the URI of the page returned by form.submit() isn't updated if >> there is a redirect, please fix :) >> Best regards. > > Mechanize doesn't support javascript, so if you're changing target > based on that, it won't work. If you're using a html meta tag to > redirect from dorf1.php back to login.php on failure, I don't think > that would work either. But a 302 (header("Location: login.php")) > should work. > -Mat > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070716/752b33f8/attachment.html From barjunk at attglobal.net Mon Jul 23 13:16:17 2007 From: barjunk at attglobal.net (barsalou) Date: Mon, 23 Jul 2007 09:16:17 -0800 Subject: [Mechanize-users] dependencies Message-ID: <20070723091617.b1in743ns4owkkgk@lcgalaska.com> I knew that hpricot was a dependency, but hoe and rubyforge....is this right? Version 0.6.9 for ruby Mike B. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. 
From aaron at tenderlovemaking.com Mon Jul 23 10:31:01 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Mon, 23 Jul 2007 07:31:01 -0700 Subject: [Mechanize-users] dependencies In-Reply-To: <20070723091617.b1in743ns4owkkgk@lcgalaska.com> References: <20070723091617.b1in743ns4owkkgk@lcgalaska.com> Message-ID: <20070723143101.GA21415@mac-mini.lan> On Mon, Jul 23, 2007 at 09:16:17AM -0800, barsalou wrote: > I knew that hpricot was a dependency, but hoe and rubyforge....is this right? Yes. Hoe is required for running the tests. -- Aaron Patterson http://tenderlovemaking.com/ From barjunk at attglobal.net Mon Jul 23 14:11:02 2007 From: barjunk at attglobal.net (barsalou) Date: Mon, 23 Jul 2007 10:11:02 -0800 Subject: [Mechanize-users] dependencies In-Reply-To: <20070723143101.GA21415@mac-mini.lan> References: <20070723091617.b1in743ns4owkkgk@lcgalaska.com> <20070723143101.GA21415@mac-mini.lan> Message-ID: <20070723101102.6jl5gjzsgswo8og8@lcgalaska.com> Quoting Aaron Patterson : > On Mon, Jul 23, 2007 at 09:16:17AM -0800, barsalou wrote: >> I knew that hpricot was a dependency, but hoe and rubyforge....is >> this right? > > Yes. Hoe is required for running the tests. > > -- > Aaron Patterson > http://tenderlovemaking.com/ > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > I see. Thanks. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From barjunk at attglobal.net Mon Jul 23 14:16:59 2007 From: barjunk at attglobal.net (barsalou) Date: Mon, 23 Jul 2007 10:16:59 -0800 Subject: [Mechanize-users] Design ideas Message-ID: <20070723101659.js32ayc7wggcs0gw@lcgalaska.com> I'm trying to use mechanize against a site that has four fields in the form. However, those four fields have to be filled in order. 
So putting something in field one, populates the second field drop down. So I'm thinking that I'll probably have to call the page multiple times? What sort of things should I be doing to figure out how to interact with this page. I'm currently just using trial and error, but thought I might tap the experience here to see if anyone has run into this sort of site before. Thanks for any guidance. Mike B. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From aaron at tenderlovemaking.com Mon Jul 23 11:21:58 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Mon, 23 Jul 2007 08:21:58 -0700 Subject: [Mechanize-users] Design ideas In-Reply-To: <20070723101659.js32ayc7wggcs0gw@lcgalaska.com> References: <20070723101659.js32ayc7wggcs0gw@lcgalaska.com> Message-ID: <20070723152158.GA21581@mac-mini.lan> On Mon, Jul 23, 2007 at 10:16:59AM -0800, barsalou wrote: > I'm trying to use mechanize against a site that has four fields in the > form. However, those four fields have to be filled in order. > > So putting something in field one, populates the second field drop down. > > So I'm thinking that I'll probably have to call the page multiple times? > > What sort of things should I be doing to figure out how to interact > with this page. How does the browser do it? Are the fields populated via Javascript? > > I'm currently just using trial and error, but thought I might tap the > experience here to see if anyone has run into this sort of site before. > > Thanks for any guidance. Typically when I'm scripting a site, I'll try to follow what the browser does as closely as possible. If I'm not sure exactly what the browser is doing I'll open up LiveHTTPHeaders. http://livehttpheaders.mozdev.org/ Hope that helps. 
-- Aaron Patterson http://tenderlovemaking.com/ From barjunk at attglobal.net Mon Jul 23 15:01:33 2007 From: barjunk at attglobal.net (barsalou) Date: Mon, 23 Jul 2007 11:01:33 -0800 Subject: [Mechanize-users] Design ideas In-Reply-To: <20070723152158.GA21581@mac-mini.lan> References: <20070723101659.js32ayc7wggcs0gw@lcgalaska.com> <20070723152158.GA21581@mac-mini.lan> Message-ID: <20070723110133.pud9ret0zgkcos08@lcgalaska.com> Quoting Aaron Patterson : > On Mon, Jul 23, 2007 at 10:16:59AM -0800, barsalou wrote: >> I'm trying to use mechanize against a site that has four fields in the >> form. However, those four fields have to be filled in order. >> >> So putting something in field one, populates the second field drop down. >> >> So I'm thinking that I'll probably have to call the page multiple times? >> >> What sort of things should I be doing to figure out how to interact >> with this page. > > How does the browser do it? Are the fields populated via Javascript? Yes. This seems like the way things are being done. Does that make this an impossible task? Should I handle a site with this much JS differently than I would other sites? > >> >> I'm currently just using trial and error, but thought I might tap the >> experience here to see if anyone has run into this sort of site before. >> >> Thanks for any guidance. > > Typically when I'm scripting a site, I'll try to follow what the browser > does as closely as possible. If I'm not sure exactly what the browser > is doing I'll open up LiveHTTPHeaders. > > http://livehttpheaders.mozdev.org/ > > Hope that helps. > I'll take a look at that. Thanks for the suggestion. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From barjunk at attglobal.net Mon Jul 23 18:45:02 2007 From: barjunk at attglobal.net (barsalou) Date: Mon, 23 Jul 2007 14:45:02 -0800 Subject: [Mechanize-users] What does {bogusetag } mean? 
Message-ID: <20070723144502.p0l7wekf408wkwoo@lcgalaska.com> I got this in one of my pages, and I'm not exactly sure what it means. ideas? Mike B. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From schapht at gmail.com Tue Jul 24 11:09:11 2007 From: schapht at gmail.com (Mat Schaffer) Date: Tue, 24 Jul 2007 11:09:11 -0400 Subject: [Mechanize-users] dependencies In-Reply-To: <20070723143101.GA21415@mac-mini.lan> References: <20070723091617.b1in743ns4owkkgk@lcgalaska.com> <20070723143101.GA21415@mac-mini.lan> Message-ID: On Jul 23, 2007, at 10:31 AM, Aaron Patterson wrote: > On Mon, Jul 23, 2007 at 09:16:17AM -0800, barsalou wrote: >> I knew that hpricot was a dependency, but hoe and rubyforge....is >> this right? > > Yes. Hoe is required for running the tests. As an FYI, you might be able to avoid this by wrapping your require 'hoe' in a begin/rescue and providing a simple test task in the rescue block. That way you'd need hoe to do any of the packaging tasks, but tests can still run without it. -Mat From schapht at gmail.com Tue Jul 24 11:13:59 2007 From: schapht at gmail.com (Mat Schaffer) Date: Tue, 24 Jul 2007 11:13:59 -0400 Subject: [Mechanize-users] Design ideas In-Reply-To: <20070723110133.pud9ret0zgkcos08@lcgalaska.com> References: <20070723101659.js32ayc7wggcs0gw@lcgalaska.com> <20070723152158.GA21581@mac-mini.lan> <20070723110133.pud9ret0zgkcos08@lcgalaska.com> Message-ID: <57B31365-B14A-403E-8EC5-571D51279F7C@gmail.com> On Jul 23, 2007, at 3:01 PM, barsalou wrote: > Yes. This seems like the way things are being done. Does that make > this an impossible task? Should I handle a site with this much JS > differently than I would other sites? I've used mechanize for heavy JS sites (like hotmail) by reverse engineering just enough of the javascript to make the appropriate HTTP call. 
I usually work out the logic then port it to ruby so I can use the same basic structure as the main website. Not exactly easy, but doable. In addition to LiveHTTPHeaders I also use TamperData [1], Firebug [2] and Charles [3] for this sort of work. -Mat [1] https://addons.mozilla.org/en-US/firefox/addon/966 [2] https://addons.mozilla.org/en-US/firefox/addon/1843 [3] http://www.xk72.com/charles/ From schapht at gmail.com Tue Jul 24 11:15:49 2007 From: schapht at gmail.com (Mat Schaffer) Date: Tue, 24 Jul 2007 11:15:49 -0400 Subject: [Mechanize-users] Jruby + Rhino = Javascript support? Message-ID: Hey Aaron, I'm just thinking out loud here, but have you considered the possibility of using the Rhino [1] library to implement javascript support in mechanize? It'd create a jruby dependency for that feature, but still. Just thought I'd bring it up while I was thinking about it. -Mat [1] http://www.mozilla.org/rhino/ From aaron at tenderlovemaking.com Tue Jul 24 08:17:40 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Tue, 24 Jul 2007 05:17:40 -0700 Subject: [Mechanize-users] What does {bogusetag } mean? In-Reply-To: <20070723144502.p0l7wekf408wkwoo@lcgalaska.com> References: <20070723144502.p0l7wekf408wkwoo@lcgalaska.com> Message-ID: <20070724121740.GA25448@mac-mini.lan> On Mon, Jul 23, 2007 at 02:45:02PM -0800, barsalou wrote: > I got this in one of my pages, and I'm not exactly sure what it means. > > ideas? This is definitely a message from Hpricot. Can you provide an example to reproduce it? -- Aaron Patterson http://tenderlovemaking.com/ From aaron at tenderlovemaking.com Tue Jul 24 08:26:04 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Tue, 24 Jul 2007 05:26:04 -0700 Subject: [Mechanize-users] Jruby + Rhino = Javascript support? 
In-Reply-To: References: Message-ID: <20070724122604.GB25448@mac-mini.lan> On Tue, Jul 24, 2007 at 11:15:49AM -0400, Mat Schaffer wrote: > Hey Aaron, I'm just thinking out loud here, but have you considered > the possibility of using the Rhino [1] library to implement > javascript support in mechanize? It'd create a jruby dependency for > that feature, but still. Just thought I'd bring it up while I was > thinking about it. Yes, definitely. I've been thinking (and working) on a javascript solution for a while. I haven't played around with Rhino though... Basically what I've been working on is a way to convert Javascript to ruby. Then I can just eval the converted JS. You can check out what I've done so far here: http://rubyforge.org/projects/rkelly/ I've also got a copy of mechanize on my laptop with RKelly integrated, and it works with simple javascript (including setting form values!). Unfortunately I've been really busy lately so I haven't been able to pay attention to RKelly as much as I'd like. Also here is a short example of what it can do so far: http://tenderlovemaking.com/2007/05/22/more-ecma-awesomeness/ -- Aaron Patterson http://tenderlovemaking.com/ From barjunk at attglobal.net Tue Jul 24 12:29:22 2007 From: barjunk at attglobal.net (barsalou) Date: Tue, 24 Jul 2007 08:29:22 -0800 Subject: [Mechanize-users] Jruby + Rhino = Javascript support? In-Reply-To: <20070724122604.GB25448@mac-mini.lan> References: <20070724122604.GB25448@mac-mini.lan> Message-ID: <20070724082922.fl5x54wnd444gsc8@lcgalaska.com> This makes me wonder if its possible to call javascript functions. For example, I have a web page that when you type data, it retrieves the data for the next field with javascript. I was wondering if I could just "submit" this request directly. Currently how does one do these sorts of things, or is rkelly the kind of thing that could make that happen? Mike B. 
Quoting Aaron Patterson : > On Tue, Jul 24, 2007 at 11:15:49AM -0400, Mat Schaffer wrote: >> Hey Aaron, I'm just thinking out loud here, but have you considered >> the possibility of using the Rhino [1] library to implement >> javascript support in mechanize? It'd create a jruby dependency for >> that feature, but still. Just thought I'd bring it up while I was >> thinking about it. > > Yes, definitely. I've been thinking (and working) on a javascript > solution for a while. I haven't played around with Rhino though... > Basically what I've been working on is a way to convert Javascript to > ruby. Then I can just eval the converted JS. > > You can check out what I've done so far here: > > http://rubyforge.org/projects/rkelly/ > > I've also got a copy of mechanize on my laptop with RKelly integrated, > and it works with simple javascript (including setting form values!). > Unfortunately I've been really busy lately so I haven't been able to pay > attention to RKelly as much as I'd like. Also here is a short example > of what it can do so far: > > http://tenderlovemaking.com/2007/05/22/more-ecma-awesomeness/ > > -- > Aaron Patterson > http://tenderlovemaking.com/ > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. 
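On the question above about submitting the JavaScript's request directly: one approach, in line with the advice elsewhere in this archive to watch what the browser sends, is to rebuild the same URL by hand. The following is only a minimal sketch. The base URL, the `lookup.php` endpoint, the parameter names, and the helper `dependent_field_url` are all hypothetical; the real values would have to be read from a tool such as LiveHTTPHeaders while using the page.

```ruby
require 'uri'

# Build the URL that the page's JavaScript would request when the first
# field changes. Everything passed in here is an assumption for
# illustration; nothing is taken from a real site.
def dependent_field_url(base, endpoint, params)
  uri = URI.join(base, endpoint)            # resolve the endpoint against the page URL
  uri.query = URI.encode_www_form(params)   # encode field values as a query string
  uri.to_s
end

url = dependent_field_url('http://example.com/app/', 'lookup.php',
                          'state' => 'AK', 'city' => 'Fort Yukon')
puts url
```

With a Mechanize agent, the resulting string could then be passed to `agent.get(url)` and the response parsed for the values that belong in the next field. Whether the server also expects particular cookies or headers is something only the captured traffic can tell you.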
From barjunk at attglobal.net Tue Jul 24 12:31:54 2007 From: barjunk at attglobal.net (barsalou) Date: Tue, 24 Jul 2007 08:31:54 -0800 Subject: [Mechanize-users] Design ideas In-Reply-To: <57B31365-B14A-403E-8EC5-571D51279F7C@gmail.com> References: <20070723101659.js32ayc7wggcs0gw@lcgalaska.com> <20070723152158.GA21581@mac-mini.lan> <20070723110133.pud9ret0zgkcos08@lcgalaska.com> <57B31365-B14A-403E-8EC5-571D51279F7C@gmail.com> Message-ID: <20070724083154.871tw8ece8w0w0cc@lcgalaska.com> Quoting Mat Schaffer : > On Jul 23, 2007, at 3:01 PM, barsalou wrote: >> Yes. This seems like the way things are being done. Does that make >> this an impossible task? Should I handle a site with this much JS >> differently than I would other sites? > > I've used mechanize for heavy JS sites (like hotmail) by reverse > engineering just enough of the javascript to make the appropriate > HTTP call. I usually work out the logic then port it to ruby so I > can use the same basic structure as the main website. Not exactly > easy, but doable. > > In addition to LiveHTTPHeaders I also use TamperData [1], Firebug [2] > and Charles [3] for this sort of work. > > -Mat > > [1] https://addons.mozilla.org/en-US/firefox/addon/966 > [2] https://addons.mozilla.org/en-US/firefox/addon/1843 > [3] http://www.xk72.com/charles/ Thanks. This is helpful. Mike B. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From barjunk at attglobal.net Tue Jul 24 13:27:48 2007 From: barjunk at attglobal.net (barsalou) Date: Tue, 24 Jul 2007 09:27:48 -0800 Subject: [Mechanize-users] What does {bogusetag } mean? 
In-Reply-To: <20070724121740.GA25448@mac-mini.lan> References: <20070723144502.p0l7wekf408wkwoo@lcgalaska.com> <20070724121740.GA25448@mac-mini.lan> Message-ID: <20070724092748.icurc8dhcw0g80o4@lcgalaska.com> Quoting Aaron Patterson : > On Mon, Jul 23, 2007 at 02:45:02PM -0800, barsalou wrote: >> I got this in one of my pages, and I'm not exactly sure what it means. >> >> ideas? > > This is definitely a message from Hpricot. Can you provide an example > to reproduce it? > I believe this is NOT a problem with mechanize...but that the page I am working with uses lousy html coding...when using dom inspector on the page, there were many Tbody's surrounding tables, surrounding tr/td's, etc. It's like five levels deep or something. I'll see what I can cobble together for an example. Mike B. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From schapht at gmail.com Tue Jul 24 16:52:10 2007 From: schapht at gmail.com (Mat Schaffer) Date: Tue, 24 Jul 2007 16:52:10 -0400 Subject: [Mechanize-users] Jruby + Rhino = Javascript support? In-Reply-To: <20070724122604.GB25448@mac-mini.lan> References: <20070724122604.GB25448@mac-mini.lan> Message-ID: <9EC33F71-1C24-485C-9946-7862279DD307@gmail.com> On Jul 24, 2007, at 8:26 AM, Aaron Patterson wrote: > Yes, definitely. I've been thinking (and working) on a javascript > solution for a while. I haven't played around with Rhino though... > Basically what I've been working on is a way to convert Javascript to > ruby. Then I can just eval the converted JS. > > You can check out what I've done so far here: > > http://rubyforge.org/projects/rkelly/ As cool as this is, I can't get past the fact that you named a ruby project rkelly. 
I'll have to think about this more once the mental shock settles :-P -Mat From riddochc at gmail.com Wed Jul 25 18:14:48 2007 From: riddochc at gmail.com (Chris Riddoch) Date: Wed, 25 Jul 2007 16:14:48 -0600 Subject: [Mechanize-users] Being a polite client: maintaining history Message-ID: <6efbd9b70707251514h3c06d969l5ce8c335925a25d1@mail.gmail.com> Hi, folks. I'm investigating libraries to use in a rather specialized feed reader. Some of the sites I want to follow don't have RSS feeds (or have hopelessly broken feeds) so I was already planning on using Hpricot anyway -- Mechanize is looking good, here. In my research for my project, recipe 11.16 in O'Reilly's Ruby Cookbook references a website[1] discussing the importance of the If-Modified-Since header in polite RSS readers. It mentions that the Etag header is also important. I see in Mechanize's code that if conditional_requests is set, it'll add the If-Modified-Since header. But this requires that the page is already in the history, and there's currently no provision for caching the history. Since RSS readers (and most scrapers in general) are likely to be run periodically, mechanize should try to maintain this kind of state between runs, don't you think? You might see a patch from me, unless someone beats me to it. [1] http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers -- epistemological humility Chris Riddoch From aaron at tenderlovemaking.com Thu Jul 26 20:30:23 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Thu, 26 Jul 2007 17:30:23 -0700 Subject: [Mechanize-users] [ANN] mechanize 0.6.10 Released Message-ID: <20070727003023.GA4173@mac-mini.lan> mechanize version 0.6.10 has been released! * The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, can follow links, and submit forms. Form fields can be populated and submitted. 
Mechanize also keeps track of the sites that you have visited as a history.

Changes:

# Mechanize CHANGELOG

## 0.6.10

* Made digest authentication work with POSTs.
* Made sure page was HTML before following meta refreshes.
  http://rubyforge.org/tracker/index.php?func=detail&aid=12260&group_id=1453&atid=5709
* Made sure that URLs with a host and no path would default to '/' for history purposes.
  http://rubyforge.org/tracker/index.php?func=detail&aid=12368&group_id=1453&atid=5709
* Avoiding memory leaks with transact. Thanks Tobias Gruetzmacher!
  http://rubyforge.org/tracker/index.php?func=detail&aid=12057&group_id=1453&atid=5711
* Fixing a problem with # signs in the file name. Thanks Tobias Gruetzmacher!
  http://rubyforge.org/tracker/index.php?func=detail&aid=12510&group_id=1453&atid=5711
* Made sure that blank form values are submitted.
  http://rubyforge.org/tracker/index.php?func=detail&aid=12505&group_id=1453&atid=5709
* Mechanize now respects the base tag. Thanks Stephan Dale.
  http://rubyforge.org/tracker/index.php?func=detail&aid=12468&group_id=1453&atid=5709
* Aliasing inspect to pretty_inspect. Thanks Eric Promislow.
  http://rubyforge.org/pipermail/mechanize-users/2007-July/000157.html

-- Aaron Patterson http://tenderlovemaking.com/

From carl.lerche at gmail.com Fri Jul 27 02:43:07 2007 From: carl.lerche at gmail.com (Carl Lerche) Date: Thu, 26 Jul 2007 23:43:07 -0700 Subject: [Mechanize-users] Is mechanize thread safe? Message-ID: Hello all, I was just wondering if anybody knew whether mechanize is supposed to be thread-safe or not? I didn't really find any information about it anywhere. I've been getting a strange error in protocol.rb when I run a script that uses mechanize in a multi threaded fashion, but not with a single thread. I'm trying to write a spider that does multiple gets in parallel, but it keeps puking when I thread it.
Thanks, -carl -- EPA Rating: 3000 Lines of Code / Gallon (of coffee) From whitethunder922 at yahoo.com Fri Jul 27 10:07:25 2007 From: whitethunder922 at yahoo.com (Matt White) Date: Fri, 27 Jul 2007 07:07:25 -0700 (PDT) Subject: [Mechanize-users] Is mechanize thread safe? Message-ID: <546099.35848.qm@web53305.mail.re2.yahoo.com> Can you give more information on where it dies? I've run mechanize successfully with multiple threads but I did have to work some kinks out, mostly with database access. Matt White ----- Original Message ---- From: Carl Lerche To: mechanize-users at rubyforge.org Sent: Friday, July 27, 2007 12:43:07 AM Subject: [Mechanize-users] Is mechanize thread safe? Hello all, I was just wondering if anybody knew whether mechanize is supposed to be thread-safe or not? I didn't really find any information about it anywhere. I've been getting a strange error in protocol.rb when I run a script that uses mechanize in a multi threaded fashion, but not with a single thread. I'm trying to write a spider that does multiple gets in parallel, but it keeps puking when I thread it. Thanks, -carl -- EPA Rating: 3000 Lines of Code / Gallon (of coffee) From carl.lerche at gmail.com Fri Jul 27 12:28:37 2007 From: carl.lerche at gmail.com (Carl Lerche) Date: Fri, 27 Jul 2007 09:28:37 -0700 Subject: [Mechanize-users] Is mechanize thread safe?
In-Reply-To: <546099.35848.qm@web53305.mail.re2.yahoo.com> References: <546099.35848.qm@web53305.mail.re2.yahoo.com> Message-ID: Thanks for the response. Here is my bit of code. I'm no expert coder, but I think I got the mutex applied where it is needed. Here are various errors I get. What I notice is that it seems like stuff is getting overwritten left and right because of the threading. It all seems to happen in net/http, but as far as I know, net/http is thread safe (I've done a lot of threading with it before). whowhat:~/Developer/Tools/Parser carllerche$ ./spider /usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong status line: "gieslist.com/angieslist/Login.aspx\" class=\"link\" title=\"Angie's List Member Login\" onmouseover=\"window.status=this.title;return true;\" onmouseout=\"window.status=defaultStatus;return true;\">" (Net::HTTPBadResponse) from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl' from ./spider:6 whowhat:~/Developer/Tools/Parser carllerche$ ./spider /usr/local/lib/ruby/1.8/net/protocol.rb:176:in `write0': undefined method `+' for nil:NilClass (NoMethodError) from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries' from 
/Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl' from ./spider:6 whowhat:~/Developer/Tools/Parser carllerche$ ./spider /usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong status line: " _uacct = \"UA-448811-1\"; " (Net::HTTPBadResponse) from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries' from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl' from ./spider:6 On 7/27/07, Matt White wrote: > > Can you give more information on where it dies? I've run mechanize > successfully with multiple threads but I did have to work some kinks out, > mostly with database access. > > Matt White > > > ----- Original Message ---- > From: Carl Lerche > To: mechanize-users at rubyforge.org > Sent: Friday, July 27, 2007 12:43:07 AM > Subject: [Mechanize-users] Is mechanize thread safe? > > > Hello all, > > I was just wondering if anybody knew whether mechanize is supposed to > be thread-safe or not? I didn't really find any information about it > anywhere. I've been getting a strange error in protocol.rb when I run > a script that uses mechanize in a multi threaded fashion, but not with > a single thread. > > I'm trying to write a spider that does multiple gets in parallel, but > it keeps puking when I thread it. > > Thanks, > -carl > > -- > EPA Rating: 3000 Lines of Code / Gallon (of coffee) > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > > > ________________________________ > Get the free Yahoo! 
toolbar and rest assured with the added security of > spyware protection. > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > -- EPA Rating: 3000 Lines of Code / Gallon (of coffee) From scott.hankins at gmail.com Sat Jul 28 15:26:53 2007 From: scott.hankins at gmail.com (scott hankins) Date: Sat, 28 Jul 2007 15:26:53 -0400 Subject: [Mechanize-users] problem with a form Message-ID: <48596a5a0707281226j649b42f4m168b82c7009aa60d@mail.gmail.com> Hello I am trying to fill in a form using WWW::Mechanize. I can fill in 2 of the 3 fields but one is giving me a problem. The name of this field is "name" when I use the following bit of code, it seems to change the name of the form, not the value of the field. The part of original page after pretty printing {forms # # The following bit of code seems to rename the form: page_form = page.form('wp_pers_form') page_form.name = 'smith' page_form.firstname = 'joe' Here is the first bit of the page form after pretty printing # # #} Any suggestions to fix this? Is this a bug? thanks scott From aaron at tenderlovemaking.com Sat Jul 28 17:56:00 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Sat, 28 Jul 2007 14:56:00 -0700 Subject: [Mechanize-users] problem with a form In-Reply-To: <48596a5a0707281226j649b42f4m168b82c7009aa60d@mail.gmail.com> References: <48596a5a0707281226j649b42f4m168b82c7009aa60d@mail.gmail.com> Message-ID: <20070728215559.GA13040@mac-mini.lan> Hey scott! On Sat, Jul 28, 2007 at 03:26:53PM -0400, scott hankins wrote: > Hello > > I am trying to fill in a form using WWW::Mechanize. I can fill in 2 of > the 3 fields but one is giving me a problem. The name of this field is > "name" when I use the following bit of code, it seems to change the > name of the form, not the value of the field. 
> > The part of original page after pretty printing > > {forms > # {name "wp_pers_form"} > {method "GET"} > {action "http://www.whitepages.com/10866/search/FindPerson"} > {fields > # > # > > The following bit of code seems to rename the form: > > page_form = page.form('wp_pers_form') > page_form.name = 'smith' > page_form.firstname = 'joe' > > Here is the first bit of the page form after pretty printing > > # {name "smith"} > {method "GET"} > {action "http://www.whitepages.com/10866/search/FindPerson"} > {fields > # > # > #} > > Any suggestions to fix this? Is this a bug? I don't know if I would call this a bug, but it is a little known "feature". The problem is that forms can have a name, and they can also contain fields named "name". Since I made form input fields act like accessors on the form object, it is possible for one to clobber the other. Which is exactly what you are running into. What I suggest is that you treat the form object like a hash. That will guarantee that you set those fields on the form. For example: page_form = page.form('wp_pers_form') page_form['name'] = 'smith' page_form['firstname'] = 'joe' Hope that helps! -- Aaron Patterson http://tenderlovemaking.com/ From scott.hankins at gmail.com Sat Jul 28 18:10:59 2007 From: scott.hankins at gmail.com (scott hankins) Date: Sat, 28 Jul 2007 18:10:59 -0400 Subject: [Mechanize-users] problem with a form In-Reply-To: <20070728215559.GA13040@mac-mini.lan> References: <48596a5a0707281226j649b42f4m168b82c7009aa60d@mail.gmail.com> <20070728215559.GA13040@mac-mini.lan> Message-ID: <48596a5a0707281510h72c7a63evda4f9362a50e5c39@mail.gmail.com> On 7/28/07, Aaron Patterson wrote: > I don't know if I would call this a bug, but it is a little known > "feature". The problem is that forms can have a name, and they can also > contain fields named "name". Since I made form input fields act like > accessors on the form object, it is possible for one to clobber the > other. Which is exactly what you are running into. 
> > What I suggest is that you treat the form object like a hash. That will > guarantee that you set those fields on the form. For example: > > page_form = page.form('wp_pers_form') > page_form['name'] = 'smith' > page_form['firstname'] = 'joe' > > Hope that helps! Thanks for your help. That worked perfectly. I am pretty new at Ruby (and programming in general), is there a reason this would be considered a 'feature'? Why would you want to change the name of the form? thanks scott From aaron at tenderlovemaking.com Sat Jul 28 18:14:50 2007 From: aaron at tenderlovemaking.com (Aaron Patterson) Date: Sat, 28 Jul 2007 15:14:50 -0700 Subject: [Mechanize-users] problem with a form In-Reply-To: <48596a5a0707281510h72c7a63evda4f9362a50e5c39@mail.gmail.com> References: <48596a5a0707281226j649b42f4m168b82c7009aa60d@mail.gmail.com> <20070728215559.GA13040@mac-mini.lan> <48596a5a0707281510h72c7a63evda4f9362a50e5c39@mail.gmail.com> Message-ID: <20070728221450.GA13140@mac-mini.lan> On Sat, Jul 28, 2007 at 06:10:59PM -0400, scott hankins wrote: > On 7/28/07, Aaron Patterson wrote: > > > I don't know if I would call this a bug, but it is a little known > > "feature". The problem is that forms can have a name, and they can also > > contain fields named "name". Since I made form input fields act like > > accessors on the form object, it is possible for one to clobber the > > other. Which is exactly what you are running into. > > > > What I suggest is that you treat the form object like a hash. That will > > guarantee that you set those fields on the form. For example: > > > > page_form = page.form('wp_pers_form') > > page_form['name'] = 'smith' > > page_form['firstname'] = 'joe' > > > > Hope that helps! > > Thanks for your help. That worked perfectly. > > I am pretty new at Ruby (and programming in general), is there a > reason this would be considered a 'feature'? Why would you want to > change the name of the form? 
I can't think of a case where I would want to change the form name. :-) The reason I did that was to make behavior consistent. The setter and getter should access the same variable. In the case of "name", the meaning is ambiguous. When do you want the form name vs the "name" field of a form? -- Aaron Patterson http://tenderlovemaking.com/ From carl.lerche at gmail.com Sun Jul 29 22:35:36 2007 From: carl.lerche at gmail.com (Carl Lerche) Date: Sun, 29 Jul 2007 19:35:36 -0700 Subject: [Mechanize-users] Is mechanize thread safe? In-Reply-To: References: <546099.35848.qm@web53305.mail.re2.yahoo.com> Message-ID: Welp, i looked through the mechanize code. Doesn't look thread safe to me. Good to know for future reference. -carl On 7/27/07, Carl Lerche wrote: > Thanks for the response. > > Here is my bit of code. I'm no expert coder, but I think I got the > mutex applied where it is needed. > > Here are various errors I get. What I notice is that it seems like > stuff is getting overwritten left and right because of the threading. > It all seems to happen in net/http, but as far as I know, net/http is > thread safe (I've done a lot of threading with it before). 
>
> whowhat:~/Developer/Tools/Parser carllerche$ ./spider
> /usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong
> status line: "gieslist.com/angieslist/Login.aspx\" class=\"link\"
> title=\"Angie's List Member Login\"
> onmouseover=\"window.status=this.title;return true;\"
> onmouseout=\"window.status=defaultStatus;return true;\">"
> (Net::HTTPBadResponse)
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
>     from ./spider:6
>
>
> whowhat:~/Developer/Tools/Parser carllerche$ ./spider
> /usr/local/lib/ruby/1.8/net/protocol.rb:176:in `write0': undefined
> method `+' for nil:NilClass (NoMethodError)
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
>     from ./spider:6
>
>
> whowhat:~/Developer/Tools/Parser carllerche$ ./spider
> /usr/local/lib/ruby/1.8/net/http.rb:2019:in `read_status_line': wrong
> status line: " _uacct = \"UA-448811-1\"; " (Net::HTTPBadResponse)
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `join'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `each'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:108:in `crawl_entries'
>     from /Users/carllerche/Developer/Tools/Parser/library/crawlers/angies_list.rb:34:in `crawl'
>     from ./spider:6
>
> On 7/27/07, Matt White wrote:
> >
> > Can you give more information on where it dies? I've run mechanize
> > successfully with multiple threads but I did have to work some kinks out,
> > mostly with database access.
> >
> > Matt White
> >
> >
> > ----- Original Message ----
> > From: Carl Lerche
> > To: mechanize-users at rubyforge.org
> > Sent: Friday, July 27, 2007 12:43:07 AM
> > Subject: [Mechanize-users] Is mechanize thread safe?
> >
> >
> > Hello all,
> >
> > I was just wondering if anybody knew whether mechanize is supposed to
> > be thread-safe or not? I didn't really find any information about it
> > anywhere. I've been getting a strange error in protocol.rb when I run
> > a script that uses mechanize in a multi threaded fashion, but not with
> > a single thread.
> >
> > I'm trying to write a spider that does multiple gets in parallel, but
> > it keeps puking when I thread it.
> >
> > Thanks,
> > -carl
> >
> > --
> > EPA Rating: 3000 Lines of Code / Gallon (of coffee)
> > _______________________________________________
> > Mechanize-users mailing list
> > Mechanize-users at rubyforge.org
> > http://rubyforge.org/mailman/listinfo/mechanize-users
> >
> >
> > ________________________________
> > Get the free Yahoo! toolbar and rest assured with the added security of
> > spyware protection.
> > _______________________________________________
> > Mechanize-users mailing list
> > Mechanize-users at rubyforge.org
> > http://rubyforge.org/mailman/listinfo/mechanize-users
> >
> >
> --
> EPA Rating: 3000 Lines of Code / Gallon (of coffee)
>

-- 
EPA Rating: 3000 Lines of Code / Gallon (of coffee)
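
If errors like the ones in the traces above come from several threads sharing one agent (and with it one underlying net/http connection), a common workaround is to give each worker thread its own agent and lock only the data the threads genuinely share. The sketch below illustrates that pattern under stated assumptions and is not a fix confirmed in this thread: `StubAgent` is a hypothetical stand-in for a Mechanize agent so the example runs without the gem or a network, `crawl` is an illustrative helper, and the example.com URLs are made up.

```ruby
require 'thread'

# Hypothetical stand-in for a Mechanize agent; only the method name `get`
# mirrors the real library. It keeps per-instance state, much as a real
# agent keeps a connection and history, which is what makes sharing risky.
class StubAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to fetch a page: record the URL and return a fake body.
  def get(url)
    @history << url
    "body of #{url}"
  end
end

# Crawl `urls` with `worker_count` threads. Each thread builds its own
# agent, so no agent state is ever touched by more than one thread.
def crawl(urls, worker_count = 3)
  queue = Queue.new
  urls.each { |url| queue << url }

  results = {}
  results_lock = Mutex.new

  workers = (1..worker_count).map do
    Thread.new do
      agent = StubAgent.new # one agent per thread, never shared
      loop do
        url = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        body = agent.get(url)
        # The results hash is the only shared object, so it alone is locked.
        results_lock.synchronize { results[url] = body }
      end
    end
  end

  workers.each { |t| t.join }
  results
end

urls = (1..10).map { |i| "http://example.com/page#{i}" }
pages = crawl(urls)
puts pages.size
```

With the real library, the stub would presumably be replaced by an agent created inside each thread. Whether that alone is enough for Mechanize 0.6.x is not established anywhere in this thread.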