From barjunk at attglobal.net Wed Mar 14 14:00:54 2007
From: barjunk at attglobal.net (barsalou)
Date: Wed, 14 Mar 2007 10:00:54 -0800
Subject: [Mechanize-users] A java initialization routine
Message-ID: <20070314100054.i6x1ghym6840cco0@lcgalaska.com>

When using a particular web page, the default values of the form are set
by a java function, then I can modify the web page and submit it.

But when I use Mechanize, I am having trouble figuring out how to get
those values so I can put them into the form I am going to submit.

Below is the function that has the data...and I have changed pertinent
info.

What other choices might I have to accomplish this? I'm going through
multiple devices here, so the data is specific to the device, if that
wasn't already obvious.

So basically, I want to get the default data, change one item (an IP
address), put it in the form, and submit it.

Thanks for any direction you might provide.

It's kinda long, so I apologize:

function InitValue(passForm)
{
    // var snmp_ver = '0';

    /*-----AP Password-----*/
    passForm.http_username.value = 'username'; // admin/user/0/username
    passForm.http_passwd.value = 'supersecret'; // admin/user/0/password
    passForm.confirm_http_passwd.value = 'supersecret'; // admin/user/0/password
    passForm.webs_use_https.value = '1';
    passForm.wireless_access_web.value = '0';
    passForm.snmp_mode.value = '1';
    passForm.snmp_ver.value = '0';
    passForm.SYSContact.value = 'My Company (1-800-555-1212)';
    passForm.SYSName.value = 'System-14'; // system/name
    passForm.SYSLocation.value = 'D217';//system/location
    passForm.SNMPUserName.value = '';//snmp username
    passForm.SNMPPassword.value = '';//snmp passphrase
    passForm.SNMPPassPhrase.value = '';//snmp password
    passForm.SNMPCommunityOne.value = 'public';
    passForm.SNMPCommunityTwo.value = 'public';
    passForm.SNMPTrapCommunity.value = 'public';
    var f = passForm ;
    if( '1' == '0' )
        f.webs_use_https[1].checked = true;
    else
        f.webs_use_https[0].checked = true;
    if( '0' == '0' )
        f.wireless_access_web[1].checked = true;
    else
        f.wireless_access_web[0].checked = true;
    if( '1' == '0' )
        f.snmp_mode[1].checked = true;
    else
        f.snmp_mode[0].checked = true;
    if( '0' == '0' )
        f.snmp_ver[0].checked = true;
    else
        f.snmp_ver[1].checked = true;
    ChangeSNMP_Mode('1');
    ChangeSNMP_Version('0');
    SplitTrustHostIPAddress ( '1.2.3.4' );//SNMPTrustedHost
    SplitSNMPTrapDestIPAddress ( '0.0.0.0' );//SNMPTrapDest

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

From andy at petdance.com Wed Mar 14 19:43:21 2007
From: andy at petdance.com (Andy Lester)
Date: Wed, 14 Mar 2007 18:43:21 -0500
Subject: [Mechanize-users] A java initialization routine
In-Reply-To: <20070314100054.i6x1ghym6840cco0@lcgalaska.com>
References: <20070314100054.i6x1ghym6840cco0@lcgalaska.com>
Message-ID: <97634DA9-5736-4356-A459-F17A53BF961D@petdance.com>

On Mar 14, 2007, at 1:00 PM, barsalou wrote:

> When using a particular web page, the default values of the form are set
> by a java function, then I can modify the web page and submit it.

You mean Javascript, not Java. The two are very different.

And Mechanize does not support Javascript in any way.

--
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance

From barjunk at attglobal.net Wed Mar 14 20:20:53 2007
From: barjunk at attglobal.net (barsalou)
Date: Wed, 14 Mar 2007 16:20:53 -0800
Subject: [Mechanize-users] A java initialization routine
In-Reply-To: <97634DA9-5736-4356-A459-F17A53BF961D@petdance.com>
References: <20070314100054.i6x1ghym6840cco0@lcgalaska.com>
 <97634DA9-5736-4356-A459-F17A53BF961D@petdance.com>
Message-ID: <20070314162053.qx4hvtww84s8kg0s@lcgalaska.com>

Quoting Andy Lester:

>
> On Mar 14, 2007, at 1:00 PM, barsalou wrote:
>
>> When using a particular web page, the default values of the form are set
>> by a java function, then I can modify the web page and submit it.
>
> You mean Javascript, not Java.
> The two are very different.
>
> And Mechanize does not support Javascript in any way.
>
> --
> Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance

Sorry for mis-speaking about java/javascript...thanks for pointing that
out.

I just thought since when I do a view source, that I can see this code,
that I would somehow be able to take advantage of the information in
the text....not necessarily use a javascript "trick" to get the data out.

Thanks for any ideas you might have.

Mike B.

From barjunk at attglobal.net Wed Mar 14 21:49:57 2007
From: barjunk at attglobal.net (barsalou)
Date: Wed, 14 Mar 2007 17:49:57 -0800
Subject: [Mechanize-users] Getting info from an Hpricot::Elem
Message-ID: <20070314174957.mgg09hd4w0gs0csc@lcgalaska.com>

How about getting the name and value from this element object:

irb(main):079:0> elements[1]
=> {emptyelem }
irb(main):080:0> elements[1].class
=> Hpricot::Elem

If I can do that, then I'm home free. I've been looking at the Hpricot
docs, but I'm not seeing what I need.

Should I see the doctor if my vision gets blurred? :)

Mike B.

From barjunk at attglobal.net Wed Mar 14 21:52:27 2007
From: barjunk at attglobal.net (barsalou)
Date: Wed, 14 Mar 2007 17:52:27 -0800
Subject: [Mechanize-users] SOLVED: Getting info from an Hpricot::Elem
Message-ID: <20070314175227.2l2dcnhoo4w08scs@lcgalaska.com>

I knew as soon as I hit the enter key, the intelligence would feed my
brain.

The answer to this is:

elements[1]["value"]

Easy Peasy....and why not? (there is a pun there if you can see it)

Thanks for listening.

Mike B.
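Going back to the InitValue function from the start of this thread: since the defaults are plainly visible in the page source, one way to use them without running any JavaScript might be to pull the assignments out of the text with a regular expression. The following is only a minimal sketch under stated assumptions, not something posted on the list. `js_source` is a hypothetical stand-in for the page body, and the names `ASSIGNMENT` and `extract_defaults` are invented for illustration.

```ruby
# Hypothetical stand-in for the page source; in practice this would be
# the body text of the fetched page.
js_source = <<~JS
  function InitValue(passForm)
  {
      // var snmp_ver = '0';
      passForm.http_username.value = 'username'; // admin/user/0/username
      passForm.SYSName.value = 'System-14'; // system/name
      passForm.SNMPUserName.value = '';//snmp username
      passForm.SNMPCommunityOne.value = 'public';
      SplitTrustHostIPAddress ( '1.2.3.4' );//SNMPTrustedHost
JS

# Matches lines of the form: passForm.<field>.value = '<value>';
ASSIGNMENT = /passForm\.(\w+)\.value\s*=\s*'([^']*)'\s*;/

# Returns a hash of field name => default value found in the source text.
def extract_defaults(source)
  source.scan(ASSIGNMENT).to_h
end

defaults = extract_defaults(js_source)

# The IP address is passed to a helper function rather than assigned,
# so it needs its own pattern.
trusted_host = js_source[/SplitTrustHostIPAddress\s*\(\s*'([^']*)'\s*\)/, 1]

puts defaults.inspect
puts trusted_host
```

The resulting hash could then be copied onto the form fields of the same names, changing only the IP address before submitting. That part depends on the actual form and is not shown here.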
From wflanagan at gmail.com Sun Mar 18 16:39:21 2007
From: wflanagan at gmail.com (William Flanagan)
Date: Sun, 18 Mar 2007 16:39:21 -0400
Subject: [Mechanize-users] Submitting a form sends a file. How do I save it?
Message-ID: <38E0369D-2A8B-4969-BD28-9007F4E3CBDB@gmail.com>

I've been using Mechanize for a project that I've been working on,
but this is the first time I'm having to use forms (scraping
previously). So, after I fill out the form, when I hit submit, it
sends me information in the form of a text file to download. For the
life of me, I can't see how to get access to it. When clicking on a
link, you can put a save_as on it and save the data as a file.

How do I save off data that's sent to me via the submit of a form?

Thanks for any advice,

Will

From aaron at tenderlovemaking.com Sun Mar 18 19:03:21 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Sun, 18 Mar 2007 15:03:21 -0800
Subject: [Mechanize-users] Submitting a form sends a file. How do I save it?
In-Reply-To: <38E0369D-2A8B-4969-BD28-9007F4E3CBDB@gmail.com>
References: <38E0369D-2A8B-4969-BD28-9007F4E3CBDB@gmail.com>
Message-ID: <20070318230321.GA3215@eviladmins.org>

On Sun, Mar 18, 2007 at 04:39:21PM -0400, William Flanagan wrote:
> I've been using Mechanize for a project that I've been working on,
> but this is the first time I'm having to use forms (scraping
> previously). So, after I fill out the form, when I hit submit, it
> sends me information in the form of a text file to download. For the
> life of me, I can't see how to get access to it. When clicking on a
> link, you can put a save_as on it and save the data as a file.
>
> How do I save off data that's sent to me via the submit of a form?

Are you able to get a page object back after you submit the form? You
can just call "save_as" on the object returned from the form submission.
So something like this:

  form.submit.save_as('blah.txt')

--
Aaron Patterson
http://tenderlovemaking.com/

From bigoudi32 at yahoo.fr Tue Mar 20 13:38:44 2007
From: bigoudi32 at yahoo.fr (bruno)
Date: Tue, 20 Mar 2007 18:38:44 +0100
Subject: [Mechanize-users] bad URI problem when submitting a form
Message-ID: <200703201838.44815.bigoudi32@yahoo.fr>

Hi all,

I have a problem when submitting a form: the URL seems to be invalid. I
really don't see how I should try to solve it. I searched the web, but
didn't find any report of this kind of problem.

Here is the code:

require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get('http://www.voyages-sncf.com/leisure/fr/launch/home/')

# the form isn't on the main page, and it's necessary to follow that
# link to get a session on the site (local cookie)
link = page.links.text("train")
page = agent.click(link)

form = page.forms.first
# filling the form
form.ORIGIN_CITY = "paris"
form.DESTINATION_CITY = "lille"

# submit
page = agent.submit(form, form.buttons.first)

Thanks,
Bruno Duy?
From barjunk at attglobal.net Wed Mar 21 01:16:31 2007
From: barjunk at attglobal.net (barsalou)
Date: Tue, 20 Mar 2007 21:16:31 -0800
Subject: [Mechanize-users] bad URI problem when submitting a form
In-Reply-To: <200703201838.44815.bigoudi32@yahoo.fr>
References: <200703201838.44815.bigoudi32@yahoo.fr>
Message-ID: <20070320211631.ebe77b6z48wsws4s@lcgalaska.com>

You didn't mention the error or how the problem presented itself.
Care to share? :)

Mike B.

Quoting bruno:
> Hi all,
>
> I have a problem when submitting a form: the URL seems to be invalid. I
> really don't see how I should try to solve it. I searched the web, but
> didn't find any report of this kind of problem.
>
> Here is the code:
>
> require 'mechanize'
>
> agent = WWW::Mechanize.new
> page = agent.get('http://www.voyages-sncf.com/leisure/fr/launch/home/')
>
> # the form isn't on the main page, and it's necessary to follow that
> # link to get a session on the site (local cookie)
> link = page.links.text("train")
> page = agent.click(link)
>
> form = page.forms.first
> # filling the form
> form.ORIGIN_CITY = "paris"
> form.DESTINATION_CITY = "lille"
>
> # submit
> page = agent.submit(form, form.buttons.first)
>
> Thanks,
> Bruno Duy?
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
>

From bigoudi32 at yahoo.fr Wed Mar 21 05:33:08 2007
From: bigoudi32 at yahoo.fr (bruno)
Date: Wed, 21 Mar 2007 10:33:08 +0100
Subject: [Mechanize-users] bad URI problem when submitting a form
In-Reply-To: <20070320211631.ebe77b6z48wsws4s@lcgalaska.com>
References: <200703201838.44815.bigoudi32@yahoo.fr>
 <20070320211631.ebe77b6z48wsws4s@lcgalaska.com>
Message-ID: <200703211033.08707.bigoudi32@yahoo.fr>

Yes indeed ...

Here is the error message:

$ ruby sncf.rb
/usr/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): http://www.voyages-sncf.com/dynamic/_SvTermCommVoySaisie?_TMS=1174469615530&_DLG=SvTermCommVoySaisie&_LANG=FR&_AGENCY=VSC&_AGENCY=VSC (URI::InvalidURIError)
        from /usr/lib/ruby/1.8/uri/common.rb:485:in `parse'
        from /usr/lib/ruby/1.8/mechanize.rb:238:in `to_absolute_uri'
        from /usr/lib/ruby/1.8/mechanize.rb:198:in `submit'
        from sncf.rb:16

Thanks

I'm sorry for that, but it's the first program I plan to write in Ruby. I
like Ruby so much, and have wanted to write this GPL software for months,
so I can't give up on it now.

Bruno

On Wednesday 21 March 2007 06:16, barsalou wrote:
> You didn't mention the error or how the problem presented itself.
> Care to share? :)
>
> Mike B.
>
> Quoting bruno:
> > Hi all,
> >
> > I have a problem when submitting a form: the URL seems to be invalid. I
> > really don't see how I should try to solve it. I searched the web, but
> > didn't find any report of this kind of problem.
> >
> > Here is the code:
> >
> > require 'mechanize'
> >
> > agent = WWW::Mechanize.new
> > page = agent.get('http://www.voyages-sncf.com/leisure/fr/launch/home/')
> >
> > # the form isn't on the main page, and it's necessary to follow that
> > # link to get a session on the site (local cookie)
> > link = page.links.text("train")
> > page = agent.click(link)
> >
> > form = page.forms.first
> > # filling the form
> > form.ORIGIN_CITY = "paris"
> > form.DESTINATION_CITY = "lille"
> >
> > # submit
> > page = agent.submit(form, form.buttons.first)
> >
> > Thanks,
> > Bruno Duy?

From aaron.patterson at gmail.com Wed Mar 21 11:47:46 2007
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Wed, 21 Mar 2007 08:47:46 -0700
Subject: [Mechanize-users] bad URI problem when submitting a form
In-Reply-To: <200703211033.08707.bigoudi32@yahoo.fr>
References: <200703201838.44815.bigoudi32@yahoo.fr>
 <20070320211631.ebe77b6z48wsws4s@lcgalaska.com>
 <200703211033.08707.bigoudi32@yahoo.fr>
Message-ID: <6959e1680703210847md047ab4yd00ec788b25984b1@mail.gmail.com>

On 3/21/07, bruno wrote:
> Yes indeed ...
>
> Here is the error message:
>
> $ ruby sncf.rb
> /usr/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?):
> http://www.voyages-sncf.com/dynamic/_SvTermCommVoySaisie?_TMS=1174469615530&_DLG=SvTermCommVoySaisie&_LANG=FR&_AGENCY=VSC&_AGENCY=VSC
> (URI::InvalidURIError)
>         from /usr/lib/ruby/1.8/uri/common.rb:485:in `parse'
>         from /usr/lib/ruby/1.8/mechanize.rb:238:in `to_absolute_uri'
>         from /usr/lib/ruby/1.8/mechanize.rb:198:in `submit'
>         from sncf.rb:16
>

Ah. Looks as if you have html entities that are not being decoded
before the form is submitted. This is definitely a bug. I will fix
this for the next release, but in the meantime you can decode the
form action before the form posts. Just use the htmlentities gem:

http://rubyforge.org/projects/htmlentities/

Sorry about this, I will get it fixed quickly!
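
The workaround described above amounts to decoding entities in the form's action before it is parsed as a URI. A minimal sketch of just that decoding step follows. It uses Ruby's standard-library `CGI.unescapeHTML` in place of the htmlentities gem named in the message (a substitution made here only to keep the example dependency-free), and `raw_action` is a hypothetical, undecoded action string rather than one captured from the site.

```ruby
require 'cgi'
require 'uri'

# Hypothetical raw action attribute as it might appear in the page
# source before entity decoding. This is an assumption: the archive
# only shows the URL in its decoded form.
raw_action = 'http://www.voyages-sncf.com/dynamic/_SvTermCommVoySaisie' \
             '?_TMS=1174469615530&amp;_DLG=SvTermCommVoySaisie' \
             '&amp;_LANG=FR&amp;_AGENCY=VSC'

# Turn entities such as &amp; back into plain characters.
decoded_action = CGI.unescapeHTML(raw_action)

# The decoded string can now be parsed and its query inspected.
uri = URI.parse(decoded_action)
params = URI.decode_www_form(uri.query).to_h

puts decoded_action
puts params.inspect
```

How the decoded action is assigned back to the form depends on the Mechanize version in use, so that step is left out of the sketch.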

--
Aaron Patterson
http://tenderlovemaking.com/

From bigoudi32 at yahoo.fr Thu Mar 22 06:32:49 2007
From: bigoudi32 at yahoo.fr (bruno)
Date: Thu, 22 Mar 2007 11:32:49 +0100
Subject: [Mechanize-users] bad URI problem when submitting a form
In-Reply-To: <6959e1680703210847md047ab4yd00ec788b25984b1@mail.gmail.com>
References: <200703201838.44815.bigoudi32@yahoo.fr>
 <200703211033.08707.bigoudi32@yahoo.fr>
 <6959e1680703210847md047ab4yd00ec788b25984b1@mail.gmail.com>
Message-ID: <200703221132.49964.bigoudi32@yahoo.fr>

Ok, thanks for the reply. I'll try to decode the form action using
htmlentities.
When do you think the next release will be available?

Thanks very much.
Bruno

On Wednesday 21 March 2007 16:47, Aaron Patterson wrote:
> On 3/21/07, bruno wrote:
> > Yes indeed ...
> >
> > Here is the error message:
> >
> > $ ruby sncf.rb
> > /usr/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?):
> > http://www.voyages-sncf.com/dynamic/_SvTermCommVoySaisie?_TMS=1174469615530&_DLG=SvTermCommVoySaisie&_LANG=FR&_AGENCY=VSC&_AGENCY=VSC
> > (URI::InvalidURIError)
> >         from /usr/lib/ruby/1.8/uri/common.rb:485:in `parse'
> >         from /usr/lib/ruby/1.8/mechanize.rb:238:in `to_absolute_uri'
> >         from /usr/lib/ruby/1.8/mechanize.rb:198:in `submit'
> >         from sncf.rb:16
>
> Ah. Looks as if you have html entities that are not being decoded
> before the form is submitted. This is definitely a bug. I will fix
> this for the next release, but in the meantime you can decode the
> form action before the form posts. Just use the htmlentities gem:
>
> http://rubyforge.org/projects/htmlentities/
>
> Sorry about this, I will get it fixed quickly!

From aaron.patterson at gmail.com Thu Mar 22 11:13:41 2007
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 22 Mar 2007 08:13:41 -0700
Subject: [Mechanize-users] bad URI problem when submitting a form
In-Reply-To: <200703221132.49964.bigoudi32@yahoo.fr>
References: <200703201838.44815.bigoudi32@yahoo.fr>
 <200703211033.08707.bigoudi32@yahoo.fr>
 <6959e1680703210847md047ab4yd00ec788b25984b1@mail.gmail.com>
 <200703221132.49964.bigoudi32@yahoo.fr>
Message-ID: <6959e1680703220813x728772c4rca433dda8f37401f@mail.gmail.com>

On 3/22/07, bruno wrote:
>
> Ok, thanks for the reply. I'll try to decode the form action using
> htmlentities.
> When do you think the next release will be available?

I have one more bug to fix, and I'll do a release. That should be sometime
this weekend.

--
Aaron Patterson
http://tenderlovemaking.com/

From bigoudi32 at yahoo.fr Thu Mar 22 11:26:15 2007
From: bigoudi32 at yahoo.fr (bruno)
Date: Thu, 22 Mar 2007 16:26:15 +0100
Subject: [Mechanize-users] bad URI problem when submitting a form
In-Reply-To: <6959e1680703220813x728772c4rca433dda8f37401f@mail.gmail.com>
References: <200703201838.44815.bigoudi32@yahoo.fr>
 <200703221132.49964.bigoudi32@yahoo.fr>
 <6959e1680703220813x728772c4rca433dda8f37401f@mail.gmail.com>
Message-ID: <200703221626.16153.bigoudi32@yahoo.fr>

Oh, you're quick! I thought the release would be in a month. I'm lucky.
So I'll wait for that and enjoy my weekend with my girlfriend ;)

Good luck with the release!

Bruno Duy?

On Thursday 22 March 2007 16:13, Aaron Patterson wrote:
> On 3/22/07, bruno wrote:
> > Ok, thanks for the reply. I'll try to decode the form action using
> > htmlentities.
> > When do you think the next release will be available?
>
> I have one more bug to fix, and I'll do a release. That should be sometime
> this weekend.

From aaron at tenderlovemaking.com Sat Mar 24 22:25:38 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Sat, 24 Mar 2007 18:25:38 -0800
Subject: [Mechanize-users] [ANN] mechanize 0.6.6 Released
Message-ID: <20070325022538.GA30596@eviladmins.org>

mechanize version 0.6.6 has been released!

http://mechanize.rubyforge.org/

The Mechanize library is used for automating interaction with websites.
Mechanize automatically stores and sends cookies, follows redirects,
can follow links, and submit forms. Form fields can be populated and
submitted. Mechanize also keeps track of the sites that you have
visited as a history.

Changes:

= Mechanize CHANGELOG

== 0.6.6

* Removing hpricot overrides
* Fixed a bug where alt text can be nil. Thanks Yannick!
* Unparseable expiration dates in cookies are now treated as session cookies
* Caching connections
* Requests now default to keep alive
* [#9434] Fixed bug where html entities weren't decoded
* [#9150] Updated mechanize history to deal with redirects

http://mechanize.rubyforge.org/

--
Aaron Patterson
http://tenderlovemaking.com/

From barjunk at attglobal.net Sun Mar 25 15:17:12 2007
From: barjunk at attglobal.net (barsalou)
Date: Sun, 25 Mar 2007 11:17:12 -0800
Subject: [Mechanize-users] [ANN] mechanize 0.6.6 Released
In-Reply-To: <20070325022538.GA30596@eviladmins.org>
References: <20070325022538.GA30596@eviladmins.org>
Message-ID: <20070325111712.7iocg73b404cckwg@lcgalaska.com>

To all those that have contributed to this software so far...thanks for
writing good and useful code.

Mike B.

Quoting Aaron Patterson:
> mechanize version 0.6.6 has been released!
>
> http://mechanize.rubyforge.org/
>
> The Mechanize library is used for automating interaction with websites.
> Mechanize automatically stores and sends cookies, follows redirects,
> can follow links, and submit forms. Form fields can be populated and
> submitted. Mechanize also keeps track of the sites that you have visited as
> a history.
>
> Changes:
>
> = Mechanize CHANGELOG
>
> == 0.6.6
>
> * Removing hpricot overrides
> * Fixed a bug where alt text can be nil. Thanks Yannick!
> * Unparseable expiration dates in cookies are now treated as session cookies
> * Caching connections
> * Requests now default to keep alive
> * [#9434] Fixed bug where html entities weren't decoded
> * [#9150] Updated mechanize history to deal with redirects
>
> http://mechanize.rubyforge.org/
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/

From bigoudi32 at yahoo.fr Mon Mar 26 09:41:42 2007
From: bigoudi32 at yahoo.fr (bruno)
Date: Mon, 26 Mar 2007 14:41:42 +0100
Subject: [Mechanize-users] [ANN] mechanize 0.6.6 Released
In-Reply-To: <20070325022538.GA30596@eviladmins.org>
References: <20070325022538.GA30596@eviladmins.org>
Message-ID: <200703261541.42597.bigoudi32@yahoo.fr>

Thanks!

I confirm: bug #9434 is solved. I can continue my development.

Best regards,
Bruno Duy?

On Sunday 25 March 2007 04:25, Aaron Patterson wrote:
> mechanize version 0.6.6 has been released!
>
> http://mechanize.rubyforge.org/
>
> The Mechanize library is used for automating interaction with websites.
> Mechanize automatically stores and sends cookies, follows redirects,
> can follow links, and submit forms. Form fields can be populated and
> submitted. Mechanize also keeps track of the sites that you have visited
> as a history.
>
> Changes:
>
> = Mechanize CHANGELOG
>
> == 0.6.6
>
> * Removing hpricot overrides
> * Fixed a bug where alt text can be nil. Thanks Yannick!
> * Unparseable expiration dates in cookies are now treated as session cookies
> * Caching connections
> * Requests now default to keep alive
> * [#9434] Fixed bug where html entities weren't decoded
> * [#9150] Updated mechanize history to deal with redirects
>
> http://mechanize.rubyforge.org/

From aaron at tenderlovemaking.com Mon Mar 26 12:10:20 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Mon, 26 Mar 2007 08:10:20 -0800
Subject: [Mechanize-users] [ANN] mechanize 0.6.6 Released
In-Reply-To: <200703261541.42597.bigoudi32@yahoo.fr>
References: <20070325022538.GA30596@eviladmins.org>
 <200703261541.42597.bigoudi32@yahoo.fr>
Message-ID: <20070326161020.GA18661@eviladmins.org>

On Mon, Mar 26, 2007 at 02:41:42PM +0100, bruno wrote:
> Thanks!
>
> I confirm: bug #9434 is solved. I can continue my development.

Great! I'm glad it's working!

--
Aaron Patterson
http://tenderlovemaking.com/

From zach at zachbaker.com Tue Mar 27 16:24:50 2007
From: zach at zachbaker.com (Zach Baker)
Date: Tue, 27 Mar 2007 13:24:50 -0700
Subject: [Mechanize-users] Bug: Mechanize 0.6.6 has problems handling 302s
In-Reply-To: <20070325022538.GA30596@eviladmins.org>
References: <20070325022538.GA30596@eviladmins.org>
Message-ID: <46097D92.9020502@zachbaker.com>

I'm trying to get a link that results in a 302 redirect, and although
Mechanize 0.6.5 handled it fine, I get an EOFError deep inside Net::Protocol
when I try it with 0.6.6. Here's an example with some URL I found on Google.

This script:

  require 'rubygems'
  require 'mechanize'
  a = WWW::Mechanize.new
  a.get('http://www.brunway.com/adcentrix/clickthru.cfm?id=25')

Results in this error:

EOFError: end of file reached
        from /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread'
        from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
        from /usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
        from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
        from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
        from /usr/lib/ruby/1.8/net/http.rb:2017:in `read_status_line'
        from /usr/lib/ruby/1.8/net/http.rb:2006:in `read_new'
        from /usr/lib/ruby/1.8/net/http.rb:1047:in `request'
        from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.6/lib/mechanize.rb:408:in `fetch_page'
        from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.6/lib/mechanize.rb:477:in `fetch_page'
        from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
        from /usr/lib/ruby/1.8/net/http.rb:2133:in `reading_body'
        from /usr/lib/ruby/1.8/net/http.rb:1049:in `request'
        from /usr/lib/ruby/1.8/net/http.rb:1034:in `request'
        from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
        from /usr/lib/ruby/1.8/net/http.rb:1032:in `request'
        from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.6/lib/mechanize.rb:408:in `fetch_page'
        from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.6/lib/mechanize.rb:166:in `get'

Unfortunately, my Ruby Net module understanding is not strong enough to
figure out what's going on here. Anyone know what's up?

--
Zach.

From aaron at tenderlovemaking.com Tue Mar 27 18:24:37 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Tue, 27 Mar 2007 14:24:37 -0800
Subject: [Mechanize-users] Bug: Mechanize 0.6.6 has problems handling 302s
In-Reply-To: <46097D92.9020502@zachbaker.com>
References: <20070325022538.GA30596@eviladmins.org>
 <46097D92.9020502@zachbaker.com>
Message-ID: <20070327222437.GA30966@eviladmins.org>

Hi Zach!

On Tue, Mar 27, 2007 at 01:24:50PM -0700, Zach Baker wrote:
> I'm trying to get a link that results in a 302 redirect, and although
> Mechanize 0.6.5 handled it fine, I get an EOFError deep inside Net::Protocol
> when I try it with 0.6.6. Here's an example with some URL I found on Google.
>
> This script:
>
>   require 'rubygems'
>   require 'mechanize'
>   a = WWW::Mechanize.new
>   a.get('http://www.brunway.com/adcentrix/clickthru.cfm?id=25')

Looks like the problem isn't with 302's. It looks like the server
doesn't like keep alive requests, and mechanize is not handling it
properly. It tries to read from the socket even though there is nothing
left to read.

I've added a fix in SVN, and I'll release this tonight.

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron at tenderlovemaking.com Tue Mar 27 20:23:07 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Tue, 27 Mar 2007 16:23:07 -0800
Subject: [Mechanize-users] [ANN] mechanize 0.6.7 Released
Message-ID: <20070328002307.GA4157@eviladmins.org>

mechanize version 0.6.7 has been released!

http://mechanize.rubyforge.org/

The Mechanize library is used for automating interaction with websites.
Mechanize automatically stores and sends cookies, follows redirects,
can follow links, and submit forms. Form fields can be populated and
submitted. Mechanize also keeps track of the sites that you have
visited as a history.

Changes:

= Mechanize CHANGELOG

== 0.6.7

* Fixed a bug with keep-alive requests
* [#9549] fixed problem with cookie paths

http://mechanize.rubyforge.org/

--
Aaron Patterson
http://tenderlovemaking.com/

From mengkuan at gmail.com Thu Mar 29 22:30:16 2007
From: mengkuan at gmail.com (Meng Kuan)
Date: Fri, 30 Mar 2007 10:30:16 +0800
Subject: [Mechanize-users] keep-alive
Message-ID:

Greetings,

I'm reporting on what I found after trying to use mechanize on a site
like www.tellme.com.

With mechanize versions 0.6.5, 0.6.4, 0.6.3, I was able to use
mechanize without any problems on www.tellme.com.

However, when I upgraded to 0.6.6 or 0.6.7, mechanize simply ground
to a halt after a while. I'm not sure where the problem lies, but
after looking at the changelogs for 0.6.6 and 0.6.7, it seems that
keep-alives might be the culprit here.

In any case, I'm back to using 0.6.5 and the app is working fine.

Hopefully this helps somebody who is using mechanize.

cheers,
mengkuan

From aaron at tenderlovemaking.com Fri Mar 30 00:58:44 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Thu, 29 Mar 2007 20:58:44 -0800
Subject: [Mechanize-users] keep-alive
In-Reply-To:
References:
Message-ID: <20070330045844.GA8862@eviladmins.org>

On Fri, Mar 30, 2007 at 10:30:16AM +0800, Meng Kuan wrote:
> Greetings,
>
> I'm reporting on what I found after trying to use mechanize on a site
> like www.tellme.com.
>
> With mechanize versions 0.6.5, 0.6.4, 0.6.3, I was able to use
> mechanize without any problems on www.tellme.com.
>
> However, when I upgraded to 0.6.6 or 0.6.7, mechanize simply ground
> to a halt after a while. I'm not sure where the problem lies, but
> after looking at the changelogs for 0.6.6 and 0.6.7, it seems that
> keep-alives might be the culprit here.
>
> In any case, I'm back to using 0.6.5 and the app is working fine.

Interesting. Can you tell if it was the script, or the server that
stopped working? Were there any errors? What was the memory usage
like?

Also, is it possible for you to write a short sample reproducing the
error?

Thanks!

--
Aaron Patterson
http://tenderlovemaking.com/

From mengkuan at gmail.com Fri Mar 30 02:06:38 2007
From: mengkuan at gmail.com (Meng Kuan)
Date: Fri, 30 Mar 2007 14:06:38 +0800
Subject: [Mechanize-users] keep-alive
In-Reply-To: <20070330045844.GA8862@eviladmins.org>
References: <20070330045844.GA8862@eviladmins.org>
Message-ID: <073834D4-7E4A-411F-A9DB-3F6EA4E7154E@gmail.com>

Hi Aaron,

On 30 Mar 2007, at 12:58 PM, Aaron Patterson wrote:
>
> Interesting.
> Can you tell if it was the script, or the server that
> stopped working? Were there any errors? What was the memory usage
> like?
>

No errors. I was actually running the mechanize process off of
backgroundrb. Memory utilization was reasonable. Running "top" shows
the process using up around 1.8% of memory (works out to about 22 MB on
my system). However, it does not exit properly so backgroundrb thinks
it is still running and the worker process hangs around indefinitely so
I had to kill it manually.

> Also, is it possible for you to write a short sample reproducing the
> error?

Attached spider.rb file contains the test Spider class I use for this
testing. Use it like this:

  s = Spider.new('http://www.tellme.com')
  s.process

The mech_test.log file shows where the process abruptly exited for
version 0.6.7. I tried the same script on another system with 0.6.4 and
it ran to completion. I'm wondering if the state of the tellme.com
server is to blame for this.

cheers,
mengkuan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: spider.rb
Type: text/x-ruby-script
Size: 2663 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/mechanize-users/attachments/20070330/1a57b167/attachment-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mech_test.log
Type: application/octet-stream
Size: 85166 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/mechanize-users/attachments/20070330/1a57b167/attachment-0001.obj