From jduhamel at gmail.com Mon Apr 2 21:07:03 2007
From: jduhamel at gmail.com (Joe Duhamel)
Date: Tue, 3 Apr 2007 10:07:03 +0900
Subject: [Mechanize-users] Scraping and saving.
Message-ID: <352b6a910704021807w3d66408fgf70029edc3761a4c@mail.gmail.com>

Hi,

I'm working to scrape and save some ebooks. Mechanize has been
wonderful so far. The link I'm having trouble with is this one.
http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H

When I click that in the browser it saves it to a file named
H_1632.zip. How do I get that name from the page. I suspect to save
this to a file I would just do (if lnk is the lnk)
lnk.click() but how do I save it and how do I get the name that the
browser gets?

Thanks,
-Joe

From aaron at tenderlovemaking.com Tue Apr 3 01:45:53 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Mon, 2 Apr 2007 22:45:53 -0700
Subject: [Mechanize-users] Scraping and saving.
In-Reply-To: <352b6a910704021807w3d66408fgf70029edc3761a4c@mail.gmail.com>
References: <352b6a910704021807w3d66408fgf70029edc3761a4c@mail.gmail.com>
Message-ID: <20070403054553.GA19277@eviladmins.org>

Hi Joe!

On Tue, Apr 03, 2007 at 10:07:03AM +0900, Joe Duhamel wrote:
> Hi,
>
> I'm working to scrape and save some ebooks. Mechanize has been
> wonderful so far. The link I'm having trouble with is this one.
> http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H
>
> When I click that in the browser it saves it to a file named
> H_1632.zip. How do I get that name from the page. I suspect to save
> this to a file I would just do (if lnk is the lnk)
> lnk.click() but how do I save it and how do I get the name that the
> browser gets?

I have to admit I haven't tried requests like that with mechanize yet.
I tried it in Firefox using LiveHTTPHeaders to see how the browser gets
the filename, then just reproduced it with mechanize:

  agent = WWW::Mechanize.new
  page = agent.get(
    'http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H')
  page.save_as(
    page.header['content-disposition'].split(/=/)[1].gsub(/"/, '')
  )

I'll change mechanize to default the filename if it sees this header in
the future. Then you should be able to do 'page.save', and it will do
the right thing. Unfortunately you'll have to use the listed technique
for now....

Hope that helps!

--
Aaron Patterson
http://tenderlovemaking.com/

From jduhamel at gmail.com Tue Apr 3 02:08:06 2007
From: jduhamel at gmail.com (Joe Duhamel)
Date: Tue, 3 Apr 2007 15:08:06 +0900
Subject: [Mechanize-users] Scraping and saving.
In-Reply-To: <20070403054553.GA19277@eviladmins.org>
References: <352b6a910704021807w3d66408fgf70029edc3761a4c@mail.gmail.com>
	<20070403054553.GA19277@eviladmins.org>
Message-ID: <352b6a910704022308l540f6522h5ec59c9086e62dcb@mail.gmail.com>

Alan,

Thanks for the quick reply. Other than your web site getting me in
trouble with the Banks's Web Proxy Police. This has rocked.

First thanks for the headsup on LiveHTTPHeaders and second. yes that was
exactly what I was looking to do.

-Joe

On 4/3/07, Aaron Patterson wrote:
> Hi Joe!
>
> On Tue, Apr 03, 2007 at 10:07:03AM +0900, Joe Duhamel wrote:
> > Hi,
> >
> > I'm working to scrape and save some ebooks. Mechanize has been
> > wonderful so far. The link I'm having trouble with is this one.
> > http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H
> >
> > When I click that in the browser it saves it to a file named
> > H_1632.zip. How do I get that name from the page. I suspect to save
> > this to a file I would just do (if lnk is the lnk)
> > lnk.click() but how do I save it and how do I get the name that the
> > browser gets?
>
> I have to admit I haven't tried requests like that with mechanize yet.
> I tried it in Firefox using LiveHTTPHeaders to see how the browser gets
> the filename, then just reproduced it with mechanize:
>
>   agent = WWW::Mechanize.new
>   page = agent.get(
>     'http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H')
>   page.save_as(
>     page.header['content-disposition'].split(/=/)[1].gsub(/"/, '')
>   )
>
> I'll change mechanize to default the filename if it sees this header in
> the future. Then you should be able to do 'page.save', and it will do
> the right thing. Unfortunately you'll have to use the listed technique
> for now....
>
> Hope that helps!
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
>

From aaron at tenderlovemaking.com Sun Apr 15 16:59:25 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Sun, 15 Apr 2007 13:59:25 -0700
Subject: [Mechanize-users] [ANN] mechanize 0.6.8 Released
Message-ID: <20070415205925.GA14338@eviladmins.org>

mechanize version 0.6.8 has been released!

  http://mechanize.rubyforge.org/

The Mechanize library is used for automating interaction with websites.
Mechanize automatically stores and sends cookies, follows redirects,
can follow links, and submit forms. Form fields can be populated and
submitted. Mechanize also keeps track of the sites that you have visited
as a history.

Changes:

= Mechanize CHANGELOG

== 0.6.8

* Keep alive can be shut off now with WWW::Mechanize#keep_alive
* Conditional requests can be shut off with
  WWW::Mechanize#conditional_requests
* Monkey patched Net::HTTP#keep_alive?
* [#9877] Moved last request time.
  Thanks Max Stepanov
* Added WWW::Mechanize::File#save
* Defaulting file name to URI or Content-Disposition
* Updating compatibility with hpricot
* Added more unit tests

  http://mechanize.rubyforge.org/

--
Aaron Patterson
http://tenderlovemaking.com/

From peter at rubyrailways.com Thu Apr 19 16:11:00 2007
From: peter at rubyrailways.com (Peter Szinek)
Date: Thu, 19 Apr 2007 22:11:00 +0200
Subject: [Mechanize-users] Do you have any idea what could be the problem
	with this script?
Message-ID: <4627CCD4.4080205@rubyrailways.com>

Hello all,

If I run this script, and observe the output, the results are not there
at all (try to do the same in the browser). Any suggestions?

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get("http://www.sbstransit.com.sg/iris3/bus_serviceopt.aspx")
search_form = page.forms.with.name("Form1").first
search_form.txtsvcno = "014"
search_form.txtbusstop = '92129'
search_results = agent.submit(search_form)
open('output.html','w') {|f| f.write search_results.body}

Thanks,
Peter

From mengkuan at gmail.com Thu Apr 19 23:34:52 2007
From: mengkuan at gmail.com (Meng Kuan)
Date: Fri, 20 Apr 2007 11:34:52 +0800
Subject: [Mechanize-users] Do you have any idea what could be the problem with this script?
In-Reply-To: <4627CCD4.4080205@rubyrailways.com>
References: <4627CCD4.4080205@rubyrailways.com>
Message-ID: <78CD7BDE-CFDC-46A5-8B80-1593C9BE6CF1@gmail.com>

Hi Peter,

Replace the line

  search_results = agent.submit(search_form)

with this

  search_results = search_form.submit(search_form.buttons.first)

cheers,
mengkuan

On 20 Apr 2007, at 4:11 AM, Peter Szinek wrote:
> require 'rubygems'
> require 'mechanize'
>
> agent = WWW::Mechanize.new
> agent.user_agent_alias = 'Mac Safari'
> page =
> agent.get("http://www.sbstransit.com.sg/iris3/bus_serviceopt.aspx")
> search_form = page.forms.with.name("Form1").first
> search_form.txtsvcno = "014"
> search_form.txtbusstop = '92129'
> search_results = agent.submit(search_form)
> open('output.html','w') {|f| f.write search_results.body}

From peter at rubyrailways.com Fri Apr 20 04:05:09 2007
From: peter at rubyrailways.com (Peter Szinek)
Date: Fri, 20 Apr 2007 10:05:09 +0200
Subject: [Mechanize-users] Running script does not return the correct page
Message-ID: <46287435.60604@rubyrailways.com>

Hello all,

I have tried to post this yesterday, but noticed I was actually not
subscribed yet... Well, here we go again:

If I run this script, and observe the output, the results are not there
at all (try to do the same in the browser). Any suggestions?
require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get("http://www.sbstransit.com.sg/iris3/bus_serviceopt.aspx")
search_form = page.forms.with.name("Form1").first
search_form.txtsvcno = "014"
search_form.txtbusstop = '92129'
search_results = agent.submit(search_form)
open('output.html','w') {|f| f.write search_results.body}

Thanks,
Peter

From mengkuan at gmail.com Fri Apr 20 06:36:01 2007
From: mengkuan at gmail.com (Meng Kuan)
Date: Fri, 20 Apr 2007 18:36:01 +0800
Subject: [Mechanize-users] Running script does not return the correct page
In-Reply-To: <46287435.60604@rubyrailways.com>
References: <46287435.60604@rubyrailways.com>
Message-ID: <064A3859-5ABC-4732-8E25-CD9F7BD5C755@gmail.com>

Hi Peter,

Replace the line

  search_results = agent.submit(search_form)

with this

  search_results = search_form.submit(search_form.buttons.first)

cheers,
mengkuan

On 20 Apr 2007, at 4:05 PM, Peter Szinek wrote:
> Hello all,
>
> I have tried to post this yesterday, but noticed I was actually not
> subscribed yet... Well, here we go again:
>
> If I run this script, and observe the output, the results are not
> there
> at all (try to do the same in the browser). Any suggestions?
>
> require 'rubygems'
> require 'mechanize'
>
> agent = WWW::Mechanize.new
> agent.user_agent_alias = 'Mac Safari'
> page =
> agent.get("http://www.sbstransit.com.sg/iris3/bus_serviceopt.aspx")
> search_form = page.forms.with.name("Form1").first
> search_form.txtsvcno = "014"
> search_form.txtbusstop = '92129'
> search_results = agent.submit(search_form)
> open('output.html','w') {|f| f.write search_results.body}
>
> Thanks,
> Peter
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users

From peter at rubyrailways.com Fri Apr 20 10:27:37 2007
From: peter at rubyrailways.com (Peter Szinek)
Date: Fri, 20 Apr 2007 16:27:37 +0200
Subject: [Mechanize-users] Running script does not return the correct page
In-Reply-To: <064A3859-5ABC-4732-8E25-CD9F7BD5C755@gmail.com>
References: <46287435.60604@rubyrailways.com>
	<064A3859-5ABC-4732-8E25-CD9F7BD5C755@gmail.com>
Message-ID: <4628CDD9.5070204@rubyrailways.com>

Meng Kuan wrote:
> Hi Peter,
>
> Replace the line
>
> search_results = agent.submit(search_form)
>
> with this
>
> search_results = search_form.submit(search_form.buttons.first)

Thanks, that did the trick!

Cheers,
Peter
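
A note on why the replacement in this thread likely works, offered as a
minimal sketch rather than a confirmed diagnosis. Calling
agent.submit(search_form) with no button argument sends only the form's
fields, while search_form.submit(search_form.buttons.first) also sends
the clicked button's name/value pair. The thread does not say why the
server needs that pair; a common reason on ASP.NET pages is that the
server uses it to decide which handler to run, and that is an assumption
here. The sketch uses only Ruby's standard library to show the
difference in the encoded request body. The field names txtsvcno and
txtbusstop come from the script above; the button name btnSearch, its
value, and the helper encode_form are hypothetical, and the real form's
hidden fields are left out.

```ruby
require 'uri'

# Field names and values taken from the script in the thread.
fields = [['txtsvcno', '014'], ['txtbusstop', '92129']]

# Hypothetical button name/value pair; the real page's button is not
# shown anywhere in the thread.
button = ['btnSearch', 'Search']

# Hypothetical helper: builds a form-encoded body from the fields and
# appends the button's pair only when a button is supplied.
def encode_form(fields, button = nil)
  pairs = fields.dup
  pairs << button if button
  URI.encode_www_form(pairs)
end

without_button = encode_form(fields)          # like agent.submit(form)
with_button    = encode_form(fields, button)  # like form.submit(button)

puts without_button
puts with_button
```

The two bodies differ only by the trailing button pair, which is the
piece a browser adds when a user actually clicks the button.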