Hi all,<br><br>I've been using mechanize for a while and it rocks. The docs are pretty clear, and so far I've been able to work everything out on my own.<br>However, I'm stuck on a weird problem in a script that downloads my contact list from Hotmail.
<br>I've used Firebug to check all the URLs, and tested them by hand while logged in via the browser.<br>In the script, too, everything works until the last 'agent.get_file' call, which fails with a weird error:<br>
<br>------ snip ------<br>$ ruby msn-scrap.rb <br>#<URI::HTTP:0xfdbc850b8 URL:<a href="http://by124w.bay124.mail.live.com/mail/TodayLight.aspx?&n=1573603203&gs=true">http://by124w.bay124.mail.live.com/mail/TodayLight.aspx?&n=1573603203&gs=true
</a>><br>"<a href="http://by124w.bay124.mail.live.com/mail/GetContacts.aspx">http://by124w.bay124.mail.live.com/mail/GetContacts.aspx</a>"<br>Err: unexpected end of file<br>Trace:<br>/usr/lib/ruby/1.8/mechanize.rb:372:in `read'
<br>/usr/lib/ruby/1.8/mechanize.rb:372:in `fetch_page'<br>/usr/lib/ruby/1.8/net/http.rb:1050:in `request'<br>/usr/lib/ruby/1.8/net/http.rb:2133:in `reading_body'<br>/usr/lib/ruby/1.8/net/http.rb:1049:in `request'
<br>/usr/lib/ruby/1.8/mechanize.rb:345:in `fetch_page'<br>/usr/lib/ruby/1.8/net/http.rb:543:in `start'<br>/usr/lib/ruby/1.8/mechanize.rb:339:in `fetch_page'<br>/usr/lib/ruby/1.8/mechanize.rb:139:in `get'<br>
/usr/lib/ruby/1.8/mechanize.rb:146:in `get_file'<br>msn-scrap.rb:32<br>----- snip ------<br><br>mech.log important part:<br><br>D, [2007-11-12T12:22:35.925521 #24540] DEBUG -- : request-header: referer => <a href="http://by124w.bay124.mail.live.com/mail/TodayLight.aspx?&n=1573603203&gs=true">
http://by124w.bay124.mail.live.com/mail/TodayLight.aspx?&n=1573603203&gs=true</a><br>D, [2007-11-12T12:22:36.589708 #24540] DEBUG -- : response-header: cache-control => private,max-age=86400<br>D, [2007-11-12T12:22:
36.589853 #24540] DEBUG -- : response-header: vary => Accept-Encoding<br>D, [2007-11-12T12:22:36.589934 #24540] DEBUG -- : response-header: connection => keep-alive<br>D, [2007-11-12T12:22:36.590012 #24540] DEBUG -- : response-header: expires => Wed, 01 Jan 1997 12:00:00 GMT, Wed, 01 Jan 1997 12:00:00 GMT
<br>D, [2007-11-12T12:22:36.590089 #24540] DEBUG -- : response-header: p3p => CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"<br>D, [2007-11-12T12:22:36.590166 #24540] DEBUG -- : response-header: date => Mon, 12 Nov 2007 14:28:34 GMT
<br>D, [2007-11-12T12:22:36.590241 #24540] DEBUG -- : response-header: xxn => W4<br>D, [2007-11-12T12:22:36.590344 #24540] DEBUG -- : response-header: content-type => text/csv<br>D, [2007-11-12T12:22:36.590430 #24540] DEBUG -- : response-header: msnserver => H: BAY124-W4 V:
12.0.1190.927 D: 2007-09-27T23:27:08<br>D, [2007-11-12T12:22:36.590509 #24540] DEBUG -- : response-header: content-encoding => gzip<br>D, [2007-11-12T12:22:36.590586 #24540] DEBUG -- : response-header: content-disposition => attachment; filename="
WLMContacts.csv"<br>D, [2007-11-12T12:22:36.590663 #24540] DEBUG -- : response-header: server => Microsoft-IIS/6.0<br>D, [2007-11-12T12:22:36.590738 #24540] DEBUG -- : response-header: content-length => 4285<br>
D, [2007-11-12T12:22:36.591732 #24540] DEBUG -- : gunzip body<br><br>I've tried some ugly hacks, such as altering headers and so on (BTW, how do I change request headers without inheriting from WWW::Mechanize?), but with no result.
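One guess that can be checked outside mechanize: the last log line is "gunzip body", so the error may come from the gunzip step rather than from the HTTP request. The sketch below uses only Ruby's standard Zlib and StringIO, not mechanize, and assumes (the log does not confirm this) that the body arrives shorter than a complete gzip stream. The helper names `gzip` and `try_gunzip` and the sample CSV are made up for illustration. On the Ruby versions I know of, the truncated case typically reports "unexpected end of file".

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: compress a string into an in-memory gzip stream.
def gzip(data)
  io = StringIO.new(String.new)   # empty binary buffer
  gz = Zlib::GzipWriter.new(io)
  gz.write(data)
  gz.finish                       # flush the gzip footer, leave io open
  io.string
end

# Hypothetical helper: try to gunzip a body.
# Returns [text, nil] on success or [nil, error message] on failure.
def try_gunzip(body)
  reader = Zlib::GzipReader.new(StringIO.new(body))
  [reader.read, nil]
rescue Zlib::Error, EOFError => e
  [nil, e.message]
end

# Made-up sample data standing in for the contacts file.
csv = (1..200).map { |i| "Contact #{i},user#{i}@example.com\n" }.join

full      = gzip(csv)
truncated = full.byteslice(0, full.bytesize / 2)  # simulate a short body

text, err = try_gunzip(full)
puts "complete stream:  #{err ? "error: #{err}" : "#{text.bytesize} bytes decoded"}"

text, err = try_gunzip(truncated)
puts "truncated stream: #{err ? "error: #{err}" : "#{text.bytesize} bytes decoded"}"
```

If the real response behaves like the truncated case, comparing the number of bytes actually read against the content-length header (4285 in the log above) would be the next thing to look at.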
<br><br>Am I doing something wrong? It seems to me that the server gzip-encodes the file (Firebug shows it too), but mechanize hits a weird error while trying to fetch it. Any ideas?<br><br>I wrote another contact scraper for Gmail and it worked wonders. There is a post of mine at
<a href="http://zenmachine.wordpress.com">http://zenmachine.wordpress.com</a> where I show how to use Firebug and mechanize to find the right URLs.<br><br><br><br>Best regards, and keep up the excellent work.<br><br><br>----
msn-scrap.rb----<br><br>#!/usr/bin/env ruby<br><br># download msn contacts<br><br>require 'rubygems'<br>require 'mechanize'<br>require 'logger'<br><br>begin<br> agent = WWW::Mechanize.new { |a|
a.log = Logger.new("mech.log") }<br> agent.user_agent_alias = "Windows IE 6"<br> <br><br> page = agent.get("<a href="https://login.live.com/login.srf">https://login.live.com/login.srf</a>
")<br> <br> <br><br> form = page.forms.name("f1").first<br> form.login = 'user'<br> form.passwd = 'pass'<br><br> page = agent.submit(form)<br><br> pageContact = agent.get
('<a href="http://g.live.com/1MBAMen-us/sc_mail">http://g.live.com/1MBAMen-us/sc_mail</a>')<br> p pageContact.uri<br><br> baseURL=pageContact.uri.host<br> <br> <br> contactURL='http://'+baseURL+'/mail/GetContacts.aspx'
<br> p contactURL<br> <br> page = agent.get_file(contactURL)<br><br> p page<br> <br> if (page.code == '200')<br> puts "saving contacts.csv"<br> page.save_as('contacts_msn.csv')
<br> else<br> puts "error downloading contacts"<br> end<br><br><br> <br>rescue<br> puts "Err: "+$!<br> puts "Trace:"<br> $@.each {|tl| <br> puts tl<br> }<br>
end<br><br clear="all"><br>-- <br>More cowbell, please !