From martin at malditainternet.com Tue Aug 5 13:45:15 2008
From: martin at malditainternet.com (Martin Sarsale)
Date: Tue, 5 Aug 2008 14:45:15 -0300
Subject: [Mechanize-users] "boundaries" not set when reposting a form with authentication
Message-ID:

When mechanize posts a form to a resource needing authentication, the
first request is done without sending the credentials. When the server
answers a 401, the uri.host is added to @auth_hash and the fetch_page
method is called again. This way, when the request is being prepared
it includes the credentials.

The problem is that in this 2nd request, the "boundary" "attribute" in
the content-type header is not set, because when the request is
recreated (using fetch_request) for passing to fetch_page it doesn't
include the 'special' headers.

More specifically:

    return fetch_page( uri,
                       fetch_request(uri, request.method.downcase.to_sym),
                       cur_page,
                       request_data
                     )

Trying to reuse the original request ("request") fails because the
@body instance attr is already set; but if it's cleared first, the
request can be reused:

    request.body = nil
    return fetch_page( uri,
                       request,
                       cur_page,
                       request_data
                     )

This fixes my problem (it's just a test script), but I'm not sure if
it breaks something else. So please take this as a bug report and not
a patch :)

thanks

--
Martin Sarsale - martin at malditainternet.com

From aaron.patterson at gmail.com Tue Aug 5 13:56:24 2008
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 5 Aug 2008 10:56:24 -0700
Subject: [Mechanize-users] "boundaries" not set when reposting a form with authentication
In-Reply-To:
References:
Message-ID: <6959e1680808051056o6bdbfbbcx797caa88859dd8d0@mail.gmail.com>

On Tue, Aug 5, 2008 at 10:45 AM, Martin Sarsale wrote:
> When mechanize posts a form to a resource needing authentication, the
> first request is done without sending the credentials. When the server
> answers a 401, the uri.host is added to @auth_hash and the fetch_page
> method is called again.
> This way, when the request is being prepared
> it includes the credentials.
>
> The problem is that in this 2nd request, the "boundary" "attribute" in
> the content-type header is not set, because when the request is
> recreated (using fetch_request) for passing to fetch_page it doesn't
> include the 'special' headers.
>
> More specifically:
>
>     return fetch_page( uri,
>                        fetch_request(uri, request.method.downcase.to_sym),
>                        cur_page,
>                        request_data
>                      )
>
> Trying to reuse the original request ("request") fails because the
> @body instance attr is already set; but if it's cleared first, the
> request can be reused:
>
>     request.body = nil
>     return fetch_page( uri,
>                        request,
>                        cur_page,
>                        request_data
>                      )
>
> This fixes my problem (it's just a test script), but I'm not sure if
> it breaks something else. So please take this as a bug report and not
> a patch :)

Thanks Martin, I'll look into this.

--
Aaron Patterson
http://tenderlovemaking.com/

From johan.behe.lind at gmail.com Sun Aug 17 03:40:48 2008
From: johan.behe.lind at gmail.com (Johan Lind)
Date: Sun, 17 Aug 2008 09:40:48 +0200
Subject: [Mechanize-users] Convert data to utf-8
Message-ID:

I cheated the whole system, and just monkey patched mechanize to
return everything scraped as UTF-8, but that might be frowned upon:

    require 'iconv'

    module UTF8Mechanize
      # Assumes every fetched body is ISO-8859-1.
      @@converter = Iconv.new("UTF-8", "ISO-8859-1")

      def utf8_value
        @@converter.iconv(iso88591_value)
      end
    end

    class WWW::Mechanize::File
      include UTF8Mechanize
      # Keep the original body reachable, then make body return UTF-8.
      alias_method :iso88591_value, :body
      alias_method :body, :utf8_value
    end

/Johan

On Thu, Jul 17 2008 at 5:42 AM, Christophe wrote:
> Hello, I'm trying to find a solution to convert everything returned by
> mechanize to utf-8, no matter if the original page is utf-8 or iso and I
> really don't know where to start from...
>
> agent = WWW::Mechanize.new { |a| a.log =
> Logger.new(File::join(RAILS_ROOT, "log/mechanize.log")) }
> one_page = agent.get("www.google.fr")
>
> My first problem is that one_page encoding should be utf-8 (as stated by
> firefox page's properties), instead one_page.content_type is "text/html;
> charset=ISO-8859-1" and displaying text content gives wrong accent
> conversion.
> Second problem, when scraping data from a REAL ISO-8859-1 website, how
> should I convert it to utf-8?
>
> Mechanize 0.7.6, ruby 1.8.5, CentOS with utf-8 console
>
> Thanks
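
The monkey patch above treats every body as ISO-8859-1, so a page that is
already UTF-8 would be converted twice. A minimal sketch of the other
approach asked about in the thread, choosing the source charset from the
Content-Type value, might look like the following. It is not part of
Mechanize: body_to_utf8 and its fallback parameter are hypothetical names,
and it relies on String#encode and String#scrub, which need Ruby 2.1 or
later and are not available on the ruby 1.8.5 mentioned above (where Iconv
was the usual tool). It also simply trusts whatever charset the header
declares.

```ruby
# Hypothetical helper (not a Mechanize API): convert a fetched body to UTF-8
# using the charset named in a Content-Type header value.
def body_to_utf8(body, content_type, fallback = "ISO-8859-1")
  # Pull the charset out of e.g. "text/html; charset=ISO-8859-1";
  # use the fallback when the header names none.
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1] || fallback

  begin
    source = Encoding.find(charset)
  rescue ArgumentError
    # Unknown charset name in the header: fall back as well.
    source = Encoding.find(fallback)
  end

  # Tag the raw bytes with the source encoding (on a copy, so the caller's
  # string is left alone).
  text = body.dup.force_encoding(source)

  # Already UTF-8: nothing to transcode, only replace invalid bytes.
  return text.scrub("?") if source == Encoding::UTF_8

  # Otherwise transcode; bytes that cannot be converted become "?".
  text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end
```

With the objects from the thread, this would presumably be called as
body_to_utf8(one_page.body, one_page.content_type).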