<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:times new roman,new york,times,serif;font-size:12pt"><div>Christophe,<br><br>If you're doing this within Rails (which it appears you are), just call toutf8 on the string. The method comes from the Kconv module (Ruby's kconv standard library), which Rails appears to load by default.<br><br>Output from my script/console:<br><br>&gt;&gt; Kconv.toutf8 "string"<br>=&gt; "string"<br>&gt;&gt; "string".toutf8<br>=&gt; "string"<br><br>HTH,<br>Matt White<br></div><div style="font-family: times new roman,new york,times,serif; font-size: 12pt;"><br><div style="font-family: arial,helvetica,sans-serif; font-size: 10pt;">----- Original Message ----<br>From: Christophe &lt;anaema_ml@yahoo.fr&gt;<br>To: mechanize-users@rubyforge.org<br>Sent: Thursday, July 17, 2008 3:42:23 AM<br>Subject: [Mechanize-users] Convert data to utf-8<br><br>
Hello, I'm trying to find a way to convert everything returned by Mechanize to UTF-8, whether the original page is UTF-8 or ISO-8859-1, and I really don't know where to start...<br><br>agent = WWW::Mechanize.new { |a| a.log = Logger.new(File::join(RAILS_ROOT, "log/mechanize.log")) }<br>one_page = agent.get("<a target="_blank" href="http://www.google.fr">www.google.fr</a>")<br><br>My first problem is that the encoding of one_page should be UTF-8 (according to Firefox's page properties), but one_page.content_type is "text/html; charset=ISO-8859-1" and displaying the text content gives wrongly converted accents.<br>My second problem: when scraping data from a website that really is ISO-8859-1, how do I convert it to UTF-8?<br><br>Mechanize 0.7.6, Ruby 1.8.5, CentOS with a UTF-8 console<br><br>Thanks<br><br>_______________________________________________<br>Mechanize-users mailing list<br><a
ymailto="mailto:Mechanize-users@rubyforge.org" href="mailto:Mechanize-users@rubyforge.org">Mechanize-users@rubyforge.org</a><br><a href="http://rubyforge.org/mailman/listinfo/mechanize-users" target="_blank">http://rubyforge.org/mailman/listinfo/mechanize-users</a><br></div></div></div><br>
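<div>For the second question in the quoted message (a page that really is ISO-8859-1), here is a minimal sketch of an explicit conversion driven by the charset reported in content_type. It assumes Ruby 1.9 or later, where String#force_encoding and String#encode exist; the Ruby 1.8.5 mentioned above does not have them. The helper names charset_from and to_utf8 are hypothetical and are not part of Mechanize or Kconv.</div>
<pre>
```ruby
# Hypothetical helper, not part of Mechanize: pull the charset out of a
# Content-Type value such as "text/html; charset=ISO-8859-1".
# Falls back to the given default when no charset parameter is present.
def charset_from(content_type, default = "UTF-8")
  match = content_type.to_s.match(/charset=["']?([^\s;"']+)/i)
  match ? match[1] : default
end

# Hypothetical helper: label the raw bytes with the declared charset,
# then transcode them to UTF-8. Assumes Ruby 1.9 or later.
def to_utf8(body, content_type)
  source = charset_from(content_type)
  body.dup.force_encoding(source).encode("UTF-8", invalid: :replace, undef: :replace)
end

# The byte 0xE9 is an e with acute accent in ISO-8859-1.
latin1 = "caf\xE9".b
utf8 = to_utf8(latin1, "text/html; charset=ISO-8859-1")

puts utf8.encoding      # prints UTF-8
puts utf8.bytes.length  # prints 5 (the accented letter takes two bytes in UTF-8)
```
</pre>
<div>This only helps when the declared charset is right. In the first problem above the header and the actual bytes seem to disagree, so the declared value may need to be overridden by hand. On Ruby 1.8, Iconv.conv("UTF-8", "ISO-8859-1", body) from the iconv standard library is the rough counterpart of the encode call.</div>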
</body></html>