[Mechanize-users] Nokogiri encoding bug
Alex Young
alex at blackkettle.org
Wed Jun 17 15:43:13 EDT 2009
Alex Young wrote:
> Hi there,
<snip>
> That being said, a quick fix would be to simply ignore the value
> that comes back from the parser. Since we've already got the encoding,
> what more can the parser tell us? I don't understand that bit yet.
Just a quick follow-up. The easiest patch to do this is:
diff --git a/lib/www/mechanize/page.rb b/lib/www/mechanize/page.rb
index 1f7d884..ac6909e 100644
--- a/lib/www/mechanize/page.rb
+++ b/lib/www/mechanize/page.rb
@@ -64,7 +64,7 @@ module WWW
       end
 
       def encoding
-        parser.respond_to?(:encoding) ? parser.encoding : nil
+        (parser.respond_to?(:encoding) ? parser.encoding : nil) || @encoding
       end
 
       def parser
That results in all the tests passing, except for the following:
  1) Failure:
test_another_mostly_broken_charset(TestPage) [./test/test_page.rb:32]:
<"UTF8"> expected but was
<nil>.

  2) Failure:
test_encoding_override_after_parser_was_initialized(TestPage) [./test/test_page.rb:58]:
<"ISO-8859-2"> expected but was
<"windows-1255">.

  3) Failure:
test_encoding_override_before_parser_initialized(TestPage) [./test/test_page.rb:47]:
<"ISO-8859-2"> expected but was
<"windows-1255">.

  4) Failure:
test_page_decoded_with_charset(TestPage) [./test/test_page.rb:99]:
<"EUC-JP"> expected but was
<nil>.

  5) Failure:
test_set_encoding(TestPage) [./test/test_page.rb:69]:
<"UTF-8"> expected but was
<nil>.

291 tests, 1502 assertions, 5 failures, 0 errors
There's clearly something screwy going on with libxml2's HTMLParser, but
I don't have much more than that yet.
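
To make the fallback order in the patch concrete, here is a minimal sketch that runs outside Mechanize. FakeParser and PageSketch are made-up stand-ins for illustration only, not Mechanize or Nokogiri classes; the only thing copied from the patch is the expression inside #encoding. It might explain failures 2 and 3 (the parser's value wins over whatever is stored in @encoding), but I haven't confirmed that this is what those tests actually hit.

```ruby
# Hypothetical stand-in for the real parser; only #encoding matters here.
FakeParser = Struct.new(:encoding)

# Hypothetical stand-in for the page class, reduced to the patched method.
class PageSketch
  attr_reader :parser

  def initialize(parser, encoding = nil)
    @parser = parser
    @encoding = encoding
  end

  # Same expression as in the patch: ask the parser first,
  # fall back to the stored value only when the parser gives nil.
  def encoding
    (parser.respond_to?(:encoding) ? parser.encoding : nil) || @encoding
  end
end

# Parser reports nothing, so the stored value is used.
p PageSketch.new(FakeParser.new(nil), "UTF-8").encoding                  # => "UTF-8"

# Parser reports a value, so it takes priority over the stored one.
p PageSketch.new(FakeParser.new("windows-1255"), "ISO-8859-2").encoding  # => "windows-1255"

# Neither side has a value, so the result is still nil.
p PageSketch.new(FakeParser.new(nil)).encoding                           # => nil

# Parser has no #encoding method at all, so the stored value is used.
p PageSketch.new(Object.new, "EUC-JP").encoding                          # => "EUC-JP"
```

If an explicit override is meant to win, the operands of || would have to be the other way round, but that is a different change from the patch above.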
Hope this helps.
--
Alex