[libxml-devel] [ libxml-Bugs-22956 ] Libxml HTML parser fails on very simple html pages

noreply at rubyforge.org noreply at rubyforge.org
Sun Nov 23 19:54:40 EST 2008


Bugs item #22956, was opened at 2008-11-24 02:54
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=22956&group_id=494

Category: None
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Pavel Valodzka (valodzka)
Assigned to: Nobody (None)
Summary: Libxml HTML parser fails on very simple html pages

Initial Comment:
Please remove the check "htmlParseDocument(ctxt) == -1". It makes the HTML parser impossible to use, because it raises an exception on every page. For example, for google.com:

Error: Tag nobr invalid at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: Tag nobr invalid at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
LibXML::XML::Error: Error: htmlParseEntityRef: expecting ';' at :3.

htmlParseDocument(ctxt) returns -1 very often; that does not mean the document cannot be used.
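
To illustrate the requested behaviour, here is a minimal plain-Ruby sketch of the policy, not the actual libxml-ruby binding code. ParseResult, UnusableDocumentError and accept_parse are hypothetical names introduced only for this example. The sketch assumes the binding can see both the parser's return code and whatever document tree the parser managed to build.

```ruby
# Hypothetical model of what the binding gets back from the parser.
# None of these names exist in libxml-ruby; they are for illustration only.
ParseResult = Struct.new(:return_code, :document, :errors)

class UnusableDocumentError < StandardError; end

# Policy sketched from the report: a -1 return code alone is not fatal.
# Raise only when the parser produced no document tree at all.
def accept_parse(result)
  if result.document.nil?
    raise UnusableDocumentError, (result.errors.last || "parser produced no document")
  end
  result.document
end

# A page with recoverable markup errors: return code -1, but a tree exists.
recovered = ParseResult.new(-1, "<html>...</html>",
                            ["htmlParseEntityRef: expecting ';'"])
doc = accept_parse(recovered) # the recovered tree is returned, no exception
```

With this policy the errors listed above would still be collected, but they would no longer prevent the caller from getting the document.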

