[libxml-devel] [ libxml-Bugs-22956 ] Libxml HTML parser fails on very simple html pages

noreply at rubyforge.org
Sun Nov 23 20:40:37 EST 2008


Bugs item #22956, was opened at 2008-11-23 17:54
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=22956&group_id=494

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 3
Submitted By: Pavel Valodzka (valodzka)
>Assigned to: Charlie Savage (cfis)
Summary: Libxml HTML parser fails on very simple html pages

Initial Comment:
Please remove the check "htmlParseDocument(ctxt) == -1", because it makes the HTML parser impossible to use: it raises an exception on every page, for example on google.com:

Error: Tag nobr invalid at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: Tag nobr invalid at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
LibXML::XML::Error: Error: htmlParseEntityRef: expecting ';' at :3.

htmlParseDocument(ctxt) returns -1 very often; that does not mean the document cannot be used.
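
The policy requested above can be sketched in plain Ruby. This is only a model of the idea, not the libxml-ruby code and not the change made in trunk: ParseOutcome, UnusableDocumentError and document_from are hypothetical names invented for illustration, and the sketch assumes the parser hands back a status in the style of htmlParseDocument (0, or -1 on error) together with whatever tree it managed to build.

```ruby
# Hypothetical stand-in for what a C-level HTML parse hands back.
# status:   return value in the style of htmlParseDocument (0, or -1 on error)
# document: the tree the parser built, or nil if it built nothing
# errors:   messages collected while parsing
ParseOutcome = Struct.new(:status, :document, :errors)

# Hypothetical error class, raised only when there is no tree to work with.
class UnusableDocumentError < StandardError; end

# Return the document whenever one exists, even if status is -1.
def document_from(outcome)
  if outcome.document.nil?
    raise UnusableDocumentError, outcome.errors.last || "no document produced"
  end
  outcome.document
end

# A page with recoverable markup errors still yields a document.
recovered = ParseOutcome.new(-1, "<html>...</html>", ["Tag nobr invalid"])
puts document_from(recovered)
```

With this shape, the collected error messages stay available on the outcome for callers who want to inspect them, while an exception is reserved for the case where nothing usable was produced.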


----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-11-23 18:40

Message:
Yeah, that code has been removed in trunk.

----------------------------------------------------------------------


