[#23954] HTML Entities in attrs are parsed wrong

Date: 2009-02-16 23:09
Priority: 3
Submitted By: Nobody
Assigned To: Nobody (None)
Category: None
State: Open
Summary: HTML Entities in attrs are parsed wrong

Detailed description

I'm using libxml on OS X with rfeedparser 0.9.951. Here's a test case:

In the attached file, the first <item> has
      <media:content fileSize="73692623" type="video/mp4" medium="video"
        width="640" height="360"
        url="http://s3.amazonaws.com/mdialogueproduction/18852/13049/Total_Health_Show_-_Part_3.mp4?x-fid=13049&amp;x-chid=11209&amp;x-cid=24122"
        duration="73692623">

Now, in rfeedparser:

>> require 'rfeedparser'
Could not load expat; trying libxml.
=> true
>> rss = rfp('media.xhtml')
=> [snipped]
>> rss.items[0].enclosures[0].href
=> "http://s3.amazonaws.com/mdialogueproduction/18852/13049/Total_Health_Show_-_Part_3.mp4?x-fid=13049&#38;x-chid=11209&#38;x-cid=24122"


Notice that &amp; became &#38;. This makes URI::parse choke, so the resulting href is unusable as a URL. The first place
this encoding shows up is in LibXML::StrictFeedParser::StrictFeedParserHandler::on_start_element, so it seems that
libxml is responsible. However, libxml may be behaving correctly from its own perspective; I don't know enough about
the W3C standards to say.
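
For anyone hitting the same thing, here is a possible workaround sketch, not a fix for the underlying parser behavior. In XML, &amp; and &#38; both denote a literal ampersand, so decoding whatever references are left in the href should give back the intended URL. The helper name decode_href is made up for this example and is not part of rfeedparser; it only wraps CGI.unescapeHTML from Ruby's standard library.

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper (not part of rfeedparser): decode entity and numeric
# character references that were left undecoded in an attribute value.
def decode_href(href)
  CGI.unescapeHTML(href)
end

# The href as rfeedparser returned it in the session above.
raw = "http://s3.amazonaws.com/mdialogueproduction/18852/13049/" \
      "Total_Health_Show_-_Part_3.mp4?x-fid=13049&#38;x-chid=11209&#38;x-cid=24122"

# After decoding, the string parses as an ordinary URL.
uri = URI.parse(decode_href(raw))
puts uri.query
```

One caveat: if a later version hands back an already-decoded href, the plain & characters pass through unchanged, but a URL that legitimately contains the literal text "&amp;" would be altered, so this should be applied only where the value is known to be undecoded.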


Followup

No Followups Have Been Posted

Attached Files:

Name Description Download
No Files Currently Attached

Changes:

No Changes Have Been Made to This Item