I'm using libxml on OS X with rfeedparser 0.9.951. Here's a test case:
In the attached file, the first <item> has
<media:content fileSize="73692623" type="video/mp4" medium="video"
width="640" height="360"
url="http://s3.amazonaws.com/mdialogueproduction/18852/13049/Total_Health_Show_-_Part_3.mp4?x-fid=13049&x-chid=11209&x-cid=24122" duration="73692623">
Now, in rfeedparser:
>> require 'rfeedparser'
Could not load expat; trying libxml.
=> true
>> rss = rfp('media.xhtml')
=> [snipped]
>> rss.items[0].enclosures[0].href
=> "http://s3.amazonaws.com/mdialogueproduction/18852/13049/Total_Health_Show_-_Part_3.mp4?x-fid=13049&x-chid=11209&x-cid=24122"
Notice that the & in the URL comes back entity-encoded instead of as a literal &. This makes URI::parse choke, so the result is clearly wrong. The encoding first shows up in LibXML::StrictFeedParser::StrictFeedParserHandler::on_start_element, so libxml appears to be the culprit.
That said, libxml may be behaving correctly from its own perspective; I don't know enough about the W3C standards to say.
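A possible stopgap on the caller's side, as a minimal sketch only: decode any leftover character entities in the href before passing it to URI.parse. The helper name clean_href is hypothetical (it is not part of rfeedparser), and the escaped form used in the sample string is an assumption about what the handler returns. CGI.unescapeHTML is from Ruby's standard library and handles both named and numeric ampersand entities.

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper, not part of rfeedparser: decode XML/HTML character
# entities that were left in an attribute value.
def clean_href(raw)
  CGI.unescapeHTML(raw)
end

# Assumed example of an enclosure href whose ampersands arrive still escaped.
raw = "http://s3.amazonaws.com/mdialogueproduction/18852/13049/" \
      "Total_Health_Show_-_Part_3.mp4?x-fid=13049&amp;x-chid=11209&amp;x-cid=24122"

uri = URI.parse(clean_href(raw))
puts uri.query
```

This only hides the symptom; if the entities are introduced inside the libxml handler, that is still where the real fix belongs.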