
[#22173] With feed sanitization disabled, UTF-8 gets broken.

Date:
2008-09-26 08:03
Priority:
3
Submitted By:
Dima Sabanin (sdmitry)
Assigned To:
Nobody (None)
Category:
Parsing
State:
Open

Detailed description
If I set FeedTools.configurations[:sanitization_enabled] = false, then all FeedEntries I get have broken UTF-8
characters in #content.

The fix is simple. Below is the monkey patch I applied to my project, with its comments left in.

This code starts at line 41 of lib/feed_tools/helpers/html_helper.rb:

      if FeedTools.configurations[:sanitization_enabled]
        fragments = HTML5::HTMLParser.parse_fragment(
          html, :tokenizer => HTML5::HTMLSanitizer, :encoding => 'UTF-8')
      else
###########################################################################
# MONKEY PATCH by railsmonk (orig file: lib/feed_tools/helpers/html_helper.rb)
#
# When FeedTools.configurations[:sanitization_enabled] is set to false, the previous
# version of the line below did not pass the :encoding => 'UTF-8' argument, so feeds
# came back with broken UTF-8 characters. Sanitization is enabled by default, so this
# bug stayed hidden.
#
# The rest of this method is unchanged; only the one line below was changed.
#
# I'll file a bug report for this, but I don't have high hopes of it being acted upon,
# so it's faster to just monkey patch it.
#
# This patch applies to FeedTools 0.2.29.
############################################################################
        fragments = HTML5::HTMLParser.parse_fragment(html, :encoding => 'UTF-8')
############################################################################
# END OF MONKEY PATCH
############################################################################
      end
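
For anyone wondering what "broken" looks like here, the following is a minimal sketch of the usual failure mode, not code from FeedTools or html5lib. It assumes the parser falls back to a single-byte encoding such as ISO-8859-1 when no :encoding is given; the report does not say which encoding is actually guessed. The helper names misread_as_latin1 and read_as_utf8 are hypothetical, and the snippet uses the String encoding API from Ruby 1.9 and later.

```ruby
# Illustration only: plain Ruby strings, no FeedTools or html5lib calls.

# Reinterpret raw bytes as ISO-8859-1 and convert them to UTF-8.
# This mimics a decoder that wrongly guesses a single-byte encoding.
def misread_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Declare that the raw bytes are already UTF-8, which corresponds to
# telling the parser the encoding up front.
def read_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

# "café" as raw bytes; in UTF-8 the é occupies two bytes (0xC3 0xA9).
raw = "caf\u00E9".encode(Encoding::UTF_8).b

puts misread_as_latin1(raw)  # each byte of é turns into its own character
puts read_as_utf8(raw)       # é survives as a single character
```

With the wrong guess, every multi-byte character is split into several unrelated characters, which matches the symptom seen in #content. Passing the encoding explicitly removes the guess.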
