If I set FeedTools.configurations[:sanitization_enabled] = false, every FeedEntry I get back has broken UTF-8 characters
in #content.
The fix is simple. Below is the monkey patch I applied to my project, with its comments left in.
The code starts at line 41 of lib/feed_tools/helpers/html_helper.rb:
if FeedTools.configurations[:sanitization_enabled]
  fragments = HTML5::HTMLParser.parse_fragment(
    html, :tokenizer => HTML5::HTMLSanitizer, :encoding => 'UTF-8')
else
  ############################################################################
  # MONKEY PATCH by railsmonk (orig file: lib/feed_tools/helpers/html_helper.rb)
  #
  # When FeedTools.configurations[:sanitization_enabled] is set to false, the
  # previous version of this line did not pass the :encoding => 'UTF-8'
  # argument, so feeds came back with broken UTF-8 characters. By default
  # FeedTools.configurations[:sanitization_enabled] is true, so this bug
  # stayed hidden.
  #
  # The rest of this method is unchanged; only the one line below was changed.
  #
  # I'll file a bug report for this, but I don't have high hopes of it being
  # acted upon, so it's faster to just monkey patch it.
  #
  # This patch applies to FeedTools version 0.2.29.
  ############################################################################
  fragments = HTML5::HTMLParser.parse_fragment(html, :encoding => 'UTF-8')
  ############################################################################
  # END OF MONKEY PATCH
  ############################################################################
end
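For anyone wondering what "broken" looks like here, this is a small standalone sketch of the usual failure mode: UTF-8 bytes read as if they were a single-byte encoding. It does not call FeedTools or the HTML5 parser. That the parser falls back to ISO-8859-1 when :encoding is omitted is my assumption (the symptom matches, but I have not confirmed the default), and both helper names are hypothetical. It needs Ruby 1.9 or newer for the String encoding methods.

```ruby
# Hypothetical illustration only; none of this is FeedTools code.

# Reinterpret the UTF-8 bytes of a string as ISO-8859-1 (an assumed fallback
# encoding), then transcode to UTF-8. Each multi-byte character turns into
# two or more unrelated characters.
def misdecode_as_latin1(utf8_text)
  utf8_text.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Reverse the damage: go back to the raw bytes and label them as UTF-8 again.
def repair_latin1_mojibake(garbled)
  garbled.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
end

original = "caf\u00E9"                      # "café", the e-acute is 2 bytes in UTF-8
garbled  = misdecode_as_latin1(original)    # 5 characters instead of 4
repaired = repair_latin1_mojibake(garbled)  # same text as original

puts garbled
puts repaired
```

Plain ASCII text survives the round trip unchanged, which would explain why the problem only shows up on feeds with accented or non-Latin content.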