This isn't really a bug, since the passed string is incorrectly encoded. But since a lot of sites seem to send
subjects encoded this way, it would be worth a fix/workaround.
For example, this subject from a Facebook notification email:
Subject: =?UTF-8?Q?Stefan_Haubold_hat_dich_als_FreundIn_auf_Facebook_hinzugef=C3?=
=?UTF-8?Q?=BCgt_...?=
The problem is that Facebook's encoder isn't multibyte safe. In UTF-8 the German umlaut "ü" in the word
"hinzugefügt" is the two bytes C3 BC. Facebook encodes the subject as encoded-words with quoted-printable ("Q") encoding.
According to the encoded-word RFC (RFC 2047), an encoded-word may not be longer than 75 characters, including 'charset', 'encoding',
'encoded-text', and delimiters. Facebook follows this rule but splits in the middle of the multibyte character, so
each encoded-word on its own contains an incomplete byte sequence.
When TMail tries to parse this subject, and passes the string to iconv, an exception is raised:
Iconv::InvalidCharacter: "\303"
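The invalid byte can be reproduced without TMail or Iconv. The following is a minimal sketch in plain Ruby: `q_decode` is a hypothetical helper written for this illustration (it is not part of TMail), and it assumes the encoded-text of the two parts has already been extracted from the header.

```ruby
# Hypothetical helper: decode the encoded-text of a "Q" encoded-word into raw bytes.
# In Q encoding "_" stands for a space; the rest is quoted-printable.
def q_decode(text)
  text.tr("_", " ").unpack1("M")
end

part1 = "Stefan_Haubold_hat_dich_als_FreundIn_auf_Facebook_hinzugef=C3"
part2 = "=BCgt_..."

# Decoding each encoded-word on its own leaves half of the "ü" in each part.
first  = q_decode(part1).force_encoding("UTF-8")
second = q_decode(part2).force_encoding("UTF-8")

# Concatenating the raw bytes before interpreting them as UTF-8 restores the character.
joined = (q_decode(part1) + q_decode(part2)).force_encoding("UTF-8")

first_valid  = first.valid_encoding?   # false: ends with the lone lead byte 0xC3 ("\303")
second_valid = second.valid_encoding?  # false: starts with the lone continuation byte 0xBC
joined_valid = joined.valid_encoding?  # true
```

The trailing byte of the first part is 0xC3, which is the "\303" reported in the exception.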
We fixed this on our side like this:
begin
  subject = TMail::Unquoter.unquote_and_convert_to(envelope.subject, 'utf-8')
rescue Iconv::InvalidCharacter
  subject = TMail::Unquoter.unquote_and_convert_to(
    envelope.subject.gsub(/\?=\s?=\?utf\-8\?q\?/i, ""), 'utf-8')
end
This simply makes one large encoded-word out of all parts and calls unquote_and_convert_to again. The fix may need some more
work for other multibyte encodings (utf-16 ?). But it works for us so far.
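One possible direction for a charset-independent variant is sketched below. It is only an illustration under stated assumptions, not a tested patch for TMail: `merge_adjacent_q_words` and `ADJACENT_Q_WORDS` are hypothetical names, the code handles only "Q" encoded-words, and it merges two neighbours only when they declare the same charset.

```ruby
# Matches a Q encoded-word (without its closing "?=") followed by whitespace and
# the opening of another Q encoded-word that declares the same charset.
ADJACENT_Q_WORDS = /(=\?([^?\s]+)\?q\?[^?\s]*)\?=\s+=\?\2\?q\?/i

# Hypothetical helper: join runs of adjacent Q encoded-words into one encoded-word,
# so that a multibyte character split across two words is decoded as a whole.
def merge_adjacent_q_words(header)
  merged = header.dup
  # sub! returns nil once no adjacent pair is left, which ends the loop.
  nil while merged.sub!(ADJACENT_Q_WORDS, '\1')
  merged
end

subject = "=?UTF-8?Q?Stefan_Haubold_hat_dich_als_FreundIn_auf_Facebook_hinzugef=C3?=\r\n" \
          " =?UTF-8?Q?=BCgt_...?="
merged_subject = merge_adjacent_q_words(subject)
```

The merged result can be longer than the 75 characters allowed for an encoded-word, so it is only suitable as input to a lenient decoder. Base64 ("B") encoded-words are deliberately left alone here, because their padding means the encoded text cannot simply be concatenated.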
I'll attach a diff with a test for this problem and a fixture for an email with this encoding.