[#24421] Iconv exception when parsing incorrectly encoded multibyte strings in TMail::Unquoter.unquote_and_convert

Date: 2009-03-09 15:11
Priority: 3
Submitted By: Stefan Haubold (soph)
Assigned To: Nobody (None)
Category: None
State: Open
Summary: Iconv exception when parsing incorrectly encoded multibyte strings in TMail::Unquoter.unquote_and_convert

Detailed description
This isn't strictly a bug in TMail, since the string passed in is incorrectly encoded. But since a lot of sites
seem to send subjects encoded this way, a fix or workaround would be worthwhile.

For example, this subject from a Facebook notification email:

Subject: =?UTF-8?Q?Stefan_Haubold_hat_dich_als_FreundIn_auf_Facebook_hinzugef=C3?= 
    =?UTF-8?Q?=BCgt_...?=

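The problem is that Facebook's encoder isn't multibyte-safe. In UTF-8 the German umlaut "ü" in the word
"hinzugefügt" is the two-byte sequence C3 BC. Facebook encodes the subject as encoded-words with quoted-printable
("Q") encoding. According to the encoded-word RFC (RFC 2047), an encoded-word may not be longer than 75 characters,
including 'charset', 'encoding', 'encoded-text', and delimiters. Facebook follows this rule but splits in the middle
of the multibyte character, so the first encoded-word ends with C3 and the second begins with BC. RFC 2047 also
requires each encoded-word to represent an integral number of characters, which is why the subject counts as
incorrectly encoded.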
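
The split can be reproduced at the byte level. The following is only an illustrative sketch in present-day Ruby (String encodings instead of Iconv, `unpack1` from Ruby 2.4 on); `q_decode` is a hypothetical helper, not part of TMail:

```ruby
# Hypothetical helper: decode the encoded-text of one Q-encoded word into raw bytes.
# In Q encoding "_" stands for a space; the rest is quoted-printable.
def q_decode(text)
  text.tr('_', ' ').unpack1('M')
end

first  = q_decode('Stefan_Haubold_hat_dich_als_FreundIn_auf_Facebook_hinzugef=C3')
second = q_decode('=BCgt_...')

# Each half on its own is not valid UTF-8: the "ü" (C3 BC) is cut in two.
first_valid  = first.dup.force_encoding('UTF-8').valid_encoding?   # false
second_valid = second.dup.force_encoding('UTF-8').valid_encoding?  # false

# Joining the raw bytes before interpreting them as UTF-8 restores the character.
joined = (first + second).force_encoding('UTF-8')
joined_valid = joined.valid_encoding?                               # true
```

Converting each encoded-word separately is what trips the charset conversion; the bytes themselves are intact once the two parts are concatenated.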

When TMail parses this subject and passes the string to iconv, an exception is raised:

Iconv::InvalidCharacter: "\303"

We fixed this on our side like this: 

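begin
  subject = TMail::Unquoter.unquote_and_convert_to(envelope.subject, 'utf-8')
rescue Iconv::InvalidCharacter
  # Merge adjacent UTF-8 Q-encoded words into one, so that a multibyte
  # character split across two encoded-words is decoded as a whole.
  # \s* (rather than \s?) also covers folded headers with several whitespace characters.
  joined = envelope.subject.gsub(/\?=\s*=\?utf-8\?q\?/i, '')
  subject = TMail::Unquoter.unquote_and_convert_to(joined, 'utf-8')
end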

This simply joins all parts into one large encoded-word and calls unquote_and_convert_to again. The fix may need
more work for other multibyte encodings (UTF-16?), but it works for us so far.
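
One possible generalization, independent of the charset, is to decode every encoded-word to raw bytes first, concatenate the bytes of adjacent encoded-words that share a charset, and only then convert. The sketch below is an assumption about how that could look in present-day Ruby (String encodings instead of Iconv); `ENCODED_WORD`, `decode_payload`, and `decode_subject` are hypothetical names and not TMail API:

```ruby
# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = /=\?([^?\s]+)\?([bq])\?([^?\s]*)\?=/i

# Decode the encoded-text of a single encoded-word into raw bytes.
def decode_payload(encoding, text)
  if encoding.downcase == 'q'
    text.tr('_', ' ').unpack1('M')   # Q: quoted-printable, "_" means space
  else
    text.unpack1('m')                # B: base64
  end
end

def decode_subject(raw, target = 'UTF-8')
  # Whitespace between two adjacent encoded-words is not part of the text.
  compact = raw.gsub(/\?=\s+(?==\?)/, '?=')

  # Tokenize into [charset, bytes] pairs; literal text gets a nil charset.
  tokens = []
  pos = 0
  compact.scan(ENCODED_WORD) do |charset, encoding, text|
    m = Regexp.last_match
    tokens << [nil, compact[pos...m.begin(0)]] if m.begin(0) > pos
    tokens << [charset.downcase, decode_payload(encoding, text)]
    pos = m.end(0)
  end
  tokens << [nil, compact[pos..-1]] if pos < compact.length

  # Join the bytes of adjacent encoded-words with the same charset,
  # then convert each run once, so split multibyte characters are whole again.
  tokens.chunk_while { |a, b| a[0] && a[0] == b[0] }.map do |group|
    charset = group[0][0]
    bytes = group.map(&:last).join
    charset ? bytes.force_encoding(charset).encode(target) : bytes
  end.join
end

subject = decode_subject(
  "=?UTF-8?Q?Stefan_Haubold_hat_dich_als_FreundIn_auf_Facebook_hinzugef=C3?=\r\n" \
  "    =?UTF-8?Q?=BCgt_...?="
)
```

The sketch raises an ArgumentError for charset names Ruby does not know and does not validate the joined bytes, so a real fix would still need error handling around the conversion.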

I'll attach a diff with a test for this problem and a fixture for an email with this encoding.


Followup

No Followups Have Been Posted

Attached Files:

Name | Description
tmail_patch_file2.diff | SVN diff with the test for this bug
raw_email_with_wrong_splitted_multibyte_encoded_word_subject |

Changes:

Field | Old Value | Date | By
File Added | 4411: raw_email_with_wrong_splitted_multibyte_encoded_word_subject | 2009-03-09 15:12 | soph
File Added | 4410: tmail_patch_file2.diff | 2009-03-09 15:11 | soph