Message102781
I got bitten by this too. In addition to not decoding encoded words that have no whitespace after them, decode_header raises an exception if there is a valid encoded word later in the string and the first encoded word is followed by something that isn't a hex number:
>>> decode_header('aaa=?iso-8859-1?q?bbb?=xxx asdf =?iso-8859-1?q?jkl?=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/email/header.py", line 93, in decode_header
    dec = email.quoprimime.header_decode(encoded)
  File "/usr/lib/python2.5/email/quoprimime.py", line 336, in header_decode
    return re.sub(r'=\w{2}', _unquote_match, s)
  File "/usr/lib/python2.5/re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib/python2.5/email/quoprimime.py", line 324, in _unquote_match
    return unquote(s)
  File "/usr/lib/python2.5/email/quoprimime.py", line 106, in unquote
    return chr(int(s[1:3], 16))
ValueError: invalid literal for int() with base 16: 'xx'
I think it should join the encoded words with the surrounding text if there's no whitespace in between. That seems to be consistent with what the non-RFC-compliant MUAs out there mean when they send such things.
Reverting the change from Issue 1582282 doesn't seem to be a good idea, since it was introduced in response to problems with Mailman (see https://bugs.launchpad.net/mailman/+bug/266370). Instead of leaving "Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord" untouched, my patch converts it to [('Sm\xf6rg\xe5sbord', 'iso-8859-1')]. This shouldn't reintroduce the problem Mailman was having, while fixing ours.
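To illustrate the joining behaviour described above, here is a minimal Python 3 sketch. It is not the attached patch: the names ENCODED_WORD, decode_word and lenient_decode are hypothetical, it returns a plain str rather than the list of (string, charset) pairs that decode_header produces, and it does not drop the whitespace between two adjacent encoded words as RFC 2047 asks.

```python
import base64
import quopri
import re

# One RFC 2047 encoded word: =?charset?encoding?payload?=
# (hypothetical pattern for this sketch, not the one used in email.header)
ENCODED_WORD = re.compile(r'=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=')


def decode_word(charset, encoding, payload):
    """Decode the payload of a single encoded word to text."""
    if encoding.lower() == 'b':
        raw = base64.b64decode(payload)
    else:
        # header=True makes quopri treat '_' as a space, as the Q encoding requires.
        raw = quopri.decodestring(payload.encode('ascii'), header=True)
    return raw.decode(charset)


def lenient_decode(header):
    """Decode every encoded word in place, keeping adjacent text attached."""
    return ENCODED_WORD.sub(lambda m: decode_word(*m.groups()), header)


print(lenient_decode('Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord'))
print(lenient_decode('aaa=?iso-8859-1?q?bbb?=xxx asdf =?iso-8859-1?q?jkl?='))
```

Because each encoded word is matched and replaced on its own, text that directly follows a word (such as the "xxx" in the failing example) is never passed to the quoted-printable decoder, so the ValueError from the traceback cannot arise in this sketch.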
Date | User | Action | Args
2010-04-10 14:46:09 | leromarinvit | set | recipients: + leromarinvit, barry, jafo, ishimoto, tlynn, ggenellina, tkikuchi, tony_nelson, kael, r.david.murray
2010-04-10 14:46:08 | leromarinvit | set | messageid: <1270910768.58.0.922209662022.issue1079@psf.upfronthosting.co.za>
2010-04-10 14:46:07 | leromarinvit | link | issue1079 messages
2010-04-10 14:46:06 | leromarinvit | create |