Message 102781 - Python tracker


Author	leromarinvit
Recipients	barry, ggenellina, ishimoto, jafo, kael, leromarinvit, r.david.murray, tkikuchi, tlynn, tony_nelson
Date	2010-04-10.14:46:05
Message-id	<1270910768.58.0.922209662022.issue1079@psf.upfronthosting.co.za>

Content
I got bitten by this too. Besides not decoding encoded words that have no whitespace after them, decode_header raises a ValueError if a valid encoded word appears later in the string and the first encoded word is followed by something that isn't a hex number:

>>> decode_header('aaa=?iso-8859-1?q?bbb?=xxx asdf =?iso-8859-1?q?jkl?=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/email/header.py", line 93, in decode_header
    dec = email.quoprimime.header_decode(encoded)
  File "/usr/lib/python2.5/email/quoprimime.py", line 336, in header_decode
    return re.sub(r'=\w{2}', _unquote_match, s)
  File "/usr/lib/python2.5/re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib/python2.5/email/quoprimime.py", line 324, in _unquote_match
    return unquote(s)
  File "/usr/lib/python2.5/email/quoprimime.py", line 106, in unquote
    return chr(int(s[1:3], 16))
ValueError: invalid literal for int() with base 16: 'xx'
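
The last frames of the traceback can be reproduced in isolation. The following is a minimal sketch written for Python 3, not the library's actual code: `unquote` and the `=\w{2}` pattern are copied from the traceback, while `header_decode_loose` and `header_decode_strict` are hypothetical names introduced here for illustration. It shows that `\w` also matches non-hex characters such as `x`, so `int(..., 16)` fails, and that a hex-only pattern would leave such text alone. The hex-only pattern is an assumption for comparison, not the fix proposed in this message.

```python
import re


def unquote(s):
    # Same expression as in the traceback: "=AB" -> chr(0xAB)
    return chr(int(s[1:3], 16))


def header_decode_loose(s):
    # Pattern from the traceback: \w matches any word character,
    # so "=xx" is passed to unquote() even though "xx" is not hex.
    return re.sub(r'=\w{2}', lambda m: unquote(m.group(0)), s)


def header_decode_strict(s):
    # Hypothetical variant: only substitute genuine hex escapes.
    return re.sub(r'=[0-9A-Fa-f]{2}', lambda m: unquote(m.group(0)), s)


try:
    header_decode_loose('bbb?=xxx')
except ValueError as exc:
    print('loose pattern failed:', exc)

print(header_decode_strict('bbb?=xxx'))  # left unchanged
print(header_decode_strict('j=6Bl'))     # "=6B" is a valid escape for "k"
```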

I think it should join the encoded words with the surrounding text if there's no whitespace in between. That seems to be consistent with what the non-RFC-compliant MUAs out there mean when they send such things.

Reverting the change from Issue 1582282 doesn't seem to be a good idea, since it was introduced in response to problems with mailman (see https://bugs.launchpad.net/mailman/+bug/266370). Instead of leaving "Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord" as it is, my patch converts it to [('Sm\xf6rg\xe5sbord', 'iso-8859-1')]. This shouldn't reintroduce the problem mailman was having while fixing ours.
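
The joining behaviour described above can be sketched roughly as follows. This is not the attached patch: it is a simplified Python 3 illustration that returns a single str rather than a list of (string, charset) pairs, and `ECRE`, `decode_word`, and `decode_joined` are hypothetical names. It assumes every charset in the header is known to Python and does not implement the RFC 2047 rule about dropping whitespace between adjacent encoded words.

```python
import base64
import quopri
import re

# Encoded-word pattern with no whitespace requirement on either side
# (an assumption of this sketch, not the pattern used by the email package).
ECRE = re.compile(r'=\?([^?]+)\?([qQbB])\?([^?]*)\?=')


def decode_word(charset, encoding, payload):
    # Decode one encoded word's payload to text.
    if encoding.lower() == 'b':
        raw = base64.b64decode(payload)
    else:
        # header=True makes "_" decode to a space, as in header Q encoding
        raw = quopri.decodestring(payload.encode('ascii'), header=True)
    return raw.decode(charset)


def decode_joined(header):
    # Replace each encoded word in place, so it stays joined
    # to whatever text surrounds it.
    return ECRE.sub(lambda m: decode_word(*m.groups()), header)


print(decode_joined('Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord'))
print(decode_joined('aaa=?iso-8859-1?q?bbb?=xxx asdf =?iso-8859-1?q?jkl?='))
```

Substituting in place keeps the unencoded text and the decoded words in one string, which matches the 'Sm\xf6rg\xe5sbord' result quoted above.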
