Message 65630 - Python tracker


Author	cschnee
Recipients	cschnee
Date	2008-04-19.12:41:31
Message-id	<1208608894.4.0.0735061784066.issue2658@psf.upfronthosting.co.za>

Content
email.Header.decode_header() does not correctly handle multiline
(folded) header lines.
Revision 54371 of header.py (1) changed the behaviour: multiline headers
were previously parsed correctly, but 54371 introduced a new regex part
that treats such headers as invalid, so they are no longer parsed as
expected.
Take the following header line (it doesn't matter whether it is parsed
from a mail or read from a string), which IMHO is a valid RFC 2047 header
line:

from email.Header import decode_header
decode_header('=?windows-1252?Q?=22M=FCller_T=22?=\r\n <T.Mueller@xxx.com>')

This results in:
header.py (54371):
[('=?windows-1252?Q?=22M=FCller_T=22?=\r\n <T.Mueller@xxx.com>', None)]

whereas with header.py (54370):
[('"M\xfcller T"', 'windows-1252'), (' <T.Mueller@xxx.com>', None)]

Actually, both results seem wrong, but the 54370 result looks saner
(IMO the leading space should be removed).
Once the CRLF sequence is removed from the header, it works fine and
everything looks as expected:
>>> decode_header('=?windows-1252?Q?=22M=FCller_T=22?= <T.Mueller@xxx.com>')
[('"M\xfcller T"', 'windows-1252'), ('<T.Mueller@xxx.com>', None)]
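
Until the regex is fixed, a possible workaround is to unfold the header before decoding it. The following is only a minimal sketch, not part of the email package: the helper name unfold() and its pattern are hypothetical, and it assumes that a fold is a line break immediately followed by a space or tab (as in RFC 2822 unfolding). The import uses the lowercase email.header spelling; on older releases the module is email.Header, as above.

```python
import re
from email.header import decode_header  # email.Header on older Python 2 releases

# Hypothetical helper: a fold is a CRLF (or bare LF) directly followed by
# a space or tab. Only the line break is removed; the whitespace is kept.
_FOLD = re.compile(r'\r?\n(?=[ \t])')

def unfold(value):
    """Return value with folding line breaks removed."""
    return _FOLD.sub('', value)

raw = '=?windows-1252?Q?=22M=FCller_T=22?=\r\n <T.Mueller@xxx.com>'
unfolded = unfold(raw)

# The unfolded string is the single-line header shown above, so
# decode_header() should see the encoded word as usual.
parts = decode_header(unfolded)
print(parts)
```

Whether the whitespace between the encoded word and the address ends up in the second chunk still depends on the header.py revision in use, so this only sidesteps the CRLF problem.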

This problem might or might not be related to 
- issue 1372770 
- issue 1467619

(1) http://svn.python.org/view?rev=54371&view=rev
