Issue 37532: email.header.make_header() doesn't work if any `ascii` code is out of range(128)

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/81713

classification

Title:	email.header.make_header() doesn't work if any `ascii` code is out of range(128)
Type:	behavior	Stage:	resolved
Components:	email	Versions:	Python 3.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	aldwinaldwin, barry, maxking, r.david.murray, yunlee
Priority:	normal	Keywords:	patch

Created on 2019-07-09 21:20 by yunlee, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 14696	closed	aldwinaldwin, 2019-07-11 02:46

Messages (6)
msg347577	Author: Yun Li (yunlee)	Date: 2019-07-09 21:20
email.header.make_header() doesn't work if any `ascii` code is out of range(128)

For example:

>>> header = "Your booking at Voyager Int'l Hostel,=?UTF-8?B?IFBhbmFtw6EgQ2l0eQ==?=, Panamá- Casco Antiguo"
>>> decode_header(header)
[(b"Your booking at Voyager Int'l Hostel,", None), (b' Panam\xc3\xa1 City', 'utf-8'), (b', Panam\xe1- Casco Antiguo', None)]
>>> make_header(decode_header(header))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/email/header.py", line 174, in make_header
    h.append(s, charset)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/email/header.py", line 295, in append
    s = s.decode(input_charset, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 9: ordinal not in range(128)
msg347598	Author: Aldwin Pollefeyt (aldwinaldwin) *	Date: 2019-07-10 04:36
Maybe a solution, if no charset defined, then encode it as utf-8 in decode_header, because it's Python3's default encoding?

diff --git a/Lib/email/header.py b/Lib/email/header.py
index 4ab0032bc6..8dbfe58a57 100644
--- a/Lib/email/header.py
+++ b/Lib/email/header.py
@@ -135,7 +135,10 @@ def decode_header(header):
     collapsed = []
     last_word = last_charset = None
     for word, charset in decoded_words:
-        if isinstance(word, str):
+        if not charset and isinstance(word, str):
+            word = word.encode('utf-8')
+            charset = 'utf-8'
+        elif isinstance(word, str):
             word = bytes(word, 'raw-unicode-escape')
         if last_word is None:
             last_word = word

Python 3.9.0a0 (heads/master:110a47c4f4, Jul 10 2019, 11:32:53)
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.header
>>> header = "Your booking at Voyager Int'l Hostel,=?UTF-8?B?IFBhbmFtw6EgQ2l0eQ==?=, Panamá- Casco Antiguo"
>>> print(email.header.make_header(email.header.decode_header(header)))
Your booking at Voyager Int'l Hostel, Panamá City, Panamá- Casco Antiguo
>>>
msg347607	Author: Aldwin Pollefeyt (aldwinaldwin) *	Date: 2019-07-10 08:00
Changing everything to utf-8 breaks a lot of tests, so here a less invasive solution?

diff --git a/Lib/email/header.py b/Lib/email/header.py
index 4ab0032bc6..1e71eeae7f 100644
--- a/Lib/email/header.py
+++ b/Lib/email/header.py
@@ -136,7 +136,14 @@ def decode_header(header):
     last_word = last_charset = None
     for word, charset in decoded_words:
         if isinstance(word, str):
-            word = bytes(word, 'raw-unicode-escape')
+            word_tmp = bytes(word, 'raw-unicode-escape')
+            input_charset = charset or 'us-ascii'
+            try:
+                _ = word_tmp.decode(input_charset, errors='strict')
+                word = word_tmp
+            except UnicodeDecodeError:
+                word = str(word).encode('utf-8')
+                charset = 'utf-8'
         if last_word is None:
             last_word = word
             last_charset = charset
msg348851	Author: R. David Murray (r.david.murray) *	Date: 2019-08-01 12:42
The input header is not valid (non-ascii is not allowed in headers), so you shouldn't expect make_header to do anything sensible. Note that this is the legacy API, which is a toolkit and does not hold your hand when it comes to RFC compliance. Aside from any other concerns, this is long standing behavior (it is the same in python2), and it doesn't make sense to change the behavior of a legacy API.
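
To illustrate the point about validity: with the legacy API, non-ASCII text is expected to be wrapped in an RFC 2047 encoded word before it is placed in a header. The following is a minimal sketch, assuming the text is already a Python str and that UTF-8 is an acceptable charset; the variable names are illustrative only.

```python
from email.header import Header, decode_header, make_header

text = "Panamá- Casco Antiguo"

# Passing an explicit charset makes Header emit an RFC 2047 encoded word
# instead of raw non-ASCII characters.
encoded = Header(text, charset="utf-8").encode()

# decode_header() reports the charset of the encoded word, so make_header()
# has the information it needs to rebuild the original text.
round_tripped = str(make_header(decode_header(encoded)))

print(encoded)
print(round_tripped)
```

Because the charset travels with the encoded word, the decode/make round trip no longer has to fall back to us-ascii.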
msg348868	Author: Yun Li (yunlee)	Date: 2019-08-01 17:51
Hi, David:

I don't think your argument stands here. The whole world does not just include English speaking countries. There are Spanish, Russian, Chinese, etc. Any legacy packages should support all languages instead of just English. This is definitely a bug in this package. I hope that the python support team should fix this issue or simply add the "support English only" description in the function explicitly. Otherwise it is very annoying for other countries to use this package.

Thanks!
Yun

On Thu, Aug 1, 2019 at 5:42 AM R. David Murray <report@bugs.python.org> wrote:

>
> R. David Murray <rdmurray@bitdance.com> added the comment:
>
> The input header is not valid (non-ascii is not allowed in headers), so
> you shouldn't expect make_header to do anything sensible. Note that this
> is the legacy API, which is a toolkit and does not hold your hand when it
> comes to RFC compliance. Aside from any other concerns, this is long
> standing behavior (it is the same in python2), and it doesn't make sense to
> change the behavior of a legacy API.
>
> ----------
> resolution: -> not a bug
> stage: patch review -> resolved
> status: open -> closed
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue37532>
> _______________________________________
>
msg348871	Author: R. David Murray (r.david.murray) *	Date: 2019-08-01 18:45
Right, and the python email package fully supports non ascii:

>>> msg = EmailMessage()
>>> msg['Subject'] = "Panamá- Casco Antiguo"
>>> bytes(msg)
b'Subject: =?utf-8?q?Panam=C3=A1-?= Casco Antiguo\n\n'
>>> str(msg)
'Subject: Panamá- Casco Antiguo\n\n'
>>> msg['subject']
'Panamá- Casco Antiguo'

make_header also supports non-ascii, you just have to tell it what charset you want to use. Like I said, make_header is part of the legacy API, and it really is a pain to use. That's why we wrote the new API.
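
One possible reading of "tell it what charset you want to use" is to label the unlabelled chunks yourself before calling make_header. The sketch below is only an illustration under stated assumptions: make_header_lenient and fallback_charset are hypothetical names, not part of the email package, and the 'raw-unicode-escape' decoding relies on the way decode_header currently builds its bytes (as visible in the diffs earlier in this thread).

```python
from email.header import decode_header, make_header


def make_header_lenient(raw, fallback_charset="utf-8"):
    """Rebuild a Header from raw, labelling unlabelled non-ASCII chunks.

    Hypothetical helper for illustration; not part of the email package.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if charset is None:
            # decode_header() returns str when there are no encoded words,
            # otherwise bytes produced with 'raw-unicode-escape'.
            if isinstance(chunk, bytes):
                chunk = chunk.decode("raw-unicode-escape")
            # Assumption: any non-ASCII text gets the fallback charset.
            if not chunk.isascii():
                charset = fallback_charset
        parts.append((chunk, charset))
    return make_header(parts)


header = ("Your booking at Voyager Int'l Hostel,"
          "=?UTF-8?B?IFBhbmFtw6EgQ2l0eQ==?=, Panamá- Casco Antiguo")
rebuilt = make_header_lenient(header)
print(str(rebuilt))
```

The spacing between chunks is decided by Header when it joins them, so the printed text may not match the input character for character. This workaround accepts input that is not RFC compliant, so it belongs in calling code rather than in the library.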

History
Date	User	Action	Args
2022-04-11 14:59:17	admin	set	github: 81713
2019-08-01 18:45:58	r.david.murray	set	messages: + msg348871
2019-08-01 17:51:55	yunlee	set	messages: + msg348868
2019-08-01 12:42:01	r.david.murray	set	status: open -> closed resolution: not a bug messages: + msg348851 stage: patch review -> resolved
2019-07-11 02:46:22	aldwinaldwin	set	keywords: + patch stage: patch review pull_requests: + pull_request14496
2019-07-10 08:00:16	aldwinaldwin	set	messages: + msg347607
2019-07-10 05:33:22	xtreak	set	nosy: + maxking
2019-07-10 04:36:50	aldwinaldwin	set	nosy: + aldwinaldwin messages: + msg347598
2019-07-09 21:20:46	yunlee	create