classification
Title: email.header.make_header() doesn't work if any `ascii` code is out of range(128)
Type: behavior Stage: patch review
Components: email Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: aldwinaldwin, barry, maxking, r.david.murray, yunlee
Priority: normal Keywords: patch

Created on 2019-07-09 21:20 by yunlee, last changed 2019-07-11 02:46 by aldwinaldwin.

Pull Requests
URL Status Linked
PR 14696 open aldwinaldwin, 2019-07-11 02:46
Messages (3)
msg347577 - (view) Author: Yun Li (yunlee) Date: 2019-07-09 21:20
email.header.make_header() raises UnicodeDecodeError if an unencoded part of the header contains a character outside range(128), i.e. a non-ASCII character.

For example:

>>> header = "Your booking at Voyager Int'l Hostel,=?UTF-8?B?IFBhbmFtw6EgQ2l0eQ==?=,   Panamá- Casco Antiguo"

>>> decode_header(header)
[(b"Your booking at Voyager Int'l Hostel,", None), (b' Panam\xc3\xa1 City', 'utf-8'), (b',   Panam\xe1- Casco Antiguo', None)]

>>> make_header(decode_header(header))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/email/header.py", line 174, in make_header
    h.append(s, charset)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/email/header.py", line 295, in append
    s = s.decode(input_charset, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 9: ordinal not in range(128)
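
Reading the traceback together with the decode_header() output above, the failure seems to come from the unencoded tail: it is returned as bytes with charset None, and Header.append() then decodes those bytes as us-ascii. A minimal sketch of that step in isolation (the helper name `reproduce_decode_failure` is hypothetical, and the use of raw-unicode-escape is taken from the decode_header() source, not from anything make_header() documents):

```python
def reproduce_decode_failure(text):
    """Mimic how an unencoded header part ends up failing in Header.append().

    decode_header() turns str parts into bytes with 'raw-unicode-escape';
    with charset None, the bytes are later decoded as us-ascii.
    Returns the bytes and the index of the first undecodable byte (or None).
    """
    raw = bytes(text, "raw-unicode-escape")
    try:
        raw.decode("us-ascii")
    except UnicodeDecodeError as exc:
        return raw, exc.start
    return raw, None


if __name__ == "__main__":
    raw, position = reproduce_decode_failure(",   Panam\u00e1- Casco Antiguo")
    print(raw)       # the 0xe1 byte is what the ascii codec rejects
    print(position)  # index of the offending byte
```

For the tail of the header in this report, the offending byte is at index 9, which matches "position 9" in the traceback.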
msg347598 - (view) Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-10 04:36
Maybe a solution: if no charset is defined, encode the word as utf-8 in decode_header, since utf-8 is Python 3's default encoding?


diff --git a/Lib/email/header.py b/Lib/email/header.py
index 4ab0032bc6..8dbfe58a57 100644
--- a/Lib/email/header.py
+++ b/Lib/email/header.py
@@ -135,7 +135,10 @@ def decode_header(header):
     collapsed = []
     last_word = last_charset = None
     for word, charset in decoded_words:
-        if isinstance(word, str):
+        if not charset and isinstance(word, str):
+            word = word.encode('utf-8')
+            charset = 'utf-8'
+        elif isinstance(word, str):
             word = bytes(word, 'raw-unicode-escape')
         if last_word is None:
             last_word = word



Python 3.9.0a0 (heads/master:110a47c4f4, Jul 10 2019, 11:32:53) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.header
>>> header = "Your booking at Voyager Int'l Hostel,=?UTF-8?B?IFBhbmFtw6EgQ2l0eQ==?=,   Panamá- Casco Antiguo"
>>> print(email.header.make_header(email.header.decode_header(header)))
Your booking at Voyager Int'l Hostel, Panamá City,   Panamá- Casco Antiguo
>>>
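
To make the effect of this change easier to see, the patched branch can be modelled as a standalone function. This is only a sketch: `collapse_word_patch1` is a hypothetical name that does not exist in email.header, and it mirrors just the lines touched by the diff above.

```python
def collapse_word_patch1(word, charset):
    """Hypothetical model of the patched branch inside decode_header()."""
    if not charset and isinstance(word, str):
        # New behaviour: unencoded text is emitted as utf-8 bytes and
        # labelled as utf-8, so a later decode no longer assumes us-ascii.
        return word.encode("utf-8"), "utf-8"
    if isinstance(word, str):
        # Old behaviour, now only reached when a charset is already set.
        return bytes(word, "raw-unicode-escape"), charset
    # Words that are already bytes (decoded encoded-words) pass through.
    return word, charset


if __name__ == "__main__":
    print(collapse_word_patch1(",   Panam\u00e1- Casco Antiguo", None))
    print(collapse_word_patch1("plain ascii", None))
```

Note that under this model a plain ASCII part is also reported with charset 'utf-8' instead of None, so the output of decode_header() changes even for headers that worked before.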
msg347607 - (view) Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-10 08:00
Changing everything to utf-8 breaks a lot of tests, so here is a less invasive solution:

diff --git a/Lib/email/header.py b/Lib/email/header.py
index 4ab0032bc6..1e71eeae7f 100644
--- a/Lib/email/header.py
+++ b/Lib/email/header.py
@@ -136,7 +136,14 @@ def decode_header(header):
     last_word = last_charset = None
     for word, charset in decoded_words:
         if isinstance(word, str):
-            word = bytes(word, 'raw-unicode-escape')
+            word_tmp = bytes(word, 'raw-unicode-escape')
+            input_charset = charset or 'us-ascii'
+            try:
+                _ = word_tmp.decode(input_charset, errors='strict')
+                word = word_tmp
+            except UnicodeDecodeError:
+                word = str(word).encode('utf-8')
+                charset = 'utf-8'
         if last_word is None:
             last_word = word
             last_charset = charset
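
The fallback in this second diff can be tried outside the standard library as well. The function below is a sketch under the assumption that it captures the patched branch faithfully; `collapse_word_patch2` is a hypothetical name, not an email.header API.

```python
def collapse_word_patch2(word, charset=None):
    """Hypothetical model of the try/except fallback from the diff above."""
    if not isinstance(word, str):
        # Already bytes: nothing to convert.
        return word, charset
    word_tmp = bytes(word, "raw-unicode-escape")
    input_charset = charset or "us-ascii"
    try:
        # Keep the old bytes if they can be decoded later with the
        # charset that will be reported alongside them.
        word_tmp.decode(input_charset, errors="strict")
        return word_tmp, charset
    except UnicodeDecodeError:
        # Otherwise fall back to utf-8 bytes and label them as utf-8.
        return word.encode("utf-8"), "utf-8"


if __name__ == "__main__":
    print(collapse_word_patch2("plain ascii"))
    print(collapse_word_patch2(",   Panam\u00e1- Casco Antiguo"))
```

With this approach a pure ASCII part keeps charset None, and only parts that would fail to decode are re-encoded and labelled as utf-8.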
History
Date User Action Args
2019-07-11 02:46:22 aldwinaldwin set keywords: + patch
stage: patch review
pull_requests: + pull_request14496
2019-07-10 08:00:16 aldwinaldwin set messages: + msg347607
2019-07-10 05:33:22 xtreak set nosy: + maxking
2019-07-10 04:36:50 aldwinaldwin set nosy: + aldwinaldwin
messages: + msg347598
2019-07-09 21:20:46 yunlee create