Message409392
Due to this bug, any user of this function in Python 3.0+ *already* has to be able to handle all of the following outputs in order to use it reliably:
decode_header(...) -> [(str, None)]
or decode_header(...) -> [(bytes, str)]
or decode_header(...) -> [(bytes, (str|None)), (bytes, (str|None)), ...]
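For illustration, the three shapes can be reproduced with a short script. This is only a sketch; the sample header strings are my own and the variable names are arbitrary:

```python
from email.header import decode_header

# Header with no RFC 2047 encoded words: the single part comes back as str.
plain = decode_header("Hello world")

# Header that is one encoded word: the part comes back as bytes.
encoded = decode_header("=?utf-8?q?caf=C3=A9?=")

# Mixed header: every part is bytes, and charset is None for the unencoded part.
mixed = decode_header("Hello =?utf-8?q?caf=C3=A9?=")

for label, result in [("plain", plain), ("encoded", encoded), ("mixed", mixed)]:
    # Print only the type of each part next to its charset.
    print(label, [(type(part).__name__, charset) for part, charset in result])
```

The point is that the type of the first tuple element depends on whether the header happens to contain an encoded word, which the caller usually cannot know in advance.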
== Fix str/bytes inconsistency ==
We could eliminate the inconsistency and make the function only ever return bytes, never str, with the following changes to https://github.com/python/cpython/blob/3.10/Lib/email/header.py:
```
diff --git a/Lib/email/header.py.orig b/Lib/email/header.py
index 4ab0032..41e91f2 100644
--- a/Lib/email/header.py
+++ b/Lib/email/header.py
@@ -61,7 +61,7 @@ _max_append = email.quoprimime._max_append
 def decode_header(header):
     """Decode a message header value without converting charset.
 
-    Returns a list of (string, charset) pairs containing each of the decoded
+    Returns a list of (bytes, charset) pairs containing each of the decoded
     parts of the header. Charset is None for non-encoded parts of the header,
     otherwise a lower-case string containing the name of the character set
     specified in the encoded string.
@@ -78,7 +78,7 @@ def decode_header(header):
                     for string, charset in header._chunks]
     # If no encoding, just return the header with no charset.
     if not ecre.search(header):
-        return [(header, None)]
+        return [(header.encode(), None)]
     # First step is to parse all the encoded parts into triplets of the form
     # (encoded_string, encoding, charset). For unencoded strings, the last
     # two parts will be None.
```
With these changes, decode_header() would return one of the following:
decode_header(...) -> [(bytes, None)]
or decode_header(...) -> [(bytes, str)]
or decode_header(...) -> [(bytes, (str|None)), (bytes, (str|None)), ...]
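Until something like this lands, the bytes-only shape can be approximated from the caller's side. A minimal sketch, where `decode_header_bytes` is a hypothetical name of my own and not part of the email package; it mirrors the `header.encode()` call in the patch, so unencoded text is encoded as UTF-8:

```python
from email.header import decode_header


def decode_header_bytes(header):
    """Hypothetical wrapper: like decode_header(), but every part is bytes."""
    return [
        # Only the "no encoded words" case yields str today; encode it the
        # same way the proposed patch does (str.encode() defaults to UTF-8).
        (part.encode() if isinstance(part, str) else part, charset)
        for part, charset in decode_header(header)
    ]


print(decode_header_bytes("Hello world"))
print(decode_header_bytes("=?utf-8?q?caf=C3=A9?="))
```

The charset can still be None here, which is what the next section addresses.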
== Ensure that charset is always str, never None ==
A couple more small changes:
```
@@ -92,7 +92,7 @@ def decode_header(header):
                 unencoded = unencoded.lstrip()
                 first = False
             if unencoded:
-                words.append((unencoded, None, None))
+                words.append((unencoded, None, 'ascii'))
             if parts:
                 charset = parts.pop(0).lower()
                 encoding = parts.pop(0).lower()
@@ -133,7 +133,8 @@ def decode_header(header):
     # Now convert all words to bytes and collapse consecutive runs of
     # similarly encoded words.
     collapsed = []
-    last_word = last_charset = None
+    last_word = None
+    last_charset = 'ascii'
     for word, charset in decoded_words:
         if isinstance(word, str):
             word = bytes(word, 'raw-unicode-escape')
```
With these changes, decode_header() would return only:
decode_header(...) -> List[(bytes, str)]
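To show what a uniform return type buys the caller, here is a rough sketch that emulates it on top of the current function. Both `decode_header_uniform` and `header_to_text` are hypothetical names, and substituting 'ascii' for every missing charset is my assumption about the intended result, not something the stdlib does today:

```python
from email.header import decode_header


def decode_header_uniform(header):
    """Hypothetical wrapper emulating a List[(bytes, str)] return type."""
    result = []
    for part, charset in decode_header(header):
        if isinstance(part, str):
            # Plain header with no encoded words: convert to bytes.
            part = part.encode()
        # Assumption: treat a missing charset as 'ascii'.
        result.append((part, charset if charset is not None else "ascii"))
    return result


def header_to_text(header):
    """With a uniform shape, joining the parts needs no type checks."""
    # errors="replace" keeps stray non-ASCII bytes from raising; an unknown
    # charset name would still raise LookupError.
    return "".join(
        part.decode(charset, errors="replace")
        for part, charset in decode_header_uniform(header)
    )


print(header_to_text("Hello =?utf-8?q?caf=C3=A9?="))
```

Without the wrapper, `header_to_text` would need an `isinstance` check on every part plus a fallback for `None` charsets.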

Date | User | Action | Args
2021-12-30 23:08:21 | dlenski | set | recipients: + dlenski, barry, r.david.murray, py.user, iritkatriel
2021-12-30 23:08:21 | dlenski | set | messageid: <1640905701.53.0.857462247688.issue22833@roundup.psfhosted.org>
2021-12-30 23:08:21 | dlenski | link | issue22833 messages
2021-12-30 23:08:21 | dlenski | create |