
classification
Title: Email Header Folding Converts Non-CRLF Newlines to CRLFs
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: jwalterclark
Priority: normal
Keywords:

Created on 2022-01-21 18:21 by jwalterclark, last changed 2022-04-11 14:59 by admin.

Messages (1)
msg411171 - Author: J. Walter Clark (jwalterclark) Date: 2022-01-21 18:21
In several places the email library uses `str.splitlines` to split text at points where folding might have taken place in the original message source. This appears to be a bug, because when the split parts are re-joined, they are joined with a CRLF.
https://github.com/python/cpython/blob/ef5bb25e2d6147cd44be9c9b166525fb30485be0/Lib/email/header.py#L369

`str.splitlines` splits on "universal newlines", which include line boundaries other than CRLF.
https://docs.python.org/3/library/stdtypes.html#str.splitlines
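
As a quick illustration of the point above, the following sketch checks each line boundary listed in the `str.splitlines` documentation. The names `UNIVERSAL_NEWLINES` and `splits_on` are made up for this example and are not part of the email library.

```python
# Line boundaries that str.splitlines recognizes, per the Python docs.
UNIVERSAL_NEWLINES = [
    "\n", "\r", "\r\n",      # the familiar line endings
    "\x0b", "\x0c",          # vertical tab, form feed
    "\x1c", "\x1d", "\x1e",  # file, group, and record separators
    "\x85",                  # next line (C1 control code)
    "\u2028", "\u2029",      # Unicode line and paragraph separators
]


def splits_on(sep: str) -> bool:
    """Return True if str.splitlines treats `sep` as a line boundary."""
    return f"BEFORE{sep}AFTER".splitlines() == ["BEFORE", "AFTER"]


results = {sep: splits_on(sep) for sep in UNIVERSAL_NEWLINES}
for sep, did_split in results.items():
    print(f"{sep!r:>10} -> split: {did_split}")

# A plain str.split on CRLF leaves the FS character (\x1c) alone.
print("BEFORE\x1cAFTER".split("\r\n"))
```

Only the first three entries are line endings that normally occur in mail; the rest are ordinary characters as far as the header grammar is concerned.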

However, the email RFCs define folding whitespace with CRLF as the only possible newline type (optionally surrounded by WSP (SP/HTAB) and/or comments).
https://datatracker.ietf.org/doc/html/rfc5322#section-3.2.2
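
For comparison, here is a minimal sketch of what CRLF-only handling could look like. It is not the library's code and not a proposed patch; `split_crlf_only` and `unfold_crlf_only` are hypothetical helpers written for this report. The unfolding rule follows RFC 5322, which describes unfolding as removing any CRLF that is immediately followed by WSP.

```python
import re

# A CRLF counts as a fold only when it is immediately followed by WSP (SP or HTAB).
_CRLF_BEFORE_WSP = re.compile(r"\r\n(?=[ \t])")


def split_crlf_only(raw_value: str) -> list[str]:
    """Hypothetical helper: split on CRLF only, unlike str.splitlines."""
    return raw_value.split("\r\n")


def unfold_crlf_only(raw_value: str) -> str:
    """Hypothetical helper: unfold by removing each CRLF that precedes WSP."""
    return _CRLF_BEFORE_WSP.sub("", raw_value)


folded = "BEFORE\x1cAFTER\r\n continued"

print(split_crlf_only(folded))         # FS character stays inside the first part
print(folded.splitlines())             # FS character is treated as a line boundary
print(repr(unfold_crlf_only(folded)))  # the fold is removed, FS is preserved
```

With this approach the `\x1c` character is carried through as data, and only the real fold before " continued" is treated as a line break.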

The end result is that a message making a roundtrip through the email parser/generator is mangled: any non-CRLF "universal newlines" in its headers are converted to CRLFs. Anything in the header after the non-CRLF "universal newline" appears on its own line with no preceding whitespace. This appears to happen with all of the stock policies.

```
from email import message_from_bytes
from email.policy import SMTPUTF8

eml_bytes = b'Header-With-FS-Char: BEFORE\x1cAFTER\r\n\r\nBody\r\n'
print(eml_bytes)

message = message_from_bytes(eml_bytes, policy=SMTPUTF8)
print(message.as_bytes(policy=SMTPUTF8))
```

```
b'Header-With-FS-Char: BEFORE\x1cAFTER\r\n\r\nBody\r\n'
b'Header-With-FS-Char: BEFORE\r\nAFTER\r\n\r\nBody\r\n'
```

The operational impact of this mangling is that the "AFTER" text now makes the message format invalid because it is neither a valid header (no ": ") nor the valid start of a message body (only one CRLF). Common MIME-viewers (e.g. Thunderbird/Outlook) appear to interpret it as a body anyway and any subsequent headers become part of the body.
History
Date User Action Args
2022-04-11 14:59:55 admin set github: 90620
2022-01-21 18:21:38 jwalterclark create