classification
Title: email module creates base64 output with incorrect line breaks
Type: behavior
Stage: resolved
Components: email, Library (Lib)
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, jribbens, r.david.murray
Priority: normal
Keywords:

Created on 2017-04-10 17:41 by jribbens, last changed 2017-04-10 20:58 by r.david.murray. This issue is now closed.

Messages (8)
msg291434 - Author: Jon Ribbens (jribbens) Date: 2017-04-10 17:41
The email module, when creating base64-encoded text parts, does not process line breaks correctly - RFC 2045 s6.8 says that line breaks must be converted to CRLF before base64-encoding, and the email module is not doing this.

>>> from email.mime.text import MIMEText
>>> import base64
>>> m = MIMEText("hello\nthere", _charset="utf-8")
>>> m.as_string()
'Content-Type: text/plain; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\naGVsbG8KdGhlcmU=\n'
>>> base64.b64decode("aGVsbG8KdGhlcmU=")
b'hello\nthere'

You might say that it is the application's job to convert the line endings before calling MIMEText(), but I think all application authors would be surprised by this. Certainly the MailMan authors would be, as they say this is a Python bug not a MailMan bug ;-)
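
A minimal sketch of the application-side workaround mentioned above, assuming the caller normalizes line endings itself before handing the text to the legacy API. The helper name `to_crlf` is hypothetical and not part of the email package.

```python
import re
from email.mime.text import MIMEText


def to_crlf(text):
    """Hypothetical helper: rewrite every CRLF, lone CR, or lone LF as CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


m = MIMEText(to_crlf("hello\nthere"), _charset="utf-8")

# Decode the base64 body again to inspect which line endings were encoded.
decoded = m.get_payload(decode=True)
print(decoded)
```

The legacy API encodes exactly what it is handed, so the CRLF normalization has to happen before the text reaches MIMEText().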
msg291442 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-10 18:42
This appears to be a problem in the new API as well.  I don't think we can change the legacy API because it's been that way forever and applications might be depending on it (that is, the library preserves exactly what it is handed, and an application might break if that changes).  In the new API, though, I think we could get away with fixing it to do the transformation on text strings in the default content manager so that the line endings follow the message policy.  (That is, if you use default, you get \n, if you use SMTP, you get \r\n).  I think we can get away with it because there aren't that many applications using the new API yet.
msg291443 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-10 18:43
Actually, I think the fix would go in the generator, not in the contentmanager, but it's been long enough since I've worked on the code that I'm not sure.
msg291446 - Author: Jon Ribbens (jribbens) Date: 2017-04-10 18:58
OK cool, but please note that this is a MIME issue not an SMTP issue - if the message has text that is being base64-encoded then it must use CRLF line breaks regardless of whether SMTP is involved or not.
msg291448 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-10 19:10
That is true for text/xxxx types, yes.  The policy is named after the target wire protocol, and if you are transmitting an email message over SMTP, that implies MIME.  What to do if you are not sending it over SMTP, though, is a tougher question. One could argue it either way for the 'default' policy, and I'm open to argument.
msg291450 - Author: Jon Ribbens (jribbens) Date: 2017-04-10 20:26
So on further investigation, with the new API and policy=SMTP, it does generate correct base64 output. So I guess on the basis that the new version can generate the right output, and it appears to be a deliberate choice that the default policy breaks the RFCs, you can close this issue ;-)

>>> from email.message import EmailMessage
>>> from email.policy import SMTP
>>> import base64
>>> msg = EmailMessage(policy=SMTP)
>>> msg.set_content("hello\nthere", cte="base64")
>>> msg.as_string()
'Content-Type: text/plain; charset="utf-8"\r\nContent-Transfer-Encoding: base64\r\nMIME-Version: 1.0\r\n\r\naGVsbG8NCnRoZXJlDQo=\r\n'
>>> base64.b64decode("aGVsbG8NCnRoZXJlDQo=")
b'hello\r\nthere\r\n'
msg291451 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-10 20:55
Huh.  I ran something like that test and thought I saw the reverse.  I guess I misread my terminal.  Looking at the code, set_content does take care to fix the line ending according to the policy before doing the encoding.
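
A small sketch contrasting the two policies, assuming set_content() picks the embedded line ending from the policy's linesep as described above. The helper name `encoded_body` is hypothetical.

```python
from email.message import EmailMessage
from email.policy import SMTP, default


def encoded_body(policy):
    """Hypothetical helper: build a base64 text part under `policy`
    and return the bytes that were actually base64-encoded."""
    msg = EmailMessage(policy=policy)
    msg.set_content("hello\nthere", cte="base64")
    return msg.get_payload(decode=True)


# The line endings inside the encoded text follow policy.linesep.
print(encoded_body(default))
print(encoded_body(SMTP))
```

Only the SMTP policy yields the CRLF form that RFC 2045 s6.8 asks for; the default policy keeps bare \n inside the encoded text.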
msg291452 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-10 20:58
There is, however, an issue that if you pass a message with the default policy to the generator and specify SMTP as the policy, it doesn't *recode* the line endings.  I thought there was an open issue for that, but I can't find it.

One solution would be to do as you suggest and make \r\n what we always use when doing base64 encoding.  I'm open to that as a possible fix, but it probably needs at least a brief discussion with Barry.
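
A sketch of the remaining case described above, assuming the behavior reported in this thread: the content is set under the default policy, and SMTP is only supplied when the message is serialized. The round trip through message_from_bytes() is just a convenient way to inspect the encoded body and is not part of the original report.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import SMTP

# Content is set while the message still uses the default policy ("\n").
msg = EmailMessage()
msg.set_content("hello\nthere", cte="base64")

# Serializing with SMTP changes the separators between header and body
# lines, but the already-encoded base64 text is written out unchanged.
wire = msg.as_bytes(policy=SMTP)

# Parse the output back and decode the body to see what was encoded.
body = message_from_bytes(wire, policy=SMTP).get_payload(decode=True)
print(body)
```

If the generator does not recode, the decoded body still carries the bare \n line endings chosen when set_content() ran, even though the surrounding headers use \r\n.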
History
Date                 User             Action  Args
2017-04-10 20:58:29  r.david.murray   set     messages: + msg291452
2017-04-10 20:55:50  r.david.murray   set     status: open -> closed; resolution: out of date; messages: + msg291451; stage: resolved
2017-04-10 20:26:21  jribbens         set     messages: + msg291450
2017-04-10 19:15:04  r.david.murray   set     components: + email
2017-04-10 19:11:05  r.david.murray   set     versions: - Python 2.7
2017-04-10 19:10:57  r.david.murray   set     messages: + msg291448; versions: + Python 3.6, Python 3.7, - Python 3.5
2017-04-10 18:58:45  jribbens         set     messages: + msg291446
2017-04-10 18:43:32  r.david.murray   set     messages: + msg291443
2017-04-10 18:42:10  r.david.murray   set     messages: + msg291442; components: - email
2017-04-10 18:08:41  serhiy.storchaka set     nosy: + barry, r.david.murray; components: + email
2017-04-10 17:41:26  jribbens         create