classification
Title: Generator does not translate linesep characters in certain circumstances
Type: behavior Stage: resolved
Components: email Versions: Python 3.4, Python 3.2, Python 3.3
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, python-dev, r.david.murray, yu.zhao@getcwd.com
Priority: high Keywords: patch

Created on 2012-04-22 17:57 by r.david.murray, last changed 2014-10-03 02:12 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
generator_lineneds.patch r.david.murray, 2013-03-04 02:13
Messages (9)
msg158978 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-04-22 17:57
I ran into this while translating a test, but it turns out it is a long-standing problem.  I presume it has not been an issue because in Python2 email messages are generally read as text with universal newline support, and thus the linesep characters get translated on *read*, and the problem in Generator never shows up.  In Python3, however, we will often read messages as binary, which will preserve the existing linesep characters, and expose the Generator bug.

The only reason this isn't a critical bug for Python3 is that a message read in binary will likely be written in binary using \r\n linesep, in which case the right thing will be happening.  Likewise most messages read from disk will be written to disk.  But it should be fixed so that the cases where a message is read in binary and written to disk in text, and vice versa, are correctly formatted.  (In particular, uses of the new smtplib.send_message could theoretically run into this, though I haven't tested to see if that is really a problem.)

To reproduce, read data/msg_26.txt from the email test suite in binary mode (or text mode using "newline='\n'", which will preserve the crlf in that file), and run str on the resulting message.  You'll see that the MIME preamble and the base64 part both have \r\n linesep, instead of the default '\n' linesep used for the rest of the message.
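
The reproduction described above can be sketched without the test suite file. The RAW bytes below are a hypothetical stand-in for data/msg_26.txt (single-line headers, a MIME preamble, and a base64 part, all with \r\n line endings), and the flatten helper is an illustrative name, not part of the email package. On an interpreter that includes the fix for this issue, stray_cr_lines is expected to be empty; on an affected version it should list the preamble and base64 lines.

```python
import io
from email import message_from_bytes
from email.generator import Generator

# Hypothetical stand-in for data/msg_26.txt: CRLF line endings throughout,
# with a MIME preamble and a base64-encoded part.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"This is the MIME preamble.\r\n"
    b"It spans two lines.\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Plain text part.\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"aGVsbG8gd29ybGQsIHRoaXMgaXMgYmFzZTY0\r\n"
    b"IGRhdGEgb24gdHdvIGxpbmVz\r\n"
    b"--BOUNDARY--\r\n"
)


def flatten(msg, linesep):
    """Illustrative helper: serialize msg as text with the given linesep."""
    buf = io.StringIO()
    Generator(buf).flatten(msg, linesep=linesep)
    return buf.getvalue()


# Parsing from bytes keeps the CRLFs inside the preamble and part payloads.
msg = message_from_bytes(RAW)

# str() uses the default '\n' linesep; collect any line that still has a CR.
as_lf = str(msg)
stray_cr_lines = [line for line in as_lf.split("\n") if "\r" in line]
print("lines with a stray CR:", stray_cr_lines)

# The same message flattened with an explicit '\r\n' linesep.
as_crlf = flatten(msg, "\r\n")
print(repr(as_crlf[:60]))
```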
msg183413 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-04 02:13
Here's a patch, against 3.2.  It definitely does affect smtplib.send_message.
msg183708 - (view) Author: Roundup Robot (python-dev) Date: 2013-03-07 22:31
New changeset 30c0f0dd0b94 by R David Murray in branch '3.2':
#14645: Generator now emits correct linesep for all parts.
http://hg.python.org/cpython/rev/30c0f0dd0b94

New changeset 1b9dc00c4d57 by R David Murray in branch '3.3':
Merge: #14645: Generator now emits correct linesep for all parts.
http://hg.python.org/cpython/rev/1b9dc00c4d57

New changeset 6b69c11b0ad0 by R David Murray in branch 'default':
Merge: #14645: Generator now emits correct linesep for all parts.
http://hg.python.org/cpython/rev/6b69c11b0ad0
msg183709 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-07 22:57
I'm not going to fix this in Python2.  While the problem exists there, it hasn't ever been reported as a bug.  As noted earlier, this is probably primarily due to the fact that it would be very exceptional to read an email in Python2 with anything other than universal newline mode, and Python2 provides no way to emit a message with anything other than \n linesep apart from smtplib.sendmail, which does the \n to \r\n translation.  In Python3, in contrast, reading a message as binary is common, and we have smtplib.send_message, which writes the message directly using a \r\n linesep instead of doing a post-transformation the way smtplib.sendmail does.
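
The two styles contrasted above can be sketched without a network connection. This is an illustration only, not the smtplib source: the regular expression stands in for the kind of post-transformation smtplib.sendmail applies, and the BytesGenerator call stands in for what smtplib.send_message does when it flattens with a \r\n linesep. The addresses and message text are made up.

```python
import io
import re
from email.generator import BytesGenerator
from email.message import Message

# A small, made-up message whose payload uses '\n' line endings.
msg = Message()
msg["From"] = "a@example.com"
msg["To"] = "b@example.com"
msg["Subject"] = "linesep demo"
msg.set_payload("first line\nsecond line\n")

# Style 1 (sendmail-like): flatten with the default '\n' linesep, then
# rewrite every line ending to CRLF as a separate step.
lf_text = msg.as_string()
post_transformed = re.sub(r"\r\n|\r|\n", "\r\n", lf_text).encode("ascii")

# Style 2 (send_message-like): ask the generator for CRLF directly, so
# correctness depends on the generator translating every part itself.
buf = io.BytesIO()
BytesGenerator(buf).flatten(msg, linesep="\r\n")
direct = buf.getvalue()

print(post_transformed)
print(direct)
```

With the second style there is no later pass to clean up a part the generator wrote with the wrong line ending, which is why the bug matters for send_message.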
msg228285 - (view) Author: Yu Zhao (yu.zhao@getcwd.com) Date: 2014-10-03 00:36
This at least shouldn't be done for the BytesGenerator - it breaks binary data integrity. IMO, doing it for the string Generator is not necessary either. The linesep is a policy regarding MIME syntax. It shouldn't be applied to the payload. Imagine what would happen if people want to be RFC-compliant but keep '\n' in their Linux text files.
msg228288 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-03 01:04
The payload must also use \r\n per RFC, unless it is a non-text part, in which case it uses \r\n to separate the content transfer encoded lines.  If you want binary integrity you must use a binary MIME type.
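
A minimal sketch of the "use a binary MIME type" advice, assuming the data is wrapped with MIMEApplication (which defaults to application/octet-stream with base64 encoding). The roundtrip helper and the sample bytes are illustrative. Because the base64 text contains no line endings of its own data, the decoded payload should come back unchanged whichever linesep the generator uses.

```python
import io
from email import message_from_bytes
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication

# Sample data mixing bare LF, CRLF, lone CR, and non-ASCII bytes.
data = b"bare LF\nCRLF\r\nlone CR\rand raw bytes \x00\xff"

# application/octet-stream, base64 content transfer encoding by default.
part = MIMEApplication(data)


def roundtrip(linesep):
    """Illustrative helper: flatten, reparse, and return the decoded payload."""
    buf = io.BytesIO()
    BytesGenerator(buf).flatten(part, linesep=linesep)
    return message_from_bytes(buf.getvalue()).get_payload(decode=True)


for linesep in ("\n", "\r\n"):
    print(repr(linesep), roundtrip(linesep) == data)
```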
msg228290 - (view) Author: Yu Zhao (yu.zhao@getcwd.com) Date: 2014-10-03 01:19
Ack (per RFC 2046, section 4.1.1). Since _writeBody is set to _handle_text, which is used when no more specific handler exists, the problem should be fixed by adding a binary body handler to BytesGenerator. Will create a separate issue to track the problem.
msg228292 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-03 02:11
There already is one: issue 19003.
msg228293 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-03 02:12
Well, that may not be exactly the same issue, but I suspect it is related.
History
Date User Action Args
2014-10-03 02:12:03 r.david.murray set messages: + msg228293
2014-10-03 02:11:09 r.david.murray set messages: + msg228292
2014-10-03 01:19:49 yu.zhao@getcwd.com set messages: + msg228290
2014-10-03 01:04:25 r.david.murray set messages: + msg228288
2014-10-03 00:36:01 yu.zhao@getcwd.com set nosy: + yu.zhao@getcwd.com; messages: + msg228285
2013-03-07 22:57:42 r.david.murray set status: open -> closed; stage: patch review -> resolved; messages: + msg183709; versions: + Python 3.4, - Python 2.7
2013-03-07 22:31:55 python-dev set nosy: + python-dev; messages: + msg183708
2013-03-04 02:13:36 r.david.murray set files: + generator_lineneds.patch; priority: normal -> high; messages: + msg183413; keywords: + patch; stage: needs patch -> patch review
2012-05-24 14:56:08 r.david.murray set assignee: r.david.murray ->; components: + email; nosy: + barry
2012-04-22 17:57:07 r.david.murray create