classification
Title: email.generator.BytesGenerator corrupts data by changing line endings
Type: behavior Stage: resolved
Components: email Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Alexander.Kruppa, barry, jayvdb, python-dev, r.david.murray, xZise, 天一.何
Priority: normal Keywords: patch

Created on 2013-09-11 07:47 by Alexander.Kruppa, last changed 2016-09-11 21:27 by r.david.murray. This issue is now closed.

Files
File name  Uploaded  Description
issue19003_email.patch  天一.何, 2014-06-29 02:39  patch for lib/email
Messages (6)
msg197476 - Author: Alexander Kruppa (Alexander.Kruppa) Date: 2013-09-11 07:47
This is a follow-up to #16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way leads to data corruption.

Repost of the update I posted in #16564:

***********************************************************
~/build/Python-3.3.2$ ./python --version
Python 3.3.2

When modifying the test case in Lib/test/test_email/test_email.py like this:

--- Lib/test/test_email/test_email.py	2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py	2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
         # Issue 16564: This does not produce an RFC valid message, since to be
         # valid it should have a CTE of binary.  But the below works in
         # Python2, and is documented as working this way.
-        bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+        bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
         msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
         # Treated as a string, this will be invalid code points.
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg.get_payload(decode=True), bytesdata)
         s = BytesIO()
         g = BytesGenerator(s)
         g.flatten(msg)
         wireform = s.getvalue()
         msg2 = email.message_from_bytes(wireform)
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg2.get_payload(decode=True), bytesdata)

then running:

./python ./Tools/scripts/run_tests.py test_email

results in:

======================================================================
FAIL: test_binary_body_with_encode_noop (test_email_mine.TestMIMEApplication)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/localdisk/kruppaal/build/Python-3.Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
    self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

The '\x0b' byte is incorrectly translated to '\x0b\n'; that is, a newline character is inserted after it.

Encoding the bytes array:
bytes(range(256))

results in the following output (MIME headers stripped):

0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a  ................
0000010: 0e0f 1011 1213 1415 1617 1819 1a1b 1c0a  ................
0000020: 1d0a 1e0a 1f20 2122 2324 2526 2728 292a  ..... !"#$%&'()*
0000030: 2b2c 2d2e 2f30 3132 3334 3536 3738 393a  +,-./0123456789:
0000040: 3b3c 3d3e 3f40 4142 4344 4546 4748 494a  ;<=>?@ABCDEFGHIJ
0000050: 4b4c 4d4e 4f50 5152 5354 5556 5758 595a  KLMNOPQRSTUVWXYZ
0000060: 5b5c 5d5e 5f60 6162 6364 6566 6768 696a  [\]^_`abcdefghij
0000070: 6b6c 6d6e 6f70 7172 7374 7576 7778 797a  klmnopqrstuvwxyz
0000080: 7b7c 7d7e 7f80 8182 8384 8586 8788 898a  {|}~............
0000090: 8b8c 8d8e 8f90 9192 9394 9596 9798 999a  ................
00000a0: 9b9c 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa  ................
00000b0: abac adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba  ................
00000c0: bbbc bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca  ................
00000d0: cbcc cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da  ................
00000e0: dbdc ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea  ................
00000f0: ebec edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa  ................
0000100: fbfc fdfe ff                             .....

That is, a '\n' is inserted after '\x0b', '\x0c', '\x1c', '\x1d', and '\x1e',
and '\x0d' is replaced by '\n'.
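The affected bytes line up with the characters Python's str.splitlines() treats as line boundaries. The sketch below (illustrative, not part of the original report; the helper names are hypothetical) lists the ASCII code points that split a line. It also checks that a high byte such as 0x85, which is a boundary as the str character '\x85', is not one after being decoded with surrogateescape, which is how a bytes payload is stored by Message.set_payload().

```python
def ascii_line_boundaries():
    """Return the ASCII code points that str.splitlines() treats as line breaks."""
    # A character is a boundary if appending 'x' after it yields two pieces.
    return [c for c in range(128) if len((chr(c) + 'x').splitlines()) == 2]


def escaped_high_byte_is_boundary(b):
    """Check a byte >= 0x80 after surrogateescape decoding (it becomes a lone surrogate)."""
    ch = bytes([b]).decode('ascii', 'surrogateescape')
    return len((ch + 'x').splitlines()) == 2


if __name__ == '__main__':
    print([hex(c) for c in ascii_line_boundaries()])
    print(escaped_high_byte_is_boundary(0x85))
```

This explains why only bytes below 0x80 are touched in the dump: every byte from 0x80 up reaches the generator as a surrogate escape such as '\udc85', which splitlines() does not split on.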

***********************************************************

I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings.
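A standalone approximation of the two code paths makes the suspected mechanism concrete. The functions old_write_lines(), new_write_lines() and rewrite_bytes() are hypothetical helpers, not library API, and they return text instead of writing to a file. old_write_lines() assumes the pre-fix logic split the payload with str.splitlines(True) and stripped only '\r'/'\n' before appending the newline; new_write_lines() assumes a CR/LF-only regex in the spirit of the eventual fix ("Only replace \r and/or \n line endings").

```python
import re

# Assumed CR/LF-only line-ending pattern, modeled on the fix description.
NLCRE = re.compile(r'\r\n|\r|\n')


def old_write_lines(text, nl='\n'):
    """Hypothetical approximation of the pre-fix _write_lines(), returning a str."""
    if not text:
        return ''
    out = []
    # str.splitlines() also breaks on \x0b, \x0c, \x1c, \x1d and \x1e.
    lines = text.splitlines(True)
    for line in lines[:-1]:
        # Only \r and \n are stripped, so e.g. '\x0b' survives and gets a newline appended.
        out.append(line.rstrip('\r\n'))
        out.append(nl)
    last = lines[-1].rstrip('\r\n')
    out.append(last)
    if len(lines[-1]) != len(last):
        out.append(nl)
    return ''.join(out)


def new_write_lines(text, nl='\n'):
    """Hypothetical approximation of the post-fix behavior: only CR/LF are rewritten."""
    if not text:
        return ''
    lines = NLCRE.split(text)
    out = []
    for line in lines[:-1]:
        out.append(line)
        out.append(nl)
    # A trailing line ending leaves an empty final element, which adds nothing.
    out.append(lines[-1])
    return ''.join(out)


def rewrite_bytes(data, write_lines):
    """Decode as a binary payload is stored (surrogateescape), rewrite, re-encode."""
    text = data.decode('ascii', 'surrogateescape')
    return write_lines(text).encode('ascii', 'surrogateescape')


if __name__ == '__main__':
    sample = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
    print(rewrite_bytes(sample, old_write_lines))
    print(rewrite_bytes(sample, new_write_lines))
    print(len(rewrite_bytes(bytes(range(256)), old_write_lines)))
```

Fed bytes(range(256)), the old approximation inserts five bytes (261 in total) in the same places as the hex dump above, while the new one leaves the data alone except for rewriting '\x0d' as '\n'.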
msg221827 - Author: 天一 何 (天一.何) * Date: 2014-06-29 02:30
Confirmed in Python 3.4.1.
msg221828 - Author: 天一 何 (天一.何) * Date: 2014-06-29 02:38
This patch adds special handling for MIMEApplication and may fix this issue.
It can be verified with test_email.
msg228496 - Author: Fabian (xZise) * Date: 2014-10-04 21:12
I can confirm this on 3.4.1, and it is really annoying. However, the patch should set '_is_raw_payload' to False when the payload is set via 'set_payload' (the operations in 'set_raw_payload' need to be swapped).
msg275857 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-11 21:23
New changeset c0f5702e0f10 by R David Murray in branch '3.5':
#19003: Only replace \r and/or \n line endings in email.generator.
https://hg.python.org/cpython/rev/c0f5702e0f10

New changeset ccad4d142934 by R David Murray in branch 'default':
Merge: #19003: Only replace \r and/or \n line endings in email.generator.
https://hg.python.org/cpython/rev/ccad4d142934
msg275859 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-11 21:27
I've fixed this to the extent that it is possible without adding support for the 'binary' CTE.  That is, \r, \n, and \r\n are still replaced with the 'correct' line ending characters, which is the correct behavior under the RFCs even for binary data if the CTE is not 'binary'.  Issue 18886 covers the enhancement of supporting the binary CTE.
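To check the behavior described above on a given interpreter, the test case from the original report can be reduced to a standalone script using the same public API (MIMEApplication, encoders.encode_noop, BytesGenerator, email.message_from_bytes). This is a sketch; roundtrip() is a hypothetical helper name. On a Python that includes the changesets above, the '\x0b' payload should come back unchanged, while a lone '\r' is still rewritten to '\n' because the message carries no 'binary' CTE.

```python
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO


def roundtrip(data):
    """Flatten an un-encoded MIMEApplication with BytesGenerator, reparse, decode."""
    # encode_noop leaves the bytes payload as-is and adds no Content-Transfer-Encoding.
    msg = MIMEApplication(data, _encoder=encoders.encode_noop)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    msg2 = email.message_from_bytes(buf.getvalue())
    return msg2.get_payload(decode=True)


if __name__ == '__main__':
    # Vertical tab plus non-ASCII bytes: the case from the original report.
    print(roundtrip(b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'))
    # A lone CR is still normalized, since CR and LF are genuine line endings.
    print(roundtrip(b'a\rb\xff'))
```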
History
Date  User  Action  Args
2016-09-11 21:27:22  r.david.murray  set  status: open -> closed
    versions: + Python 3.5, Python 3.6, - Python 3.2, Python 3.3, Python 3.4
    messages: + msg275859
    resolution: fixed
    stage: resolved
2016-09-11 21:23:47  python-dev  set  nosy: + python-dev
    messages: + msg275857
2015-09-19 02:04:56  jayvdb  set  nosy: + jayvdb
2014-10-04 21:12:30  xZise  set  nosy: + xZise
    messages: + msg228496
2014-06-29 02:39:00  天一.何  set  files: + issue19003_email.patch
    keywords: + patch
    messages: + msg221828
2014-06-29 02:30:57  天一.何  set  nosy: + 天一.何
    messages: + msg221827
    versions: + Python 3.4
2013-09-11 07:47:20  Alexander.Kruppa  create