Issue 19003: email.generator.BytesGenerator corrupts data by changing line endings

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/63203

classification

Title:	email.generator.BytesGenerator corrupts data by changing line endings
Type:	behavior	Stage:	resolved
Components:	email	Versions:	Python 3.6, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Alexander.Kruppa, barry, jayvdb, python-dev, r.david.murray, xZise, 天一.何
Priority:	normal	Keywords:	patch

Created on 2013-09-11 07:47 by Alexander.Kruppa, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
issue19003_email.patch	天一.何, 2014-06-29 02:39	patch for lib/email

Messages (6)
msg197476 - (view)	Author: Alexander Kruppa (Alexander.Kruppa)	Date: 2013-09-11 07:47
This is a follow-up to #16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way leads to data corruption. Repost of the update I posted in #16564:

***********************************************************

~/build/Python-3.3.2$ ./python --version
Python 3.3.2

When modifying the test case in Lib/test/test_email/test_email.py like this:

--- Lib/test/test_email/test_email.py 2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py 2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
         # Issue 16564: This does not produce an RFC valid message, since to be
         # valid it should have a CTE of binary. But the below works in
         # Python2, and is documented as working this way.
-        bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+        bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
         msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
         # Treated as a string, this will be invalid code points.
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg.get_payload(decode=True), bytesdata)
         s = BytesIO()
         g = BytesGenerator(s)
         g.flatten(msg)
         wireform = s.getvalue()
         msg2 = email.message_from_bytes(wireform)
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg2.get_payload(decode=True), bytesdata)

then running:

./python ./Tools/scripts/run_tests.py test_email

results in:

======================================================================
FAIL: test_binary_body_with_encode_noop (test_email_mine.TestMIMEApplication)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/localdisk/kruppaal/build/Python-3.Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
    self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

The '\x0b' byte is incorrectly translated to '\x0b\n', i.e., a newline character is inserted.

Encoding the bytes array bytes(range(256)) results in this output data (MIME header stripped):

0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a ................
0000010: 0e0f 1011 1213 1415 1617 1819 1a1b 1c0a ................
0000020: 1d0a 1e0a 1f20 2122 2324 2526 2728 292a ..... !"#$%&'()*
0000030: 2b2c 2d2e 2f30 3132 3334 3536 3738 393a +,-./0123456789:
0000040: 3b3c 3d3e 3f40 4142 4344 4546 4748 494a ;<=>?@ABCDEFGHIJ
0000050: 4b4c 4d4e 4f50 5152 5354 5556 5758 595a KLMNOPQRSTUVWXYZ
0000060: 5b5c 5d5e 5f60 6162 6364 6566 6768 696a [\]^_`abcdefghij
0000070: 6b6c 6d6e 6f70 7172 7374 7576 7778 797a klmnopqrstuvwxyz
0000080: 7b7c 7d7e 7f80 8182 8384 8586 8788 898a {|}~............
0000090: 8b8c 8d8e 8f90 9192 9394 9596 9798 999a ................
00000a0: 9b9c 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa ................
00000b0: abac adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba ................
00000c0: bbbc bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca ................
00000d0: cbcc cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da ................
00000e0: dbdc ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea ................
00000f0: ebec edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa ................
0000100: fbfc fdfe ff                             .....

That is, a '\n' is inserted after '\x0b', '\x1c', '\x1d', and '\x1e', and '\x0d' is replaced by '\n\n'.

***********************************************************

I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings.
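The suspicion above can be illustrated without the email package. The following is a minimal sketch, assuming the line rewriting at the time relied on str.splitlines(), which treats more characters than \r and \n as line boundaries. The helper names rewrite_with_splitlines and rewrite_crlf_only are hypothetical stand-ins for illustration, not the library's code.

```python
import re

# Code points below 256 that str.splitlines() treats as line boundaries.
SPLITLINES_BOUNDARIES = [
    i for i in range(256) if len((chr(i) + "x").splitlines()) == 2
]

# Pattern matching only the three conventional line endings.
NLCRE = re.compile(r"\r\n|\r|\n")


def rewrite_with_splitlines(text, nl="\n"):
    """Hypothetical stand-in for the suspected old behavior (an assumption)."""
    lines = text.splitlines(True)
    if not lines:
        return ""
    # rstrip() removes only \r and \n, so a boundary such as \x0b is kept
    # and additionally gets a newline appended after it.
    out = [line.rstrip("\r\n") + nl for line in lines[:-1]]
    last = lines[-1].rstrip("\r\n")
    out.append(last + nl if last != lines[-1] else last)
    return "".join(out)


def rewrite_crlf_only(text, nl="\n"):
    """Hypothetical stand-in for a rewrite limited to \\r, \\n and \\r\\n."""
    return NLCRE.sub(nl, text)


print([hex(i) for i in SPLITLINES_BOUNDARIES])
print(repr(rewrite_with_splitlines("\x0b\xfa")))  # newline inserted after \x0b
print(repr(rewrite_crlf_only("\x0b\xfa")))        # text left unchanged
```

Feeding the first 32 code points through rewrite_with_splitlines gives the same byte pattern as the start of the hexdump in the report (0b 0a, 0c 0a 0a, 1c 0a, 1d 0a, 1e 0a). Byte 0x85 is untouched in the hexdump even though str.splitlines() splits on U+0085, which is consistent with non-ASCII payload bytes being carried as surrogate escapes rather than as their Latin-1 code points.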
msg221827 - (view)	Author: 天一何 (天一.何) *	Date: 2014-06-29 02:30
Confirmed in Python 3.4.1.
msg221828 - (view)	Author: 天一何 (天一.何) *	Date: 2014-06-29 02:38
This patch adds special handling for MIMEApplication and may fix this issue. It can be verified with test_email.
msg228496 - (view)	Author: Fabian (xZise) *	Date: 2014-10-04 21:12
I can confirm this on 3.4.1, and it is really annoying. However, the patch should set '_is_raw_payload' to False when the payload is set via 'set_payload' (the operations in 'set_raw_payload' need to be swapped).
msg275857 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-09-11 21:23
New changeset c0f5702e0f10 by R David Murray in branch '3.5':
#19003: Only replace \r and/or \n line endings in email.generator.
https://hg.python.org/cpython/rev/c0f5702e0f10

New changeset ccad4d142934 by R David Murray in branch 'default':
Merge: #19003: Only replace \r and/or \n line endings in email.generator.
https://hg.python.org/cpython/rev/ccad4d142934
msg275859 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-09-11 21:27
I've fixed this to the extent that it is possible without adding support for the 'binary' CTE. That is, \r, \n, and \r\n are still replaced with the 'correct' line ending characters, which is the correct behavior under the RFCs even for binary data if the CTE is not 'binary'. Issue 18886 covers the enhancement of supporting the binary CTE.
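The resulting behavior can be checked with a round trip modelled on the test case from the original report. This is a sketch, assuming an interpreter that includes the changesets above; roundtrip is a hypothetical helper name. Passing encoders.encode_base64 shows one way to carry arbitrary bytes unchanged while the 'binary' CTE is unsupported.

```python
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO


def roundtrip(data, encoder=encoders.encode_noop):
    """Flatten a MIMEApplication part to bytes, parse it back, return the payload."""
    msg = MIMEApplication(data, _encoder=encoder)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    parsed = email.message_from_bytes(buf.getvalue())
    return parsed.get_payload(decode=True)


# \x0b is no longer treated as a line ending, so it should survive as is.
print(roundtrip(b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'))

# A bare \r is still a line ending and is rewritten to the generator's
# line separator, as described in the message above.
print(roundtrip(b'\xfa\r\xfb'))

# With base64 the payload is plain ASCII on the wire, so every byte value
# should come back unchanged.
print(roundtrip(bytes(range(256)), encoders.encode_base64) == bytes(range(256)))
```

With encode_noop, any \r, \n, or \r\n inside the data is still subject to line-ending normalization, so a transfer encoding such as base64 remains the safer choice for payloads that must be preserved byte for byte.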

History
Date	User	Action	Args
2022-04-11 14:57:50	admin	set	github: 63203
2016-09-11 21:27:22	r.david.murray	set	status: open -> closed versions: + Python 3.5, Python 3.6, - Python 3.2, Python 3.3, Python 3.4 messages: + msg275859 resolution: fixed stage: resolved
2016-09-11 21:23:47	python-dev	set	nosy: + python-dev messages: + msg275857
2015-09-19 02:04:56	jayvdb	set	nosy: + jayvdb
2014-10-04 21:12:30	xZise	set	nosy: + xZise messages: + msg228496
2014-06-29 02:39:00	天一.何	set	files: + issue19003_email.patch keywords: + patch messages: + msg221828
2014-06-29 02:30:57	天一.何	set	nosy: + 天一.何 messages: + msg221827 versions: + Python 3.4
2013-09-11 07:47:20	Alexander.Kruppa	create