This is a follow-up to #16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way corrupts the data.
Repost of the update I posted in #16564:
***********************************************************
~/build/Python-3.3.2$ ./python --version
Python 3.3.2
When modifying the test case in Lib/test/test_email/test_email.py like this:
--- Lib/test/test_email/test_email.py 2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py 2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
# Issue 16564: This does not produce an RFC valid message, since to be
# valid it should have a CTE of binary. But the below works in
# Python2, and is documented as working this way.
- bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+ bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
# Treated as a string, this will be invalid code points.
- self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+ # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
self.assertEqual(msg.get_payload(decode=True), bytesdata)
s = BytesIO()
g = BytesGenerator(s)
g.flatten(msg)
wireform = s.getvalue()
msg2 = email.message_from_bytes(wireform)
- self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+ # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
self.assertEqual(msg2.get_payload(decode=True), bytesdata)
then running:
./python ./Tools/scripts/run_tests.py test_email
results in:
======================================================================
FAIL: test_binary_body_with_encode_noop (test_email_mine.TestMIMEApplication)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/localdisk/kruppaal/build/Python-3.Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
The '\x0b' byte is incorrectly translated to '\x0b\n', i.e., a newline character is inserted.
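For anyone who wants to check this without patching the test suite, the modified test can be condensed into a standalone script along these lines. This is only a sketch: the helper name round_trip is made up for illustration, and the only library calls are the ones already used in the test above. No expected output is shown because the result depends on the Python version being run.

```python
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO


def round_trip(payload):
    """Flatten a binary MIME part to wire form, parse it back, return the payload.

    Hypothetical helper; it mirrors the steps of the modified test case.
    """
    msg = MIMEApplication(payload, _encoder=encoders.encode_noop)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    parsed = email.message_from_bytes(buf.getvalue())
    return parsed.get_payload(decode=True)


if __name__ == "__main__":
    original = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
    result = round_trip(original)
    print("sent:    ", original)
    print("received:", result)
    print("intact:  ", result == original)
```

If "intact" prints False, the interpreter in use shows the corruption described here.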
Encoding the bytes array:
bytes(range(256))
results in the following output data (MIME header stripped):
0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a ................
0000010: 0e0f 1011 1213 1415 1617 1819 1a1b 1c0a ................
0000020: 1d0a 1e0a 1f20 2122 2324 2526 2728 292a ..... !"#$%&'()*
0000030: 2b2c 2d2e 2f30 3132 3334 3536 3738 393a +,-./0123456789:
0000040: 3b3c 3d3e 3f40 4142 4344 4546 4748 494a ;<=>?@ABCDEFGHIJ
0000050: 4b4c 4d4e 4f50 5152 5354 5556 5758 595a KLMNOPQRSTUVWXYZ
0000060: 5b5c 5d5e 5f60 6162 6364 6566 6768 696a [\]^_`abcdefghij
0000070: 6b6c 6d6e 6f70 7172 7374 7576 7778 797a klmnopqrstuvwxyz
0000080: 7b7c 7d7e 7f80 8182 8384 8586 8788 898a {|}~............
0000090: 8b8c 8d8e 8f90 9192 9394 9596 9798 999a ................
00000a0: 9b9c 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa ................
00000b0: abac adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba ................
00000c0: bbbc bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca ................
00000d0: cbcc cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da ................
00000e0: dbdc ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea ................
00000f0: ebec edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa ................
0000100: fbfc fdfe ff .....
That is, a '\n' is inserted after '\x0b', '\x1c', '\x1d', and '\x1e',
and '\x0d' is replaced by '\n\n'.
***********************************************************
I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings.
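To illustrate the suspicion, here is a rough sketch of the kind of line handling that would produce exactly this corruption. It is not copied from the library source: write_lines_sketch is a hypothetical name, and the logic is an assumption, namely that the payload is held as a str with surrogate escapes, split with str.splitlines(), stripped of trailing '\r' and '\n', and rejoined with '\n'. The one thing that is certain is that str.splitlines() treats '\x0b', '\x0c', '\x1c', '\x1d' and '\x1e' as line boundaries in addition to '\r' and '\n'.

```python
def write_lines_sketch(payload, newline="\n"):
    """Approximate the suspected line-ending handling (assumption, not library code)."""
    # Bytes >= 0x80 become lone surrogates, so they survive the str round trip.
    text = payload.decode("ascii", "surrogateescape")
    if not text:
        return b""
    # str.splitlines() splits on \n and \r, but also on \x0b, \x0c,
    # \x1c, \x1d and \x1e, which are ordinary data in a binary payload.
    lines = text.splitlines(True)
    out = []
    for line in lines[:-1]:
        # Only \r and \n are stripped, so the other separators stay in
        # the output and get an extra newline appended after them.
        out.append(line.rstrip("\r\n"))
        out.append(newline)
    last = lines[-1].rstrip("\r\n")
    out.append(last)
    if len(last) != len(lines[-1]):
        out.append(newline)
    return "".join(out).encode("ascii", "surrogateescape")


mangled = write_lines_sketch(b'\x0b\xfa\xfb\xfc\xfd\xfe\xff')
print(mangled)  # b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff'
```

Applied to bytes(range(256)), this sketch yields 261 bytes with the same byte sequence as the hex dump above. Under this reading, the '\n' following '\x0c' comes from '\x0c' being treated as a line boundary, and '\x0d' itself turns into a single '\n'.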