
email.generator.BytesGenerator corrupts data by changing line endings #63203

Closed
AlexanderKruppa mannequin opened this issue Sep 11, 2013 · 6 comments
Labels
topic-email type-bug An unexpected behavior, bug, or error

Comments

@AlexanderKruppa
Mannequin

AlexanderKruppa mannequin commented Sep 11, 2013

BPO 19003
Nosy @warsaw, @bitdancer, @xZise, @jayvdb
Files
  • issue19003_email.patch: patch for lib/email
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-11.21:27:22.023>
    created_at = <Date 2013-09-11.07:47:20.319>
    labels = ['type-bug', 'expert-email']
    title = 'email.generator.BytesGenerator corrupts data by changing line endings'
    updated_at = <Date 2016-09-11.21:27:22.020>
    user = 'https://bugs.python.org/AlexanderKruppa'

    bugs.python.org fields:

    activity = <Date 2016-09-11.21:27:22.020>
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-11.21:27:22.023>
    closer = 'r.david.murray'
    components = ['email']
    creation = <Date 2013-09-11.07:47:20.319>
    creator = 'Alexander.Kruppa'
    dependencies = []
    files = ['35799']
    hgrepos = []
    issue_num = 19003
    keywords = ['patch']
    message_count = 6.0
    messages = ['197476', '221827', '221828', '228496', '275857', '275859']
    nosy_count = 7.0
    nosy_names = ['barry', 'r.david.murray', 'python-dev', 'Alexander.Kruppa', '天一.何', 'xZise', 'jayvdb']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19003'
    versions = ['Python 3.5', 'Python 3.6']

    @AlexanderKruppa
    Mannequin Author

    AlexanderKruppa mannequin commented Sep 11, 2013

    This is a follow-up to bpo-16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way still leads to data corruption.

    Repost of the update I posted in bpo-16564:


    ~/build/Python-3.3.2$ ./python --version
    Python 3.3.2

    When modifying the test case in Lib/test/test_email/test_email.py like this:

    --- Lib/test/test_email/test_email.py	2013-05-15 18:32:55.000000000 +0200
    +++ Lib/test/test_email/test_email_mine.py	2013-09-10 14:22:08.160089440 +0200
    @@ -1461,17 +1461,17 @@
             # Issue 16564: This does not produce an RFC valid message, since to be
             # valid it should have a CTE of binary.  But the below works in
             # Python2, and is documented as working this way.
    -        bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
    +        bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
             msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
             # Treated as a string, this will be invalid code points.
    -        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
    +        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
             self.assertEqual(msg.get_payload(decode=True), bytesdata)
             s = BytesIO()
             g = BytesGenerator(s)
             g.flatten(msg)
             wireform = s.getvalue()
             msg2 = email.message_from_bytes(wireform)
    -        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
    +        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
             self.assertEqual(msg2.get_payload(decode=True), bytesdata)

    then running:

    ./python ./Tools/scripts/run_tests.py test_email

    results in:

    ======================================================================
    FAIL: test_binary_body_with_encode_noop (test_email_mine.TestMIMEApplication)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/localdisk/kruppaal/build/Python-3.3.2/Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
        self.assertEqual(msg2.get_payload(decode=True), bytesdata)
    AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

    The '\x0b' byte is incorrectly translated to '\x0b\n'; that is, a newline character is inserted after it.
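The modified test case can also be run as a standalone script that uses only the public email APIs the test already exercises. The helper name `roundtrip` is hypothetical. On 3.3.2 this is the comparison that fails; on an interpreter that includes the later line-ending fix it is expected to print True.

```python
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO


def roundtrip(data: bytes) -> bytes:
    """Flatten `data` as an application/octet-stream part (no CTE applied)
    with BytesGenerator, parse the bytes back, and return the decoded payload."""
    msg = MIMEApplication(data, _encoder=encoders.encode_noop)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    parsed = email.message_from_bytes(buf.getvalue())
    return parsed.get_payload(decode=True)


# Same data as the modified test: a leading vertical tab, then non-ASCII bytes.
bytesdata = b"\x0b\xfa\xfb\xfc\xfd\xfe\xff"
print(roundtrip(bytesdata) == bytesdata)
```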

    Encoding the bytes array:
    bytes(range(256))

    produces the following output (MIME headers stripped):

    0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a ................
    0000010: 0e0f 1011 1213 1415 1617 1819 1a1b 1c0a ................
    0000020: 1d0a 1e0a 1f20 2122 2324 2526 2728 292a ..... !"#$%&'()*
    0000030: 2b2c 2d2e 2f30 3132 3334 3536 3738 393a +,-./0123456789:
    0000040: 3b3c 3d3e 3f40 4142 4344 4546 4748 494a ;<=>?@ABCDEFGHIJ
    0000050: 4b4c 4d4e 4f50 5152 5354 5556 5758 595a KLMNOPQRSTUVWXYZ
    0000060: 5b5c 5d5e 5f60 6162 6364 6566 6768 696a [\]^_`abcdefghij
    0000070: 6b6c 6d6e 6f70 7172 7374 7576 7778 797a klmnopqrstuvwxyz
    0000080: 7b7c 7d7e 7f80 8182 8384 8586 8788 898a {|}~............
    0000090: 8b8c 8d8e 8f90 9192 9394 9596 9798 999a ................
    00000a0: 9b9c 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa ................
    00000b0: abac adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba ................
    00000c0: bbbc bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca ................
    00000d0: cbcc cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da ................
    00000e0: dbdc ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea ................
    00000f0: ebec edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa ................
    0000100: fbfc fdfe ff .....

    That is, a '\n' is inserted after '\x0b', '\x0c', '\x1c', '\x1d', and '\x1e',
    and '\x0d' is replaced by '\n'.
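The altered bytes correspond to the ASCII characters that Python's `str.splitlines()` treats as line boundaries, which go beyond '\r' and '\n'. The short check below lists them and contrasts `bytes.splitlines()`, which only recognises '\n', '\r' and '\r\n'. It illustrates the documented `str`/`bytes` semantics and is not code taken from the email package.

```python
# A character counts as a line boundary if it splits "x<char>y" into two pieces.
boundaries = [c for c in map(chr, range(128)) if len(("x" + c + "y").splitlines()) == 2]
print([hex(ord(c)) for c in boundaries])
# ['0xa', '0xb', '0xc', '0xd', '0x1c', '0x1d', '0x1e']

# bytes.splitlines() does not treat \x0b as a boundary, so the same data
# stays in one piece while it is still bytes.
print(b"x\x0by".splitlines())
# [b'x\x0by']
```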


    I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings.
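To test that suspicion, the sketch below approximates a `_write_lines()` that splits with `str.splitlines(True)`, strips '\r' and '\n' from each piece, and re-terminates it with '\n'. `write_lines_old` is a hypothetical reconstruction, not the library source. It also assumes the binary payload is held as a `str` decoded with the `surrogateescape` error handler (the way `Message.set_payload()` stores bytes). Under those assumptions it reproduces the 261-byte dump above as well as the `b'\x0b\n\xfa...'` test failure.

```python
def write_lines_old(text: str, nl: str = "\n") -> str:
    """Hypothetical approximation of the pre-fix line-ending handling."""
    if not text:
        return ""
    # splitlines(True) also breaks on \x0b, \x0c, \x1c, \x1d, \x1e, ...
    lines = text.splitlines(True)
    out = []
    for line in lines[:-1]:
        # Only \r and \n are stripped, so \x0b etc. survive and gain a new \n.
        out.append(line.rstrip("\r\n"))
        out.append(nl)
    last = lines[-1]
    stripped = last.rstrip("\r\n")
    out.append(stripped)
    if len(stripped) != len(last):
        out.append(nl)
    return "".join(out)


# Binary payloads are kept as str via surrogateescape (assumption noted above).
payload = bytes(range(256)).decode("ascii", "surrogateescape")
wire = write_lines_old(payload).encode("ascii", "surrogateescape")
print(len(wire))        # 261
print(wire[:16].hex())  # 000102030405060708090a0b0a0c0a0a
```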

    @AlexanderKruppa AlexanderKruppa mannequin added topic-email type-bug An unexpected behavior, bug, or error labels Sep 11, 2013
    @ghost

    ghost commented Jun 29, 2014

    Confirmed in Python 3.4.1.

    @ghost

    ghost commented Jun 29, 2014

    This patch adds special handling for MIMEApplication and may fix this issue.
    It can be verified with test_email.

    @xZise
    Mannequin

    xZise mannequin commented Oct 4, 2014

    I can confirm this on 3.4.1, and it is really annoying. However, the patch should set '_is_raw_payload' to False if the payload is set via 'set_payload' (the operations in 'set_raw_payload' then need to be switched).

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 11, 2016

    New changeset c0f5702e0f10 by R David Murray in branch '3.5':
    bpo-19003: Only replace \r and/or \n line endings in email.generator.
    https://hg.python.org/cpython/rev/c0f5702e0f10

    New changeset ccad4d142934 by R David Murray in branch 'default':
    Merge: bpo-19003: Only replace \r and/or \n line endings in email.generator.
    https://hg.python.org/cpython/rev/ccad4d142934

    @bitdancer
    Member

    I've fixed this to the extent that it is possible without adding support for the 'binary' CTE. That is, \r, \n, and \r\n are still replaced with the 'correct' line ending characters, which is the correct behavior under the RFCs even for binary data if the CTE is not 'binary'. bpo-18886 covers the enhancement of supporting the binary CTE.
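The behaviour described here can be sketched with one regular expression: only '\r\n', '\r' and '\n' count as line endings, and each is normalised to the output line separator. `write_lines_fixed` is a hypothetical helper, not the code in email.generator. A bare '\r' (0x0d) in the data is still rewritten, which is why an exact round trip of arbitrary bytes needs the 'binary' CTE tracked in bpo-18886.

```python
import re

# Hypothetical sketch: only CRLF, CR and LF are line endings. CRLF is listed
# first so it is matched as a single ending rather than CR followed by LF.
LINE_END = re.compile(r"\r\n|\r|\n")


def write_lines_fixed(text: str, nl: str = "\n") -> str:
    """Normalise every CRLF, CR or LF in `text` to `nl`; leave all else alone."""
    return LINE_END.sub(nl, text)


data = bytes(range(256))
payload = data.decode("ascii", "surrogateescape")
wire = write_lines_fixed(payload).encode("ascii", "surrogateescape")

print(len(wire))                           # 256: nothing is inserted any more
print(wire == data.replace(b"\r", b"\n"))  # True: only the bare 0x0d changes
print(repr(write_lines_fixed("a\rb\nc\r\nd", "\r\n")))  # 'a\r\nb\r\nc\r\nd'
```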

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022