Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode names break email header #78605

Closed
jenstroeger mannequin opened this issue Aug 17, 2018 · 13 comments
Closed

Unicode names break email header #78605

jenstroeger mannequin opened this issue Aug 17, 2018 · 13 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes topic-email type-bug An unexpected behavior, bug, or error

Comments

@jenstroeger
Copy link
Mannequin

jenstroeger mannequin commented Aug 17, 2018

BPO 34424
Nosy @warsaw, @bitdancer, @jenstroeger, @csabella, @miss-islington, @mhthies, @Celelibi
PRs
  • bpo-34424: Handle different policy.linesep lengths correctly. #8803
  • [3.7] bpo-34424: Handle different policy.linesep lengths correctly. (GH-8803) #13302
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-14.02:04:55.615>
    created_at = <Date 2018-08-17.23:02:02.421>
    labels = ['3.8', 'type-bug', '3.7', 'expert-email']
    title = 'Unicode names break email header'
    updated_at = <Date 2019-05-14.14:36:45.980>
    user = 'https://github.com/jenstroeger'

    bugs.python.org fields:

    activity = <Date 2019-05-14.14:36:45.980>
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-14.02:04:55.615>
    closer = 'cheryl.sabella'
    components = ['email']
    creation = <Date 2018-08-17.23:02:02.421>
    creator = '_savage'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34424
    keywords = ['patch']
    message_count = 13.0
    messages = ['323686', '323691', '328381', '328382', '328425', '334096', '342315', '342388', '342409', '342411', '342416', '342418', '342469']
    nosy_count = 7.0
    nosy_names = ['barry', 'r.david.murray', '_savage', 'cheryl.sabella', 'miss-islington', 'michael.thies', 'Celelibi']
    pr_nums = ['8803', '13302']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue34424'
    versions = ['Python 3.7', 'Python 3.8']

    @jenstroeger
    Copy link
    Mannequin Author

    jenstroeger mannequin commented Aug 17, 2018

    See also this comment and ensuing conversation: https://bugs.python.org/issue24218?#msg322761

    Consider an email message with the following:

    message = EmailMessage()
    message["From"] = Address(addr_spec="bar@foo.com", display_name="Jens Troeger")
    message["To"] = Address(addr_spec="foo@bar.com", display_name="Martín Córdoba")

    It’s important here that the email itself is ascii encodable, but the names are not. Flattening the object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple linefeeds, thus breaking the email header, thus mangling the entire email:

    flatmsg: b'From: Jens Troeger <bar@foo.com>\r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…'

    After an initial investigation into the BytesGenerator (used to flatten an EmailMessage object), here is what’s happening.

    Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry, which builds the flattened and encoded final string by descending into the parse tree and encoding & concatenating the individual “parts” (tokens of the header entry).

    Given the parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string:

    '=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\n'
    

    at the bottom of the recursion for this “Mailbox” part. The recursive callstack is then:

    _refold_parse_tree _header_value_parser.py:2687
    fold [Mailbox] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [Address] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [AddressList] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [Header] _header_value_parser.py:144
    fold [_UniqueAddressHeader] headerregistry.py:258
    _fold [EmailPolicy] policy.py:205
    fold_binary [EmailPolicy] policy.py:199
    _write_headers [BytesGenerator] generator.py:418
    _write [BytesGenerator] generator.py:195
    

    The problem now arises from the interplay of

        # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
        encoded_part = part.fold(policy=policy)[:-1] # strip nl

    which strips the '\n' from the returned string, and

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686
    return policy.linesep.join(lines) + policy.linesep
    

    which adds the policy’s line separation string linesep="\r\n" to the end of the flattened string upon unrolling the recursion.

    I am not sure about a proper fix here, but considering that the linesep policy can be any string length (in this case len("\r\n") == 2) a fixed truncation of one character [:-1] seems wrong. Instead, using:

        encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl

    seems to work for entries with and without Unicode characters in their display names.

    @jenstroeger jenstroeger mannequin added 3.7 (EOL) end of life topic-email type-bug An unexpected behavior, bug, or error labels Aug 17, 2018
    @jenstroeger
    Copy link
    Mannequin Author

    jenstroeger mannequin commented Aug 18, 2018

    Pull request #8803

    @bitdancer
    Copy link
    Member

    I've requested some small changes on the PR. If Jens doesn't respond in another week or so someone else could pick it up.

    @bitdancer
    Copy link
    Member

    Michael, if you could check if Jens patch fixes your problem I would appreciate it.

    @mhthies
    Copy link
    Mannequin

    mhthies mannequin commented Oct 25, 2018

    Thanks for pointing me to this issue. :)

    Michael, if you could check if Jens patch fixes your problem I would appreciate it.

    Jens PR does exactly, what I proposed in bpo-35057, so it fixes my problem, indeed.

    @jenstroeger
    Copy link
    Mannequin Author

    jenstroeger mannequin commented Jan 20, 2019

    Can somebody please review and merge #8803 ? I am still waiting for this fix the become mainstream.

    @csabella
    Copy link
    Contributor

    This looks like it was just pending final approval after changes. Please let me know if I can help move it along. Thanks!

    @csabella csabella added 3.8 only security fixes and removed 3.7 (EOL) end of life labels May 13, 2019
    @jenstroeger
    Copy link
    Mannequin Author

    jenstroeger mannequin commented May 13, 2019

    Cheryl, if you can find somebody to approve and merge this fix, that would be greatly appreciated! Anything I can do, please let me know.

    @bitdancer
    Copy link
    Member

    New changeset 45b2f88 by R. David Murray (Jens Troeger) in branch 'master':
    bpo-34424: Handle different policy.linesep lengths correctly. (bpo-8803)
    45b2f88

    @bitdancer
    Copy link
    Member

    Approved and merged. Cheryl, can you shepherd this through the backport process, please? I'm contributing infrequently enough that I'm not even sure which version we are on :)

    @csabella
    Copy link
    Contributor

    David,

    Thank you for the review and merging into master! I'd be glad to backport this. I believe the only backport would be to 3.7 (master is currently 3.8), but please let me know if it would need to go into a security branch.

    @miss-islington
    Copy link
    Contributor

    New changeset c0abd0c by Miss Islington (bot) in branch '3.7':
    bpo-34424: Handle different policy.linesep lengths correctly. (GH-8803)
    c0abd0c

    @csabella csabella added the 3.7 (EOL) end of life label May 14, 2019
    @bitdancer
    Copy link
    Member

    Thank you. I don't believe this is a security issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes topic-email type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants