Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Also support SMTPUTF8 in smtplib's send_message method. #68406

Closed
bitdancer opened this issue May 17, 2015 · 17 comments
Closed

Also support SMTPUTF8 in smtplib's send_message method. #68406

bitdancer opened this issue May 17, 2015 · 17 comments
Labels
topic-email type-feature A feature request or enhancement

Comments

@bitdancer
Copy link
Member

BPO 24218
Nosy @warsaw, @bitdancer, @soltysh, @matrixise, @jenstroeger
Files
  • send_message_smtputf8.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-05-17.23:45:06.566>
    created_at = <Date 2015-05-17.16:21:59.940>
    labels = ['type-feature', 'expert-email']
    title = "Also support SMTPUTF8 in smtplib's send_message method."
    updated_at = <Date 2018-08-17.23:03:08.448>
    user = 'https://github.com/bitdancer'

    bugs.python.org fields:

    activity = <Date 2018-08-17.23:03:08.448>
    actor = '_savage'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-05-17.23:45:06.566>
    closer = 'r.david.murray'
    components = ['email']
    creation = <Date 2015-05-17.16:21:59.940>
    creator = 'r.david.murray'
    dependencies = []
    files = ['39409']
    hgrepos = []
    issue_num = 24218
    keywords = ['patch']
    message_count = 17.0
    messages = ['243413', '243425', '243426', '243433', '243435', '322761', '322767', '322772', '322773', '322789', '323044', '323216', '323559', '323565', '323634', '323662', '323687']
    nosy_count = 6.0
    nosy_names = ['barry', 'r.david.murray', 'python-dev', 'maciej.szulik', 'matrixise', '_savage']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24218'
    versions = ['Python 3.5']

    @bitdancer
    Copy link
    Member Author

    Now that I've committed bpo-24211, we can also add SMTPUTF8 support to smptlib's send_message command. See attached patch.

    @bitdancer bitdancer added type-feature A feature request or enhancement topic-email labels May 17, 2015
    @soltysh
    Copy link

    soltysh commented May 17, 2015

    David one small comment regarding typo in smtplib.py, but most importantly I'd suggest adding additional test case to cover the path (the newly added one) where you get a UnicodeEncodeError upon encoding from or to with UTF8 and then failing to find SMTPUTF8 on the server side. I see we already have test case to cover SMTPNotSupportedError but this covers just the case where the server does not have SMTPUTF8.

    @bitdancer
    Copy link
    Member Author

    Oh, right, that's what I get for doing this at the end of a long chain of patch reviews :). I added that code after I'd written the test and forgot to go back and test it.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 17, 2015

    New changeset 30795a477f85 by R David Murray in branch 'default':
    bpo-24218: Add SMTPUTF8 support to send_message.
    https://hg.python.org/cpython/rev/30795a477f85

    @bitdancer
    Copy link
    Member Author

    Thanks Maciej.

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Jul 31, 2018

    I was about to open an issue when I found this one.

    Consider an email message with the following:

    message = EmailMessage()
    message["From"] = Address(addr_spec="bar@foo.com", display_name="Jens Troeger")
    message["To"] = Address(addr_spec="foo@bar.com", display_name="Martín Córdoba")

    It’s important here that the email itself is ascii encodable, but the names are not. With that combination, send_message() falsely assumes plain text addresses (see https://github.com/python/cpython/blob/master/Lib/smtplib.py#L949 where it checks only email addresses, not display names!) and therefore the international flag stays False.

    As a result of that, flattening the email object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple linefeeds, thus breaking the email header, thus mangling the entire email:

    flatmsg: b'From: Jens Troeger <jens@talaera.com>\r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…'

    I think a proper fix would be in line 949, where email addresses and display names should be checked for encoding.

    The comment to that function should also be adjusted to mention display names?

    Note also that the attached patch does not test the above scenario, and should probably be extended as well.

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Jul 31, 2018

    (continuing the previous message msg322761)

    …unless the addresses should be checked separately from the display names, in which case the BytesGenerator’s flatten() function should be fixed.

    Without reading the RFC, please let me know how to continue from here.

    @bitdancer
    Copy link
    Member Author

    Well, posting on a closed issue is generally not the best way :)

    The current behavior with regards to the SMTPUTF8 flag is correct (it only matters for *addresses*, display names can already be transmitted if they contain non-ascii using non SMTPUTF8 methods).

    The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment.

    @matrixise
    Copy link
    Member

    Hi David,

    What is the related issue with the new lines?

    On 31 Jul 2018, at 15:18, R. David Murray <report@bugs.python.org> wrote:

    R. David Murray <rdmurray@bitdance.com> added the comment:

    Well, posting on a closed issue is generally not the best way :)

    The current behavior with regards to the SMTPUTF8 flag is correct (it only matters for *addresses*, display names can already be transmitted if they contain non-ascii using non SMTPUTF8 methods).

    The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue24218\>



    Python-bugs-list mailing list
    Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/stephane%40wirtel.be

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Jul 31, 2018

    Well, posting on a closed issue is generally not the best way :)

    Fair enough ;)

    The multiple carriage returns is a bug, and there is an open issue for it, though I'm not finding it at the moment.

    Oh good, yes that should be fixed!

    My current workaround is setting international = True always, which prevents the multiple CRs. Not pretty but it works…

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Aug 3, 2018

    So that’s interesting. I thought that setting international = True (see line https://github.com/python/cpython/blob/master/Lib/smtplib.py#L947) would be a neat workaround, but the opposite.

    When delivering those emails to Gmail I started seeing

    Failed to send email: (555, b'5.5.2 Syntax error, goodbye. s53-v6sm1864855qts.5 - gsmtp', 'foo@bar.com')
    

    and it turns out (according to the IETF message linter, https://tools.ietf.org/tools/msglint/) that:

    -----------
    UNKNOWN: unknown header 'User-Agent' at line 4
    ERROR: missing mandatory header 'date' lines 1-7
    ERROR: missing mandatory header 'return-path' lines 1-7
    OK: found part text/plain line 9
    WARNING: line 13 too long (109 chars); text/plain shouldn't need folding (RFC
    2046-4.1.1)
    WARNING: line 15 too long (124 chars); text/plain shouldn't need folding (RFC
    2046-4.1.1)
    WARNING: Character set mislabelled as 'utf-8' when 'us-ascii' suffices, body
    part ending line 22
    -----------

    It seems that now “Date” and “Return-Path” header entries are missing when the email is generated.

    I reverted the initial change. Any updates on the multiple CR problem when flattening?

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Aug 6, 2018

    David, I tried to find the mentioned '\r\r…\n' issue but I could not find it here. However, from an initial investigation into the BytesGenerator, here is what’s happening.

    Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry, which builds the flattened and encoded final string by descending into the parse tree and encoding & concatenating the individual “parts” (tokens of the header entry).

    Given the parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string:

    '=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\n'
    

    at the bottom of the recursion for this “Mailbox” part. The recursive callstack is then:

    _refold_parse_tree _header_value_parser.py:2687
    fold [Mailbox] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [Address] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [AddressList] _header_value_parser.py:144
    _refold_parse_tree _header_value_parser.py:2630
    fold [Header] _header_value_parser.py:144
    fold [_UniqueAddressHeader] headerregistry.py:258
    _fold [EmailPolicy] policy.py:205
    fold_binary [EmailPolicy] policy.py:199
    _write_headers [BytesGenerator] generator.py:418
    _write [BytesGenerator] generator.py:195
    

    The problem now arises from the interplay of

        # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
        encoded_part = part.fold(policy=policy)[:-1] # strip nl

    which strips the '\n' from the returned string, and

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686
    return policy.linesep.join(lines) + policy.linesep
    

    which adds the policy’s line separation string linesep="\r\n" to the end of the flattened string upon unrolling the recursion.

    I am not sure about a proper fix here, but considering that the linesep policy can be any string length (in this case len("\r\n") == 2) a fixed truncation of one character [:-1] seems wrong. Instead, using:

        encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl

    seems to work for entries with and without Unicode characters in their display names.

    David, please advise on how to proceed from here.

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Aug 15, 2018

    @david, any thoughts on this?

    @bitdancer
    Copy link
    Member Author

    Sorry, I haven't had time to look at it yet :( Not sure when I will, things are more than a bit busy for me right now. Ping me again in two weeks if I haven't responded, please. The proposed solution sounds reasonable, though, so you could also propose a PR with tests and that would take me less time to review and I might get to it sooner.

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Aug 17, 2018

    Thanks David: PR on Github (which is R/O) or where should I submit to?

    @bitdancer
    Copy link
    Member Author

    check out https://devguide.python.org. (Basically, banch and generate a PR on github). And please open a new issue for this.

    @jenstroeger
    Copy link
    Mannequin

    jenstroeger mannequin commented Aug 17, 2018

    New issue: https://bugs.python.org/issue34424

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-email type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants