Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add special case for latin messages in email.mime.text #59221

Closed
mitya57 mannequin opened this issue Jun 6, 2012 · 8 comments
Closed

Add special case for latin messages in email.mime.text #59221

mitya57 mannequin opened this issue Jun 6, 2012 · 8 comments
Labels
topic-email type-feature A feature request or enhancement

Comments

@mitya57
Copy link
Mannequin

mitya57 mannequin commented Jun 6, 2012

BPO 15016
Nosy @warsaw, @bitdancer, @mitya57
Files
  • issue_15016_v2.patch: PATCH (v2): email: Add special case for latin texts
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-08.20:29:42.915>
    created_at = <Date 2012-06-06.09:13:28.180>
    labels = ['type-feature', 'expert-email']
    title = 'Add special case for latin messages in email.mime.text'
    updated_at = <Date 2016-09-08.20:29:42.912>
    user = 'https://github.com/mitya57'

    bugs.python.org fields:

    activity = <Date 2016-09-08.20:29:42.912>
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-08.20:29:42.915>
    closer = 'r.david.murray'
    components = ['email']
    creation = <Date 2012-06-06.09:13:28.180>
    creator = 'mitya57'
    dependencies = []
    files = ['25846']
    hgrepos = []
    issue_num = 15016
    keywords = ['patch']
    message_count = 8.0
    messages = ['162399', '162400', '162408', '162410', '163697', '163718', '163791', '275139']
    nosy_count = 4.0
    nosy_names = ['barry', 'v+python', 'r.david.murray', 'mitya57']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15016'
    versions = ['Python 3.5', 'Python 3.6']

    @mitya57
    Copy link
    Mannequin Author

    mitya57 mannequin commented Jun 6, 2012

    (Follow-up to bpo-14380)

    The attached patch makes the email.mime.text.MIMEText constructor use the iso-8859-1 (aka latin-1) encoding for messages where all characters are in range(256). This also makes them use quoted-printable transfer encoding instead of base64.

    So, the current algorithm of guessing encoding is as follows:

    • all characters are in range(128) -> encoding is us-ascii
    • all characters are in range(256) -> encoding is iso-8859-1 (aka latin-1)
    • else -> encoding is utf-8

    @mitya57 mitya57 mannequin added type-bug An unexpected behavior, bug, or error topic-email labels Jun 6, 2012
    @mitya57
    Copy link
    Mannequin Author

    mitya57 mannequin commented Jun 6, 2012

    Updated the patch:

    • Avoid using letter Ш in test, it's better to use chr(256) as the test case;
    • Updated the comment in MIMEText constructor to reflect the new behaviour.

    @bitdancer
    Copy link
    Member

    Thanks for the patch. I may not get to this until after the beta (or I might, you never know).

    Could you submit a contributor agreement please? http://www.python.org/psf/contrib

    @bitdancer bitdancer changed the title [patch] add special case for latin messages in email.mime.text Add special case for latin messages in email.mime.text Jun 6, 2012
    @mitya57
    Copy link
    Mannequin Author

    mitya57 mannequin commented Jun 6, 2012

    Done, sent an e-mail to contributors@python.org.

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Jun 24, 2012

    Patch is interesting, using an encoder to detect validity. However, it suffers from some performance problems for long text that has large ASCII prefixes.

    This seems to be an enhancement sort of request rather than a bug... so I wonder why Python 3.2 is listed?

    And in Python 3.3 with PEP-393 strings the C API to strings provides a quick way to determine the maximum character in the string... although I see nothing in the PEP about how to access that information from Python. If it is available, it could provide a much quicker precheck rather than multiple attempts to encode strings with large ASCII prefixes only to discover that the next to last character is in (128,255) and the last character is > 255 (which would be about the worst case scenario for the algorithm in the patch).

    @mitya57
    Copy link
    Mannequin Author

    mitya57 mannequin commented Jun 24, 2012

    This seems to be an enhancement sort of request rather than a bug... so I wonder why Python 3.2 is listed?

    Fixed.

    ... although I see nothing in the PEP about how to access that information from Python.

    You are right, it seems there is no Python API for that (yet?), so I don't see any better solutions for determining the maximum character for now. Also, note that this algorithm had already been used before my patch.

    @mitya57 mitya57 mannequin added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jun 24, 2012
    @bitdancer
    Copy link
    Member

    Well, the original change to using utf-8 by default was considered a bug fix. But I suppose you are right that this goes beyond that into enhancement territory. In which case we could wait for an enhancement to the C API to base it on, for which we'd need to open a new issue.

    On the other hand, the email package already uses the "encode to see if we have ascii" trick elsewhere (though on smaller strings), and the ascii codec is the fastest codec, with latin-1 only slightly slower.

    The critical difference here, though, is that we end up doing two encoding passes, once to test it and a second time to actually create the message body. The same is true of the ascii case. It should be possible to fix this, by using the encoded string in generating the _payload, short circuiting the set_payload mechanism. That's a somewhat ugly hack, necessitated because of the incomplete conversion of email to a unicode-centric design. I'm working on that :)

    So, again, we may be waiting on other enhancements, in this case in the email package, to do this fix "right". But it would be worth figuring out *how* to do it, so that we know what kind of (internal?) API enhancements we want in order to serve this kind of use case.

    @bitdancer
    Copy link
    Member

    The new email API (which was just made non-provisional) uses a "sniff" technique to decide what CTE to use for text bodies set via set_content. So I consider this done (finally). It does not change MIMEText, which is now the legacy API.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-email type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant