Issue 15016: Add special case for latin messages in email.mime.text

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/59221

classification

Title:	Add special case for latin messages in email.mime.text
Type:	enhancement	Stage:	resolved
Components:	email	Versions:	Python 3.6, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	barry, mitya57, r.david.murray, v+python
Priority:	normal	Keywords:	patch

Created on 2012-06-06 09:13 by mitya57, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue_15016_v2.patch	mitya57, 2012-06-06 09:32	PATCH (v2): email: Add special case for latin texts	review

Messages (8)
msg162399 - (view)	Author: Dmitry Shachnev (mitya57) *	Date: 2012-06-06 09:13
(Follow-up to issue 14380) The attached patch makes the email.mime.text.MIMEText constructor use the iso-8859-1 (aka latin-1) encoding for messages where all characters are in range(256). This also makes them use quoted-printable transfer encoding instead of base64.

So, the current algorithm of guessing encoding is as follows:
- all characters are in range(128) -> encoding is us-ascii
- all characters are in range(256) -> encoding is iso-8859-1 (aka latin-1)
- else -> encoding is utf-8
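The three-way guess described in this message can be sketched as follows. This is only an illustration of the "try to encode and see" approach discussed in this issue, not the contents of the attached patch; the function name guess_charset is hypothetical.

```python
def guess_charset(text):
    """Return the narrowest of us-ascii, iso-8859-1, utf-8 that fits text.

    Hypothetical helper: it mirrors the rule from the message above by
    attempting each encoding in turn and falling back to utf-8.
    """
    for charset in ("us-ascii", "iso-8859-1"):
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            # At least one character is outside this charset; try the next.
            continue
        return charset
    return "utf-8"
```

With this sketch, chr(255) still maps to iso-8859-1, while chr(256) is the first character that forces utf-8.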
msg162400 - (view)	Author: Dmitry Shachnev (mitya57) *	Date: 2012-06-06 09:32
Updated the patch:
- Avoid using letter Ш in test, it's better to use chr(256) as the test case;
- Updated the comment in MIMEText constructor to reflect the new behaviour.
msg162408 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-06-06 13:10
Thanks for the patch. I may not get to this until after the beta (or I might, you never know). Could you submit a contributor agreement please? http://www.python.org/psf/contrib
msg162410 - (view)	Author: Dmitry Shachnev (mitya57) *	Date: 2012-06-06 13:29
Done, sent an e-mail to contributors@python.org.
msg163697 - (view)	Author: Glenn Linderman (v+python) *	Date: 2012-06-24 00:17
Patch is interesting, using an encoder to detect validity. However, it suffers from some performance problems for long text that has large ASCII prefixes. This seems to be an enhancement sort of request rather than a bug... so I wonder why Python 3.2 is listed? And in Python 3.3 with PEP 393 strings the C API to strings provides a quick way to determine the maximum character in the string... although I see nothing in the PEP about how to access that information from Python. If it is available, it could provide a much quicker precheck rather than multiple attempts to encode strings with large ASCII prefixes only to discover that the next to last character is in (128,255) and the last character is > 255 (which would be about the worst case scenario for the algorithm in the patch).
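For comparison, a single scan for the highest code point avoids the repeated encoding attempts in the worst case described above. This is a pure-Python sketch, not the C-level PEP 393 information the message refers to, and it makes no claim about being faster than the codec-based check; the name guess_charset_by_max is hypothetical.

```python
def guess_charset_by_max(text):
    """Pick a charset from the highest code point found in one scan.

    Hypothetical helper: walks the string once instead of attempting
    an ascii encode followed by a latin-1 encode.
    """
    highest = max(map(ord, text), default=0)  # default=0 covers empty text
    if highest < 128:
        return "us-ascii"
    if highest < 256:
        return "iso-8859-1"
    return "utf-8"


# The worst case from the message: a long ASCII prefix, then a character
# in the 128..255 range, then a final character above 255.
worst_case = "a" * 100_000 + "\xe9" + chr(256)
result = guess_charset_by_max(worst_case)
```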
msg163718 - (view)	Author: Dmitry Shachnev (mitya57) *	Date: 2012-06-24 06:00
> This seems to be an enhancement sort of request rather than a bug... so I wonder why Python 3.2 is listed?

Fixed.

> ... although I see nothing in the PEP about how to access that information from Python.

You are right, it seems there is no Python API for that (yet?), so I don't see any better solutions for determining the maximum character for now. Also, note that this algorithm had already been used before my patch.
msg163791 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-06-24 14:48
Well, the original change to using utf-8 by default was considered a bug fix. But I suppose you are right that this goes beyond that into enhancement territory. In which case we could wait for an enhancement to the C API to base it on, for which we'd need to open a new issue.

On the other hand, the email package already uses the "encode to see if we have ascii" trick elsewhere (though on smaller strings), and the ascii codec is the fastest codec, with latin-1 only slightly slower. The critical difference here, though, is that we end up doing two encoding passes, once to test it and a second time to actually create the message body. The same is true of the ascii case. It should be possible to fix this, by using the encoded string in generating the _payload, short circuiting the set_payload mechanism. That's a somewhat ugly hack, necessitated because of the incomplete conversion of email to a unicode-centric design. I'm working on that :)

So, again, we may be waiting on other enhancements, in this case in the email package, to do this fix "right". But it would be worth figuring out how to do it, so that we know what kind of (internal?) API enhancements we want in order to serve this kind of use case.
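The "two encoding passes" concern can be illustrated with a sketch that keeps the bytes from the successful trial encoding instead of discarding them. It deliberately does not touch _payload or set_payload, whose internals are not shown in this issue; the function name encode_once is hypothetical.

```python
def encode_once(text):
    """Return (charset, encoded_bytes) using a single successful encode.

    Hypothetical helper: the bytes produced while testing a charset are
    returned to the caller, so no second encoding pass is needed.
    """
    for charset in ("us-ascii", "iso-8859-1", "utf-8"):
        try:
            return charset, text.encode(charset)
        except UnicodeEncodeError:
            continue
    # Only reachable for text that even utf-8 rejects, e.g. lone surrogates.
    raise ValueError("text cannot be encoded as us-ascii, iso-8859-1 or utf-8")
```

A caller would then build the message body from the returned bytes rather than encoding the original string again.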
msg275139 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-09-08 20:29
The new email API (which was just made non-provisional) uses a "sniff" technique to decide what CTE to use for text bodies set via set_content. So I consider this done (finally). It does not change MIMEText, which is now the legacy API.
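A small sketch of the new API mentioned here, using EmailMessage.set_content and then inspecting what was chosen. The helper name body_encoding and the sample strings are assumptions for illustration; which transfer encoding is picked for non-ASCII text depends on the policy and on the content, so the sketch only reads the result back rather than predicting it.

```python
from email.message import EmailMessage
from email.policy import default


def body_encoding(text, policy=default):
    """Return (charset, content-transfer-encoding) chosen by set_content.

    Hypothetical helper around the documented EmailMessage API.
    """
    msg = EmailMessage(policy=policy)
    msg.set_content(text)
    return msg.get_content_charset(), str(msg["Content-Transfer-Encoding"])


# Pure ASCII text needs no transfer encoding.
ascii_result = body_encoding("plain ASCII body\n")

# With a policy restricted to 7bit output, non-ASCII text has to be
# transfer-encoded, and the library decides how.
seven_bit_policy = default.clone(cte_type="7bit")
latin_result = body_encoding("d\xe9j\xe0 vu\n", policy=seven_bit_policy)
```

Note that the charset stays utf-8 in both cases; the sniffing applies to the transfer encoding, not to the charset as in the original MIMEText patch.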

History
Date	User	Action	Args
2022-04-11 14:57:31	admin	set	github: 59221
2016-09-08 20:29:42	r.david.murray	set	status: open -> closed versions: + Python 3.5, Python 3.6, - Python 3.3 messages: + msg275139 resolution: fixed stage: patch review -> resolved
2012-06-24 14:48:06	r.david.murray	set	messages: + msg163791
2012-06-24 06:00:51	mitya57	set	type: behavior -> enhancement messages: + msg163718 versions: - Python 3.2
2012-06-24 00:17:47	v+python	set	nosy: + v+python messages: + msg163697
2012-06-06 13:29:47	mitya57	set	messages: + msg162410
2012-06-06 13:10:45	r.david.murray	set	title: [patch] add special case for latin messages in email.mime.text -> Add special case for latin messages in email.mime.text messages: + msg162408 stage: patch review
2012-06-06 09:34:15	mitya57	set	files: - issue_15016.patch
2012-06-06 09:33:23	mitya57	set	hgrepos: - hgrepo135
2012-06-06 09:32:58	mitya57	set	files: + issue_15016_v2.patch hgrepos: + hgrepo135 messages: + msg162400
2012-06-06 09:14:59	mitya57	set	files: + issue_15016.patch keywords: + patch
2012-06-06 09:13:28	mitya57	create