Message 203467 - Python tracker


Author	lpolzer
Recipients	lpolzer
Date	2013-11-20.10:51:42
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1384944703.82.0.967643613195.issue19662@psf.upfronthosting.co.za>
In-reply-to

Content

http://hg.python.org/cpython/file/3.3/Lib/smtpd.py#l289 currently decodes incoming bytes as UTF-8.

An SMTP server must not attempt to interpret characters beyond ASCII, however. Originally mail servers were not 8-bit clean, meaning they would only guarantee that the lower 7 bits of each octet were preserved.
Even then, however, they were not expected to choke on any input because of an attempt to decode it into a specific extended charset. Whenever a mail server does not need to interpret data (like base64-encoded auth information), it simply leaves the data alone and passes it through.
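
To illustrate the failure mode, here is a minimal sketch. It is not the smtpd code itself; the sample bytes and the helper name `decode_utf8` are made up for this example. It shows how a strict UTF-8 decode rejects input that a client may legitimately send in another 8-bit charset:

```python
# Hypothetical line as it might arrive from a client that uses Latin-1:
# 0xFC is "ü" and 0xDF is "ß" in that charset, but neither byte sequence
# is valid UTF-8.
received = b"Subject: Gr\xfc\xdfe\r\n"


def decode_utf8(data):
    """Mimic a server that insists on UTF-8; return None if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


result = decode_utf8(received)
print(result)
```

A server that decodes this way cannot pass such a line through unchanged; it has to reject it or raise an error.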

I do not know why the current behavior was chosen, but to correct it and make it possible to support the 8BITMIME feature, I suggest decoding received bytes as latin1 and leaving it to the user to reinterpret them as UTF-8 or whatever charset they need. Any other simple extended encoding that maps every byte value could be used for this, but latin1 is already the default in asynchat.
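
A short sketch of why latin1 works as a pass-through encoding, assuming nothing beyond the built-in codecs (the variable names and sample payload are illustrative only): every byte value maps to exactly one code point, so decoding never fails and the original octets can always be recovered and reinterpreted later.

```python
# All 256 possible octets survive a latin-1 decode/encode round trip.
all_octets = bytes(range(256))
as_text = all_octets.decode("latin-1")
roundtrip = as_text.encode("latin-1")

# Example payload: the UTF-8 encoding of "Grüße", as a client might send it.
payload = b"Gr\xc3\xbc\xc3\x9fe"

# What the server would hand to the user under the proposed behavior.
server_str = payload.decode("latin-1")

# The user recovers the raw bytes and reinterprets them in the charset
# they actually expect, here UTF-8.
user_str = server_str.encode("latin-1").decode("utf-8")
print(user_str)
```

The string the server hands over is not meaningful text by itself in this case; it is only a lossless carrier for the bytes, which is the point of the proposal.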

The documentation should also mention charset handling. I'll be happy to submit a patch for both code and docs.

History
Date	User	Action	Args
2013-11-20 10:51:43	lpolzer	set	recipients: + lpolzer
2013-11-20 10:51:43	lpolzer	set	messageid: <1384944703.82.0.967643613195.issue19662@psf.upfronthosting.co.za>
2013-11-20 10:51:43	lpolzer	link	issue19662 messages
2013-11-20 10:51:42	lpolzer	create