Author lpolzer
Recipients lpolzer
Date 2013-11-20.10:51:42
Message-id <1384944703.82.0.967643613195.issue19662@psf.upfronthosting.co.za>
Content
Lib/smtpd.py in the 3.3 branch (http://hg.python.org/cpython/file/3.3/Lib/smtpd.py#l289) currently decodes incoming bytes as UTF-8.

However, an SMTP server must not attempt to interpret characters beyond ASCII. Originally, mail servers were not 8-bit clean: they only guaranteed that the lower 7 bits of each octet would be preserved.
Even then, though, they were not expected to choke on input because they tried to decode it into a particular extended character set. Whenever a mail server does not need to interpret data (such as base64-encoded authentication information), it leaves the data alone and passes it through unchanged.
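As a minimal illustration (not the actual smtpd.py code), a strict UTF-8 decode of received data rejects a line as soon as a client sends bytes in another charset, for example a Latin-1 "é" (octet 0xE9). The helpers `decode_like_current_smtpd` and `try_decode` are hypothetical names used only for this sketch.

```python
def decode_like_current_smtpd(data: bytes) -> str:
    """Hypothetical stand-in for a strict UTF-8 decode of received SMTP data."""
    return str(data, "utf-8")


def try_decode(data: bytes):
    """Return the decoded text, or None if strict UTF-8 decoding fails."""
    try:
        return decode_like_current_smtpd(data)
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
        return None


# 0xE9 is 'é' in Latin-1, but on its own it is not valid UTF-8,
# so the whole line is rejected even though the server never needed
# to understand it.
line = b"Subject: caf\xe9\r\n"
result = try_decode(line)
```

A server that simply passed the octets through would have no reason to fail here.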

I don't know why the current behavior was chosen, but to correct it and make it possible to support the 8BITMIME extension, I suggest decoding received bytes as latin-1 and leaving it to the user to reinterpret the result as UTF-8 or whatever charset they need. Because latin-1 maps every byte value 0-255 to exactly one code point, decoding can never fail and the original bytes can always be recovered. Any other single-byte encoding that defines all 256 byte values would work as well, but latin-1 is the default in asynchat.
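A minimal sketch of the proposed behavior, assuming the server still hands received data to the application as `str`: decode with latin-1 on receipt, and let the application recover the raw octets with `.encode("latin-1")` before decoding them with the charset it actually expects. Both `decode_received` and `reinterpret` are hypothetical helpers, not part of the smtpd API.

```python
def decode_received(data: bytes) -> str:
    """Hypothetical server-side decode: latin-1 maps bytes 0-255 to
    U+0000-U+00FF one-to-one, so this never raises and loses nothing."""
    return data.decode("latin-1")


def reinterpret(text: str, charset: str = "utf-8") -> str:
    """Hypothetical application-side step: recover the original octets,
    then decode them with the charset the application expects."""
    return text.encode("latin-1").decode(charset)


# Every possible octet survives the round trip unchanged.
all_bytes = bytes(range(256))
assert decode_received(all_bytes).encode("latin-1") == all_bytes

# A UTF-8 body line (e.g. sent after MAIL FROM ... BODY=8BITMIME) looks
# garbled on the server side, but the application can recover it exactly.
utf8_line = "Gr\u00fc\u00dfe\r\n".encode("utf-8")  # "Grüße\r\n"
server_side = decode_received(utf8_line)
assert reinterpret(server_side) == "Gr\u00fc\u00dfe\r\n"
```

The choice of charset is deferred entirely to the user, which is the point of the proposal: the server itself never has to guess.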

The documentation should also mention charset handling. I'll be happy to submit a patch for both code and docs.
History
Date                 User     Action  Args
2013-11-20 10:51:43  lpolzer  set     recipients: + lpolzer
2013-11-20 10:51:43  lpolzer  set     messageid: <1384944703.82.0.967643613195.issue19662@psf.upfronthosting.co.za>
2013-11-20 10:51:43  lpolzer  link    issue19662 messages
2013-11-20 10:51:42  lpolzer  create