Author dmaurer
Recipients barry, dmaurer, r.david.murray
Date 2020-07-15.19:18:03
Message-id <1594840683.21.0.418149762397.issue41307@roundup.psfhosted.org>
In-reply-to
Content
In the transcript below, "ms" and "mb" should be equivalent:

>>> from email import message_from_string, message_from_bytes
>>> mt = """\
... Mime-Version: 1.0
... Content-Type: text/plain; charset=UTF-8
... Content-Transfer-Encoding: 8bit
... 
... ä
... """
>>> ms = message_from_string(mt)
>>> mb = message_from_bytes(mt.encode("UTF-8"))

But "mb.as_bytes()" succeeds while "ms.as_bytes()" raises a "UnicodeEncodeError":

>>> mb.as_bytes()
b'Mime-Version: 1.0\nContent-Type: text/plain; charset=UTF-8\nContent-Transfer-Encoding: 8bit\n\n\xc3\xa4\n'
>>> ms.as_bytes()
Traceback (most recent call last):
...
  File "/usr/local/lib/python3.9/email/generator.py", line 155, in _write_lines
    self.write(line)
  File "/usr/local/lib/python3.9/email/generator.py", line 406, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 0: ordinal not in range(128)

Apparently, "as_bytes" ignores the "charset" parameter of the "Content-Type" header when the message was parsed from a string (it should encode with "utf-8", not "ascii").
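
A possible workaround, given only as a sketch: encode the text yourself and parse it with "message_from_bytes", which is the path that succeeds in the transcript above. The helper name "message_from_text" and its "charset" argument are hypothetical and not part of the email package; the sketch assumes the caller knows the charset declared in the "Content-Type" header.

```python
from email import message_from_bytes, message_from_string


def message_from_text(text, charset="utf-8"):
    """Hypothetical helper (not part of the email package).

    Encodes the text with the given charset and parses the resulting
    bytes, so that as_bytes() can later reproduce the 8bit body.
    """
    return message_from_bytes(text.encode(charset))


mt = """\
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ä
"""

# Parsing from bytes: as_bytes() round-trips the UTF-8 body.
raw = message_from_text(mt).as_bytes()
print(raw)

# For comparison, the string-parsed message; on the Python 3.9 build
# used in this report, as_bytes() raises UnicodeEncodeError here.
try:
    message_from_string(mt).as_bytes()
except UnicodeEncodeError as exc:
    print("message_from_string path failed:", exc)
```

This only sidesteps the problem for callers that control the parsing step; it does not change what "as_bytes" does for a message that was already parsed from a string.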
History
Date                 User     Action  Args
2020-07-15 19:18:03  dmaurer  set     recipients: + dmaurer, barry, r.david.murray
2020-07-15 19:18:03  dmaurer  set     messageid: <1594840683.21.0.418149762397.issue41307@roundup.psfhosted.org>
2020-07-15 19:18:03  dmaurer  link    issue41307 messages
2020-07-15 19:18:03  dmaurer  create