Author darcy.beurle
Recipients barry, darcy.beurle, r.david.murray
Date 2021-02-26.21:50:12
Message-id <1614376213.69.0.504043333554.issue43333@roundup.psfhosted.org>
Content
I have some emails, stored in an XML file in RFC 822 format, that I'm importing. Some of them use an encoding other than ASCII. I create each message with the default policy:

import email
import email.policy

message = email.message_from_string(
    # Extract the message text from the XML element
    message_name.find("property_string").text,
    policy=email.policy.default)

Then I want to convert this to bytes so I can append it to an IMAP folder using the imap_tools package:

# "message" is the object created above (the script in the traceback names it "email")
mailbox.append(message.as_bytes(),
               "INBOX",
               dt=None,
               flag_set=(imap_tools.MailMessageFlags.SEEN))

This fails with the following traceback:

line 405, in parse_goldmine_output
    email.as_bytes(),
  File "/usr/lib64/python3.9/email/message.py", line 178, in as_bytes
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib64/python3.9/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib64/python3.9/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib64/python3.9/email/generator.py", line 218, in _dispatch
    meth(msg)
  File "/usr/lib64/python3.9/email/generator.py", line 276, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/lib64/python3.9/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib64/python3.9/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib64/python3.9/email/generator.py", line 218, in _dispatch
    meth(msg)
  File "/usr/lib64/python3.9/email/generator.py", line 436, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/usr/lib64/python3.9/email/generator.py", line 253, in _handle_text
    self._write_lines(payload)
  File "/usr/lib64/python3.9/email/generator.py", line 155, in _write_lines
    self.write(line)
  File "/usr/lib64/python3.9/email/generator.py", line 410, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-43: ordinal not in range(128)
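
The last frame encodes with the 'ascii' codec and the 'surrogateescape' error handler. As a minimal sketch of what that combination accepts (the sample word is only an illustration, not taken from the emails in question): text that was decoded from bytes with 'surrogateescape' round-trips, while a str holding real non-ASCII characters does not.

```python
# Sample non-ASCII text ("Gruesse" spelled with u-umlaut and sharp s).
text = "Gr\u00fc\u00dfe"
raw = text.encode("utf-8")

# Bytes decoded with surrogateescape carry each non-ASCII byte as a lone
# surrogate, so encoding with the same handler restores the original bytes.
escaped = raw.decode("ascii", "surrogateescape")
round_tripped = escaped.encode("ascii", "surrogateescape")

# A str with real non-ASCII characters contains no surrogates, so the
# handler has nothing to map back and the ascii codec rejects it.
try:
    text.encode("ascii", "surrogateescape")
    real_text_ok = True
except UnicodeEncodeError:
    real_text_ok = False
```

This matches the error above: the payload parsed from a str still contains the real characters when the generator tries to write it.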


If I change the line:

self._fp.write(s.encode('ascii', 'surrogateescape'))

to:

self._fp.write(s.encode('utf8', 'surrogateescape'))

then it writes the email body with the non-ASCII characters intact (the same as in the XML). I'm not sure how to proceed. These emails should be processable, but the bytes generator doesn't seem to pick up a UTF-8 encoding from anywhere (for example, when a policy with utf8 enabled is used).
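
One possible workaround, offered as a sketch rather than a confirmed fix for this report, is to encode the extracted text to bytes first and parse it with email.message_from_bytes, so that the payload carries surrogate-escaped bytes that the bytes generator can write back out. The name raw_text and the sample message are hypothetical stand-ins for the XML text, and the sketch assumes that UTF-8 is the charset the message itself declares.

```python
import email
import email.policy

# Hypothetical stand-in for the text extracted from the XML element.
raw_text = (
    "Subject: test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Gr\u00fc\u00dfe\n"
)

# Assumption: the encoding used here matches the charset declared in the
# message headers. Parsing from bytes keeps the body as escaped raw bytes.
message = email.message_from_bytes(raw_text.encode("utf-8"),
                                   policy=email.policy.default)

# Serialising now writes the original UTF-8 bytes of the body unchanged.
data = message.as_bytes()
```

The resulting data can then be passed to mailbox.append in place of the value produced from the string-parsed message. If the emails declare a different charset, the encode call would need to use that charset instead.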
History
Date                 User          Action  Args
2021-02-26 21:50:13  darcy.beurle  set     recipients: + darcy.beurle, barry, r.david.murray
2021-02-26 21:50:13  darcy.beurle  set     messageid: <1614376213.69.0.504043333554.issue43333@roundup.psfhosted.org>
2021-02-26 21:50:13  darcy.beurle  link    issue43333 messages
2021-02-26 21:50:12  darcy.beurle  create