Message 340931 - Python tracker



Author	r.david.murray
Recipients	barry, immerrr again, jaraco, jayvdb, r.david.murray, tanzer@swing.co.at
Date	2019-04-26.16:22:30
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1556295750.22.0.0231104181975.issue25545@roundup.psfhosted.org>
In-reply-to

Content

This is one of the infelicities of the translation of the old API to Python 3: 'get_payload(decode=True)' actually means "give me the bytes version of this payload", which in this case is the UTF-8 encoded bytes, which is what you got. 'get_payload()' means "give me the payload as a string without doing CTE (Content-Transfer-Encoding) decoding". In a sort of accident of translation, this turns out to mean "give me the unicode" in this particular case, because with an 8bit CTE the body bytes get decoded using the declared charset. If the payload had been base64 encoded, you'd have gotten a unicode string containing the base64 characters.
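To make the difference concrete, here is a minimal sketch with the legacy (compat32) API, assuming two hypothetical messages that carry the same UTF-8 text, one with an 8bit CTE and one with a base64 CTE. The helper make_message and the sample text are illustrative only. For the 8bit message get_payload() hands back the text, while for the base64 message it hands back the base64 characters; get_payload(decode=True) gives the UTF-8 bytes in both cases.

```python
import base64
import email

TEXT = 'Beyoğlu-İst'
RAW = TEXT.encode('utf-8')  # b'Beyo\xc4\x9flu-\xc4\xb0st'

def make_message(cte: str, body: bytes) -> bytes:
    """Build a minimal text/plain message (hypothetical helper for this demo)."""
    headers = (
        'MIME-Version: 1.0\n'
        'Content-Type: text/plain; charset=utf-8\n'
        f'Content-Transfer-Encoding: {cte}\n'
        '\n'
    )
    return headers.encode('ascii') + body

# Legacy API: no policy argument, so the compat32 policy is used.
m8 = email.message_from_bytes(make_message('8bit', RAW))
m64 = email.message_from_bytes(make_message('base64', base64.b64encode(RAW)))

for m in (m8, m64):
    as_str = m.get_payload()               # str; CTE decoding is NOT done
    as_bytes = m.get_payload(decode=True)  # bytes; CTE decoding IS done
    print(m['Content-Transfer-Encoding'], repr(as_str), repr(as_bytes))

# With the legacy API, getting text out of a base64 body means
# decoding the bytes yourself using the declared charset.
print(m64.get_payload(decode=True).decode(m64.get_content_charset()))
```

Note that neither call on its own gives "the text" for every CTE; that is exactly the inconsistency described above.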

Which I grant you is all very confusing.

For a more consistent API, use the new one:

>>> import email.policy
>>> m = email.message_from_bytes(msg_bytes, policy=email.policy.default)
>>> bytes(m)
b'MIME-Version: 1.0\nContent-Type: text/plain;\n charset=utf-8\nContent-Transfer-Encoding: 8bit\nContent-Disposition: attachment;\n filename="camper_store.csv"\n\nBeyo\xc4\x9flu-\xc4\xb0st'

>>> m.get_content()
'Beyoğlu-İst'

Here we don't even pretend that you have any use for the encoded version, either the CTE-encoded form or the binary (charset-encoded) form: get_content gives you the "fully decoded" payload (decoded from the CTE *and* decoded to unicode).
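As a sketch of that consistency, the same two hypothetical messages (8bit and base64, same UTF-8 text) can be parsed with policy=email.policy.default; get_content() should then return the same str for both, with no manual charset handling. The helper parse and the sample text are assumptions for illustration, not part of the original report.

```python
import base64
import email
import email.policy

TEXT = 'Beyoğlu-İst'
RAW = TEXT.encode('utf-8')

def parse(cte: str, body: bytes):
    """Parse a minimal text/plain message with the new API (hypothetical helper)."""
    headers = (
        'MIME-Version: 1.0\n'
        'Content-Type: text/plain; charset=utf-8\n'
        f'Content-Transfer-Encoding: {cte}\n'
        '\n'
    )
    return email.message_from_bytes(headers.encode('ascii') + body,
                                    policy=email.policy.default)

m8 = parse('8bit', RAW)
m64 = parse('base64', base64.b64encode(RAW))

# get_content() undoes the CTE *and* the charset, whatever the CTE was.
for m in (m8, m64):
    print(m['Content-Transfer-Encoding'], repr(m.get_content()))
```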

History
Date	User	Action	Args
2019-04-26 16:22:30	r.david.murray	set	recipients: + r.david.murray, barry, jaraco, jayvdb, tanzer@swing.co.at, immerrr again
2019-04-26 16:22:30	r.david.murray	set	messageid: <1556295750.22.0.0231104181975.issue25545@roundup.psfhosted.org>
2019-04-26 16:22:30	r.david.murray	link	issue25545 messages
2019-04-26 16:22:30	r.david.murray	create