Issue 34459: email.contentmanager should use IANA encoding



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/78640

classification

Title:	email.contentmanager should use IANA encoding
Type:		Stage:
Components:	email	Versions:	Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	barry, era, r.david.murray
Priority:	normal	Keywords:

Created on 2018-08-22 09:00 by era, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg323869	Author: (era)	Date: 2018-08-22 09:00
https://github.com/python/cpython/blob/3.7/Lib/email/contentmanager.py#L64 currently contains the following code:

    def get_text_content(msg, errors='replace'):
        content = msg.get_payload(decode=True)
        charset = msg.get_param('charset', 'ASCII')
        return content.decode(charset, errors=errors)

This breaks when the IANA character-set label is not identical to the Python encoding name. For example, pass it a message with

    Content-type: text/plain; charset=cp-850

This breaks for two separate reasons (and I will report two separate bugs): the IANA character-set label should be looked up and converted to a Python codec name (that's this bug), and the character-set alias 'cp-850' is not defined in the lookup table in the first place.

There are probably other places in contentmanager.py where a similar mapping should take place.

I do not have a proper patch, but in general outline, the fix would look like

    + from email.charset import Charset
    +
      def get_text_content(msg, errors='replace'):
          content = msg.get_payload(decode=True)
          charset = msg.get_param('charset', 'ASCII')
    -     return content.decode(charset, errors=errors)
    +     encoding = Charset(charset).output_charset
    +     return content.decode(encoding, errors=errors)

This was discovered in this Stack Overflow post: https://stackoverflow.com/a/51961225/874188
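For illustration, the mapping step described in this message could be sketched roughly as follows. This is not the patch that was proposed or applied. `resolve_codec` and `EXTRA_ALIASES` are hypothetical names introduced here, and the sketch consults `Charset(...).input_charset` (the label after `email.charset`'s own alias lookup) instead of `output_charset`, then asks `codecs.lookup` whether Python knows a codec by that name. Anything unresolved falls back to ASCII, which is an assumption of this sketch and not behaviour described in the report.

```python
import codecs
from email import message_from_string
from email.charset import Charset

# Hypothetical extra aliases, for illustration only (not part of the stdlib).
EXTRA_ALIASES = {"cp-850": "cp850"}


def resolve_codec(label, default="ascii"):
    """Map a MIME charset label to a Python codec name (sketch)."""
    label = label.lower()
    candidates = (
        EXTRA_ALIASES.get(label),      # hypothetical local alias table
        Charset(label).input_charset,  # email.charset's own alias table
        label,                         # the label as given
    )
    for name in candidates:
        if not name:
            continue
        try:
            return codecs.lookup(name).name
        except LookupError:
            continue
    return default  # assumption: fall back to ASCII when nothing matches


def get_text_content(msg, errors="replace"):
    content = msg.get_payload(decode=True)
    charset = msg.get_param("charset", "ASCII")
    return content.decode(resolve_codec(charset), errors=errors)


# Usage: a quoted-printable body whose bytes are cp850-encoded.
msg = message_from_string(
    "Content-Type: text/plain; charset=cp-850\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=82t=82\n"
)
print(get_text_content(msg))
```

With the alias in place the body decodes through the `cp850` codec; without it, the label `cp-850` would not resolve and the sketch would fall back to ASCII with replacement characters.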
msg323871	Author: (era)	Date: 2018-08-22 09:18
https://bugs.python.org/issue34460 now requests the addition of "cp-850" and "windows-784" as charset aliases in the email.charset module.
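Until such aliases exist in the standard table, an application can register one itself with `email.charset.add_alias`. The snippet below is a minimal sketch for the "cp-850" label only. Choosing `cp850` as the canonical name is an assumption based on the Python codec of that name, and the call changes the process-wide alias table.

```python
import email.charset

# Register the alias globally; "cp850" as the canonical name is an assumption.
email.charset.add_alias("cp-850", "cp850")

cs = email.charset.Charset("cp-850")
print(cs.input_charset)
```

After registration, `Charset("cp-850")` reports the canonical name, so code that decodes with `input_charset` sees a name Python's codec registry recognises.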

History
Date	User	Action	Args
2022-04-11 14:59:05	admin	set	github: 78640
2018-08-22 09:18:02	era	set	messages: + msg323871
2018-08-22 09:00:20	era	create