Message323869
https://github.com/python/cpython/blob/3.7/Lib/email/contentmanager.py#L64 currently contains the following code:
def get_text_content(msg, errors='replace'):
    content = msg.get_payload(decode=True)
    charset = msg.get_param('charset', 'ASCII')
    return content.decode(charset, errors=errors)
This breaks when the IANA character set is not identical to the Python encoding name. For example, pass it a message with
Content-type: text/plain; charset=cp-850
This breaks for two separate reasons (and I will report two separate bugs): the IANA character-set label should be looked up and converted to a Python codec name (that's this bug), and the character-set alias 'cp-850' is not defined in the lookup table in the first place.
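To make the two failure modes easier to see, here is a minimal sketch that asks both email.charset and the codec registry what they make of a label. The helper name `describe` is hypothetical and only for illustration; `Charset.input_charset` and `codecs.lookup` are the existing standard-library pieces it relies on.

```python
import codecs
from email.charset import Charset


def describe(label):
    """Return (charset name after email.charset aliasing, Python codec name or None)."""
    # Charset lowercases the label and applies email.charset's alias table.
    resolved = Charset(label).input_charset
    try:
        codec = codecs.lookup(resolved).name
    except LookupError:
        codec = None  # the codec registry does not know this name
    return resolved, codec


for label in ('latin_1', 'cp850', 'cp-850'):
    print(label, describe(label))
```

On the interpreter described in this report, the first two labels should resolve to a codec, while 'cp-850' passes through email.charset unchanged and is then rejected by the codec lookup.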
There are probably other places in contentmanager.py where a similar mapping should take place.
I do not have a proper patch, but in general outline, the fix would look like
+from email.charset import Charset
+
 def get_text_content(msg, errors='replace'):
     content = msg.get_payload(decode=True)
     charset = msg.get_param('charset', 'ASCII')
-    return content.decode(charset, errors=errors)
+    encoding = Charset(charset).output_charset
+    return content.decode(encoding, errors=errors)
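For anyone who wants to experiment before a real patch exists, the outline could be fleshed out roughly as follows. This is only a sketch under several assumptions, not the eventual fix: `resolve_codec` and `EXTRA_ALIASES` are hypothetical names, the silent fallback to ASCII is a design choice made for the example, and it reads `Charset.input_codec` rather than the output charset because the payload is being decoded, not encoded.

```python
import codecs
from email.charset import Charset

# Hypothetical extra aliases for labels that neither email.charset nor the
# codec registry resolves; 'cp-850' is the label from this report.
EXTRA_ALIASES = {'cp-850': 'cp850'}


def resolve_codec(label, default='ascii'):
    """Map a MIME charset label to a Python codec name (illustrative only)."""
    # input_codec is None for us-ascii, hence the "or default".
    candidate = Charset(label).input_codec or default
    candidate = EXTRA_ALIASES.get(candidate, candidate)
    try:
        return codecs.lookup(candidate).name
    except LookupError:
        # Assumption: fall back quietly instead of raising on unknown labels.
        return default


def get_text_content(msg, errors='replace'):
    content = msg.get_payload(decode=True)
    charset = msg.get_param('charset', 'ASCII')
    return content.decode(resolve_codec(charset), errors=errors)
```

Note that `Charset()` raises `email.errors.CharsetError` for labels containing non-ASCII characters; the sketch does not handle that case.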
This was discovered in this Stack Overflow post: https://stackoverflow.com/a/51961225/874188
Date | User | Action | Args
2018-08-22 09:00:20 | era | set | recipients: + era, barry, r.david.murray
2018-08-22 09:00:20 | era | set | messageid: <1534928420.28.0.56676864532.issue34459@psf.upfronthosting.co.za>
2018-08-22 09:00:20 | era | link | issue34459 messages
2018-08-22 09:00:19 | era | create |