
Author batterseapower
Recipients Xuefer.x, batterseapower, hyeshik.chang, inndy, kennyluck, loewis, rpetrov, vstinner
Date 2021-03-09.15:25:26
Message-id <1615303526.68.0.981384649795.issue7856@roundup.psfhosted.org>
In-reply-to
Content
As of Python 3.7.9, this also affects \xf9\xd6, which should decode to \u7881 (碁) in Unicode. This is the second character of 宏碁, the name of the Taiwanese electronics manufacturer Acer.

You can work around the issue by decoding with big5hkscs instead of big5, just as with the original \xf9\xd8 problem.
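A minimal sketch of that workaround, assuming it is acceptable to retry the whole input with big5hkscs whenever the plain big5 codec rejects it. The helper name decode_big5_lenient is hypothetical, not part of Python. Note that big5hkscs is a different mapping table from big5, so this is only a stopgap for data known to be Big5-ETen.

```python
def decode_big5_lenient(data: bytes) -> str:
    """Decode Big5 bytes, retrying with big5hkscs if plain big5 fails.

    Hypothetical helper for illustration only; it retries the whole
    input rather than just the offending bytes.
    """
    try:
        return data.decode("big5")
    except UnicodeDecodeError:
        # The plain big5 codec could not decode the input (for example
        # the ETen extension bytes \xf9\xd6), so fall back to big5hkscs.
        return data.decode("big5hkscs")


# \xf9\xd6 is the byte sequence discussed above; it should come out as U+7881.
text = decode_big5_lenient(b"\xf9\xd6")
print(text == "\u7881")
```

Pure ASCII input decodes the same way under either codec, so the fallback only matters for bytes that plain big5 rejects.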

It looks like the F9D6–F9FE characters all come from the Big5-ETen extension (https://en.wikipedia.org/wiki/Big5#ETEN_extensions, https://moztw.org/docs/big5/table/eten.txt), which is so popular that it is a de facto standard. Big5-2003 (mentioned in a comment below) seems to be an extension of Big5-ETen. For what it's worth, WHATWG includes these mappings in its own big5 reference tables: https://encoding.spec.whatwg.org/big5.html.

Unfortunately Big5 is still in common use in Taiwan. It's pretty funny that Python fails to decode Big5 documents containing the name of one of Taiwan's largest multinationals :-)