
Author vstinner
Recipients Paul Monson, eryksun, methane, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Date 2019-05-06.15:31:31
Message-id <1557156693.18.0.147562274285.issue36778@roundup.psfhosted.org>
Content
Victor:
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates differently for some reasons.

Eryk:
> Do you mean valid UTF-16 surrogate pairs? (...)

Code page 65001 handles lone surrogates differently on Windows XP and older. The behavior changed in Windows Vista:
https://unicodebook.readthedocs.io/operating_systems.html#encode-and-decode-functions
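For comparison, here is a minimal sketch of how Python's own utf-8 codec treats a lone surrogate. It does not call the Windows code page 65001 API at all; it only shows the strict behavior that cp65001 would have to match to be treated as plain UTF-8. The helper name encode_lone_surrogate is hypothetical and exists only for this illustration.

```python
def encode_lone_surrogate(errors="strict"):
    """Encode the lone low surrogate U+DC80 with Python's utf-8 codec.

    Returns the encoded bytes, or None if the codec rejects the character.
    (Hypothetical helper, for illustration only.)
    """
    try:
        return "\udc80".encode("utf-8", errors)
    except UnicodeEncodeError:
        # The strict utf-8 codec refuses to encode lone surrogates.
        return None


# Strict mode rejects the lone surrogate.
print(encode_lone_surrogate())

# The "surrogatepass" error handler encodes it as a 3-byte sequence.
print(encode_lone_surrogate("surrogatepass"))
```

With the default strict handler the call returns None, while "surrogatepass" yields the three bytes ED B2 80, which decode back to the same lone surrogate only when "surrogatepass" is used again.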

Steve Dower removed support for Vista from test_codecs.py 3 years ago:

commit f5aba58480bb0dd45181f609487ac2ecfcc98673
Author: Steve Dower <steve.dower@microsoft.com>
Date:   Tue Sep 6 19:42:27 2016 -0700

    Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup

Maybe it's time to remove Lib/encodings/cp65001.py and add an alias cp65001 => utf_8 in Lib/encodings/aliases.py? See bpo-32592.
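
A small sketch for checking which codec a given name resolves to on the running interpreter, using only codecs.lookup(). The helper name resolve_codec is hypothetical. The result for "cp65001" depends on the Python version and platform: if it is an alias of utf_8 the lookup reports "utf-8", if the dedicated module is still used it reports "cp65001", and if the codec is unavailable the lookup fails.

```python
import codecs


def resolve_codec(name):
    """Return the canonical codec name for `name`, or None if lookup fails.

    (Hypothetical helper, for illustration only.)
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown codec, or a codec that is not available on this platform.
        return None


# "utf_8" and "utf8" both normalize to the same canonical name.
print(resolve_codec("utf_8"))
print(resolve_codec("utf8"))

# Result depends on the interpreter version and platform.
print(resolve_codec("cp65001"))
```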