> cp65001 is *not* utf-8: Microsoft decided to handle surrogates differently for some reasons.

> Do you mean valid UTF-16 surrogate pairs? (...)

Code page 65001 handles lone surrogates differently on Windows XP and older. The behavior changed in Windows Vista.
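
For reference, here is a minimal sketch of what "lone surrogate" handling means, using Python's own utf-8 codec rather than the Windows code page (the legacy cp65001 behavior is not reproduced here). The helper `encode_or_error` is a hypothetical name used only for this illustration.

```python
# A lone (unpaired) low surrogate: a code point that is not a valid
# Unicode scalar value and so has no well-formed UTF-8 encoding.
lone = "\udc80"

def encode_or_error(text, errors="strict"):
    """Return the UTF-8 bytes, or the exception class name if encoding fails.

    Hypothetical helper, for illustration only.
    """
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return type(exc).__name__

# Python's strict utf-8 codec refuses to encode the lone surrogate.
strict_result = encode_or_error(lone)

# "surrogatepass" encodes it anyway, as a 3-byte sequence.
passed = encode_or_error(lone, "surrogatepass")

# "replace" substitutes a question mark.
replaced = encode_or_error(lone, "replace")

print(strict_result, passed, replaced)
```

The point of contention above is which of these outcomes a codec picks by default for such input.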

Steve Dower removed support for Vista 3 years ago:

commit f5aba58480bb0dd45181f609487ac2ecfcc98673
Author: Steve Dower <>
Date:   Tue Sep 6 19:42:27 2016 -0700

    Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup

Maybe it's time to remove Lib/encodings/ and add an alias cp65001 => utf_8 in Lib/encodings/ (see bpo-32592).
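
As a rough sketch of how such an alias would be consulted, the snippet below inspects the `aliases` dict in `encodings.aliases`, using the existing `u8` alias as the example. Whether a `cp65001` entry is present depends on the Python version in use, so it is read with `.get()` and no particular result is assumed.

```python
import codecs
from encodings.aliases import aliases

# An existing alias: "u8" maps to the utf_8 codec module.
existing_target = aliases["u8"]

# codecs.lookup() resolves the alias and reports the canonical codec name.
resolved_name = codecs.lookup("u8").name

# The proposed alias. This is None on versions where it has not been added;
# .get() avoids a KeyError in that case.
cp65001_target = aliases.get("cp65001")

print(existing_target, resolved_name, cp65001_target)
```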
