Message341377
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates
> differently for some reasons.
Do you mean valid UTF-16 surrogate pairs? For example:
>>> import codecs
>>> codecs.code_page_encode(65001, '\ud800\udc00')
(b'\xf0\x90\x80\x80', 2)
PyUnicode_AsUnicodeAndSize is neutral about storing surrogate codes in a 16-bit wchar_t string. In particular, the Python string in this case contains two surrogate codes, but they're passed to WideCharToMultiByte as a UTF-16 surrogate pair for the single character U+10000.
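To illustrate the pairing without a Windows-only call, here is a rough, portable sketch. The helper name `encode_like_wide_api` is made up for this example, and round-tripping through 'utf-16-le' with 'surrogatepass' is only an approximation of what happens when the 16-bit wchar_t buffer reaches WideCharToMultiByte. It is not the actual code path.

```python
def encode_like_wide_api(text):
    """Hypothetical helper: approximate passing `text` through a UTF-16 API."""
    # 'surrogatepass' writes surrogate code points as raw 16-bit code units.
    utf16 = text.encode('utf-16-le', 'surrogatepass')
    # Decoding those code units combines a high/low pair into one character.
    combined = utf16.decode('utf-16-le')
    return combined.encode('utf-8')

pair = '\ud800\udc00'  # two surrogate code points in a Python str
result = encode_like_wide_api(pair)

# Python's strict utf-8 codec, by contrast, rejects the same string.
try:
    pair.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(result, strict_ok)
```

The result matches `'\U00010000'.encode('utf-8')`, which is the same byte sequence shown in the code_page_encode output above.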
Anyway, it seems to me this issue will be resolved if cp65001.py is rewritten without functools.partial.
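As a minimal sketch of what that could look like, assuming the module only needs thin wrappers around codecs.code_page_encode and codecs.code_page_decode: the names `CODE_PAGE`, `encode` and `decode` below are illustrative and not taken from the actual cp65001.py, and the hasattr guard is there only so the snippet can be run on non-Windows platforms too.

```python
import codecs

CODE_PAGE = 65001  # illustrative constant, not from the real module


def encode(input, errors='strict'):
    # Plain function standing in for a functools.partial wrapper.
    return codecs.code_page_encode(CODE_PAGE, input, errors)


def decode(input, errors='strict'):
    # final=True: treat the input as a complete byte sequence.
    return codecs.code_page_decode(CODE_PAGE, input, errors, True)


if hasattr(codecs, 'code_page_encode'):
    print(encode('\ud800\udc00'))
else:
    print('codecs.code_page_encode is only available on Windows')
```

Plain def wrappers avoid importing functools at all, at the cost of one extra Python-level call per encode or decode.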
Date | User | Action | Args
2019-05-04 07:35:12 | eryksun | set | recipients: + eryksun, paul.moore, vstinner, tim.golden, methane, zach.ware, steve.dower, Paul Monson
2019-05-04 07:35:12 | eryksun | set | messageid: <1556955312.8.0.837007007732.issue36778@roundup.psfhosted.org>
2019-05-04 07:35:12 | eryksun | link | issue36778 messages
2019-05-04 07:35:12 | eryksun | create |