
Author eryksun
Recipients Paul Monson, eryksun, methane, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2019-05-04.07:35:12
Message-id <1556955312.8.0.837007007732.issue36778@roundup.psfhosted.org>
In-reply-to
Content
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates 
> differently for some reasons.

Do you mean valid UTF-16 surrogate pairs? For example:

    >>> import codecs
    >>> codecs.code_page_encode(65001, '\ud800\udc00')
    (b'\xf0\x90\x80\x80', 2)
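codecs.code_page_encode is only available on Windows. The following is a portable emulation, not the Windows API itself: it compares Python's own utf-8 codec with the result of first joining the two surrogate code points into one character. The strict utf-8 codec rejects surrogates, and with surrogatepass it encodes each surrogate separately (6 bytes), whereas joining the pair first gives the same 4 bytes that code page 65001 returned above. The helper name join_then_encode is hypothetical.

```python
# Portable emulation (hypothetical helper): reproduce the 4-byte result that
# codecs.code_page_encode(65001, ...) returns on Windows without calling it.
s = "\ud800\udc00"  # two separate surrogate code points in a Python str

def join_then_encode(text):
    # surrogatepass writes the surrogate code units unchanged; the UTF-16
    # decoder then combines a high surrogate followed by a low surrogate
    # into a single supplementary character (U+10000 here).
    joined = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return joined.encode("utf-8")

try:
    s.encode("utf-8")  # strict UTF-8 refuses surrogate code points
except UnicodeEncodeError as exc:
    print("utf-8 strict raised:", exc.reason)

print("utf-8 surrogatepass:", s.encode("utf-8", "surrogatepass"))  # 6 bytes
print("pair joined first:", join_then_encode(s))                   # 4 bytes
```

The 6-byte form encodes each surrogate on its own; the 4-byte form is ordinary UTF-8 for U+10000.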

PyUnicode_AsUnicodeAndSize copies surrogate codes into a 16-bit wchar_t string as-is, without checking whether they form valid pairs. In this case the Python string contains two separate surrogate codes, but in the wchar_t buffer they're indistinguishable from a UTF-16 surrogate pair, so they're passed to WideCharToMultiByte as the single character U+10000.
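To make the wchar_t point concrete, here is a sketch that models a 16-bit wchar_t buffer as the string's UTF-16 code units. This is an approximation of the buffer PyUnicode_AsUnicodeAndSize produces on Windows, not a call to it, and the helper as_wchar_units is hypothetical. A string of two surrogate code points and the one-character string U+10000 end up as the same code units.

```python
# Hypothetical model of a 16-bit wchar_t buffer: the UTF-16-LE code units of
# the string, with surrogates passed through unchanged.
def as_wchar_units(text):
    data = text.encode("utf-16-le", "surrogatepass")
    # Split the little-endian byte string into 16-bit code units.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

two_code_points = "\ud800\udc00"  # len 2: two surrogate code points
one_code_point = "\U00010000"     # len 1: a single supplementary character

print(len(two_code_points), len(one_code_point))          # 2 1
print([hex(u) for u in as_wchar_units(two_code_points)])  # ['0xd800', '0xdc00']
print(as_wchar_units(two_code_points) == as_wchar_units(one_code_point))  # True
```

Once the string is in this form, the consumer (here WideCharToMultiByte) has no way to know the code points were separate in Python.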

Anyway, it seems to me this issue will be resolved if cp65001.py is rewritten without functools.partial.
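A minimal sketch of what a cp65001.py without functools.partial could look like, assuming the goal is simply that loading the codec needs nothing beyond the codecs module. Each partial is replaced by a plain def that passes code page 65001 explicitly. The layout (encode/decode functions, incremental and stream classes, getregentry) follows the usual encodings-module pattern; the constant name CP_UTF8 is introduced here for illustration. codecs.code_page_encode and codecs.code_page_decode exist only on Windows, but they are called only when the functions run, so the module itself imports anywhere.

```python
import codecs

# Hypothetical sketch of encodings/cp65001.py without functools.partial.
# A Windows-only availability check would normally go here; it is omitted
# so the sketch can be imported on any platform.
CP_UTF8 = 65001  # Windows code page number for UTF-8 (illustrative constant)

def encode(input, errors='strict'):
    return codecs.code_page_encode(CP_UTF8, input, errors)

def _decode(input, errors='strict', final=False):
    # final=False keeps an incomplete trailing byte sequence for the next call.
    return codecs.code_page_decode(CP_UTF8, input, errors, final)

def decode(input, errors='strict'):
    return codecs.code_page_decode(CP_UTF8, input, errors, True)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return encode(input, self.errors)[0]

class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    def _buffer_decode(self, input, errors, final):
        return _decode(input, errors, final)

class StreamWriter(codecs.StreamWriter):
    def encode(self, input, errors='strict'):
        return encode(input, errors)

class StreamReader(codecs.StreamReader):
    def decode(self, input, errors='strict'):
        return _decode(input, errors, False)

def getregentry():
    return codecs.CodecInfo(
        name='cp65001',
        encode=encode,
        decode=decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
```

Because a partial object is not a descriptor, the original class-attribute assignments did not receive self; with plain methods the self parameter has to be spelled out, as above.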
History
Date                 User     Action  Args
2019-05-04 07:35:12  eryksun  set     recipients: + eryksun, paul.moore, vstinner, tim.golden, methane, zach.ware, steve.dower, Paul Monson
2019-05-04 07:35:12  eryksun  set     messageid: <1556955312.8.0.837007007732.issue36778@roundup.psfhosted.org>
2019-05-04 07:35:12  eryksun  link    issue36778 messages
2019-05-04 07:35:12  eryksun  create