New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_Py_DecodeUTF8Ex() creates surrogate pairs on Windows #78109
Comments
_Py_DecodeUTF8Ex() creates surrogate pairs with 16-bit wchar_t (on Windows), whereas input bytes should be escaped. I'm quite sure that it's a bug. |
Extract of _Py_DecodeUTF8Ex() code, there is an explicit "write a surrogate pair" comment: #if SIZEOF_WCHAR_T == 4
ch = ucs4lib_utf8_decode(&s, e, (Py_UCS4 *)unicode, &outpos);
#else
ch = ucs2lib_utf8_decode(&s, e, (Py_UCS2 *)unicode, &outpos);
#endif
if (ch > 0xFF) {
#if SIZEOF_WCHAR_T == 4
Py_UNREACHABLE();
#else
assert(ch > 0xFFFF && ch <= MAX_UNICODE);
/* write a surrogate pair */
unicode[outpos++] = (wchar_t)Py_UNICODE_HIGH_SURROGATE(ch);
unicode[outpos++] = (wchar_t)Py_UNICODE_LOW_SURROGATE(ch);
#endif
} |
Could you show an example please? |
I saw an issue when reading the code, I didn't try to trigger the issue using real code yet. |
I don't see anything wrong. |
I write a C function to test _Py_DecodeUTF8Ex():
Ok, I just misunderstood the code: the decoder is fine! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: