Message 358662 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	serhiy.storchaka
Recipients	methane, serhiy.storchaka, vstinner
Date	2019-12-19.09:43:12
Message-id	<1576748593.06.0.128693518249.issue39087@roundup.psfhosted.org>

Content
Do you have some concrete code in mind? Several times I have wished for a similar feature: get the cached UTF-8 representation if it exists, and otherwise encode to UTF-8 without creating a cache.

The private _PyUnicode_UTF8() macro could help:

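if ((s = _PyUnicode_UTF8(str))) {
    /* A cached UTF-8 representation exists: borrow it. */
    size = _PyUnicode_UTF8_LENGTH(str);
    tmpbytes = NULL;
}
else {
    /* No cache: encode to a temporary bytes object without caching. */
    tmpbytes = _PyUnicode_AsUTF8String(str, "replace");
    if (tmpbytes == NULL) {
        return NULL;  /* propagate the error, e.g. MemoryError */
    }
    s = PyBytes_AS_STRING(tmpbytes);
    size = PyBytes_GET_SIZE(tmpbytes);
}
/* ... use s and size, then release the temporary (if any): */
Py_XDECREF(tmpbytes);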

but it is not even available outside of unicodeobject.c.
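Since that macro is private, the borrow-or-encode pattern can be tried outside unicodeobject.c with a small self-contained C sketch. It assumes a hypothetical `toy_str` type holding code points plus an optional UTF-8 cache (this is not CPython's real object layout), and the helpers `toy_encode_utf8()` and `toy_get_utf8()` are invented for illustration. Replacing surrogates with U+FFFD only mimics the "replace" error handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy string: code points plus an optional UTF-8 cache.
   This is NOT CPython's PyUnicodeObject layout. */
typedef struct {
    const uint32_t *data;   /* code points */
    size_t length;          /* number of code points */
    char *utf8;             /* cached UTF-8 bytes, or NULL */
    size_t utf8_length;
} toy_str;

/* Encode into a new malloc'ed, NUL-terminated buffer. Surrogates and
   values above U+10FFFF become U+FFFD, mimicking a "replace" handler. */
static char *toy_encode_utf8(const toy_str *s, size_t *out_len)
{
    char *buf = malloc(s->length * 4 + 1);
    size_t n = 0;
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < s->length; i++) {
        uint32_t c = s->data[i];
        if ((c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
            c = 0xFFFD;
        if (c < 0x80) {
            buf[n++] = (char)c;
        } else if (c < 0x800) {
            buf[n++] = (char)(0xC0 | (c >> 6));
            buf[n++] = (char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            buf[n++] = (char)(0xE0 | (c >> 12));
            buf[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            buf[n++] = (char)(0x80 | (c & 0x3F));
        } else {
            buf[n++] = (char)(0xF0 | (c >> 18));
            buf[n++] = (char)(0x80 | ((c >> 12) & 0x3F));
            buf[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            buf[n++] = (char)(0x80 | (c & 0x3F));
        }
    }
    buf[n] = '\0';
    *out_len = n;
    return buf;
}

/* Return the cached UTF-8 if present; otherwise encode into a temporary
   buffer stored in *tmp (the caller frees it). The cache is never filled. */
static const char *toy_get_utf8(const toy_str *s, size_t *size, char **tmp)
{
    assert(s != NULL && size != NULL && tmp != NULL);
    if (s->utf8 != NULL) {
        *size = s->utf8_length;
        *tmp = NULL;
        return s->utf8;
    }
    *tmp = toy_encode_utf8(s, size);
    return *tmp;   /* NULL means the allocation failed */
}
```

A caller uses the returned pointer and size, then calls `free(tmp)`. Because `free(NULL)` is a no-op, the cached path needs no special cleanup, much like setting `tmpbytes = NULL` in the snippet above.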

PyUnicode_BorrowUTF8() looks too complex for the public API, and I am not sure it would be easy to implement in PyPy. It also does not cover all use cases: sometimes you want to convert to UTF-8 without any memory allocation at all (either use an existing buffer, or raise an error if there is no cached UTF-8 representation and the string is not ASCII).
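The allocation-free variant can be sketched in C under the same kind of assumptions: a hypothetical `toy_str` type storing one byte per character plus an optional UTF-8 cache. The function `toy_borrow_utf8_noalloc()` and the `TOY_*` status codes are invented for illustration; nothing like them is claimed to exist in CPython's API. It borrows the cache if present, exposes the raw bytes if the string is pure ASCII (ASCII is already valid UTF-8), and otherwise reports an error instead of allocating.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy string: one byte per character plus an optional
   UTF-8 cache. This is NOT CPython's PyUnicodeObject layout. */
typedef struct {
    const unsigned char *data;  /* one byte per character (Latin-1-like) */
    size_t length;              /* number of characters */
    int is_ascii;               /* nonzero if every character is < 0x80 */
    const char *utf8;           /* cached UTF-8 bytes, or NULL */
    size_t utf8_length;
} toy_str;

enum { TOY_OK = 0, TOY_NO_BUFFER = -1 };

/* Borrow a UTF-8 view without allocating anything. Returns TOY_OK and
   sets *out and *size on success; returns TOY_NO_BUFFER (the "raise an
   error" case) when there is no cache and the string is not ASCII. */
static int toy_borrow_utf8_noalloc(const toy_str *s, const char **out,
                                   size_t *size)
{
    assert(s != NULL && out != NULL && size != NULL);
    if (s->utf8 != NULL) {
        /* 1. Reuse the existing cached buffer. */
        *out = s->utf8;
        *size = s->utf8_length;
        return TOY_OK;
    }
    if (s->is_ascii) {
        /* 2. ASCII bytes are already valid UTF-8, so expose them as-is. */
        *out = (const char *)s->data;
        *size = s->length;
        return TOY_OK;
    }
    /* 3. Producing UTF-8 would require an allocation: refuse instead. */
    *out = NULL;
    *size = 0;
    return TOY_NO_BUFFER;
}
```

Returning a status code rather than allocating keeps the fast path free of memory management; a caller that gets `TOY_NO_BUFFER` can decide whether to fall back to an allocating encoder.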

History
Date	User	Action	Args
2019-12-19 09:43:13	serhiy.storchaka	set	recipients: + serhiy.storchaka, vstinner, methane
2019-12-19 09:43:13	serhiy.storchaka	set	messageid: <1576748593.06.0.128693518249.issue39087@roundup.psfhosted.org>
2019-12-19 09:43:13	serhiy.storchaka	link	issue39087 messages
2019-12-19 09:43:12	serhiy.storchaka	create