Message 338228 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	serhiy.storchaka
Recipients	ezio.melotti, methane, serhiy.storchaka, vstinner
Date	2019-03-18.14:23:01
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1552918981.83.0.901300276481.issue36346@roundup.psfhosted.org>
In-reply-to

Content
The legacy Unicode C API was deprecated in 3.3. Its support consumes resources: more memory usage by Unicode objects, additional code for handling Unicode objects created with the legacy C API. Currently every Unicode object has a cache for the wchar_t representation. The proposed PR adds two compile time options: HAVE_UNICODE_WCHAR_CACHE and USE_UNICODE_WCHAR_CACHE. Both are set to 1 by default. If USE_UNICODE_WCHAR_CACHE is set to 0, CPython will not use the wchar_t cache internally. The new wchar_t based C API will be used instead of the Py_UNICODE based C API. This can add small performance penalty for creating a temporary buffer for the wchar_t representation. On other hand, this will decrease the long-term memory usage. This build is binary compatible with the standard build and third-party extensions can use the legacy Unicode C API. If HAVE_UNICODE_WCHAR_CACHE is set to 0, the wchar_t cache will be completely removed. The legacy Unicode C API will be not available, and functions that need it (e.g. PyArg_ParseTuple() with the "u" format unit) will always fail. This build is binary incompatible with the standard build if you use the legacy or non-stable Unicode C API. I hope that these options will help third-party projects to prepare for removing the legacy Unicode C API in future.

The legacy Unicode C API was deprecated in 3.3. Its support consumes resources: more memory usage by Unicode objects, additional code for handling Unicode objects created with the legacy C API. Currently every Unicode object has a cache for the wchar_t representation.

The proposed PR adds two compile time options: HAVE_UNICODE_WCHAR_CACHE and USE_UNICODE_WCHAR_CACHE. Both are set to 1 by default.

If USE_UNICODE_WCHAR_CACHE is set to 0, CPython will not use the wchar_t cache internally. The new wchar_t based C API will be used instead of the Py_UNICODE based C API. This can add small performance penalty for creating a temporary buffer for the wchar_t representation. On other hand, this will decrease the long-term memory usage. This build is binary compatible with the standard build and third-party extensions can use the legacy Unicode C API.

If HAVE_UNICODE_WCHAR_CACHE is set to 0, the wchar_t cache will be completely removed. The legacy Unicode C API will be not available, and functions that need it (e.g. PyArg_ParseTuple() with the "u" format unit) will always fail. This build is binary incompatible with the standard build if you use the legacy or non-stable Unicode C API.

I hope that these options will help third-party projects to prepare for removing the legacy Unicode C API in future.

History
Date	User	Action	Args
2019-03-18 14:23:01	serhiy.storchaka	set	recipients: + serhiy.storchaka, vstinner, ezio.melotti, methane
2019-03-18 14:23:01	serhiy.storchaka	set	messageid: <1552918981.83.0.901300276481.issue36346@roundup.psfhosted.org>
2019-03-18 14:23:01	serhiy.storchaka	link	issue36346 messages
2019-03-18 14:23:01	serhiy.storchaka	create