Message 400620 - Python tracker

Author	vstinner
Recipients	georg.brandl, indygreg, methane, petr.viktorin, serhiy.storchaka, vstinner
Date	2021-08-30.15:24:40
Message-id	<1630337080.58.0.861233648777.issue45025@roundup.psfhosted.org>

Content

> The macro PyUnicode_KIND is part of the documented public C API.

IMO it was a mistake to expose it as part of the public C API. This is an implementation detail which should not be exposed. The C API should not *directly* expose how characters are stored in memory; it should provide an abstract way to read and write Unicode characters.

The PEP 393 implementation broke the old C API in many ways because it exposed too many implementation details. Sadly, the new C API is... not better :-(

If CPython were modified tomorrow to use UTF-8 internally (as PyPy does), the C API would likely be broken *again* in many (new, funny) ways.
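
To illustrate the point, here is a minimal sketch of what "an abstract way to read characters" buys you. Everything in it is hypothetical: `toy_str`, `toy_str_length()`, `toy_str_read_char()` and `count_non_ascii()` are made-up names for this example and do not reflect the real layout or API of CPython or PyPy. The caller only goes through two accessor functions, so it keeps working when the storage changes from fixed-width code points to UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string type for illustration only (NOT CPython's layout). */
typedef enum { STORE_FIXED_WIDTH, STORE_UTF8 } toy_store;

typedef struct {
    toy_store store;
    size_t length;         /* number of code points, not bytes */
    const uint32_t *ucs4;  /* used when store == STORE_FIXED_WIDTH */
    const uint8_t *utf8;   /* used when store == STORE_UTF8 */
} toy_str;

/* Decode one UTF-8 sequence starting at p and report its byte width.
   Assumes the input is valid UTF-8; no error handling in this sketch. */
static uint32_t utf8_decode(const uint8_t *p, size_t *width)
{
    if (p[0] < 0x80) {
        *width = 1;
        return p[0];
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *width = 2;
        return ((uint32_t)(p[0] & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
    }
    if ((p[0] & 0xF0) == 0xE0) {
        *width = 3;
        return ((uint32_t)(p[0] & 0x0F) << 12)
             | ((uint32_t)(p[1] & 0x3F) << 6)
             | (uint32_t)(p[2] & 0x3F);
    }
    *width = 4;
    return ((uint32_t)(p[0] & 0x07) << 18)
         | ((uint32_t)(p[1] & 0x3F) << 12)
         | ((uint32_t)(p[2] & 0x3F) << 6)
         | (uint32_t)(p[3] & 0x3F);
}

/* Abstract accessors: the only functions a caller is supposed to use. */
size_t toy_str_length(const toy_str *s)
{
    return s->length;
}

uint32_t toy_str_read_char(const toy_str *s, size_t index)
{
    assert(index < s->length);
    if (s->store == STORE_FIXED_WIDTH) {
        return s->ucs4[index];           /* direct indexing */
    }
    /* UTF-8 storage: walk the buffer up to the requested code point. */
    const uint8_t *p = s->utf8;
    size_t width;
    for (size_t i = 0; i < index; i++) {
        utf8_decode(p, &width);
        p += width;
    }
    return utf8_decode(p, &width);
}

/* "Extension" code: it never looks at the storage kind or the raw buffer. */
size_t count_non_ascii(const toy_str *s)
{
    size_t n = 0;
    for (size_t i = 0; i < toy_str_length(s); i++) {
        if (toy_str_read_char(s, i) > 0x7F) {
            n++;
        }
    }
    return n;
}
```

Code that instead switched on the storage kind and indexed the raw buffer would have to be rewritten for the UTF-8 case. The price of the abstraction is visible too: with UTF-8 storage, reading by index is no longer constant-time in this naive sketch. In CPython's real C API, PyUnicode_GetLength() and PyUnicode_ReadChar() are function-based accessors in this spirit.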

11 years after PEP 393 (Python 3.3), we are only now starting to fix the old C API :-( The work will be completed in 2 or 3 Python releases (Python 3.12 or 3.13):

* https://www.python.org/dev/peps/pep-0623/
* https://www.python.org/dev/peps/pep-0624/

The C API for Unicode strings causes a lot of issues in PyPy, which uses UTF-8 internally. C extensions can fail to build on PyPy if they use functions (macros) like PyUnicode_KIND().
