classification
Title: Promote PyUnicode_AsUTF8AndSize to be available with the limited API (PEP 384)
Type: enhancement
Stage: resolved
Components: C API
Versions: Python 3.10
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: alex
Nosy List: alex, benjamin.peterson, methane, serhiy.storchaka, skrah, steve.dower
Priority: normal
Keywords: easy (C), patch

Created on 2020-09-14 19:37 by alex, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 22252 merged alex, 2020-09-15 02:27
Messages (13)
msg376896 - Author: Alex Gaynor (alex) (Python committer) Date: 2020-09-14 19:37
This function is incredibly useful for efficient interoperability between Python and other languages with UTF-8 based strings (e.g. Rust). Right now it's not possible to do interop without several copies/allocations if you're trying to build an abi3 wheel. This is tactically relevant to me here: https://github.com/PyO3/pyo3/issues/1125

This API has been stable since it was introduced in Python 3.1, so I think adding it to the stable ABI would be appropriate.

Inada, Benjamin suggested I should ask you for your feedback on this. I'm happy to send a pull request.
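
For readers unfamiliar with the call, here is a minimal sketch that exercises PyUnicode_AsUTF8AndSize from Python through ctypes. It is only an illustration of the pointer-plus-length contract, not how an abi3 extension would call it; `utf8_bytes` is a hypothetical helper name, and the bytes are copied out only because this sketch does not try to manage the lifetime of the borrowed pointer.

```python
import ctypes

# C signature:
#   const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
_as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8_and_size.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# Use a raw pointer as the return type; c_char_p would stop at the first NUL byte.
_as_utf8_and_size.restype = ctypes.c_void_p


def utf8_bytes(text):
    """Hypothetical helper: return the UTF-8 encoding of a str as bytes."""
    size = ctypes.c_ssize_t()
    # On failure the C function returns NULL with an exception set;
    # ctypes.pythonapi re-raises that exception in Python.
    ptr = _as_utf8_and_size(text, ctypes.byref(size))
    # The memory behind ptr belongs to the str object, so copy it out here.
    return ctypes.string_at(ptr, size.value)


print(utf8_bytes("h\u00e9llo"))
```

The explicit size is what lets a consumer in another language build a length-delimited string slice without scanning for a terminator.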
msg376922 - Author: Inada Naoki (methane) (Python committer) Date: 2020-09-15 02:05
+1. It is a very important API.
msg376941 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2020-09-15 14:21
What about PyUnicode_AsUTF8? Should it be made public too or left for internal use only?

What about third-party implementations of Python? How hard is it to implement this API on an implementation without reference counts? It would be interesting to hear the expert opinion of the PyPy core developers.
msg376947 - Author: Alex Gaynor (alex) (Python committer) Date: 2020-09-15 16:21
I think less is more; one API is plenty :-)

It looks to me like the API is already supported on PyPy, so I think it's fine from that perspective: https://foss.heptapod.net/pypy/pypy/-/blob/branch/py3.7/pypy/module/cpyext/unicodeobject.py#L493
msg376952 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2020-09-15 20:17
PyUnicode_AsUTF8() is used about 3 times as often as PyUnicode_AsUTF8AndSize().

$ find -type f -name '*.c' -exec egrep 'PyUnicode_AsUTF8AndSize\(' '{}' + | wc -l
35
$ find -type f -name '*.c' -exec egrep 'PyUnicode_AsUTF8\(' '{}' + | wc -l
101
msg376973 - Author: Inada Naoki (methane) (Python committer) Date: 2020-09-16 02:06
PyUnicode_AsUTF8 is a useful "API". But it can be implemented as a C macro, a C inline function, or a function/macro in any other language on top of PyUnicode_AsUTF8AndSize.

PyUnicode_AsUTF8AndSize is more important for the "ABI".
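
The point about layering can be sketched as follows: a size-less variant is just a thin wrapper that passes a NULL size pointer to PyUnicode_AsUTF8AndSize. This is a rough illustration via ctypes, assuming CPython; `as_utf8` is a hypothetical name, not the real PyUnicode_AsUTF8 symbol.

```python
import ctypes

# Only the function proposed for the limited API is bound here.
_as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8_and_size.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# c_char_p treats the result as a NUL-terminated C string and copies it to bytes.
_as_utf8_and_size.restype = ctypes.c_char_p


def as_utf8(text):
    """Hypothetical wrapper: the size-less variant built on the sized one."""
    # None becomes a NULL size pointer, meaning "do not report the length".
    return _as_utf8_and_size(text, None)


print(as_utf8("abc"))
```

Because the wrapper needs nothing beyond the sized function, only that one symbol has to be part of the ABI.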
msg376978 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2020-09-16 07:37
I agree about PyUnicode_AsUTF8.

But I think it would be worth asking the PyPy team about PyUnicode_AsUTF8AndSize.

An alternate C API is PyUnicode_GetUTF8Buffer (issue39087). It requires explicitly releasing the buffer after use, so it can be used even on implementations with a moving garbage collector.
msg376987 - Author: Alex Gaynor (alex) (Python committer) Date: 2020-09-16 11:35
Py_buffer is not part of the limited API at all, so I don't think it's usable for this.
msg377094 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2020-09-18 07:49
Oh, would it not be worth adding Py_buffer to the limited API?
msg377106 - Author: Alex Gaynor (alex) (Python committer) Date: 2020-09-18 11:35
It's a big project, I think :-) Py_buffer is allocated on the stack, so either we'd have to agree to never change its ABI (size, alignment, etc.) or we'd need to completely change the interface.
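
To make the stack-allocation concern concrete, here is a rough sketch in which the caller, not the interpreter, allocates the buffer struct. `PyBufferSketch` is a hypothetical ctypes mirror of Py_buffer whose field order is an assumption based on CPython's public header, and `buffer_len` is likewise a made-up helper. The point is that the struct's size and layout are baked into the calling code, so they could never change if the struct were part of a stable ABI.

```python
import ctypes


class PyBufferSketch(ctypes.Structure):
    # Assumed to mirror CPython's Py_buffer layout; not an official binding.
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("obj", ctypes.c_void_p),
        ("len", ctypes.c_ssize_t),
        ("itemsize", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
        ("ndim", ctypes.c_int),
        ("format", ctypes.c_char_p),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("suboffsets", ctypes.POINTER(ctypes.c_ssize_t)),
        ("internal", ctypes.c_void_p),
    ]


_get_buffer = ctypes.pythonapi.PyObject_GetBuffer
_get_buffer.argtypes = [ctypes.py_object, ctypes.POINTER(PyBufferSketch), ctypes.c_int]
_get_buffer.restype = ctypes.c_int

_release = ctypes.pythonapi.PyBuffer_Release
_release.argtypes = [ctypes.POINTER(PyBufferSketch)]
_release.restype = None

PYBUF_SIMPLE = 0  # value of the PyBUF_SIMPLE flag


def buffer_len(obj):
    """Hypothetical helper: byte length of obj via the buffer protocol."""
    view = PyBufferSketch()  # the caller owns this memory, as in C
    _get_buffer(obj, ctypes.byref(view), PYBUF_SIMPLE)  # raises on failure
    try:
        return view.len
    finally:
        _release(ctypes.byref(view))


print(ctypes.sizeof(PyBufferSketch), buffer_len(b"abc"))
```

If the interpreter ever added a field, code compiled against the old layout would hand it a struct that is too small, which is the ABI hazard described above.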
msg379026 - Author: Steve Dower (steve.dower) (Python committer) Date: 2020-10-19 21:43
Agreed that there's no way we can make Py_buffer part of the limited ABI.

I just looked over the PR and it's missing a What's New entry (e.g. https://docs.python.org/3/whatsnew/3.9.html#c-api-changes). Other than that, looks fine to me.
msg379036 - Author: Steve Dower (steve.dower) (Python committer) Date: 2020-10-19 22:17
New changeset 3a8fdb28794b2f19f6c8464378fb8b46bce1f5f4 by Alex Gaynor in branch 'master':
bpo-41784: make PyUnicode_AsUTF8AndSize part of the limited API (GH-22252)
https://github.com/python/cpython/commit/3a8fdb28794b2f19f6c8464378fb8b46bce1f5f4
msg379037 - Author: Steve Dower (steve.dower) (Python committer) Date: 2020-10-19 22:18
Thanks, Alex!
History
Date User Action Args
2022-04-11 14:59:35 admin set github: 85950
2020-10-19 22:18:07 steve.dower set status: open -> closed; resolution: fixed; messages: + msg379037; stage: patch review -> resolved
2020-10-19 22:17:56 steve.dower set messages: + msg379036
2020-10-19 21:43:25 steve.dower set nosy: + steve.dower; messages: + msg379026
2020-09-18 11:35:09 alex set messages: + msg377106
2020-09-18 07:49:41 serhiy.storchaka set nosy: + skrah; messages: + msg377094
2020-09-16 11:35:54 alex set messages: + msg376987
2020-09-16 07:37:21 serhiy.storchaka set messages: + msg376978
2020-09-16 02:06:25 methane set messages: + msg376973
2020-09-15 20:17:59 serhiy.storchaka set messages: + msg376952
2020-09-15 16:22:00 alex set messages: + msg376947
2020-09-15 14:21:40 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg376941
2020-09-15 02:27:04 alex set keywords: + patch; stage: patch review; pull_requests: + pull_request21307
2020-09-15 02:20:13 alex set assignee: alex
2020-09-15 02:06:48 methane set versions: + Python 3.10
2020-09-15 02:05:26 methane set messages: + msg376922
2020-09-14 19:37:03 alex create