Title: Promote PyUnicode_AsUTF8AndSize to be available with the limited API (PEP 384)
Type: enhancement Stage: patch review
Components: C API Versions: Python 3.10
Status: open Resolution:
Dependencies: Superseder:
Assigned To: alex Nosy List: alex, benjamin.peterson, inada.naoki, serhiy.storchaka, skrah
Priority: normal Keywords: easy (C), patch

Created on 2020-09-14 19:37 by alex, last changed 2020-09-18 11:35 by alex.

Pull Requests
URL Status Linked
PR 22252 open alex, 2020-09-15 02:27
Messages (10)
msg376896 - Author: Alex Gaynor (alex) * (Python committer) Date: 2020-09-14 19:37
This function is incredibly useful for efficient interoperability between Python and other languages with UTF-8-based strings (e.g. Rust). Right now it's not possible to do interop without several copies/allocations if you're trying to build an abi3 wheel. This is tactically relevant to me here:

This API has not changed since it was introduced in Python 3.1, so I think adding it to the stable ABI would be appropriate.

Inada, Benjamin suggested I ask you for feedback on this. I'm happy to send a pull request.
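A minimal sketch of why this function matters for interop, calling PyUnicode_AsUTF8AndSize from Python through ctypes purely for convenience (a Rust or C extension would call it directly). The helper name utf8_view is hypothetical. It shows the caller getting a pointer plus a byte length with no caller-side copy, and, assuming current CPython behavior, that the UTF-8 form is cached inside the str object so repeated calls reuse the same buffer. Other Python implementations are not assumed to behave the same way.

```python
import ctypes

# Bind the C function exported by the running CPython interpreter.
# C signature: const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
_as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8_and_size.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# c_void_p returns the raw address instead of copying the data into a bytes object.
_as_utf8_and_size.restype = ctypes.c_void_p


def utf8_view(s):
    """Return (address, byte_length) of the UTF-8 buffer owned by `s`.

    Hypothetical helper. No copy is made: the buffer belongs to the str
    object and stays valid only as long as `s` is alive. Errors surface as
    Python exceptions because ctypes.pythonapi checks the error indicator.
    """
    size = ctypes.c_ssize_t()
    addr = _as_utf8_and_size(s, ctypes.byref(size))
    return addr, size.value


text = "héllo, wörld"
addr1, n = utf8_view(text)
addr2, _ = utf8_view(text)

# The length is in UTF-8 bytes, not code points.
print(n, len(text))                                         # 14 12
# CPython caches the UTF-8 form in the object, so the address repeats.
print(addr1 == addr2)                                       # True
# string_at copies, but only to check the contents here.
print(ctypes.string_at(addr1, n) == text.encode("utf-8"))   # True
```

A foreign-language binding would hand (addr, n) straight to its own string-slice type; the only copy in the sketch is the verification step.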
msg376922 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2020-09-15 02:05
+1. It is a very important API.
msg376941 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-15 14:21
What about PyUnicode_AsUTF8? Should it be made public too or left for internal use only?

What about third-party implementations of Python? How hard is it to implement this API on an implementation without reference counting? It would be interesting to hear the expert opinion of the PyPy core developers.
msg376947 - Author: Alex Gaynor (alex) * (Python committer) Date: 2020-09-15 16:21
I think less is more; one API is plenty :-)

It looks to me like the API is already supported on PyPy, so I think it's fine from that perspective:
msg376952 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-15 20:17
PyUnicode_AsUTF8() is used three times as often as PyUnicode_AsUTF8AndSize().

$ find . -type f -name '*.c' -exec grep -E 'PyUnicode_AsUTF8AndSize\(' '{}' + | wc -l
$ find . -type f -name '*.c' -exec grep -E 'PyUnicode_AsUTF8\(' '{}' + | wc -l
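For readers without a Unix shell, here is a rough Python equivalent of the two pipelines above. It is a sketch, not an exact reimplementation of find/grep (for example, it follows symlinked files, which `-type f` skips). Like `grep | wc -l`, it counts matching lines, so a line with two calls counts once. The function name count_matching_lines is hypothetical.

```python
import os
import re

# Same patterns as the grep -E calls: "\(" requires a literal "(" right after
# the name, so the PyUnicode_AsUTF8 pattern does not also match ...AndSize(.
PATTERNS = {
    "PyUnicode_AsUTF8AndSize": re.compile(r"PyUnicode_AsUTF8AndSize\("),
    "PyUnicode_AsUTF8": re.compile(r"PyUnicode_AsUTF8\("),
}


def count_matching_lines(root):
    """Count lines in *.c files under `root` matching each pattern."""
    counts = dict.fromkeys(PATTERNS, 0)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".c"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps non-UTF-8 source files from aborting the scan.
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    for key, pattern in PATTERNS.items():
                        if pattern.search(line):
                            counts[key] += 1
    return counts
```

Usage: `count_matching_lines("path/to/cpython")` (a placeholder path) returns a dict mapping each function name to its line count.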
msg376973 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2020-09-16 02:06
PyUnicode_AsUTF8 is a useful "API", but it can be implemented on top of PyUnicode_AsUTF8AndSize as a C macro, a C inline function, or a function/macro in any other language.

PyUnicode_AsUTF8AndSize is more important for the "ABI".
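Inada's point can be sketched as follows: a size-less accessor needs nothing but PyUnicode_AsUTF8AndSize with a NULL size pointer, which the C API documents as allowed. In C this would be a one-line macro along the lines of `#define AS_UTF8(o) PyUnicode_AsUTF8AndSize((o), NULL)`. The Python version below goes through ctypes again; the name as_utf8 is hypothetical, and this is not a claim about how CPython itself defines PyUnicode_AsUTF8.

```python
import ctypes

# C signature: const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
_as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8_and_size.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# c_char_p reads up to the terminating NUL, which is all a size-less API offers.
_as_utf8_and_size.restype = ctypes.c_char_p


def as_utf8(s):
    """Hypothetical PyUnicode_AsUTF8-style wrapper built only on the sized call.

    Passing None for the size pointer sends NULL, so no length is stored.
    """
    return _as_utf8_and_size(s, None)


print(as_utf8("naïve"))   # b'na\xc3\xafve'
```

The reverse direction does not work: without the length, a caller cannot tell where a string containing an embedded "\0" really ends, which is one reason the sized variant is the more fundamental entry point to expose in the ABI.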
msg376978 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-16 07:37
I agree about PyUnicode_AsUTF8.

But I think it would be worth asking the PyPy team about PyUnicode_AsUTF8AndSize.

An alternative C API is PyUnicode_GetUTF8Buffer (issue39087). It requires explicitly releasing the buffer after use, so it can be used even on implementations with a moving garbage collector.
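The exact signature of PyUnicode_GetUTF8Buffer is not given in this thread, so the sketch below is only an analogy using the ordinary buffer protocol, which is what a memoryview wraps (a Py_buffer obtained from the exporter and freed with PyBuffer_Release). It shows the acquire/release contract Serhiy refers to: while a buffer is exported, the exporter is pinned (a bytearray refuses to resize, since that could relocate its storage), and only an explicit release lets it change again. A moving collector could rely on the same contract. The function name pinned_resize_demo is hypothetical.

```python
def pinned_resize_demo():
    """Show that an outstanding buffer export blocks a resize until released."""
    data = bytearray("héllo".encode("utf-8"))

    view = memoryview(data)      # acquire: data now has an active export
    blocked = False
    try:
        data.extend(b"!")        # a resize could move the underlying storage
    except BufferError:
        blocked = True
    view.release()               # explicit release, like PyBuffer_Release in C

    data.extend(b"!")            # allowed again once no exports remain
    return blocked, bytes(data)


print(pinned_resize_demo())      # (True, b'h\xc3\xa9llo!')
```

In C the same discipline means every successful acquire must be paired with a release on every code path, which is the extra burden an explicit-release API places on callers.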
msg376987 - Author: Alex Gaynor (alex) * (Python committer) Date: 2020-09-16 11:35
Py_buffer is not part of the limited API at all, so I don't think it's usable for this.
msg377094 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-18 07:49
Oh, would it not be worth adding Py_buffer to the limited API?
msg377106 - Author: Alex Gaynor (alex) * (Python committer) Date: 2020-09-18 11:35
It's a big project, I think :-) Py_buffer is allocated on the stack, so either we'd have to agree never to change its ABI (size, alignment, etc.) or we'd need to completely change the interface.
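Why a caller-allocated struct freezes the ABI can be sketched with two hypothetical struct layouts (BufferV1 and BufferV2 are illustrative only and are not the real Py_buffer definition). An extension built against the first layout reserves sizeof(BufferV1) bytes on its stack and passes that address to the interpreter; an interpreter built against the second layout would write past that space.

```python
import ctypes


# Hypothetical "old" layout an extension was compiled against.
class BufferV1(ctypes.Structure):
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("len", ctypes.c_ssize_t),
    ]


# Hypothetical "new" layout after a later release adds a field.
class BufferV2(ctypes.Structure):
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("len", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
    ]


# Shared fields keep their offsets, but the total size grows, so a caller
# that stack-allocates the old size no longer reserves enough space.
print(ctypes.sizeof(BufferV1), ctypes.sizeof(BufferV2))
print(BufferV1.len.offset == BufferV2.len.offset)   # True
```

One common way around this is the other option Alex mentions: make the struct opaque, have the library allocate it, and expose only accessor functions, so its size never appears in extension binaries.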
Date                 User              Action  Args
2020-09-18 11:35:09  alex              set     messages: + msg377106
2020-09-18 07:49:41  serhiy.storchaka  set     nosy: + skrah
                                               messages: + msg377094
2020-09-16 11:35:54  alex              set     messages: + msg376987
2020-09-16 07:37:21  serhiy.storchaka  set     messages: + msg376978
2020-09-16 02:06:25  inada.naoki       set     messages: + msg376973
2020-09-15 20:17:59  serhiy.storchaka  set     messages: + msg376952
2020-09-15 16:22:00  alex              set     messages: + msg376947
2020-09-15 14:21:40  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg376941
2020-09-15 02:27:04  alex              set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request21307
2020-09-15 02:20:13  alex              set     assignee: alex
2020-09-15 02:06:48  inada.naoki       set     versions: + Python 3.10
2020-09-15 02:05:26  inada.naoki       set     messages: + msg376922
2020-09-14 19:37:03  alex              create