classification
Title: Enhancement request for proxying PyString
Type: Stage:
Components: Extension Modules Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Thrameos, serhiy.storchaka
Priority: normal Keywords:

Created on 2020-12-24 19:13 by Thrameos, last changed 2020-12-25 11:47 by serhiy.storchaka.

Messages (2)
msg383701 - Author: Karl Nelson (Thrameos) Date: 2020-12-24 19:13
When developing with JPype, the largest hole currently is that Java returns a string type which cannot be represented as a str.  Java strings are string-like and immutable and can be pulled to Python when needed, but it is best if they remain in Java until Python requests them, as pulling all string values through the API and pushing them back can result in serious overhead.  Thus they need to be represented as a proxy to a string, which can be accessed as a string at any time.

Throughout the Python API str is treated as a concrete type (though it is somewhat polymorphic due to different storage for code point sizes).  There is also handling for an "unready" string which needs additional treatment before it can be accessed.  Unfortunately this does not appear to be suitable for creating a proxy object which can be pulled from another source to create a string on demand.  Having a "__str__()" method is insufficient, as that merely makes an object convertible to a string rather than considered to be a string by the rest of the API.
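
A minimal Python-level sketch of that gap. The `LazyString` class and its `fetch` callable are hypothetical stand-ins for a Java string proxy and are not part of JPype or CPython. Defining `__str__` allows explicit conversion, but code that requires a real `str` still rejects the object:

```python
class LazyString:
    """Hypothetical proxy; fetch() stands in for a call into Java."""

    def __init__(self, fetch):
        self._fetch = fetch

    def __str__(self):
        # Conversion happens only when explicitly requested.
        return self._fetch()


proxy = LazyString(lambda: "hello")

# Explicit conversion works.
converted = str(proxy)

# The proxy itself is not a str as far as the rest of Python is concerned.
is_str = isinstance(proxy, str)

try:
    ", ".join([proxy])  # str.join requires real str instances
    join_error = None
except TypeError as exc:
    join_error = exc
```

Every call site would have to wrap the proxy in `str(...)` first, which is exactly the eager copy the proxy is meant to avoid.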

Would it be possible to generalize the concept of an unready string so that when Ready is called it fetches the actual string contents, creates a piece of memory to store the string contents (outside of the object itself), and sets the access flags so that the code points can be interpreted?  Is this already possible in the API?  Are there any other plans to make the str type able to operate as a proxy?
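
The requested behavior can be modeled, purely as an analogy, in plain Python. The `ReadyOnDemand` class, its `ready()` method, and the `fetch` callable are hypothetical names chosen for illustration. This sketch does not touch the C-level str layout and does not make the object a real `str`; it only shows the "fetch once on first access, then reuse" pattern being asked for:

```python
class ReadyOnDemand:
    """Hypothetical model of a string whose contents are fetched on first use."""

    def __init__(self, fetch):
        self._fetch = fetch      # stand-in for a call into the foreign runtime
        self._contents = None    # filled in lazily, stored apart from the fetcher
        self.fetch_count = 0     # counts how often the foreign side was consulted

    def ready(self):
        # Materialize the contents once, then reuse them on later calls.
        if self._contents is None:
            self._contents = self._fetch()
            self.fetch_count += 1
        return self._contents

    def __str__(self):
        return self.ready()

    def __len__(self):
        return len(self.ready())


s = ReadyOnDemand(lambda: "caf\u00e9")
before = s.fetch_count   # nothing fetched yet
length = len(s)          # first access triggers the fetch
text = str(s)            # second access reuses the cached contents
after = s.fetch_count
```

The open question in the request is whether the same deferral could happen inside the str type itself, so that consumers of the C API would not need to know a proxy is involved.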
msg383738 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-12-25 11:47
There is no longer a PyString in Python, only PyUnicode.

There are plans to get rid of PyUnicode_READY(). After support for "legacy" Unicode objects is removed (which will happen in a few years), PyUnicode_READY() will no longer be needed, so all calls to it could be removed. This is currently the last chance to redesign it for other purposes. I suggest discussing this on one of the mailing lists (Python-ideas or even Python-Dev) with a wider audience, as it can have a large impact on the future of the C API.

However, I am not sure that PyUnicode_READY() is called in all the cases where it is needed. The code just happens not to be tested intensively with "legacy" Unicode objects, because in the normal case you get objects that are already ready. In fact, functions like _PyUnicode_EqualToASCIIString intentionally do not call it and read the Py_UNICODE content of non-ready Unicode objects directly.
History
Date User Action Args
2020-12-25 11:47:31 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg383738
2020-12-24 19:13:42 Thrameos create