classification
Title: unicodedata.ucnhash_CAPI removed from Python 3.10 without deprecation
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.11, Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: erlendaasland, hroncok, koubaa, methane, vstinner
Priority: normal Keywords:

Created on 2021-06-14 12:21 by hroncok, last changed 2021-06-14 19:47 by hroncok.

Messages (13)
msg395792 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-06-14 12:21
In bpo-42157, the unicodedata.ucnhash_CAPI attribute was removed without deprecation. This breaks at least https://github.com/dgrunwald/rust-cpython with:

    AttributeError: module 'unicodedata' has no attribute 'ucnhash_CAPI'

Please revert the removal and deprecate the attribute for 2 Python releases if you want to remove it.

Thanks
msg395801 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-06-14 14:53
It's not removed, it's renamed (by 84f7382215b9e024a5590454726b6ae4b0ca70a0, GH-22994, bpo-42157). You can access it using the '_ucnhash_CAPI' attribute.
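
For code that has to run on both sides of the rename, a minimal sketch of looking the capsule up under either name could look like this. The helper name find_ucnhash_capsule is made up for illustration, and '_ucnhash_CAPI' is a private attribute, so it may change again without notice.

```python
import unicodedata


def find_ucnhash_capsule():
    """Return (attribute_name, capsule), or (None, None) if neither name exists."""
    # Try the public name (Python <= 3.9) first, then the private one (Python >= 3.10).
    for name in ("ucnhash_CAPI", "_ucnhash_CAPI"):
        capsule = getattr(unicodedata, name, None)
        if capsule is not None:
            return name, capsule
    return None, None


name, capsule = find_ucnhash_capsule()
print(name, type(capsule).__name__)
```

Finding the capsule does not make the struct behind it compatible: the layout differs between the two versions, so a caller still has to branch on the Python version before using the function pointers.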
msg395806 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-06-14 15:13
Right. Nevertheless, the renaming has effectively removed the old name.
msg395816 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-14 17:09
Oh, I forgot about this issue. Let me rebuild the context.

Copy of the What's New in Python 3.10 entry:

"Removed the unicodedata.ucnhash_CAPI attribute which was an internal PyCapsule object. The related private _PyUnicode_Name_CAPI structure was moved to the internal C API. (Contributed by Victor Stinner in bpo-42157.)"


The C API changes. Python <= 3.9:

typedef struct {

    /* Size of this struct */
    int size;

    /* Get name for a given character code.  Returns non-zero if
       success, zero if not.  Does not set Python exceptions.
       If self is NULL, data come from the default version of the database.
       If it is not NULL, it should be a unicodedata.ucd_X_Y_Z object */
    int (*getname)(PyObject *self, Py_UCS4 code, char* buffer, int buflen,
                   int with_alias_and_seq);

    /* Get character code for a given name.  Same error handling
       as for getname. */
    int (*getcode)(PyObject *self, const char* name, int namelen, Py_UCS4* code,
                   int with_named_seq);

} _PyUnicode_Name_CAPI;

Python >= 3.10:

typedef struct {

    /* Get name for a given character code.
       Returns non-zero if success, zero if not.
       Does not set Python exceptions. */
    int (*getname)(Py_UCS4 code, char* buffer, int buflen,
                   int with_alias_and_seq);

    /* Get character code for a given name.
       Same error handling as for getname(). */
    int (*getcode)(const char* name, int namelen, Py_UCS4* code,
                   int with_named_seq);

} _PyUnicode_Name_CAPI;

Changes:

* _PyUnicode_Name_CAPI.size was removed
* the getname and getcode functions no longer take a "self" argument
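
To make the two changes listed above concrete, here is a rough sketch that mirrors both struct definitions with ctypes and compares their layouts. The class and type names (NameCAPIOld, NameCAPINew, GetNameOld, ...) are invented for this illustration; only the field order and signatures are taken from the definitions quoted above. It does not touch the real capsule.

```python
import ctypes

# Py_UCS4 is a 32-bit unsigned integer in CPython.
Py_UCS4 = ctypes.c_uint32

# Python <= 3.9 signatures: both functions take a leading "PyObject *self".
GetNameOld = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.py_object, Py_UCS4,
                              ctypes.c_char_p, ctypes.c_int, ctypes.c_int)
GetCodeOld = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.py_object, ctypes.c_char_p,
                              ctypes.c_int, ctypes.POINTER(Py_UCS4), ctypes.c_int)

# Python >= 3.10 signatures: the "self" argument is gone.
GetNameNew = ctypes.CFUNCTYPE(ctypes.c_int, Py_UCS4,
                              ctypes.c_char_p, ctypes.c_int, ctypes.c_int)
GetCodeNew = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_char_p,
                              ctypes.c_int, ctypes.POINTER(Py_UCS4), ctypes.c_int)


class NameCAPIOld(ctypes.Structure):
    """Layout of _PyUnicode_Name_CAPI in Python <= 3.9 (leading size field)."""
    _fields_ = [("size", ctypes.c_int),
                ("getname", GetNameOld),
                ("getcode", GetCodeOld)]


class NameCAPINew(ctypes.Structure):
    """Layout of _PyUnicode_Name_CAPI in Python >= 3.10 (no size field)."""
    _fields_ = [("getname", GetNameNew),
                ("getcode", GetCodeNew)]


# The function pointers sit at different offsets in the two layouts.
print("getname offset:", NameCAPIOld.getname.offset, "->", NameCAPINew.getname.offset)
print("struct size:   ", ctypes.sizeof(NameCAPIOld), "->", ctypes.sizeof(NameCAPINew))
```

Because getname moves to offset 0 and loses an argument, a consumer compiled against the 3.9 layout would read the wrong pointer and call it with the wrong arguments if it were handed the 3.10 struct.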

There was also a "void *state" parameter in commit https://github.com/python/cpython/commit/47e1afd2a1793b5818a16c41307a4ce976331649 but it was removed later.


In Python, it's used in two places:

* unicodeobject.c: "\N{...}" format to get a code point by its name
* codecs.c: PyCodec_NameReplaceErrors(), "namereplace" error handler

Both used self=NULL in Python 3.9.
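
For reference, the two internal users can be exercised from pure Python, with no access to the capsule itself. This is only a small sketch of the visible behavior; the variable names are arbitrary.

```python
# "\N{...}" in a string literal: the name is resolved to a code point.
bullet = "\N{BULLET}"

# The same lookup at run time, through the unicode_escape codec.
decoded = b"caf\\N{LATIN SMALL LETTER E WITH ACUTE}".decode("unicode_escape")

# The reverse direction: the "namereplace" error handler turns an
# unencodable character into its \N{...} name.
encoded = "caf\u00e9".encode("ascii", errors="namereplace")

print(ascii(bullet), ascii(decoded), encoded)
```

Both directions go through the unicodedata name tables using the default database version, which matches the self=NULL usage described above.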


It was simpler to remove the C API than to try to keep backward compatibility. The difficulty was supporting the "self" parameter.

See the comment:
---
// Check if self is an unicodedata.UCD instance.
// If self is NULL (when the PyCapsule C API is used), return 0.
// PyModule_Check() is used to avoid having to retrieve the ucd_type.
// See unicodedata_functions comment to the rationale of this macro.
#define UCD_Check(self) (self != NULL && !PyModule_Check(self))
---
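
As a rough Python analogue of that macro (an illustration only, not code from CPython; the function name ucd_check is made up, and None stands in for a NULL self):

```python
import types
import unicodedata


def ucd_check(self_obj):
    """Mimic UCD_Check(): true only for a UCD instance such as ucd_3_2_0."""
    # None models self == NULL (capsule callers); a module object models
    # the module-level functions. Both mean "use the default database".
    return self_obj is not None and not isinstance(self_obj, types.ModuleType)


print(ucd_check(None))                   # False: capsule caller
print(ucd_check(unicodedata))            # False: module-level function
print(ucd_check(unicodedata.ucd_3_2_0))  # True: older database object
```

The three cases show why "self" was awkward to keep in the capsule API: the same parameter had to distinguish a NULL pointer, the module object, and a real UCD instance.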

In my PR, I wrote:

"I prefer to merge this early in the 3.10 dev cycle, to increase chances of getting early user feedback if this change breaks 3rd party applications.

Thanks for the review @methane. Usually, features require 2 Python releases to be removed, with a deprecation first. But this specific case is really weird. I chose to remove it immediately. IMO it was exposed in public "by mistake", whereas a private attribute would be enough for internal usage."

https://github.com/python/cpython/pull/22994#issuecomment-716958371
msg395817 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-14 17:15
> This breaks at least https://github.com/dgrunwald/rust-cpython

When I search for "ucnhash_CAPI" in the GitHub search, I only find commented code:

https://github.com/dgrunwald/rust-cpython/blob/b63d691addc978952380a8eb146d01a444e16e7a/src/objects/capsule.rs

Does rust-cpython generate code? Do you have more details about the error? How does it use unicodedata.ucnhash_CAPI?
msg395819 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-06-14 17:28
All details I have about rust-cpython are that it fails tests with:

    AttributeError: module 'unicodedata' has no attribute 'ucnhash_CAPI'

See the test failures in https://koschei.fedoraproject.org/package/rust-cpython e.g.:

---- src/objects/capsule.rs - objects::capsule::PyCapsule (line 34) stdout ----
Test executable failed (exit code 101).
stderr:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { ptype: <class 'AttributeError'>, pvalue: Some(AttributeError("module 'unicodedata' has no attribute 'ucnhash_CAPI'")), ptraceback: None }', src/objects/capsule.rs:77:2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- src/objects/capsule.rs - py_capsule (line 232) stdout ----
Test executable failed (exit code 101).
stderr:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { ptype: <class 'AttributeError'>, pvalue: Some(AttributeError("module 'unicodedata' has no attribute 'ucnhash_CAPI'")), ptraceback: None }', src/objects/capsule.rs:73:47
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
    src/objects/capsule.rs - objects::capsule::PyCapsule (line 34)
    src/objects/capsule.rs - py_capsule (line 232)

(Note that there are also other failures regarding an implicit float->int conversion, but they seem to be caused by a change that followed the deprecation period.)
msg395820 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-14 17:30
The UCD_Check() issue was discussed in Mohamed Koubaa's PRs:

* https://github.com/python/cpython/pull/22145 (closed) <= HERE
* https://github.com/python/cpython/pull/22328 (merged)
* https://github.com/python/cpython/pull/22490 (closed)
msg395821 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-14 17:32
> (Note that there are also other failures regarding an implicit float->int conversion, but they seem to be caused by a change that followed the deprecation period.)

Does it mean that rust-cpython was broken in Python 3.10 even though the change was prepared with a deprecation period in Python 3.9? Does it mean that the deprecation period was ineffective for this project?
msg395823 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-06-14 17:38
> Does it mean that rust-cpython was broken in Python 3.10 even though the change was prepared with a deprecation period in Python 3.9? Does it mean that the deprecation period was ineffective for this project?

I don't see any deprecation warning when running the test suite with Python 3.9, and I am unsure what exactly changed there. My assumption is that it might be related to PyNumber_Long not taking floats any more, and I am unsure whether that particular change emitted deprecation warnings. Still investigating.
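
One way to check whether a deprecation warning is emitted at all is to record warnings explicitly, since DeprecationWarning is hidden by default in many contexts. The sketch below uses a hypothetical stand-in function (legacy_call) rather than the real C API path, which I have not verified; running the test suite with "python -W error::DeprecationWarning" is the command-line equivalent.

```python
import warnings


def legacy_call(value):
    """Hypothetical stand-in for an API that deprecates lossy float->int conversion."""
    if isinstance(value, float):
        warnings.warn("an integer is required (got type float)",
                      DeprecationWarning, stacklevel=2)
    return int(value)


with warnings.catch_warnings(record=True) as caught:
    # Make sure DeprecationWarning is not filtered out while recording.
    warnings.simplefilter("always", DeprecationWarning)
    result = legacy_call(3.0)

messages = [str(w.message) for w in caught
            if issubclass(w.category, DeprecationWarning)]
print(result, messages)
```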
msg395824 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-14 17:46
If we decide to restore the C API to the Python 3.9 C API, *all* changes made to unicodedata in Python 3.10 would have to be reverted, since the early changes already broke the C API and the later changes rely on them.
msg395840 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-14 19:28
> https://github.com/dgrunwald/rust-cpython/blob/b63d691addc978952380a8eb146d01a444e16e7a/src/objects/capsule.rs

A friend explained to me that it's a doctest and that unicodedata.ucnhash_CAPI was picked as an example, but any other capsule could be used to test the Capsule API of rust-cpython. rust-cpython doesn't use unicodedata.ucnhash_CAPI.
msg395841 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-06-14 19:32
Sounds like an easy solution is to open an issue/PR against rust-cpython to update the doctest, IMO.
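
For example, datetime.datetime_CAPI is another capsule exposed by the standard library in CPython. Whether it is the right replacement for the rust-cpython doctest is an assumption on my side; the snippet below only checks that such a capsule is available.

```python
import datetime

# datetime_CAPI backs the C-level datetime API in CPython; other
# implementations may not provide it, hence the getattr() fallback.
capsule = getattr(datetime, "datetime_CAPI", None)
print(type(capsule).__name__)
```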
msg395843 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-06-14 19:47
Updating the doctest is certainly a good solution for this particular project. However, I still think this regression deserves to be resolved. This was part of the API, whether intended or not.
History
Date                 User           Action  Args
2021-06-14 19:47:01  hroncok        set     messages: + msg395843
2021-06-14 19:32:10  erlendaasland  set     messages: + msg395841
2021-06-14 19:28:17  vstinner       set     messages: + msg395840
2021-06-14 17:46:19  vstinner       set     messages: + msg395824
2021-06-14 17:38:12  hroncok        set     messages: + msg395823
2021-06-14 17:32:21  vstinner       set     messages: + msg395821
2021-06-14 17:30:02  vstinner       set     messages: + msg395820
2021-06-14 17:28:54  hroncok        set     messages: + msg395819
2021-06-14 17:15:19  vstinner       set     messages: + msg395817
2021-06-14 17:09:38  vstinner       set     nosy: + methane, koubaa; messages: + msg395816
2021-06-14 15:13:21  hroncok        set     messages: + msg395806
2021-06-14 14:53:00  erlendaasland  set     nosy: + erlendaasland; messages: + msg395801
2021-06-14 12:21:13  hroncok        create