This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: [C API] PEP 674: Disallow using macros (Py_TYPE and Py_SIZE) as l-value
Type: Stage: patch review
Components: C API Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: arhadthedev, erlendaasland, gdr@garethrees.org, lemburg, rhettinger, vstinner
Priority: normal Keywords: patch

Created on 2021-10-14 21:17 by vstinner, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
pep674_regex.py vstinner, 2021-12-01 00:25
Pull Requests
URL Status Linked Edit
PR 28961 closed vstinner, 2021-10-14 23:09
PR 28976 closed vstinner, 2021-10-15 12:41
PR 29860 merged vstinner, 2021-11-30 10:36
PR 29866 merged vstinner, 2021-11-30 13:12
Messages (41)
msg403950 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-14 21:17
The Python C API provides "AS" functions to convert an object to another type, like PyFloat_AS_DOUBLE(). These macros can be abused as l-values: "PyFloat_AS_DOUBLE(obj) = new_value;". This prevents changing the PyFloat implementation and makes life harder for Python implementations other than CPython.

I propose to convert these macros to static inline functions to disallow using them as l-value.

I made a similar change for Py_REFCNT(), Py_TYPE() and Py_SIZE(). For these functions, I added "SET" variants: Py_SET_REFCNT(), Py_SET_TYPE(), Py_SET_SIZE(). Here, I don't think that the l-value case is legit, and so I don't see the need to add a way to *set* a value.

For example, I don't think that PyFloat_SET_DOUBLE(obj, value) would make sense. A Python float object is supposed to be immutable.
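
To make the idea concrete, here is a minimal standalone sketch (hypothetical "MyFloat" names, not the real CPython definitions) of why a macro that expands to a struct member accepts assignment while a static inline function does not:

    #include <stdio.h>

    typedef struct { double ob_fval; } MyFloatObject;

    /* Macro version: expands to a struct member, so it is an l-value
       and can be (ab)used on the left side of an assignment. */
    #define MYFLOAT_AS_DOUBLE_MACRO(op) (((MyFloatObject *)(op))->ob_fval)

    /* Static inline version: returns a value, so assigning to the call
       is rejected ("lvalue required as left operand of assignment"). */
    static inline double MyFloat_AS_DOUBLE(MyFloatObject *op)
    {
        return op->ob_fval;
    }

    int main(void)
    {
        MyFloatObject f = { 1.5 };
        MYFLOAT_AS_DOUBLE_MACRO(&f) = 2.5;   /* compiles: macro abused as l-value */
        /* MyFloat_AS_DOUBLE(&f) = 3.5; */   /* would not compile */
        printf("%f %f\n", MYFLOAT_AS_DOUBLE_MACRO(&f), MyFloat_AS_DOUBLE(&f));
        return 0;
    }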
msg403956 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-10-14 22:06
> These macros can be abused to be used as l-value

You could simply document, "don't do that".  Also, if there is a need to make an assignment, you're now going to have to create a new setter function to fill the need.

We really don't have to go on thin ice converting to functions that might or might not be inlined depending on compiler specific nuances.

AFAICT, no one has ever had problems with these being macros.  There really isn't a problem to be solved and the "solution" may in fact introduce new problems that we didn't have before.

Put me down for a -1 on these blanket macro-to-inline function rewrites.  The premise is almost entirely a matter of opinion, "macros are bad, functions are good", and a naive assumption that "inline functions" always inline.
msg403961 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-14 23:47
Raymond:
> AFAICT, no one has ever has problems with these being macros.

This issue is about the API of PyFloat_AS_DOUBLE(). Implementing it as a macro or a static inline function is an implementation detail which doesn't matter. But I don't know how to disallow "PyFloat_AS_DOUBLE(obj) = value" if it is defined as a macro.

Have a look at the Facebook "nogil" project, which is incompatible with directly accessing the PyObject.ob_refcnt member:
"Extensions must use Py_REFCNT and Py_SET_REFCNT instead of directly accessing reference count fields"
https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit

Raymond:
> You could simply document, "don't do that".

Documentation doesn't work. Developers easily fall into traps whenever it is possible to do so. See bpo-30459 for such a trap with the PyList_SET_ITEM() and PyCell_SET() macros: they were misused by two Python projects.


Raymond:
> We really don't have to go on thin ice converting to functions that might or might not be inlined depending on compiler specific nuances.

Do you have a concrete example where a static inline function is not inlined, whereas it was inlined when it was a macro? So far, I'm not aware of any performance issue like that.

There were attempts to use __attribute__((always_inline)) (Py_ALWAYS_INLINE), but so far, using it was not a clear win.
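
For context, always_inline is usually hidden behind a small portability macro along these lines (a sketch with a hypothetical MY_ALWAYS_INLINE name; the real Py_ALWAYS_INLINE may differ in its details):

    /* Sketch of a portable "force inline" request; compilers may still
       refuse in corner cases (e.g. recursive or variadic functions). */
    #if defined(__GNUC__) || defined(__clang__)
    #  define MY_ALWAYS_INLINE __attribute__((always_inline)) inline
    #elif defined(_MSC_VER)
    #  define MY_ALWAYS_INLINE __forceinline
    #else
    #  define MY_ALWAYS_INLINE inline
    #endif

    static MY_ALWAYS_INLINE int my_double_it(int x)
    {
        return x * 2;
    }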
msg403965 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-15 00:00
I searched for "PyFloat_AS_DOUBLE.*=" regex in the PyPI top 5000 projects. I couldn't find any project doing that.

I only found perfectly safe comparisons:

traits/ctraits.c:            if (PyFloat_AS_DOUBLE(value) <= PyFloat_AS_DOUBLE(low)) {
traits/ctraits.c:            if (PyFloat_AS_DOUBLE(value) >= PyFloat_AS_DOUBLE(high)) {
c/_cffi_backend.c:        return PyFloat_AS_DOUBLE(ob) != 0.0;
pandas/_libs/src/klib/khash_python.h:           ( PyFloat_AS_DOUBLE(a) == PyFloat_AS_DOUBLE(b) );
msg403990 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-10-15 09:10
I am with Raymond on this one.

If "protecting against wrong use" is the only reason to go down the slippery path of starting to rely on compiler optimizations for performance critical operations, the argument is not good enough.

If people do use macros in l-value mode, it's their problem when their code breaks, not ours. Please don't forget that we are operating under the consenting adults principle: we expect users of the CPython API to use it as documented and expect them to take care of the fallout, if things break when they don't.

We don't need to police developers into doing so.
msg403995 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-15 09:43
For PyObject, I converted Py_REFCNT(), Py_TYPE() and Py_SIZE() to static inline functions to enforce the usage of Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE(). Only a minority of C extensions are affected by these changes. Also, there is more pressure from recent optimization projects to abstract accesses to PyObject members.

I agree that it doesn't seem that the "AS" functions are abused to *set* the inner value:

* PyByteArray_AS_STRING()
* PyBytes_AS_STRING()
* PyFloat_AS_DOUBLE()

> If "protecting against wrong use" is the only reason to go down the slippery path of starting to rely on compiler optimizations for performance critical operations, the argument is not good enough.

Again, I'm not aware of any performance issue caused by short static inline functions like Py_TYPE() or the proposed PyFloat_AS_DOUBLE(). If there is a problem, it should be addressed, since Python uses more and more static inline functions.

Static inline functions are a common feature of the C language. I'm not sure where your doubts about bad performance come from.

Using static inline functions has other advantages. It helps debugging and profiling, since the function name can be retrieved by debuggers and profilers when analysing the machine code. It also avoids macro pitfalls (like abusing a macro to use it as an l-value ;-)).
msg403999 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-10-15 10:20
On 15.10.2021 11:43, STINNER Victor wrote:
> Again, I'm not aware of any performance issue caused by short static inline functions like Py_TYPE() or the proposed PyFloat_AS_DOUBLE(). If there is a problem, it should be addressed, since Python uses more and more static inline functions.
> 
> static inline functions is a common feature of C language. I'm not sure where your doubts of bad performance come from.

Inlining is something that is completely under the control of the
compilers used. Compilers are free to not inline functions marked for
inlining, which can result in significant slowdowns on platforms
which are e.g. restricted in RAM and thus emphasize small code size,
or where the CPUs have small caches or not enough registers (think
micro-controllers).

The reason we have those macros is that we want developers to be
able to make a conscious decision: "please inline this code unconditionally,
regardless of platform or compiler". The developer will know better
than the compiler what to do.

If the developer wants to pass control over to the compiler s/he can use
the corresponding C function, which is usually available (and then, in many
cases, also provides error handling).

> Using static inline functions has other advantages. It helps debugging and profiling, since the function name can be retrieved by debuggers and profilers when analysing the machine code. It also avoids macro pitfalls (like abusing a macro to use it as an l-value ;-)).

Perhaps, but then I never had to profile macro use in the past. Instead,
what I typically found was that using macros results in faster code when
used in inner loops, so profiling usually guided me to use macros instead
of functions.

That said, the macros you have inlined so far were all really trivial,
so a compiler will most likely always inline them (the number of machine
code instructions for the call would be more than needed for
the actual operation).

Perhaps we ought to have a threshold for making such decisions, e.g.
number of machine code instructions generated for the macro or so, to
not get into discussions every time :-)

A blanket "static inline" is always better than a macro is not good
enough as an argument, though.

Esp. in PGO driven optimizations the compiler could opt for using
the function call rather than inlining if it finds that the code
in question is not used much and it needs to save space to have
loops fit into CPU caches.
msg404001 - (view) Author: Gareth Rees (gdr@garethrees.org) * (Python triager) Date: 2021-10-15 11:01
If the problem is accidental use of the result of PyFloat_AS_DOUBLE() as an lvalue, why not use the comma operator to ensure that the result is an rvalue?

The C99 standard says "A comma operator does not yield an lvalue" in §6.5.17; I imagine there is similar text in other versions of the standard.

The idea would be to define a helper macro like this:

    /* As expr, but can only be used as an rvalue. */
    #define Py_RVALUE(expr) ((void)0, (expr))

and then use the helper where needed, for example:

    #define PyFloat_AS_DOUBLE(op) Py_RVALUE(((PyFloatObject *)(op))->ob_fval)
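
A standalone demonstration of the trick (hypothetical MY_* names, not the CPython headers), showing that reads still compile while assignment is rejected:

    #include <stdio.h>

    /* As expr, but can only be used as an rvalue: the result of the comma
       operator is not an l-value (C99 §6.5.17). */
    #define MY_RVALUE(expr) ((void)0, (expr))

    typedef struct { double ob_fval; } MyFloatObject;
    #define MYFLOAT_AS_DOUBLE(op) MY_RVALUE(((MyFloatObject *)(op))->ob_fval)

    int main(void)
    {
        MyFloatObject f = { 1.5 };
        double x = MYFLOAT_AS_DOUBLE(&f);   /* reading still works */
        /* MYFLOAT_AS_DOUBLE(&f) = 2.0; */  /* error: lvalue required as
                                               left operand of assignment */
        printf("%f\n", x);
        return 0;
    }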
msg404010 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-15 12:42
> #define Py_RVALUE(expr) ((void)0, (expr))

Oh, that's a clever trick!

I wrote GH-28976 which uses it.
msg404039 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-15 17:44
I created bpo-45490: "[meta][C API] Avoid C macro pitfalls and usage of static inline functions" to discuss macros and static inline functions more generally.
msg406343 - (view) Author: Oleg Iarygin (arhadthedev) * Date: 2021-11-15 07:54
Marc-Andre:
> Inlining is something that is completely under the control of the
used compilers. Compilers are free to not inline function marked for
inlining [...]

I checked the following C snippet on gcc.godbolt.org using GCC 4.1.2 and Clang 3.0.0 with <no flags>/-O0/-O1/-Os, and both compilers inline a function marked as static inline:

    static inline int foo(int a)
    {
        return a * 2;
    }

    int bar(int a)
    {
        return foo(a) < 0;
    }

So even with -O0, GCC from 2007 and Clang from 2011 perform inlining. However, old versions of Clang leave a dangling original copy of foo for some reason; I hope the linker removes it later.

As for other compilers, I believe that if somebody specifies -O0, that person has a sound reason to do so (like per-line debugging, building precise flame graphs, or another specific scenario where execution performance does not matter), so inlining would interfere there anyway.
msg406344 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-11-15 09:22
On 15.11.2021 08:54, Oleg Iarygin wrote:
> 
> Oleg Iarygin <oleg@arhadthedev.net> added the comment:
> 
> Marc-Andre:
>> Inlining is something that is completely under the control of the
> used compilers. Compilers are free to not inline function marked for
> inlining [...]
> 
> I checked the following C snippet on gcc.godbolt.org using GCC 4.1.2 and Clang 3.0.0 with <no flags>/-O0/-O1/-Os, and both compilers inline a function marked as static inline:
> 
>     static inline int foo(int a)
>     {
>         return a * 2;
>     }
> 
>     int bar(int a)
>     {
>         return foo(a) < 0;
>     }
> 
> So even with -O0, GCC from 2007 and Clang from 2011 perform inlining. Though, old versions of CLang leave a dangling original copy of foo for some reason. I hope a linker removes it later.

That's a great website :-) Thanks for sharing.

However, even with x86-64 gcc 11.2, I get assembler which does not inline
foo() without compiler options or with -O0: https://gcc.godbolt.org/z/oh6qnffh7

Only with -O1 does the site report inlining foo().

> As for other compilers, I believe that if somebody specifies -O0, that person has a sound reason to do so (like per-line debugging, building precise flame graphs, or other specific scenario where execution performance does not matter), so inlining interferes here anyway.

Sure, but my point was a different one: even with higher optimization
levels, the compiler can decide whether or not to inline. We expect
the compiler to inline, but cannot be sure.

With macros, the compiler has no choice and we are in control; even
when using -O0, you will still want e.g. Py_INCREF() and Py_DECREF()
inlined.
msg406345 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-15 09:35
I wrote PEP 670 "Convert macros to functions in the Python C API" for this issue:
https://www.python.org/dev/peps/pep-0670/
msg406346 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-15 09:54
I don't understand what you are trying to prove about compilers not inlining when you explicitly ask them... not to inline.

The purpose of the -O0 option is to minimize the build time, with a trade-off: don't expect the built executable to be fast. If you care about Python performance... well, don't use -O0? Python ./configure --with-pydebug builds Python with -Og which is not -O0. The -Og level is special, it's a different trade-off between the compiler build time and Python runtime performance.

If you want a Python debug build (Py_DEBUG macro defined, ./configure --with-pydebug), it's perfectly fine to build it with -O2 or -O3 to make sure that static inline functions are inlined. You can also enable LTO and PGO on a debug build.

GCC -Og option:
"""
-Og

    Optimize debugging experience. -Og should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience. It is a better choice than -O0 for producing debuggable code because some compiler passes that collect debug information are disabled at -O0.

    Like -O0, -Og completely disables a number of optimization passes so that individual options controlling them have no effect. Otherwise -Og enables all -O1 optimization flags except for those that may interfere with debugging:

    -fbranch-count-reg  -fdelayed-branch 
    -fdse  -fif-conversion  -fif-conversion2  
    -finline-functions-called-once 
    -fmove-loop-invariants  -fmove-loop-stores  -fssa-phiopt 
    -ftree-bit-ccp  -ftree-dse  -ftree-pta  -ftree-sra
"""
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

I prefer to use gcc -O0 when I develop on Python because the build time matters a lot in my very specific use case, and gcc -O0 is the best to debug Python in a debugger. See my article:
https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb

On RHEL8, the Python 3.9 debug build is now built with -O0 to be fully usable in gdb (to debug C extensions).

In RHEL, the main motivation to use -O0 rather than -Og was to get a fully working gdb debugger on C extensions. With -Og, we get too many <optimized out> values which are blocking debugging :-(
msg406348 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-11-15 10:46
On 15.11.2021 10:54, STINNER Victor wrote:
> 
> I don't understand what you are trying to prove about compilers not inlining when you explicitly ask them... not to inline.

I'm not trying to prove anything, Victor.

I'm only stating the fact that by switching from macros to inline
functions we are giving away control to the compilers and should not
be surprised that Python now suddenly runs a lot slower on systems
which either have inlining optimizations switched off or where the
compiler (wrongly) assumes that creating more assembler would result
in slower code.

I've heard all your arguments against macros, but don't believe the
blanket approach to convert to inline functions is warranted in all
cases, in particular not for code which is private to the interpreter
and where we know that we need the code inlined to not result in
unexpected performance regressions.

I also don't believe that we should assume that Python C extension
authors will unintentionally misuse Python API macros. If they do,
it's their business to sort out any issues, not ours. If we document
that macros may not be used as l-values, that's clear enough. We don't
need to use compiler restrictions to impose such limitations.

IMO, conversion to inline functions should only happen, when

a) the core language implementation has a direct benefit, and

b) it is very unlikely that compilers will not inline the code
   with -O2 settings, e.g. perhaps using a threshold of LOCs
   or by testing with the website Oleg mentioned.

Overall, I think PEP 670 should get some more attention from the
SC so that we have a guideline to use as a basis for deciding whether or not
to use the static inline function approach. That way we could avoid
these discussions :-)

BTW: Thanks for the details about -O0 vs. -Og.
msg406996 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-25 13:26
I decided to exclude macros which can be used as l-value from the PEP 670, since the motivation to disallow using them as l-value is different, and I prefer to restrict PEP 670 scope.

Disallowing using macros as l-values is more about hiding implementation details and improving compatibility with Python implementations other than CPython, like PyPy or RustPython.

PEP 670 is restricted to the advantages and disadvantages of converting macros to functions (static inline or regular functions).
msg407283 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-29 14:57
PyBytes_AS_STRING() and PyByteArray_AS_STRING() are used to modify string characters, but are not used directly as l-values.

Search in PyPI top 5000 packages:

$ ./search_pypi_top_5000.sh '(PyByteArray|PyBytes)_AS_.*[^!<>=]=[^=]'
pypi-top-5000_2021-08-17/plyvel-1.3.0.tar.gz
pypi-top-5000_2021-08-17/numexpr-2.7.3.tar.gz
pypi-top-5000_2021-08-17/Cython-0.29.24.tar.gz

numexpr-2.7.3, numexpr/numexpr_object.cpp:

PyBytes_AS_STRING(constsig)[i] = 'b';
PyBytes_AS_STRING(constsig)[i] = 'i';
PyBytes_AS_STRING(constsig)[i] = 'l';
PyBytes_AS_STRING(constsig)[i] = 'f';
PyBytes_AS_STRING(constsig)[i] = 'd';
PyBytes_AS_STRING(constsig)[i] = 'c';
PyBytes_AS_STRING(constsig)[i] = 's';

plyvel-1.3.0, plyvel/_plyvel.cpp: 

PyByteArray_AS_STRING(string)[i] = (char) v;
PyByteArray_AS_STRING(string)[i] = (char) v;

Cython-0.29.24:

$ grep -E '(PyByteArray|PyBytes)_AS_.*[^!<>=]=[^=]' -R .
./Cython/Utility/StringTools.c:            PyByteArray_AS_STRING(string)[i] = (char) v;
./Cython/Utility/StringTools.c:        PyByteArray_AS_STRING(string)[i] = (char) v;
./Cython/Utility/StringTools.c:            PyByteArray_AS_STRING(bytearray)[n] = value;
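
In other words, these projects only write *through* the pointer returned by the macro, which stays valid under PEP 674; only assigning to the macro call itself would be disallowed. A minimal sketch of the two patterns (assuming bytes_obj is a freshly created, not yet shared bytes object):

    #include <Python.h>

    static void
    fill_first_byte(PyObject *bytes_obj)
    {
        /* Write through the returned char* pointer: the macro (or static
           inline function) is only read here, so this keeps working. */
        PyBytes_AS_STRING(bytes_obj)[0] = 'b';

        /* Assigning to the macro call itself is the l-value pattern that
           PEP 674 disallows:
               PyBytes_AS_STRING(bytes_obj) = some_buffer;
        */
    }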
msg407358 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-30 11:14
New changeset c19c3a09618ac400538ee412f84be4c1196c7bab by Victor Stinner in branch 'main':
bpo-45476: Add _Py_RVALUE() macro (GH-29860)
https://github.com/python/cpython/commit/c19c3a09618ac400538ee412f84be4c1196c7bab
msg407374 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-30 14:13
I created this issue to disallow macros like PyFloat_AS_DOUBLE() and PyDict_GET_SIZE() as l-value. It seems like this change by itself is controversial.

I proposed one way to implement this change: convert macros to static inline functions. I didn't expect that this conversion would also be controversial. For now, I am abandoning the static inline approach to focus on the implementation which keeps macros: modify macros to use _Py_RVALUE() => PR 28976.

Once PR 28976 is merged and PEP 674 is accepted, we can reconsider converting these macros to static inline functions; then it should be non-controversial.
msg407375 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-30 14:14
New changeset 4b97d974ecca9cce532be55410fe851eb9fdcf21 by Victor Stinner in branch 'main':
bpo-45476: Disallow using asdl_seq_GET() as l-value (GH-29866)
https://github.com/python/cpython/commit/4b97d974ecca9cce532be55410fe851eb9fdcf21
msg407395 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-30 18:34
I wrote PEP 674 "Disallow using macros as l-value" for this change: https://python.github.io/peps/pep-0674/
msg407410 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-30 23:37
In the PyPI top 5000, I found two projects using PyDescr_TYPE() and PyDescr_NAME() as l-value: M2Crypto and mecab-python3. In both cases, it was code generated by SWIG:

 * This file was automatically generated by SWIG (http://www.swig.org).
 * Version 4.0.2

M2Crypto-0.38.0/src/SWIG/_m2crypto_wrap.c and mecab-python3-1.0.4/src/MeCab/MeCab_wrap.cpp contain the function:

SWIGINTERN PyGetSetDescrObject *
SwigPyStaticVar_new_getset(PyTypeObject *type, PyGetSetDef *getset) {

  PyGetSetDescrObject *descr;
  descr = (PyGetSetDescrObject *)PyType_GenericAlloc(SwigPyStaticVar_Type(), 0);
  assert(descr);
  Py_XINCREF(type);
  PyDescr_TYPE(descr) = type;
  PyDescr_NAME(descr) = PyString_InternFromString(getset->name);
  descr->d_getset = getset;
  if (PyDescr_NAME(descr) == NULL) {
    Py_DECREF(descr);
    descr = NULL;
  }
  return descr;
}
msg407412 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-11-30 23:46
I found 4 projects using "Py_TYPE(obj) = new_type;" in the PyPI top 5000:

mypy-0.910:

* mypyc/lib-rt/misc_ops.c: Py_TYPE(template_) = &PyType_Type;
* mypyc/lib-rt/misc_ops.c: Py_TYPE(t) = metaclass;

recordclass-0.16.3:

* lib/recordclass/_dataobject.c: Py_TYPE(op) = type;
* lib/recordclass/_dataobject.c: Py_TYPE(op) = type;
* lib/recordclass/_litetuple.c: //         Py_TYPE(ob) = &PyLiteTupleType_Type;

pysha3-1.0.2:

* Modules/_sha3/sha3module.c: Py_TYPE(type) = &PyType_Type;

datatable-1.0.0.tar.gz:

* src/core/python/namedtuple.cc: Py_TYPE(v) = type.v;
* src/core/python/tuple.cc: Py_TYPE(v_new) = v_type;
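
For these projects, the usual fix is the setter function added in Python 3.9; a sketch of the before/after pattern (the set_type_to_type() helper name is made up for illustration):

    #include <Python.h>

    static void
    set_type_to_type(PyObject *obj)
    {
        /* Old pattern, relies on Py_TYPE() expanding to an l-value:
               Py_TYPE(obj) = &PyType_Type;
           Replacement available since Python 3.9: */
        Py_SET_TYPE(obj, &PyType_Type);
    }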
msg407416 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-01 00:21
In the PyPI top 5000 projects, I found 32 projects using "Py_SIZE(obj) =
new_size": 8 of them are written manually, 24 use Cython.

8 projects using "Py_SIZE(obj) = new_size":

* guppy3-3.1.2: src/sets/bitset.c and src/sets/nodeset.c
* mypy-0.910: list_resize() in mypyc/lib-rt/pythonsupport.h
* pickle5-0.0.12: pickle5/_pickle.c
* python-snappy-0.6.0: maybe_resize() in snappy/snappymodule.cc
* recordclass-0.16.3: lib/recordclass/_dataobject.c + code generated by Cython
* scipy-1.7.3: scipy/_lib/boost/boost/python/object/make_instance.hpp
* zodbpickle-2.2.0: src/zodbpickle/_pickle_33.c
* zstd-1.5.0.2: src/python-zstd.c

24 projects using "Py_SIZE(obj) = new_size" generated by an outdated Cython:

* Naked-0.1.31
* Shapely-1.8.0
* dedupe-hcluster-0.3.8
* fastdtw-0.3.4
* fuzzyset-0.0.19
* gluonnlp-0.10.0
* hdbscan-0.8.27
* jenkspy-0.2.0
* lightfm-1.16
* neobolt-1.7.17
* orderedset-2.0.3
* ptvsd-4.3.2
* py_spy-0.3.11
* pyemd-0.5.1
* pyhacrf-datamade-0.2.5
* pyjq-2.5.2
* pypcap-1.2.3
* python-crfsuite-0.9.7
* reedsolo-1.5.4
* tables-3.6.1
* thriftpy-0.3.9
* thriftrw-1.8.1
* tinycss-0.4
* triangle-20200424
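
For the hand-written cases, the replacement is again the Python 3.9 setter; projects that still support older Pythons typically add a small compatibility shim, sketched below (the maybe_shrink() helper name is made up; the real shim in a project such as pythoncapi_compat uses a static inline function rather than this macro):

    #include <Python.h>

    /* Sketch of a backward-compatibility shim for Python < 3.9, where
       Py_SET_SIZE() does not exist and Py_SIZE() is still an l-value. */
    #if PY_VERSION_HEX < 0x03090000 && !defined(Py_SET_SIZE)
    #  define Py_SET_SIZE(obj, size) ((void)(Py_SIZE(obj) = (size)))
    #endif

    static void
    maybe_shrink(PyVarObject *obj, Py_ssize_t new_size)
    {
        /* Replaces the old "Py_SIZE(obj) = new_size;" pattern. */
        Py_SET_SIZE(obj, new_size);
    }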
msg407417 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-01 00:25
The attached pep674_regex.py generates a regex to search for code incompatible with PEP 674.

To download PyPI top 5000, you can use my script:
https://github.com/vstinner/misc/blob/main/cpython/download_pypi_top.py

To grep a regex in tarball and ZIP archives, you can use the rg command:

$ rg -zl REGEX DIRECTORY/*.{zip,gz,bz2,tgz}

Or you can try my script:
https://github.com/vstinner/misc/blob/main/cpython/search_pypi_top.py
msg407456 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-01 13:29
I updated my ./search_pypi_top_5000.py script to ignore files generated by Cython.

On PyPI top 5000, I only found 16 projects impacted by the PEP 674 (16/5000 = 0.3%):

* datatable-1.0.0
* frozendict-2.1.1
* guppy3-3.1.2
* M2Crypto-0.38.0
* mecab-python3-1.0.4
* mypy-0.910
* Naked-0.1.31
* pickle5-0.0.12
* pycurl-7.44.1
* PyGObject-3.42.0
* pysha3-1.0.2
* python-snappy-0.6.0
* recordclass-0.16.3
* scipy-1.7.3
* zodbpickle-2.2.0
* zstd-1.5.0.2

I manually ignored two false positives affecting 3 projects:

* "#define __Pyx_SET_SIZE(obj, size) Py_SIZE(obj) = (size)" in Cython
* "* Py_TYPE(obj) = new_type must be replaced with Py_SET_TYPE(obj, new_type)": comment in psycopg2 and psycopg2-binary
msg407463 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-01 15:54
Oops, sorry, pycurl-7.44.1 and PyGObject-3.42.0 are not affected, they only define Py_SET_TYPE() macro for backward compatibility. So right now, only 14 projects are affected.
msg407528 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-02 15:13
> * zstd-1.5.0.2: src/python-zstd.c

I proposed a fix upstream: https://github.com/sergey-dryabzhinsky/python-zstd/pull/70
msg407529 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-02 15:20
> frozendict-2.1.1

If I understand correctly, this module is compatible with PEP 674; to port the project to Python 3.11, it only has to copy the Python 3.11 header files once Python 3.11 is released.

The incompatible code is not part of the "frozendict" implementation, but only in copies of the Python header files (Python 3.6, 3.7, 3.8, 3.9 and 3.10). For example, it contains the frozendict/src/3_10/cpython_src/Include/object.h header: a copy of CPython's Include/object.h file.

Source code: https://github.com/Marco-Sulla/python-frozendict
msg407531 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-02 15:27
> pysha3-1.0.2

This module must not be used on Python 3.6 and newer, which have built-in support for SHA-3 hash functions. Example:

$ python3.6
Python 3.6.15 (default, Sep  5 2021, 00:00:00) 
>>> import hashlib
>>> h=hashlib.new('sha3_224'); h.update(b'hello'); print(h.hexdigest())
b87f88c72702fff1748e58b87e9141a42c0dbedc29a78cb0d4a5cd81

By the way, building pysha3 on Python 3.11 now fails with:

    Modules/_sha3/backport.inc:78:10: fatal error: pystrhex.h: No such file or directory

The pystrhex.h header file has been removed in Python 3.11 by bpo-45434. But I don't think it's worth trying to port pysha3 to Python 3.11 if the module must not be used on Python 3.6 and newer.

Environment markers can be used to skip the pysha3 dependency on Python 3.6 or newer.

Example: "pysha3; python_version < '3.6'"
msg407532 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-02 15:41
> Naked-0.1.31

The affected code is only code generated by Cython: the project only has to regenerate the code with a recent Cython version.
msg407536 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-02 16:19
> mypy-0.910

I proposed a fix: https://github.com/python/mypy/pull/11652
msg407872 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-06 22:20
> In the PyPI top 5000, I found two projects using PyDescr_TYPE() and PyDescr_NAME() as l-value: M2Crypto and mecab-python3. In both cases, it was code generated by SWIG

I proposed a first PR for Py_TYPE():
https://github.com/swig/swig/pull/2116
msg407875 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-06 22:48
> python-snappy-0.6.0: maybe_resize() in snappy/snappymodule.cc

I proposed a fix: https://github.com/andrix/python-snappy/pull/114
msg411774 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-26 16:37
> In the PyPI top 5000, I found two projects using PyDescr_TYPE() and PyDescr_NAME() as l-value: M2Crypto and mecab-python3. In both cases, it was code generated by SWIG

I created bpo-46538 "[C API] Make the PyDescrObject structure opaque" to handle the PyDescr_NAME() and PyDescr_TYPE() macros. But IMO it's not really worth it to make the PyDescrObject structure opaque: it's just too much work, and PyDescrObject is not performance-sensitive. It's OK to continue exposing this structure publicly for now.

I will exclude PyDescr_NAME() and PyDescr_TYPE() from the PEP 674.
msg411802 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-26 22:56
> datatable-1.0.0.tar.gz

I created https://github.com/h2oai/datatable/pull/3231
msg411803 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-26 23:12
> pickle5-0.0.12: pickle5/_pickle.c

This project is a backport targeting Python 3.7 and older. I'm not sure it makes sense to update it for Python 3.11.

It's the same for pysha3 which targets Python <= 3.5.
msg411826 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-27 03:07
> * guppy3-3.1.2: src/sets/bitset.c and src/sets/nodeset.c

I created: https://github.com/zhuyifei1999/guppy3/pull/40
msg411827 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-27 03:09
> * scipy-1.7.3: scipy/_lib/boost/boost/python/object/make_instance.hpp

This is a vendored copy of the Boost.org Python module, which has already been fixed in Boost 1.78.0 (commit: January 2021) by:
https://github.com/boostorg/python/commit/500194edb7833d0627ce7a2595fec49d0aae2484

scipy should just update its scipy/_lib/boost copy.
msg411831 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-27 03:24
> recordclass-0.16.3: lib/recordclass/_dataobject.c + code generated by Cython


I created: https://bitbucket.org/intellimath/recordclass/pull-requests/1/python-311-support-use-py_set_size
msg411921 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-27 20:29
> * zodbpickle-2.2.0: src/zodbpickle/_pickle_33.c

Technically, zodbpickle works on Python 3.11 and is not impacted by the Py_SIZE() change.

_pickle_33.c redefines the Py_SIZE() macro to continue using it as an l-value:
https://github.com/zopefoundation/zodbpickle/commit/8d99afcea980fc7bb2ef38aadf53300e08fc4318

I proposed a PR to use Py_SET_SIZE() explicitly:
https://github.com/zopefoundation/zodbpickle/pull/64
History
Date User Action Args
2022-04-11 14:59:51  admin  set  github: 89639
2022-01-27 20:29:40  vstinner  set  title: [C API] PEP 674: Disallow using macros as l-value -> [C API] PEP 674: Disallow using macros (Py_TYPE and Py_SIZE) as l-value
2022-01-27 20:29:07  vstinner  set  messages: + msg411921
2022-01-27 03:24:45  vstinner  set  messages: + msg411831
2022-01-27 03:09:35  vstinner  set  messages: + msg411827
2022-01-27 03:07:01  vstinner  set  messages: + msg411826
2022-01-26 23:12:12  vstinner  set  messages: + msg411803
2022-01-26 22:56:33  vstinner  set  messages: + msg411802
2022-01-26 16:37:19  vstinner  set  messages: + msg411774
2021-12-06 22:48:03  vstinner  set  messages: + msg407875
2021-12-06 22:20:56  vstinner  set  messages: + msg407872
2021-12-02 21:20:33  mark.dickinson  set  nosy: - mark.dickinson
2021-12-02 16:19:09  vstinner  set  messages: + msg407536
2021-12-02 15:41:05  vstinner  set  messages: + msg407532
2021-12-02 15:27:52  vstinner  set  messages: + msg407531
2021-12-02 15:20:49  vstinner  set  messages: + msg407529
2021-12-02 15:13:22  vstinner  set  messages: + msg407528
2021-12-01 15:54:33  vstinner  set  messages: + msg407463
2021-12-01 13:29:04  vstinner  set  messages: + msg407456
2021-12-01 00:25:37  vstinner  set  files: + pep674_regex.py; messages: + msg407417
2021-12-01 00:21:50  vstinner  set  messages: + msg407416
2021-11-30 23:46:17  vstinner  set  messages: + msg407412
2021-11-30 23:37:23  vstinner  set  messages: + msg407410
2021-11-30 23:33:23  vstinner  set  title: [C API] Disallow using PyFloat_AS_DOUBLE() as l-value -> [C API] PEP 674: Disallow using macros as l-value
2021-11-30 18:34:51  vstinner  set  messages: + msg407395
2021-11-30 14:14:03  vstinner  set  messages: + msg407375
2021-11-30 14:13:03  vstinner  set  messages: + msg407374
2021-11-30 13:12:10  vstinner  set  pull_requests: + pull_request28092
2021-11-30 11:14:49  vstinner  set  messages: + msg407358
2021-11-30 10:36:39  vstinner  set  pull_requests: + pull_request28087
2021-11-29 14:57:07  vstinner  set  messages: + msg407283
2021-11-25 13:26:12  vstinner  set  messages: + msg406996
2021-11-15 20:41:34  erlendaasland  set  nosy: + erlendaasland
2021-11-15 10:46:54  lemburg  set  messages: + msg406348
2021-11-15 09:54:49  vstinner  set  messages: + msg406346
2021-11-15 09:35:37  vstinner  set  messages: + msg406345
2021-11-15 09:22:59  lemburg  set  messages: + msg406344
2021-11-15 07:54:56  arhadthedev  set  nosy: + arhadthedev; messages: + msg406343
2021-10-15 17:44:08  vstinner  set  messages: + msg404039
2021-10-15 12:44:52  vstinner  set  title: [C API] Convert "AS" functions, like PyFloat_AS_DOUBLE(), to static inline functions -> [C API] Disallow using PyFloat_AS_DOUBLE() as l-value
2021-10-15 12:42:16  vstinner  set  messages: + msg404010
2021-10-15 12:41:11  vstinner  set  pull_requests: + pull_request27264
2021-10-15 11:01:27  gdr@garethrees.org  set  nosy: + gdr@garethrees.org; messages: + msg404001
2021-10-15 10:20:32  lemburg  set  messages: + msg403999
2021-10-15 09:43:06  vstinner  set  messages: + msg403995
2021-10-15 09:10:44  lemburg  set  messages: + msg403990
2021-10-15 09:03:45  mark.dickinson  set  nosy: + mark.dickinson
2021-10-15 00:00:17  vstinner  set  messages: + msg403965
2021-10-14 23:47:11  vstinner  set  messages: + msg403961
2021-10-14 23:09:24  vstinner  set  keywords: + patch; stage: patch review; pull_requests: + pull_request27250
2021-10-14 22:06:29  rhettinger  set  nosy: + rhettinger, lemburg; messages: + msg403956
2021-10-14 21:17:31  vstinner  create