This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Allow "p" in Py_BuildValue
Type: Stage: patch review
Components: Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: JelleZijlstra, godlygeek, pablogsal, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2021-09-29 18:08 by pablogsal, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 28634 open pablogsal, 2021-09-29 18:22
Messages (18)
msg402897 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-09-29 18:08
We should allow Py_BuildValue to take a "p" argument that takes a C int and produces a Python Bool so it would be symmetric with the p format code for PyArg_Parse that takes a Python object and returns a C int containing PyObject_IsTrue(obj).
msg402899 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-09-29 19:48
There was no much need of this feature. In rare cases when we needed to build a bool in Py_BuildValue (I have found only 2 cases in the stdlib, and one of them is added by me) we use the following idiom:

   Py_BuildValue("...O...", ..., expr ? Py_True : Py_False, ...)

You can have a temptation to write it as

   Py_BuildValue("...p...", ..., expr, ...)

but there is a catch -- the arguments should be a C int. If it is not, you can break a stack. Explicit cast to int ("(int)expr") is not always correct, so you will need to write "expr ? 1 : 0" which is not much better than "expr ? Py_True : Py_False".
msg402900 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-29 20:20
I would be interested to see how the "p" format could be used in existing code. I found a few places would benefit of it. Either create a new draft PR, or put it in the same PR 28634.

Modules/_itertoolsmodule.c:

        return Py_BuildValue("O(N)(OO)", Py_TYPE(lz), it, lz->saved, Py_True);

    return Py_BuildValue("O(O)(OO)", Py_TYPE(lz), lz->it, lz->saved,
                         lz->firstpass ? Py_True : Py_False);

Modules/_ssl.c:

        ok = RAND_bytes((unsigned char*)PyBytes_AS_STRING(bytes), len);
        if (ok == 0 || ok == 1)
            return Py_BuildValue("NO", bytes, ok == 1 ? Py_True : Py_False);

    return Py_BuildValue(
        "{sksssssssisi"
        "sOssssssss"
        "}",
        "id", cipher_id,
        "name", cipher_name,
        "protocol", cipher_protocol,
        "description", buf,
        "strength_bits", strength_bits,
        "alg_bits", alg_bits
        ,"aead", aead ? Py_True : Py_False,
        "symmetric", skcipher,
        "digest", digest,
        "kea", kx,
        "auth", auth
       );


Modules/main.c:

    runargs = PyTuple_Pack(2, module, set_argv0 ? Py_True : Py_False);

Objects/fileobject.c:

    stream = _PyObject_CallMethodId(io, &PyId_open, "isisssO", fd, mode,
                                 buffering, encoding, errors,
                                 newline, closefd ? Py_True : Py_False);
msg402901 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-09-29 20:23
> Either create a new draft PR, or put it in the same PR 28634.


I prefer to reach an agreement first here before I add more things to the PR
msg402902 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-09-29 20:25
> There was no much need of this feature. In rare cases when we needed to build a bool in Py_BuildValue (I have found only 2 cases in the stdlib, and one of them is added by me) we use the following idiom:

Is true that this is not very common, but I have find this in several occasions writing extension code and given the small impact and maintenance cost and the symmetry with PyArg_Parse I figured that could be useful
msg402904 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-29 20:33
> but there is a catch -- the arguments should be a C int. If it is not, you can break a stack.

I never understood how "..." arguments work under the hood. What happens if you pass a double, it is stored as a double on the C stack, and then Py_BuildValue() will read junk data?
msg402910 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-09-29 20:48
The va_start macro just sets up a pointer to the first function parameter, e.g.:-

 void func (int a, ...)
 { 
   // va_start
   char *p = (char *) &a + sizeof a;
 }

which makes p point to the second parameter. The va_arg macro does this:-

 void func (int a, ...)
 { 
   // va_start
   char *p = (char *) &a + sizeof a;

   // va_arg
   int i1 = *((int *)p);
   p += sizeof (int);

   // va_arg
   int i2 = *((int *)p);
   p += sizeof (int);

   // va_arg
   long i2 = *((long *)p);
   p += sizeof (long);
 }
msg402915 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-09-29 21:46
> What happens if you pass a double, it is stored as a double on the C stack, and then Py_BuildValue() will read junk data?

AFAIK, it is complicated. On old computer primitive compilers just pushed arguments one by one on the stack (in platform-depending order). When you push 64-bytes double and read 32-bit int, you get a junk not only for this read, but for reads of all following or preceding arguments.

Modern compilers on modern platforms can use registers to pass variable arguments. I do not know details, but doubles and ints are passed using different registers, so the difference can be even larger.

My concern is that such feature can provoke bugs. The risk can exceed the minor benefit.
msg402917 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-09-29 21:58
You can see how is implemented on am64 for example here:

https://blog.nelhage.com/2010/10/amd64-and-va_arg/
msg402918 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-09-29 21:58
> My concern is that such feature can provoke bugs. The risk can exceed the minor benefit.

But isn't that risk the same for other formatting parameters?
msg402921 - (view) Author: Matt Wozniski (godlygeek) * Date: 2021-09-29 23:25
> but there is a catch -- the arguments should be a C int

Or a type that promotes to int. If you pass a C short or char, or a C++ bool, it is implicitly promoted to int.

> so you will need to write "expr ? 1 : 0"

Or alternatively "!!expr"

> which is not much better than "expr ? Py_True : Py_False"

I had to write that recently after reading through the Py_BuildValue docs twice to make sure I wasn't missing a format code I could use. It's a strange omission, especially because 'p' exists for PyArg_Parse. I'd very much like to see this change.
msg402923 - (view) Author: Matt Wozniski (godlygeek) * Date: 2021-09-29 23:38
I spotted three other uses in the stdlib:

Modules/_io/_iomodule.c

        raw = PyObject_CallFunction(RawIO_class, "OsOO",
                                    path_or_fd, rawmode,
                                    closefd ? Py_True : Py_False,
                                    opener);

    wrapper = PyObject_CallFunction((PyObject *)&PyTextIOWrapper_Type,
                                    "OsssO",
                                    buffer,
                                    encoding, errors, newline,
                                    line_buffering ? Py_True : Py_False);

Modules/_sqlite/connection.c

    if (PySys_Audit("sqlite3.enable_load_extension",
                    "OO", self, onoff ? Py_True : Py_False) < 0) {
        return NULL;
    }
msg402924 - (view) Author: Matt Wozniski (godlygeek) * Date: 2021-09-29 23:49
The leftmost argument of the ternary is an int for every example that Victor and I found in the stdlib, so no casting would be required in any of these cases.
msg402938 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-09-30 08:12
> But isn't that risk the same for other formatting parameters?

I think that the risk for other formatting parameters is smaller, because you know, that there are different formatting parameters for different integer and floating point types, and for pointers, and you know that you should care about truncation and overflow if the type of the argument is different from the type of the parameter.

But I am not in strict opposition against this feature. It is just that I have fixed so many bugs, that I try to search potential flaws and drawbacks in any new feature. Now you know about this, and you can decide whether the benefit is larger than the risk of potential damages. To me they look equally small.

Technically PR 28634 LGTM.
msg402941 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-30 09:04
> I think that the risk for other formatting parameters is smaller, because you know, that there are different formatting parameters for different integer and floating point types, and for pointers, and you know that you should care about truncation and overflow if the type of the argument is different from the type of the parameter.

IMO such problem can be solved with documentation.

Pablo: can you try to explain the problem in the documentation, and maybe provide an example in the doc showing how to avoid it?

I guess that a generic fix is to replace "value" with "value != 0". In C, "expr != 0" is an int, no?

What I understood is that Py_BuildValue("p", value) is problematic if value type is not int.

"!value" or "!!value" also has the issue if I understood correctly. Like:

    long value = 3; Py_BuildValue("p", !value);

I agree with Serhiy that Py_BuildValue() is ok-ish with other formats, but IMO it's the responsibility of the developer to pass the expect type (int for "p" format).

This problem is not specific to Py_BuildValue(). printf() also has exactly the same issue. Such code has an undefined behavior:

  long long x = 1; print("%d\n", x);

You must cast explicitly:

  long long x = 1; print("%d\n", (int)x);

Or use a different format, like:

  long long x = 1; print("%lld\n", x);
msg402951 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-09-30 12:04
> But I am not in strict opposition against this feature. It is just that I have fixed so many bugs, that I try to search potential flaws and drawbacks in any new feature. Now you know about this, and you can decide whether the benefit is larger than the risk of potential damages. To me they look equally small.

Thanks, Serhiy, for your insights and careful consideration. Among many things, your attention to detail, experience and careful consideration are what I admire the most! You rock :)

I will look at the stdlib examples and will incorporated them into the PR and then I will carefully consider the change again.

Thanks for the comments!
msg402965 - (view) Author: Matt Wozniski (godlygeek) * Date: 2021-09-30 15:44
> "!value" or "!!value" also has the issue if I understood correctly.

No, just as "value != 0" is an int, so is "!value".
msg412234 - (view) Author: Jelle Zijlstra (JelleZijlstra) * (Python committer) Date: 2022-02-01 03:59
I stumbled upon the PR and tend to agree with Serhiy that this could be a source of nasty bugs. Passing the wrong type to a va_arg() argument is undefined behavior (e.g. https://wiki.sei.cmu.edu/confluence/display/c/EXP47-C.+Do+not+call+va_arg+with+an+argument+of+the+incorrect+type), and it does seem easy to do that by accident for an int-like type. For printf() compilers implement custom checks that make sure the types are right, but we don't get that luxury with Py_BuildValue(). It's also easy to make a bool yourself with the O code and Py_True/Py_False, so I don't think the additional code is worth the risk.
History
Date User Action Args
2022-04-11 14:59:50adminsetgithub: 89488
2022-02-01 03:59:42JelleZijlstrasetnosy: + JelleZijlstra
messages: + msg412234
2021-09-30 15:44:46godlygeeksetmessages: + msg402965
2021-09-30 12:04:41pablogsalsetmessages: + msg402951
2021-09-30 09:04:20vstinnersetmessages: + msg402941
2021-09-30 08:12:20serhiy.storchakasetmessages: + msg402938
2021-09-29 23:49:32godlygeeksetmessages: + msg402924
2021-09-29 23:38:20godlygeeksetmessages: + msg402923
2021-09-29 23:25:08godlygeeksetnosy: + godlygeek
messages: + msg402921
2021-09-29 21:58:50pablogsalsetmessages: + msg402918
2021-09-29 21:58:01pablogsalsetmessages: + msg402917
2021-09-29 21:46:30serhiy.storchakasetmessages: + msg402915
2021-09-29 20:48:05pablogsalsetmessages: + msg402910
2021-09-29 20:33:40vstinnersetmessages: + msg402904
2021-09-29 20:25:22pablogsalsetmessages: + msg402902
2021-09-29 20:23:57pablogsalsetmessages: + msg402901
2021-09-29 20:20:56vstinnersetnosy: + vstinner
messages: + msg402900
2021-09-29 19:48:54serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg402899
2021-09-29 18:22:35pablogsalsetkeywords: + patch
stage: patch review
pull_requests: + pull_request27003
2021-09-29 18:08:04pablogsalcreate