classification
Title: Argument parsing option c should accept int between -128 to 255 ?
Type: enhancement Stage: resolved
Components: C API Versions: Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Dennis Sweeney, tzickel
Priority: normal Keywords:

Created on 2020-03-27 07:45 by tzickel, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg365139 - Author: (tzickel) * Date: 2020-03-27 07:45
I converted some code from Python to the C API and was surprised that some of it stopped working.

Basically the "c" parsing option allows for 1 char bytes or bytearray inputs and converts them to a C char.

But just as indexing a bytes object returns an int, this option should accept an int as well, e.g. b't'[0] == 116.

I'm not sure whether the accepted range should be 0 to 255 or -128 to 127.
msg365812 - Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2020-04-05 07:05
I think this question is about types in c, apart from any Python c API. 

According to https://docs.python.org/3/c-api/arg.html#numbers, the specifier is

    c: (bytes or bytearray of length 1) -> [char]

so you should be able to write to a c variable of type "char". In c, "signed char"s are signed, with values in [-128..127]. C also has an "unsigned char" type, with values in [0..255]. Both types of variables contain eight bits of information, but they are interpreted in different ways. As such, we can write something like this:

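    signed char c1;
    unsigned char c2;
    /* With Py_BuildValue, "c" converts a C int to a bytes object of length 1,
       so tup is the 1-tuple (b'\xff',). */
    PyObject *tup = Py_BuildValue("(c)", 0xff);

    /* With PyArg_ParseTuple, "c" stores that single byte through the pointer. */
    PyArg_ParseTuple(tup, "c", &c1);
    PyArg_ParseTuple(tup, "c", &c2);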

    if (c1 < 0) {
        printf("First is signed.\n");
    }
    else {
        printf("First is unsigned.\n");
    }

    if (c2 < 0) {
        printf("Second is signed.\n");
    }
    else {
        printf("Second is unsigned.\n");
    }

and get back:

    First is signed.
    Second is unsigned.

Here, c1 and c2 each store nothing but the eight bits 0b11111111 (a.k.a. 0xff), but the compiler interprets c1 in two's-complement as -1 whereas it interprets c2 as 255, simply based on variable types.
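
The same effect can be reproduced without the Python C API. The following is only a minimal sketch, assuming 8-bit chars and a two's-complement representation; the helper names byte_as_signed and byte_as_unsigned are made up for illustration. Each helper copies one raw byte into a char type with memcpy, so the bits stay the same and only the type used to read them differs.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Assumption of this sketch: chars are exactly eight bits wide. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit chars");

/* Hypothetical helper: read one raw byte back as a signed char. */
static int byte_as_signed(unsigned char raw)
{
    signed char s;
    memcpy(&s, &raw, sizeof s);   /* copy the bits, no value conversion */
    return s;                     /* widened to int, sign preserved */
}

/* Hypothetical helper: read the same raw byte back as an unsigned char. */
static int byte_as_unsigned(unsigned char raw)
{
    unsigned char u;
    memcpy(&u, &raw, sizeof u);
    return u;
}
```

Under those assumptions, byte_as_signed(0xff) gives -1 and byte_as_unsigned(0xff) gives 255, while a byte below 0x80 such as 0x74 gives 116 from both helpers.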

If you just care about which eight bits you have, using "char" is good enough, and comparing "char"s for equality is all well and good. But if you're doing arithmetic or numerical comparisons on chars, I believe it's best practice to explicitly declare "signed" or "unsigned", since it's implementation-defined which one the compiler will do if you don't specify.
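
As a rough illustration of that advice, here is a small sketch (the function names are hypothetical, not part of any library). It uses CHAR_MIN from <limits.h> to find out which choice the compiler made for plain "char", and shows one way to test the high bit of a byte that does not depend on that choice.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Compile-time check that both ways of asking "is plain char signed?" agree. */
static_assert((CHAR_MIN < 0) == ((char)-1 < 0),
              "CHAR_MIN reflects the signedness of plain char");

/* CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: test the top bit of a byte. Converting to
   unsigned char first makes the result independent of char signedness. */
static bool high_bit_set(char c)
{
    return ((unsigned char)c & 0x80u) != 0;
}
```

Writing the test as `c < 0` or `c >= 0x80` instead would each work on only one kind of implementation, which is the sort of bug the explicit "unsigned" conversion avoids.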

Note that if you replace 0xff with -1 in the C code above, the result will probably be the same, since the int -1 will be converted to the same least significant byte as 0xff (the upper bytes are thrown away).
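
To make the "upper bytes are thrown away" point concrete, here is a minimal sketch, again assuming 8-bit chars; low_byte is a hypothetical helper name. It relies on the fact that conversion to an unsigned type is defined as reduction modulo UCHAR_MAX + 1, so it is well-defined for negative inputs too.

```c
#include <assert.h>
#include <limits.h>

/* Assumption of this sketch: chars are exactly eight bits wide. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit chars");

/* Hypothetical helper: keep only the least significant byte of an int.
   Conversion to unsigned char reduces the value modulo UCHAR_MAX + 1. */
static unsigned char low_byte(int value)
{
    return (unsigned char)value;
}
```

Here low_byte(-1), low_byte(0xff) and low_byte(0x1ff) all give 0xff. Converting an out-of-range value such as 0xff to a *signed* char is implementation-defined instead, which is why the result above is only "probably" the same.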

(A technicality: even the bounds for the number of bits in a char are implementation-specific, but signed chars must support *at least* [-127..127] and unsigned chars must support *at least* [0..255], and implementations using more than 8 bits are quite rare. If you wanted to be totally sure about exactly the types you're using, you could technically use uint8_t or int8_t.)
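
For anyone who wants to check these limits on their own compiler, the sketch below states them as compile-time assertions using the standard macros from <limits.h> and <stdint.h>. The print_char_limits helper is a made-up name, and the int8_t/uint8_t lines assume an implementation that provides those optional types.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Minimum ranges the C standard requires of the char types. */
static_assert(CHAR_BIT >= 8, "a char has at least eight bits");
static_assert(SCHAR_MIN <= -127 && SCHAR_MAX >= 127,
              "signed char covers at least [-127..127]");
static_assert(UCHAR_MAX >= 255, "unsigned char covers at least [0..255]");

/* Where they exist, int8_t and uint8_t are exactly eight bits wide. */
static_assert(INT8_MIN == -128 && INT8_MAX == 127, "int8_t is [-128..127]");
static_assert(UINT8_MAX == 255, "uint8_t is [0..255]");

/* Hypothetical helper: print what this particular implementation provides. */
static void print_char_limits(void)
{
    printf("CHAR_BIT=%d signed char=[%d..%d] unsigned char=[0..%u]\n",
           CHAR_BIT, SCHAR_MIN, SCHAR_MAX, (unsigned)UCHAR_MAX);
}
```

If any of the assertions failed, the file would not compile, so a successful build is already the check; calling print_char_limits() just shows the actual values.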
History
Date                 User            Action  Args
2022-04-11 14:59:28  admin           set     github: 84266
2021-07-27 02:41:17  Dennis Sweeney  set     status: open -> closed; resolution: not a bug; stage: resolved
2020-04-05 07:05:22  Dennis Sweeney  set     nosy: + Dennis Sweeney; messages: + msg365812
2020-03-27 07:45:42  tzickel         create