Issue 40085: Argument parsing option c should accept int between -128 to 255 ?

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/84266

classification

Title:	Argument parsing option c should accept int between -128 to 255 ?
Type:	enhancement	Stage:	resolved
Components:	C API	Versions:	Python 3.9

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	Dennis Sweeney, tzickel
Priority:	normal	Keywords:

Created on 2020-03-27 07:45 by tzickel, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg365139 - (view)	Author: (tzickel) *	Date: 2020-03-27 07:45
I converted some code from Python to the C API and was surprised that it stopped working. The "c" parsing option accepts a bytes or bytearray object of length 1 and converts it to a C char. But since indexing a bytes object returns an int (e.g. b't'[0] == 116), this option should accept an int as well. I'm not sure whether the accepted range should be 0 to 255 or -128 to 127.
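As a rough illustration of what the request amounts to, here is a minimal plain-C sketch of accepting an integer in either proposed range and mapping it to one byte. The name int_to_byte and the choice to accept the union [-128, 255] are assumptions made for this example; nothing like it exists in the Python C API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the Python C API.
   Maps an integer in [-128, 255] to the single byte it denotes.
   Returns 0 on success, -1 if the value is outside that range. */
int int_to_byte(long value, unsigned char *out)
{
    assert(out != NULL);
    if (value < -128 || value > 255) {
        return -1;
    }
    /* Conversion to unsigned char is modulo UCHAR_MAX + 1
       (256 when char has 8 bits), so -1 becomes 0xff. */
    *out = (unsigned char)value;
    return 0;
}
```

With this sketch, 116 maps to 0x74 (the byte for 't'), both 255 and -1 map to 0xff, and 256 is rejected. That two different integers land on the same byte is exactly the ambiguity the question about ranges runs into.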
msg365812 - (view)	Author: Dennis Sweeney (Dennis Sweeney) *	Date: 2020-04-05 07:05
I think this question is about types in C, apart from the Python C API. According to https://docs.python.org/3/c-api/arg.html#numbers, the specifier is

    c: (bytes or bytearray of length 1) -> [char]

so you should be able to write to a C variable of type "char". In C, "signed char"s are signed, with values in [-128..127]. C also has an "unsigned char" type, with values in [0..255]. Both types of variables contain eight bits of information, but they are interpreted in different ways. As such, we can write something like this:

    signed char c1;
    unsigned char c2;
    PyObject *tup = Py_BuildValue("(c)", 0xff);
    PyArg_ParseTuple(tup, "c", &c1);
    PyArg_ParseTuple(tup, "c", &c2);
    if (c1 < 0) { printf("First is signed.\n"); }
    else { printf("First is unsigned.\n"); }
    if (c2 < 0) { printf("Second is signed.\n"); }
    else { printf("Second is unsigned.\n"); }

and get back:

    First is signed.
    Second is unsigned.

Here, c1 and c2 each store nothing but the eight bits 0b11111111 (a.k.a. 0xff), but the compiler interprets c1 in two's complement as -1 whereas it interprets c2 as 255, simply based on the variable types.

If you just care about which eight bits you have, using "char" is good enough, and comparing "char"s for equality is all well and good. But if you're doing arithmetic or numerical comparisons on chars, I believe it's best practice to explicitly declare "signed" or "unsigned", since it's implementation-defined whether a plain "char" is signed or unsigned.

Note that if you replace 0xff with -1 in the C code above, the result will probably be the same, since the int -1 will be cast to the same least significant byte as 0xff (the upper bytes are thrown away).

(A technicality: even the bounds for the number of bits in a char are implementation-specific, but signed chars must support at least [-127..127] and unsigned chars must support at least [0..255], and implementations using more than 8 bits are quite rare. If you wanted to be totally sure about exactly the types you're using, you could technically use uint8_t or int8_t.)
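The signed/unsigned point can also be checked without the Python C API. The following is a minimal sketch in plain C; the helper names view_byte and low_byte are made up for this example, and it assumes the usual 8-bit, two's-complement char.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read the same byte once as a signed char and
   once as an unsigned char, returning both as ints. */
void view_byte(unsigned char raw, int *as_signed, int *as_unsigned)
{
    signed char s;

    assert(as_signed != NULL && as_unsigned != NULL);
    memcpy(&s, &raw, 1);    /* copy the bits unchanged */
    *as_signed = s;         /* 0xff reads as -1 in two's complement */
    *as_unsigned = raw;     /* 0xff reads as 255 */
}

/* Hypothetical helper: keep only the least significant byte of an int.
   Conversion to unsigned char is modulo UCHAR_MAX + 1. */
unsigned char low_byte(int value)
{
    return (unsigned char)value;
}
```

Calling view_byte(0xff, &s, &u) on such a platform leaves s == -1 and u == 255, and low_byte(-1) equals low_byte(0xff), which matches the note above about replacing 0xff with -1.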

History
Date	User	Action	Args
2022-04-11 14:59:28	admin	set	github: 84266
2021-07-27 02:41:17	Dennis Sweeney	set	status: open -> closed resolution: not a bug stage: resolved
2020-04-05 07:05:22	Dennis Sweeney	set	nosy: + Dennis Sweeney messages: + msg365812
2020-03-27 07:45:42	tzickel	create