Get rid of rare format units in PyArg_Parse* #68197

serhiy-storchaka · 2015-04-19T19:14:44Z

BPO	24009
Nosy	@ronaldoussoren, @vadmium, @serhiy-storchaka
Dependencies	bpo-24042: Convert os._getfullpathname() and os._isdir() to Argument Clinic
Files	issue24009_textio_decoder_getstate.patch: Get rid of "y#" in textio

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = None
created_at = <Date 2015-04-19.19:14:44.075>
labels = ['interpreter-core', 'type-feature']
title = 'Get rid of rare format units in PyArg_Parse*'
updated_at = <Date 2018-06-14.10:29:46.552>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2018-06-14.10:29:46.552>
actor = 'taleinat'
assignee = 'serhiy.storchaka'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2015-04-19.19:14:44.075>
creator = 'serhiy.storchaka'
dependencies = ['24042']
files = ['39186']
hgrepos = []
issue_num = 24009
keywords = ['patch']
message_count = 9.0
messages = ['241546', '241870', '242007', '242646', '242652', '242951', '244084', '244087', '244088']
nosy_count = 4.0
nosy_names = ['ronaldoussoren', 'python-dev', 'martin.panter', 'serhiy.storchaka']
pr_nums = []
priority = 'low'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue24009'
versions = ['Python 3.5', 'Python 3.6']

serhiy-storchaka · 2015-04-19T19:14:44Z

There are a lot of format units supported in PyArg_Parse* functions, but some of them are rarely or never used in current CPython code. Some of format units are legacy from Python 2 and are not needed in modern Python 3 code or can be replaced with custom converter.

Here are results of grepping (not including Modules/_testcapimodule.c).

"es", "es#", "et#", "z*", "Z#" are not used.

"y#":
Modules/_io/textio.c:2334: if (!PyArg_ParseTuple(_state, "y#i", &dec_buffer, &dec_buffer_len, &dec_flags)) { \

"z#":
Modules/_ctypes/_ctypes.c:3327: if (!PyArg_ParseTuple(args, "is|Oz#", &index, &name, &paramflags, &iid, &iid_len))

"u#":
Modules/arraymodule.c:248: if (!PyArg_Parse(v, "u#;array item must be unicode character", &p, &len))
PC/winreg.c:1547: if (!PyArg_ParseTuple(args, "OZiu#:SetValue",

"y":
Modules/_io/textio.c:2334: if (!PyArg_ParseTuple(_state, "y#i", &dec_buffer, &dec_buffer_len, &dec_flags)) { \
Modules/_cursesmodule.c:2790: if (!PyArg_ParseTuple(args,"y;str", &str))
Modules/_cursesmodule.c:3026: if (!PyArg_ParseTuple(args, "y|iiiiiiiii:tparm",
Modules/posixmodule.c:3767: if (!PyArg_ParseTuple (args, "y:_getfullpathname",
Modules/posixmodule.c:3872: if (!PyArg_ParseTuple(args, "y:_isdir", &path))
Modules/faulthandler.c:941: if (!PyArg_ParseTuple(args, "y:fatal_error", &message))

"et":
Modules/socketmodule.c:4499: if (!PyArg_ParseTuple(args, "et:gethostbyname", "idna", &name))
Modules/socketmodule.c:4667: if (!PyArg_ParseTuple(args, "et:gethostbyname_ex", "idna", &name))
Modules/socketmodule.c:4744: if (!PyArg_ParseTuple(args, "et:gethostbyaddr", "idna", &ip_num))
Modules/_tkinter.c:2099: if (!PyArg_ParseTuple(args, "et:splitlist", "utf-8", &list))
Modules/_tkinter.c:2162: if (!PyArg_ParseTuple(args, "et:split", "utf-8", &list))
Modules/_ssl.c:3038: if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!iet:_wrap_socket", kwlist,
Modules/_ssl.c:3070: if (!PyArg_Parse(hostname_obj, "et", "idna", &hostname))

"s*":
Modules/_codecsmodule.c:188: if (!PyArg_ParseTuple(args, "s*|z:escape_decode",
Modules/_codecsmodule.c:552: if (!PyArg_ParseTuple(args, "s*|z:unicode_escape_decode",
Modules/_codecsmodule.c:569: if (!PyArg_ParseTuple(args, "s*|z:raw_unicode_escape_decode",
Modules/_codecsmodule.c:696: if (!PyArg_ParseTuple(args, "s*|z:readbuffer_encode",
Modules/_ssl.c:3734: if (!PyArg_ParseTuple(args, "s*d:RAND_add", &view, &entropy))
Modules/fcntlmodule.c:225: if (PyArg_Parse(ob_arg, "s*:ioctl", &pstr)) {
Modules/clinic/arraymodule.c.h:278: if (!PyArg_Parse(arg, "s*:fromstring", &buffer))

"s#":
Modules/_gdbmmodule.c:128: if (!PyArg_Parse(key, "s#", &krec.dptr, &krec.dsize) )
Modules/_gdbmmodule.c:176: if (!PyArg_Parse(v, "s#", &krec.dptr, &krec.dsize) ) {
Modules/_gdbmmodule.c:194: if (!PyArg_Parse(w, "s#", &drec.dptr, &drec.dsize)) {
Modules/fcntlmodule.c:71: if (PyArg_Parse(arg, "s#", &str, &len)) {
Modules/_ctypes/_ctypes.c:2569: if (!PyArg_ParseTuple(args, "Os#", &dict, &data, &len))
Modules/clinic/unicodedata.c.h:361: if (!PyArg_Parse(arg, "s#:lookup", &name, &name_length))
Modules/clinic/_dbmmodule.c.h:62: if (!PyArg_ParseTuple(args, "s#|O:get",
Modules/clinic/_dbmmodule.c.h:95: if (!PyArg_ParseTuple(args, "s#|O:setdefault",
Modules/clinic/_gdbmmodule.c.h:150: if (!PyArg_Parse(arg, "s#:nextkey", &key, &key_length))
Modules/_dbmmodule.c:108: if (!PyArg_Parse(key, "s#", &krec.dptr, &tmp_size) )
Modules/_dbmmodule.c:132: if ( !PyArg_Parse(v, "s#", &krec.dptr, &tmp_size) ) {
Modules/_dbmmodule.c:150: if ( !PyArg_Parse(w, "s#", &drec.dptr, &tmp_size) ) {
Modules/_dbmmodule.c:336: if ( !PyArg_Parse(default_value, "s#", &val.dptr, &tmp_size) ) {

In future may be we could deprecate some format units and remove them in 4.0.

This issue is a meta issue. Every case should be considered individually.

serhiy-storchaka · 2015-04-23T17:15:06Z

In textio.c, the decoder always should return bytes, not arbitrary read-only buffer (this is required in other parts of the code). So "y#" can be replaced with "O" with PyBytes_GET_SIZE.

taleinat · 2015-04-25T08:22:38Z

+1. I was recently trying to use the C API for a 3rd party library, and all of these subtly different string parameter formats made things surprisingly confusing.

These are part of the Python C API, so removing them could break 3rd party code. Simply searching through the stdlib is not enough to show that these are not in use. So removal would require a deprecation period.

python-dev · 2015-05-06T06:54:26Z

New changeset d65233f630e1 by Serhiy Storchaka in branch 'default':
Issue bpo-24009: Got rid of using rare "y#" format unit in TextIOWrapper.tell().
https://hg.python.org/cpython/rev/d65233f630e1

ronaldoussoren · 2015-05-06T10:14:21Z

Note that these format characters can also be used outside of CPython.

serhiy-storchaka · 2015-05-12T09:47:24Z

Yes, of course, I think we shouldn't drop support of these format units. But using them likely is a sign of outdated or transitional code. It should be discouraged in new code, and every case should be analyzed and cleaned.

vadmium · 2015-05-26T06:04:30Z

“u#” should not be deprecated without first deprecating “u”, which is less useful due to not returning a buffer length.

Also, I have always been mystified about how “s#”, “z#”, “y” and “y#” can properly to return a pointer into a buffer for arbitrary immutable bytes-like objects, without requiring PyBuffer_Release() to be called. Perhaps this is bad design to be discouraged. Or maybe a documentation oversight somewhere.

serhiy-storchaka · 2015-05-26T07:39:46Z

“s#”, “z#”, “y” and “y#” work only with read-only buffers, for which PyBuffer_Release() is no-op operation. Initially they was designed for work with old buffer protocol that doesn't support releasing a buffer.

vadmium · 2015-05-26T07:46:12Z

Yes I just figured out that myself. Specifically, PyBufferProcs.bf_releasebuffer has to be NULL, and the buffer stays alive as long as the object stays alive.

Also it looks like I was wrong about “u” being useless. I was tricked by a contradiction in the documentation, but I will try to fix this in a patch to bpo-24278.

serhiy-storchaka self-assigned this Apr 19, 2015

serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Apr 19, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get rid of rare format units in PyArg_Parse* #68197

Get rid of rare format units in PyArg_Parse* #68197

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 23, 2015

taleinat commented Apr 25, 2015

python-dev mannequin commented May 6, 2015

ronaldoussoren commented May 6, 2015

serhiy-storchaka commented May 12, 2015

vadmium commented May 26, 2015

serhiy-storchaka commented May 26, 2015

vadmium commented May 26, 2015

Get rid of rare format units in PyArg_Parse* #68197

Get rid of rare format units in PyArg_Parse* #68197

Comments

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 19, 2015

serhiy-storchaka commented Apr 23, 2015

taleinat commented Apr 25, 2015

python-dev mannequin commented May 6, 2015

ronaldoussoren commented May 6, 2015

serhiy-storchaka commented May 12, 2015

vadmium commented May 26, 2015

serhiy-storchaka commented May 26, 2015

vadmium commented May 26, 2015