This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Get rid of rare format units in PyArg_Parse*
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: 24042 Superseder:
Assigned To: serhiy.storchaka Nosy List: martin.panter, python-dev, ronaldoussoren, serhiy.storchaka
Priority: low Keywords: patch

Created on 2015-04-19 19:14 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin.

Files
File name | Uploaded | Description
issue24009_textio_decoder_getstate.patch | serhiy.storchaka, 2015-04-23 18:04 | Get rid of "y#" in textio
Messages (9)
msg241546 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-04-19 19:14
There are a lot of format units supported in PyArg_Parse* functions, but some of them are rarely or never used in current CPython code. Some of these format units are legacy from Python 2 and are either not needed in modern Python 3 code or can be replaced with a custom converter.

Here are the results of grepping (not including Modules/_testcapimodule.c).

"es", "es#", "et#", "z*", "Z#" are not used.

"y#":
Modules/_io/textio.c:2334:        if (!PyArg_ParseTuple(_state, "y#i", &dec_buffer, &dec_buffer_len, &dec_flags)) { \

"z#":
Modules/_ctypes/_ctypes.c:3327:    if (!PyArg_ParseTuple(args, "is|Oz#", &index, &name, &paramflags, &iid, &iid_len))

"u#":
Modules/arraymodule.c:248:    if (!PyArg_Parse(v, "u#;array item must be unicode character", &p, &len))
PC/winreg.c:1547:    if (!PyArg_ParseTuple(args, "OZiu#:SetValue",

"y":
Modules/_io/textio.c:2334:        if (!PyArg_ParseTuple(_state, "y#i", &dec_buffer, &dec_buffer_len, &dec_flags)) { \
Modules/_cursesmodule.c:2790:    if (!PyArg_ParseTuple(args,"y;str", &str))
Modules/_cursesmodule.c:3026:    if (!PyArg_ParseTuple(args, "y|iiiiiiiii:tparm",
Modules/posixmodule.c:3767:    if (!PyArg_ParseTuple (args, "y:_getfullpathname",
Modules/posixmodule.c:3872:    if (!PyArg_ParseTuple(args, "y:_isdir", &path))
Modules/faulthandler.c:941:    if (!PyArg_ParseTuple(args, "y:fatal_error", &message))

"et":
Modules/socketmodule.c:4499:    if (!PyArg_ParseTuple(args, "et:gethostbyname", "idna", &name))
Modules/socketmodule.c:4667:    if (!PyArg_ParseTuple(args, "et:gethostbyname_ex", "idna", &name))
Modules/socketmodule.c:4744:    if (!PyArg_ParseTuple(args, "et:gethostbyaddr", "idna", &ip_num))
Modules/_tkinter.c:2099:    if (!PyArg_ParseTuple(args, "et:splitlist", "utf-8", &list))
Modules/_tkinter.c:2162:    if (!PyArg_ParseTuple(args, "et:split", "utf-8", &list))
Modules/_ssl.c:3038:        if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!iet:_wrap_socket", kwlist,
Modules/_ssl.c:3070:        if (!PyArg_Parse(hostname_obj, "et", "idna", &hostname))

"s*":
Modules/_codecsmodule.c:188:    if (!PyArg_ParseTuple(args, "s*|z:escape_decode",
Modules/_codecsmodule.c:552:    if (!PyArg_ParseTuple(args, "s*|z:unicode_escape_decode",
Modules/_codecsmodule.c:569:    if (!PyArg_ParseTuple(args, "s*|z:raw_unicode_escape_decode",
Modules/_codecsmodule.c:696:    if (!PyArg_ParseTuple(args, "s*|z:readbuffer_encode",
Modules/_ssl.c:3734:    if (!PyArg_ParseTuple(args, "s*d:RAND_add", &view, &entropy))
Modules/fcntlmodule.c:225:        if (PyArg_Parse(ob_arg, "s*:ioctl", &pstr)) {
Modules/clinic/arraymodule.c.h:278:    if (!PyArg_Parse(arg, "s*:fromstring", &buffer))

"s#":
Modules/_gdbmmodule.c:128:    if (!PyArg_Parse(key, "s#", &krec.dptr, &krec.dsize) )
Modules/_gdbmmodule.c:176:    if (!PyArg_Parse(v, "s#", &krec.dptr, &krec.dsize) ) {
Modules/_gdbmmodule.c:194:        if (!PyArg_Parse(w, "s#", &drec.dptr, &drec.dsize)) {
Modules/fcntlmodule.c:71:        if (PyArg_Parse(arg, "s#", &str, &len)) {
Modules/_ctypes/_ctypes.c:2569:    if (!PyArg_ParseTuple(args, "Os#", &dict, &data, &len))
Modules/clinic/unicodedata.c.h:361:    if (!PyArg_Parse(arg, "s#:lookup", &name, &name_length))
Modules/clinic/_dbmmodule.c.h:62:    if (!PyArg_ParseTuple(args, "s#|O:get",
Modules/clinic/_dbmmodule.c.h:95:    if (!PyArg_ParseTuple(args, "s#|O:setdefault",
Modules/clinic/_gdbmmodule.c.h:150:    if (!PyArg_Parse(arg, "s#:nextkey", &key, &key_length))
Modules/_dbmmodule.c:108:    if (!PyArg_Parse(key, "s#", &krec.dptr, &tmp_size) )
Modules/_dbmmodule.c:132:    if ( !PyArg_Parse(v, "s#", &krec.dptr, &tmp_size) ) {
Modules/_dbmmodule.c:150:        if ( !PyArg_Parse(w, "s#", &drec.dptr, &tmp_size) ) {
Modules/_dbmmodule.c:336:        if ( !PyArg_Parse(default_value, "s#", &val.dptr, &tmp_size) ) {

In the future, maybe we could deprecate some format units and remove them in 4.0.

This issue is a meta issue. Every case should be considered individually.
msg241870 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-04-23 17:15
In textio.c, the decoder should always return bytes, not an arbitrary read-only buffer (other parts of the code require this). So "y#" can be replaced with "O" plus PyBytes_GET_SIZE().
msg242007 - Author: Tal Einat (taleinat) (Python committer) Date: 2015-04-25 08:22
+1. I was recently trying to use the C API for a 3rd party library, and all of these subtly different string parameter formats made things surprisingly confusing.

These are part of the Python C API, so removing them could break 3rd party code. Simply searching through the stdlib is not enough to show that these are not in use. So removal would require a deprecation period.
msg242646 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-06 06:54
New changeset d65233f630e1 by Serhiy Storchaka in branch 'default':
Issue #24009: Got rid of using rare "y#" format unit in TextIOWrapper.tell().
https://hg.python.org/cpython/rev/d65233f630e1
msg242652 - Author: Ronald Oussoren (ronaldoussoren) (Python committer) Date: 2015-05-06 10:14
Note that these format characters can also be used outside of CPython.
msg242951 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-05-12 09:47
Yes, of course. I don't think we should drop support for these format units. But using them is likely a sign of outdated or transitional code. They should be discouraged in new code, and every case should be analyzed and cleaned up.
msg244084 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-05-26 06:04
“u#” should not be deprecated without first deprecating “u”, which is less useful because it does not return a buffer length.

Also, I have always been mystified about how “s#”, “z#”, “y” and “y#” can properly return a pointer into a buffer for arbitrary immutable bytes-like objects without requiring PyBuffer_Release() to be called. Perhaps this is a bad design that should be discouraged, or maybe it is a documentation oversight somewhere.
msg244087 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-05-26 07:39
“s#”, “z#”, “y” and “y#” work only with read-only buffers, for which PyBuffer_Release() is a no-op. They were originally designed to work with the old buffer protocol, which doesn't support releasing a buffer.
msg244088 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-05-26 07:46
Yes, I just figured that out myself. Specifically, PyBufferProcs.bf_releasebuffer has to be NULL, and the buffer stays alive as long as the object stays alive.

Also, it looks like I was wrong about “u” being useless. I was misled by a contradiction in the documentation; I will try to fix this in a patch for Issue 24278.
History
Date | User | Action | Args
2022-04-11 14:58:15 | admin | set | github: 68197
2018-06-14 10:29:46 | taleinat | set | nosy: - taleinat
2015-05-26 07:46:11 | martin.panter | set | messages: + msg244088
2015-05-26 07:39:46 | serhiy.storchaka | set | messages: + msg244087
2015-05-26 06:04:29 | martin.panter | set | nosy: + martin.panter; messages: + msg244084
2015-05-12 09:47:24 | serhiy.storchaka | set | messages: + msg242951
2015-05-06 10:14:20 | ronaldoussoren | set | nosy: + ronaldoussoren; messages: + msg242652
2015-05-06 06:54:26 | python-dev | set | nosy: + python-dev; messages: + msg242646
2015-04-25 08:22:37 | taleinat | set | nosy: + taleinat; messages: + msg242007
2015-04-23 19:01:19 | serhiy.storchaka | set | dependencies: + Convert os._getfullpathname() and os._isdir() to Argument Clinic
2015-04-23 18:04:57 | serhiy.storchaka | set | files: + issue24009_textio_decoder_getstate.patch
2015-04-23 18:04:18 | serhiy.storchaka | set | files: - issue24009_textio_decoder_getstate.patch
2015-04-23 17:15:06 | serhiy.storchaka | set | files: + issue24009_textio_decoder_getstate.patch; keywords: + patch; messages: + msg241870
2015-04-19 19:14:44 | serhiy.storchaka | create |