Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of rare format units in PyArg_Parse* #68197

Open
serhiy-storchaka opened this issue Apr 19, 2015 · 9 comments
Open

Get rid of rare format units in PyArg_Parse* #68197

serhiy-storchaka opened this issue Apr 19, 2015 · 9 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 24009
Nosy @ronaldoussoren, @vadmium, @serhiy-storchaka
Dependencies
  • bpo-24042: Convert os._getfullpathname() and os._isdir() to Argument Clinic
  • Files
  • issue24009_textio_decoder_getstate.patch: Get rid of "y#" in textio
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at = <Date 2015-04-19.19:14:44.075>
    labels = ['interpreter-core', 'type-feature']
    title = 'Get rid of rare format units in PyArg_Parse*'
    updated_at = <Date 2018-06-14.10:29:46.552>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2018-06-14.10:29:46.552>
    actor = 'taleinat'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2015-04-19.19:14:44.075>
    creator = 'serhiy.storchaka'
    dependencies = ['24042']
    files = ['39186']
    hgrepos = []
    issue_num = 24009
    keywords = ['patch']
    message_count = 9.0
    messages = ['241546', '241870', '242007', '242646', '242652', '242951', '244084', '244087', '244088']
    nosy_count = 4.0
    nosy_names = ['ronaldoussoren', 'python-dev', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'low'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24009'
    versions = ['Python 3.5', 'Python 3.6']

    @serhiy-storchaka
    Copy link
    Member Author

    There are a lot of format units supported in PyArg_Parse* functions, but some of them are rarely or never used in current CPython code. Some of format units are legacy from Python 2 and are not needed in modern Python 3 code or can be replaced with custom converter.

    Here are results of grepping (not including Modules/_testcapimodule.c).

    "es", "es#", "et#", "z*", "Z#" are not used.

    "y#":
    Modules/_io/textio.c:2334: if (!PyArg_ParseTuple(_state, "y#i", &dec_buffer, &dec_buffer_len, &dec_flags)) { \

    "z#":
    Modules/_ctypes/_ctypes.c:3327: if (!PyArg_ParseTuple(args, "is|Oz#", &index, &name, &paramflags, &iid, &iid_len))

    "u#":
    Modules/arraymodule.c:248: if (!PyArg_Parse(v, "u#;array item must be unicode character", &p, &len))
    PC/winreg.c:1547: if (!PyArg_ParseTuple(args, "OZiu#:SetValue",

    "y":
    Modules/_io/textio.c:2334: if (!PyArg_ParseTuple(_state, "y#i", &dec_buffer, &dec_buffer_len, &dec_flags)) { \
    Modules/_cursesmodule.c:2790: if (!PyArg_ParseTuple(args,"y;str", &str))
    Modules/_cursesmodule.c:3026: if (!PyArg_ParseTuple(args, "y|iiiiiiiii:tparm",
    Modules/posixmodule.c:3767: if (!PyArg_ParseTuple (args, "y:_getfullpathname",
    Modules/posixmodule.c:3872: if (!PyArg_ParseTuple(args, "y:_isdir", &path))
    Modules/faulthandler.c:941: if (!PyArg_ParseTuple(args, "y:fatal_error", &message))

    "et":
    Modules/socketmodule.c:4499: if (!PyArg_ParseTuple(args, "et:gethostbyname", "idna", &name))
    Modules/socketmodule.c:4667: if (!PyArg_ParseTuple(args, "et:gethostbyname_ex", "idna", &name))
    Modules/socketmodule.c:4744: if (!PyArg_ParseTuple(args, "et:gethostbyaddr", "idna", &ip_num))
    Modules/_tkinter.c:2099: if (!PyArg_ParseTuple(args, "et:splitlist", "utf-8", &list))
    Modules/_tkinter.c:2162: if (!PyArg_ParseTuple(args, "et:split", "utf-8", &list))
    Modules/_ssl.c:3038: if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!iet:_wrap_socket", kwlist,
    Modules/_ssl.c:3070: if (!PyArg_Parse(hostname_obj, "et", "idna", &hostname))

    "s*":
    Modules/_codecsmodule.c:188: if (!PyArg_ParseTuple(args, "s*|z:escape_decode",
    Modules/_codecsmodule.c:552: if (!PyArg_ParseTuple(args, "s*|z:unicode_escape_decode",
    Modules/_codecsmodule.c:569: if (!PyArg_ParseTuple(args, "s*|z:raw_unicode_escape_decode",
    Modules/_codecsmodule.c:696: if (!PyArg_ParseTuple(args, "s*|z:readbuffer_encode",
    Modules/_ssl.c:3734: if (!PyArg_ParseTuple(args, "s*d:RAND_add", &view, &entropy))
    Modules/fcntlmodule.c:225: if (PyArg_Parse(ob_arg, "s*:ioctl", &pstr)) {
    Modules/clinic/arraymodule.c.h:278: if (!PyArg_Parse(arg, "s*:fromstring", &buffer))

    "s#":
    Modules/_gdbmmodule.c:128: if (!PyArg_Parse(key, "s#", &krec.dptr, &krec.dsize) )
    Modules/_gdbmmodule.c:176: if (!PyArg_Parse(v, "s#", &krec.dptr, &krec.dsize) ) {
    Modules/_gdbmmodule.c:194: if (!PyArg_Parse(w, "s#", &drec.dptr, &drec.dsize)) {
    Modules/fcntlmodule.c:71: if (PyArg_Parse(arg, "s#", &str, &len)) {
    Modules/_ctypes/_ctypes.c:2569: if (!PyArg_ParseTuple(args, "Os#", &dict, &data, &len))
    Modules/clinic/unicodedata.c.h:361: if (!PyArg_Parse(arg, "s#:lookup", &name, &name_length))
    Modules/clinic/_dbmmodule.c.h:62: if (!PyArg_ParseTuple(args, "s#|O:get",
    Modules/clinic/_dbmmodule.c.h:95: if (!PyArg_ParseTuple(args, "s#|O:setdefault",
    Modules/clinic/_gdbmmodule.c.h:150: if (!PyArg_Parse(arg, "s#:nextkey", &key, &key_length))
    Modules/_dbmmodule.c:108: if (!PyArg_Parse(key, "s#", &krec.dptr, &tmp_size) )
    Modules/_dbmmodule.c:132: if ( !PyArg_Parse(v, "s#", &krec.dptr, &tmp_size) ) {
    Modules/_dbmmodule.c:150: if ( !PyArg_Parse(w, "s#", &drec.dptr, &tmp_size) ) {
    Modules/_dbmmodule.c:336: if ( !PyArg_Parse(default_value, "s#", &val.dptr, &tmp_size) ) {

    In future may be we could deprecate some format units and remove them in 4.0.

    This issue is a meta issue. Every case should be considered individually.

    @serhiy-storchaka serhiy-storchaka self-assigned this Apr 19, 2015
    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Apr 19, 2015
    @serhiy-storchaka
    Copy link
    Member Author

    In textio.c, the decoder always should return bytes, not arbitrary read-only buffer (this is required in other parts of the code). So "y#" can be replaced with "O" with PyBytes_GET_SIZE.

    @taleinat
    Copy link
    Contributor

    +1. I was recently trying to use the C API for a 3rd party library, and all of these subtly different string parameter formats made things surprisingly confusing.

    These are part of the Python C API, so removing them could break 3rd party code. Simply searching through the stdlib is not enough to show that these are not in use. So removal would require a deprecation period.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 6, 2015

    New changeset d65233f630e1 by Serhiy Storchaka in branch 'default':
    Issue bpo-24009: Got rid of using rare "y#" format unit in TextIOWrapper.tell().
    https://hg.python.org/cpython/rev/d65233f630e1

    @ronaldoussoren
    Copy link
    Contributor

    Note that these format characters can also be used outside of CPython.

    @serhiy-storchaka
    Copy link
    Member Author

    Yes, of course, I think we shouldn't drop support of these format units. But using them likely is a sign of outdated or transitional code. It should be discouraged in new code, and every case should be analyzed and cleaned.

    @vadmium
    Copy link
    Member

    vadmium commented May 26, 2015

    “u#” should not be deprecated without first deprecating “u”, which is less useful due to not returning a buffer length.

    Also, I have always been mystified about how “s#”, “z#”, “y” and “y#” can properly to return a pointer into a buffer for arbitrary immutable bytes-like objects, without requiring PyBuffer_Release() to be called. Perhaps this is bad design to be discouraged. Or maybe a documentation oversight somewhere.

    @serhiy-storchaka
    Copy link
    Member Author

    “s#”, “z#”, “y” and “y#” work only with read-only buffers, for which PyBuffer_Release() is no-op operation. Initially they was designed for work with old buffer protocol that doesn't support releasing a buffer.

    @vadmium
    Copy link
    Member

    vadmium commented May 26, 2015

    Yes I just figured out that myself. Specifically, PyBufferProcs.bf_releasebuffer has to be NULL, and the buffer stays alive as long as the object stays alive.

    Also it looks like I was wrong about “u” being useless. I was tricked by a contradiction in the documentation, but I will try to fix this in a patch to bpo-24278.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants