Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argument Clinic for bisect.bisect_left #72940

Closed
JulienPalard opened this issue Nov 20, 2016 · 54 comments
Closed

Argument Clinic for bisect.bisect_left #72940

JulienPalard opened this issue Nov 20, 2016 · 54 comments
Assignees

Comments

@JulienPalard
Copy link
Member

BPO 28754
Nosy @rhettinger, @vstinner, @larryhastings, @skrah, @vadmium, @serhiy-storchaka, @JulienPalard
PRs
  • bpo-28754: AC on bisect. #177
  • Files
  • issue28754.diff: Missing Modules/clinic/_bisectmodule.c.h
  • issue28754-2.diff
  • bisect_left.diff
  • bisect.diff
  • insort.diff
  • insort-left.diff
  • issue28754-3.diff
  • issue28754-4.diff
  • issue28754-5.diff
  • check_bisect_doc.py
  • issue28754-6.diff
  • issue28754-7.diff
  • issue28754-8.diff
  • issue28754-9.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2017-09-24.23:15:12.135>
    created_at = <Date 2016-11-20.16:51:59.575>
    labels = ['3.7', 'expert-argument-clinic']
    title = 'Argument Clinic for bisect.bisect_left'
    updated_at = <Date 2017-09-25.02:51:14.535>
    user = 'https://github.com/JulienPalard'

    bugs.python.org fields:

    activity = <Date 2017-09-25.02:51:14.535>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2017-09-24.23:15:12.135>
    closer = 'rhettinger'
    components = ['Argument Clinic']
    creation = <Date 2016-11-20.16:51:59.575>
    creator = 'mdk'
    dependencies = []
    files = ['45562', '45563', '45568', '45569', '45570', '45571', '45572', '45626', '45628', '45642', '45650', '45693', '45832', '45836']
    hgrepos = []
    issue_num = 28754
    keywords = ['patch']
    message_count = 54.0
    messages = ['281282', '281283', '281285', '281299', '281302', '281303', '281652', '281669', '281685', '281712', '281713', '281717', '281734', '281761', '281762', '281763', '281764', '281765', '281768', '281772', '281779', '281791', '281792', '281832', '281834', '281878', '281880', '281888', '281892', '281899', '281901', '281926', '282028', '282030', '282038', '282041', '282049', '282063', '282308', '282350', '282842', '282854', '282864', '282875', '282877', '282882', '282888', '282889', '282891', '283100', '291160', '291162', '302894', '302908']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'vstinner', 'larry', 'skrah', 'martin.panter', 'serhiy.storchaka', 'mdk']
    pr_nums = ['177']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue28754'
    versions = ['Python 3.7']

    @JulienPalard
    Copy link
    Member Author

    Today I read https://docs.python.org/3.6/howto/clinic.html so I tried one: bisect.bisect_left.

    I was unable to do bisect_right, as it's an "alias" for bisect, and there's a unit-test checking self.assertEqual(self.module.bisect, self.module.bisect_right). Any hint welcome.

    @larryhastings
    Copy link
    Contributor

    There's special syntax to handle aliases. From comments in clinic.py:

        # alternatively:
        #     modulename.fnname [as c_basename] = modulename.existing_fn_name
        # clones the parameters and return converter from that
        # function.  you can't modify them.  you must enter a
        # new docstring.
    

    Sorry the docs are out of date--patches welcome. And speaking of patches, thanks for the patch! I'll review it when you say it's ready.

    @larryhastings
    Copy link
    Contributor

    Oh, and, if the code literally asserts they're the same function, that's just a sanity check based on the implementation. You could preserve that if you care to, or you could just write a new function and remove the assertion. Do what you think is best, then post your patch to the tracker and we'll all argue about it forever, how's that sound?

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Nov 20, 2016
    @JulienPalard
    Copy link
    Member Author

    I searched an occurrence of what I'm describing which is already using clinic and there is, at least, one in Modules/binascii.c line 1090:

    TL;DR: The idea is to use the modulename.fnname [as c_basename] = modulename.existing_fn_name clinic syntax, drop a "This function is also available as …" in the docstring, and check if they return the same thing in the tests.

    This way looks good to me, it uses the dedicated syntax which looks right, and I'll drop the self.assertEqual(self.module.bisect, self.module.bisect_right) which is an implementation detail, replacing it by tests checking that both have the same behavior.

    /*[clinic input]
    binascii.b2a_hex
    
        data: Py_buffer
        /
    
    Hexadecimal representation of binary data.
    
    The return value is a bytes object.  This function is also
    available as "hexlify()".
    [clinic start generated code]*/
    
    static PyObject *
    binascii_b2a_hex_impl(PyObject *module, Py_buffer *data)
    /*[clinic end generated code: output=92fec1a95c9897a0 input=96423cfa299ff3b1]*/
    {
        return _Py_strhex_bytes((const char *)data->buf, data->len);
    }
    
    /*[clinic input]
    binascii.hexlify = binascii.b2a_hex
    
    Hexadecimal representation of binary data.
    
    The return value is a bytes object.
    [clinic start generated code]*/
    
    static PyObject *
    binascii_hexlify_impl(PyObject *module, Py_buffer *data)
    /*[clinic end generated code: output=749e95e53c14880c input=2e3afae7f083f061]*/
    {
        return _Py_strhex_bytes((const char *)data->buf, data->len);
    }
    

    @JulienPalard
    Copy link
    Member Author

    Here it is for the whole bisect module.

    I separated my work in commits, but I'm not sure how rietveld will eat that as they'll have unknown references, so I'll probably also upload a single patch with a known reference.

    @JulienPalard
    Copy link
    Member Author

    The whole diff is reviewable in bpo-28754-3.diff.

    @vstinner
    Copy link
    Member

    Hum, I dislike your changes on test_bisect.py: I don't see why Argument Clinic would have to modify tests!?

    I created issue bpo-28792: "bisect: implement aliases in Python, remove C aliases".

    @JulienPalard
    Copy link
    Member Author

    New simplier patch thanks to https://bugs.python.org/issue28792. Also corrected docstrings.

    Also ran a micro-benchmark of bisect.bisect(foo, "c") with foo = list("abcdef"):
    Median +- std dev: [before] 434 ns +- 17 ns -> [after] 369 ns +- 22 ns: 1.18x faster

    FTR: ./python -m perf timeit -s 'import bisect; foo = list("abdef")' 'bisect.bisect(foo, "c")' -o after.json --inherit=PYTHONPATH --affinity=2,3

    @vstinner
    Copy link
    Member

    bpo-28754-4.diff does 3 things:

    • convert C code to Argument Clinic
    • change docstrings of the C code
    • change unit tests

    bpo-28754-5.diff doesn't touch tests, but still does 2 things. Sadly, I dislike the changes on docstrings because it makes the C docstring and the Python docstring inconsistent.

    I suggest to copy/paste docstring between C and Python code.

    By the way, why do we have a docstring different from Doc/library/bisect.rst? Maybe we can simply use the same doc everywhere, but just add reST formatting in Doc/library/bisect.rst?

    Julien: I suggest to first restrict your change to Argument Clinic, and later write a new patch to update docstrings. I prefer to only do one thing per commit.

    @vadmium
    Copy link
    Member

    vadmium commented Nov 25, 2016

    Victor, what changes to the doc strings are you talking about? Adding descriptions of the parameters, maybe? Other than that they look very similar to me.

    I haven’t looked closely, but the Python doc strings have been updated at least once more than the C doc strings: r54666 and bpo-1602378. So if you are copying anything, it might be better to copy from Py to C. But I would do that separately.

    The doc strings and reference documentation just seem to have been added independently: ce2d120c3903 (main documentation) vs 67b3ac439f64 (doc strings)

    @vstinner
    Copy link
    Member

    Martin: "Victor, what changes to the doc strings are you talking about?"

    See attached check_bisect_doc.py script.

    Before:
    ---
    === bisect_right ===
    bisect_right(a, x[, lo[, hi]]) -> index

    Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x. So if x already appears in the list, i points just
    beyond the rightmost x already there

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.

    === insort_right ===

    insort_right(a, x[, lo[, hi]])

    Insert item x in list a, and keep it sorted assuming a is sorted.

    If x is already in a, insert it to the right of the rightmost x.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.

    === bisect_left ===
    bisect_left(a, x[, lo[, hi]]) -> index

    Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x. So if x already appears in the list, i points just
    before the leftmost x already there.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.

    === insort_left ===

    insort_left(a, x[, lo[, hi]])

    Insert item x in list a, and keep it sorted assuming a is sorted.

    If x is already in a, insert it to the left of the leftmost x.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.

    ---

    After:
    ---
    === bisect_right ===
    Return the index where to insert item x in list a, assuming a is sorted.

    a
    The list in which x is to be inserted.
    x
    The value to be inserted.
    lo
    Lower bound, defaults to 0.
    hi
    Upper bound, defaults to -1 (meaning len(a)).

    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x. So if x already appears in the list, i points just
    beyond the rightmost x already there.

    Optional args lo and hi bound the slice of a to be searched.
    === insort_right ===
    Insert item x in list a, and keep it sorted assuming a is sorted.

    a
    The list in which x will be inserted.
    x
    The value to insert.
    lo
    Lower bound, defaults to 0.
    hi
    Upper bound, defaults to -1 (meaning len(a)).

    If x is already in a, insert it to the right of the rightmost x.

    Optional args lo and hi bound the slice of a to be searched.
    === bisect_left ===
    Return the index where to insert item x in list a, assuming a is sorted.

    a
    The list in which x is to be inserted.
    x
    The value to be inserted.
    lo
    Lower bound, defaults to 0.
    hi
    Upper bound, defaults to -1 (meaning len(a)).

    The return value i is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x. So if x already appears in the list, i points just
    before the leftmost x already there.

    Optional args lo and hi bound the slice of a to be searched.
    === insort_left ===
    Insert item x in list a, and keep it sorted assuming a is sorted.

    a
    The list in which x will be inserted.
    x
    The value to insert.
    lo
    Lower bound, defaults to 0.
    hi
    Upper bound, defaults to -1 (meaning len(a)).

    If x is already in a, insert it to the left of the leftmost x.

    Optional args lo and hi bound the slice of a to be searched.

    ---

    The output is different. I suggest to not touch the docstring in the patch upgrading _bisect.c to Argument Clinic to ease the review.

    @vadmium
    Copy link
    Member

    vadmium commented Nov 25, 2016

    Taking the first function, bisect_right(), as an example, I see these differences:

    • bisect_right(a, x[, lo[, hi]]) -> index
      This signature is removed. I think removing it is reasonable, because pydoc can extract the proper signature from the Arg Clinic metadata.

    • Additional descriptions of each parameter. I tend to think these are redundant with the main text, so agree with removing them from the patch now.

    • Addition of full stop (.) at end of first paragraph. I suggested this as a minor cleanup, but it could be fixed later if you prefer.

    • Removal of default values in last paragraph. The first is redundant with the Arg Clinic signature, so I suggested to remove it. For the second, I would add it back if we remove the list of parameters, since it explains what the special value -1 means.

    Do you want to revert all these differences?

    @JulienPalard
    Copy link
    Member Author

    Quoting @martin

    • bisect_right(a, x[, lo[, hi]]) -> index
      This signature is removed. I think removing it is reasonable, because pydoc can extract the proper signature from the Arg Clinic metadata.

    In fact, http://docs.python.org/howto/clinic.html, bullet ".4", told me do remove it:

    If the old docstring had a first line that looked like a function signature, throw that line away. (The docstring doesn’t need it anymore—when you use help() on your builtin in the future, the first line will be built automatically based on the function’s signature.)

    • Additional descriptions of each parameter. I tend to think these are redundant with the main text, so agree with removing them from the patch now.

    OK, will remove them.

    • Addition of full stop (.) at end of first paragraph. I suggested this as a minor cleanup, but it could be fixed later if you prefer.

    Can remove it to simplify verification, but as doctests will be different as doc told me to remove the signature, so is it work removing this dot ? I don't think so.

    • Removal of default values in last paragraph. The first is redundant with the Arg Clinic signature, so I suggested to remove it. For the second, I would add it back if we remove the list of parameters, since it explains what the special value -1 means.

    I'll revert it too as I'll remove the documentation in the parameter list.

    Will upload a patch later.

    @JulienPalard
    Copy link
    Member Author

    Here is the new patch, I ran a diff between "./python -m pydoc _bisect" before and after my patch, here it is:

    13,15c13
    < bisect_left(...)
    < bisect_left(a, x[, lo[, hi]]) -> index
    <
    ---

    bisect_left(a, x, lo=0, hi=-1)
    

    25,27c23
    < bisect_right(...)
    < bisect_right(a, x[, lo[, hi]]) -> index
    <
    ---

    bisect_right(a, x, lo=0, hi=-1)
    

    32c28
    < beyond the rightmost x already there
    ---

        beyond the rightmost x already there.
    

    37,39c33
    < insort_left(...)
    < insort_left(a, x[, lo[, hi]])
    <
    ---

    insort_left(a, x, lo=0, hi=-1)
    

    47,49c41
    < insort_right(...)
    < insort_right(a, x[, lo[, hi]])
    <
    ---

    insort_right(a, x, lo=0, hi=-1)
    

    So I kept my addition of the missing full stop, and dropped the handwritten signature as Argument Clinic generates it.

    @JulienPalard
    Copy link
    Member Author

    Should we also update howto/clinic, bullet "11.", to be explicit about not adding per-parameter in the same patch? Like a:

    If you're porting existing function to Argument clinic, skip this step to simplify code review.

    ?

    @rhettinger
    Copy link
    Contributor

    Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?

    $ py3.7 -m timeit -s 'from bisect import bisect' -s 'a=list(range(100))' 'bisect(a, 10)'

    229 nsec # Clang baseline
    202 nsec # Clang with patch

    189 nsec # GCC-6 baseline
    156 nsec # Gcc-6 with patch

    @JulienPalard
    Copy link
    Member Author

    Hi Raymond,

    Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?

    That's because Argument Clinic is generating methoddef with METH_FASTCALL:

    $ grep FASTCALL Modules/clinic/_bisectmodule.c.h 
        {"bisect_right", (PyCFunction)bisect_bisect_right, METH_FASTCALL, bisect_bisect_right__doc__},
        {"insort_right", (PyCFunction)bisect_insort_right, METH_FASTCALL, bisect_insort_right__doc__},
        {"bisect_left", (PyCFunction)bisect_bisect_left, METH_FASTCALL, bisect_bisect_left__doc__},
        {"insort_left", (PyCFunction)bisect_insort_left, METH_FASTCALL, bisect_insort_left__doc__},

    Instead of METH_VARARGS|METH_KEYWORDS:

    • {"bisect_right", (PyCFunction)bisect_right,
    •    METH_VARARGS|METH_KEYWORDS, bisect_right_doc},
      
    • {"insort_right", (PyCFunction)insort_right,
    •    METH_VARARGS|METH_KEYWORDS, insort_right_doc},
      
    • {"bisect_left", (PyCFunction)bisect_left,
    •    METH_VARARGS|METH_KEYWORDS, bisect_left_doc},
      
    • {"insort_left", (PyCFunction)insort_left,
    •    METH_VARARGS|METH_KEYWORDS, insort_left_doc},
      

    @serhiy-storchaka
    Copy link
    Member

    Julien, you can declare the hi parameter as

    hi: Py_ssize_t(py_default="len(a)") = -1
    

    if you want the signature be matching the documentation.

    In general, it is better to use converter names rather than format codes.

    Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?

    This is rather a random difference. Try to run the bench several times. With using Victor's perf module the difference is not significant.

    Maybe using FASTCALL have some positive effect, but it is unlikely noticeable with such complex function.

    @JulienPalard
    Copy link
    Member Author

    Hi Serhiy,

    Julien, you can declare the hi parameter as
    hi: Py_ssize_t(py_default="len(a)") = -1

    Looks like a good idea, I was aware of its existance but did not took the time to read the doc about it, kind of learning step by stpe. But as you provided the syntax, I tested it, thanks for this! But it fails with a RuntimeError during python -m pydoc _bisect, I'm currently trying to understand why:

    $ ./python -m pydoc _bisect 
    Traceback (most recent call last):
      File "/home/mdk/cpython-git/Lib/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/mdk/cpython-git/Lib/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 2663, in <module>
        cli()
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 2628, in cli
        help.help(arg)
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 1908, in help
        elif request: doc(request, 'Help on %s:', output=self._output)
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 1645, in doc
        pager(render_doc(thing, title, forceload))
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 1638, in render_doc
        return title % desc + '\n\n' + renderer.document(object, name)
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 382, in document
        if inspect.ismodule(object): return self.docmodule(*args)
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 1172, in docmodule
        contents.append(self.document(value, key, name))
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 384, in document
        if inspect.isroutine(object): return self.docroutine(*args)
      File "/home/mdk/cpython-git/Lib/pydoc.py", line 1357, in docroutine
        signature = inspect.signature(object)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 2994, in signature
        return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 2744, in from_callable
        follow_wrapper_chains=follow_wrapped)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 2223, in _signature_from_callable
        skip_bound_arg=skip_bound_arg)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 2055, in _signature_from_builtin
        return _signature_fromstr(cls, func, s, skip_bound_arg)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 2003, in _signature_fromstr
        p(name, default)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 1985, in p
        default_node = RewriteSymbolics().visit(default_node)
      File "/home/mdk/cpython-git/Lib/ast.py", line 253, in visit
        return visitor(node)
      File "/home/mdk/cpython-git/Lib/ast.py", line 317, in generic_visit
        new_node = self.visit(old_value)
      File "/home/mdk/cpython-git/Lib/ast.py", line 253, in visit
        return visitor(node)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 1977, in visit_Name
        return wrap_value(node.id)
      File "/home/mdk/cpython-git/Lib/inspect.py", line 1959, in wrap_value
        raise RuntimeError()
    RuntimeError

    > Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?
    This is rather a random difference. Try to run the bench several times. With using Victor's perf module the difference is not significant.

    If you search for "perf" in the history of this issue you'll see that I already tested with Victor's perf module and it yielded a consistent 18% improvement on bisect.bisect("abcdef", "c") on two different machines. Clearly with more than 5 items, it will start to fade, and with a lot of items, it's not significand, obviously as it's an optimization about the call and not the implementation. So Raymond's test with 100 items may still see a speedup.

    @JulienPalard
    Copy link
    Member Author

    Hi Serhiy,

    hi: Py_ssize_t(py_default="len(a)") = -1
    

    Won't works, as pydoc will use the inspect module (_signature_fromstr) to get the signature. _signature_fromstr expects a valid python signature, but def foo(a, x, lo=0, high=len(a)): pass is not valid (SyntaxError).

    Should we note that in the clinic documentation:

    py_default
    default as it should appear in Python code, as a string. Or None if there is no default.

    py_default
    default as it should appear in valid Python code, as a string. Or None if there is no default.

    ?

    @vstinner
    Copy link
    Member

    Argument Clinic adds a signature and a better docstring. So I like it!

    Speedup for tiny lists is just a nice side effect.

    @rhettinger
    Copy link
    Contributor

    [SS]

    This is rather a random difference. Try to run the bench several times.

    I did run several times. The results were perfectly stable (+/- 1ns). It tells us that METH_FASTCALL is something we want to use as broadly as possible.

    [JP]

    it yielded a consistent 18% improvement on bisect.bisect("abcdef", "c")
    on two different machines.

    Thanks for running timings. When you do timing in the future, try to select timings that are indicative of actual use. No one uses bisect to search strings for a character -- the typical use case is searching a list of range breakpoints like the example shown in the docs:

        def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
            i = bisect(breakpoints, score)
            return grades[i]

    Hash tables are faster for exact lookup. Bisect is mainly used for searching ranges (i.e. tax brackets or position in a cumulative distribution).

    [VS]

    Speedup for tiny lists is just a nice side effect.

    To you perhaps; however, the entire reason I put this code in many years ago was for performance. The existing pure python code was already very fast. Anyway, this patch is a nice improvement in that regard.

    [Everyone]

    What to do about the "hi" argument is a sticky point. This is a long-standing API, so any API changes would likely break some code or lock us into exposing unintuitive implementation details (like hi=-1).

    The point of view of the existing C code, the pure Python version, and the docs is that the "hi" argument is an optional argument that when omitted will default to "len(a)". It was not intended that a user would explicitly pass in "hi=None" as allowed by the pure python version but not by the C version, nor was it intended to have the user pass in "hi=-1" as allowed by the C version but causes incorrect behavior on the pure Python version.

    It is unfortunate that the argument clinic has not yet grown support for the concept of "there is special meaning for an omitted argument that cannot be simulated by passing in a particular default value". I believe when this issue has arisen elsewhere, the decision was to skip applying argument clinic at all. See: builtin_getattr, builtin_next, dict_pop, etc.

    In this case though, we already have a conflict, so it may be possible to resolve the issue through clear parameter descriptions, indicating that "hi=-1" and "hi=None" are implementation details and that it is intended that the user never explicitly pass in either of those values, and that only the guaranteed ways to get the default is to omit the argument entirely or to pass in "hi=len(a)" as a numeric argument.

    @serhiy-storchaka
    Copy link
    Member

    I would suggest to use sys.maxsize for default value in Argument Clinic, but right now bisect functions don't support well hi > len(a). This needs adding an equivalent of hi = min(hi, len(a)) in C code.

    -1 could be explicitly supported for backward compatibility. But it could be deprecated.

    @rhettinger
    Copy link
    Contributor

    I would suggest to use sys.maxsize for default value in Argument Clinic

    No thanks. I don't want to subtly change the API for this module or even suggest to users that it might be a good idea to set hi > len. Further, we don't want to slow the pure python code for this pointless extension.

    Really, we're dancing around the issue that argument clinic and signatures need to become more expressive, allowing for omitted arguments without requiring a default value. Waiting for this to be done is far preferable to mangling long-standing APIs to force fit them to into argument clinic.

    Elsewhere, we've shown self discipline and restraint when applying AC, skipping over cases where it doesn't fit. If we can't find a way to apply AC without changing this API, this issue should be closed and deferred until AC grows the requisite expressive power.

    @rhettinger rhettinger self-assigned this Nov 27, 2016
    @rhettinger
    Copy link
    Contributor

    After some further thought, I am open to changing the C API to accept either hi=None or hi=-1 to reconcile the implementation details without breaking any existing code that relies on either. That would let the argument clinic expose a default value of hi=None. That isn't perfect because it complicates the code and it exposes an implementation detail contrary to what the API originally intended.

    Short of that, this module ought to be skipped for the argument clinic pending a build-out of signature objects to handle signatures that vary depending on the number of arguments supplied.

    @JulienPalard
    Copy link
    Member Author

    Hi Raymond,

    About Argument Clinic
    #####################

    Just to clarify the situation, Argument Clinic allows for clear method signature, typically: hi: Py_ssize_t(py_default="len(a)") = -1 gives the expected "hi=len(a)" signature successfully.

    The problem is that it's not a legal signature (it's a syntax error), so pydoc crashes while parsing it (search for RuntimeError in the history of the issue for a full stacktrace).

    About what should we put in the signature
    #########################################

    Question is: Should we allow "semantically precise, but frustrating*" signatures to be documented or should we be move the semantic away from the signature?

    *: I mean, is a new user sees "hi=len(a)" in a signature in a docstring he will immediately flag this syntax as valid and usable later, which lead him to a syntax error and some frustration.

    Something else to keep in mind is that I expect users be able to call those methods indirectly, like:

    def whatever_they_need(self, lo=0, hi=?):
        …
        bisect.bisect(…, …, lo, hi)
        …

    Currently, to follow strictly the documentation they have to write:

    def whatever_they_need(self, lo=0, hi=None):
        …
        if hi is not None:
            bisect.bisect(…, …, lo, hi)
        else:
            bisect.bisect(…, …, lo)
        …

    Which looks bad to me. So from my point of view we *have to* tell them what is the real default value for this parameter so they can use it in their implementations.

    Should we generalize and say that depending on the number of arguments is a bad practice? I think so.

    @rhettinger
    Copy link
    Contributor

    Sorry Julien, you don't to decide that long-standing APIs that have worked just fine for users are now a bad practice. Some of these APIs (range, dict.pop, type) we designed by Guido and work fine in practice. The expression, "don't have the tail wag the dog" comes to mind (where "tail" is the existing successful language and "dog" is the argument clinic). Besides, API change is very disruptive to users (especially those trying to migrate to Python 3).

    For the case of bisect, I've given a way forward. It is okay to make the documented default value for "hi" to be "None" while internally still allowing -1 so that we don't break code that happens to depend on it. In the clinic, use "hi: 'O' = None". Then in the dispatch code, handle both the None case and the numeric case (see islice_new() in itertoolsmodule.c for some ideas on how to do this.

    What isn't okay is to document and expose "hi=-1" because that is a confusing and error-prone default value that is just an awkward hack to express "hi=len(a)".

    @vstinner
    Copy link
    Member

    Sorry, I didn't follow the discussion, but why not using "/" magic separator in Argument Clinic to mark arguments as optional but don't document their default value?

    @serhiy-storchaka
    Copy link
    Member

    Yet one option is to introduce the constant UNBOUND = -1 in the bisect module and use it as a default value.

    @serhiy-storchaka
    Copy link
    Member

    Sorry, I didn't follow the discussion, but why not using "/" magic separator in Argument Clinic to mark arguments as optional but don't document their default value?

    '/' marks positional-only arguments, not optional arguments.

    @vstinner
    Copy link
    Member

    Ah right, _bisect.bisect_right() support keywords. I expected that it
    doesn't support keywords yet ;-)

    @vadmium
    Copy link
    Member

    vadmium commented Nov 29, 2016

    Regarding the problem with the default value, can we use “unspecified”, as documented at <https://docs.python.org/dev/howto/clinic.html#writing-a-custom-converter\>, or is that an old relic that no longer exists (hinted at the end of bpo-20293)?

    Failing that, I prefer Raymond’s earlier suggestion: say that any support for −1 or None are implementation details, and not part of the API. The same statement could be added to the native Python doc strings as well.

    BTW I am not opposed to the proposal for proper hi=None support either, although it does complicate the change. IMO allowing hi=None could be done as a second step.

    @JulienPalard
    Copy link
    Member Author

    I tried myself at Argument Clinic custom converters to create the "optional ssize_t converter" and it works, so as advised by Raymond, pydoc now shows None, and the C implementation now allows for both default values None and -1, as the custom converter returns -1 for none.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Nov 29, 2016

    Julien, the syntax converters look pretty clever. Do we need AC
    everywhere though? I wonder (once again) if this is really more
    readable than the existing code.

    @vstinner
    Copy link
    Member

    Stefan Krah added the comment:

    Julien, the syntax converters look pretty clever. Do we need AC
    everywhere though?

    Please see previous comments for advantages of AC (signature object,
    docstring, speed).

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Nov 29, 2016

    Signature and docstring can be done manually with very little effort.
    Currently METH_FASTCALL is AC only, but I hope that will change.

    @vadmium
    Copy link
    Member

    vadmium commented Nov 29, 2016

    If adding proper support for hi=None, maybe lo=None should also be supported. Also, I would think the main Doc/library/bisect.rst documentation needs updating, and a test and What’s New entry added.

    @rhettinger
    Copy link
    Contributor

    If adding proper support for hi=None, maybe lo=None should
    also be supported.

    That would be gratuitous. Lo already has a reasonable, useful, and self-explanatory value. This would add more complexity to the signature while reducing clarity. I really don't want to see calls like bisect(arr, x, None).

    @vadmium
    Copy link
    Member

    vadmium commented Dec 3, 2016

    Fair enough, I don’t really mind if it is (lo=0, hi=None). I think I have only used bisect with both defaults anyway.

    @JulienPalard
    Copy link
    Member Author

    Hi Martin,

    Historically (lo=0, hi=None) was the implementation (in Python), until the C implementation, which used (lo=0, hi=-1). As my implementation allows for -1 and None, I don't think I'm breaking something.

    @JulienPalard
    Copy link
    Member Author

    I was able to simplify my patch a bit, I think I should also add a test to ensure we keep the hi=-1 and hi=None compatibility in the future.

    @JulienPalard
    Copy link
    Member Author

    Added a test to ensure compatibility of both hi=None (introduced in original Python version) and hi=-1 (Introduced by the C version).

    Modified Python version to be compatible with the C-introduced hi=-1, so that the new test pass.

    @JulienPalard
    Copy link
    Member Author

    Maybe wait for bpo-28933.

    @larryhastings
    Copy link
    Contributor

    @JulienPalard
    Copy link
    Member Author

    Hi Larry,

    In any cases it looks like supporting hi=-1 and hi=None is mandatory:

    hi=None is the current implementation in the bisect (Python) module.
    hi=-1 is the current implementation in the _bisect (C) module.

    Both are currently living together, the C version takes over the Python version when available, but the Python version is older, so both -1 and None has been the "current implementation" during some time.

    Raymond legitimately fear that someone, somewhere, used -1 in some code. We should not break it (even if it had never been documented).

    So let's just accept both for the moment, and if deprecation of one or another should be discussed, let's do it later, in another issue. I'm _just_ trying to make AC move forward here, without breaking anything.

    @larryhastings
    Copy link
    Contributor

    Okay, but do it with a converter function, not by changing the default behavior for Py_ssize_t. The vast majority of uses of Py_ssize_t should not accept None.

    @rhettinger
    Copy link
    Contributor

    Why does this need a converter function? I there is any reason this can't use "O" with "hi" defaulting to None (matching the pure python API) and letting the C function handle both -1 and None in the implementation rather than in AC?

    @JulienPalard
    Copy link
    Member Author

    Hi Raymond,

    I don't like having the converters in the C implementation too, that's why I'm working on bpo-28933 to clean this.

    letting the C function handle both -1 and None in the implementation rather than in AC?

    It works, yes. But I prefer to clearly split responsibilities: AC being responsible of adapting argument from PyObjects to C types, and C functions being responsible of ... doing their job.

    If the idea in bpo-28933 is accepted, we'll be able to declare hi as simply as:

    hi: Py_ssize_t(c_default="-1") = None
    

    meaning "C function will get a Py_ssize_t, default value for C code is -1, None is documented, and None can be passed to get the C default value", that's this last point "None can be passed to get the C default value" that I introduce in bpo-28933.

    With this syntax, both C converters and the python hi_parameter_converter can be dropped.

    @larryhastings
    Copy link
    Contributor

    We don't *have* to use a converter function, no. But consider: someone somewhere will have to convert a PyObject * into a Py_ssize_t, also accepting None. Doing it in a converter function means we can easily share the code with bisect_right, which should presumably behave the same way. It also keeps all the argument parsing logic in the Argument Clinic domain instead of having some in AC and some in the implementation.

    Is there some drawback to using a converter function?

    @JulienPalard
    Copy link
    Member Author

    As the pattern of this converter is not widely used, I'll let it in the code of _bisect for the moment, see: http://bugs.python.org/issue28933.

    @serhiy-storchaka
    Copy link
    Member

    Now Py_ssize_t(accept={int, NoneType}) can be used instead of a local converter.

    @rhettinger
    Copy link
    Contributor

    Wouldn't it be easily to let the OP apply AC to some other module. Right now, it isn't a very good fit and was this a tough one to start with.

    @rhettinger
    Copy link
    Contributor

    After more thought, I'm going to close this one because the Argument Clinic and signature objects aren't yet a good fit for this API. The bisect module isn't broken (its API has been successfully living in the wild for a very long time). I do welcome Argument Clinic patches where there is a good fit but don't think this is the place. Each of the proposed patches makes the module worst in one way or another.

    @rhettinger
    Copy link
    Contributor

    FWIW, the itertools module has many functions that can have the ArgumentClinic applied without any fundamental problems (pretty much all of the functions will work except for repeat(), accumulate(), and islice()).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants