Argument Clinic for bisect.bisect_left #72940

JulienPalard · 2016-11-20T16:52:00Z

BPO	28754
Nosy	@rhettinger, @vstinner, @larryhastings, @skrah, @vadmium, @serhiy-storchaka, @JulienPalard
PRs	bpo-28754: AC on bisect. #177
Files	issue28754.diff: Missing Modules/clinic/_bisectmodule.c.h issue28754-2.diff bisect_left.diff bisect.diff insort.diff insort-left.diff issue28754-3.diff issue28754-4.diff issue28754-5.diff check_bisect_doc.py issue28754-6.diff issue28754-7.diff issue28754-8.diff issue28754-9.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2017-09-24.23:15:12.135>
created_at = <Date 2016-11-20.16:51:59.575>
labels = ['3.7', 'expert-argument-clinic']
title = 'Argument Clinic for bisect.bisect_left'
updated_at = <Date 2017-09-25.02:51:14.535>
user = 'https://github.com/JulienPalard'

bugs.python.org fields:

activity = <Date 2017-09-25.02:51:14.535>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2017-09-24.23:15:12.135>
closer = 'rhettinger'
components = ['Argument Clinic']
creation = <Date 2016-11-20.16:51:59.575>
creator = 'mdk'
dependencies = []
files = ['45562', '45563', '45568', '45569', '45570', '45571', '45572', '45626', '45628', '45642', '45650', '45693', '45832', '45836']
hgrepos = []
issue_num = 28754
keywords = ['patch']
message_count = 54.0
messages = ['281282', '281283', '281285', '281299', '281302', '281303', '281652', '281669', '281685', '281712', '281713', '281717', '281734', '281761', '281762', '281763', '281764', '281765', '281768', '281772', '281779', '281791', '281792', '281832', '281834', '281878', '281880', '281888', '281892', '281899', '281901', '281926', '282028', '282030', '282038', '282041', '282049', '282063', '282308', '282350', '282842', '282854', '282864', '282875', '282877', '282882', '282888', '282889', '282891', '283100', '291160', '291162', '302894', '302908']
nosy_count = 7.0
nosy_names = ['rhettinger', 'vstinner', 'larry', 'skrah', 'martin.panter', 'serhiy.storchaka', 'mdk']
pr_nums = ['177']
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue28754'
versions = ['Python 3.7']

JulienPalard · 2016-11-20T16:51:59Z

Today I read https://docs.python.org/3.6/howto/clinic.html so I tried one: bisect.bisect_left.

I was unable to do bisect_right, as it's an "alias" for bisect, and there's a unit-test checking self.assertEqual(self.module.bisect, self.module.bisect_right). Any hint welcome.

larryhastings · 2016-11-20T17:08:08Z

There's special syntax to handle aliases. From comments in clinic.py:

    # alternatively:
    #     modulename.fnname [as c_basename] = modulename.existing_fn_name
    # clones the parameters and return converter from that
    # function.  you can't modify them.  you must enter a
    # new docstring.

Sorry the docs are out of date--patches welcome. And speaking of patches, thanks for the patch! I'll review it when you say it's ready.

larryhastings · 2016-11-20T17:09:41Z

Oh, and, if the code literally asserts they're the same function, that's just a sanity check based on the implementation. You could preserve that if you care to, or you could just write a new function and remove the assertion. Do what you think is best, then post your patch to the tracker and we'll all argue about it forever, how's that sound?

JulienPalard · 2016-11-20T20:12:56Z

I searched an occurrence of what I'm describing which is already using clinic and there is, at least, one in Modules/binascii.c line 1090:

TL;DR: The idea is to use the modulename.fnname [as c_basename] = modulename.existing_fn_name clinic syntax, drop a "This function is also available as …" in the docstring, and check if they return the same thing in the tests.

This way looks good to me, it uses the dedicated syntax which looks right, and I'll drop the self.assertEqual(self.module.bisect, self.module.bisect_right) which is an implementation detail, replacing it by tests checking that both have the same behavior.

/*[clinic input]
binascii.b2a_hex

    data: Py_buffer
    /

Hexadecimal representation of binary data.

The return value is a bytes object.  This function is also
available as "hexlify()".
[clinic start generated code]*/

static PyObject *
binascii_b2a_hex_impl(PyObject *module, Py_buffer *data)
/*[clinic end generated code: output=92fec1a95c9897a0 input=96423cfa299ff3b1]*/
{
    return _Py_strhex_bytes((const char *)data->buf, data->len);
}

/*[clinic input]
binascii.hexlify = binascii.b2a_hex

Hexadecimal representation of binary data.

The return value is a bytes object.
[clinic start generated code]*/

static PyObject *
binascii_hexlify_impl(PyObject *module, Py_buffer *data)
/*[clinic end generated code: output=749e95e53c14880c input=2e3afae7f083f061]*/
{
    return _Py_strhex_bytes((const char *)data->buf, data->len);
}

JulienPalard · 2016-11-20T21:30:09Z

Here it is for the whole bisect module.

I separated my work in commits, but I'm not sure how rietveld will eat that as they'll have unknown references, so I'll probably also upload a single patch with a known reference.

JulienPalard · 2016-11-20T21:34:31Z

The whole diff is reviewable in bpo-28754-3.diff.

vstinner · 2016-11-24T21:08:52Z

Hum, I dislike your changes on test_bisect.py: I don't see why Argument Clinic would have to modify tests!?

I created issue bpo-28792: "bisect: implement aliases in Python, remove C aliases".

JulienPalard · 2016-11-24T23:37:00Z

New simplier patch thanks to https://bugs.python.org/issue28792. Also corrected docstrings.

Also ran a micro-benchmark of bisect.bisect(foo, "c") with foo = list("abcdef"):
Median +- std dev: [before] 434 ns +- 17 ns -> [after] 369 ns +- 22 ns: 1.18x faster

FTR: ./python -m perf timeit -s 'import bisect; foo = list("abdef")' 'bisect.bisect(foo, "c")' -o after.json --inherit=PYTHONPATH --affinity=2,3

vstinner · 2016-11-25T08:29:46Z

bpo-28754-4.diff does 3 things:

convert C code to Argument Clinic
change docstrings of the C code
change unit tests

bpo-28754-5.diff doesn't touch tests, but still does 2 things. Sadly, I dislike the changes on docstrings because it makes the C docstring and the Python docstring inconsistent.

I suggest to copy/paste docstring between C and Python code.

By the way, why do we have a docstring different from Doc/library/bisect.rst? Maybe we can simply use the same doc everywhere, but just add reST formatting in Doc/library/bisect.rst?

Julien: I suggest to first restrict your change to Argument Clinic, and later write a new patch to update docstrings. I prefer to only do one thing per commit.

vadmium · 2016-11-25T13:00:17Z

Victor, what changes to the doc strings are you talking about? Adding descriptions of the parameters, maybe? Other than that they look very similar to me.

I haven’t looked closely, but the Python doc strings have been updated at least once more than the C doc strings: r54666 and bpo-1602378. So if you are copying anything, it might be better to copy from Py to C. But I would do that separately.

The doc strings and reference documentation just seem to have been added independently: ce2d120c3903 (main documentation) vs 67b3ac439f64 (doc strings)

vstinner · 2016-11-25T13:07:21Z

Martin: "Victor, what changes to the doc strings are you talking about?"

See attached check_bisect_doc.py script.

Before:
---
=== bisect_right ===
bisect_right(a, x[, lo[, hi]]) -> index

Return the index where to insert item x in list a, assuming a is sorted.

The return value i is such that all e in a[:i] have e <= x, and all e in
a[i:] have e > x. So if x already appears in the list, i points just
beyond the rightmost x already there

Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.

=== insort_right ===

insort_right(a, x[, lo[, hi]])

Insert item x in list a, and keep it sorted assuming a is sorted.

If x is already in a, insert it to the right of the rightmost x.

Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.

=== bisect_left ===
bisect_left(a, x[, lo[, hi]]) -> index

Return the index where to insert item x in list a, assuming a is sorted.

The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x. So if x already appears in the list, i points just
before the leftmost x already there.

Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.

=== insort_left ===

insort_left(a, x[, lo[, hi]])

Insert item x in list a, and keep it sorted assuming a is sorted.

If x is already in a, insert it to the left of the leftmost x.

Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.

---

After:
---
=== bisect_right ===
Return the index where to insert item x in list a, assuming a is sorted.

a
The list in which x is to be inserted.
x
The value to be inserted.
lo
Lower bound, defaults to 0.
hi
Upper bound, defaults to -1 (meaning len(a)).

The return value i is such that all e in a[:i] have e <= x, and all e in
a[i:] have e > x. So if x already appears in the list, i points just
beyond the rightmost x already there.

Optional args lo and hi bound the slice of a to be searched.
=== insort_right ===
Insert item x in list a, and keep it sorted assuming a is sorted.

a
The list in which x will be inserted.
x
The value to insert.
lo
Lower bound, defaults to 0.
hi
Upper bound, defaults to -1 (meaning len(a)).

If x is already in a, insert it to the right of the rightmost x.

Optional args lo and hi bound the slice of a to be searched.
=== bisect_left ===
Return the index where to insert item x in list a, assuming a is sorted.

a
The list in which x is to be inserted.
x
The value to be inserted.
lo
Lower bound, defaults to 0.
hi
Upper bound, defaults to -1 (meaning len(a)).

The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x. So if x already appears in the list, i points just
before the leftmost x already there.

Optional args lo and hi bound the slice of a to be searched.
=== insort_left ===
Insert item x in list a, and keep it sorted assuming a is sorted.

a
The list in which x will be inserted.
x
The value to insert.
lo
Lower bound, defaults to 0.
hi
Upper bound, defaults to -1 (meaning len(a)).

If x is already in a, insert it to the left of the leftmost x.

Optional args lo and hi bound the slice of a to be searched.

---

The output is different. I suggest to not touch the docstring in the patch upgrading _bisect.c to Argument Clinic to ease the review.

vadmium · 2016-11-25T13:35:21Z

Taking the first function, bisect_right(), as an example, I see these differences:

bisect_right(a, x[, lo[, hi]]) -> index
This signature is removed. I think removing it is reasonable, because pydoc can extract the proper signature from the Arg Clinic metadata.
Additional descriptions of each parameter. I tend to think these are redundant with the main text, so agree with removing them from the patch now.
Addition of full stop (.) at end of first paragraph. I suggested this as a minor cleanup, but it could be fixed later if you prefer.
Removal of default values in last paragraph. The first is redundant with the Arg Clinic signature, so I suggested to remove it. For the second, I would add it back if we remove the list of parameters, since it explains what the special value -1 means.

Do you want to revert all these differences?

JulienPalard · 2016-11-25T19:37:07Z

Quoting @martin

bisect_right(a, x[, lo[, hi]]) -> index
This signature is removed. I think removing it is reasonable, because pydoc can extract the proper signature from the Arg Clinic metadata.

In fact, http://docs.python.org/howto/clinic.html, bullet ".4", told me do remove it:

If the old docstring had a first line that looked like a function signature, throw that line away. (The docstring doesn’t need it anymore—when you use help() on your builtin in the future, the first line will be built automatically based on the function’s signature.)

Additional descriptions of each parameter. I tend to think these are redundant with the main text, so agree with removing them from the patch now.

OK, will remove them.

Addition of full stop (.) at end of first paragraph. I suggested this as a minor cleanup, but it could be fixed later if you prefer.

Can remove it to simplify verification, but as doctests will be different as doc told me to remove the signature, so is it work removing this dot ? I don't think so.

Removal of default values in last paragraph. The first is redundant with the Arg Clinic signature, so I suggested to remove it. For the second, I would add it back if we remove the list of parameters, since it explains what the special value -1 means.

I'll revert it too as I'll remove the documentation in the parameter list.

Will upload a patch later.

JulienPalard · 2016-11-26T09:23:20Z

Here is the new patch, I ran a diff between "./python -m pydoc _bisect" before and after my patch, here it is:

13,15c13
< bisect_left(...)
< bisect_left(a, x[, lo[, hi]]) -> index
<
---

bisect_left(a, x, lo=0, hi=-1)

25,27c23
< bisect_right(...)
< bisect_right(a, x[, lo[, hi]]) -> index
<
---

bisect_right(a, x, lo=0, hi=-1)

32c28
< beyond the rightmost x already there
---

    beyond the rightmost x already there.

37,39c33
< insort_left(...)
< insort_left(a, x[, lo[, hi]])
<
---

insort_left(a, x, lo=0, hi=-1)

47,49c41
< insort_right(...)
< insort_right(a, x[, lo[, hi]])
<
---

insort_right(a, x, lo=0, hi=-1)

So I kept my addition of the missing full stop, and dropped the handwritten signature as Argument Clinic generates it.

JulienPalard · 2016-11-26T09:27:43Z

Should we also update howto/clinic, bullet "11.", to be explicit about not adding per-parameter in the same patch? Like a:

If you're porting existing function to Argument clinic, skip this step to simplify code review.

?

rhettinger · 2016-11-26T10:19:50Z

Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?

$ py3.7 -m timeit -s 'from bisect import bisect' -s 'a=list(range(100))' 'bisect(a, 10)'

229 nsec # Clang baseline
202 nsec # Clang with patch

189 nsec # GCC-6 baseline
156 nsec # Gcc-6 with patch

JulienPalard · 2016-11-26T10:26:37Z

Hi Raymond,

Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?

That's because Argument Clinic is generating methoddef with METH_FASTCALL:

$ grep FASTCALL Modules/clinic/_bisectmodule.c.h 
    {"bisect_right", (PyCFunction)bisect_bisect_right, METH_FASTCALL, bisect_bisect_right__doc__},
    {"insort_right", (PyCFunction)bisect_insort_right, METH_FASTCALL, bisect_insort_right__doc__},
    {"bisect_left", (PyCFunction)bisect_bisect_left, METH_FASTCALL, bisect_bisect_left__doc__},
    {"insort_left", (PyCFunction)bisect_insort_left, METH_FASTCALL, bisect_insort_left__doc__},

Instead of METH_VARARGS|METH_KEYWORDS:

{"bisect_right", (PyCFunction)bisect_right,

   METH_VARARGS|METH_KEYWORDS, bisect_right_doc},

{"insort_right", (PyCFunction)insort_right,

   METH_VARARGS|METH_KEYWORDS, insort_right_doc},

{"bisect_left", (PyCFunction)bisect_left,

   METH_VARARGS|METH_KEYWORDS, bisect_left_doc},

{"insort_left", (PyCFunction)insort_left,

   METH_VARARGS|METH_KEYWORDS, insort_left_doc},

serhiy-storchaka · 2016-11-26T10:45:06Z

Julien, you can declare the hi parameter as

hi: Py_ssize_t(py_default="len(a)") = -1

if you want the signature be matching the documentation.

In general, it is better to use converter names rather than format codes.

Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?

This is rather a random difference. Try to run the bench several times. With using Victor's perf module the difference is not significant.

Maybe using FASTCALL have some positive effect, but it is unlikely noticeable with such complex function.

JulienPalard · 2016-11-26T11:44:18Z

Hi Serhiy,

Julien, you can declare the hi parameter as
hi: Py_ssize_t(py_default="len(a)") = -1

Looks like a good idea, I was aware of its existance but did not took the time to read the doc about it, kind of learning step by stpe. But as you provided the syntax, I tested it, thanks for this! But it fails with a RuntimeError during python -m pydoc _bisect, I'm currently trying to understand why:

$ ./python -m pydoc _bisect 
Traceback (most recent call last):
  File "/home/mdk/cpython-git/Lib/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/mdk/cpython-git/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 2663, in <module>
    cli()
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 2628, in cli
    help.help(arg)
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 1908, in help
    elif request: doc(request, 'Help on %s:', output=self._output)
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 1645, in doc
    pager(render_doc(thing, title, forceload))
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 1638, in render_doc
    return title % desc + '\n\n' + renderer.document(object, name)
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 382, in document
    if inspect.ismodule(object): return self.docmodule(*args)
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 1172, in docmodule
    contents.append(self.document(value, key, name))
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 384, in document
    if inspect.isroutine(object): return self.docroutine(*args)
  File "/home/mdk/cpython-git/Lib/pydoc.py", line 1357, in docroutine
    signature = inspect.signature(object)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 2994, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 2744, in from_callable
    follow_wrapper_chains=follow_wrapped)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 2223, in _signature_from_callable
    skip_bound_arg=skip_bound_arg)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 2055, in _signature_from_builtin
    return _signature_fromstr(cls, func, s, skip_bound_arg)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 2003, in _signature_fromstr
    p(name, default)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 1985, in p
    default_node = RewriteSymbolics().visit(default_node)
  File "/home/mdk/cpython-git/Lib/ast.py", line 253, in visit
    return visitor(node)
  File "/home/mdk/cpython-git/Lib/ast.py", line 317, in generic_visit
    new_node = self.visit(old_value)
  File "/home/mdk/cpython-git/Lib/ast.py", line 253, in visit
    return visitor(node)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 1977, in visit_Name
    return wrap_value(node.id)
  File "/home/mdk/cpython-git/Lib/inspect.py", line 1959, in wrap_value
    raise RuntimeError()
RuntimeError

> Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises?
This is rather a random difference. Try to run the bench several times. With using Victor's perf module the difference is not significant.

If you search for "perf" in the history of this issue you'll see that I already tested with Victor's perf module and it yielded a consistent 18% improvement on bisect.bisect("abcdef", "c") on two different machines. Clearly with more than 5 items, it will start to fade, and with a lot of items, it's not significand, obviously as it's an optimization about the call and not the implementation. So Raymond's test with 100 items may still see a speedup.

JulienPalard · 2016-11-26T12:52:19Z

Hi Serhiy,

hi: Py_ssize_t(py_default="len(a)") = -1

Won't works, as pydoc will use the inspect module (_signature_fromstr) to get the signature. _signature_fromstr expects a valid python signature, but def foo(a, x, lo=0, high=len(a)): pass is not valid (SyntaxError).

Should we note that in the clinic documentation:

py_default
default as it should appear in Python code, as a string. Or None if there is no default.

py_default
default as it should appear in valid Python code, as a string. Or None if there is no default.

?

vstinner · 2016-11-26T15:06:45Z

Argument Clinic adds a signature and a better docstring. So I like it!

Speedup for tiny lists is just a nice side effect.

rhettinger · 2016-11-26T19:00:04Z

[SS]

This is rather a random difference. Try to run the bench several times.

I did run several times. The results were perfectly stable (+/- 1ns). It tells us that METH_FASTCALL is something we want to use as broadly as possible.

[JP]

it yielded a consistent 18% improvement on bisect.bisect("abcdef", "c")
on two different machines.

Thanks for running timings. When you do timing in the future, try to select timings that are indicative of actual use. No one uses bisect to search strings for a character -- the typical use case is searching a list of range breakpoints like the example shown in the docs:

    def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
        i = bisect(breakpoints, score)
        return grades[i]

Hash tables are faster for exact lookup. Bisect is mainly used for searching ranges (i.e. tax brackets or position in a cumulative distribution).

[VS]

Speedup for tiny lists is just a nice side effect.

To you perhaps; however, the entire reason I put this code in many years ago was for performance. The existing pure python code was already very fast. Anyway, this patch is a nice improvement in that regard.

[Everyone]

What to do about the "hi" argument is a sticky point. This is a long-standing API, so any API changes would likely break some code or lock us into exposing unintuitive implementation details (like hi=-1).

The point of view of the existing C code, the pure Python version, and the docs is that the "hi" argument is an optional argument that when omitted will default to "len(a)". It was not intended that a user would explicitly pass in "hi=None" as allowed by the pure python version but not by the C version, nor was it intended to have the user pass in "hi=-1" as allowed by the C version but causes incorrect behavior on the pure Python version.

It is unfortunate that the argument clinic has not yet grown support for the concept of "there is special meaning for an omitted argument that cannot be simulated by passing in a particular default value". I believe when this issue has arisen elsewhere, the decision was to skip applying argument clinic at all. See: builtin_getattr, builtin_next, dict_pop, etc.

In this case though, we already have a conflict, so it may be possible to resolve the issue through clear parameter descriptions, indicating that "hi=-1" and "hi=None" are implementation details and that it is intended that the user never explicitly pass in either of those values, and that only the guaranteed ways to get the default is to omit the argument entirely or to pass in "hi=len(a)" as a numeric argument.

serhiy-storchaka · 2016-11-26T19:18:14Z

I would suggest to use sys.maxsize for default value in Argument Clinic, but right now bisect functions don't support well hi > len(a). This needs adding an equivalent of hi = min(hi, len(a)) in C code.

-1 could be explicitly supported for backward compatibility. But it could be deprecated.

rhettinger · 2016-11-27T18:37:29Z

I would suggest to use sys.maxsize for default value in Argument Clinic

No thanks. I don't want to subtly change the API for this module or even suggest to users that it might be a good idea to set hi > len. Further, we don't want to slow the pure python code for this pointless extension.

Really, we're dancing around the issue that argument clinic and signatures need to become more expressive, allowing for omitted arguments without requiring a default value. Waiting for this to be done is far preferable to mangling long-standing APIs to force fit them to into argument clinic.

Elsewhere, we've shown self discipline and restraint when applying AC, skipping over cases where it doesn't fit. If we can't find a way to apply AC without changing this API, this issue should be closed and deferred until AC grows the requisite expressive power.

rhettinger · 2016-11-27T20:12:25Z

After some further thought, I am open to changing the C API to accept either hi=None or hi=-1 to reconcile the implementation details without breaking any existing code that relies on either. That would let the argument clinic expose a default value of hi=None. That isn't perfect because it complicates the code and it exposes an implementation detail contrary to what the API originally intended.

Short of that, this module ought to be skipped for the argument clinic pending a build-out of signature objects to handle signatures that vary depending on the number of arguments supplied.

JulienPalard · 2016-11-28T14:22:17Z

Hi Raymond,

About Argument Clinic
#####################

Just to clarify the situation, Argument Clinic allows for clear method signature, typically: hi: Py_ssize_t(py_default="len(a)") = -1 gives the expected "hi=len(a)" signature successfully.

The problem is that it's not a legal signature (it's a syntax error), so pydoc crashes while parsing it (search for RuntimeError in the history of the issue for a full stacktrace).

About what should we put in the signature
#########################################

Question is: Should we allow "semantically precise, but frustrating*" signatures to be documented or should we be move the semantic away from the signature?

*: I mean, is a new user sees "hi=len(a)" in a signature in a docstring he will immediately flag this syntax as valid and usable later, which lead him to a syntax error and some frustration.

Something else to keep in mind is that I expect users be able to call those methods indirectly, like:

def whatever_they_need(self, lo=0, hi=?):
    …
    bisect.bisect(…, …, lo, hi)
    …

Currently, to follow strictly the documentation they have to write:

def whatever_they_need(self, lo=0, hi=None):
    …
    if hi is not None:
        bisect.bisect(…, …, lo, hi)
    else:
        bisect.bisect(…, …, lo)
    …

Which looks bad to me. So from my point of view we *have to* tell them what is the real default value for this parameter so they can use it in their implementations.

Should we generalize and say that depending on the number of arguments is a bad practice? I think so.

rhettinger · 2016-11-28T16:16:14Z

Sorry Julien, you don't to decide that long-standing APIs that have worked just fine for users are now a bad practice. Some of these APIs (range, dict.pop, type) we designed by Guido and work fine in practice. The expression, "don't have the tail wag the dog" comes to mind (where "tail" is the existing successful language and "dog" is the argument clinic). Besides, API change is very disruptive to users (especially those trying to migrate to Python 3).

For the case of bisect, I've given a way forward. It is okay to make the documented default value for "hi" to be "None" while internally still allowing -1 so that we don't break code that happens to depend on it. In the clinic, use "hi: 'O' = None". Then in the dispatch code, handle both the None case and the numeric case (see islice_new() in itertoolsmodule.c for some ideas on how to do this.

What isn't okay is to document and expose "hi=-1" because that is a confusing and error-prone default value that is just an awkward hack to express "hi=len(a)".

vstinner · 2016-11-28T16:51:11Z

Sorry, I didn't follow the discussion, but why not using "/" magic separator in Argument Clinic to mark arguments as optional but don't document their default value?

serhiy-storchaka · 2016-11-28T17:03:22Z

Yet one option is to introduce the constant UNBOUND = -1 in the bisect module and use it as a default value.

serhiy-storchaka · 2016-11-28T17:23:22Z

Sorry, I didn't follow the discussion, but why not using "/" magic separator in Argument Clinic to mark arguments as optional but don't document their default value?

'/' marks positional-only arguments, not optional arguments.

vstinner · 2016-11-28T17:50:09Z

Ah right, _bisect.bisect_right() support keywords. I expected that it
doesn't support keywords yet ;-)

vadmium · 2016-11-29T01:01:44Z

Regarding the problem with the default value, can we use “unspecified”, as documented at <https://docs.python.org/dev/howto/clinic.html#writing-a-custom-converter\>, or is that an old relic that no longer exists (hinted at the end of bpo-20293)?

Failing that, I prefer Raymond’s earlier suggestion: say that any support for −1 or None are implementation details, and not part of the API. The same statement could be added to the native Python doc strings as well.

BTW I am not opposed to the proposal for proper hi=None support either, although it does complicate the change. IMO allowing hi=None could be done as a second step.

JulienPalard · 2016-11-29T18:43:31Z

I tried myself at Argument Clinic custom converters to create the "optional ssize_t converter" and it works, so as advised by Raymond, pydoc now shows None, and the C implementation now allows for both default values None and -1, as the custom converter returns -1 for none.

skrah · 2016-11-29T19:01:28Z

Julien, the syntax converters look pretty clever. Do we need AC
everywhere though? I wonder (once again) if this is really more
readable than the existing code.

vstinner · 2016-11-29T20:00:52Z

Stefan Krah added the comment:

Julien, the syntax converters look pretty clever. Do we need AC
everywhere though?

Please see previous comments for advantages of AC (signature object,
docstring, speed).

skrah · 2016-11-29T20:22:38Z

Signature and docstring can be done manually with very little effort.
Currently METH_FASTCALL is AC only, but I hope that will change.

vadmium · 2016-11-29T21:34:29Z

If adding proper support for hi=None, maybe lo=None should also be supported. Also, I would think the main Doc/library/bisect.rst documentation needs updating, and a test and What’s New entry added.

rhettinger · 2016-11-30T04:01:14Z

If adding proper support for hi=None, maybe lo=None should
also be supported.

That would be gratuitous. Lo already has a reasonable, useful, and self-explanatory value. This would add more complexity to the signature while reducing clarity. I really don't want to see calls like bisect(arr, x, None).

vadmium · 2016-12-03T23:08:29Z

Fair enough, I don’t really mind if it is (lo=0, hi=None). I think I have only used bisect with both defaults anyway.

JulienPalard · 2016-12-04T16:11:20Z

Hi Martin,

Historically (lo=0, hi=None) was the implementation (in Python), until the C implementation, which used (lo=0, hi=-1). As my implementation allows for -1 and None, I don't think I'm breaking something.

JulienPalard · 2016-12-10T09:36:39Z

I was able to simplify my patch a bit, I think I should also add a test to ensure we keep the hi=-1 and hi=None compatibility in the future.

JulienPalard · 2016-12-10T14:54:55Z

Added a test to ensure compatibility of both hi=None (introduced in original Python version) and hi=-1 (Introduced by the C version).

Modified Python version to be compatible with the C-introduced hi=-1, so that the new test pass.

JulienPalard · 2016-12-10T17:58:40Z

Maybe wait for bpo-28933.

larryhastings · 2016-12-10T20:44:31Z

FWIW, pypy uses defaults of lo=0, hi=None.

https://bitbucket.org/pypy/pypy/src/25da7bc9719457aee57547993833d5168ba6aded/lib-python/2.7/bisect.py?at=default&fileviewer=file-view-default

JulienPalard · 2016-12-10T21:22:13Z

Hi Larry,

In any cases it looks like supporting hi=-1 and hi=None is mandatory:

hi=None is the current implementation in the bisect (Python) module.
hi=-1 is the current implementation in the _bisect (C) module.

Both are currently living together, the C version takes over the Python version when available, but the Python version is older, so both -1 and None has been the "current implementation" during some time.

Raymond legitimately fear that someone, somewhere, used -1 in some code. We should not break it (even if it had never been documented).

So let's just accept both for the moment, and if deprecation of one or another should be discussed, let's do it later, in another issue. I'm _just_ trying to make AC move forward here, without breaking anything.

larryhastings · 2016-12-10T22:19:54Z

Okay, but do it with a converter function, not by changing the default behavior for Py_ssize_t. The vast majority of uses of Py_ssize_t should not accept None.

rhettinger · 2016-12-11T00:11:39Z

Why does this need a converter function? I there is any reason this can't use "O" with "hi" defaulting to None (matching the pure python API) and letting the C function handle both -1 and None in the implementation rather than in AC?

JulienPalard · 2016-12-11T00:27:09Z

Hi Raymond,

I don't like having the converters in the C implementation too, that's why I'm working on bpo-28933 to clean this.

letting the C function handle both -1 and None in the implementation rather than in AC?

It works, yes. But I prefer to clearly split responsibilities: AC being responsible of adapting argument from PyObjects to C types, and C functions being responsible of ... doing their job.

If the idea in bpo-28933 is accepted, we'll be able to declare hi as simply as:

hi: Py_ssize_t(c_default="-1") = None

meaning "C function will get a Py_ssize_t, default value for C code is -1, None is documented, and None can be passed to get the C default value", that's this last point "None can be passed to get the C default value" that I introduce in bpo-28933.

With this syntax, both C converters and the python hi_parameter_converter can be dropped.

larryhastings · 2016-12-11T00:32:38Z

We don't *have* to use a converter function, no. But consider: someone somewhere will have to convert a PyObject * into a Py_ssize_t, also accepting None. Doing it in a converter function means we can easily share the code with bisect_right, which should presumably behave the same way. It also keeps all the argument parsing logic in the Argument Clinic domain instead of having some in AC and some in the implementation.

Is there some drawback to using a converter function?

JulienPalard · 2016-12-13T13:31:46Z

As the pattern of this converter is not widely used, I'll let it in the code of _bisect for the moment, see: http://bugs.python.org/issue28933.

serhiy-storchaka · 2017-04-05T06:34:15Z

Now Py_ssize_t(accept={int, NoneType}) can be used instead of a local converter.

rhettinger · 2017-04-05T06:42:20Z

Wouldn't it be easily to let the OP apply AC to some other module. Right now, it isn't a very good fit and was this a tough one to start with.

rhettinger · 2017-09-24T23:15:12Z

After more thought, I'm going to close this one because the Argument Clinic and signature objects aren't yet a good fit for this API. The bisect module isn't broken (its API has been successfully living in the wild for a very long time). I do welcome Argument Clinic patches where there is a good fit but don't think this is the place. Each of the proposed patches makes the module worst in one way or another.

rhettinger · 2017-09-25T02:51:14Z

FWIW, the itertools module has many functions that can have the ArgumentClinic applied without any fundamental problems (pretty much all of the functions will work except for repeat(), accumulate(), and islice()).

JulienPalard added the topic-argument-clinic label Nov 20, 2016

serhiy-storchaka added the 3.7 (EOL) end of life label Nov 20, 2016

rhettinger self-assigned this Nov 27, 2016

rhettinger closed this as completed Sep 24, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Argument Clinic for bisect.bisect_left #72940

Argument Clinic for bisect.bisect_left #72940

Comments

JulienPalard commented Nov 20, 2016

JulienPalard commented Nov 20, 2016

larryhastings commented Nov 20, 2016

larryhastings commented Nov 20, 2016

JulienPalard commented Nov 20, 2016

JulienPalard commented Nov 20, 2016

JulienPalard commented Nov 20, 2016

vstinner commented Nov 24, 2016

JulienPalard commented Nov 24, 2016

vstinner commented Nov 25, 2016

vadmium commented Nov 25, 2016

vstinner commented Nov 25, 2016

vadmium commented Nov 25, 2016

JulienPalard commented Nov 25, 2016

JulienPalard commented Nov 26, 2016

JulienPalard commented Nov 26, 2016

rhettinger commented Nov 26, 2016

JulienPalard commented Nov 26, 2016

serhiy-storchaka commented Nov 26, 2016

JulienPalard commented Nov 26, 2016

JulienPalard commented Nov 26, 2016

vstinner commented Nov 26, 2016

rhettinger commented Nov 26, 2016

serhiy-storchaka commented Nov 26, 2016

rhettinger commented Nov 27, 2016

rhettinger commented Nov 27, 2016

JulienPalard commented Nov 28, 2016

rhettinger commented Nov 28, 2016

vstinner commented Nov 28, 2016

serhiy-storchaka commented Nov 28, 2016

serhiy-storchaka commented Nov 28, 2016

vstinner commented Nov 28, 2016

vadmium commented Nov 29, 2016

JulienPalard commented Nov 29, 2016

skrah mannequin commented Nov 29, 2016

vstinner commented Nov 29, 2016

skrah mannequin commented Nov 29, 2016

vadmium commented Nov 29, 2016

rhettinger commented Nov 30, 2016

vadmium commented Dec 3, 2016

JulienPalard commented Dec 4, 2016

JulienPalard commented Dec 10, 2016

JulienPalard commented Dec 10, 2016

JulienPalard commented Dec 10, 2016

larryhastings commented Dec 10, 2016

JulienPalard commented Dec 10, 2016

larryhastings commented Dec 10, 2016

rhettinger commented Dec 11, 2016

JulienPalard commented Dec 11, 2016

larryhastings commented Dec 11, 2016

JulienPalard commented Dec 13, 2016

serhiy-storchaka commented Apr 5, 2017

rhettinger commented Apr 5, 2017

rhettinger commented Sep 24, 2017

rhettinger commented Sep 25, 2017