New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Argument Clinic for bisect.bisect_left #72940
Comments
Today I read https://docs.python.org/3.6/howto/clinic.html so I tried one: bisect.bisect_left. I was unable to do |
There's special syntax to handle aliases. From comments in clinic.py:
Sorry the docs are out of date--patches welcome. And speaking of patches, thanks for the patch! I'll review it when you say it's ready. |
Oh, and, if the code literally asserts they're the same function, that's just a sanity check based on the implementation. You could preserve that if you care to, or you could just write a new function and remove the assertion. Do what you think is best, then post your patch to the tracker and we'll all argue about it forever, how's that sound? |
I searched an occurrence of what I'm describing which is already using clinic and there is, at least, one in Modules/binascii.c line 1090: TL;DR: The idea is to use the This way looks good to me, it uses the dedicated syntax which looks right, and I'll drop the
|
Here it is for the whole bisect module. I separated my work in commits, but I'm not sure how rietveld will eat that as they'll have unknown references, so I'll probably also upload a single patch with a known reference. |
The whole diff is reviewable in |
Hum, I dislike your changes on test_bisect.py: I don't see why Argument Clinic would have to modify tests!? I created issue bpo-28792: "bisect: implement aliases in Python, remove C aliases". |
New simplier patch thanks to https://bugs.python.org/issue28792. Also corrected docstrings. Also ran a micro-benchmark of FTR: ./python -m perf timeit -s 'import bisect; foo = list("abdef")' 'bisect.bisect(foo, "c")' -o after.json --inherit=PYTHONPATH --affinity=2,3 |
bpo-28754-4.diff does 3 things:
bpo-28754-5.diff doesn't touch tests, but still does 2 things. Sadly, I dislike the changes on docstrings because it makes the C docstring and the Python docstring inconsistent. I suggest to copy/paste docstring between C and Python code. By the way, why do we have a docstring different from Doc/library/bisect.rst? Maybe we can simply use the same doc everywhere, but just add reST formatting in Doc/library/bisect.rst? Julien: I suggest to first restrict your change to Argument Clinic, and later write a new patch to update docstrings. I prefer to only do one thing per commit. |
Victor, what changes to the doc strings are you talking about? Adding descriptions of the parameters, maybe? Other than that they look very similar to me. I haven’t looked closely, but the Python doc strings have been updated at least once more than the C doc strings: r54666 and bpo-1602378. So if you are copying anything, it might be better to copy from Py to C. But I would do that separately. The doc strings and reference documentation just seem to have been added independently: ce2d120c3903 (main documentation) vs 67b3ac439f64 (doc strings) |
Martin: "Victor, what changes to the doc strings are you talking about?" See attached check_bisect_doc.py script. Before: Return the index where to insert item x in list a, assuming a is sorted. The return value i is such that all e in a[:i] have e <= x, and all e in Optional args lo (default 0) and hi (default len(a)) bound the === insort_right === insort_right(a, x[, lo[, hi]]) Insert item x in list a, and keep it sorted assuming a is sorted. If x is already in a, insert it to the right of the rightmost x. Optional args lo (default 0) and hi (default len(a)) bound the === bisect_left === Return the index where to insert item x in list a, assuming a is sorted. The return value i is such that all e in a[:i] have e < x, and all e in Optional args lo (default 0) and hi (default len(a)) bound the === insort_left === insort_left(a, x[, lo[, hi]]) Insert item x in list a, and keep it sorted assuming a is sorted. If x is already in a, insert it to the left of the leftmost x. Optional args lo (default 0) and hi (default len(a)) bound the --- After: a The return value i is such that all e in a[:i] have e <= x, and all e in Optional args lo and hi bound the slice of a to be searched. a If x is already in a, insert it to the right of the rightmost x. Optional args lo and hi bound the slice of a to be searched. a The return value i is such that all e in a[:i] have e < x, and all e in Optional args lo and hi bound the slice of a to be searched. a If x is already in a, insert it to the left of the leftmost x. Optional args lo and hi bound the slice of a to be searched. --- The output is different. I suggest to not touch the docstring in the patch upgrading _bisect.c to Argument Clinic to ease the review. |
Taking the first function, bisect_right(), as an example, I see these differences:
Do you want to revert all these differences? |
Quoting @martin
In fact, http://docs.python.org/howto/clinic.html, bullet ".4", told me do remove it:
OK, will remove them.
Can remove it to simplify verification, but as doctests will be different as doc told me to remove the signature, so is it work removing this dot ? I don't think so.
I'll revert it too as I'll remove the documentation in the parameter list. Will upload a patch later. |
Here is the new patch, I ran a diff between "./python -m pydoc _bisect" before and after my patch, here it is: 13,15c13
25,27c23
32c28
37,39c33
47,49c41
So I kept my addition of the missing full stop, and dropped the handwritten signature as Argument Clinic generates it. |
Should we also update howto/clinic, bullet "11.", to be explicit about not adding per-parameter in the same patch? Like a:
? |
Curiously, this patch gives about a 10% to 15% speedup. Any sense of how that improvement arises? $ py3.7 -m timeit -s 'from bisect import bisect' -s 'a=list(range(100))' 'bisect(a, 10)' 229 nsec # Clang baseline 189 nsec # GCC-6 baseline |
Hi Raymond,
That's because Argument Clinic is generating methoddef with METH_FASTCALL: $ grep FASTCALL Modules/clinic/_bisectmodule.c.h
{"bisect_right", (PyCFunction)bisect_bisect_right, METH_FASTCALL, bisect_bisect_right__doc__},
{"insort_right", (PyCFunction)bisect_insort_right, METH_FASTCALL, bisect_insort_right__doc__},
{"bisect_left", (PyCFunction)bisect_bisect_left, METH_FASTCALL, bisect_bisect_left__doc__},
{"insort_left", (PyCFunction)bisect_insort_left, METH_FASTCALL, bisect_insort_left__doc__}, Instead of METH_VARARGS|METH_KEYWORDS:
|
Julien, you can declare the hi parameter as
if you want the signature be matching the documentation. In general, it is better to use converter names rather than format codes.
This is rather a random difference. Try to run the bench several times. With using Victor's perf module the difference is not significant. Maybe using FASTCALL have some positive effect, but it is unlikely noticeable with such complex function. |
Hi Serhiy,
Looks like a good idea, I was aware of its existance but did not took the time to read the doc about it, kind of learning step by stpe. But as you provided the syntax, I tested it, thanks for this! But it fails with a RuntimeError during python -m pydoc _bisect, I'm currently trying to understand why: $ ./python -m pydoc _bisect
Traceback (most recent call last):
File "/home/mdk/cpython-git/Lib/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/mdk/cpython-git/Lib/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/mdk/cpython-git/Lib/pydoc.py", line 2663, in <module>
cli()
File "/home/mdk/cpython-git/Lib/pydoc.py", line 2628, in cli
help.help(arg)
File "/home/mdk/cpython-git/Lib/pydoc.py", line 1908, in help
elif request: doc(request, 'Help on %s:', output=self._output)
File "/home/mdk/cpython-git/Lib/pydoc.py", line 1645, in doc
pager(render_doc(thing, title, forceload))
File "/home/mdk/cpython-git/Lib/pydoc.py", line 1638, in render_doc
return title % desc + '\n\n' + renderer.document(object, name)
File "/home/mdk/cpython-git/Lib/pydoc.py", line 382, in document
if inspect.ismodule(object): return self.docmodule(*args)
File "/home/mdk/cpython-git/Lib/pydoc.py", line 1172, in docmodule
contents.append(self.document(value, key, name))
File "/home/mdk/cpython-git/Lib/pydoc.py", line 384, in document
if inspect.isroutine(object): return self.docroutine(*args)
File "/home/mdk/cpython-git/Lib/pydoc.py", line 1357, in docroutine
signature = inspect.signature(object)
File "/home/mdk/cpython-git/Lib/inspect.py", line 2994, in signature
return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
File "/home/mdk/cpython-git/Lib/inspect.py", line 2744, in from_callable
follow_wrapper_chains=follow_wrapped)
File "/home/mdk/cpython-git/Lib/inspect.py", line 2223, in _signature_from_callable
skip_bound_arg=skip_bound_arg)
File "/home/mdk/cpython-git/Lib/inspect.py", line 2055, in _signature_from_builtin
return _signature_fromstr(cls, func, s, skip_bound_arg)
File "/home/mdk/cpython-git/Lib/inspect.py", line 2003, in _signature_fromstr
p(name, default)
File "/home/mdk/cpython-git/Lib/inspect.py", line 1985, in p
default_node = RewriteSymbolics().visit(default_node)
File "/home/mdk/cpython-git/Lib/ast.py", line 253, in visit
return visitor(node)
File "/home/mdk/cpython-git/Lib/ast.py", line 317, in generic_visit
new_node = self.visit(old_value)
File "/home/mdk/cpython-git/Lib/ast.py", line 253, in visit
return visitor(node)
File "/home/mdk/cpython-git/Lib/inspect.py", line 1977, in visit_Name
return wrap_value(node.id)
File "/home/mdk/cpython-git/Lib/inspect.py", line 1959, in wrap_value
raise RuntimeError()
RuntimeError
If you search for "perf" in the history of this issue you'll see that I already tested with Victor's perf module and it yielded a consistent 18% improvement on bisect.bisect("abcdef", "c") on two different machines. Clearly with more than 5 items, it will start to fade, and with a lot of items, it's not significand, obviously as it's an optimization about the call and not the implementation. So Raymond's test with 100 items may still see a speedup. |
Hi Serhiy,
Won't works, as pydoc will use the inspect module (_signature_fromstr) to get the signature. _signature_fromstr expects a valid python signature, but Should we note that in the clinic documentation:
? |
Argument Clinic adds a signature and a better docstring. So I like it! Speedup for tiny lists is just a nice side effect. |
[SS]
I did run several times. The results were perfectly stable (+/- 1ns). It tells us that METH_FASTCALL is something we want to use as broadly as possible. [JP]
Thanks for running timings. When you do timing in the future, try to select timings that are indicative of actual use. No one uses bisect to search strings for a character -- the typical use case is searching a list of range breakpoints like the example shown in the docs: def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
i = bisect(breakpoints, score)
return grades[i] Hash tables are faster for exact lookup. Bisect is mainly used for searching ranges (i.e. tax brackets or position in a cumulative distribution). [VS]
To you perhaps; however, the entire reason I put this code in many years ago was for performance. The existing pure python code was already very fast. Anyway, this patch is a nice improvement in that regard. [Everyone] What to do about the "hi" argument is a sticky point. This is a long-standing API, so any API changes would likely break some code or lock us into exposing unintuitive implementation details (like hi=-1). The point of view of the existing C code, the pure Python version, and the docs is that the "hi" argument is an optional argument that when omitted will default to "len(a)". It was not intended that a user would explicitly pass in "hi=None" as allowed by the pure python version but not by the C version, nor was it intended to have the user pass in "hi=-1" as allowed by the C version but causes incorrect behavior on the pure Python version. It is unfortunate that the argument clinic has not yet grown support for the concept of "there is special meaning for an omitted argument that cannot be simulated by passing in a particular default value". I believe when this issue has arisen elsewhere, the decision was to skip applying argument clinic at all. See: builtin_getattr, builtin_next, dict_pop, etc. In this case though, we already have a conflict, so it may be possible to resolve the issue through clear parameter descriptions, indicating that "hi=-1" and "hi=None" are implementation details and that it is intended that the user never explicitly pass in either of those values, and that only the guaranteed ways to get the default is to omit the argument entirely or to pass in "hi=len(a)" as a numeric argument. |
I would suggest to use sys.maxsize for default value in Argument Clinic, but right now bisect functions don't support well hi > len(a). This needs adding an equivalent of hi = min(hi, len(a)) in C code. -1 could be explicitly supported for backward compatibility. But it could be deprecated. |
No thanks. I don't want to subtly change the API for this module or even suggest to users that it might be a good idea to set hi > len. Further, we don't want to slow the pure python code for this pointless extension. Really, we're dancing around the issue that argument clinic and signatures need to become more expressive, allowing for omitted arguments without requiring a default value. Waiting for this to be done is far preferable to mangling long-standing APIs to force fit them to into argument clinic. Elsewhere, we've shown self discipline and restraint when applying AC, skipping over cases where it doesn't fit. If we can't find a way to apply AC without changing this API, this issue should be closed and deferred until AC grows the requisite expressive power. |
After some further thought, I am open to changing the C API to accept either hi=None or hi=-1 to reconcile the implementation details without breaking any existing code that relies on either. That would let the argument clinic expose a default value of hi=None. That isn't perfect because it complicates the code and it exposes an implementation detail contrary to what the API originally intended. Short of that, this module ought to be skipped for the argument clinic pending a build-out of signature objects to handle signatures that vary depending on the number of arguments supplied. |
Hi Raymond, About Argument Clinic Just to clarify the situation, Argument Clinic allows for clear method signature, typically: The problem is that it's not a legal signature (it's a syntax error), so pydoc crashes while parsing it (search for RuntimeError in the history of the issue for a full stacktrace). About what should we put in the signature Question is: Should we allow "semantically precise, but frustrating*" signatures to be documented or should we be move the semantic away from the signature? *: I mean, is a new user sees "hi=len(a)" in a signature in a docstring he will immediately flag this syntax as valid and usable later, which lead him to a syntax error and some frustration. Something else to keep in mind is that I expect users be able to call those methods indirectly, like: def whatever_they_need(self, lo=0, hi=?):
…
bisect.bisect(…, …, lo, hi)
… Currently, to follow strictly the documentation they have to write: def whatever_they_need(self, lo=0, hi=None):
…
if hi is not None:
bisect.bisect(…, …, lo, hi)
else:
bisect.bisect(…, …, lo)
… Which looks bad to me. So from my point of view we *have to* tell them what is the real default value for this parameter so they can use it in their implementations. Should we generalize and say that depending on the number of arguments is a bad practice? I think so. |
Sorry Julien, you don't to decide that long-standing APIs that have worked just fine for users are now a bad practice. Some of these APIs (range, dict.pop, type) we designed by Guido and work fine in practice. The expression, "don't have the tail wag the dog" comes to mind (where "tail" is the existing successful language and "dog" is the argument clinic). Besides, API change is very disruptive to users (especially those trying to migrate to Python 3). For the case of bisect, I've given a way forward. It is okay to make the documented default value for "hi" to be "None" while internally still allowing -1 so that we don't break code that happens to depend on it. In the clinic, use "hi: 'O' = None". Then in the dispatch code, handle both the None case and the numeric case (see islice_new() in itertoolsmodule.c for some ideas on how to do this. What isn't okay is to document and expose "hi=-1" because that is a confusing and error-prone default value that is just an awkward hack to express "hi=len(a)". |
Sorry, I didn't follow the discussion, but why not using "/" magic separator in Argument Clinic to mark arguments as optional but don't document their default value? |
Yet one option is to introduce the constant UNBOUND = -1 in the bisect module and use it as a default value. |
'/' marks positional-only arguments, not optional arguments. |
Ah right, _bisect.bisect_right() support keywords. I expected that it |
Regarding the problem with the default value, can we use “unspecified”, as documented at <https://docs.python.org/dev/howto/clinic.html#writing-a-custom-converter\>, or is that an old relic that no longer exists (hinted at the end of bpo-20293)? Failing that, I prefer Raymond’s earlier suggestion: say that any support for −1 or None are implementation details, and not part of the API. The same statement could be added to the native Python doc strings as well. BTW I am not opposed to the proposal for proper hi=None support either, although it does complicate the change. IMO allowing hi=None could be done as a second step. |
I tried myself at Argument Clinic custom converters to create the "optional ssize_t converter" and it works, so as advised by Raymond, pydoc now shows None, and the C implementation now allows for both default values None and -1, as the custom converter returns -1 for none. |
Julien, the syntax converters look pretty clever. Do we need AC |
Stefan Krah added the comment:
Please see previous comments for advantages of AC (signature object, |
Signature and docstring can be done manually with very little effort. |
If adding proper support for hi=None, maybe lo=None should also be supported. Also, I would think the main Doc/library/bisect.rst documentation needs updating, and a test and What’s New entry added. |
That would be gratuitous. Lo already has a reasonable, useful, and self-explanatory value. This would add more complexity to the signature while reducing clarity. I really don't want to see calls like bisect(arr, x, None). |
Fair enough, I don’t really mind if it is (lo=0, hi=None). I think I have only used bisect with both defaults anyway. |
Hi Martin, Historically (lo=0, hi=None) was the implementation (in Python), until the C implementation, which used (lo=0, hi=-1). As my implementation allows for -1 and None, I don't think I'm breaking something. |
I was able to simplify my patch a bit, I think I should also add a test to ensure we keep the hi=-1 and hi=None compatibility in the future. |
Added a test to ensure compatibility of both hi=None (introduced in original Python version) and hi=-1 (Introduced by the C version). Modified Python version to be compatible with the C-introduced hi=-1, so that the new test pass. |
Maybe wait for bpo-28933. |
FWIW, pypy uses defaults of lo=0, hi=None. |
Hi Larry, In any cases it looks like supporting hi=-1 and hi=None is mandatory: hi=None is the current implementation in the bisect (Python) module.
hi=-1 is the current implementation in the _bisect (C) module. Both are currently living together, the C version takes over the Python version when available, but the Python version is older, so both -1 and None has been the "current implementation" during some time. Raymond legitimately fear that someone, somewhere, used -1 in some code. We should not break it (even if it had never been documented). So let's just accept both for the moment, and if deprecation of one or another should be discussed, let's do it later, in another issue. I'm _just_ trying to make AC move forward here, without breaking anything. |
Okay, but do it with a converter function, not by changing the default behavior for Py_ssize_t. The vast majority of uses of Py_ssize_t should not accept None. |
Why does this need a converter function? I there is any reason this can't use "O" with "hi" defaulting to None (matching the pure python API) and letting the C function handle both -1 and None in the implementation rather than in AC? |
Hi Raymond, I don't like having the converters in the C implementation too, that's why I'm working on bpo-28933 to clean this.
It works, yes. But I prefer to clearly split responsibilities: AC being responsible of adapting argument from PyObjects to C types, and C functions being responsible of ... doing their job. If the idea in bpo-28933 is accepted, we'll be able to declare hi as simply as:
meaning "C function will get a Py_ssize_t, default value for C code is -1, None is documented, and None can be passed to get the C default value", that's this last point "None can be passed to get the C default value" that I introduce in bpo-28933. With this syntax, both C converters and the python hi_parameter_converter can be dropped. |
We don't *have* to use a converter function, no. But consider: someone somewhere will have to convert a PyObject * into a Py_ssize_t, also accepting None. Doing it in a converter function means we can easily share the code with bisect_right, which should presumably behave the same way. It also keeps all the argument parsing logic in the Argument Clinic domain instead of having some in AC and some in the implementation. Is there some drawback to using a converter function? |
As the pattern of this converter is not widely used, I'll let it in the code of _bisect for the moment, see: http://bugs.python.org/issue28933. |
Now |
Wouldn't it be easily to let the OP apply AC to some other module. Right now, it isn't a very good fit and was this a tough one to start with. |
After more thought, I'm going to close this one because the Argument Clinic and signature objects aren't yet a good fit for this API. The bisect module isn't broken (its API has been successfully living in the wild for a very long time). I do welcome Argument Clinic patches where there is a good fit but don't think this is the place. Each of the proposed patches makes the module worst in one way or another. |
FWIW, the itertools module has many functions that can have the ArgumentClinic applied without any fundamental problems (pretty much all of the functions will work except for repeat(), accumulate(), and islice()). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: