classification
Title: Argument Clinic should use a non-error-prone syntax to mark text signatures
Type: behavior Stage: resolved
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: larry Nosy List: barry, brett.cannon, gennad, gvanrossum, jkloth, larry, ncoghlan, python-dev, scoder, serhiy.storchaka, skrah, yselivanov, zach.ware
Priority: normal Keywords: patch

Created on 2014-01-21 10:05 by larry, last changed 2014-02-02 07:22 by scoder. This issue is now closed.

Files
File name Uploaded Description Edit
larry.sig=.marker.for.signatures.diff.1.txt larry, 2014-01-26 12:36 review
larry.sig=.marker.for.signatures.diff.2.diff larry, 2014-01-28 08:28 review
larry.sig=.marker.for.signatures.diff.3.diff larry, 2014-01-28 09:27 review
Messages (24)
msg208634 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-21 10:05
Sorry this is so long--but I wanted to make my point.  Here's the tl;dr summary.

The problem: The syntax used for Argument-Clinic-generated text
signatures for builtins means CPython mistakenly identifies
hand-written, unparsable pseudo-signatures as legitimate
signatures.  This causes real, non-hypothetical problems.

I think we should change the syntax to something people would
never write by accident.  Here are some suggestions:

"*("
"*clinic*("
"\01 clinic("

--

A quick recap on how signature information for builtins works.

The builtin's docstring contains the signature, encoded as text using
a special syntax on the first line.  CPython callables always have
getters for their __doc__ member; the doc getter function examines
the first line, and if it detects a signature it skips past it and
returns the rest.  CPython's new getter on callables __text_signature__
also look at the internal docstring.  If it detects a signature it
returns it, otherwise it returns None.

inspect.signature then retrieves __text_signature__, and if ast.parse()
parses it, it populates the appropriate Signature and returns that.
And then pydoc uses the Signature object to print the first line of
help().


In #19674 there was some discussion on what this syntax should be.
Guido suggested they look like this:

   functionname(args, etc)\n    

He felt it was a good choice, and pointed out that Sphinx autodoc
uses this syntax.  (Not because using this syntax would help
Sphinx--it won't.  Just as a "here's how someone else solved
the problem" data point.)


__doc__ and __text_signature_ aren't very smart about detecting
signatures.  Here's their test in pseudo-code:
    if the first N bytes match the name of the function,
    and the N+1th byte is a left parenthesis,
    then it's assumed to be a valid signature.

--

First, consider: this signature syntax is the convention docstrings
already use.  Nearly every builtin callable in Python has a hand-written
docstring that starts with "functionname(".

Great!, you might think, we get signatures for free, even on functions
that haven't been converted to Argument Clinic!

The problem is, many of these pseudo-signatures aren't proper Python.
Consider the first line of the docstring for os.lstat():

"lstat(path, *, dir_fd=None) -> stat result\n"

This line passes the "is it a text signature test?", so __doc__
skips past it and __text_signature__ returns it.  But it isn't
valid actually valid.  ast.parse() rejects it, so inspect.signature
returns nothing.  pydoc doesn't get a valid signature, so it prints
"lstat(...)", and the user is deprived of the helpful line
handwritten by lstat's author.

That's bad enough.  Now consider the first *two* lines of the
docstring for builtin open():

"open(file, mode='r', buffering=-1, encoding=None,\n"
"     errors=None, newline=None, closefd=True, opener=None) -> file object\n"

__doc__ clips the first line but retains the second.  pydoc prints
"open(...)", followed by the second line!  Now we have the problem
reported in #20075: "help(open) eats first line".


Both of these problems go away if I add one more check to the
signature-detecting code: does the line end with ')'?  But that's
only a band-aid on the problem.  Consider socket.accept's
docstring:

"_accept() -> (integer, address info)\n"

Okay, so __doc__ and __text_signature__ could count parentheses
and require them to balance.  But then they'd have to handle strings
that contain parentheses, which means they'd also have to understand
string quoting.

And there would *still* be handwritten docstrings that would pass
that test but wouldn't parse properly.  Consider bisect.insort_right:

"insort_right(a, x[, lo[, hi]])\n"

We could only be *certain* if we gave up on having two parsers.
Write the signature-recognizer code only once, in C, then call that
in __doc__ and __text_signature__ and inspect.signature().  But that
seems unreasonable.

Okay, so we could attack the problem from the other end.  Clean
up all the docstrings in CPython, either by converting to Argument
Clinic or just fixing them by hand.  But that means that
*third-party modules* will still have the mysterious problem.

Therefore I strongly suggest we switch to a syntax that nobody will
ever use by accident.

Have I convinced you?
msg208656 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-01-21 15:40
You have convinced me.
msg208683 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-01-21 18:33
Larry Hastings <report@bugs.python.org> wrote:
> I think we should change the syntax to something people would
> never write by accident.  Here are some suggestions:
> 
> "*("
> "*clinic*("
> "\01 clinic("

I like the original "def (...)\n" approach from #19674.  If that is not
possible for some reason, "*(" is fine, too.
msg208685 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-21 18:57
What if the __text_signature__ and __doc__ getter will call ast.parse() (actually compile()) on signature candidate? If it fails, then builtin has no signature, the __text_signature__ getter returns '', and the __doc__ getter returns all original docstring.
msg208708 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-21 21:49
Serhiy: I'm going to add PEP 457 features to the text signature, because
without them the inspect.Signature objects will be wrong.  So __doc__ and __text_signature__ would have to remove those first before handing the string to ast.parse.

I wasn't seriously proposing that __doc__ and __text_signature__ parse the string. That would obviously be a waste of CPU.  I was simply taking the argument to its logical extreme.
msg208788 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-22 12:08
Right, at the very least we want to handle positional only arguments (since PEP 362 also handles those).

My one concern with using "def " as the prefix is the fact it's not actually Python syntax.

How do you feel about using "sig: " as the prefix? That would still read sensibly even if you somehow got hold of the unaltered C level docstring rather than going through the properties.
msg208789 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-01-22 12:13
+1 for "sig: ".
msg208790 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-22 12:14
The fact that "def " isn't Python syntax is, if anything, a mild point in its favor.  I don't want anyone hitting on this by accident.  "sig:" isn't Python syntax either, and yet you yourself propose it ;-)

How about "sig="?
msg208796 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-22 12:57
That wasn't quite what I meant. "def (a, b, c)" *looks* like Python syntax (aside from the missing function name), but "def (a, b, c, /)" does not. So I consider "def " a misleading prefix.

By contrast, neither of these looks like it is trying to be a valid function header, while still hinting strongly that it is signature related:

"sig: (a, b, c)"
"sig: (a, b, c, /)"

I would also be fine with "sig=" (since humans shouldn't be reading this regardless):

"sig=(a, b, c)"
"sig=(a, b, c, /)"
msg209301 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-26 12:36
Here's a first cut at a patch.  All signatures now start with "sig=(".

I also added a special marker: if the first parameter starts with "$", we know for certain it's a "self" (or "module" or "type") parameter.  This means we can lose the heuristics for "do we have a self parameter?", making inspect.Signature a little more bullet-proof.  "$" was chosen as it isn't a legal token in Python.
msg209308 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-01-26 13:23
I like the "sig=" and the "$".  There seems to be a small glitch in __rdivmod__:

help(int.__rdivmod__)

__rdivmod__(<self>, value)
    sig=($self, value)
    Returns divmod(value, self).

The sig line is shown (and the preferred form is the imperative "Return divmod").
msg209311 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-26 13:44
Stefan actually picked up on an existing bug there, that this patch just changes the spelling of:

>>> int.__rdivmod__.__doc__
'__rdivmod__(self, value)\nReturns divmod(value, self).'
>>> int.__rdivmod__.__text_signature__
'(self, value)'

When reviewing Larry's typeobject.c patch, both Guido and I missed the fact that some of the "slot" macros have implicit lines in their docstrings if the signature is common across all instances of that slot:

#define UNSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
    ETSLOT(NAME, as_number.SLOT, FUNCTION, WRAPPER, \
           NAME "(self)\n" DOC)
#define IBSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
    ETSLOT(NAME, as_number.SLOT, FUNCTION, WRAPPER, \
           NAME "(self, value)\nReturns self" DOC "value.")
#define BINSLOT(NAME, SLOT, FUNCTION, DOC) \
    ETSLOT(NAME, as_number.SLOT, FUNCTION, wrap_binaryfunc_l, \
           NAME "(self, value)\nReturns self" DOC "value.")
#define RBINSLOT(NAME, SLOT, FUNCTION, DOC) \
    ETSLOT(NAME, as_number.SLOT, FUNCTION, wrap_binaryfunc_r, \
           NAME "(self, value)\nReturns value" DOC "self.")
#define BINSLOTNOTINFIX(NAME, SLOT, FUNCTION, DOC) \
    ETSLOT(NAME, as_number.SLOT, FUNCTION, wrap_binaryfunc_l, \
           NAME "(self, value)\n" DOC)
#define RBINSLOTNOTINFIX(NAME, SLOT, FUNCTION, DOC) \
    ETSLOT(NAME, as_number.SLOT, FUNCTION, wrap_binaryfunc_r, \
           NAME "(self, value)\n" DOC)

For those, we need to change the macro and then remove the redundant info from the individual slot definitions.
msg209312 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-26 13:47
Aside from that glitch, the new scheme looks good to me. I also like the fact it allows the text signature to be included in the docstring normally if inspect.Signature doesn't support it, so help(f) can still be generated automatically.
msg209313 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-26 13:55
Nick: you lost me.  Change the macro how?  What is the redundant info?  I already changed those macros so they generate signatures, both currently in trunk and in my patch.
msg209314 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-26 13:57
Stefan: I made the change.  I left "Implements <something>" alone as "Implement <something>" sounded wrong, but all the rest dropped their s's.
msg209319 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-01-26 14:15
> What is the redundant info?

{"__rdivmod__", __builtin_offsetof (PyHeapTypeObject, as_number.nb_divmod), (void *)(slot_nb_divmod), wrap_binaryfunc_r, "sig=($self, value)\n" "sig=($self, value)\nReturns divmod(value, self)."}

There are two "sig" instances. What Nick said is that the situation was pretty
much the same before the "sig" patch and should be fixed regardless:

{"__rdivmod__", __builtin_offsetof (PyHeapTypeObject, as_number.nb_divmod), (void *)(slot_nb_divmod), wrap_binaryfunc_r, "__rdivmod__" "(self, value)\n" "__rdivmod__(self, value)\nReturns divmod(value, self)."}
msg209324 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-26 14:34
Right, the macros I quoted all include the signature automatically
(because it's consistent) and *then* the DOC that is passed in. So
when you use them, the signature should be left out of the slot
definition itself. Currently, we're defining it in both places, hence
the duplication.
msg209509 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-28 08:28
Attached is a second patch.

* Now includes input and output checksums.  Checksums are now
  truncated to 16 characters each, otherwise the line is >80 columns.

* Fixes the doubled-up signature lines for type object slot default
  signatures.  I ran "gcc -E" and wrote a quick script to print out
  all lines with doubled-up signatures.  There were only two: divmod
  and rdivmod.

* Pretty sure this was in the first patch, but just thought I'd mention
  it: for functions using optional groups, we can't generate a legal
  signature.  So Clinic kicks out the name of the function instead
  of "sig=", meaning that it puts back the docstring first line
  for human consumption!  I am so clever, tee hee.
msg209518 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-28 09:27
I'm surprised it made a review link.  It didn't apply cleanly for me here.

While merging I noticed that the imperative declension fix had snuck out of the diff somehow.  So I redid that.

Attached is an updated patch.


Also I should mention: clinic.py currently accepts both the old and new comment format.  I'll leave support for the old one in until just before the last release candidate.
msg209541 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-28 12:47
Looks good to me :)

I also like the fact it simplifies the internal APIs by making it really trivial to detect the presence of a clinic signature from C.
msg209542 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-28 13:00
New changeset d6311829da15 by Larry Hastings in branch 'default':
Issue #20326: Argument Clinic now uses a simple, unique signature to
http://hg.python.org/cpython/rev/d6311829da15
msg209543 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-28 13:01
Yeah.  I did a pretty terrible job of articulating why the "<fn_name>(" signature was a bad idea in the first place ;-)
msg209709 - (view) Author: Stefan Behnel (scoder) * Date: 2014-01-30 08:05
I stumble over this because I had already adapted our doctests in Cython to the changed Py3.4 behaviour, so they started failing now because the automatic signature extraction was essentially reverted in CPython.

I had started to consider it a feature that CPython 3.4 was finally smart enough to pick up signatures from docstrings, at least for documentation purposes, much the same way that tools like epydoc or Sphinx do it. Cython has a feature to embed signatures for that reason, and so far, they happily ended up in "__text_signature__" in Py3.4.

I understand the problem that not all of these signatures are valid Python signatures. What Cython currently generates certainly isn't.

The new "sig=" is not supported by any of the existing documentation tools. Having Cython follow here would mean that they would no longer be able to read the signatures, and it's clearly more important for the time being to keep *them* working nicely. This change actually facilitates that, because it leaves the embedded signatures untouched, so that these tools can normally pick them up again. So I agree that the reverted behaviour is in fact better than what Py3.4 previously did.

FWIW, I think the best way to go forward would be to try to find a way to map Cython's C signatures directly to a reasonable version of a "__signature__" object. This hasn't appeared entirely trivial so far, but my guess is that the recent requirements on and improvements to the argument clinic should also have made this mapping a little less hard, because there are at least a bit of an infrastructure and some precedents around. Still, Cython's coverage of C/C++ types (also in signatures) is hugely broader than what ac would ever want to support, so we'll have to see what stumbling blocks remain on that road.
msg209955 - (view) Author: Stefan Behnel (scoder) * Date: 2014-02-02 07:22
I tried this in Cython and ISTM that the C level parser is a bit too forgiving:

    def sig(a, b):
        """sig=(a*b)"""
        return a * b
History
Date User Action Args
2014-02-02 07:22:10scodersetmessages: + msg209955
2014-01-30 08:05:44scodersetnosy: + scoder
messages: + msg209709
2014-01-28 13:01:14larrysetstatus: open -> closed
resolution: fixed
stage: needs patch -> resolved
2014-01-28 13:01:02larrysetmessages: + msg209543
2014-01-28 13:00:28python-devsetnosy: + python-dev
messages: + msg209542
2014-01-28 12:47:05ncoghlansetmessages: + msg209541
2014-01-28 09:27:10larrysetfiles: + larry.sig=.marker.for.signatures.diff.3.diff

messages: + msg209518
2014-01-28 08:31:56larrylinkissue20264 dependencies
2014-01-28 08:28:22larrysetfiles: + larry.sig=.marker.for.signatures.diff.2.diff
keywords: + patch
messages: + msg209509
2014-01-26 14:34:54ncoghlansetmessages: + msg209324
2014-01-26 14:15:48skrahsetmessages: + msg209319
2014-01-26 13:57:44larrysetmessages: + msg209314
2014-01-26 13:55:21larrysetmessages: + msg209313
2014-01-26 13:47:14ncoghlansetmessages: + msg209312
2014-01-26 13:44:17ncoghlansetmessages: + msg209311
2014-01-26 13:23:54skrahsetmessages: + msg209308
2014-01-26 12:37:08larrysetfiles: + larry.sig=.marker.for.signatures.diff.1.txt

messages: + msg209301
2014-01-25 08:09:07ncoghlanlinkissue20075 dependencies
2014-01-23 18:18:33yselivanovsetnosy: + yselivanov
2014-01-22 12:57:44ncoghlansetmessages: + msg208796
2014-01-22 12:14:15larrysetmessages: + msg208790
2014-01-22 12:13:00skrahsetmessages: + msg208789
2014-01-22 12:08:28ncoghlansetmessages: + msg208788
2014-01-21 21:49:45larrysetmessages: + msg208708
2014-01-21 18:57:34serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg208685
2014-01-21 18:33:27skrahsetmessages: + msg208683
2014-01-21 15:40:42gvanrossumsetmessages: + msg208656
2014-01-21 14:46:30jklothsetnosy: + jkloth
2014-01-21 10:05:49larrycreate