classification
Title: Preserve original representation for integers / floats in docstrings
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, BTaskaya, Scott Stevens, barry, carsten.klein@axn-software.de, eric.araujo, ezio.melotti, georg.brandl, larry, pablogsal, r.david.murray, rhettinger, serhiy.storchaka, terry.reedy
Priority: normal
Keywords:

Created on 2012-12-28 13:04 by larry, last changed 2021-08-20 10:10 by mark.dickinson.

Messages (35)
msg178383 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2012-12-28 13:04
The line declaring the function dbm.open looks like this:
    def open(file, flag='r', mode=0o666):

The docstring for dbm.open looks like this:
    open(file, flag='r', mode=438)

Obviously 438==0o666.  But the author used the octal representation because it's more readable.  Unfortunately Python throws that enhanced readability away when it round-trips the rvalue from a string into an integer and back into a string again for the docstring.

It might be an improvement if Python preserved the original source code's representation for integer (and perhaps float) default arguments for parameters.  I haven't looked at the code that does the parsing / builds the docstring, but I suspect we could hang the original representation on the AST node and retrieve it when building the docstring.

The only problem I can foresee: what about code that uses local variables, or computation including perhaps function calls, to calculate default values?  On the one hand, the local variable or the function call may be inscrutable--on the other, perhaps the magic integer value it replaced was no better.  Or we could have a heuristic, like if the original representation contains internal spaces or parentheses we use str(rvalue), otherwise we use the original representation.
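
For readers who want to reproduce the round-trip described above, here is a minimal sketch. The function name open_db is a stand-in chosen for illustration (it is not dbm.open); it only mirrors the signature quoted in this message.

```python
import inspect

# Stand-in for dbm.open with the same parameter list (hypothetical name).
def open_db(file, flag='r', mode=0o666):
    pass

# The compiler stores the default as a plain int; the octal spelling is gone.
print(open_db.__defaults__)        # ('r', 438)

# Anything built from introspection therefore shows the decimal form.
print(inspect.signature(open_db))  # (file, flag='r', mode=438)
```

The octal literal exists only in the source text; by the time the function object is created, the default is indistinguishable from one written as 438.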
msg178384 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2012-12-28 13:08
(I was also considering proposing using annotations to tell the parser we want the original representation in the docstring, but I suspect that's a bad idea.  That would instantly restrict the untamed frontier that is annotations.)
msg178385 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-28 14:53
It's an interesting idea.  This sounds like the wrong solution to me, though:  it's significant extra machinery to produce a solution that only fixes a small handful of cases;  IOW, the benefit / cost ratio seems too small to make this worth it.  E.g., apart from the function calls that you mention, what about expressions?  "-0x8000" isn't a numeric literal, so the 'original representation' information attached to "0x8000" will have been lost.

I'm also sceptical that this can be done as simply as you describe:  isn't the AST no longer available at the time that the docstring is built?

Perhaps what we need instead is a general mechanism to override the generated signature line?
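
The point about "-0x8000" can be checked with the ast module. This is only an illustrative sketch; the exact text printed by ast.dump varies between Python versions, so no output is shown for it.

```python
import ast

# Parse the expression without running the compiler's constant folding.
tree = ast.parse("-0x8000", mode="eval")
node = tree.body

# The top-level node is a unary minus applied to a constant, not a literal.
print(type(node).__name__)     # UnaryOp
print(type(node.op).__name__)  # USub
print(node.operand.value)      # 32768

# Inspect the full structure (format differs across versions).
print(ast.dump(node))
```

Any "original representation" hung on the Constant node would describe 0x8000 only; the sign lives in a separate node.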
msg178392 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-12-28 16:49
I don't think you mean 'docstring'.  A docstring is something a human writes in the source code.  I presume you are actually talking about introspection of the signature here.  Beyond that, I agree with Mark's comments.
msg178397 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-28 17:09
BTW, in case it saves anyone else some time, the current machinery is in Lib/pydoc.py, in the `TextDoc.docroutine` method.  It uses inspect.getfullargspec to retrieve the information to format, though I guess using inspect.Signature would be the modern way to do this.
msg178398 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-28 17:10
Bah.  s/inspect.Signature/inspect.signature/
msg178457 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-12-29 01:26
David is correct

>>> dbm.open.__doc__
"Open or create database at path given by *file*.\n\n    Optional argument *flag* can be 'r' (default) for read-only access, 'w'\n    for read-write access of an existing database, 'c' for read-write access\n    to a new or existing database, and 'n' for read-write access to a new\n    database.\n\n    Note: 'r' and 'w' fail if the database doesn't exist; 'c' creates it\n    only if it doesn't exist; and 'n' always creates a new database.\n    "
>>> help(dbm.open)
Help on function open in module dbm:

open(file, flag='r', mode=438)
    Open or create database at path given by *file*.
...    

IDLE tooltip (still using inspect.getfullargspec) also shows
open(file, flag='r', mode=438)
The int comes from dbm.open.__defaults__[1]
438
msg178470 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2012-12-29 03:35
Okay, counter-proposal time.  We add a new field to the Parameter object, the preferred string representation of the default.  If the parameter has a default, it is always a string, by default repr(parameter_default_value); if the parameter has no default then it is None.

You can then override the default:

    @inspect.use_original_representation('mode')
    def open(file, flag='r', mode=0o666):

And if os.open were supplied in os.py:

    @inspect.override_string_representation('mode',
        'os.O_CREAT | os.O_RDWR')
    def open(file, flags, mode=0o777, *, dir_fd=None):

(p.s. I know the mode argument there is wrong, it has a million things in it and this is just to get the idea across.)

Then the docstring generator, IDLE, etc. would be told "Please use this new field of the Parameter object when displaying the default to the user."
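
A rough sketch of how such a decorator could behave, written outside the inspect module. Everything here is an assumption made for illustration: the decorator is a standalone function rather than a member of inspect, the __default_reprs__ attribute name is invented, and format_signature is a hypothetical consumer standing in for pydoc or IDLE.

```python
import inspect

def override_string_representation(name, text):
    """Hypothetical decorator: record display text for one parameter's default."""
    def decorate(func):
        # Copy so that stacked decorators accumulate instead of sharing state.
        overrides = dict(getattr(func, "__default_reprs__", {}))
        overrides[name] = text
        func.__default_reprs__ = overrides  # invented attribute name
        return func
    return decorate

def format_signature(func):
    """Hypothetical consumer: prefer the recorded text over repr(default).

    Simplified: it does not emit the '/' or '*' separators that
    inspect.Signature.__str__ adds for positional-only and keyword-only
    parameters.
    """
    overrides = getattr(func, "__default_reprs__", {})
    parts = []
    for param in inspect.signature(func).parameters.values():
        if param.name in overrides and param.default is not param.empty:
            parts.append(f"{param.name}={overrides[param.name]}")
        else:
            parts.append(str(param))
    return f"{func.__name__}({', '.join(parts)})"

@override_string_representation("mode", "0o666")
def open_db(file, flag="r", mode=0o666):
    pass

print(format_signature(open_db))  # open_db(file, flag='r', mode=0o666)
```

Storing the override on the function keeps the real default value untouched, so only tools that opt in to reading the extra attribute see the alternative spelling.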
msg178474 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-12-29 05:35
If you wish to pursue this, I suggest starting with 'the simplest thing that works' for the test cases at hand. They all involve 'mode'; you have not presented any other cases, and I cannot think of any. So somewhere in the signature generation code:
  if function name in <set> and parameter_name == 'mode':
    replace_decimal_with_octal(parameter_default)

If more generality is really needed, pick a new reserved attribute for functions and set it at the time of definition.
    def open(file, flag='r', mode=0o666):
    open.__param_rep__ = {'mode': 'octal'} #or whatever is chosen

I suppose the advantage of adding the syntactic sugar of a decorator, after getting the above to work, is that the doc could be hidden away in the inspect module, where it would be easily ignored.

Still, this does seem like a lot of 'noise' for a small bit of extra 'signal' increment.
msg178487 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2012-12-29 12:59
The problem with this is that at the time that pydoc gets the information via inspect, the numbers have already been parsed as long or double and the original notation is no longer available.

This is due to the fact that during build of the AST node for the NUMBER type, the value will already be deserialized into its machine representation, which is either long or double.

The only way to preserve that information would be to extend the NUM_type with an additional 's' field which then would preserve its original notation and which can be retrieved from the AST.

pydoc, however, would still fail as it does not use the AST. In order to restore the original information, pydoc must then source the original file or source of the function or class method and parse it using the AST.

A much simpler approach would be to simply get the function or method source and extract its formal parameter list using for example a regular expression.

However, preserving the original notation in the runtime is not required and shouldn't be done.
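
As an illustration of recovering the notation from source text, the sketch below uses the ast module and ast.get_source_segment (available since Python 3.8) instead of a regular expression. The source string is a made-up example, and only positional defaults are handled.

```python
import ast

# Made-up source text standing in for a module read from disk.
source = "def open(file, flag='r', mode=0o666):\n    pass\n"

func = ast.parse(source).body[0]
args = func.args

# Defaults align with the last len(defaults) positional parameters.
positional = args.posonlyargs + args.args
names = [a.arg for a in positional[len(positional) - len(args.defaults):]]

# Slice the original text of each default expression out of the source.
original = {
    name: ast.get_source_segment(source, node)
    for name, node in zip(names, args.defaults)
}
print(original)  # {'flag': "'r'", 'mode': '0o666'}
```

This works for any default expression, including computed ones, but it requires the source to be available, which is not the case for builtins or for modules shipped without their .py files.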
msg178488 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2012-12-29 13:03
Here are some links into the sources:

Python/ast.c, ast_for_atom(), line 1872ff.
Python/ast.c, parsenumber(), line 3632ff.
msg178490 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2012-12-29 13:12
However, hinting inspect to use a different format when serializing the default values for existing keyword parameters of methods or functions seems to be a good idea, and +1 from me for that.

Personally, I'd rather have the decorator based solution than having to manually add additional fields to a given method or function.
msg178510 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-29 16:38
> And if os.open were supplied in os.py:
>
>     @inspect.override_string_representation('mode',
>         'os.O_CREAT | os.O_RDWR')
>     def open(file, flags, mode=0o777, *, dir_fd=None):

Another use case is a sentinel default. "foo(arg={})" looks better than "foo(arg=<object object>)" for functions which use a sentinel idiom:

    _sentinel = object()
    def foo(arg=_sentinel):
        if arg is _sentinel:
            arg = {}
        ...

Sometimes full signature overwriting is needed (see for example Python implementation of operator.methodcaller() in issue16694).
msg178519 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-12-29 18:22
A simple, minimally invasive solution would be to allow a signature for documentation purposes as the first line of the docstring.

pydoc could recognize this (if docstring.startswith(func.__name__ + '(') or something like that), and display the given signature instead of the introspected one.
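
A minimal sketch of that check, assuming a helper named displayed_signature (a hypothetical name, not part of pydoc) and two toy functions:

```python
import inspect

def displayed_signature(func):
    """Return the docstring's first line if it looks like a signature,
    otherwise fall back to the introspected one."""
    doc = inspect.getdoc(func) or ""
    first_line = doc.split("\n", 1)[0]
    if first_line.startswith(func.__name__ + "("):
        return first_line
    return func.__name__ + str(inspect.signature(func))

def open_db(file, flag="r", mode=0o666):
    """open_db(file, flag='r', mode=0o666)

    Open or create a database.
    """

def plain(x, mode=0o666):
    """No signature line here."""

print(displayed_signature(open_db))  # open_db(file, flag='r', mode=0o666)
print(displayed_signature(plain))    # plain(x, mode=438)
```

The hand-written line can say anything, so it also covers sentinel defaults and signatures that are not valid Python syntax; the cost is that it can drift out of sync with the real parameter list.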
msg178521 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-29 18:41
> pydoc could recognize this (if docstring.startswith(func.__name__ + '(') or something like that), and display the given signature instead of the introspected one.

Looks good to me.
msg178542 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-12-29 22:43
It seems to me that the real issue is not to preserve the original representation. What if the original author specified mode as 438 or calculated it as 0o600|0o60|0o6 ? He might and we should still like to see it as 0o666.

So the real issue is to specify the representation on retrieval. We already have a mechanism for that: subclasses that override __str__ and __repr__! Moreover, that mechanism works for all accesses that do not use an explicit format, not just those functions that are re-written to use some redundant new machinery. It also allows display representations that would *not* be legal input syntax and thus could *not* be the original representation. Two examples:

class octint(int):
    'int that displays as octal'
    def __str__(self):
        return oct(self)
    __repr__ = __str__

mode = octint(0o644)
print(mode)

class flags4(int):
    'int that displays as 4 binary flags'
    def __str__(self):
        return '|{:04b}|'.format(self)
    __repr__ = __str__

a = flags4(8)
b = flags4(3)
print(a, b, flags4(a|b))

def f(mode=octint(0o666), flags = flags4(0b1011)): pass

print(f.__defaults__)
import inspect
print(inspect.formatargspec(*inspect.getfullargspec(f)))

# prints
0o644
|1000| |0011| |1011|
(0o666, |1011|)
(mode=0o666, flags=|1011|)

So I think this issue should be changed to 'Add octint int subclass to stdlib and use it for default file modes'. The inspect module could be a place to put it.
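
Since inspect.formatargspec was removed in Python 3.11, here is a sketch of the same check using inspect.signature, which formats defaults with repr(). The octint class is the one from this message, trimmed to the essentials.

```python
import inspect

class octint(int):
    'int that displays as octal'
    def __repr__(self):
        return oct(self)
    __str__ = __repr__

def f(mode=octint(0o666)):
    pass

# Signature formatting calls repr() on each default, so the subclass wins.
sig_text = str(inspect.signature(f))
print(sig_text)  # (mode=0o666)

# The value itself still behaves as an ordinary int.
print(f.__defaults__[0] == 438)  # True
```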
msg178647 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-12-31 07:30
20+ years of Python success suggest this isn't a problem that needs solving.  AFAICT, other languages haven't found a need to preserve number representation either.  Likewise, Linux itself doesn't preserve the original form of a chmod call.  I recommend closing this one.
msg178663 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-12-31 11:02
> 20+ years of Python success suggest this isn't a problem that needs solving.

That reasoning could be applied to almost all open tracker issues.

> Likewise, Linux itself doesn't preserve the original form of a chmod call.

Where would/could it do so?  C has no introspection facility equivalent to pydoc, which is discussed here.  In the Linux manual pages, octal literals are used.  Introspective tools like "strace" also display octal literals when tracing *chmod calls.

That said, I agree that this is not an issue worth solving just because of octal literals.  But there are more cases in which the actual signature doesn't represent the best way to document the function API, and if a simple solution can be found it would not be different from fixing a minor annoyance elsewhere in Python.
msg178843 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-01-02 19:59
Either repeating the default value in the text with the desired form (and leave it "wrong" in the signature) or what Georg suggested in msg178519 sound good to me.  I don't think anything more than that is necessary to solve this issue.
msg178856 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-01-02 22:13
A subclass with a custom representation, as I suggested above, is even simpler and involves no change to inspect or docstring conventions. I otherwise agree with closing this.
msg178931 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-03 10:13
> A subclass with a custom representation, as I suggested above, is even
> simpler and involves no change to inspect or docstring conventions.

Agree, but this is a particular and cumbersome solution.

I have opened a new issue, issue16842, for docstring conventions.
msg181497 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-02-06 04:48
Georg: what other functions do you know of where (as you suggest) the signature could be improved?
msg181504 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-02-06 09:04
For example, any function where an argument has a "sentinel" object as the default value, such as socket.create_connection().
msg182754 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-02-23 16:02
Note: Nick Coghlan's idea of having Named Value support in Python just came up again on python-dev, and this would be a perfect application for it (pretty much exactly Terry's proposal in msg178542).
msg182816 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-02-23 20:27
Serhiy called subclassing 'particular and cumbersome'.
'Cumbersome' is an opinion. I consider subclassing elegant. The ease of doing so and specializing only what one needs to is a major feature of Python. It only took me a couple of minutes to whip up solutions for two different cases.

I think 'particular' is wrong. Subclassing is a general solution for a particular class of values. As I illustrated, it results in the value getting its custom representation *everywhere*. Other solutions seem to give it its custom representation only in the particular context of standard signature displays, and its 'regular' not-so-good representation when directly accessed. To me, that is much worse.
msg182862 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-24 09:19
It's cumbersome and burdensome because for every nonstandard default value you need to:

1) define a class with __repr__();
2) instantiate a sentinel;
3) check for the sentinel in the function and replace it with an actual value.

    class _ExternalAttrDefault:
        def __repr__(self):
            return '(stat.S_IRUSR|stat.S_IRUSR)<<16'

    _external_attr_default = _ExternalAttrDefault()

    def open(self, name, mode='r', external_attr=_external_attr_default):
        if external_attr is _external_attr_default:
            external_attr = (stat.S_IRUSR|stat.S_IRUSR)<<16
        ...

Instead of just:

    def open(self, name, mode='r', external_attr=(stat.S_IRUSR|stat.S_IRUSR)<<16):
        """
        Foo.open(name, mode='r', external_attr=(stat.S_IRUSR|stat.S_IRUSR)<<16)
        """
        ...
msg182863 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-24 09:32
It's particular because there are functions whose signatures can't be expressed with valid Python syntax (dict, range, operator.methodcaller, many curses functions). They require another solution, and that more general solution also applies to the problem of nonstandard representation of default values.
msg182908 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-02-25 00:40
Your indeed cumbersome and overly specific process has almost nothing to do with what I proposed and gave two examples of. Please reread and understand msg178542 before you criticize it.

Class octint pythonically solves the octal mode display problem that started this issue, and in my opinion about as elegantly as possible.
msg182910 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-02-25 01:17
FWIW I think the octint class is a great idea.  It's nice and localized, and it should have no performance impact and only a small maintenance impact.  It'll also preserve the readability of the default if you pull it out with inspect.getfullargspec / inspect.Signature and repr it.  I'm not sure how IDLE et al. produce their tooltips, but whatever technique they use octint should work there.  Heck, it should even survive calculations for default values (as I mentioned in the original post), assuming that int + octint = octint.

Is there a significant downside to octint that I'm missing?
msg182921 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-02-25 06:08
Sorry, as with all similar subclasses, class op subclass = class unless one explicitly converts the answer back to the subclass or overrides the __op__ method to do that for you.

class octint(int):
    'int that displays as octal'
    def __str__(self):
        return oct(self)
    __repr__ = __str__

mode = octint(0o640)
print(mode + 4, octint(mode+4))

def __add__(self, other):
    return octint(int.__add__(self, other))
    # octint(self+other) is infinite recursion 

octint.__add__ = __add__
print(mode+4)
>>> 
420 0o644
0o644
msg381023 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-11-15 18:27
I'm very late to join this thread, but since the Constant() node now has a kind field, we could possibly add this (though I'm not saying we should; it depends on whether this would still be a nice addition to clinic docstrings).

2 options that I can think of:
- Introduce a couple of new tokens (BIN_NUMBER, OCT_NUMBER, HEX_NUMBER). 
- Add a new field to the tok_state struct (like const char* number_type) and when constructing the Constant node in the _PyPegen_number_token, add that number_type as the kind field of the constant.

In case anyone is wondering, the latter would be a +20 lines addition (no changes on the grammar / tokens, just a couple of new lines into the tokenizer and the _PyPegen_number_token.)
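
For context, a short sketch of what the existing kind field on ast.Constant carries today (Python 3.8 and later). The helper name constant_kind is made up for this example; it shows current behaviour only, not the proposed change.

```python
import ast

def constant_kind(expr):
    """Return the kind field of a single-constant expression (helper name is hypothetical)."""
    return ast.parse(expr, mode="eval").body.kind

# Today the field only distinguishes u-prefixed string literals.
print(constant_kind("u'text'"))  # u
print(constant_kind("'text'"))   # None

# Numeric literals carry no kind, whatever base they were written in.
print(constant_kind("0o666"))    # None
print(constant_kind("438"))      # None
```

Under the second option described above, the tokenizer would supply a value for this field for numeric literals as well; which values it would use is not specified in this thread.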
msg396721 - (view) Author: Scott Stevens (Scott Stevens) Date: 2021-06-29 12:19
I'm now seeing docs.python.org has regressed. For 3.9, calls present their defaults in octal, in 3.10 (beta), they're presented in decimal.

https://docs.python.org/3.10/library/pathlib.html#pathlib.Path.touch

https://docs.python.org/3.10/library/os.html#os.mkdir

Not sure if this is the right issue to be mentioning it on; if not, please let me know so I can file another issue.
msg396728 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2021-06-29 13:49
Please open a ticket!  This one is about docstrings and pydoc; docs.python.org is built from rst docs with sphinx.
msg396745 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-06-29 16:19
I suspect the difference is due to a change in the way that Sphinx handles the py:method directive: the rst source for `pathlib.Path.touch` hasn't changed between 3.9 and 3.10, but the 3.9 docs are built with Sphinx 2.4.4, while the 3.10 docs are built with Sphinx 3.2.1.
msg396746 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-06-29 16:31
FWIW, the relevant change seems to be this one: https://github.com/sphinx-doc/sphinx/pull/7155
History
Date | User | Action | Args
2021-08-20 10:10:09 | mark.dickinson | set | nosy: - mark.dickinson
2021-06-29 16:31:15 | mark.dickinson | set | messages: + msg396746
2021-06-29 16:19:25 | mark.dickinson | set | messages: + msg396745
2021-06-29 13:49:35 | eric.araujo | set | messages: + msg396728; versions: + Python 3.11, - Python 3.4
2021-06-29 12:19:27 | Scott Stevens | set | nosy: + Scott Stevens; messages: + msg396721
2020-11-15 18:27:18 | BTaskaya | set | nosy: + pablogsal
2020-11-15 18:27:00 | BTaskaya | set | nosy: + BTaskaya; messages: + msg381023
2014-04-21 05:34:55 | eric.araujo | set | nosy: + eric.araujo
2013-02-25 06:08:47 | terry.reedy | set | messages: + msg182921
2013-02-25 01:17:57 | larry | set | messages: + msg182910
2013-02-25 00:40:44 | terry.reedy | set | messages: + msg182908
2013-02-24 09:32:59 | serhiy.storchaka | set | messages: + msg182863
2013-02-24 09:19:56 | serhiy.storchaka | set | messages: + msg182862
2013-02-23 20:27:20 | terry.reedy | set | messages: + msg182816
2013-02-23 16:35:45 | barry | set | nosy: + barry
2013-02-23 16:02:19 | r.david.murray | set | messages: + msg182754
2013-02-06 09:04:59 | georg.brandl | set | messages: + msg181504
2013-02-06 04:48:03 | larry | set | messages: + msg181497
2013-01-03 10:13:20 | serhiy.storchaka | set | messages: + msg178931
2013-01-02 22:13:10 | terry.reedy | set | messages: + msg178856
2013-01-02 19:59:00 | ezio.melotti | set | messages: + msg178843
2012-12-31 11:02:49 | georg.brandl | set | messages: + msg178663
2012-12-31 07:30:22 | rhettinger | set | nosy: + rhettinger; messages: + msg178647
2012-12-31 01:11:54 | ezio.melotti | set | nosy: + ezio.melotti
2012-12-29 22:43:17 | terry.reedy | set | messages: + msg178542
2012-12-29 18:41:08 | serhiy.storchaka | set | messages: + msg178521
2012-12-29 18:22:57 | georg.brandl | set | nosy: + georg.brandl; messages: + msg178519
2012-12-29 16:38:54 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg178510
2012-12-29 13:12:41 | carsten.klein@axn-software.de | set | messages: + msg178490
2012-12-29 13:03:20 | carsten.klein@axn-software.de | set | messages: + msg178488
2012-12-29 12:59:04 | carsten.klein@axn-software.de | set | nosy: + carsten.klein@axn-software.de; messages: + msg178487
2012-12-29 05:35:20 | terry.reedy | set | messages: + msg178474
2012-12-29 05:30:53 | Arfrever | set | nosy: + Arfrever
2012-12-29 03:35:15 | larry | set | messages: + msg178470
2012-12-29 01:26:51 | terry.reedy | set | nosy: + terry.reedy; messages: + msg178457
2012-12-28 17:10:26 | mark.dickinson | set | messages: + msg178398
2012-12-28 17:09:38 | mark.dickinson | set | messages: + msg178397
2012-12-28 16:49:53 | r.david.murray | set | nosy: + r.david.murray; messages: + msg178392
2012-12-28 14:53:22 | mark.dickinson | set | nosy: + mark.dickinson; messages: + msg178385
2012-12-28 13:08:04 | larry | set | messages: + msg178384
2012-12-28 13:04:13 | larry | create |