classification
Title: PyUnicode_FromFormat(): add %T format for an object type name
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, petr.viktorin, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2018-09-06 09:13 by vstinner, last changed 2018-09-19 22:48 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 9080 merged vstinner, 2018-09-06 09:19
PR 9103 closed vstinner, 2018-09-07 16:12
PR 9122 closed vstinner, 2018-09-09 07:43
PR 9187 merged vstinner, 2018-09-11 21:45
PR 9251 closed vstinner, 2018-09-13 00:32
Messages (17)
msg324675 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-06 09:13
The PEP 399 requires that C accelerator behaves exactly as Python, but a lot of C code truncates type name to an arbitrary length: 80, 100, 200, up to 500 (not sure if it's a number of bytes or characters).

Py_TYPE(obj)->tp_name is a common pattern: it would be nice to have a new "%T" format in PyUnicode_FromFormat(), so it can be used directly in PyErr_Format(), to format an object type name.

Attached PR implements the proposed %T format and modify unicodeobject.c to use it.

I propose to then write a second PR to modify all C code of CPython using Py_TYPE(obj)->tp_name to use the new %T type.
msg324676 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-06 09:23
My previous attempt to fix that issue, 7 years ago: bpo-10833 :-)

See also bpo-7330 where I implemented width and precision (ex: "%5.3s") in PyUnicode_FromFormat().
msg324677 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-06 09:26
I cannot find %T in printf() manual pages on Fedora 28 (Linux).

I can find it in the strftime() documentation:

   %T     The time in 24-hour notation (%H:%M:%S).  (SU)

But I don't think that it's an issue since printf() and strftime() formatters are exclusive, no? For example, strftime() %s means "number of seconds since the Epoch" whereas printf() %s means a "const char*" byte string.
msg324678 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-06 09:29
Oh, PyUnicode_FromFormat() has %A to format as ASCII, whereas printf() already has %A but for a different meaning:

   a, A   For a conversion, the double argument is converted to hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp+-; for A conversion the prefix  0X,  the  letters  ABCDEF, and the exponent separator P is used. (...)
msg324679 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-09-06 09:33
I don't think you have to worry about %T being used by other formatting functions. If (heaven forbid) dates were ever supported by PyUnicode_FromFormat(), there would have to be a way to switch from "normal" argument processing to argument-specific formatting specifiers, anyway.
msg324767 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-07 16:01
New changeset 886483e2b9bbabf60ab769683269b873381dd5ee by Victor Stinner in branch 'master':
bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)
https://github.com/python/cpython/commit/886483e2b9bbabf60ab769683269b873381dd5ee
msg324792 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-07 21:03
IIRC there is similar issue or a discussion on one of mailing lists. But the idea about adding this feature on Python side too was considered. It would be better to find this discussion before pushing this change.

In some cases we have the type itself, not an instance. So it makes sense to make %T an equivalent of arg->tp_name instead of Py_TYPE(arg)->tp_name.

On Python side, you need to output either short name obj.__class__.__name__ or full qualified name obj.__class__.__module__ + '.' + obj.__class__.__qualname__ with exception that the module name should be omitted if it is 'builtins'.

    obj.__class__.__qualname__ if obj.__class__.__module__ == 'builtins' else f'{obj.__class__.__module__}.{obj.__class__.__qualname__}'

The case of the module name '__main__' can be handled specially too.
Obviously it is desirable to have a more handy way of writing such expression.

On C side, the problem is that tp_name means different, depending of the kind of the type. In some cases it is a short name, in other cases it is a full qualified name. It is not easy to write a code that produces the same output in Python and C. I have added a helper _PyType_Name() that helps to solve a part of these issues. If you want to output a short name (just cls.__name__ in Python), use _PyType_Name(cls) instead of cls->tp_name. But this doesn't help for the case when you need to output a full qualified name.
msg324801 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-07 21:45
> In some cases we have the type itself, not an instance. So it makes sense to make %T an equivalent of arg->tp_name instead of Py_TYPE(arg)->tp_name.

"arg->tp_name" is rare in the code base, whereas "Py_TYPE(arg)->tp_name" is a very common pattern.

I'm not sure that a formatter is needed for "arg->tp_name", you can already use "%s" with "arg->tp_name", no?

My intent was to make the C code less verbose, respect the PEP 399, but also indirectly avoid the implicit borrowed reference of Py_TYPE() :-)
https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references


> On Python side, you need to output either short name obj.__class__.__name__ or full qualified name obj.__class__.__module__ + '.' + obj.__class__.__qualname__ with exception that the module name should be omitted if it is 'builtins'.

Sometimes, the qualified name would be more appropriate, but it's sometimes tricky to decide if the short name, qualified name or fully qualified name is the "right" name... So I chose to restrict this issue to the most common case, Py_TYPE(arg)->tp_name :-)

Ah, and changing strings is a risk of breaking the backward compatibility. For example, cause issue with pickle. So it should be discussed on a case by case basis.

Moreover, if you want to change a string, the Python code should be updated as well. I suggest to open a new issue to discuss that.

Don't get me wrong, I'm interested to do these changes, but it's a wider project :-)


> obj.__class__.__qualname__ if obj.__class__.__module__ == 'builtins' else f'{obj.__class__.__module__}.{obj.__class__.__qualname__}'
>
> The case of the module name '__main__' can be handled specially too.
> Obviously it is desirable to have a more handy way of writing such expression.

To be honest, I also considered to proposed a second formatter to do something like that :-) But as you explained, I'm not sure which name is the good name: qualified or fully qualified (with module name)?

First of all, would it help to have a *function* to get these names? Maybe we could first use such functions before discussing adding a new formatter in PyUnicode_FromFormat()?
msg324821 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-08 08:16
An alternative would be to add multiple formatters. Example:

* %Tn: type name, type.__name__, Py_TYPE(obj)->tp_name
* %Tq: qualified name, type.__qualname__
* %Ts: short name, never contains "."
* %Tf: fully qualified name, "module.qualified.name"

What do you think Serhiy?
msg324828 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-08 10:49
I think we need to handle only two cases: short and fully qualified names. __qualname__ without __module__ doesn't make sense, and the value of tp_name depends on implementation details (is it Python class or builtin class, heap class or dynamic class?). Maybe use %t and %T?

But we may want to support formatting the name of the type itself and the name of the object's type. This give us 4 variants.

For old string formatting we can introduce new % codes (with possible modifiers). But in modern string formatting "T" can have meaning for some types (e.g. for datetime). We can implement __format__ for the type type itself (though it can cause confusion if cls.__format__() is different from cls.__format__(instance)), but for formatting the name of  the object's type (as in your original proposition) we need to add a new  conversion flag like "!r".
msg324875 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-09 07:53
> I think we need to handle only two cases: short and fully qualified names. __qualname__ without __module__ doesn't make sense, and the value of tp_name depends on implementation details (is it Python class or builtin class, heap class or dynamic class?). Maybe use %t and %T?

Ok, I wrote PR 9122 to add %t format and modify %T format:

* %t <=> type(obj).__name__
* %T <=> f"{type(obj).__module__}.{type(obj).__qualname__}"


> But we may want to support formatting the name of the type itself and the name of the object's type. This give us 4 variants.

Again, I'm not sure about these ones. _PyType_Name() can be used for %t-like directly on a type. Later we can expose _PyType_FullName() (function that I added in my latest PR) as a private function for %T-like directly on a type.


> For old string formatting we can introduce new % codes (with possible modifiers). But in modern string formatting "T" can have meaning for some types (e.g. for datetime). We can implement __format__ for the type type itself (though it can cause confusion if cls.__format__() is different from cls.__format__(instance)), but for formatting the name of  the object's type (as in your original proposition) we need to add a new  conversion flag like "!r".

I'm not sure that I understood directly.

Do you want to add a third formatter in PyUnicode_FromFormat() which would use Py_TYPE(obj)->tp_name? I dislike Py_TYPE(obj)->tp_name, since my intent is to conform to the PEP 399: tp_name is not accessible at the Python level, only type(obj).__name__ and type(obj).__qualname__.

Or do you want to add a new formatter to type.__format__() to expose %T at the Python level, f"{type(obj).__module__}.{type(obj).__qualname__}"?

Currently, type(obj).__name__ is the most popular way to format a string. Would it break the backward compatibility to modify *existing* error messages?
msg324876 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-09 08:33
> Ok, I wrote PR 9122 to add %t format and modify %T format:

Nice!

I agree that it is easy to use _PyType_Name() directly. But using _PyType_FullName() instead of tp_name can be cumbersome because it returns a new object and needs error handling.

> Or do you want to add a new formatter to type.__format__() to expose %T at the Python level, f"{type(obj).__module__}.{type(obj).__qualname__}"?

Yes, I think we need a convenient way of formatting fully qualified name that omits the module name for types in the builtins module. It is equivalent to Py_TYPE(obj)->tp_name for extension types which is the most popular way to format a type name in error messages for now.

There are several open issues for inconsistency in error messages for Python and C implementations, because the former use type(obj).__name__ or obj.__class__.__name__, and the latter use Py_TYPE(obj)->tp_name. I hope finally we will fix this.
msg324877 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-09 08:37
> in error messages

And in reprs. It is common to format a repr as "{typename}(...)" or "<{typename}(...)>". The difference is whether the typename is a short or fully qualified name.
msg325045 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2018-09-11 18:34
> The PEP 399 requires that C accelerator behaves exactly as Python, [...]

It does not. PEP 399 requires that that the C code must pass the same *test suite*. And error messages in particular tend to not be checked in tests.

Anyway, I don't see how that applies to replacing `Py_TYPE(obj)->tp_name` by `%T`. The real reason for *that* change is removing borrowed references, right?
I have not yet seen a good reason why Py_TYPE(obj) is bad. The reasons you give in https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references are about tagged pointers and PyList_GetItem(), but Py_TYPE() is very different.

I don't think the reasons are strong enough to add new API to PyUnicode_FromFormat().
msg325078 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-11 22:23
New changeset 998b80636690ffbdb0a278810d9c031fad38631d by Victor Stinner in branch 'master':
Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)
https://github.com/python/cpython/commit/998b80636690ffbdb0a278810d9c031fad38631d
msg325079 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-11 22:24
Petr Viktorin asked me to open a wider discussion about this issue on python-dev. I just reverted my first change and posted:
https://mail.python.org/pipermail/python-dev/2018-September/155150.html
msg325814 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-19 22:48
I close this issue until we can agree on an API.
History
Date User Action Args
2018-09-19 22:48:11vstinnersetstatus: open -> closed
resolution: rejected
messages: + msg325814

stage: patch review -> resolved
2018-09-13 00:32:23vstinnersetpull_requests: + pull_request8682
2018-09-11 22:24:39vstinnersetmessages: + msg325079
2018-09-11 22:23:28vstinnersetmessages: + msg325078
2018-09-11 21:45:47vstinnersetpull_requests: + pull_request8625
2018-09-11 18:34:52petr.viktorinsetnosy: + petr.viktorin
messages: + msg325045
2018-09-09 08:37:06serhiy.storchakasetmessages: + msg324877
2018-09-09 08:33:14serhiy.storchakasetmessages: + msg324876
2018-09-09 07:53:53vstinnersetmessages: + msg324875
2018-09-09 07:43:23vstinnersetpull_requests: + pull_request8576
2018-09-08 10:49:28serhiy.storchakasetmessages: + msg324828
2018-09-08 08:16:19vstinnersetmessages: + msg324821
2018-09-07 21:45:00vstinnersetmessages: + msg324801
2018-09-07 21:03:50serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg324792
2018-09-07 16:12:12vstinnersetpull_requests: + pull_request8557
2018-09-07 16:01:04vstinnersetmessages: + msg324767
2018-09-06 09:33:51eric.smithsetnosy: + eric.smith
messages: + msg324679
2018-09-06 09:29:05vstinnersetmessages: + msg324678
2018-09-06 09:26:57vstinnersetmessages: + msg324677
2018-09-06 09:23:39vstinnersetmessages: + msg324676
2018-09-06 09:19:20vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request8539
2018-09-06 09:13:30vstinnercreate