Issue 39574: str.__doc__ is misleading

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/83755

classification

Title:	str.__doc__ is misleading
Type:	enhancement	Stage:	patch review
Components:	Documentation	Versions:	Python 3.9, Python 3.8, Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	docs@python, eric.smith, kcirtsew, serhiy.storchaka, steven.daprano
Priority:	normal	Keywords:	patch

Created on 2020-02-07 01:01 by kcirtsew, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 18401	open	eric.smith, 2020-02-07 11:12

Messages (9)
msg361524 - (view)	Author: Zachary Westrick (kcirtsew)	Date: 2020-02-07 01:07
The docstring for the str() builtin reads

    str(object='') -> str
    str(bytes_or_buffer[, encoding[, errors]]) -> str

    Create a new string object from the given object. If encoding or
    errors is specified, then the object must expose a data buffer
    that will be decoded using the given encoding and error handler.
    Otherwise, returns the result of object.__str__() (if defined)
    or repr(object).
    encoding defaults to sys.getdefaultencoding().
    errors defaults to 'strict'.

The statement "encoding defaults to sys.getdefaultencoding()." implies that the encoding argument defaults to sys.getdefaultencoding(), which would typically mean that

    str(X, encoding=sys.getdefaultencoding()) == str(X)

However, this is not the case

    str(b'mystring', encoding=sys.getdefaultencoding()) -> 'mystring'
    str(b'mystring') -> "b'mystring'"

It seems that the phrase "encoding defaults" is not referring to the argument named encoding.
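The mismatch reported above can be reproduced with a short script. This is only a sketch for illustration; the variable names (`data`, `decoded`, `one_arg`) are not from the report.

```python
import sys

data = b'mystring'

# Passing encoding explicitly selects the decoding form of str().
decoded = str(data, encoding=sys.getdefaultencoding())

# The one-argument form does not decode; for bytes it falls back to repr().
one_arg = str(data)

print(decoded)             # mystring
print(one_arg)             # b'mystring'
print(decoded == one_arg)  # False
```

Note that running Python with the `-b` or `-bb` option turns the one-argument call on a bytes object into a BytesWarning or an error, so the sketch assumes a default invocation.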
msg361547 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2020-02-07 09:05
The docs are correct, you are just misinterpreting them. Which could, I guess, suggest the docs could do with improvement.

With one argument, `str(obj)` returns a string via `object.__str__(obj)` or `repr(obj)`, whichever is defined. That includes the case where obj is a bytes object.

Only in the two or three argument case where you explicitly provide either the encoding or errors parameter will bytes be decoded. But you must provide at least one of encoding or errors. If you provide neither, you have the one-argument form above.

The default value for encoding is only relevant in cases like this:

    # encoding defaults to sys.getdefaultencoding()
    py> str(b'a', errors='ignore')
    'a'

Here's my suggested rewording:

***

    str(object='') -> str
    str(bytes_or_buffer [, encoding] [, errors]) -> str

    Create a new string object from the given object.

    If a single argument is given, returns the result of
    object.__str__() (if defined) or repr(object).

    If encoding or errors or both are specified, then the object must
    expose a data buffer that will be decoded using the given encoding
    and error handler.

    If errors is specified, the default encoding is
    sys.getdefaultencoding(). If encoding is specified, errors defaults
    to 'strict'.
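A minimal sketch of the distinction drawn in this message, assuming CPython 3 with its UTF-8 default encoding. The sample bytes and variable names are illustrative only.

```python
data = b'caf\xc3\xa9'  # UTF-8 bytes for 'café'

# One argument: no decoding, the result is repr(data).
one_arg = str(data)

# Supplying encoding, errors, or both switches to the decoding form.
only_encoding = str(data, encoding='utf-8')
only_errors = str(data, errors='strict')
both = str(data, encoding='utf-8', errors='strict')

# With only errors given, the encoding default applies, so an invalid
# byte is dropped by the 'ignore' handler instead of raising.
ignored = str(b'a\xff', errors='ignore')

print(one_arg)
print(only_encoding, only_errors, both, ignored)
```

Only the one-argument call leaves the bytes undecoded; each of the other three calls returns the decoded text.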
msg361551 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2020-02-07 09:26
That's a good improvement, Steven. I like your wording about errors better than the wording about encoding, so how about changing the next to last sentence to: "If errors is specified, encoding defaults to sys.getdefaultencoding()."
msg361560 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2020-02-07 10:26
Eric: sure, I'm happy with your modification. Alas, I'm currently having technology issues which prevent me from doing a PR. Would you care to do the honours?
msg361591 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2020-02-07 11:15
I've created a PR and requested review from stevendaprano. I think the backports are correct.
msg361594 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2020-02-07 12:33
See a discussion on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/message/YMIGWRUERUG66CKRJXDXNPCIDHRQJY6V/
msg361618 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2020-02-08 00:57
On Fri, Feb 07, 2020 at 12:33:45PM +0000, Serhiy Storchaka wrote:

> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
>
> See a discussion on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/message/YMIGWRUERUG66CKRJXDXNPCIDHRQJY6V/

I don't know whether the very odd calls

    str(encoding='spam')
    str(errors='eggs')
    str(encoding='spam', errors='eggs')

are intentional or not. I suspect not: to me, it looks like an accident of implementation, not a deliberate feature.

Under what circumstances would somebody intentionally provide an encoding and error handler when they aren't actually going to use them? There may be really unusual cases:

    args = () if condition else (mybytes,)
    str = str(*args, encoding='spam')

but I doubt they are going to be either common or something we ought to encourage.

Regardless of whether we deprecate and remove those three odd cases or not, I don't think we should bother documenting them. If anyone disagrees, and wants to document them, that's okay, but you can document them as a separate PR with a separate discussion.

Let's just fix the confusion over the default encoding here and worry about other issues later. Don't let the perfect get in the way of the good enough for now :-)
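For readers who have not seen the odd calls in action, the sketch below shows what CPython 3 returns for them. This is observed behaviour that the thread treats as a probable accident of implementation, not a documented guarantee. The helper `to_text` is hypothetical, and it uses a real codec ('utf-8') rather than 'spam' so that the branch that does pass bytes can actually decode them.

```python
# With no object argument, the encoding and errors values are never used,
# so even nonexistent codec and handler names are accepted.
no_object_encoding = str(encoding='spam')
no_object_errors = str(errors='eggs')
no_object_both = str(encoding='spam', errors='eggs')


def to_text(*args):
    """Hypothetical helper mirroring the str(*args, encoding=...) pattern."""
    return str(*args, encoding='utf-8')


empty = to_text()           # no object supplied: empty string
decoded = to_text(b'abc')   # bytes supplied: decoded text

print(repr(no_object_encoding), repr(no_object_errors), repr(no_object_both))
print(repr(empty), repr(decoded))
```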
msg361619 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2020-02-08 00:59
I agree that the current changes are an improvement, and should be committed.
msg361817 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2020-02-11 13:34
Sorry everyone, due to technology problems I am unable to comment on the github page, and due to ISP problems I've been off the internet for a few days.

> pull_request: https://github.com/python/cpython/pull/18401

[Serhiy]
> Is not "or both" redundant?

I don't think so. In regular English, "or" can imply exclusive-or: "Shall we eat at the Thai or the Italian restaurant?"

There are four relevant cases:

- supply neither encoding nor errors;
- supply only encoding;
- supply only errors;
- supply both encoding and errors.

Using "or" may be, for some readers, ambiguous: is the last option included or not? For the sake of two extra words, let's make it clear and unambiguous.

[Serhiy]
> Use just 'utf-8' instead of sys.getdefaultencoding(). It is a
> constant in Python 3.

I didn't know that. I'm okay with that change, thank you.

[Serhiy]
> - str(bytes_or_buffer[, encoding[, errors]]) -> str
> + str(bytes_or_buffer, encoding='utf-8', errors='strict') -> str

I'm happy with that.

Thank you everyone, and sorry again that I have trouble with the Github process. (I need a new computer with a newer OS.)
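The two defaults in the proposed signature can be checked with a small sketch, assuming CPython 3. The sample text and variable names are illustrative only.

```python
import sys

# In Python 3 the default encoding is the constant 'utf-8'.
default_encoding = sys.getdefaultencoding()

data = 'na\u00efve'.encode('utf-8')

# Omitting encoding while supplying errors behaves like encoding='utf-8'.
implicit = str(data, errors='strict')
explicit = str(data, encoding='utf-8', errors='strict')

# Omitting errors while supplying encoding behaves like errors='strict':
# an invalid byte raises instead of being replaced or ignored.
try:
    str(b'\xff', encoding='utf-8')
    strict_by_default = False
except UnicodeDecodeError:
    strict_by_default = True

print(default_encoding, implicit == explicit, strict_by_default)
```

Spelling the defaults out as keyword values, as in the proposed signature, states both facts without requiring the reader to look up sys.getdefaultencoding().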

History
Date	User	Action	Args
2022-04-11 14:59:26	admin	set	github: 83755
2020-02-11 13:34:48	steven.daprano	set	messages: + msg361817
2020-02-09 20:20:00	terry.reedy	set	versions: - Python 3.5, Python 3.6
2020-02-08 04:23:54	martin.panter	link	issue35318 superseder
2020-02-08 00:59:23	eric.smith	set	messages: + msg361619
2020-02-08 00:57:53	steven.daprano	set	messages: + msg361618
2020-02-07 12:33:45	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg361594
2020-02-07 11:15:18	eric.smith	set	messages: + msg361591
2020-02-07 11:12:26	eric.smith	set	keywords: + patch stage: patch review pull_requests: + pull_request17777
2020-02-07 10:26:22	steven.daprano	set	messages: + msg361560
2020-02-07 09:26:51	eric.smith	set	nosy: + eric.smith messages: + msg361551
2020-02-07 09:05:14	steven.daprano	set	nosy: + steven.daprano messages: + msg361547 versions: + Python 3.6, Python 3.7, Python 3.8, Python 3.9
2020-02-07 01:07:08	kcirtsew	set	messages: + msg361524
2020-02-07 01:01:54	kcirtsew	create