classification
Title: str.__doc__ is misleading
Type: enhancement
Stage: patch review
Components: Documentation
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: docs@python, eric.smith, kcirtsew, serhiy.storchaka, steven.daprano
Priority: normal
Keywords: patch

Created on 2020-02-07 01:01 by kcirtsew, last changed 2020-02-11 13:34 by steven.daprano.

Pull Requests
URL Status Linked Edit
PR 18401 open eric.smith, 2020-02-07 11:12
Messages (9)
msg361524 - Author: Zachary Westrick (kcirtsew) Date: 2020-02-07 01:07
The docstring for the str() builtin reads

str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.

The statement "encoding defaults to sys.getdefaultencoding()." implies that the encoding argument defaults to sys.getdefaultencoding(), which would typically mean that 

str(X, encoding=sys.getdefaultencoding()) == str(X)

However, this is not the case:

str(b'mystring', encoding=sys.getdefaultencoding()) -> 'mystring'
str(b'mystring') -> "b'mystring'"

It seems that the phrase "encoding defaults" is not referring to the argument named encoding.
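
The discrepancy reported above can be checked with a short script. This is a minimal sketch, assuming CPython 3 started without the -b flag; the names `data`, `decoded` and `undecoded` are illustrative only.

```python
import sys

data = b'mystring'

# Two-argument form: the bytes are decoded with the named encoding.
decoded = str(data, encoding=sys.getdefaultencoding())

# One-argument form: no decoding happens; the result is the repr of the bytes.
# (Running Python with -b makes this call emit a BytesWarning.)
undecoded = str(data)

print(repr(decoded))    # 'mystring'
print(repr(undecoded))  # "b'mystring'"
```

So passing the documented "default" explicitly changes the result, which is the source of the confusion.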
msg361547 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-02-07 09:05
The docs are correct; you are just misinterpreting them. That, I guess, suggests the docs could do with improvement.

With *one* argument, `str(obj)` returns `type(obj).__str__(obj)` if the type defines `__str__`, and otherwise falls back to `repr(obj)`. That includes the case where obj is a bytes object.

*Only* in the two- or three-argument case, where you explicitly provide either the encoding or errors parameter, will bytes be decoded. But you must provide at least one of encoding or errors. If you provide neither, you have the one-argument form above.

The default value for encoding is only relevant in cases like this:

    # encoding defaults to sys.getdefaultencoding()
    py> str(b'a', errors='ignore')
    'a'
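
To see which encoding is actually picked when only errors is given, non-ASCII input is more telling than b'a'. The following is a small sketch, assuming CPython 3; the variable names are made up for illustration.

```python
raw = 'caf\u00e9'.encode('utf-8')   # b'caf\xc3\xa9'

# Only errors is given, so encoding falls back to its default.
via_default = str(raw, errors='strict')

# The same bytes decoded with explicitly named encodings, for comparison.
via_utf8 = str(raw, encoding='utf-8')
via_latin1 = str(raw, encoding='latin-1')

print(via_default == via_utf8)    # True
print(via_default == via_latin1)  # False
```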



Here's my suggested rewording:


***


str(object='') -> str
str(bytes_or_buffer [, encoding] [, errors]) -> str

Create a new string object from the given object.

If a single argument is given, returns the result of object.__str__() (if defined) or repr(object).

If encoding or errors or both are specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. If errors is specified, the default encoding is sys.getdefaultencoding(). If encoding is specified, errors defaults to 'strict'.
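
The dispatch described by this wording can be approximated in pure Python. This is only an illustrative model, not the CPython implementation: `str_model` and the `_MISSING` sentinel are hypothetical names, and the buffer handling is simplified to `memoryview`.

```python
_MISSING = object()  # hypothetical sentinel meaning "no object was passed"


def str_model(obj=_MISSING, encoding=None, errors=None):
    """Illustrative approximation of how str() chooses between its two forms."""
    if obj is _MISSING:
        # No object at all: the result is the empty string.
        return ''
    if encoding is None and errors is None:
        # Single-argument form: no decoding, even when obj is bytes.
        return type(obj).__str__(obj)
    # encoding and/or errors given: obj must expose a data buffer.
    buffer = memoryview(obj)  # raises TypeError for objects without a buffer
    return bytes(buffer).decode(
        'utf-8' if encoding is None else encoding,
        'strict' if errors is None else errors,
    )


print(repr(str_model(b'a')))                   # "b'a'"
print(repr(str_model(b'a', errors='ignore')))  # 'a'
```

The point of the model is that the defaults for encoding and errors only come into play once at least one of the two has been supplied.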
msg361551 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-02-07 09:26
That's a good improvement, Steven. I like your wording about errors better than the wording about encoding, so how about changing the next to last sentence to:

"If errors is specified, encoding defaults to sys.getdefaultencoding()."
msg361560 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-02-07 10:26
Eric: sure, I'm happy with your modification.

Alas, I'm currently having technology issues which prevent me from 
doing a PR. Would you care to do the honours?
msg361591 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-02-07 11:15
I've created a PR and requested review from stevendaprano. I think the backports are correct.
msg361594 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-02-07 12:33
See a discussion on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/message/YMIGWRUERUG66CKRJXDXNPCIDHRQJY6V/
msg361618 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-02-08 00:57
On Fri, Feb 07, 2020 at 12:33:45PM +0000, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
> 
> See a discussion on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/message/YMIGWRUERUG66CKRJXDXNPCIDHRQJY6V/

I don't know whether the very odd calls 

    str(encoding='spam')
    str(errors='eggs')
    str(encoding='spam', errors='eggs')

are intentional or not. I suspect not: to me, it looks like an accident 
of implementation, not a deliberate feature. Under what circumstances 
would somebody intentionally provide an encoding and error handler when 
they aren't actually going to use them? There may be really unusual 
cases:

    args = () if condition else (mybytes,)
    text = str(*args, encoding='spam')

but I doubt they are going to be either common or something we ought to 
encourage. Regardless of whether we deprecate and remove those three odd 
cases or not, I don't think we should bother documenting them.
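
For reference, here is what those odd calls do in practice. This is a sketch of behaviour observed on CPython 3, not a documented guarantee; `no_object` and `lookup_error` are illustrative names.

```python
# With no object argument, encoding and errors are accepted but never used,
# so even nonsense values go unnoticed.
no_object = [
    str(encoding='spam'),
    str(errors='eggs'),
    str(encoding='spam', errors='eggs'),
]
print(no_object)  # ['', '', '']

# As soon as there are bytes to decode, the bogus encoding is looked up
# and rejected.
lookup_error = None
try:
    str(b'x', encoding='spam')
except LookupError as exc:
    lookup_error = exc
print(type(lookup_error).__name__)  # LookupError
```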

If anyone disagrees, and wants to document them, that's okay, but you 
can document them as a separate PR with a separate discussion. Let's 
just fix the confusion over the default encoding here and worry about 
other issues later. Don't let the perfect get in the way of the good 
enough for now :-)
msg361619 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-02-08 00:59
I agree that the current changes are an improvement, and should be committed.
msg361817 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-02-11 13:34
Sorry everyone, due to technology problems I am unable to comment on the 
GitHub page, and due to ISP problems I've been off the internet for a 
few days.

> pull_request: https://github.com/python/cpython/pull/18401

[Serhiy]
> Is not "or both" redundant?

I don't think so. In regular English, "or" can imply exclusive-or:

    "Shall we eat at the Thai or the Italian restaurant?"

There are four relevant cases:

- supply neither encoding nor errors;
- supply only encoding;
- supply only errors;
- supply both encoding and errors.

Using "or" may be, for some readers, ambiguous: is the last option 
included or not? For the sake of two extra words, let's make it clear 
and unambiguous.
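
The four cases listed above can be spelled out as a quick check. This is a minimal sketch, assuming CPython 3 without the -b flag; the `results` dictionary and its keys are purely illustrative.

```python
data = b'abc'

results = {
    'neither': str(data),                                  # one-argument form
    'encoding only': str(data, encoding='ascii'),          # errors defaults
    'errors only': str(data, errors='strict'),             # encoding defaults
    'both': str(data, encoding='ascii', errors='strict'),
}

for case, value in results.items():
    print(f'{case}: {value!r}')
```

Only the first case skips decoding; the other three all yield 'abc'.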

[Serhiy]
> Use just 'utf-8' instead of sys.getdefaultencoding(). It is a 
> constant in Python 3.

I didn't know that. I'm okay with that change, thank you.
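
A quick way to confirm this, as a sketch assuming CPython 3 (the variable names are illustrative):

```python
import sys

default_encoding = sys.getdefaultencoding()
print(default_encoding)  # utf-8

# The same default is what bytes.decode() uses when no encoding is named.
euro = b'\xe2\x82\xac'.decode()
print(euro == '\u20ac')  # True
```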

[Serhiy]
> - str(bytes_or_buffer[, encoding[, errors]]) -> str
> + str(bytes_or_buffer, encoding='utf-8', errors='strict') -> str

I'm happy with that.

Thank you everyone, and sorry again that I have trouble with the GitHub 
process. (I need a new computer with a newer OS.)
History
Date User Action Args
2020-02-11 13:34:48 steven.daprano set messages: + msg361817
2020-02-09 20:20:00 terry.reedy set versions: - Python 3.5, Python 3.6
2020-02-08 04:23:54 martin.panter link issue35318 superseder
2020-02-08 00:59:23 eric.smith set messages: + msg361619
2020-02-08 00:57:53 steven.daprano set messages: + msg361618
2020-02-07 12:33:45 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg361594
2020-02-07 11:15:18 eric.smith set messages: + msg361591
2020-02-07 11:12:26 eric.smith set keywords: + patch; stage: patch review; pull_requests: + pull_request17777
2020-02-07 10:26:22 steven.daprano set messages: + msg361560
2020-02-07 09:26:51 eric.smith set nosy: + eric.smith; messages: + msg361551
2020-02-07 09:05:14 steven.daprano set nosy: + steven.daprano; messages: + msg361547; versions: + Python 3.6, Python 3.7, Python 3.8, Python 3.9
2020-02-07 01:07:08 kcirtsew set messages: + msg361524
2020-02-07 01:01:54 kcirtsew create