Improve doc for str(bytesobject) #57747

GuillaumeBouchard · 2011-12-06T12:56:42Z

BPO	13538
Nosy	@terryjreedy, @pitrou, @ezio-melotti, @merwok, @bitdancer, @cjerdonek
Files	issue-13538-1-default.patch issue-13538-2-default.patch issue-13538-3-default.patch issue-13538-5-default.patch issue-13538-6-default.patch issue-13538-7-default.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2012-11-21.02:17:59.808>
created_at = <Date 2011-12-06.12:56:42.113>
labels = ['easy', 'type-feature', 'docs']
title = 'Improve doc for str(bytesobject)'
updated_at = <Date 2012-11-21.13:38:31.989>
user = 'https://bugs.python.org/GuillaumeBouchard'

bugs.python.org fields:

activity = <Date 2012-11-21.13:38:31.989>
actor = 'python-dev'
assignee = 'docs@python'
closed = True
closed_date = <Date 2012-11-21.02:17:59.808>
closer = 'chris.jerdonek'
components = ['Documentation']
creation = <Date 2011-12-06.12:56:42.113>
creator = 'Guillaume.Bouchard'
dependencies = []
files = ['27556', '27591', '27592', '27944', '28040', '28045']
hgrepos = []
issue_num = 13538
keywords = ['patch', 'easy']
message_count = 24.0
messages = ['148914', '148916', '148917', '148918', '148919', '148922', '149163', '149362', '172716', '172832', '172960', '172989', '172990', '172991', '173018', '173019', '173023', '175262', '175950', '175976', '175977', '175978', '176039', '176058']
nosy_count = 9.0
nosy_names = ['terry.reedy', 'pitrou', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'chris.jerdonek', 'docs@python', 'python-dev', 'Guillaume.Bouchard']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue13538'
versions = ['Python 3.2', 'Python 3.3', 'Python 3.4']

GuillaumeBouchard · 2011-12-06T12:56:41Z

The docstring associated with str() says:

str(string[, encoding[, errors]]) -> str

Create a new string object from the given encoded string.
encoding defaults to the current default string encoding.
errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.

Whereas the online documentation states::

When only object is given, this returns its nicely printable representation.

My issue arose when I tried to convert bytes to str.

As stated in the documentation, and to avoid implicit behavior, a str cannot be converted to bytes without giving an encoding (using bytes(my_str, encoding=...) or my_str.encode(...); bytes(my_str) raises a TypeError). But if you try to convert bytes to str using str(my_bytes), Python returns the so-called nicely printable representation of the bytes object.

i.e.::

  >>> bytes("foo")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: string argument without an encoding
  >>> str(b"foo")
  "b'foo'"

For consistency, and to avoid silent errors, I suggest that str() of a bytes object without an encoding raise an exception. I think this is usually what people want. If one wants a *nicely printable representation* of a bytes object, one can call repr() explicitly and will quickly see that what was just printed is wrong. But someone who wants to convert a bytes object to its unicode representation would prefer an exception to a silently failing conversion that produces a unicode string starting with "b'" and ending with "'".
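To make the inconsistency concrete, here is a small sketch (Python 3; `payload` is just an illustrative name) contrasting str() without an encoding, which returns the repr, with the two decoding forms:

```python
payload = b"foo"

# Without an encoding, str() does not decode: it returns the repr of the bytes.
as_repr = str(payload)

# With an encoding, str() decodes, just like bytes.decode() does.
as_text = str(payload, "utf-8")
as_decoded = payload.decode("utf-8")

print(repr(as_repr))          # "b'foo'"
print(repr(as_text))          # 'foo'
print(as_text == as_decoded)  # True
```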

bitdancer · 2011-12-06T13:12:16Z

I agree with you that this is inconsistent. However, having str raise an error is pretty much a non-starter as a suggestion. str always falls back to the repr; in general str(obj) should always return some value, otherwise the assumptions of a *lot* of Python code would be broken.

Personally I'm not at all sure why str takes encoding and errors arguments (I never use them). I'd rather there be only one way to do that, decode. In other words, why do we have special case support for byte strings in the str conversion function?

But I don't think that can be changed either, so I think we are stuck with documenting the existing situation better. Do you want to propose a doc patch?

pitrou · 2011-12-06T13:14:32Z

Personally I'm not at all sure why str takes encoding and errors
arguments (I never use them).

Probably because the unicode type also did in 2.x.
And also because it makes it compatible with arbitrary buffer objects:

>>> str(memoryview(b"foo"), "ascii")
'foo'
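Extending Antoine's example: with an encoding, str() accepts objects supporting the buffer protocol, not only bytes, while a str argument is rejected. The list and variable names below are illustrative only.

```python
buffers = [b"foo", bytearray(b"foo"), memoryview(b"foo")]

for buf in buffers:
    # Each bytes-like object is decoded the same way.
    print(type(buf).__name__, str(buf, "ascii"))

# Decoding only applies to bytes-like input; passing a str is a TypeError.
try:
    str("foo", "ascii")
except TypeError as exc:
    print("TypeError:", exc)
```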

GuillaumeBouchard · 2011-12-06T13:56:40Z

str always falls back to the repr; in general str(obj) should always return some value, otherwise the assumptions of a *lot* of Python code would be broken.

Perhaps it could raise a warning?

That is, the only reason encoding exists is for converting bytes (or something that looks like bytes) to str. Do you think it would be possible to special-case str for bytes (and bytearray) with something like this:

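import builtins
import warnings

def str(object, encoding=None, errors=None):
    if encoding is not None:
        # usual work: decode the bytes-like object
        return builtins.str(object, encoding, errors or 'strict')
    if isinstance(object, (bytes, bytearray)):
        warnings.warn('Converting bytes/bytearray to str without encoding, '
                      'it may not be what you expect', BytesWarning)
    return builtins.str(object)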

That said, adding warnings and special cases everywhere does not seem very Pythonic.

Do you want to propose a doc patch?

The docstring for str() could look something like this (in my Frenglish way of writing English)::

Create a new string object from the given encoded string.

If object is bytes, bytesarray or a buffer-like object, encoding and error
can be set. errors can be 'strict', 'replace' or 'ignore' and defaults to
'strict'.

WARNING, if encoding is not set, the object is converted to a nicely
printable representation, which is totally different from what you may expect.

Perhaps a warning could be added to the online documentation, such as::

.. warning::
   When str() converts a bytes/bytearray or a buffer-like object and
   *encoding* is not specified, the result will be a unicode nicely printable
   representation, which is totally different from the unicode representation of
   your object using a specified encoding.

Would you like a .diff against the current Mercurial repository?

bitdancer · 2011-12-06T14:30:26Z

A diff would be great.

We try to use warnings sparingly, and I don't think this is a case that warrants it. Possibly a .. note:: directive is worthwhile, perhaps with an example for the bytes case, but even that may be too much.

I also wouldn't use the wording "is totally different from what you would expect", since by now I do expect it :). How about something like "the result will not be the decoded version of the bytes, but instead will be the repr of the object", with a cross link to repr.

pitrou · 2011-12-06T15:00:18Z

Well, I forgot to mention it in my previous message, but there is already a warning that you can activate with the -b option:

$ ./python -b
Python 3.3.0a0 (default:6b6c79eba944, Dec  6 2011, 11:11:32) 
[GCC 4.5.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b"")
__main__:1: BytesWarning: str() on a bytes instance
"b''"

And you can even turn it into an error with -bb:

$ ./python -bb
Python 3.3.0a0 (default:6b6c79eba944, Dec  6 2011, 11:11:32) 
[GCC 4.5.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b"")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: str() on a bytes instance

However, -b is highly unlikely to become the default, for the reasons already explained. It was mainly meant to ease porting from Python 2.
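The -b/-bb behaviour above can also be checked from a script by starting a child interpreter. This is a sketch assuming Python 3.7+ (for `capture_output`); `run_str_on_bytes` is a hypothetical helper name, and `-E` is passed only so that environment variables such as PYTHONWARNINGS do not interfere.

```python
import subprocess
import sys

def run_str_on_bytes(flag):
    """Run str(b'') in a fresh interpreter started with flag ('-b' or '-bb')."""
    return subprocess.run(
        [sys.executable, "-E", flag, "-c", "str(b'')"],
        capture_output=True,
        text=True,
    )

warned = run_str_on_bytes("-b")   # BytesWarning printed to stderr, exit status 0
failed = run_str_on_bytes("-bb")  # BytesWarning raised as an error, nonzero exit status
print(warned.returncode, warned.stderr.strip())
print(failed.returncode, failed.stderr.strip().splitlines()[-1])
```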

merwok · 2011-12-10T16:02:27Z

A note in the docs (without note/warning directives, just a note) and maybe in the docstring would be good. It should better explain that str has two uses: converting anything to a str (using __str__ or __repr__), and decoding a buffer to str (with the encoding and errors arguments). str(b'') is a case of the first use, not the second (and likewise %s formatting).
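A short sketch of the two uses Éric describes. The classes `OnlyRepr` and `WithStr` are hypothetical, defined only to show that str() uses __str__ and that object.__str__ falls back to __repr__:

```python
class OnlyRepr:
    """Defines only __repr__, so str() falls back to it via object.__str__."""
    def __repr__(self):
        return "OnlyRepr()"

class WithStr(OnlyRepr):
    """Overrides __str__, so str() uses it instead of __repr__."""
    def __str__(self):
        return "a friendly string"

# Use 1: convert any object to str (via __str__, falling back to __repr__).
print(str(OnlyRepr()))
print(str(WithStr()))
print(str(b"abc"))    # for bytes, __str__ returns the repr
print("%s" % b"abc")  # %s formatting is the same first use

# Use 2: decode a bytes-like object (encoding and/or errors given).
print(str(b"abc", "ascii"))
```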

terryjreedy · 2011-12-12T22:41:13Z

I think Eric's suggestion is the proper approach.

cjerdonek · 2012-10-12T03:17:49Z

This may have been addressed to some extent by bpo-14783:

http://hg.python.org/cpython/rev/3773c98d9da8

cjerdonek · 2012-10-13T21:14:29Z

Attaching a proposed patch along the lines suggested by Éric.

ezio-melotti · 2012-10-15T11:05:10Z

Instead of documenting what *encoding* and *errors* do, I would just say that str(bytesobj, encoding, errors) is equivalent to bytesobj.decode(encoding, errors) (assuming it really is). I don't like encodings/decodings done via the str/bytes constructors, and I think the docs should encourage the use of bytes.decode/str.encode.
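A quick check of the suggested equivalence, comparing both forms on the same input under a few error handlers. This only checks observable behaviour, not the underlying implementation; `data` is an arbitrary example input.

```python
data = b"caf\xc3\xa9 \xff"  # valid UTF-8 for "café ", followed by an invalid byte

for errors in ("replace", "ignore"):
    via_str = str(data, "utf-8", errors)
    via_decode = data.decode("utf-8", errors)
    print(errors, via_str == via_decode, ascii(via_str))

# With the default errors='strict', both forms raise UnicodeDecodeError.
for convert in (lambda: str(data, "utf-8"), lambda: data.decode("utf-8")):
    try:
        convert()
    except UnicodeDecodeError as exc:
        print("strict:", exc.reason)
```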

cjerdonek · 2012-10-15T17:27:03Z

I would just say that str(bytesobj, encoding, errors) is equivalent to bytesobj.decode(encoding, errors) (assuming it really is).

Good suggestion. And yes, code is shared in the following way:

http://hg.python.org/cpython/file/d3c7ebdc71bb/Objects/bytesobject.c#l2306

One thing the str() version would need to address is the case where bytesobj is a PEP 3118 character buffer: there, the underlying bytes are obtained first, after which it falls back to bytesobj.decode(encoding, errors). I will update the patch so people can see how it looks.

pitrou · 2012-10-15T17:29:57Z

Indeed:

>>> m = memoryview(b"")
>>> str(m, "utf-8")
''
>>> m.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'memoryview' object has no attribute 'decode'
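Since memoryview has no decode(), a decode-based equivalent has to copy the bytes out of the buffer first. A minimal sketch; `decode_buffer` is a hypothetical helper, not part of the standard library:

```python
def decode_buffer(obj, encoding="utf-8", errors="strict"):
    """Hypothetical helper: decode a bytes-like object like str(obj, encoding, errors)."""
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode(encoding, errors)
    # memoryview and other buffer objects have no .decode(), so copy the
    # underlying bytes out of the buffer first.
    return bytes(memoryview(obj)).decode(encoding, errors)

m = memoryview(b"foo")
print(decode_buffer(m) == str(m, "utf-8"))
print(m.tobytes().decode("utf-8"))  # memoryview.tobytes() is another way to get the bytes
```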

pitrou · 2012-10-15T17:30:35Z

Note: "character buffer" isn't a term we use anymore (in Python 3, that is).

cjerdonek · 2012-10-16T07:11:05Z

Attaching updated patch based on Ezio and Antoine's comments. Let me know if I'm not using the correct or preferred terminology around buffer objects and the buffer protocol.

It doesn't seem like the section on the buffer protocol actually says what objects implementing the buffer protocol should be called. I gather indirectly from the docs that such objects are called "buffer objects" (as opposed to just "buffers"):

http://docs.python.org/dev/c-api/buffer.html#bufferobjects

cjerdonek · 2012-10-16T07:22:42Z

Reattaching patch (a line was missing).

ezio-melotti · 2012-10-16T09:58:25Z

+ str(bytes, encoding[, errors='strict'])
+ str(bytes, errors[, encoding='utf-8'])

Why not simply str(bytes, encoding='utf-8', errors='strict')? (Your signature suggests that str(b'abc', 'strict') should work.)
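To illustrate the point about the bracketed signature: positionally, the second argument is always the encoding, so 'strict' is looked up as a codec name. Passing errors by keyword works, with encoding then defaulting to 'utf-8'. A short sketch:

```python
# Positional: the second argument is the encoding, so this fails the codec lookup.
try:
    str(b"abc", "strict")
except LookupError as exc:
    print("LookupError:", exc)

# Keywords make the intent explicit; encoding defaults to 'utf-8' when only errors is given.
print(str(b"abc", errors="strict"))
print(str(b"abc", encoding="utf-8", errors="strict"))
```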

+ the string itself. This behavior differs from :func:`repr` in that the

I'm not sure this is the right place to explain the differences between __str__ and __repr__ (or maybe it is?). Also, doesn't str() fall back on __repr__ if __str__ is missing? Does :meth:`__str__` link to object.__str__?

+ If *encoding* or *errors* is given,

and/or

+ (or :class:`bytearray`) object, then :func:`str` calls

I would use 'is equivalent to', rather than 'calls'.

+ :meth:`bytes.decode(encoding, errors) <bytes.decode>` on the object
+ and returns the value. Otherwise, the bytes object underlying the buffer
+ object is obtained before calling :meth:`bytes.decode() <bytes.decode>`.

:meth:`bytes.decode` should be enough.

+ Passing a :func:`bytes <bytes>`

:func:`bytes` should be enough (if it isn't, maybe you want :func:`.bytes`).

cjerdonek · 2012-11-10T03:35:17Z

New patch incorporating Ezio's suggestions, along with some other changes.

cjerdonek · 2012-11-19T08:26:59Z

Updating patch after Ezio's review on Rietveld.

cjerdonek · 2012-11-20T02:16:11Z

Attaching new patch to address Ezio's further comments (for the convenience of comparing in Rietveld). I will be committing this.

merwok · 2012-11-20T04:13:17Z

I left a few remarks. The patch is very nice, thanks!

cjerdonek · 2012-11-20T04:44:37Z

Thanks, Éric! (And thanks also to Ezio who helped quite a bit with the improvements.) I replied to your comments on Rietveld.

python-dev · 2012-11-21T01:56:12Z

New changeset f32f1cb508ad by Chris Jerdonek in branch '3.2':
Improve str() and object.__str__() documentation (issue bpo-13538).
http://hg.python.org/cpython/rev/f32f1cb508ad

New changeset 6630a1c42204 by Chris Jerdonek in branch '3.3':
Null merge from 3.2 (issue bpo-13538).
http://hg.python.org/cpython/rev/6630a1c42204

New changeset 325f80d792b9 by Chris Jerdonek in branch '3.3':
Improve str() and object.__str__() documentation (issue bpo-13538).
http://hg.python.org/cpython/rev/325f80d792b9

New changeset 59acd5cac8b5 by Chris Jerdonek in branch 'default':
Merge from 3.3: Improve str() and object.__str__() docs (issue bpo-13538).
http://hg.python.org/cpython/rev/59acd5cac8b5

python-dev · 2012-11-21T13:38:32Z

New changeset 5c39e3906ce9 by Chris Jerdonek in branch '3.2':
Fix label in docs (from issue bpo-13538).
http://hg.python.org/cpython/rev/5c39e3906ce9

GuillaumeBouchard mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Dec 6, 2011

bitdancer added the docs (Documentation in the Doc dir) label and removed the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Dec 6, 2011

bitdancer assigned docs@python Dec 6, 2011

merwok changed the title from "Docstring of str() and/or behavior" to "Improve doc for str(bytesobject)" Dec 10, 2011

ezio-melotti added the easy and type-feature (A feature request or enhancement) labels Jul 25, 2012

cjerdonek closed this as completed Nov 21, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022
