classification
Title: string.format(bytes) raise warning
Type:
Stage:
Components: Interpreter Core
Versions: Python 3.5
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: marco.sulla, vstinner
Priority: normal
Keywords:

Created on 2016-03-14 10:29 by marco.sulla, last changed 2016-03-14 14:33 by vstinner. This issue is now closed.

Messages (8)
msg261739 - Author: Marco Sulla (marco.sulla) Date: 2016-03-14 10:29
Steps to reproduce

1. create a format_bytes.py with:

"Hello {}".format(b"World")

2. launch it with
python3 -bb format_bytes.py

Result:

Traceback (most recent call last):
  File "format_bytes.py", line 1, in <module>
    "Hello {}".format(b"World")
BytesWarning: str() on a bytes instance



Expected:

No warning
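
The reproduction above can also be driven from a single script. This is a minimal sketch, assuming a Python 3 interpreter that still accepts the `-bb` option; the helper name `run_with_bb` is hypothetical and not part of the report.

```python
import subprocess
import sys


def run_with_bb(snippet):
    """Run `snippet` in a child interpreter started with -bb.

    Returns (returncode, stderr). `run_with_bb` is an illustrative helper,
    not something from the original report.
    """
    proc = subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


# Same statement as in format_bytes.py from the report.
code, err = run_with_bb('"Hello {}".format(b"World")')

print(code)  # non-zero: -bb turns the BytesWarning into an error
print(err.strip().splitlines()[-1])  # last traceback line names BytesWarning
```

With a single `-b` the interpreter only prints the warning; `-bb` escalates it to an exception, which is why the script in the report ends with a traceback.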
msg261740 - Author: Marco Sulla (marco.sulla) Date: 2016-03-14 10:31
To clarify: I do not want to suppress the warning; I would like the format minilanguage to convert bytes to string properly.
msg261742 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-14 10:35
> I would like the format minilanguage to convert bytes to string properly.

Sorry, nope, Python 3 doesn't guess the encoding of byte strings anymore. You have to decode manually. Example:

"Hello {}".format(b"World".decode('ascii'))

Or format to bytes. Note that bytes has no format() method; %-formatting of bytes is available since Python 3.5:

b"Hello %s" % b"World"

It's not a bug. It's a feature.
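
The two options in this message can be sketched side by side. The variable names are illustrative; the bytes variant uses %-interpolation because, as far as I know, Python 3 `bytes` objects have no `format()` method and %-formatting of bytes was added in Python 3.5 (PEP 461).

```python
raw = b"World"

# Option 1: decode explicitly, naming the encoding, then format as text.
text = "Hello {}".format(raw.decode("ascii"))

# Option 2: stay in bytes end to end. %-interpolation on bytes requires
# Python 3.5 or later.
data = b"Hello %s" % raw

print(text)  # Hello World
print(data)  # b'Hello World'
```

Either way, the caller states which side of the text/bytes boundary the result lives on, so nothing has to be guessed.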
msg261743 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-14 10:38
More about Unicode:

* https://docs.python.org/dev/howto/unicode.html
* http://unicodebook.readthedocs.org/
* etc.
msg261751 - Author: Marco Sulla (marco.sulla) Date: 2016-03-14 13:19
> Python 3 doesn't guess the encoding of byte strings anymore

And I agree, but I think format minilanguage could convert it by default to utf8, and if something goes wrong raise an error (or try str()). Simpler to use and robust at the same time.

My 2 cents.
msg261752 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-14 13:20
>> Python 3 doesn't guess the encoding of byte strings anymore

> And I agree, but I think format minilanguage could convert it by default to utf8, ..

Using utf8 means guessing the encoding of a byte string. Python 3 doesn't do that anymore; there are no exceptions to this rule.
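
To illustrate why a default of utf8 would still be a guess: the same bytes decode without error under more than one encoding, and give different text. A small sketch with an illustrative payload:

```python
payload = b"h\xc3\xa9llo"

# Both decodes succeed, but only one matches what the sender meant.
as_utf8 = payload.decode("utf-8")
as_latin1 = payload.decode("latin-1")

print(as_utf8)    # héllo
print(as_latin1)  # hÃ©llo

# Conversely, valid Latin-1 data is not necessarily valid UTF-8.
try:
    b"h\xe9llo".decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(utf8_failed)  # True
```

Only the producer of the bytes knows which result is correct, which is the argument for making the caller pass the encoding explicitly.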
msg261753 - Author: Marco Sulla (marco.sulla) Date: 2016-03-14 13:31
> Using utf8 means guessing the encoding

Well, isn't that what format() is doing now, using str()? :)
msg261755 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-14 14:33
> Well, isn't that what format() is doing now, using str()? :)

Hum, are you sure that you tried Python 3, and not Python 2?

str(bytes) on Python 3 is well defined:

>>> print(str(b'hello'))
b'hello'
>>> print(str('h\xe9llo'.encode('utf8')))
b'h\xc3\xa9llo'

I'm not sure that you expect the b'...' format. Non-ASCII bytes are escaped using the \xHH format.
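
A short sketch of what this means for str.format(): a bytes argument is rendered through str(), which for bytes is the same as repr(), so no decoding takes place. The local warning filter is an assumption added so the snippet also runs if the interpreter was started with -b or -bb.

```python
import warnings

raw = "h\xe9llo".encode("utf8")

with warnings.catch_warnings():
    # Silence BytesWarning locally in case -b / -bb is active.
    warnings.simplefilter("ignore", BytesWarning)
    via_str = str(raw)
    implicit = "Hello {}".format(raw)

# repr() never triggers BytesWarning, so !r is the explicit way to ask
# for the b'...' form when that is really what is wanted.
explicit = "Hello {!r}".format(raw)

# Decoding with a stated encoding is the way to get the actual text.
decoded = "Hello {}".format(raw.decode("utf8"))

print(via_str == repr(raw))   # True
print(implicit == explicit)   # True
print(decoded)                # Hello héllo
```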
History
Date                 User         Action  Args
2016-03-14 14:33:28  vstinner     set     messages: + msg261755
2016-03-14 13:31:51  marco.sulla  set     messages: + msg261753
2016-03-14 13:20:39  vstinner     set     messages: + msg261752
2016-03-14 13:19:30  marco.sulla  set     messages: + msg261751
2016-03-14 10:38:23  vstinner     set     messages: + msg261743
2016-03-14 10:35:51  vstinner     set     status: open -> closed; nosy: + vstinner; messages: + msg261742; resolution: not a bug
2016-03-14 10:31:13  marco.sulla  set     messages: + msg261740
2016-03-14 10:29:53  marco.sulla  create