Issue 32078: string result of str(bytes()) in Python3

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76259

classification

Title:	string result of str(bytes()) in Python3
Type:	behavior	Stage:	resolved
Components:	Unicode	Versions:	Python 3.8, Python 3.7, Python 3.6

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	acue, ezio.melotti, vstinner
Priority:	normal	Keywords:

Created on 2017-11-20 01:26 by acue, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description
source_and_output.tar.gz	acue, 2017-11-20 01:26	example source with output for 2.7 and 3.6
howto_bytes_005.py	acue, 2017-11-20 05:37	reduced example
howto_bytes_005typo.py	acue, 2017-11-20 05:46	typo'fix

Messages (6)
msg306521 - (view)	Author: Arno-Can Uestuensoez (acue)	Date: 2017-11-20 01:26
Hello,

I am currently writing some dual-version libraries and have to deal with str/unicode. The attached code example contains the str/unicode handling.

Python 3.6.2 does not behave as I expected for any of the following conversions:

    unicode = str  # @ReservedAssignment  # it is intentional

    mystring = "abc"
    u0 = unicode(bytes(mystring.encode()))  # == str(mystring)

    mystring = "abc"
    u0 = unicode(bytes(mystring.encode('utf-8')))  # == str(mystring)

    mystring = "abc"
    u0 = unicode(bytes(mystring.encode('ascii')))  # == str(mystring)

    mystring = b"abc"
    u0 = unicode(mystring)  # == str(mystring)

Each of these gives the following result in Python 3:

    type: <class 'str'>
    len:  6
    b'abc'

while in Python 2:

    type: <type 'unicode'>
    len:  3
    abc

I am not sure whether this is the intended behavior, because the manual could possibly be misinterpreted:

    4.8.1. Bytes Objects

    Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.

    class bytes([source[, encoding[, errors]]])

    Firstly, the syntax for bytes literals is largely the same as that for string literals, except that a b prefix is added:

I expected the 'b' prefix to be added to the input only, and I expected the output without a type prefix, because it is just an attribute/property.

The result for Python 3 should be similar to Python 2:

    type: <type 'str'>
    len:  3
    abc

Regards
Arno
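The difference reported above can be reproduced in a few lines. This is a minimal sketch for Python 3 only; the variable names are illustrative and are not taken from the attached files. It assumes the interpreter is started without the -b/-bb options.

```python
# Python 3 only: compare str() on bytes with an explicit decode.
data = "abc".encode("utf-8")      # b'abc', a bytes object

as_repr = str(data)               # no encoding given: same result as repr(data)
as_text = data.decode("utf-8")    # explicit decoding back to text

print(type(as_repr), len(as_repr), as_repr)
print(type(as_text), len(as_text), as_text)
```

The first line of output shows the six-character string including the b prefix and the quotes; the second shows the three-character text that the report expected.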
msg306527 - (view)	Author: Arno-Can Uestuensoez (acue)	Date: 2017-11-20 05:37
Hello,

the following reduced example probably shows the issue a little better. I do not have a 3.7+ environment yet, but I expect the same behavior.

Regards
Arno
msg306528 - (view)	Author: Arno-Can Uestuensoez (acue)	Date: 2017-11-20 05:46
Sorry for the typo.
msg306567 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-11-20 17:09
Calling str(bytes) is wrong in Python 3:

    $ python3 -bb
    Python 3.6.2 (default, Oct 2 2017, 16:51:32)
    >>> str(b'abc')
    BytesWarning: str() on a bytes instance

Just don't do that :-)

Use repr(bytes) if you want the b'...' format:

    >>> repr(b'abc')
    "b'abc'"
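The -bb check can also be scripted, for example in a test suite. The sketch below is only one possible way to do it: run_with_bb is a hypothetical helper name, and it starts a child interpreter with -bb because the option has to be given at interpreter start-up.

```python
import subprocess
import sys


def run_with_bb(snippet):
    """Run `snippet` in a child interpreter started with -bb (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


# str() on bytes without an encoding is turned into an error by -bb.
bad = run_with_bb("str(b'abc')")
# repr() on bytes is always fine.
good = run_with_bb("print(repr(b'abc'))")

print(bad.returncode, bad.stderr.strip())
print(good.returncode, good.stdout.strip())
```

Running a project's tests under -bb in this way is a cheap means of finding accidental str(bytes) calls in code that has to work on both Python 2 and Python 3.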
msg306617 - (view)	Author: Arno-Can Uestuensoez (acue)	Date: 2017-11-21 02:54
I got your point; I missed it before, sorry.

So, just for completeness: my issue was basically about the ambiguity between the str() constructor and the str() built-in function, hence the len/type prints. It works with parameters:

    (3.6.2) [acue@lap001 Desktop]$ python -bb
    Python 3.6.2 (default, Jul 29 2017, 14:24:56)
    [GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> str(b"abc")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BytesWarning: str() on a bytes instance
    >>>
    >>>
    >>> str(b"abc", "utf-8")
    'abc'
    >>>
    >>> type(str(b"abc",'utf-8'))
    <class 'str'>
    >>>

Is there a common approach to force the use of the str() constructor instead of the str() built-in function and/or the __str__() method? This would make shared Python 2/Python 3 code much easier, at least for unicode() -> str().
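One common pattern for the question above is a small helper that decodes explicitly rather than relying on str() or unicode(). The following is a sketch, not an official recommendation from this thread; the names PY2, text_type and to_text are hypothetical, and UTF-8 is an assumed default encoding.

```python
import sys

PY2 = sys.version_info[0] == 2
# On Python 3 the name `unicode` is never evaluated, so this line is safe there.
text_type = unicode if PY2 else str  # noqa: F821


def to_text(value, encoding="utf-8", errors="strict"):
    """Return `value` as text, decoding it first if it is a bytes-like object."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, (bytes, bytearray)):
        # On Python 2, `bytes` is `str`, so native byte strings are decoded too.
        return value.decode(encoding, errors)
    # Anything else (numbers, other objects) goes through the text constructor.
    return text_type(value)


print(to_text(b"abc"), len(to_text(b"abc")))
print(to_text("abc"), to_text(42))
```

With such a helper, to_text(b"abc") yields the three-character text on both major versions, and the code does not trigger BytesWarning under -bb because str() is never called on a bytes object.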
msg306622 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-11-21 03:05
I'm sorry, but the bug tracker is not the right place to ask such a question. There are plenty of resources on the Internet explaining how to write code compatible with both Python 2 and Python 3.

History
Date	User	Action	Args
2022-04-11 14:58:54	admin	set	github: 76259
2017-11-21 03:05:57	vstinner	set	messages: + msg306622
2017-11-21 02:54:03	acue	set	messages: + msg306617
2017-11-20 17:09:57	vstinner	set	status: open -> closed resolution: not a bug messages: + msg306567 stage: resolved
2017-11-20 05:46:44	acue	set	files: + howto_bytes_005typo.py messages: + msg306528
2017-11-20 05:37:48	acue	set	files: + howto_bytes_005.py messages: + msg306527 versions: + Python 3.7, Python 3.8
2017-11-20 01:26:16	acue	create