classification
Title: string result of str(bytes()) in Python3
Type: behavior Stage: resolved
Components: Unicode Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: acue, ezio.melotti, vstinner
Priority: normal Keywords:

Created on 2017-11-20 01:26 by acue, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                  Uploaded                 Description
source_and_output.tar.gz   acue, 2017-11-20 01:26   example source with output for 2.7 and 3.6
howto_bytes_005.py         acue, 2017-11-20 05:37   reduced example
howto_bytes_005typo.py     acue, 2017-11-20 05:46   typo fix
Messages (6)
msg306521 - Author: Arno-Can Uestuensoez (acue) Date: 2017-11-20 01:26
Hello,
I am currently writing some dual-version libraries and have
to deal with str/unicode.
The attached code example contains the str/unicode handling.

Python 3.6.2 does not behave as I expected for any of the
following conversions:

  unicode = str  # @ReservedAssignment # it is intentional


  mystring = "abc"
  u0 = unicode(bytes(mystring.encode()))  # == str(mystring)

  mystring = "abc"
  u0 = unicode(bytes(mystring.encode('utf-8')))  # == str(mystring)

  mystring = "abc"
  u0 = unicode(bytes(mystring.encode('ascii')))  # == str(mystring)

  mystring = b"abc"
  u0 = unicode(mystring)  # == str(mystring)

In Python 3, each of these results in:

  type: <class 'str'>
  len:  6
  b'abc'

while in Python 2 the result is:

  type: <type 'unicode'>
  len:  3
  abc
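A minimal Python 3 script reproducing the four conversions above, assuming the interpreter runs without the -b/-bb flags (with -b, str() on bytes also emits a BytesWarning). In every case the one-argument str() returns the repr of the bytes object, which is why the length is 6 rather than 3:

```python
# Reproduces the four conversions from the report on Python 3.
unicode = str  # alias used in the report for dual-version code

conversions = [
    unicode(bytes("abc".encode())),
    unicode(bytes("abc".encode("utf-8"))),
    unicode(bytes("abc".encode("ascii"))),
    unicode(b"abc"),
]

for u0 in conversions:
    # Same print format as in the report: type, length, value.
    print("type:", type(u0))
    print("len: ", len(u0))
    print(u0)
```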

I am not sure whether this is the intended behavior, because the manual
could possibly be misinterpreted:


  4.8.1. Bytes Objects

  Bytes objects are immutable sequences of single bytes. 
  Since many major binary protocols are based on the ASCII text 
  encoding, bytes objects offer several methods that are only 
  valid when working with ASCII compatible data and are closely
  related to string objects in a variety of other ways.

    class bytes([source[, encoding[, errors]]])

  Firstly, the syntax for bytes literals is largely the same as 
  that for string literals, except that a b prefix is added:

I expected the b prefix to apply only to the input (the literal),
and I expected the output without a type prefix, because the
prefix is just an attribute/property of the value.

I expected the Python 3 result to be similar to the Python 2 one:

  type: <class 'str'>
  len:  3
  abc
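A sketch of how this Python 2-like result can be obtained on Python 3 by decoding explicitly; UTF-8 is assumed here because it matches the encode() calls in the report:

```python
data = b"abc"

# Explicit bytes -> text conversion; both spellings decode the bytes.
u0 = data.decode("utf-8")
u1 = str(data, "utf-8")

print("type:", type(u0))
print("len: ", len(u0))
print(u0)
```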

Regards
Arno
msg306527 - Author: Arno-Can Uestuensoez (acue) Date: 2017-11-20 05:37
Hello,
the attached reduced example probably shows the issue a
little better.

I do not have a Python 3.7+ environment yet, but I expect
the same behavior there.


Regards
Arno
msg306528 - Author: Arno-Can Uestuensoez (acue) Date: 2017-11-20 05:46
Sorry for the typo.
msg306567 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-11-20 17:09
Calling str(bytes) is wrong in Python 3:

$ python3 -bb
Python 3.6.2 (default, Oct  2 2017, 16:51:32) 
>>> str(b'abc')
BytesWarning: str() on a bytes instance

Just don't do that :-)
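A small sketch, assuming Python 3.7+ (for subprocess.run(capture_output=...)), that runs a child interpreter with -bb so the BytesWarning shown above surfaces as an error:

```python
import subprocess
import sys

# -bb turns the BytesWarning raised by str(bytes) into an exception.
cmd = [sys.executable, "-bb", "-c", "str(b'abc')"]
result = subprocess.run(cmd, capture_output=True, text=True)

print("exit code:", result.returncode)          # non-zero: the child failed
print(result.stderr.strip().splitlines()[-1])   # last line of the traceback
```

Running the same child without -b exits with status 0, because str() then silently returns the repr "b'abc'".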

Use repr(bytes) if you want the b'...' format:

>>> repr(b'abc')
"b'abc'"
msg306617 - Author: Arno-Can Uestuensoez (acue) Date: 2017-11-21 02:54
I see your point now; I missed it before, sorry.
So, just for completeness:

My issue was basically about the ambiguity between the str() constructor
and the str() built-in function; that is why I printed len() and type().
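In Python 3 the "str() constructor" and the "str() built-in function" are the same object, the str type; what happens depends only on the arguments passed. A minimal sketch (Python 3, run without -b) of the two call forms:

```python
# str is a single type object, not a separate function.
assert isinstance(str, type)

data = b"abc"

one_arg = str(data)                    # one argument: delegates to __str__()
via_dunder = type(data).__str__(data)  # the same call, made explicitly
decoded = str(data, "utf-8")           # encoding given: decodes the bytes

print(one_arg, len(one_arg))
print(via_dunder == one_arg)
print(decoded, len(decoded))
```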

It works when an encoding is passed as a second argument:

(3.6.2) [acue@lap001 Desktop]$ python -bb
Python 3.6.2 (default, Jul 29 2017, 14:24:56) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> str(b"abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: str() on a bytes instance
>>> 
>>> 
>>> str(b"abc", "utf-8")
'abc'
>>> 
>>> type(str(b"abc",'utf-8'))
<class 'str'>
>>> 
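The two-argument form shown above also accepts an errors argument, just like bytes.decode(). A sketch with an illustrative byte string (the sample data is made up for this example):

```python
raw = b"caf\xc3\xa9 \xff"  # valid UTF-8 for "café ", then one invalid byte

# The str() constructor with an encoding is equivalent to bytes.decode().
assert str(raw, "utf-8", "replace") == raw.decode("utf-8", "replace")

print(str(raw, "utf-8", "replace"))  # invalid byte becomes U+FFFD
print(str(raw, "utf-8", "ignore"))   # invalid byte is dropped

try:
    str(raw, "utf-8")                # default errors="strict"
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason
print("strict:", strict_reason)
```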

Is there a common approach to force the use of the str() constructor
(the decoding form) instead of the str() built-in function and/or the
__str__() method?

This would make shared Python 2/Python 3 code much simpler,
at least for the unicode() -> str() conversion.
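One common pattern is to decode bytes explicitly instead of relying on the one-argument str()/unicode() call. A minimal dual-version sketch; the names text_type, binary_type and to_text are hypothetical (the third-party six package offers comparable helpers such as six.text_type and six.ensure_text):

```python
import sys

# Hypothetical compatibility aliases: pick the text and binary types
# for the running interpreter.
if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)
    binary_type = str


def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as text: decode bytes explicitly, pass text through."""
    if isinstance(value, binary_type):
        # Explicit decode; on Python 3, str(value) would give "b'...'".
        return value.decode(encoding, errors)
    if isinstance(value, text_type):
        return value
    # Anything else (numbers, objects): one-argument text conversion.
    return text_type(value)
```

On Python 3, to_text(b"abc") returns 'abc' with length 3, matching the Python 2 result in the report.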
msg306622 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-11-21 03:05
I'm sorry but the bug tracker is not the right place to ask such question. There are plenty of resources on the Internet explaining how to write code compatible with Python 2 and Python 3.
History
Date                 User      Action  Args
2022-04-11 14:58:54  admin     set     github: 76259
2017-11-21 03:05:57  vstinner  set     messages: + msg306622
2017-11-21 02:54:03  acue      set     messages: + msg306617
2017-11-20 17:09:57  vstinner  set     status: open -> closed
                                       resolution: not a bug
                                       messages: + msg306567
                                       stage: resolved
2017-11-20 05:46:44  acue      set     files: + howto_bytes_005typo.py
                                       messages: + msg306528
2017-11-20 05:37:48  acue      set     files: + howto_bytes_005.py
                                       messages: + msg306527
                                       versions: + Python 3.7, Python 3.8
2017-11-20 01:26:16  acue      create