Message 306521 - Python tracker

Author	acue
Recipients	acue, ezio.melotti, vstinner
Date	2017-11-20.01:26:13
Message-id	<1511141176.94.0.213398074469.issue32078@psf.upfronthosting.co.za>
In-reply-to

Content

Hello,
I am currently writing some dual-version libraries and have
to deal with str/unicode.
The attached code example contains the str/unicode handling.

The Python 3.6.2 release does not behave as I expected for
any of the following conversions:

  unicode = str  # @ReservedAssignment # it is intentional


  mystring = "abc"
  u0 = unicode(bytes(mystring.encode()))  # == str(mystring)

  mystring = "abc"
  u0 = unicode(bytes(mystring.encode('utf-8')))  # == str(mystring)

  mystring = "abc"
  u0 = unicode(bytes(mystring.encode('ascii')))  # == str(mystring)

  mystring = b"abc"
  u0 = unicode(mystring)  # == str(mystring)

Each of these gives the following result in Python 3:

  type: <class 'str'>
  len:  6
  b'abc'

while in Python2:

  type: <type 'unicode'>
  len:  3
  abc
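
For reference, a minimal sketch of why the length is 6, assuming Python 3 only: calling str() on a bytes object without an encoding argument returns the repr() of that object, while passing an encoding (or calling .decode()) returns the decoded text. The variable names are illustrative and not taken from the attached example.

```python
# Python 3 only: str() on bytes without an encoding yields the repr().
raw = "abc".encode("utf-8")        # b'abc'

as_repr = str(raw)                 # the six characters b'abc'
as_text = str(raw, "utf-8")        # decoded text
decoded = raw.decode("utf-8")      # equivalent decoding call

print(type(as_repr), len(as_repr), as_repr)
print(type(as_text), len(as_text), as_text)
```

The second form matches the Python 2 result shown above in length and content.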

I am not sure whether this is the intended behavior, because the manual
could possibly be misinterpreted:


  4.8.1. Bytes Objects

  Bytes objects are immutable sequences of single bytes. 
  Since many major binary protocols are based on the ASCII text 
  encoding, bytes objects offer several methods that are only 
  valid when working with ASCII compatible data and are closely
  related to string objects in a variety of other ways.

    class bytes([source[, encoding[, errors]]])

  Firstly, the syntax for bytes literals is largely the same as 
  that for string literals, except that a b prefix is added:

I expected the 'b' prefix to be added to the input only, and I
expected the output without a type prefix, because it is just an
attribute/property.

The result for Python3 should be similar to Python2:

  type: <type 'str'>
  len:  3
  abc
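
One possible way to get this result in a dual-version library is sketched below. It decodes bytes explicitly instead of passing them to str()/unicode(). The names text_type and to_text are hypothetical helpers introduced only for illustration, and the UTF-8 default is an assumption.

```python
import sys

# Hypothetical alias: the text type of the running interpreter.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only defined on Python 2)


def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as text, decoding bytes-like input explicitly."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, (bytes, bytearray)):
        # On Python 2, bytes is an alias of str, so this branch also
        # covers native byte strings.
        return value.decode(encoding, errors)
    return text_type(value)


u0 = to_text(b"abc")
print(type(u0), len(u0), u0)
```

With this helper the four conversions listed earlier give a text object of length 3 on both versions, because the bytes are always decoded rather than represented.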

Regards
Arno
