
Author lemburg
Recipients ezio.melotti, lemburg, vstinner
Date 2015-04-22.13:23:31
Message-id <1429709012.06.0.232941400287.issue24025@psf.upfronthosting.co.za>
Content
In Python 2, the unicode() constructor does not accept non-ASCII bytes arguments unless an encoding argument is given, because it falls back to the default ASCII codec:

>>> unicode(u'abcäöü'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

In Python 3, the str() constructor masks this programming error by returning the repr() of the bytes object:

>>> str('abcäöü'.encode('utf-8'))
"b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"

I think it would be more helpful to raise an error that points the programmer to the encoding argument that is most probably missing.
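As a rough illustration of the proposed behavior, here is a minimal sketch of a wrapper that raises instead of returning the repr(). The name strict_str and the wording of the error message are hypothetical and only serve this example; this is not how str() is implemented.

```python
def strict_str(obj, encoding=None, errors="strict"):
    """Hypothetical helper: like str(), but refuses bytes without an encoding."""
    if isinstance(obj, (bytes, bytearray)):
        if encoding is None:
            # This is the case that str() currently masks by returning the repr().
            raise TypeError(
                "cannot convert %s to str without an encoding argument"
                % type(obj).__name__
            )
        # With an encoding, decode exactly as str(bytes, encoding, errors) does.
        return str(obj, encoding, errors)
    # All other objects keep the usual str() behavior.
    return str(obj)


data = 'abcäöü'.encode('utf-8')
print(strict_str(data, 'utf-8'))  # decodes the bytes
try:
    strict_str(data)              # missing encoding argument
except TypeError as exc:
    print("error:", exc)
```

With such a wrapper, a forgotten encoding argument fails at the call site instead of silently producing a string that starts with b'.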

Also note that you get a different output with the encoding argument set:

>>> str('abcäöü'.encode('utf-8'), 'utf-8')
'abcäöü'

I know this is documented, but it is still not very helpful and can easily hide errors.
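For reference, the interpreter can already flag this pattern on an opt-in basis: the -b command-line option issues a BytesWarning when str() is called on bytes without an encoding, and -bb turns that warning into an error. The sketch below runs a child interpreter with -bb to show the effect; the function name check_under_bb is hypothetical, and the snippet assumes Python 3.7 or later for subprocess.run(capture_output=True).

```python
import subprocess
import sys


def check_under_bb(snippet):
    """Hypothetical helper: run a snippet under 'python -bb' and report the result.

    Returns (succeeded, stderr_text).
    """
    result = subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


# str() on bytes without an encoding is rejected under -bb.
ok, err = check_under_bb("str(b'abc')")
print(ok, err.strip().splitlines()[-1] if err else "")

# The same call with an explicit encoding passes.
ok, err = check_under_bb("str(b'abc', 'utf-8')")
print(ok)
```

This only helps when the option is passed explicitly, so the default behavior described above is unchanged.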
History
Date                 User     Action  Args
2015-04-22 13:23:32  lemburg  set     recipients: + lemburg, vstinner, ezio.melotti
2015-04-22 13:23:32  lemburg  set     messageid: <1429709012.06.0.232941400287.issue24025@psf.upfronthosting.co.za>
2015-04-22 13:23:32  lemburg  link    issue24025 messages
2015-04-22 13:23:31  lemburg  create