Message241800
In Python 2, the unicode() constructor does not accept byte strings containing non-ASCII data unless an encoding argument is given:
>>> unicode(u'abcäöü'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
In Python 3, the str() constructor masks this programming error by returning the repr() of the bytes object:
>>> str('abcäöü'.encode('utf-8'))
"b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"
I think it would be more helpful to raise an error here, pointing the programmer to the encoding argument that is most likely missing.
Also note that you get a different output with the encoding argument set:
>>> str('abcäöü'.encode('utf-8'), 'utf-8')
'abcäöü'
I know this is documented, but it is still not very helpful and can easily hide errors.
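
As a minimal sketch of the proposed behavior, the following wraps str() in a helper that refuses bytes-like input when no encoding is given. The name strict_str and the wording of the error message are assumptions made for illustration; nothing like this exists in Python itself.

```python
def strict_str(obj, encoding=None, errors="strict"):
    """Hypothetical helper: like str(), but rejects bytes without an encoding."""
    if isinstance(obj, (bytes, bytearray)):
        if encoding is None:
            # Raise instead of silently returning the repr() of the bytes object.
            raise TypeError(
                "bytes-like object passed without an encoding; "
                "did you mean strict_str(obj, 'utf-8')?"
            )
        # With an encoding, str() decodes the bytes as usual.
        return str(obj, encoding, errors)
    # Non-bytes objects keep the normal str() behavior.
    return str(obj)


data = 'abcäöü'.encode('utf-8')

# Decoding works when the encoding is given explicitly.
decoded = strict_str(data, 'utf-8')

# Without an encoding, the mistake surfaces as an exception.
try:
    strict_str(data)
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None
```

With this helper the missing encoding shows up at the call site instead of leaking a "b'...'" string into later output.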
Date | User | Action | Args
2015-04-22 13:23:32 | lemburg | set | recipients: + lemburg, vstinner, ezio.melotti
2015-04-22 13:23:32 | lemburg | set | messageid: <1429709012.06.0.232941400287.issue24025@psf.upfronthosting.co.za>
2015-04-22 13:23:32 | lemburg | link | issue24025 messages
2015-04-22 13:23:31 | lemburg | create |