Message259330
> PAYLOAD.decode('utf8') passes in P2.7.* and fails in P3.4
Well, the Python 2 decoder didn't respect the Unicode standard. Please see:
http://unicodebook.readthedocs.org/issues.html#non-strict-utf-8-decoder-overlong-byte-sequences-and-surrogates
Python 3 is now stricter. You can still decode surrogate characters if you need them *for a good reason* using:
>>> b'\xed\xa0\x80'.decode('utf-8', 'surrogatepass')
'\ud800'
By the way, there is also:
>>> b'\xed\xa0\x80'.decode('utf-8', 'surrogateescape')
'\udced\udca0\udc80'
which is very different but may also help.
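To make the difference concrete, here is a minimal sketch of the three behaviours in Python 3. The variable names (data, strict_ok, passed, escaped) are only illustrative and are not from the original report. 'surrogatepass' decodes the three bytes to the single lone surrogate they spell out, while 'surrogateescape' maps each undecodable byte to its own lone surrogate so the original bytes can be recovered on encoding.

```python
# b'\xed\xa0\x80' is what U+D800 (a lone surrogate) looks like when it is
# encoded as if it were an ordinary code point; strict UTF-8 forbids it.
data = b'\xed\xa0\x80'

# Default (strict) decoding in Python 3 rejects the sequence.
try:
    data.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'surrogatepass' accepts the sequence and yields one lone surrogate.
# The same handler is needed to encode it back to bytes.
passed = data.decode('utf-8', 'surrogatepass')
passed_roundtrip = passed.encode('utf-8', 'surrogatepass')

# 'surrogateescape' turns each undecodable byte 0xXX into U+DCXX,
# so three bytes become three characters, and encoding restores the bytes.
escaped = data.decode('utf-8', 'surrogateescape')
escaped_roundtrip = escaped.encode('utf-8', 'surrogateescape')
```

Both handlers round-trip back to the original bytes, but only if the same handler is passed to encode(); a plain encode('utf-8') of either result raises UnicodeEncodeError.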
I suggest closing the issue as NOT A BUG.
Date | User | Action | Args
2016-02-01 16:54:25 | vstinner | set | recipients: + vstinner, ezio.melotti, jinz
2016-02-01 16:54:25 | vstinner | set | messageid: <1454345665.7.0.128274358036.issue26260@psf.upfronthosting.co.za>
2016-02-01 16:54:25 | vstinner | link | issue26260 messages
2016-02-01 16:54:25 | vstinner | create |