Message161054
codecs.charmap_decode behaves differently depending on whether the decoding table is an exact str or an instance of a str subclass.
>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
...
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)
This is because the charmap decoder (function PyUnicode_DecodeCharmap in Objects/unicodeobject.c) uses different algorithms for exact str objects and for all other mapping objects.
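As a rough illustration of that split, the following sketch shows how an exact-type test separates a plain str from a str subclass. It is not the C code from Objects/unicodeobject.c; `decoder_path` and its return labels are hypothetical names used only for this example.

```python
# Illustrative only: mimics an exact-type dispatch, not the real C implementation.
class S(str):
    pass


def decoder_path(table):
    """Hypothetical helper: report which branch an exact-type check would take."""
    # type(x) is str is true only for exact str objects, never for subclasses.
    if type(table) is str:
        return "exact-str"
    # Subclasses of str, dicts and any other mapping end up here.
    return "generic"


paths = {
    "str": decoder_path("\ufffe"),
    "S": decoder_path(S("\ufffe")),
    "dict": decoder_path({0: "\ufffe"}),
}
print(paths)
```

Note that `isinstance(S('\uFFFE'), str)` is still true, so only an exact-type check tells the two apart.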
Should this be fixed? If so, what should `codecs.charmap_decode(b'\x00', 'replace', {0:'\uFFFE'})` return? And what should `codecs.charmap_decode(b'\x00', 'replace', {0:0xFFFE})` return?
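To compare all four table variants on a given interpreter, a small script along these lines may help. The `tables` and `results` names and the labels are chosen for this sketch only. No expected output is listed because the results depend on the Python version being run.

```python
import codecs


class S(str):
    pass


# The four decoding tables discussed above; each maps byte 0x00 to U+FFFE.
tables = {
    "exact str": "\ufffe",
    "str subclass": S("\ufffe"),
    "dict -> str": {0: "\ufffe"},
    "dict -> int": {0: 0xFFFE},
}

results = {}
for label, table in tables.items():
    # Same call as in the report: one input byte, 'replace' error handler.
    results[label] = codecs.charmap_decode(b"\x00", "replace", table)
    print(f"{label:13} {ascii(results[label])}")
```

If every row prints the same tuple, the interpreter treats the four tables consistently.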
Date | User | Action | Args
2012-05-18 14:46:36 | serhiy.storchaka | set | recipients: + serhiy.storchaka
2012-05-18 14:46:36 | serhiy.storchaka | set | messageid: <1337352396.54.0.468921009595.issue14850@psf.upfronthosting.co.za>
2012-05-18 14:46:35 | serhiy.storchaka | link | issue14850 messages
2012-05-18 14:46:35 | serhiy.storchaka | create |