
Author serhiy.storchaka
Recipients serhiy.storchaka
Date 2012-05-18.14:46:35
Message-id <1337352396.54.0.468921009595.issue14850@psf.upfronthosting.co.za>
In-reply-to
Content
codecs.charmap_decode behaves differently depending on whether the decoding table is an exact str or an instance of a str subclass.

>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
... 
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)

This happens because the charmap decoder (the PyUnicode_DecodeCharmap function in Objects/unicodeobject.c) uses one algorithm for exact str tables and a different one for every other kind of mapping.

Do we need to fix this? If so, what should `codecs.charmap_decode(b'\x00', 'replace', {0:'\uFFFE'})` return? And what should `codecs.charmap_decode(b'\x00', 'replace', {0:0xFFFE})` return?
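
For comparing the four table kinds mentioned in this message side by side, a minimal sketch follows. The helper name `decode_with_tables` and the labels are made up for illustration. Only the exact-str result is taken from the session above; what the other three rows print may depend on the Python version, so no output is shown for them.

```python
import codecs


class S(str):
    """str subclass, mirroring the one used in the session above."""


def decode_with_tables(data=b'\x00', errors='replace'):
    """Decode `data` with each kind of table that maps byte 0 to U+FFFE.

    Hypothetical helper for illustration; returns {label: (text, consumed)}.
    """
    tables = {
        'exact str': '\uFFFE',
        'str subclass': S('\uFFFE'),
        'dict of str': {0: '\uFFFE'},
        'dict of int': {0: 0xFFFE},
    }
    return {
        label: codecs.charmap_decode(data, errors, table)
        for label, table in tables.items()
    }


if __name__ == '__main__':
    # Print one row per table kind so any divergence is easy to spot.
    for label, result in decode_with_tables().items():
        print(f'{label:13} {ascii(result)}')
```

If all four rows agree, the decoder treats U+FFFE the same way regardless of the mapping type; if they differ, the two code paths still diverge.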
History
Date User Action Args
2012-05-18 14:46:36 serhiy.storchaka set recipients: + serhiy.storchaka
2012-05-18 14:46:36 serhiy.storchaka set messageid: <1337352396.54.0.468921009595.issue14850@psf.upfronthosting.co.za>
2012-05-18 14:46:35 serhiy.storchaka link issue14850 messages
2012-05-18 14:46:35 serhiy.storchaka create