Message 161054 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	serhiy.storchaka
Recipients	serhiy.storchaka
Date	2012-05-18.14:46:35
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1337352396.54.0.468921009595.issue14850@psf.upfronthosting.co.za>
In-reply-to

Content

codecs.charmap_decode behaves differently depending on whether the decode table is an exact str or an instance of a user-defined str subclass.

>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
... 
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)

This is because the charmap decoder (function PyUnicode_DecodeCharmap in Objects/unicodeobject.c) uses one algorithm for exact strings and a different one for every other kind of mapping.

Do we need to fix this? If so, what should `codecs.charmap_decode(b'\x00', 'replace', {0:'\uFFFE'})` return? And what should `codecs.charmap_decode(b'\x00', 'replace', {0:0xFFFE})` return?
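
To make the comparison easier to repeat, here is a small sketch that runs the same input through all four kinds of decode table mentioned above and reports whether the results agree. The helper name `compare_decode_tables` and the labels are made up for illustration; only `codecs.charmap_decode` and the class `S` come from the report. No expected output is shown because it depends on the interpreter version.

```python
import codecs


class S(str):
    """A trivial str subclass, as in the session above."""


def compare_decode_tables(data=b'\x00', errors='replace'):
    """Decode `data` with each kind of table that maps byte 0 to U+FFFE.

    Hypothetical helper: returns a dict of label -> (decoded, consumed).
    """
    tables = {
        'exact str': '\uFFFE',
        'str subclass': S('\uFFFE'),
        'dict of str': {0: '\uFFFE'},
        'dict of int': {0: 0xFFFE},
    }
    return {
        label: codecs.charmap_decode(data, errors, table)
        for label, table in tables.items()
    }


results = compare_decode_tables()
for label, result in results.items():
    print(f'{label:13} {ascii(result)}')

# True only if every table type produced the same (decoded, consumed) pair.
consistent = len(set(results.values())) == 1
print('consistent:', consistent)
```

If `consistent` is False, the rows that differ show which table types take a different path through the decoder.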

History
Date	User	Action	Args
2012-05-18 14:46:36	serhiy.storchaka	set	recipients: + serhiy.storchaka
2012-05-18 14:46:36	serhiy.storchaka	set	messageid: <1337352396.54.0.468921009595.issue14850@psf.upfronthosting.co.za>
2012-05-18 14:46:35	serhiy.storchaka	link	issue14850 messages
2012-05-18 14:46:35	serhiy.storchaka	create