Message 138244 - Python tracker


Author	vstinner
Recipients	amaury.forgeotdarc, loewis, ocean-city, vstinner
Date	2011-06-13.13:52:52
Message-id	<1307973176.03.0.353214642985.issue12281@psf.upfronthosting.co.za>
In-reply-to

Content

Patch version 3:
 - add unit tests for code pages 932, 1252, CP_UTF7 and CP_UTF8
 - fix encode/decode flags for CP_UTF7/CP_UTF8
 - fix the encoding name reported in UnicodeDecodeError; also support the "CP_UTF7" and "CP_UTF8" code page names
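
For illustration only, here is a minimal sketch of the kind of round-trip check such unit tests might contain. It assumes Python's bundled "cp932" and "cp1252" codecs as portable stand-ins for the Windows code page functions, and the sample characters are my own choice, not taken from the patch:

```python
# Hypothetical test cases: (codec name, text, expected bytes).
# These use Python's bundled codecs, not the Windows code page API.
CASES = [
    ("cp1252", "\xe9", b"\xe9"),        # LATIN SMALL LETTER E WITH ACUTE
    ("cp1252", "\u20ac", b"\x80"),      # EURO SIGN
    ("cp932", "\uff71", b"\xb1"),       # HALFWIDTH KATAKANA LETTER A (single byte)
    ("cp932", "\u3042", b"\x82\xa0"),   # HIRAGANA LETTER A (two bytes)
]


def check_round_trip(encoding, text, raw):
    """Return True if text encodes to raw and raw decodes back to text."""
    return text.encode(encoding) == raw and raw.decode(encoding) == text


for encoding, text, raw in CASES:
    print(encoding, ascii(text), check_round_trip(encoding, text, raw))
```

Code page 932 is worth covering alongside 1252 because it mixes single-byte and double-byte characters, which a purely single-byte code page never exercises.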

TODO:

 - The decoder (with an error handler) doesn't support multibyte characters, e.g. b"\xC3\xA9\xFF" is not correctly decoded using "replace" (insize is fixed to 1)
 - The encoder doesn't support surrogate pairs, but the result with UTF-8 looks correct
 - The UTF-7 decoder is not strict, e.g. b'[+/]' is decoded to '[]' in strict mode
 - The UTF-8 encoder is not strict, e.g. it replaces surrogates with U+FFFD
 - Use final in decode_mbcs_errors(): a multibyte character may be split between two chunks of INT_MAX bytes
 - Implement the optimizations Martin suggested?
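
As a point of comparison for the TODO items above, the following sketch shows how Python's built-in "utf-8" codec handles the same cases. It is an assumption on my part that this is the behaviour the code page codec should match; the helper names are hypothetical and the code does not call the Windows code page functions:

```python
import codecs


def decode_replace(data):
    """Decode with the built-in UTF-8 codec, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def encodes_strictly(text):
    """Return True if the strict built-in UTF-8 encoder accepts the text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def decode_in_chunks(chunks):
    """Decode a list of byte chunks, passing final=True only for the last one."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        parts.append(decoder.decode(chunk, final=is_last))
    return "".join(parts)


# The two-byte sequence is kept; only the stray 0xFF is replaced.
print(ascii(decode_replace(b"\xC3\xA9\xFF")))        # '\xe9\ufffd'
# A lone surrogate is rejected rather than replaced by U+FFFD.
print(encodes_strictly("\udc80"))                     # False
# A character split across two chunks is reassembled.
print(ascii(decode_in_chunks([b"\xC3", b"\xA9"])))   # '\xe9'
```

The incremental decoder buffers the incomplete lead byte until the next chunk arrives, which is the role the final flag would play for chunks of INT_MAX bytes.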

History
Date	User	Action	Args
2011-06-13 13:52:56	vstinner	set	recipients: + vstinner, loewis, amaury.forgeotdarc, ocean-city
2011-06-13 13:52:56	vstinner	set	messageid: <1307973176.03.0.353214642985.issue12281@psf.upfronthosting.co.za>
2011-06-13 13:52:55	vstinner	link	issue12281 messages
2011-06-13 13:52:55	vstinner	create