Message 364180 - Python tracker



Author	shihai1991
Recipients	lemburg, serhiy.storchaka, shihai1991, vstinner
Date	2020-03-14.15:02:55
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1584198175.54.0.0448183732491.issue39337@roundup.psfhosted.org>
In-reply-to

Content

> How about calling `encodings.normalize_encoding()` in `codecs.normalizestring()` to keep the same behavior? (I created PR18845)

I have tried this idea, but it makes a test case in test_io.py fail: some objects call `codecs.lookup()` in `__del__()`, and the extension module has already been cleaned up by the time `__del__()` runs.

> I would prefer that codecs.lookup() and encodings.normalize_encoding() behave the same. Either always ignore or always copy.
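To make the two policies in the quote concrete, here is a minimal sketch of a name normalizer modeled on the rule described in the `encodings.normalize_encoding()` docstring (each run of characters other than alphanumerics and `.` collapses to a single underscore, with leading and trailing runs dropped). The function `normalize_name` and its `non_ascii` parameter are hypothetical and this is not the CPython code; it only shows how "always ignore" and "always copy" differ for non-ASCII alphanumerics.

```python
def normalize_name(name: str, non_ascii: str = "ignore") -> str:
    """Hypothetical normalizer (not the CPython implementation).

    Each run of characters that are neither alphanumeric nor '.' becomes a
    single '_'; runs at the start or end produce nothing.
    non_ascii="ignore" drops non-ASCII alphanumerics, "copy" keeps them.
    """
    if non_ascii not in ("ignore", "copy"):
        raise ValueError(f"unknown policy: {non_ascii!r}")
    chars = []
    punct = False  # True while inside a run of separator characters
    for c in name:
        if c.isalnum() or c == ".":
            # Emit one '_' for the preceding separator run,
            # but never at the start of the result.
            if punct and chars:
                chars.append("_")
            # The policy only matters for non-ASCII alphanumerics.
            if c.isascii() or non_ascii == "copy":
                chars.append(c)
            punct = False
        else:
            punct = True
    return "".join(chars)
```

For example, `normalize_name("ISO 8859-15")` returns `"ISO_8859_15"` under either policy, while `normalize_name("lat\u00e9n1")` returns `"latn1"` with `"ignore"` and keeps the `é` with `"copy"`. Case is left untouched here; lowercasing would be a separate step.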

I tried adding a `_Py_normalize_unicode_encoding()` function in unicodeobject.c to support normalizing non-ASCII encoding names (PR18987), but this PR caused many test cases to fail.

For example:

In master:
python3.9 -c "print('a\xac\u1234\u20ac\u8000\U0010ffff'.encode('iso-8859-15', 'namereplace'))"
result:
b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\xa4\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010ffff'

After PR18987:
./python -c "print('a\xac\u1234\u20ac\u8000\U0010ffff'.encode('iso-8859-15', 'namereplace'))"
result:
b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\\N{EURO SIGN}\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010ffff'
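The only difference between the two results is how U+20AC (EURO SIGN) is encoded. ISO-8859-15 maps it to byte 0xA4, whereas ISO-8859-1 (Latin-1) has no euro sign, so the `namereplace` error handler substitutes `\N{EURO SIGN}`. The sketch below, which uses only the standard `str.encode()` API, reproduces both outputs; the second one matching Latin-1 suggests (the message does not state this) that after PR18987 the name 'iso-8859-15' was no longer resolved to the ISO-8859-15 codec.

```python
# The test string from the message: ASCII, U+00AC, U+1234, U+20AC, U+8000, U+10FFFF.
text = "a\xac\u1234\u20ac\u8000\U0010ffff"

# ISO-8859-15 (Latin-9) contains the euro sign at byte 0xA4.
latin9 = text.encode("iso-8859-15", "namereplace")

# ISO-8859-1 (Latin-1) has no euro sign, so namereplace emits \N{EURO SIGN}.
# U+10FFFF has no Unicode name, so it falls back to a \U escape in both cases.
latin1 = text.encode("iso-8859-1", "namereplace")

print(latin9)  # same bytes as the "In master" result
print(latin1)  # same bytes as the "After PR18987" result
```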

History
Date	User	Action	Args
2020-03-14 15:02:55	shihai1991	set	recipients: + shihai1991, lemburg, vstinner, serhiy.storchaka
2020-03-14 15:02:55	shihai1991	set	messageid: <1584198175.54.0.0448183732491.issue39337@roundup.psfhosted.org>
2020-03-14 15:02:55	shihai1991	link	issue39337 messages
2020-03-14 15:02:55	shihai1991	create