
Author vstinner
Recipients belopolsky, ezio.melotti, jcea, lemburg, sdaoden, serhiy.storchaka, vstinner
Date 2016-12-15.09:53:01
Message-id <1481795581.59.0.160043064791.issue11322@psf.upfronthosting.co.za>
Content
It seems that encodings.normalize_encoding() currently has no unit tests! Before modifying it, I would prefer to see a few unit tests covering inputs such as:

* " utf 8 "
* "UtF 8"
* "utf8\xE9"
* etc.
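A minimal test sketch for these cases might look as follows. The expected values assume the collapsing behavior of the current pure-Python implementation (runs of non-alphanumeric characters become a single underscore, and leading/trailing separators are dropped); the exact result for non-ASCII input such as "utf8\xE9" is version-dependent, so it is deliberately left unasserted:

```python
import unittest
import encodings


class NormalizeEncodingTests(unittest.TestCase):
    # Assumed behavior: runs of non-alphanumeric characters collapse
    # to a single underscore; leading/trailing separators are dropped.
    def test_surrounding_spaces(self):
        self.assertEqual(encodings.normalize_encoding(" utf 8 "), "utf_8")

    def test_hyphen(self):
        self.assertEqual(encodings.normalize_encoding("utf-8"), "utf_8")

    def test_non_ascii(self):
        # The result for non-ASCII input depends on the Python version,
        # so only check that the call does not raise.
        encodings.normalize_encoding("utf8\xE9")


if __name__ == "__main__":
    unittest.main(exit=False)
```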

Since we are talking about an optimization, I would like to see before/after benchmark results. I would also like to test Marc-Andre's idea of exposing the C function _Py_normalize_encoding().
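A quick way to get such numbers is a micro-benchmark of the pure-Python function, run once on the current code and once on the patched version. A minimal sketch with timeit (pyperf would give more stable results, but this illustrates the idea; the input string is just a representative example):

```python
import timeit

import encodings

# Time repeated calls on a typical encoding name.
n = 100_000
elapsed = timeit.timeit(
    lambda: encodings.normalize_encoding("ISO-8859-1"),
    number=n,
)
print(f"{elapsed / n * 1e6:.3f} us per call")
```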

_Py_normalize_encoding() operates on a byte string encoded in Latin-1. To implement encodings.normalize_encoding() on top of it, we could either rewrite the function to work on Py_UCS4 characters, or keep a fast path for char* and add a more generic version for UCS2 and UCS4.
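For reference, a pure-Python sketch of what the generic character-based variant could compute (hypothetical code, not the actual C implementation; it assumes _Py_normalize_encoding's behavior of lowercasing ASCII letters and collapsing non-alphanumeric runs, applied here to arbitrary str code points rather than Latin-1 bytes):

```python
def normalize_encoding(encoding: str) -> str:
    """Hypothetical character-based normalization: lowercase ASCII
    alphanumerics are kept, and runs of any other characters are
    collapsed to a single underscore (dropped at the ends)."""
    chars = []
    pending_sep = False
    for ch in encoding:
        if ch.isascii() and ch.isalnum():
            if pending_sep and chars:
                chars.append("_")
            chars.append(ch.lower())
            pending_sep = False
        else:
            pending_sep = True
    return "".join(chars)


print(normalize_encoding(" UtF 8 "))  # -> utf_8
```

Whether the fast char* path and this generic path should share code or be duplicated is exactly the design question to settle with the benchmark above.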