Message281869
The codec code has a few (performance) issues:
* nonspacing_diacritical_marks should be a set for fast lookup
* ord(c) in range(0x00, 0xA0) should be rewritten using < and >=
* result += bytes([ord(c)]) has quadratic timing (it copies
the whole bytes string for every single operation); better
use a bytearray and convert this to bytes in one final step
* the error messages should include more useful information
about the cause and location of the error, instead of just
UnicodeError("Unacceptable unicode character") and
raise KeyError
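
To illustrate the points above, here is a minimal sketch rather than the actual patch. The names NONSPACING_DIACRITICAL_MARKS and encode_sketch, the three marks in the set, and the "sketch" codec name are all placeholders I made up, and the loop only passes characters below 0xA0 through unchanged:

```python
# Hypothetical stand-in for the codec's table of nonspacing diacritical marks;
# the real contents are not part of this message.
NONSPACING_DIACRITICAL_MARKS = frozenset({"\u0300", "\u0301", "\u0302"})


def encode_sketch(text):
    """Illustrative only: characters below 0xA0 are copied through unchanged."""
    result = bytearray()  # mutable buffer, appending does not copy the whole buffer
    for pos, c in enumerate(text):
        code = ord(c)
        if 0x00 <= code < 0xA0:  # plain comparison instead of "in range(...)"
            result.append(code)
            continue
        if c in NONSPACING_DIACRITICAL_MARKS:  # set membership test
            reason = "nonspacing mark not handled by this sketch"
        else:
            reason = "character not in the (hypothetical) mapping table"
        # Report the codec name, the input, the exact position and the cause.
        raise UnicodeEncodeError("sketch", text, pos, pos + 1, reason)
    return bytes(result)  # single conversion at the very end
```

With this, a failure reports the offending character and its position in the input instead of a bare message.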
Please also check whether we can reuse the charmap codec functions
we already have. Thanks.
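
For reference, this is roughly how the existing single-byte codecs in the stdlib (cp1252, for example) use the charmap helpers. It is only a sketch: the identity decoding_table below is a hypothetical placeholder, and these helpers map one byte to one character, so whether they fit a codec with combining sequences is exactly what needs checking:

```python
import codecs

# Hypothetical 256-entry decoding table: byte value i maps to code point i.
# A real codec would supply its own table here.
decoding_table = "".join(map(chr, range(256)))

# Build the reverse (character -> byte) mapping once, at import time.
encoding_table = codecs.charmap_build(decoding_table)


def encode(text, errors="strict"):
    # Returns a tuple (encoded bytes, number of characters consumed).
    return codecs.charmap_encode(text, errors, encoding_table)


def decode(data, errors="strict"):
    # Returns a tuple (decoded string, number of bytes consumed).
    return codecs.charmap_decode(data, errors, decoding_table)
```

Error handling and the position reported in the exception then come from the C implementation, with no per-character Python loop.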

Date | User | Action | Args
2016-11-28 12:40:36 | lemburg | set | recipients: + lemburg, loewis, vstinner, serhiy.storchaka, xiang.zhang, John Helour, mdk
2016-11-28 12:40:36 | lemburg | set | messageid: <1480336836.1.0.893057658555.issue24339@psf.upfronthosting.co.za>
2016-11-28 12:40:36 | lemburg | link | issue24339 messages
2016-11-28 12:40:36 | lemburg | create |