Message 129463
STINNER Victor wrote:
>
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> We should first implement the same algorithm in all 3 normalization functions and add tests for them (at least for the function in normalization):
>
> - normalize_encoding() in encodings: it doesn't convert to lowercase and keeps non-ASCII letters
> - normalize_encoding() in unicodeobject.c
> - normalizestring() in codecs.c
>
> normalize_encoding() in encodings is more lax than the other two functions: it normalizes " utf 8 " to 'utf_8'. But it doesn't convert to lowercase and keeps non-ASCII letters: "UTF-8é" is normalized to "UTF_8é".
>
> I don't know if the normalization functions have to be more or less strict, but I think that they should all give the same result.
Please see this message for an explanation of why we have those
three functions, why they are different and what their application
space is:
http://bugs.python.org/issue5902#msg129257
This ticket is only about the encodings package's codec search
function, not the other two, and I don't want to change its
semantics, just its performance.
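
For illustration only, here is a minimal sketch of the behaviour described in the quoted comment for the encodings package's normalization: runs of non-alphanumeric characters collapse into a single underscore, leading and trailing separators are dropped, case is preserved, and non-ASCII letters are kept. The name normalize_encoding_sketch is hypothetical, and this is not the standard library's code, whose behaviour may differ between Python versions.

```python
def normalize_encoding_sketch(encoding):
    """Hypothetical sketch of the lax normalization described above.

    Not the stdlib implementation; it only mirrors the two examples
    given in the quoted comment.
    """
    chars = []
    pending_separator = False
    for c in encoding:
        # str.isalnum() is True for non-ASCII letters such as 'é',
        # so those characters are kept unchanged.
        if c.isalnum() or c == '.':
            # Emit one underscore for a whole run of separators,
            # but never at the very start of the name.
            if pending_separator and chars:
                chars.append('_')
            chars.append(c)  # case is preserved, no lowercasing
            pending_separator = False
        else:
            pending_separator = True
    # A trailing run of separators is simply dropped.
    return ''.join(chars)


print(normalize_encoding_sketch(" utf 8 "))
print(normalize_encoding_sketch("UTF-8é"))
```

With the two inputs from the quoted comment, this prints utf_8 and UTF_8é. A stricter variant, which is again an assumption and not a description of the C functions, would also call lower() and reject non-ASCII input.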

Date | User | Action | Args
2011-02-25 23:06:54 | lemburg | set | recipients: + lemburg, belopolsky, vstinner, ezio.melotti
2011-02-25 23:06:49 | lemburg | link | issue11322 messages
2011-02-25 23:06:49 | lemburg | create |