Message129460
We should first make the 3 normalization functions implement the same algorithm and add tests for them (at least for the function in normalization):
- normalize_encoding() in encodings: it doesn't convert to lowercase and keeps non-ASCII letters
- normalize_encoding() in unicodeobject.c
- normalizestring() in codecs.c
normalize_encoding() in encodings is more lenient than the other two functions: it normalizes " utf 8 " to 'utf_8'. But it doesn't convert to lowercase and it keeps non-ASCII letters: "UTF-8é" is normalized to "UTF_8é".
I don't know if the normalization functions have to be more or less strict, but I think that they should all give the same result.
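
To make the difference concrete, here is a minimal sketch in pure Python. It is not the CPython code: normalize_lax() only reproduces the behaviour described above for normalize_encoding() in encodings, and normalize_unified() is a hypothetical stricter variant (lowercase, ASCII only) showing one possible common result. Both names, and the choice to drop non-ASCII characters rather than reject them, are assumptions made for illustration.

```python
def normalize_lax(name: str) -> str:
    """Sketch of the lenient behaviour described in this message.

    Each run of characters that are neither alphanumeric nor '.' is
    collapsed into a single '_'. Leading and trailing runs are dropped.
    Case and non-ASCII letters are left untouched.
    """
    chars = []
    pending_sep = False
    for c in name:
        if c.isalnum() or c == ".":
            # Emit the separator only between two kept characters.
            if pending_sep and chars:
                chars.append("_")
            chars.append(c)
            pending_sep = False
        else:
            pending_sep = True
    return "".join(chars)


def normalize_unified(name: str) -> str:
    """Hypothetical stricter variant (an assumption, not an existing API).

    Drops non-ASCII characters, applies the same collapsing rule,
    then converts the result to lowercase.
    """
    ascii_only = "".join(c for c in name if c.isascii())
    return normalize_lax(ascii_only).lower()


if __name__ == "__main__":
    for raw in (" utf 8 ", "UTF-8é", "Latin-1"):
        print(repr(raw), normalize_lax(raw), normalize_unified(raw))
```

With a single shared rule like normalize_unified(), the same set of test inputs could be run against all three functions to check that they agree.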
Date | User | Action | Args
2011-02-25 23:03:06 | vstinner | set | recipients: + vstinner, lemburg, belopolsky, ezio.melotti
2011-02-25 23:03:06 | vstinner | set | messageid: <1298674986.91.0.66075330086.issue11322@psf.upfronthosting.co.za>
2011-02-25 23:03:06 | vstinner | link | issue11322 messages
2011-02-25 23:03:06 | vstinner | create |