
Author lemburg
Recipients belopolsky, ezio.melotti, lemburg, vstinner
Date 2011-02-25.23:06:49
SpamBayes Score 4.0463715e-06
Marked as misclassified No
Message-id <4D683608.1070702@egenix.com>
In-reply-to <1298674986.91.0.66075330086.issue11322@psf.upfronthosting.co.za>
Content
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> We should first implement the same algorithm in the 3 normalization functions and add tests for them (at least for the function in normalization):
> 
>  - normalize_encoding() in encodings: it doesn't convert to lowercase and keeps non-ASCII letters
>  - normalize_encoding() in unicodeobject.c
>  - normalizestring() in codecs.c
> 
> normalize_encoding() in encodings is more lenient than the other two functions: it normalizes "  utf   8  " to 'utf_8'. But it doesn't convert to lowercase and keeps non-ASCII letters: "UTF-8é" is normalized to "UTF_8é".
> 
> I don't know if the normalization functions have to be more or less strict, but I think that they should all give the same result.
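To make the quoted behavior concrete, here is a small sketch of the lenient normalization described above. This is a hypothetical re-implementation for illustration, not the actual code in the encodings package: it collapses runs of spaces and hyphens into a single underscore while preserving case and non-ASCII letters.

```python
def normalize_encoding_sketch(encoding):
    # Hypothetical sketch of the lenient normalization described
    # above: runs of spaces/hyphens collapse to one underscore;
    # case and non-ASCII letters are preserved as-is.
    return '_'.join(encoding.replace('-', ' ').split())

print(normalize_encoding_sketch("  utf   8  "))  # utf_8
print(normalize_encoding_sketch("UTF-8é"))       # UTF_8é
```

The stricter variants in unicodeobject.c and codecs.c would, per the quoted description, additionally lowercase the result and restrict it to ASCII.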

Please see this message for an explanation of why we have those
three functions, why they are different and what their application
space is:

http://bugs.python.org/issue5902#msg129257

This ticket is just about the encodings package's codec search
function, not the other two. I don't want to change its
semantics, just improve its performance.
History
Date User Action Args
2011-02-25 23:06:54  lemburg  set  recipients: + lemburg, belopolsky, vstinner, ezio.melotti
2011-02-25 23:06:49  lemburg  link  issue11322 messages
2011-02-25 23:06:49  lemburg  create