Message129386
I don't know who changed the encodings package's normalize_encoding() function (it wasn't me), but it's a really slow implementation.
The original version used the .translate() method, which is a lot faster and can be adapted just as well to work with the Unicode variant of .translate().
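One way to check the speed difference on a current interpreter is to time a translate()-based normalizer against a character-by-character loop. The sketch below assumes Python 3; `loop_normalize` is a hypothetical per-character implementation written only for comparison (not necessarily the code that replaced the original), and `translate_normalize` builds its table with `str.maketrans`. The two agree on ASCII input; for non-ASCII punctuation they can differ, because `str.isalnum()` is Unicode-aware while this table only covers ASCII. No timings are claimed here, since the numbers depend on the machine and interpreter.

```python
import string
import timeit


def loop_normalize(encoding: str) -> str:
    """Hypothetical per-character normalizer, used only for comparison."""
    chars = []
    punct = False
    for c in encoding:
        if c.isalnum() or c == ".":
            # Emit a single '_' for each run of separators, but never a
            # leading one (chars is still empty at the start of the name).
            if punct and chars:
                chars.append("_")
            chars.append(c)
            punct = False
        else:
            punct = True
    return "".join(chars)


# Map every ASCII character that is not a letter, digit or '.' to a space;
# str.maketrans() turns the dict into an {ordinal: replacement} table.
_KEEP = set(string.ascii_letters + string.digits + ".")
_TABLE = str.maketrans({chr(i): " " for i in range(128) if chr(i) not in _KEEP})


def translate_normalize(encoding: str) -> str:
    """translate()-based normalizer: split() collapses runs of spaces."""
    return "_".join(encoding.translate(_TABLE).split())


if __name__ == "__main__":
    name = "  ISO 8859-1;#  "
    assert loop_normalize(name) == translate_normalize(name) == "ISO_8859_1"
    for func in (loop_normalize, translate_normalize):
        seconds = timeit.timeit(lambda: func(name), number=10_000)
        print(f"{func.__name__}: {seconds:.4f}s for 10,000 calls")
```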
import __builtin__

# Translation table for the 8-bit str.translate(): entry i is chr(i) for
# ASCII letters, digits and '.', and a space for every other byte value.
_norm_encoding_map = ('                                              . '
                      '0123456789       ABCDEFGHIJKLMNOPQRSTUVWXYZ     '
                      ' abcdefghijklmnopqrstuvwxyz                     '
                      '                                                '
                      '                                                '
                      '                ')
# str.translate() on 8-bit strings requires a table of exactly 256 chars.
assert len(_norm_encoding_map) == 256

def normalize_encoding(encoding):

    """ Normalize an encoding name.

        Normalization works as follows: all non-alphanumeric
        characters except the dot used for Python package names are
        collapsed and replaced with a single underscore, e.g. ' -;#'
        becomes '_'. Leading and trailing underscores are removed.

        Note that encoding names should be ASCII only; if they do use
        non-ASCII characters, these must be Latin-1 compatible.

    """
    # Make sure we have an 8-bit string, because .translate() works
    # differently for Unicode strings.
    if hasattr(__builtin__, "unicode") and isinstance(encoding, unicode):
        # Note that .encode('latin-1') does *not* use the codec
        # registry, so this call doesn't recurse. (See unicodeobject.c
        # PyUnicode_AsEncodedString() for details)
        encoding = encoding.encode('latin-1')
    # Non-alphanumerics become spaces; split() drops leading/trailing runs
    # and join() puts a single '_' between the remaining pieces.
    return '_'.join(encoding.translate(_norm_encoding_map).split())
Date                 User     Action  Args
2011-02-25 15:55:32  lemburg  set     recipients: + lemburg
2011-02-25 15:55:32  lemburg  set     messageid: <1298649332.11.0.637901441206.issue11322@psf.upfronthosting.co.za>
2011-02-25 15:55:31  lemburg  link    issue11322 messages
2011-02-25 15:55:31  lemburg  create