
Author lemburg
Recipients lemburg
Date 2011-02-25.15:55:31
Message-id <1298649332.11.0.637901441206.issue11322@psf.upfronthosting.co.za>
Content
I don't know who changed the encodings package's normalize_encoding() function (it wasn't me), but the current implementation is really slow.

The original version used the .translate() method, which is a lot faster and can be adapted to work with the Unicode variant of the .translate() method just as well.

import __builtin__  # Python 2 only; needed for the hasattr() check in normalize_encoding()

_norm_encoding_map = ('                                              . '
                      '0123456789       ABCDEFGHIJKLMNOPQRSTUVWXYZ     '
                      ' abcdefghijklmnopqrstuvwxyz                     '
                      '                                                '
                      '                                                '
                      '                ')

def normalize_encoding(encoding):

    """ Normalize an encoding name.

        Normalization works as follows: all non-alphanumeric
        characters except the dot used for Python package names are
        collapsed and replaced with a single underscore, e.g. '  -;#'
        becomes '_'. Leading and trailing underscores are removed.

        Note that encoding names should be ASCII only; if they do use
        non-ASCII characters, these must be Latin-1 compatible.

    """
    # Make sure we have an 8-bit string, because .translate() works
    # differently for Unicode strings.
    if hasattr(__builtin__, "unicode") and isinstance(encoding, unicode):
        # Note that .encode('latin-1') does *not* use the codec
        # registry, so this call doesn't recurse. (See unicodeobject.c
        # PyUnicode_AsEncodedString() for details)
        encoding = encoding.encode('latin-1')
    return '_'.join(encoding.translate(_norm_encoding_map).split())
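
For illustration, here is a minimal sketch of how the same approach might be adapted to the Unicode variant of .translate() in Python 3. The names _KEEP, _norm_encoding_table and normalize_encoding_py3 are hypothetical and not part of the encodings package; the sketch assumes, like the docstring above, that encoding names are Latin-1 compatible, and it leaves any code point above 255 untouched.

```python
import string

# Hypothetical Python 3 adaptation (an assumption, not the stdlib code).
# Characters that survive normalization: ASCII letters, digits and the dot.
_KEEP = string.ascii_letters + string.digits + '.'

# str.translate() accepts a mapping from code points to replacement strings.
# Every Latin-1 code point that is not kept is mapped to a space.
_norm_encoding_table = {
    cp: ' ' for cp in range(256) if chr(cp) not in _KEEP
}

def normalize_encoding_py3(encoding):
    """Collapse runs of non-alphanumeric characters (except '.') into '_'."""
    if isinstance(encoding, bytes):
        # Mirror the 8-bit behaviour of the original: treat bytes as Latin-1.
        encoding = encoding.decode('latin-1')
    # split() drops leading/trailing runs of spaces and collapses inner ones.
    return '_'.join(encoding.translate(_norm_encoding_table).split())

print(normalize_encoding_py3('utf-8'))         # utf_8
print(normalize_encoding_py3('--latin--1--'))  # latin_1
```

As in the 8-bit version, the work is done by one .translate() call plus split/join rather than a per-character Python loop.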