
Author belopolsky
Recipients belopolsky, ezio.melotti, georg.brandl, lemburg, mrabarnett, pitrou
Date 2011-02-24.04:00:52
Message-id <1298520054.8.0.0591201159241.issue5902@psf.upfronthosting.co.za>
Content
Ezio and I discussed the implementation of alias lookup on IRC, and neither of us could point to the function that strips non-alphanumeric characters from encoding names.

It turns out that three "normalize" functions are applied in succession to the encoding name when str.encode/str.decode is evaluated:

1. normalize_encoding() in unicodeobject.c
2. normalizestring() in codecs.c
3. normalize_encoding() in encodings/__init__.py

Each performs a slightly different transformation, and only the last one strips non-alphanumeric characters.
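The effect of the last, Python-level stage can be observed directly. The sketch below is illustrative only. It assumes a current CPython 3.x, and it calls just the public `encodings.normalize_encoding()` and `codecs.lookup()`. It does not try to reproduce the two C-level functions, whose exact behavior may differ between CPython versions. The sample spellings are arbitrary.

```python
import codecs
import encodings

# encodings.normalize_encoding() lives in Lib/encodings/__init__.py (stage 3).
# It collapses each run of non-alphanumeric characters (other than '.') into
# a single underscore and drops leading/trailing punctuation. It does not
# change the case of the name.
for name in ["UTF-8", "utf 8", "  latin--1 ", "ISO_8859-1"]:
    print(repr(name), "->", repr(encodings.normalize_encoding(name)))

# Because the normalization stages run one after another, differently
# spelled names can all end up at the same codec via codecs.lookup().
spellings = ["utf-8", "UTF8", "utf_8", "UTF 8"]
print({codecs.lookup(s).name for s in spellings})
```

For example, `"  latin--1 "` normalizes to `'latin_1'`. The four UTF-8 spellings collapse to a single codec name.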

The complexity of codec lookup is comparable to that of the import mechanism!
History
Date                 User        Action  Args
2011-02-24 04:00:54  belopolsky  set     recipients: + belopolsky, lemburg, georg.brandl, pitrou, ezio.melotti, mrabarnett
2011-02-24 04:00:54  belopolsky  set     messageid: <1298520054.8.0.0591201159241.issue5902@psf.upfronthosting.co.za>
2011-02-24 04:00:52  belopolsky  link    issue5902 messages
2011-02-24 04:00:52  belopolsky  create