Message129248
Ezio and I discussed the implementation of alias lookup on IRC, and neither of us was able to point to the function that strips non-alphanumeric characters from encoding names.
It turns out that there are three "normalize" functions that are successively applied to the encoding name during evaluation of str.encode/str.decode.
1. normalize_encoding() in unicodeobject.c
2. normalizestring() in codecs.c
3. normalize_encoding() in encodings/__init__.py
Each performs a slightly different transformation, and only the last one strips non-alphanumeric characters.
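The last of these steps can be observed directly from Python, since encodings.normalize_encoding() is importable. The snippet below is only a rough illustration, not a trace of the real lookup path: the two C-level functions are not callable from Python, so their effect shows up only indirectly through the result of codecs.lookup(). The helper name describe() is made up for this example.

```python
import codecs
import encodings


def describe(name):
    """Return (name after Python-level normalization, canonical codec name).

    describe() is a hypothetical helper for illustration only.
    """
    # Step 3 from the list above: the normalization in encodings/__init__.py.
    normalized = encodings.normalize_encoding(name)
    # Full lookup: goes through all normalization steps plus the alias table.
    canonical = codecs.lookup(name).name
    return normalized, canonical


if __name__ == "__main__":
    for spelling in ("utf-8", "UTF-8", "utf_8", "Latin-1"):
        print(spelling, "->", describe(spelling))
```

Different spellings of the same encoding end up at the same codec, even though the intermediate normalized strings differ.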
The complexity of codec lookup is comparable with that of the import mechanism!
Date | User | Action | Args
2011-02-24 04:00:54 | belopolsky | set | recipients: + belopolsky, lemburg, georg.brandl, pitrou, ezio.melotti, mrabarnett
2011-02-24 04:00:54 | belopolsky | set | messageid: <1298520054.8.0.0591201159241.issue5902@psf.upfronthosting.co.za>
2011-02-24 04:00:52 | belopolsky | link | issue5902 messages
2011-02-24 04:00:52 | belopolsky | create |