Message161041
Reproducing the issue is not too hard; see the example below. Does your program manipulate sys.modules?
import sys
b'x'.decode('utf-8')
import locale
del locale.encodings  # Not necessary with python2
del sys.modules['encodings.utf_8'], sys.modules['encodings']
b'x'.decode('utf-8')
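The paths below all revolve around the codec lookup cache. Once an encoding has been looked up, CPython returns the same CodecInfo object on every later lookup. Its decode function is an ordinary Python function that resolves `codecs` through the globals of encodings.utf_8. The following check was written against a recent CPython. The exact contents of encodings/utf_8.py are an implementation detail and may differ from the 2012 version.

```python
import codecs
import sys

# Two lookups of the same encoding: CPython answers the second one from its
# codec search cache, so both calls return the very same CodecInfo object.
first = codecs.lookup('utf-8')
second = codecs.lookup('utf-8')
print(first is second)

# The cached decode function is a plain Python function from
# encodings/utf_8.py, and its global namespace is that module's __dict__.
decode = first.decode
print(decode.__module__)
print(decode.__globals__ is sys.modules['encodings.utf_8'].__dict__)
```

If something resets that module's globals while the cache still holds `decode`, later calls see the reset values.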
If we want to make codecs more robust against sys.modules manipulation, I can see several paths:
#1 Somehow clear interp->codec_search_cache (used in Python/codecs.c) when the encodings module is cleared (by using weak references?)
#2 Make sure that functions returned in CodecInfo objects don't rely on global module state. For example in utf_8.py:
def decode(input, errors='strict', _codecs=codecs):
    return _codecs.utf_8_decode(input, errors, True)
#3 Capture the globals of the utf_8 module in the CodecInfo, and run decode() with these captured globals.
#4 Get rid of module.__del__ clearing the module globals, and rely on the cyclic garbage collector to clear modules at interpreter shutdown.
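Option #1 lives in C (interp->codec_search_cache), but the idea can be modeled in Python. The class below is a hypothetical sketch, not CPython's real cache. WeakInvalidatingCodecCache and demo_search are made-up names, and the name normalization is simplified. Each entry is tied to the module that produced it through weakref.finalize, so collecting the module drops the entry and forces a fresh search. This assumes Python 3.4 or later, where module objects support weak references.

```python
import gc
import sys
import types
import weakref


class WeakInvalidatingCodecCache:
    """Hypothetical Python model of option #1, not CPython's real cache.

    Each cached entry is tied to the module object that produced it; when
    that module is garbage-collected, a weakref.finalize callback drops the
    entry, so the next lookup runs the search function again.
    """

    def __init__(self, search):
        # search(key) -> (module, codec_info); a stand-in for the registry.
        self._search = search
        self._entries = {}

    def lookup(self, name):
        # Simplified normalization; CPython's real rules differ.
        key = name.lower().replace('-', '_')
        if key not in self._entries:
            module, info = self._search(key)
            self._entries[key] = info
            # Fires when `module` is collected; holds no strong ref to it.
            weakref.finalize(module, self._entries.pop, key, None)
        return self._entries[key]


def demo_search(key):
    """Hypothetical search function: builds a fresh toy codec module."""
    mod = types.ModuleType('demo_codec_' + key)
    exec("import codecs\n"
         "def decode(input, errors='strict'):\n"
         "    return codecs.utf_8_decode(input, errors, True)\n",
         mod.__dict__)
    sys.modules[mod.__name__] = mod
    return mod, mod.decode


cache = WeakInvalidatingCodecCache(demo_search)
d1 = cache.lookup('demo-8')
print(cache.lookup('demo-8') is d1)    # served from the cache

del sys.modules['demo_codec_demo_8']   # drop the only strong reference
gc.collect()
d2 = cache.lookup('demo-8')            # entry was invalidated, search reran
print(d2 is d1)
```

The weak reference targets the module object rather than its __dict__ because plain dicts cannot be weakly referenced.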
Item #2 is the easiest one, but it must be implemented in each codec. We could fix the most important ones, though.
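A toy module can show how options #2 and #3 differ from the plain function. The module name toy_utf_8 and all function names are hypothetical. The loop that resets globals to None is a hand-written stand-in for the interpreter clearing a dead module's globals, which is the behavior item #4 proposes removing. It is not a claim about how current CPython behaves.

```python
import types

# Source of a hypothetical toy module standing in for encodings/utf_8.py.
SOURCE = '''
import codecs

def decode_global(input, errors='strict'):
    # Looks `codecs` up in the module globals on every call.
    return codecs.utf_8_decode(input, errors, True)

def decode_default(input, errors='strict', _codecs=codecs):
    # Option #2: `codecs` is bound once, when the def statement runs.
    return _codecs.utf_8_decode(input, errors, True)
'''

mod = types.ModuleType('toy_utf_8')
exec(SOURCE, mod.__dict__)

# Keep references to the functions, as the codec cache does.
decode_global = mod.decode_global
decode_default = mod.decode_default

# Option #3: rebuild the function around a snapshot of the module globals,
# taken at lookup time.
decode_captured = types.FunctionType(
    decode_global.__code__, dict(mod.__dict__), 'decode_captured',
    decode_global.__defaults__, decode_global.__closure__)

# Simulate the clearing by hand: every module global except
# __builtins__ is reset to None.
for name in list(mod.__dict__):
    if name != '__builtins__':
        mod.__dict__[name] = None


def try_decode(func):
    try:
        return func(b'x')
    except AttributeError as exc:
        return f'AttributeError: {exc}'


for func in (decode_global, decode_default, decode_captured):
    print(func.__name__, '->', try_decode(func))
```

Each fix has a trade-off. The default-argument trick adds an extra `_codecs` parameter to every codec's decode signature. The snapshot in #3 freezes every global the function reads at the moment of lookup, not just `codecs`.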
Item #4 is the most far-reaching one, and would probably benefit other parts of Python as well.
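Option #4 is an interpreter change, so it cannot be written in Python, but its effect is easy to observe. Current CPython (3.4 and later) no longer clears a module's globals when the module object is collected, so a function held elsewhere keeps working. The module name toy_module is hypothetical.

```python
import gc
import sys
import types

# Hypothetical toy module whose function reads a module global at call time.
mod = types.ModuleType('toy_module')
exec("import codecs\n"
     "def decode(input, errors='strict'):\n"
     "    return codecs.utf_8_decode(input, errors, True)\n",
     mod.__dict__)
sys.modules['toy_module'] = mod

decode = mod.decode              # keep only the function, like a codec cache
del sys.modules['toy_module'], mod
gc.collect()                     # the module object itself is gone now

# decode.__globals__ still references the (uncleared) module dict, so the
# global `codecs` is still there and the call succeeds.
print(decode(b'x'))
```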

Date                | User               | Action | Args
2012-05-18 08:44:27 | amaury.forgeotdarc | set    | recipients: + amaury.forgeotdarc, jaraco
2012-05-18 08:44:27 | amaury.forgeotdarc | set    | messageid: <1337330667.3.0.1539999775.issue14847@psf.upfronthosting.co.za>
2012-05-18 08:44:26 | amaury.forgeotdarc | link   | issue14847 messages
2012-05-18 08:44:26 | amaury.forgeotdarc | create |