
Author amaury.forgeotdarc
Recipients amaury.forgeotdarc, jaraco
Date 2012-05-18.08:44:26
Message-id <1337330667.3.0.1539999775.issue14847@psf.upfronthosting.co.za>
In-reply-to
Content
Reproducing the issue is not too hard; see the example below. Does your program manipulate sys.modules?

import sys
b'x'.decode('utf-8')
import locale; del locale.encodings   # Not necessary with python2
del sys.modules['encodings.utf_8'], sys.modules['encodings']
b'x'.decode('utf-8')
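
For context, here is a rough sketch of why the second decode() can fail. codecs.lookup() returns a cached CodecInfo, and for utf-8 its decode entry is an ordinary Python function whose globals are the encodings.utf_8 module dict. The snippet assumes CPython's standard library layout and only inspects the running interpreter; it does not delete anything. The variable names are mine.

```python
import codecs
import encodings.utf_8 as utf_8_module

# Two lookups of the same name are served from the codec search cache,
# so they return the very same CodecInfo object.
first = codecs.lookup('utf-8')
second = codecs.lookup('utf-8')
same_entry = first is second

# The cached decode function still looks up its globals (including the
# name "codecs") in the encodings.utf_8 module dict at call time.
shares_globals = first.decode.__globals__ is utf_8_module.__dict__

print(same_entry, shares_globals)
```

If that module dict is wiped when the module goes away, the cached function keeps being called but no longer finds the names it needs.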

If we want to make codecs more robust against sys.modules manipulation, I can see several paths:

#1 Somehow clear interp->codec_search_cache (used in Python/codecs.c) when the encodings module is cleared (by using weak references?)

#2 Make sure that functions returned in CodecInfo objects don't rely on global module state. For example in utf_8.py:
    def decode(input, errors='strict', _codecs=codecs):
        return _codecs.utf_8_decode(input, errors, True)

#3 Capture the utf_8 module's globals in the CodecInfo, and run decode() with these captured globals.

#4 Get rid of module.__del__ clearing the module globals, and rely on the cyclic garbage collector to clear modules at interpreter shutdown.
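
To illustrate item #2, here is a minimal sketch that compares a decode function relying on module globals with one that binds codecs as a default argument. The teardown is only simulated by setting the module's globals to None, which is roughly what happened to a discarded module at the time; fake_utf_8, decode_global and decode_bound are made-up names, not part of the stdlib.

```python
import types

SOURCE = '''
import codecs

def decode_global(input, errors='strict'):
    # Looks up "codecs" in the module globals on every call.
    return codecs.utf_8_decode(input, errors, True)

def decode_bound(input, errors='strict', _codecs=codecs):
    # "codecs" was captured as a default argument at definition time.
    return _codecs.utf_8_decode(input, errors, True)
'''

# Hypothetical stand-in for encodings.utf_8.
mod = types.ModuleType('fake_utf_8')
exec(SOURCE, mod.__dict__)
decode_global = mod.decode_global
decode_bound = mod.decode_bound

# Simulated teardown (an assumption about the old behaviour): every
# global except __builtins__ is replaced with None.
for name in list(mod.__dict__):
    if name != '__builtins__':
        mod.__dict__[name] = None

bound_result = decode_bound(b'x')      # still works
try:
    decode_global(b'x')
    global_error = None
except AttributeError as exc:          # "codecs" is now None
    global_error = exc

print(bound_result, type(global_error).__name__)
```

The default argument keeps its own reference to the codecs module, so the function no longer depends on what happens to the module dict.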

Item #2 is the easiest one, but must be implemented in each codec. We could fix the most important ones though.
Item #4 is the most far-reaching one, and would probably be an improvement to other parts of Python...
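
Item #3 could look something like the sketch below: the function is rebuilt with types.FunctionType against a private copy of its globals, so wiping the module dict later does not affect it. This is only one possible reading of "captured globals"; bind_to_snapshot and fake_utf_8 are hypothetical names, the teardown is simulated as above, and keyword-only defaults are not copied.

```python
import types

SOURCE = '''
import codecs

def decode(input, errors='strict'):
    return codecs.utf_8_decode(input, errors, True)
'''

def bind_to_snapshot(func):
    """Return a copy of func that runs against a private copy of its globals."""
    snapshot = dict(func.__globals__)
    return types.FunctionType(func.__code__, snapshot, func.__name__,
                              func.__defaults__, func.__closure__)

# Hypothetical stand-in for encodings.utf_8.
mod = types.ModuleType('fake_utf_8')
exec(SOURCE, mod.__dict__)
original_decode = mod.decode
captured_decode = bind_to_snapshot(mod.decode)

# Simulated teardown (assumption): set the module globals to None.
for name in list(mod.__dict__):
    if name != '__builtins__':
        mod.__dict__[name] = None

captured_result = captured_decode(b'x')   # uses the snapshot
try:
    original_decode(b'x')
    original_error = None
except AttributeError as exc:             # module globals are gone
    original_error = exc

print(captured_result, type(original_error).__name__)
```

Unlike item #2, this needs no change in the individual codec modules, at the cost of keeping a copy of each module's globals alive.
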
History
Date                 User                Action  Args
2012-05-18 08:44:27  amaury.forgeotdarc  set     recipients: + amaury.forgeotdarc, jaraco
2012-05-18 08:44:27  amaury.forgeotdarc  set     messageid: <1337330667.3.0.1539999775.issue14847@psf.upfronthosting.co.za>
2012-05-18 08:44:26  amaury.forgeotdarc  link    issue14847 messages
2012-05-18 08:44:26  amaury.forgeotdarc  create