Message 161041 - Python tracker


Author	amaury.forgeotdarc
Recipients	amaury.forgeotdarc, jaraco
Date	2012-05-18.08:44:26
Message-id	<1337330667.3.0.1539999775.issue14847@psf.upfronthosting.co.za>

Content

Reproducing the issue is not too hard; see the example below. Does your program play with sys.modules?

import sys
b'x'.decode('utf-8')
import locale; del locale.encodings   # Not necessary with python2
del sys.modules['encodings.utf_8'], sys.modules['encodings']
b'x'.decode('utf-8')

If we want to make codecs more robust against sys.modules manipulation, I can see several paths:

#1 Somehow clear interp->codec_search_cache (used in Python/codecs.c) when the encodings module is cleared (by using weak references?)

#2 Make sure that functions returned in CodecInfo objects don't rely on global module state. For example in utf_8.py:
    def decode(input, errors='strict', _codecs=codecs):
        return _codecs.utf_8_decode(input, errors, True)

#3 Capture utf_8.globals() in the CodecInfo, and run decode() with these captured globals.

#4 Get rid of module.__del__ clearing the module globals, and rely on the cyclic garbage collector to clear modules at interpreter shutdown.
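
As a rough illustration of path #1, here is a minimal pure-Python sketch of invalidating a cache through a weak reference. It is not how Python/codecs.c works: `codec_search_cache`, `cache_codec` and `fake_encodings` are hypothetical stand-ins, and the immediate callback relies on CPython's reference counting.

```python
import gc
import types
import weakref

# Hypothetical stand-in for interp->codec_search_cache.
codec_search_cache = {}
# The weak references must stay alive, otherwise their callbacks never run.
_watchers = []


def cache_codec(name, module, decode):
    """Cache `decode` under `name`; clear the cache when `module` goes away."""
    codec_search_cache[name] = decode
    _watchers.append(
        weakref.ref(module, lambda _ref: codec_search_cache.clear())
    )


# Hypothetical module playing the role of the "encodings" package.
fake_encodings = types.ModuleType("fake_encodings")
cache_codec("utf-8", fake_encodings, bytes.decode)
cached_before = "utf-8" in codec_search_cache

# Comparable to: del sys.modules['encodings']
del fake_encodings
gc.collect()
cached_after = "utf-8" in codec_search_cache
```

With this arrangement the next lookup would miss the cache and go through the search functions again, instead of returning functions from a module that has already been torn down.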

Item #2 is the easiest one, but must be implemented in each codec. We could fix the most important ones though.
Item #4 is the most far-reaching one, and would probably be an improvement to other parts of Python...
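
To show why item #2 helps, here is a small self-contained sketch. It assumes that module teardown sets the module's globals to None, and simulates that by hand. `demo_utf_8`, `decode_fragile` and `decode_robust` are hypothetical names, not the real encodings.utf_8 code.

```python
import types

SOURCE = '''
import codecs

def decode_fragile(input, errors='strict'):
    # Looks up the global name "codecs" each time it is called.
    return codecs.utf_8_decode(input, errors, True)

def decode_robust(input, errors='strict', _codecs=codecs):
    # "_codecs" was bound once, when the function was defined.
    return _codecs.utf_8_decode(input, errors, True)
'''

# Hypothetical stand-in for the encodings.utf_8 module.
demo = types.ModuleType("demo_utf_8")
exec(SOURCE, demo.__dict__)

# Keep references to the functions, as a cached CodecInfo would.
fragile = demo.decode_fragile
robust = demo.decode_robust

# Simulate module teardown: every non-dunder global becomes None.
for name in list(demo.__dict__):
    if not name.startswith("__"):
        demo.__dict__[name] = None

try:
    fragile(b"x")
    fragile_error = None
except AttributeError as exc:
    # The global "codecs" is now None, so the attribute lookup fails.
    fragile_error = exc

# The default argument still refers to the real codecs module.
robust_result = robust(b"x")
```

The default argument is stored on the function object itself, so it survives the clearing of the module globals. The price is the extra keyword parameter on every codec function.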
