Created on 2012-05-18 03:22 by jason.coombs, last changed 2014-10-16 20:20 by zzzeek.
|wrapper.py||jason.coombs, 2012-06-06 18:03|
|wrapper.py||jason.coombs, 2012-08-30 21:57||Wrapper as used Jun-Aug|
|msg161032 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2012-05-18 03:22|
I have run into an issue where calling .decode('utf-8') on a Python string fails with the following traceback:

    File ...
      ''.decode('utf-8')
    File "env/lib/python2.6/encodings/utf_8.py", line 16, in decode
      return codecs.utf_8_decode(input, errors, True)
    AttributeError: 'NoneType' object has no attribute 'utf_8_decode'

I've noticed this when running our applications. I've also encountered it when installing packages using distribute. With a sufficiently complicated tree of packages to install, distribute 0.6.26 will fail with the error above at https://bitbucket.org/tarek/distribute/src/0a45ae3390cd/setuptools/command/easy_install.py#cl-745 .

Unfortunately, the only case where I've been able to reliably reproduce this behavior is with private packages in a very complex arrangement. I tried but was not able to create a small script to reproduce the issue.

I see this bug was observed in issue6551, but only a workaround was applied to avoid the symptom; the underlying cause was never discovered.

Furthermore, I found that I could sometimes reproduce the failure in one line of code, but not reproduce it with the same invocation one line prior, with no substantial logic in between. In other words, it's not even possible to pinpoint the cause, because whatever is causing the utf_8 module to become finalized is not coincident with where the failures occur.

I'm hesitant to file a bug with Python because core Python is not necessarily implicated, but because this problem emerged in the Python core project test suite, I'm inclined to think the issue does lie with Python itself. Furthermore, it should be very difficult for a Python program to get into a situation where ''.decode('utf-8') fails with an AttributeError.

At this point, I could use some help. Is it possible to detect when a module (utf_8 in this case) is finalized? I'm realizing now that running python with -v might provide some insight, so I'll try that.
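On the question of detecting when a module is finalized: one possible approach, sketched below, is to attach a weak-reference callback to the module object. This is only an illustration and rests on assumptions not stated in the report: it needs Python 3.4 or later (where module objects support weak references and `weakref.finalize` exists), so it would not run on the 2.6 interpreter in the traceback. The names `watch_module`, `finalized` and `demo_module` are hypothetical.

```python
import gc
import types
import weakref

finalized = []  # names of modules whose objects have been reclaimed


def watch_module(module):
    """Record the module's name once the module object is reclaimed."""
    name = module.__name__
    # The callback must not hold a reference to `module` itself,
    # otherwise it would keep the module alive forever.
    return weakref.finalize(module, finalized.append, name)


# A throwaway module stands in for encodings.utf_8 here.
demo = types.ModuleType('demo_module')
watcher = watch_module(demo)

del demo      # drop the last strong reference
gc.collect()  # also covers the case where the module is part of a cycle
```

In a real application the callback could log a stack trace instead of appending to a list, which might help locate the code that dropped the last reference.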
|msg161041 - (view)||Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *||Date: 2012-05-18 08:44|
Reproducing the issue is not too hard, see the example below. Does your program play with sys.modules?

    import sys
    b'x'.decode('utf-8')
    import locale; del locale.encodings   # Not necessary with python2
    del sys.modules['encodings.utf_8'], sys.modules['encodings']
    b'x'.decode('utf-8')

If we want to make codecs more robust against sys.modules manipulation, I can see several paths:

#1 Somehow clear interp->codec_search_cache (used in Python/codecs.c) when the encodings module is cleared (by using weak references?)

#2 Make sure that functions returned in CodecInfo objects don't rely on global module state. For example in utf_8.py:

    def decode(input, errors='strict', _codecs=codecs):
        return _codecs.utf_8_decode(input, errors, True)

#3 Capture utf_8.globals() in the CodecInfo, and run decode() with these captured globals.

#4 Get rid of module.__del__ clearing the module globals, and rely on the cyclic garbage collector to clear modules at interpreter shutdown.

Item #2 is the easiest one, but must be implemented in each codec. We could fix the most important ones though. Item #4 is the most far-reaching one, and would probably be an improvement to other parts of Python...
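To illustrate why item #2 helps, here is a minimal sketch that simulates the clearing of module globals on a throwaway module instead of the real encodings.utf_8. The module name `fake_utf_8` and the function names `decode_fragile` and `decode_robust` are hypothetical, and the loop that sets globals to None is only an approximation of what the interpreter does when a module is deallocated.

```python
import types

SOURCE = '''
import codecs

def decode_fragile(data, errors='strict'):
    # Looks up `codecs` in the module globals at call time.
    return codecs.utf_8_decode(data, errors, True)

def decode_robust(data, errors='strict', _codecs=codecs):
    # `codecs` was captured as a default argument at definition time.
    return _codecs.utf_8_decode(data, errors, True)
'''

# Build a stand-in module and keep references to its functions,
# much as the codec registry keeps references to a codec's functions.
mod = types.ModuleType('fake_utf_8')
exec(SOURCE, mod.__dict__)
fragile, robust = mod.decode_fragile, mod.decode_robust

# Simulate module finalization: every ordinary global becomes None.
for name in list(mod.__dict__):
    if not name.startswith('__'):
        mod.__dict__[name] = None

try:
    fragile(b'x')
    fragile_error = None
except AttributeError as exc:
    fragile_error = str(exc)

robust_result = robust(b'x')  # still works: (decoded text, bytes consumed)
```

The fragile variant fails with the same kind of AttributeError as in the report, while the variant with the bound default keeps working because it no longer consults the cleared globals.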
|msg161069 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2012-05-18 18:00|
Thanks for the tip Amaury. Following that lead, I see that distribute does indeed have a sandbox module which attempts to sandbox sys.modules, which can break the encodings modules. I'm going to address the issue in distribute first, but I like the proposals you've put forth. I like the fourth one in particular, because it's always bugged me when I see other modules implementing #2, which is a workaround for the broader problem. Is #4 the kind of issue that would require a PEP?
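For reference, a sandbox that restores sys.modules without breaking the codecs could look roughly like the sketch below. This is an assumption about one workable approach, not the code that distribute ships; the name `preserved_modules` and the demo module names are hypothetical.

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def preserved_modules():
    """Restore sys.modules on exit, but never drop encodings.* entries."""
    saved = sys.modules.copy()
    try:
        yield
    finally:
        # Put back anything the sandboxed code removed or replaced.
        sys.modules.update(saved)
        # Drop modules imported inside the block, except codec modules,
        # whose functions may already be cached by the codec registry.
        added = [
            name for name in sys.modules
            if name not in saved and not name.startswith('encodings.')
        ]
        for name in added:
            del sys.modules[name]


with preserved_modules():
    # Hypothetical names standing in for modules imported by sandboxed code.
    sys.modules['sandbox_demo_pkg'] = types.ModuleType('sandbox_demo_pkg')
    sys.modules['encodings.sandbox_demo'] = types.ModuleType('encodings.sandbox_demo')

dropped = 'sandbox_demo_pkg' not in sys.modules
kept = 'encodings.sandbox_demo' in sys.modules
del sys.modules['encodings.sandbox_demo']  # clean up the demo entry
```

The key design choice is to leave any encodings.* module in place once it has been imported, since a cached codec may still point at its functions.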
|msg161115 - (view)||Author: Daniel Swanson (weirdink13)||Date: 2012-05-19 13:56|
I attempted to reproduce the error but could not; all I got was "'str' object has no attribute 'decode'". Here is the whole test:

    Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> b''.decode('utf-8')
    ''
    >>> ''.decode('utf-8')
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        ''.decode('utf-8')
    AttributeError: 'str' object has no attribute 'decode'
    >>> b'x'.decode('utf-8')
    'x'
    >>>

Apparently, this error does not apply to Python 3.2.2.
|msg161210 - (view)||Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *||Date: 2012-05-20 14:40|
Daniel, please try my snippets instead. Of course in Python3 str is the unicode string and has no decode() method...
|msg162415 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2012-06-06 16:53|
I find that on Python 2.7.3 64-bit Windows, the deletion of locale.encodings is also necessary:

    PS C:\Users\jaraco> python
    Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> b'x'.decode('utf-8')
    u'x'
    >>> del sys.modules['encodings.utf_8'], sys.modules['encodings']
    >>> b'x'.decode('utf-8')
    u'x'
    >>> ^Z

    PS C:\Users\jaraco> python
    Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> b'x'.decode('utf-8')
    u'x'
    >>> import locale; del locale.encodings # Not necessary with python2
    >>> del sys.modules['encodings.utf_8'], sys.modules['encodings']
    >>> b'x'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\python\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    AttributeError: 'NoneType' object has no attribute 'utf_8_decode'
|msg162418 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2012-06-06 18:03|
We're encountering this issue in another application of ours. A scan of the code doesn't reveal any places where the encodings.* modules are removed, so I've created a wrapper I intend to apply to our application that might help us detect where the module is being deleted. I'm attaching it as 'wrapper.py'. It appears to work for the nominal case. I'll report back if it works (or doesn't) for a real-world case.
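The general idea of such a wrapper can be sketched as a dict subclass that logs a stack trace whenever a watched key is deleted. The code below is not the attached wrapper.py; it is a hypothetical illustration (the names `WatchedModules`, `watched_prefix` and `reported` are made up), and it only operates on a copy of sys.modules. Actually installing such an object with `sys.modules = ...` is not guaranteed to behave as expected, because parts of the interpreter keep their own reference to the original dictionary.

```python
import logging
import sys
import traceback

import encodings.utf_8  # make sure the watched entry exists

log = logging.getLogger('module_watch')


class WatchedModules(dict):
    """dict that reports attempts to delete keys with a given prefix."""

    watched_prefix = 'encodings'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reported = []

    def _report(self, key):
        if isinstance(key, str) and key.startswith(self.watched_prefix):
            stack = ''.join(traceback.format_stack(limit=8))
            log.warning('deletion of %s requested from:\n%s', key, stack)
            self.reported.append(key)

    def __delitem__(self, key):
        self._report(key)
        super().__delitem__(key)

    def pop(self, key, *default):
        self._report(key)
        return super().pop(key, *default)


# Demonstrate on a copy so the real sys.modules is left untouched.
watched = WatchedModules(sys.modules)
del watched['encodings.utf_8']
```

This sketch does not intercept clear() or popitem(), and it cannot see a module being finalized for reasons other than removal from the dictionary.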
|msg169494 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2012-08-30 21:57|
Since my last comment, we've been running with a version of the wrapper (attached) around the main entry point to our application and it has completely eliminated the error. However, while the wrapper does report when a module deletion is requested, it is never triggered (we don't see the word "deletion" anywhere in our logs).

This result leads me to two findings:

1. Merely making a reference to sys.modules was sufficient to prevent the error.
2. Preventing deletion of 'encodings.*' modules is not sufficient (as apparently this doesn't happen in our code).

My next step is going to be to remove the wrapper, so the error exhibits itself again and we can prove that the issue is still present, and then to re-apply the wrapper, but this time only creating a reference to sys.modules (not actually replacing it).
|msg169867 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2012-09-05 12:03|
I've removed the invocation of the wrapper code in our project, but the issue has not exhibited itself. This leads me to suspect a few possibilities of things that have changed since we had the issue in June:

1) Distribute was updated with its fix for this issue. I don't expect that code was being called in our application, but I didn't strictly rule it out.
2) The application launcher was changed. The old system was fork-based and used pkg_resources and multi-installed packages. The modern deployment is a simple, pip-installed process.
3) The code underwent other unrelated but substantial changes.

So although I thought I was on to something when I added the wrapper, suppressing the error, I was never able to detect it, and it seems to have gone away now. I was hoping this would shed more light on the problem and describe another use case, but at this point, that may not happen.

Unless the problem recurs in our application, or we have another application where the problem arises, I'll focus on the Python side as suggested by Amaury.