Decoding UTF-7 with "ignore warnings" crashes Python on Windows Vista #46495
Comments
When decoding some data as UTF-7 with the optional "ignore" argument, Python crashes. To reproduce:

    import os
    while True:
        a = os.urandom(16).decode("utf7", "ignore")

In WinDbg, you will see that Python died in isalnum with a bad pointer:

    (f64.13b0): Access violation - code c0000005 (!!! second chance !!!)

It seems that a sanity check present in other Windows versions is missing on Vista. The following program:

    #include "stdafx.h"
    #include <ctype.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
        isalnum(0xff8b);
        return 0;
    }

causes Visual Studio 2005 to raise a debug assertion failure. I …
I reproduced this bug with VC6 + Win2000 SP4 and the following code:

    '+\xc1'.decode("utf7", "ignore")

and this simple patch prevented the crash:

    Index: Objects/unicodeobject.c
    --- Objects/unicodeobject.c (revision 61262)
    +++ Objects/unicodeobject.c (working copy)
    @@ -1506,7 +1506,7 @@
         e = s + size;
         while (s < e) {
    -        Py_UNICODE ch;
    +        char ch;
         restart:
             ch = *s;

Probably this is due to an integer conversion, but I didn't look at the logic …
One more thing: "ignore" is not needed. '+\xc1'.decode("utf7") alone crashed my interpreter.
You could also fix the problem by using the iswalnum function instead of isalnum; see http://msdn2.microsoft.com/en-us/library/k84c0490(VS.71).aspx
Hirokazu, does replacing the line ch = *s; with ch = (unsigned char)*s; (rather than changing the declaration of ch) fix the crash as well?
With this patch? Yes, it fixed the crash.

    Index: Objects/unicodeobject.c
    --- Objects/unicodeobject.c (revision 65223)
    +++ Objects/unicodeobject.c (working copy)
    @@ -1523,7 +1523,7 @@
         while (s < e) {
             Py_UNICODE ch;
         restart:
    -        ch = *s;
    +        ch = (unsigned char)*s;
             if (inShift) {
                 if ((ch == '-') || !B64CHAR(ch)) {

    >>> '+\xc1'.decode("utf7")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "e:\python-dev\trunk\lib\encodings\utf_7.py", line 12, in decode
        return codecs.utf_7_decode(input, errors, True)
    UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-1: unexpected …

But I don't know whether this behavior is right or not. I confirmed that test_unicode, test_codecs and test_codeccallbacks passed.
VS8 and VS9 are immune to the crash, even if the exception message …

VC6 crashes, and the proposed patch fixes the problem there as well.
Quoting Hirokazu Yamamoto <report@bugs.python.org>:
Thanks!
As the name implies, utf7 is a 7-bit coding of Unicode; bytes >= 0x80 must …
This patch also has a test in it.

Should be fixed in r65227. Please reopen if there's still a problem.

On second thought, perhaps it should also be backported to 2.5, so I'm …

I've committed the fix for 2.5 in r65234; can somebody try it out with …

I confirm that r65234 for 2.5 corrects the crash.

Thanks, Amaury!