classification
Title: Decoding UTF-7 with "ignore warnings" crashes Python on Windows Vista
Type: crash
Stage:
Components: Interpreter Core
Versions: Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, cpalmer, ocean-city, pitrou
Priority: critical
Keywords: patch

Created on 2008-03-06 02:31 by cpalmer, last changed 2008-07-25 21:21 by pitrou. This issue is now closed.

Files
File name Uploaded Description
2242.patch pitrou, 2008-07-25 12:42
Messages (14)
msg63303 - Author: Chris Palmer (cpalmer) Date: 2008-03-06 02:30
When decoding some data as UTF-7 with the optional "ignore" argument,
Python (I am using 2.5.2) crashes. This happens only on Windows Vista (I
also tried Py 2.5.1 on Windows XP, Ubuntu 7, and FreeBSD 6). To
reproduce, set WinDbg as your post-mortem debugger and run this code:

    import os
    while True:
        a = os.urandom(16).decode("utf7", "ignore")

In WinDbg, you will see that Python died in isalnum with a bad pointer
dereference:

(f64.13b0): Access violation - code c0000005 (!!! second chance !!!)
eax=7c39a550 ebx=018e6837 ecx=0000ffe3 edx=00000003 esi=018edd66 edi=0000ffe3
eip=7c373977 esp=0021fc40 ebp=0000ffe3 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\system32\MSVCR71.dll -
MSVCR71!isalnum+0x35:
7c373977 0fb70448        movzx   eax,word ptr [eax+ecx*2] ds:0023:7c3ba516=????
0:000> kb
ChildEBP RetAddr  Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
0021fc3c 1e0dd81e 0000ffe3 00ff1030 0000012e MSVCR71!isalnum+0x35
00000000 00000000 00000000 00000000 00000000 python25!PyUnicode_DecodeUTF7+0x10e

It seems that a sanity check present in other Windows versions is
missing in Vista. The simplest possible test program:

#include "stdafx.h"
#include <ctype.h>

int _tmain(int argc, _TCHAR* argv[])
{
    isalnum(0xff8b);
    return 0;
}

causes Visual Studio 2005 to raise a debug assertion failure warning. I
guess that the assert is missing in the release build, and Python can be
tricked into providing the unsafe input to isalnum.
msg63308 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2008-03-06 07:34
I reproduced this bug with VC6 on Windows 2000 SP4 using the following code.

'+\xc1'.decode("utf7", "ignore")

This simple patch prevented the crash.

Index: Objects/unicodeobject.c
===================================================================
--- Objects/unicodeobject.c	(revision 61262)
+++ Objects/unicodeobject.c	(working copy)
@@ -1506,7 +1506,7 @@
     e = s + size;
 
     while (s < e) {
-        Py_UNICODE ch;
+        char ch;
         restart:
         ch = *s;

This is probably due to integer conversion, but I have not looked closely
at the logic.
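
The suspected integer conversion can be modelled with a minimal Python sketch. It assumes that `char` is signed on the affected compiler and that `Py_UNICODE` is 16 bits wide; neither is stated in this thread, and the helper names are made up for illustration. Under those assumptions, a byte >= 0x80 read through a signed `char` widens to a value far outside the 0-255 range that `isalnum` accepts, while reading it as unsigned keeps it in range. A byte of 0xE3 would widen to 0xFFE3, which would be consistent with the 0000ffe3 argument in the WinDbg trace.

```python
import struct


def widen_signed_char(byte_value: int, width_bits: int = 16) -> int:
    """Model `Py_UNICODE ch = *s;` when `*s` is a signed char (assumption)."""
    # Reinterpret the raw byte as a signed 8-bit integer.
    (signed,) = struct.unpack("b", bytes([byte_value]))
    # Converting a negative value to an unsigned type wraps modulo 2**width_bits.
    return signed & ((1 << width_bits) - 1)


def widen_unsigned_char(byte_value: int) -> int:
    """Model reading the same byte as an unsigned char before widening."""
    return byte_value & 0xFF


def in_isalnum_domain(value: int) -> bool:
    """isalnum is only defined for values representable as unsigned char (or EOF)."""
    return 0 <= value <= 0xFF


for raw in (0x41, 0xC1, 0xE3):
    signed_path = widen_signed_char(raw)
    unsigned_path = widen_unsigned_char(raw)
    print(hex(raw), hex(signed_path), in_isalnum_domain(signed_path),
          hex(unsigned_path), in_isalnum_domain(unsigned_path))
```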
msg63309 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2008-03-06 07:36
One more thing: "ignore" is not needed.

'+\xc1'.decode("utf7")

crashed my interpreter.
msg63328 - Author: Chris Palmer (cpalmer) Date: 2008-03-06 18:29
You could also fix the problem by using the iswalnum function instead of
isalnum. Sorry I didn't mention this in the original report.

http://msdn2.microsoft.com/en-us/library/k84c0490(VS.71).aspx
msg70246 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-25 10:42
Hirokazu, does replacing the following line (rather than changing the
type of the `ch` variable):
         ch = *s;
with
         ch = (unsigned char) *s;

fix the crash as well?
msg70247 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2008-07-25 11:10
With this patch? Yes, it fixed the crash.

Index: Objects/unicodeobject.c
===================================================================
--- Objects/unicodeobject.c	(revision 65223)
+++ Objects/unicodeobject.c	(working copy)
@@ -1523,7 +1523,7 @@
     while (s < e) {
         Py_UNICODE ch;
         restart:
-        ch = *s;
+        ch = (unsigned char)*s;
 
         if (inShift) {
             if ((ch == '-') || !B64CHAR(ch)) {


>>> '+\xc1'.decode("utf7")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "e:\python-dev\trunk\lib\encodings\utf_7.py", line 12, in decode
    return codecs.utf_7_decode(input, errors, True)
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-1: unexpected

# But I don't know whether this behavior is right or not....

I confirmed that test_unicode, test_codecs, and test_codeccallbacks pass.
msg70249 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-07-25 11:42
VS8 and VS9 are immune to the crash, although the exception message
differs between release and debug builds.

VC6 crashes, and the proposed patch fixes the problem there as well.
msg70250 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-25 12:11
Hirokazu Yamamoto <report@bugs.python.org> wrote:
>
> With this patch? Yes, it fixed crash.

Thanks!

> # But I don't know whether this behavior is right or not....

As the name implies, utf7 is a 7-bit encoding of Unicode, so bytes >= 0x80
must raise an exception. The error message could be better though.
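
As a rough illustration of that rule, here is a small sketch written for Python 3, where the input is a `bytes` object rather than a 2.x `str`. The helper names are hypothetical and the exact error text is not checked, since it varies between versions. It only shows that strict decoding rejects a byte >= 0x80 with `UnicodeDecodeError` and that the "ignore" handler drops the offending byte instead.

```python
def decode_utf7(data: bytes, errors: str = "strict") -> str:
    """Thin wrapper around the built-in UTF-7 codec."""
    return data.decode("utf-7", errors)


def is_rejected(data: bytes) -> bool:
    """Return True if strict UTF-7 decoding raises UnicodeDecodeError."""
    try:
        decode_utf7(data)
    except UnicodeDecodeError:
        return True
    return False


# The two inputs discussed in this thread, as Python 3 bytes literals.
print(is_rejected(b"+\xc1"))
print(is_rejected(b"\xc1"))
# Plain 7-bit ASCII decodes directly.
print(is_rejected(b"abc"))
# With "ignore", the invalid byte is skipped and the rest is kept.
print(repr(decode_utf7(b"a\xc1b", "ignore")))
```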
msg70252 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-25 12:42
This patch also has a test in it.
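
The test in 2242.patch is not shown in this thread. As a stand-in, a bounded version of the original reproduction loop might look like the sketch below. It is written for Python 3 and uses a seeded `random.Random` in place of `os.urandom` so that runs are repeatable; the function name and parameters are assumptions, not part of the patch. On a fixed interpreter it should simply finish.

```python
import random


def fuzz_utf7_ignore(iterations: int = 1000, seed: int = 0, size: int = 16) -> int:
    """Decode random byte strings as UTF-7 with errors="ignore".

    Returns the number of inputs decoded. An unpatched interpreter of the
    kind described in this issue would be expected to crash before returning.
    """
    rng = random.Random(seed)
    decoded = 0
    for _ in range(iterations):
        # Build `size` random bytes, including values >= 0x80.
        data = bytes(rng.getrandbits(8) for _ in range(size))
        result = data.decode("utf-7", "ignore")
        if not isinstance(result, str):
            raise TypeError("decoder returned a non-str object")
        decoded += 1
    return decoded


print(fuzz_utf7_ignore(iterations=1000, seed=2242))
```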
msg70263 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-25 17:52
Should be fixed in r65227. Please reopen if there's still a problem.
msg70264 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-25 17:58
On second thought, perhaps it should also be backported to 2.5, so I'm
leaving the bug open.
msg70269 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-25 19:03
I've committed the fix for 2.5 in r65234. Can somebody try it out with
the failing MSVC version?
msg70279 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-07-25 21:19
I confirm that r65234 for 2.5 corrects the crash.
(Windows XP, Visual Studio 6)
msg70281 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-25 21:21
Thanks Amaury!
History
Date User Action Args
2008-07-25 21:21:36 pitrou set status: open -> closed; resolution: accepted -> fixed; messages: + msg70281
2008-07-25 21:19:23 amaury.forgeotdarc set messages: + msg70279
2008-07-25 19:03:56 pitrou set messages: + msg70269
2008-07-25 18:06:06 pitrou set resolution: accepted; versions: - Python 2.6, Python 3.0
2008-07-25 17:58:47 pitrou set messages: + msg70264
2008-07-25 17:52:15 pitrou set messages: + msg70263
2008-07-25 12:42:52 pitrou set files: + 2242.patch; messages: + msg70252
2008-07-25 12:11:37 pitrou set messages: + msg70250
2008-07-25 11:42:59 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: + msg70249
2008-07-25 11:10:20 ocean-city set messages: + msg70247
2008-07-25 10:42:57 pitrou set keywords: + patch; nosy: + pitrou; messages: + msg70246
2008-03-06 18:29:27 cpalmer set messages: + msg63328
2008-03-06 07:37:29 georg.brandl set priority: critical; severity: normal -> urgent; versions: + Python 2.6, Python 3.0
2008-03-06 07:36:10 ocean-city set messages: + msg63309
2008-03-06 07:34:14 ocean-city set nosy: + ocean-city; messages: + msg63308
2008-03-06 02:31:02 cpalmer create