classification
Title: unicodeobject.c: use of uninitialized values
Type: behavior Stage: resolved
Components: Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, lemburg, serhiy.storchaka, skrah, vstinner
Priority: normal Keywords: patch

Created on 2010-07-13 09:24 by skrah, last changed 2013-01-08 22:22 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
uninitialized.py skrah, 2010-07-16 11:10
issue9242.patch serhiy.storchaka, 2013-01-06 20:27 review
Messages (8)
msg110165 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-13 09:24
Not sure if this is valid or if there is some internal trickery that Valgrind isn't aware of. If it's the latter, perhaps an entry to
Misc/valgrind.supp could be added.


test_bug1175396 (__main__.UTF32Test) ... ==26861== Conditional jump or move depends on uninitialised value(s)
==26861==    at 0x48A2DD: PyUnicodeUCS2_DecodeUTF32Stateful (unicodeobject.c:2282)
==26861==    by 0x50E25C: utf_32_le_decode (_codecsmodule.c:420)
==26861==    by 0x52E727: PyCFunction_Call (methodobject.c:81)
==26861==    by 0x4B4EB7: call_function (ceval.c:4012)
==26861==    by 0x4B1402: PyEval_EvalFrameEx (ceval.c:2665)
==26861==    by 0x4B31DA: PyEval_EvalCodeEx (ceval.c:3252)
==26861==    by 0x4B52C0: fast_function (ceval.c:4108)
==26861==    by 0x4B4FE1: call_function (ceval.c:4033)
==26861==    by 0x4B1402: PyEval_EvalFrameEx (ceval.c:2665)
==26861==    by 0x4B31DA: PyEval_EvalCodeEx (ceval.c:3252)
==26861==    by 0x4B52C0: fast_function (ceval.c:4108)
==26861==    by 0x4B4FE1: call_function (ceval.c:4033)
==26861== 
==26861== 
==26861== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
==26861== starting debugger with cmd: /usr/bin/gdb -nw /proc/26862/fd/1014 26862
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
Attaching to program: /proc/26862/fd/1014, process 26862
0x000000000048a2dd in PyUnicodeUCS2_DecodeUTF32Stateful (s=0x5bd6a54 "\n", size=1, errors=0x5add144 "strict", byteorder=0x7feffb18c, 
    consumed=0x7feffb170) at Objects/unicodeobject.c:2282
2282            if (qq[iorder[2]] != 0 || qq[iorder[3]] != 0)
(gdb) p s
$1 = 0x5bd6a54 "\n"
(gdb) p qq
$2 = (const unsigned char *) 0x5bd6a54 "\n"
(gdb) p iorder[2]
$3 = 2
(gdb) p iorder[3]
$4 = 3
(gdb) p bo
$5 = -1
(gdb) p *byteorder
$6 = -1
msg110168 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-07-13 10:45
Stefan Krah wrote:
> 
> New submission from Stefan Krah <stefan-usenet@bytereef.org>:
> 
> Not sure if this is valid or if there is some internal trickery that Valgrind isn't aware of. If it's the latter, perhaps an entry to
> Misc/valgrind.supp could be added.
> 
> 
> test_bug1175396 (__main__.UTF32Test) ... ==26861== Conditional jump or move depends on uninitialised value(s)
> ==26861==    at 0x48A2DD: PyUnicodeUCS2_DecodeUTF32Stateful (unicodeobject.c:2282)
> ==26861==    by 0x50E25C: utf_32_le_decode (_codecsmodule.c:420)
> ==26861==    by 0x52E727: PyCFunction_Call (methodobject.c:81)
> ==26861==    by 0x4B4EB7: call_function (ceval.c:4012)
> ==26861==    by 0x4B1402: PyEval_EvalFrameEx (ceval.c:2665)
> ==26861==    by 0x4B31DA: PyEval_EvalCodeEx (ceval.c:3252)
> ==26861==    by 0x4B52C0: fast_function (ceval.c:4108)
> ==26861==    by 0x4B4FE1: call_function (ceval.c:4033)
> ==26861==    by 0x4B1402: PyEval_EvalFrameEx (ceval.c:2665)
> ==26861==    by 0x4B31DA: PyEval_EvalCodeEx (ceval.c:3252)
> ==26861==    by 0x4B52C0: fast_function (ceval.c:4108)
> ==26861==    by 0x4B4FE1: call_function (ceval.c:4033)
> ==26861== 
> ==26861== 
> ==26861== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
> ==26861== starting debugger with cmd: /usr/bin/gdb -nw /proc/26862/fd/1014 26862
> GNU gdb 6.8-debian
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu"...
> Attaching to program: /proc/26862/fd/1014, process 26862
> 0x000000000048a2dd in PyUnicodeUCS2_DecodeUTF32Stateful (s=0x5bd6a54 "\n", size=1, errors=0x5add144 "strict", byteorder=0x7feffb18c, 
>     consumed=0x7feffb170) at Objects/unicodeobject.c:2282
> 2282            if (qq[iorder[2]] != 0 || qq[iorder[3]] != 0)
> (gdb) p s
> $1 = 0x5bd6a54 "\n"
> (gdb) p qq
> $2 = (const unsigned char *) 0x5bd6a54 "\n"
> (gdb) p iorder[2]
> $3 = 2
> (gdb) p iorder[3]
> $4 = 3
> (gdb) p bo
> $5 = -1
> (gdb) p *byteorder
> $6 = -1

Could you check whether the report goes away when using the following
definition of iorder in that function:

#ifdef BYTEORDER_IS_LITTLE_ENDIAN
    const int iorder[4] = {0, 1, 2, 3};
#else
    const int iorder[4] = {3, 2, 1, 0};
#endif

I guess that the missing dimension is causing the valgrind report.

From a code perspective, everything is in order, in fact I'd expect
the compiler to optimize the array away. It may even be a good idea
to replace the array with symbols:

#ifdef BYTEORDER_IS_LITTLE_ENDIAN
# define BYTEORDER_0 0
# define BYTEORDER_1 1
# define BYTEORDER_2 2
# define BYTEORDER_3 3
#else
# define BYTEORDER_0 3
# define BYTEORDER_1 2
# define BYTEORDER_2 1
# define BYTEORDER_3 0
#endif
msg110171 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-13 11:40
> const int iorder[4] = {0, 1, 2, 3};

const isn't possible, iorder is modified later on. Adding the array
dimension did not change anything.

Making everything const (see below) did not change anything either.
I presume that Valgrind regards qq[2] or qq[3] as uninitialized.


Index: Objects/unicodeobject.c
===================================================================
--- Objects/unicodeobject.c     (revision 82816)
+++ Objects/unicodeobject.c     (working copy)
@@ -2216,10 +2216,12 @@
     int bo = 0;       /* assume native ordering by default */
     const char *errmsg = "";
     /* Offsets from q for retrieving bytes in the right order. */
+    const int iorder_le[] = {0, 1, 2, 3};
+    const int iorder_be[] = {3, 2, 1, 0};
 #ifdef BYTEORDER_IS_LITTLE_ENDIAN
-    int iorder[] = {0, 1, 2, 3};
+    const int *iorder = iorder_le;
 #else
-    int iorder[] = {3, 2, 1, 0};
+    const int *iorder = iorder_be;
 #endif
     PyObject *errorHandler = NULL;
     PyObject *exc = NULL;
@@ -2262,17 +2264,11 @@
 
     if (bo == -1) {
         /* force LE */
-        iorder[0] = 0;
-        iorder[1] = 1;
-        iorder[2] = 2;
-        iorder[3] = 3;
+        iorder = iorder_le;
     }
     else if (bo == 1) {
         /* force BE */
-        iorder[0] = 3;
-        iorder[1] = 2;
-        iorder[2] = 1;
-        iorder[3] = 0;
+        iorder = iorder_be;
     }
msg110427 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-07-16 11:10
Here is a minimal example how to reproduce this issue, extracted from
UTF32LETest.


valgrind --db-attach=yes --suppressions=Misc/valgrind-python.supp ./python uninitialized.py


It seems that in Lib/codecs.py the equivalent of "\x00".decode('utf-32-le')
is called (line 477).

This leads to:

PyUnicodeUCS2_DecodeUTF32Stateful (s=0x5b0fc0c "", size=1, errors=0x5add144 "strict", byteorder=0x7fefff39c, 
    consumed=0x0)


So we have a string of size 1, but s[2] and s[3] will be accessed
in the function.
msg179219 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-06 20:27
3.2 is affected too. The PEP-393 changes fixes this bug in 3.3+.

Here is a simple patch.
msg179371 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-08 20:51
IĀ forgot mention the issue number in commit messages. See changeset3570e04f4ea9 and changesetbf347198fbaf.

Is the issue fixed now?
msg179372 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-08 20:51
changeset 3570e04f4ea9 and changeset bf347198fbaf
msg179376 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-08 22:01
The utf_32_le_decode invalid access is gone; for Valgrind the issue
is fixed (I didn't look at the patch, no time ATM, sorry).
History
Date User Action Args
2013-01-08 22:22:10serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2013-01-08 22:01:25skrahsetmessages: + msg179376
2013-01-08 20:51:57serhiy.storchakasetmessages: + msg179372
2013-01-08 20:51:21serhiy.storchakasetmessages: + msg179371
2013-01-06 20:27:04serhiy.storchakasetfiles: + issue9242.patch

assignee: serhiy.storchaka
versions: + Python 3.2
keywords: + patch
nosy: + serhiy.storchaka

messages: + msg179219
stage: patch review
2013-01-04 23:40:25Arfreversetnosy: + Arfrever
2011-07-07 10:44:13vstinnersetnosy: + vstinner
2010-07-16 11:10:59skrahsetfiles: + uninitialized.py

messages: + msg110427
2010-07-13 11:40:08skrahsetmessages: + msg110171
2010-07-13 10:45:45lemburgsetmessages: + msg110168
2010-07-13 09:24:47skrahcreate