classification
Title: Valgrind warning in PyUnicode_Decode: false alarm with GCC builtin strcmp()
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, vstinner
Priority: normal Keywords: patch

Created on 2019-09-11 15:28 by vstinner, last changed 2019-10-08 13:58 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
valgrind_strcmp_warn.c vstinner, 2019-10-08 10:46
Pull Requests
URL Status Linked Edit
PR 16651 merged vstinner, 2019-10-08 13:08
PR 16652 merged miss-islington, 2019-10-08 13:42
PR 16653 closed miss-islington, 2019-10-08 13:42
PR 16654 merged miss-islington, 2019-10-08 13:50
Messages (10)
msg351939 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-11 15:28
Running tokenize.py in Valgrind emits 2 warnings:

$ ./configure --with-valgrind
$ make
$ echo|PYTHONMALLOC=malloc valgrind ./python Lib/tokenize.py

(A)

==30746== Conditional jump or move depends on uninitialised value(s)
==30746==    at 0x4A01F7: PyUnicode_Decode (unicodeobject.c:3395)
==30746==    by 0x4A01F7: PyUnicode_Decode (unicodeobject.c:3350)
==30746==    by 0x4A062E: PyUnicode_FromEncodedObject (unicodeobject.c:3268)
==30746==    by 0x4A090D: unicode_new (unicodeobject.c:15125)
==30746==    by 0x46F515: type_call (typeobject.c:969)
==30746==    by 0x435F82: _PyObject_MakeTpCall (call.c:167)
==30746==    by 0x422B3A: _PyObject_Vectorcall (abstract.h:104)
==30746==    by 0x422B3A: _PyObject_Vectorcall (abstract.h:96)
==30746==    by 0x422B3A: call_function (ceval.c:4984)
==30746==    by 0x423C26: _PyEval_EvalFrameDefault (ceval.c:3496)
==30746==    by 0x4217D2: function_code_fastcall (call.c:292)
==30746==    by 0x4360FC: _PyObject_FastCallDict (call.c:109)
==30746==    by 0x436309: _PyObject_Call_Prepend (call.c:447)
==30746==    by 0x4731F0: slot_tp_init (typeobject.c:6767)
==30746==    by 0x46F562: type_call (typeobject.c:989)

(B)

==30746== Conditional jump or move depends on uninitialised value(s)
==30746==    at 0x4DC8FF: PyState_AddModule (pystate.c:707)
==30746==    by 0x485A5E1: PyInit_zlib (zlibmodule.c:1472)
==30746==    by 0x4CA49C: _PyImport_LoadDynamicModuleWithSpec (importdl.c:164)
==30746==    by 0x4C8462: _imp_create_dynamic_impl (import.c:2292)
==30746==    by 0x4C8462: _imp_create_dynamic (import.c.h:330)
==30746==    by 0x570783: cfunction_vectorcall_FASTCALL (methodobject.c:382)
==30746==    by 0x435ACD: PyVectorcall_Call (call.c:203)
==30746==    by 0x42424F: do_call_core (ceval.c:5031)
==30746==    by 0x42424F: _PyEval_EvalFrameDefault (ceval.c:3557)
==30746==    by 0x4AF4CF: _PyEval_EvalCodeWithName (ceval.c:4296)
==30746==    by 0x435D3F: _PyFunction_Vectorcall (call.c:355)
==30746==    by 0x422AA8: _PyObject_Vectorcall (abstract.h:106)
==30746==    by 0x422AA8: call_function (ceval.c:4984)
==30746==    by 0x428E51: _PyEval_EvalFrameDefault (ceval.c:3465)
==30746==    by 0x4217D2: function_code_fastcall (call.c:292)

Warning (A) goes away when Python is compiled with gcc -O1.

Warning (B) is still present when Python is compiled with gcc -O0.


It's unclear to me if it's a false alarm or a real issue.


On Fedora, the issue can be seen with:

   echo|valgrind python3 /usr/lib64/python3.7/tokenize.py

On Fedora 30, python3 is Python 3.7.4.


Downstream Fedora issue: https://bugzilla.redhat.com/show_bug.cgi?id=1751208
msg352170 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-12 14:10
==30746== Conditional jump or move depends on uninitialised value(s)
==30746==    at 0x4DC8FF: PyState_AddModule (pystate.c:707)

Oh, this issue looks like a duplicate of bpo-38124 which has just been fixed by:

New changeset 39de95b746c990e6a2fe9af5fad01747f58b2e5f by Benjamin Peterson in branch 'master':
closes bpo-38124: Fix bounds check in PyState_AddModule. (GH-16007)
https://github.com/python/cpython/commit/39de95b746c990e6a2fe9af5fad01747f58b2e5f
msg352245 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-13 08:09
I reopen the issue, the PyUnicode_Decode warning is not fixed yet.

vstinner@apu$ echo|PYTHONMALLOC=malloc valgrind ./python Lib/tokenize.py
==6832== Memcheck, a memory error detector
==6832== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6832== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==6832== Command: ./python Lib/tokenize.py
==6832== 
==6832== Conditional jump or move depends on uninitialised value(s)
==6832==    at 0x4D87E9: PyUnicode_Decode (unicodeobject.c:3395)
==6832==    by 0x4D8E6E: PyUnicode_FromEncodedObject (unicodeobject.c:3268)
==6832==    by 0x4D9283: unicode_new (unicodeobject.c:15125)
==6832==    by 0x49271B: type_call (typeobject.c:969)
==6832==    by 0x440E97: _PyObject_MakeTpCall (call.c:167)
==6832==    by 0x42D6D3: _PyObject_Vectorcall (abstract.h:104)
==6832==    by 0x42D6D3: call_function (ceval.c:4984)
==6832==    by 0x42D6D3: _PyEval_EvalFrameDefault (ceval.c:3496)
==6832==    by 0x4216CA: function_code_fastcall (call.c:292)
==6832==    by 0x4411B5: _PyObject_FastCallDict (call.c:109)
==6832==    by 0x441413: _PyObject_Call_Prepend (call.c:447)
==6832==    by 0x497320: slot_tp_init (typeobject.c:6772)
==6832==    by 0x492767: type_call (typeobject.c:989)
==6832==    by 0x440E97: _PyObject_MakeTpCall (call.c:167)
==6832== 
1,0-1,1:            NL             '\n'           
2,0-2,0:            ENDMARKER      ''             
==6832== 
==6832== HEAP SUMMARY:
==6832==     in use at exit: 1,179,135 bytes in 8,467 blocks
==6832==   total heap usage: 93,677 allocs, 85,210 frees, 12,109,969 bytes allocated
==6832== 
==6832== LEAK SUMMARY:
==6832==    definitely lost: 0 bytes in 0 blocks
==6832==    indirectly lost: 0 bytes in 0 blocks
==6832==      possibly lost: 454,934 bytes in 1,972 blocks
==6832==    still reachable: 724,201 bytes in 6,495 blocks
==6832==         suppressed: 0 bytes in 0 blocks
==6832== Rerun with --leak-check=full to see details of leaked memory
==6832== 
==6832== Use --track-origins=yes to see where uninitialised values come from
==6832== For lists of detected and suppressed errors, rerun with: -s
==6832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
msg354183 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-08 10:46
> ==6832==    at 0x4D87E9: PyUnicode_Decode (unicodeobject.c:3395)

This warning is a false alarm: the GCC builtin strcmp() function seems to read past the NUL byte. When recompiling Python with -fno-builtin GCC option, the warning is gone.

The valgrind warning can be reproduced using attached valgrind_strcmp_warn.c:

On Fedora 30 with:

* gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
* valgrind-3.15.0

GCC builtin strcmp() emits a false alarm in valgrind: it reads past the NUL
byte.

$ gcc -O3 valgrind_strcmp_warn.c -o valgrind_strcmp_warn -D NDEBUG=1 -g && valgrind ./valgrind_strcmp_warn
(...)
==29173== Conditional jump or move depends on uninitialised value(s)
==29173==    at 0x4011CB: PyUnicode_Decode.part.0 (valgrind_strcmp_warn.c:276)
==29173==    by 0x4898F42: (below main) (in /usr/lib64/libc-2.29.so)
==29173==
==29173== Conditional jump or move depends on uninitialised value(s)
==29173==    at 0x4011EE: PyUnicode_Decode.part.0 (valgrind_strcmp_warn.c:280)
==29173==    by 0x4898F42: (below main) (in /usr/lib64/libc-2.29.so)
==29173==
==29173== Conditional jump or move depends on uninitialised value(s)
==29173==    at 0x401221: PyUnicode_Decode.part.0 (valgrind_strcmp_warn.c:282)
==29173==    by 0x4898F42: (below main) (in /usr/lib64/libc-2.29.so)
(...)
==29173== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

versus

$ gcc -O3 valgrind_strcmp_warn.c -o valgrind_strcmp_warn -D NDEBUG=1 -fno-builtin -g && valgrind ./valgrind_strcmp_warn
==29217== Memcheck, a memory error detector
(...)
==29217== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
msg354197 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-08 12:45
The valgrind false alarm on GCC builtin strcmp() has been reported to the Valgrind bug tracker in 2011, with an update in 2012 and... nothing :-(
https://bugs.kde.org/show_bug.cgi?id=264936

curl worked around the issue by disabling valgrind on the impacted test:
https://lists.fedoraproject.org/pipermail/scm-commits/2011-February/567665.html

Passing "--partial-loads-ok=yes" to Valgrind has no effect... It became the default:
http://valgrind.org/docs/manual/mc-manual.html#opt.partial-loads-ok
msg354202 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-08 13:19
It seems like Python 2.7 is not affected.
msg354205 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-08 13:42
New changeset 03ab6b4fc6f59a4452756e7a3a46310ce30ec4b2 by Victor Stinner in branch 'master':
bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651)
https://github.com/python/cpython/commit/03ab6b4fc6f59a4452756e7a3a46310ce30ec4b2
msg354206 - (view) Author: miss-islington (miss-islington) Date: 2019-10-08 13:49
New changeset 364532fb4962991d278da6470c3aec4e443e533d by Miss Islington (bot) in branch '3.8':
bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651)
https://github.com/python/cpython/commit/364532fb4962991d278da6470c3aec4e443e533d
msg354207 - (view) Author: miss-islington (miss-islington) Date: 2019-10-08 13:56
New changeset 98043b4fcd80bdfa8cc2be1730fa2a4cf4950958 by Miss Islington (bot) in branch '3.7':
bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651)
https://github.com/python/cpython/commit/98043b4fcd80bdfa8cc2be1730fa2a4cf4950958
msg354208 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-08 13:58
Ok, the issue has been fixed in 3.7, 3.8 and master branches.

Ensure that you use Valgrind with Misc/valgrind.supp. Example:

$ echo|PYTHONMALLOC=malloc valgrind --suppressions=Misc/valgrind-python.supp ./python Lib/tokenize.py
(...)
==12923== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)

I close the issue.
History
Date User Action Args
2019-10-08 13:58:13vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg354208

stage: patch review -> resolved
2019-10-08 13:56:11miss-islingtonsetmessages: + msg354207
2019-10-08 13:50:13miss-islingtonsetpull_requests: + pull_request16237
2019-10-08 13:49:15miss-islingtonsetnosy: + miss-islington
messages: + msg354206
2019-10-08 13:42:38miss-islingtonsetpull_requests: + pull_request16236
2019-10-08 13:42:30miss-islingtonsetpull_requests: + pull_request16235
2019-10-08 13:42:24vstinnersetmessages: + msg354205
2019-10-08 13:19:24vstinnersetmessages: + msg354202
versions: + Python 3.7, Python 3.8
2019-10-08 13:08:22vstinnersetkeywords: + patch
stage: resolved -> patch review
pull_requests: + pull_request16234
2019-10-08 12:45:06vstinnersetmessages: + msg354197
2019-10-08 10:46:47vstinnersetfiles: + valgrind_strcmp_warn.c

messages: + msg354183
title: Valgrind warnings when running tokenize.py -> Valgrind warning in PyUnicode_Decode: false alarm with GCC builtin strcmp()
2019-09-13 08:09:02vstinnersetstatus: closed -> open
superseder: off-by-one error in PyState_AddModule ->
resolution: duplicate -> (no value)
messages: + msg352245
2019-09-12 14:43:33benjamin.petersonsetstatus: open -> closed
superseder: off-by-one error in PyState_AddModule
resolution: duplicate
stage: resolved
2019-09-12 14:10:51vstinnersetmessages: + msg352170
2019-09-11 15:28:57vstinnercreate