Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valgrind warning in PyUnicode_Decode: false alarm with GCC builtin strcmp() #82299

Closed
vstinner opened this issue Sep 11, 2019 · 10 comments
Closed
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

BPO 38118
Nosy @vstinner, @miss-islington
PRs
  • bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() #16651
  • [3.8] bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651) #16652
  • [3.7] bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651) #16653
  • [3.7] bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651) #16654
  • Files
  • valgrind_strcmp_warn.c
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-10-08.13:58:13.065>
    created_at = <Date 2019-09-11.15:28:57.117>
    labels = ['interpreter-core', '3.7', '3.8', '3.9']
    title = 'Valgrind warning in PyUnicode_Decode: false alarm with GCC builtin strcmp()'
    updated_at = <Date 2019-10-08.13:58:13.064>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-10-08.13:58:13.064>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-10-08.13:58:13.065>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2019-09-11.15:28:57.117>
    creator = 'vstinner'
    dependencies = []
    files = ['48647']
    hgrepos = []
    issue_num = 38118
    keywords = ['patch']
    message_count = 10.0
    messages = ['351939', '352170', '352245', '354183', '354197', '354202', '354205', '354206', '354207', '354208']
    nosy_count = 2.0
    nosy_names = ['vstinner', 'miss-islington']
    pr_nums = ['16651', '16652', '16653', '16654']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue38118'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @vstinner
    Copy link
    Member Author

    Running tokenize.py in Valgrind emits 2 warnings:

    $ ./configure --with-valgrind
    $ make
    $ echo|PYTHONMALLOC=malloc valgrind ./python Lib/tokenize.py

    (A)

    ==30746== Conditional jump or move depends on uninitialised value(s)
    ==30746== at 0x4A01F7: PyUnicode_Decode (unicodeobject.c:3395)
    ==30746== by 0x4A01F7: PyUnicode_Decode (unicodeobject.c:3350)
    ==30746== by 0x4A062E: PyUnicode_FromEncodedObject (unicodeobject.c:3268)
    ==30746== by 0x4A090D: unicode_new (unicodeobject.c:15125)
    ==30746== by 0x46F515: type_call (typeobject.c:969)
    ==30746== by 0x435F82: _PyObject_MakeTpCall (call.c:167)
    ==30746== by 0x422B3A: _PyObject_Vectorcall (abstract.h:104)
    ==30746== by 0x422B3A: _PyObject_Vectorcall (abstract.h:96)
    ==30746== by 0x422B3A: call_function (ceval.c:4984)
    ==30746== by 0x423C26: _PyEval_EvalFrameDefault (ceval.c:3496)
    ==30746== by 0x4217D2: function_code_fastcall (call.c:292)
    ==30746== by 0x4360FC: _PyObject_FastCallDict (call.c:109)
    ==30746== by 0x436309: _PyObject_Call_Prepend (call.c:447)
    ==30746== by 0x4731F0: slot_tp_init (typeobject.c:6767)
    ==30746== by 0x46F562: type_call (typeobject.c:989)

    (B)

    ==30746== Conditional jump or move depends on uninitialised value(s)
    ==30746== at 0x4DC8FF: PyState_AddModule (pystate.c:707)
    ==30746== by 0x485A5E1: PyInit_zlib (zlibmodule.c:1472)
    ==30746== by 0x4CA49C: _PyImport_LoadDynamicModuleWithSpec (importdl.c:164)
    ==30746== by 0x4C8462: _imp_create_dynamic_impl (import.c:2292)
    ==30746== by 0x4C8462: _imp_create_dynamic (import.c.h:330)
    ==30746== by 0x570783: cfunction_vectorcall_FASTCALL (methodobject.c:382)
    ==30746== by 0x435ACD: PyVectorcall_Call (call.c:203)
    ==30746== by 0x42424F: do_call_core (ceval.c:5031)
    ==30746== by 0x42424F: _PyEval_EvalFrameDefault (ceval.c:3557)
    ==30746== by 0x4AF4CF: _PyEval_EvalCodeWithName (ceval.c:4296)
    ==30746== by 0x435D3F: _PyFunction_Vectorcall (call.c:355)
    ==30746== by 0x422AA8: _PyObject_Vectorcall (abstract.h:106)
    ==30746== by 0x422AA8: call_function (ceval.c:4984)
    ==30746== by 0x428E51: _PyEval_EvalFrameDefault (ceval.c:3465)
    ==30746== by 0x4217D2: function_code_fastcall (call.c:292)

    Warning (A) goes away when Python is compiled with gcc -O1.

    Warning (B) is still present when Python is compiled with gcc -O0.

    It's unclear to me if it's a false alarm or a real issue.

    On Fedora, the issue can be seen with:

    echo|valgrind python3 /usr/lib64/python3.7/tokenize.py

    On Fedora 30, python3 is Python 3.7.4.

    Downstream Fedora issue: https://bugzilla.redhat.com/show_bug.cgi?id=1751208

    @vstinner vstinner added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 11, 2019
    @vstinner
    Copy link
    Member Author

    ==30746== Conditional jump or move depends on uninitialised value(s)
    ==30746== at 0x4DC8FF: PyState_AddModule (pystate.c:707)

    Oh, this issue looks like a duplicate of bpo-38124 which has just been fixed by:

    New changeset 39de95b by Benjamin Peterson in branch 'master':
    closes bpo-38124: Fix bounds check in PyState_AddModule. (GH-16007)
    39de95b

    @vstinner
    Copy link
    Member Author

    I reopen the issue, the PyUnicode_Decode warning is not fixed yet.

    vstinner@apu$ echo|PYTHONMALLOC=malloc valgrind ./python Lib/tokenize.py
    ==6832== Memcheck, a memory error detector
    ==6832== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==6832== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
    ==6832== Command: ./python Lib/tokenize.py
    ==6832==
    ==6832== Conditional jump or move depends on uninitialised value(s)
    ==6832== at 0x4D87E9: PyUnicode_Decode (unicodeobject.c:3395)
    ==6832== by 0x4D8E6E: PyUnicode_FromEncodedObject (unicodeobject.c:3268)
    ==6832== by 0x4D9283: unicode_new (unicodeobject.c:15125)
    ==6832== by 0x49271B: type_call (typeobject.c:969)
    ==6832== by 0x440E97: _PyObject_MakeTpCall (call.c:167)
    ==6832== by 0x42D6D3: _PyObject_Vectorcall (abstract.h:104)
    ==6832== by 0x42D6D3: call_function (ceval.c:4984)
    ==6832== by 0x42D6D3: _PyEval_EvalFrameDefault (ceval.c:3496)
    ==6832== by 0x4216CA: function_code_fastcall (call.c:292)
    ==6832== by 0x4411B5: _PyObject_FastCallDict (call.c:109)
    ==6832== by 0x441413: _PyObject_Call_Prepend (call.c:447)
    ==6832== by 0x497320: slot_tp_init (typeobject.c:6772)
    ==6832== by 0x492767: type_call (typeobject.c:989)
    ==6832== by 0x440E97: _PyObject_MakeTpCall (call.c:167)
    ==6832==
    1,0-1,1: NL '\n'
    2,0-2,0: ENDMARKER ''
    ==6832==
    ==6832== HEAP SUMMARY:
    ==6832== in use at exit: 1,179,135 bytes in 8,467 blocks
    ==6832== total heap usage: 93,677 allocs, 85,210 frees, 12,109,969 bytes allocated
    ==6832==
    ==6832== LEAK SUMMARY:
    ==6832== definitely lost: 0 bytes in 0 blocks
    ==6832== indirectly lost: 0 bytes in 0 blocks
    ==6832== possibly lost: 454,934 bytes in 1,972 blocks
    ==6832== still reachable: 724,201 bytes in 6,495 blocks
    ==6832== suppressed: 0 bytes in 0 blocks
    ==6832== Rerun with --leak-check=full to see details of leaked memory
    ==6832==
    ==6832== Use --track-origins=yes to see where uninitialised values come from
    ==6832== For lists of detected and suppressed errors, rerun with: -s
    ==6832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

    @vstinner vstinner reopened this Sep 13, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 8, 2019

    ==6832== at 0x4D87E9: PyUnicode_Decode (unicodeobject.c:3395)

    This warning is a false alarm: the GCC builtin strcmp() function seems to read past the NUL byte. When recompiling Python with -fno-builtin GCC option, the warning is gone.

    The valgrind warning can be reproduced using attached valgrind_strcmp_warn.c:

    On Fedora 30 with:

    • gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
    • valgrind-3.15.0

    GCC builtin strcmp() emits a false alarm in valgrind: it reads past the NUL
    byte.

    $ gcc -O3 valgrind_strcmp_warn.c -o valgrind_strcmp_warn -D NDEBUG=1 -g && valgrind ./valgrind_strcmp_warn
    (...)
    ==29173== Conditional jump or move depends on uninitialised value(s)
    ==29173==    at 0x4011CB: PyUnicode_Decode.part.0 (valgrind_strcmp_warn.c:276)
    ==29173==    by 0x4898F42: (below main) (in /usr/lib64/libc-2.29.so)
    ==29173==
    ==29173== Conditional jump or move depends on uninitialised value(s)
    ==29173==    at 0x4011EE: PyUnicode_Decode.part.0 (valgrind_strcmp_warn.c:280)
    ==29173==    by 0x4898F42: (below main) (in /usr/lib64/libc-2.29.so)
    ==29173==
    ==29173== Conditional jump or move depends on uninitialised value(s)
    ==29173==    at 0x401221: PyUnicode_Decode.part.0 (valgrind_strcmp_warn.c:282)
    ==29173==    by 0x4898F42: (below main) (in /usr/lib64/libc-2.29.so)
    (...)
    ==29173== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

    versus

    $ gcc -O3 valgrind_strcmp_warn.c -o valgrind_strcmp_warn -D NDEBUG=1 -fno-builtin -g && valgrind ./valgrind_strcmp_warn
    ==29217== Memcheck, a memory error detector
    (...)
    ==29217== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

    @vstinner vstinner changed the title Valgrind warnings when running tokenize.py Valgrind warning in PyUnicode_Decode: false alarm with GCC builtin strcmp() Oct 8, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 8, 2019

    The valgrind false alarm on GCC builtin strcmp() has been reported to the Valgrind bug tracker in 2011, with an update in 2012 and... nothing :-(
    https://bugs.kde.org/show_bug.cgi?id=264936

    curl worked around the issue by disabling valgrind on the impacted test:
    https://lists.fedoraproject.org/pipermail/scm-commits/2011-February/567665.html

    Passing "--partial-loads-ok=yes" to Valgrind has no effect... It became the default:
    http://valgrind.org/docs/manual/mc-manual.html#opt.partial-loads-ok

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 8, 2019

    It seems like Python 2.7 is not affected.

    @vstinner vstinner added 3.7 (EOL) end of life 3.8 only security fixes labels Oct 8, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 8, 2019

    New changeset 03ab6b4 by Victor Stinner in branch 'master':
    bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651)
    03ab6b4

    @miss-islington
    Copy link
    Contributor

    New changeset 364532f by Miss Islington (bot) in branch '3.8':
    bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651)
    364532f

    @miss-islington
    Copy link
    Contributor

    New changeset 98043b4 by Miss Islington (bot) in branch '3.7':
    bpo-38118: Ignore Valgrind false alarm in PyUnicode_Decode() (GH-16651)
    98043b4

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 8, 2019

    Ok, the issue has been fixed in 3.7, 3.8 and master branches.

    Ensure that you use Valgrind with Misc/valgrind.supp. Example:

    $ echo|PYTHONMALLOC=malloc valgrind --suppressions=Misc/valgrind-python.supp ./python Lib/tokenize.py
    (...)
    ==12923== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)

    I close the issue.

    @vstinner vstinner closed this as completed Oct 8, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants