classification
Title: Valgrind reports Syscall param epoll_ctl(event) points to uninitialised byte(s)
Type: compile error Stage: patch review
Components: Interpreter Core Versions: Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ZackerySpytz, benjamin.peterson, nikratio, skrah, vstinner
Priority: normal Keywords: patch

Created on 2018-12-22 15:22 by nikratio, last changed 2020-01-15 21:47 by vstinner.

Pull Requests
URL Status Linked Edit
PR 17782 open ZackerySpytz, 2020-01-01 14:45
Messages (10)
msg332348 - (view) Author: Nikolaus Rath (nikratio) * Date: 2018-12-22 15:22
With current git master, configured --with-valgrind --with-pydebug, I get:

==31074== Command: /home/nikratio/clones/cpython/python /home/nikratio/in-progress/pyfuse3/test/../examples/hello.py /tmp/pytest-of-nikratio/pytest-11/test_hello_hello_py_0
==31074== 
==31074== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==31074==    at 0x584906A: epoll_ctl (syscall-template.S:84)
==31074==    by 0xBDAA493: pyepoll_internal_ctl (selectmodule.c:1392)
==31074==    by 0xBDAA59F: select_epoll_register_impl (selectmodule.c:1438)
==31074==    by 0xBDAA5F8: select_epoll_register (selectmodule.c.h:599)
==31074==    by 0x174E15: _PyMethodDef_RawFastCallKeywords (call.c:658)
==31074==    by 0x300BCA: _PyMethodDescr_FastCallKeywords (descrobject.c:290)
==31074==    by 0x21FC05: call_function (ceval.c:4611)
==31074==    by 0x22B5E7: _PyEval_EvalFrameDefault (ceval.c:3183)
==31074==    by 0x2206FF: PyEval_EvalFrameEx (ceval.c:533)
==31074==    by 0x173B61: function_code_fastcall (call.c:285)
==31074==    by 0x174737: _PyFunction_FastCallKeywords (call.c:410)
==31074==    by 0x21FDF4: call_function (ceval.c:4634)
==31074==  Address 0xffeffeb4c is on thread 1's stack
==31074==  in frame #1, created by pyepoll_internal_ctl (selectmodule.c:1379)

To reproduce:

$ python-dev -m pip install --user pyfuse3  # for dependencies
$ git clone https://github.com/libfuse/pyfuse3.git
$ valgrind --trace-children=yes "--trace-children-skip=*mount*" python-dev -m pytest test/


pyfuse3 provides a C extension module, but I believe the problem is in the interpreter core as the stacktrace does not include anything from the extension.
msg332349 - (view) Author: Nikolaus Rath (nikratio) * Date: 2018-12-22 15:32
Same error with 3.7.1 and 3.6.7 (though line numbers differ slightly).
msg332351 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2018-12-22 15:44
"--with-valgrind --with-pydebug" looks suspicious, it essentially
mixes two different memory checkers.

1) CFLAGS="-O0 -g" --without-pymalloc

2) CFLAGS="-O0 -g" --with-valgrind

should both work.


Can you try if this fixes the error?
msg332352 - (view) Author: Nikolaus Rath (nikratio) * Date: 2018-12-22 15:51
$ ./configure CFLAGS="-O0 -g" --with-valgrind && make -j8

still gives

==13281== Memcheck, a memory error detector
==13281== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13281== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==13281== Command: /home/nikratio/clones/cpython/python /home/nikratio/in-progress/pyfuse3/test/../examples/hello.py /tmp/pytest-of-nikratio/pytest-15/test_hello_hello_py_0
==13281== 
==13281== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==13281==    at 0x584906A: epoll_ctl (syscall-template.S:84)
==13281==    by 0xB5C07FA: pyepoll_internal_ctl (selectmodule.c:1392)
==13281==    by 0xB5C08CB: select_epoll_register_impl (selectmodule.c:1438)
==13281==    by 0xB5C112A: select_epoll_register (selectmodule.c.h:599)
==13281==    by 0x173031: _PyMethodDef_RawFastCallKeywords (call.c:658)
==13281==    by 0x2FEFCD: _PyMethodDescr_FastCallKeywords (descrobject.c:290)
==13281==    by 0x21ED25: call_function (ceval.c:4611)
==13281==    by 0x21AB4E: _PyEval_EvalFrameDefault (ceval.c:3183)
==13281==    by 0x21007C: PyEval_EvalFrameEx (ceval.c:533)
==13281==    by 0x17245F: function_code_fastcall (call.c:285)
==13281==    by 0x1728B5: _PyFunction_FastCallKeywords (call.c:410)
==13281==    by 0x21EDF4: call_function (ceval.c:4634)
==13281==  Address 0xffeff2d94 is on thread 1's stack
==13281==  in frame #1, created by pyepoll_internal_ctl (selectmodule.c:1379)
msg332503 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-12-25 04:36
I suspect Valgrind is being too conservative. union epoll_data is 64 bits wide but the file descriptor only occupies the first 32 bits. The last 32 bits don't need to be initialized by the application.
msg348066 - (view) Author: Nikolaus Rath (nikratio) * Date: 2019-07-17 15:08
Is there a good reason to not explicitly initialize them nevertheless? Valgrind is a common tool, and this false positive will affect everyone attempting to run it on a Python extension module.
msg359433 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-06 16:20
On my Fedora 31, epoll_event structure is defined in sys/epoll.h as:

typedef union epoll_data
{
  void *ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} epoll_data_t;

struct epoll_event
{
  uint32_t events;	/* Epoll events */
  epoll_data_t data;	/* User data variable */
} __EPOLL_PACKED;


I can reproduce the issue using this Python script:

import select
p = select.epoll()
p.register(1, select.EPOLLIN)
p.poll(0)

The Linux syscall is defined as:

asmlinkage long sys_epoll_ctl(int epfd, int op, int fd,
				struct epoll_event __user *event);

It's implemented in fs/eventpoll.c:

https://github.com/torvalds/linux/blob/c79f46a282390e0f5b306007bf7b11a46d529538/fs/eventpoll.c#L2077-L2083

Valgrind complains that ev.data is only partially initialized: it's a 64-bit structure, but we only set ev.data.fd. I guess that the glibc implements epoll_ctl() as a raw system call: Valgrind is unable to know if it's ok to only partially initialize the epoll_data union.


Benjamin: "I suspect Valgrind is being too conservative. union epoll_data is 64 bits wide but the file descriptor only occupies the first 32 bits. The last 32 bits don't need to be initialized by the application."

Right. But Valgrind is not smart enough.

I see 2 options:

* Initialize ev.data to 0
* Use Misc/valgrind-python.supp to ignore this *false alarm*
msg359434 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-06 16:23
> Initialize ev.data to 0

I dislike this option, since the code is legit: Valgrind produces a false alarm.

> Use Misc/valgrind-python.supp to ignore this *false alarm*

I prefer this option.
msg359833 - (view) Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2020-01-12 06:54
Thank you, Victor, for the comments.  I will update PR 17782.
msg360079 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-15 21:47
> Thank you, Victor, for the comments.  I will update PR 17782.

Since it's a very different approach (modify C vs suppress the false alarm), I suggest instead to open a new different PR.
History
Date User Action Args
2020-01-15 21:47:31vstinnersetmessages: + msg360079
2020-01-12 06:54:58ZackerySpytzsetmessages: + msg359833
2020-01-06 16:23:58vstinnersetmessages: + msg359434
2020-01-06 16:20:30vstinnersetnosy: + vstinner
messages: + msg359433
2020-01-01 14:46:22ZackerySpytzsetnosy: + ZackerySpytz

versions: + Python 3.9
2020-01-01 14:45:16ZackerySpytzsetkeywords: + patch
stage: patch review
pull_requests: + pull_request17215
2019-07-17 15:08:17nikratiosetmessages: + msg348066
2018-12-25 04:36:42benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg332503
2018-12-22 15:51:17nikratiosetmessages: + msg332352
2018-12-22 15:44:44skrahsetnosy: + skrah
messages: + msg332351
2018-12-22 15:32:09nikratiosetmessages: + msg332349
2018-12-22 15:22:42nikratiocreate