msg317509 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-05-24 02:18 |
http://buildbot.python.org/all/#/builders/53/builds/1062
(...)
test_close_file (test.test_posix.TestPosixSpawn) ... ok
test_dup2 (test.test_posix.TestPosixSpawn) ... ok
test_empty_file_actions (test.test_posix.TestPosixSpawn) ... ok
test_multiple_file_actions (test.test_posix.TestPosixSpawn) ... FAIL
test_no_such_executable (test.test_posix.TestPosixSpawn) ... ok
test_open_file (test.test_posix.TestPosixSpawn) ... ERROR
test_returns_pid (test.test_posix.TestPosixSpawn) ... ok
Warning -- files was modified by test_posix
Before: []
After: ['\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb']
test test_posix failed
test_specify_environment (test.test_posix.TestPosixSpawn) ... ok
======================================================================
ERROR: test_open_file (test.test_posix.TestPosixSpawn)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/shager/cpython-buildarea/3.x.edelsohn-fedora-ppc64/build/Lib/test/test_posix.py", line 1542, in test_open_file
with open(outfile) as f:
FileNotFoundError: [Errno 2] No such file or directory: '@test_42482_tmp'
======================================================================
FAIL: test_multiple_file_actions (test.test_posix.TestPosixSpawn)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/shager/cpython-buildarea/3.x.edelsohn-fedora-ppc64/build/Lib/test/test_posix.py", line 1488, in test_multiple_file_actions
self.assertEqual(os.waitpid(pid, 0), (pid, 0))
AssertionError: Tuples differ: (31319, 32512) != (31319, 0)
First differing element 1:
32512
0
- (31319, 32512)
? ^^^^^
+ (31319, 0)
? ^
----------------------------------------------------------------------
Ran 101 tests in 2.175s
FAILED (failures=1, errors=1, skipped=9)
1 test failed again:
test_posix
|
msg317524 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-05-24 05:12 |
What is \udcdb\udcdb\udcdb...? Looks like a corrupted memory.
|
msg317527 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-05-24 05:35 |
Seems there are 2 or 3 different bugs here. Race condition, using freed memory, and maybe something other. I take this.
|
msg317797 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-05-27 17:09 |
@serhiy.storchaka Is the race condition/freed memory reusage in the test suite or in the posix module?
|
msg317827 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-05-28 07:27 |
The race condition is in tests (we can open a file before it be written in a child). The reed memory reusage is in the posix module. Didn't look at it close, but seems the content of a temporary Python object (the result of encoding/decoding) is passed to C library function, and this object is freed before the content be used.
|
msg319385 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-06-12 14:03 |
I'm not sure that the attached PR fixes all race conditions.
|
msg319386 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-12 14:05 |
The part that is related to \udcdb\udcdb\udcdb... is a problem in posix_spawn() implementation, not in tests. It can't be fixed by changing tests.
|
msg319387 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-12 14:10 |
In PR 7663 I am only fixing the race condition.
|
msg319388 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-12 14:12 |
A separate PR is needed for the apparent reusing of freed memory in the implementation of posix_spawn. I am currently looking into it.
|
msg319389 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-12 14:12 |
For reference: posix_spawn() was implemented in issue20104.
|
msg319390 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-12 14:14 |
Should we merge PR6693 and PR6725 before fixing the memory problem?
|
msg319394 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-12 15:53 |
Serhiy, I am checking and the only temporary Python object (the result of encoding/decoding) that is passed to C library function that I can find is in:
if (!PyArg_ParseTuple(file_action, "OiO&ik"
";A open file_action tuple must have 5 elements",
&tag_obj, &fd, PyUnicode_FSConverter, &path,
&oflag, &mode))
{
goto fail;
}
errno = posix_spawn_file_actions_addopen(file_actionsp,
fd, PyBytes_AS_STRING(path), oflag, (mode_t)mode);
Py_DECREF(path); /* addopen copied it. */
And according to the manpage of posix_spawn_file_actions_addopen: "The string described by path shall be copied by the posix_spawn_file_actions_addopen() function.".
The object created by `PyArg_ParseTuple` (path) is only freed after calling `posix_spawn_file_actions_addopen` and in that function the contents are copied inside
`file_actionsp`. So it should not be a problem.
In case there is still a problem, could you elaborate more about what object is being freed and how is being passed to a C library function?
|
msg319427 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2018-06-13 05:56 |
I encounter the same failure when running test_posix suite with a debug build Python.
[root@SZX1000328166 cpython]# ./python -m test test_posix
Run tests sequentially
0:00:00 load avg: 0.91 [1/1] test_posix
Warning -- files was modified by test_posix
Before: []
After: ['\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcdb\udcf0\udcd3B\udcb4\udc8b\x7f']
test test_posix failed -- multiple errors occurred; run in verbose mode for details
test_posix failed
== Tests result: FAILURE ==
1 test failed:
test_posix
Total duration: 459 ms
Tests result: FAILURE
Believe it or not, when I comment out the `Py_DECREF(path)`, the failure goes away.
|
msg319429 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2018-06-13 06:14 |
I checked glibc changelog and the copy behaviour is not implemented until 2014, glic 2.20. There is a CVE-2014-4043. My environment has glic 2.17 which seems still suffer the problem.
|
msg319431 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-13 06:19 |
I'm puzzled. It is not just documented that posix_spawn_file_actions_addopen() should make a copy, but the glibc implementation actually makes it: see for example https://github.com/bminor/glibc/blob/09533208febe923479261a27b7691abef297d604/posix/spawn_faction_addopen.c . I don't see a way of corrupting this memory.
Can non-glibc implementation be used here?
|
msg319432 - (view) |
Author: Xiang Zhang (xiang.zhang) *  |
Date: 2018-06-13 06:29 |
Serhiy, you need to check an early version, for example 2.17: https://github.com/bminor/glibc/blob/glibc-2.17/posix/spawn_faction_addopen.c
|
msg319434 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-13 06:51 |
Thank you Xiang! Thus this is a bug in old glibc. It was fixed in glibc 2.20: https://sourceware.org/bugzilla/show_bug.cgi?id=17048 .
What should we do with this? Add a workaround for the bug (this will complicate the code)? Or skip the test on glibc <2.20.
|
msg319454 - (view) |
Author: Charalampos Stratakis (cstratak) * |
Date: 2018-06-13 12:13 |
The latest stable Fedora's have glibc >= 2.26
Maybe the buildbot needs to be updated?
|
msg319457 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-13 12:51 |
If we want to support older versions of glibc a temporary (and somewhat inelegant) workaround is storing the temporaries in a list that the caller passes and destroy the list after calling `posix_spawn`.
|
msg319458 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-06-13 12:54 |
> If we want to support older versions of glibc a temporary (and somewhat inelegant) workaround is storing the temporaries in a list that the caller passes and destroy the list after calling `posix_spawn`.
Creating a copy seems to be a reasonable tradeoff.
Another option is to detect the bug at runtime and raise an exception if the bug is detected. But I dislike this option: I hate when a function is available in the os module but calling it raises a NotImplementedError :-( Moreover, posix_spawn() seems to be a lot of benefit, so I would like to be able to use it on "old" glibc versions.
|
msg319463 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-13 13:28 |
I have updated the PR with a workaround.
|
msg319481 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-13 18:25 |
Serhiy commented in the PR:
>__GLIBC__ and __GLIBC_MINOR__ give you the version of glibc used at >compile time. But can not the different version be dynamically linked at >run time?
Should we use the list approach always to avoid problems with this?
|
msg319485 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-13 20:02 |
Yes, as I wrote on GitHub.
|
msg319594 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-15 08:23 |
As I wrote on GitHub, I think that it is better to use the list approach always for two reasons:
1) Additional #if's and duplications clutter the code.
2) It is hard to test the alternate branch if the other branch is used on most developer computers.
It is better to have a single branch and add a comment that explains a workaround. It can be removed after platforms with buggy glibc versions will grow out of use.
|
msg319598 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-06-15 09:02 |
"It is better to have a single branch and add a comment that explains
a workaround. It can be removed after platforms with buggy glibc
versions will grow out of use."
Recently, I proposed to drop support for old Linux kernels, but some
people still use them and the code to support these kernels don't need
a lot of maintenance. So we keep the code :-)
https://mail.python.org/pipermail/python-dev/2018-January/151821.html
|
msg319605 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-15 11:07 |
I have updated PR7685 to use the temporary list as per Serhiy comments.
|
msg319606 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2018-06-15 11:18 |
As I commented on GitHub, it seems that the race in the test (problem covered in PR7663) disappears once the memory problem is fixed (PR33630). As is a race condition I am not sure that this affirmation is true, but running 3 test suites in parallel that executes test_posix 200 times does not raise any problem on the buildbot after applying the patch in PR33630
|
msg319736 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2018-06-16 13:04 |
I doubt it is a race condition. “Waitpid” only returns after the child has terminated, and the file should be created as the child starts, before it terminates. Surely it is just the buggy glibc opening the DBDBDB file, when the test case expects it to create outfile.
|
msg319936 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-19 08:20 |
New changeset cb970730e3ca2522e9b1700dcaf0a06b7e898db6 by Serhiy Storchaka (Pablo Galindo) in branch 'master':
bpo-33630: Fix using of freed memory in old versions of glicb for posix_spawn(). (GH-7685)
https://github.com/python/cpython/commit/cb970730e3ca2522e9b1700dcaf0a06b7e898db6
|
msg319937 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-19 08:24 |
Thank you Pablo! Seems failures in other tests that look like a race condition actually are caused by the glibc bug. I think that after passing tests on builtbots this issue can be closed.
|
msg319946 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-06-19 09:10 |
PPC64 Fedora 3.x, PPC64LE Fedora 3.x, s390x Debian 3.x and s390x RHEL 3.x buildbots are back to green thanks to this fix.
Oh wow, so it was a glibc bug: that was unexpected :-( Hopefully, the workaround is tiny and cheap!
> I think that after passing tests on builtbots this issue can be closed.
Right. I closed the PR 7663 and I close this issue.
Thank you all for investigating this issue, thanks Serhiy and Xiang for the review, and thanks Pablo for the fix!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:00 | admin | set | github: 77811 |
2018-06-19 09:10:16 | vstinner | set | status: open -> closed resolution: fixed messages:
+ msg319946
stage: patch review -> resolved |
2018-06-19 08:24:10 | serhiy.storchaka | set | messages:
+ msg319937 |
2018-06-19 08:20:00 | serhiy.storchaka | set | messages:
+ msg319936 |
2018-06-16 13:04:37 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg319736
|
2018-06-15 11:18:07 | pablogsal | set | messages:
+ msg319606 |
2018-06-15 11:07:27 | pablogsal | set | messages:
+ msg319605 |
2018-06-15 09:02:07 | vstinner | set | messages:
+ msg319598 |
2018-06-15 08:23:25 | serhiy.storchaka | set | messages:
+ msg319594 |
2018-06-13 20:02:10 | serhiy.storchaka | set | messages:
+ msg319485 |
2018-06-13 18:25:18 | pablogsal | set | messages:
+ msg319481 |
2018-06-13 18:18:35 | pablogsal | set | pull_requests:
+ pull_request7298 |
2018-06-13 13:28:05 | pablogsal | set | messages:
+ msg319463 |
2018-06-13 12:54:41 | vstinner | set | messages:
+ msg319458 |
2018-06-13 12:51:18 | pablogsal | set | messages:
+ msg319457 |
2018-06-13 12:13:13 | cstratak | set | nosy:
+ cstratak messages:
+ msg319454
|
2018-06-13 06:51:52 | serhiy.storchaka | set | nosy:
+ alex messages:
+ msg319434
|
2018-06-13 06:29:52 | xiang.zhang | set | messages:
+ msg319432 |
2018-06-13 06:19:37 | serhiy.storchaka | set | messages:
+ msg319431 |
2018-06-13 06:14:50 | xiang.zhang | set | messages:
+ msg319429 |
2018-06-13 05:56:57 | xiang.zhang | set | nosy:
+ xiang.zhang messages:
+ msg319427
|
2018-06-12 15:53:28 | pablogsal | set | messages:
+ msg319394 |
2018-06-12 14:14:27 | pablogsal | set | messages:
+ msg319390 |
2018-06-12 14:12:56 | serhiy.storchaka | set | messages:
+ msg319389 |
2018-06-12 14:12:49 | pablogsal | set | messages:
+ msg319388 |
2018-06-12 14:10:25 | pablogsal | set | messages:
+ msg319387 |
2018-06-12 14:05:49 | serhiy.storchaka | set | messages:
+ msg319386 |
2018-06-12 14:03:53 | vstinner | set | messages:
+ msg319385 |
2018-06-12 12:44:46 | pablogsal | set | pull_requests:
+ pull_request7278 |
2018-05-28 07:27:50 | serhiy.storchaka | set | messages:
+ msg317827 |
2018-05-27 17:09:54 | pablogsal | set | nosy:
+ pablogsal messages:
+ msg317797
|
2018-05-24 14:39:10 | vstinner | set | pull_requests:
- pull_request6734 |
2018-05-24 12:48:53 | twisteroid ambassador | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request6734 |
2018-05-24 05:35:19 | serhiy.storchaka | set | assignee: serhiy.storchaka type: behavior messages:
+ msg317527 |
2018-05-24 05:12:11 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg317524
|
2018-05-24 02:18:53 | vstinner | create | |