classification
Title: memory corrupted in test_capi refleaks test
Type: crash Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, python-dev, skrah, xdegaye
Priority: normal Keywords: patch

Created on 2014-10-09 18:10 by xdegaye, last changed 2014-10-10 20:07 by haypo. This issue is now closed.

Files
File name Uploaded Description
subtest_in_range.diff xdegaye, 2014-10-10 19:08
Messages (8)
msg228886 - Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2014-10-09 18:10
This does not happen when the test is run with '-R 22:22' or a lower run count, but occurs systematically with '-R 23:23' or a greater run count.

$ ./python
Python 3.5.0a0 (default:1e1c6e306eb4, Oct  9 2014, 19:52:59)
[GCC 4.9.1 20140903 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; platform.uname()
uname_result(system='Linux', node='bilboquet', release='3.16.3-1-ARCH', version='#1 SMP PREEMPT Wed Sep 17 21:54:13 CEST 2014', machine='x86_64', processor='')
>>> 

$ ./python -m test -R 23:23 test_capi
[1/1] test_capi
beginning 46 repetitions
1234567890123456789012345678901234567890123456
..............................................
1 test OK.
Debug memory block at address p=0x982570: API ''
    18446744073709551615 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfb):
        at p-7: 0x00 *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00 *** OUCH
        at p-4: 0x00 *** OUCH
        at p-3: 0x00 *** OUCH
        at p-2: 0x00 *** OUCH
        at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0x98256f are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x00 *** OUCH
        at tail+1: 0x00 *** OUCH
        at tail+2: 0x00 *** OUCH
        at tail+3: 0x00 *** OUCH
        at tail+4: 0x00 *** OUCH
        at tail+5: 0x00 *** OUCH
        at tail+6: 0x00 *** OUCH
        at tail+7: 0x00 *** OUCH
    The block was made by call #0 to debug malloc/realloc.
    Data at p:
Fatal Python error: bad ID: Allocated using API '', verified using API 'o'

Current thread 0x00007f525bcf2700 (most recent call first):
Aborted (core dumped)
msg228902 - Author: Stefan Krah (skrah) * (Python committer) Date: 2014-10-09 20:14
I cannot reproduce this here. Did you run "make distclean" before compiling?
msg228904 - Author: Roundup Robot (python-dev) Date: 2014-10-09 20:16
New changeset 5d87a6b38422 by Victor Stinner in branch '3.4':
Issue #22588: Fix typo in _testcapi.test_incref_decref_API()
https://hg.python.org/cpython/rev/5d87a6b38422
msg228905 - Author: STINNER Victor (haypo) * (Python committer) Date: 2014-10-09 20:21
> but occurs systematically with '-R 23:23' or a greater run count.

I was able to reproduce the issue. I found that the error came from test_incref_decref_API(). After my change, it looks like test_capi doesn't crash anymore.

$ ./python -m test -R 23:23 test_capi
[1/1] test_capi
beginning 46 repetitions
1234567890123456789012345678901234567890123456
..............................................
1 test OK.

Can you confirm, Xavier?
msg228967 - Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2014-10-10 08:37
That was really fast, Victor!

I confirm that the '-R 23:23' refleak test does not crash any more here after changeset 5d87a6b38422.
msg228968 - Author: STINNER Victor (haypo) * (Python committer) Date: 2014-10-10 08:39
> That was really fast, Victor!

I modified test_capi to run only one test case, then modified the test case that caused the issue so that it ran only a subset of its tests. By bisection, I found that a single function caused the fatal error.

Does anyone know how to automate this bisection process? Maybe I should open a new issue for it?

It's very tedious to disable tests by modifying the source code.

Automating it would help a lot in identifying memory leaks, crashes, reference leaks, resource leaks, etc.
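
For illustration only, the manual bisection described above could be sketched in Python roughly as follows. This is not an existing regrtest feature: bisect_failing and the fails callback are hypothetical names, and the sketch assumes that exactly one test is responsible and that the failure reproduces deterministically whenever that test is part of the subset being run. In practice, fails would have to launch the test run on the given subset and report whether it crashed.

```python
from typing import Callable, List, Sequence


def bisect_failing(tests: Sequence[str],
                   fails: Callable[[List[str]], bool]) -> str:
    """Return the one test whose presence makes fails() return True.

    Hypothetical helper: assumes a single culprit and a deterministic
    failure, so each step can discard half of the candidates.
    """
    candidates = list(tests)
    if not fails(candidates):
        raise ValueError("the full test list does not fail")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        # If the first half still fails, the culprit is in it;
        # otherwise it must be in the second half.
        candidates = first if fails(first) else second
    return candidates[0]


# Stand-in for a real test run: pretend that "test_9" is the culprit.
names = [f"test_{i}" for i in range(1, 36)]
culprit = bisect_failing(names, lambda subset: "test_9" in subset)
print(culprit)
```

With 35 candidates this needs about six runs instead of 35, which matters when each run is a full '-R 23:23' refleak pass.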
msg229022 - Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2014-10-10 19:07
With the attached patch (which deliberately reintroduces the bug in 'test_incref_decref_API' for testing purposes), the failing subtest can be found quickly:

Get the number of subtests (35 subtests):
$ export SUBTEST_RANGE="[]"
$ ./python -m test -m test__testcapi test_capi

Then run:
$ ./python -m test -m test__testcapi -R 23:23 test_capi

adjusting the range of subtests to execute before each run, with:
$ export SUBTEST_RANGE="range(1,18)"    # tests 1-17  result: fail
$ export SUBTEST_RANGE="range(1,9)"     # tests 1-8   result: pass
$ export SUBTEST_RANGE="range(1,13)"    # tests 1-12  result: fail
$ export SUBTEST_RANGE="range(1,11)"    # tests 1-10  result: fail
$ export SUBTEST_RANGE="[9]"            # so it must be test #9, check it now

The main limitation of this approach is that every use of the subTest context manager must be enclosed in a 'try ... except unittest.SkipTest' clause, and the context manager is used more than 100 times in the test suite.
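
The attached patch is not reproduced in this thread, so the following is only a rough sketch of how a SUBTEST_RANGE-style filter might work, not the actual contents of subtest_in_range.diff. The names parse_subtest_range and SubtestSelector are hypothetical, and the sketch assumes that an empty selection runs nothing while still counting the subtests.

```python
import ast
import os
import re
import unittest


def parse_subtest_range(spec):
    """Parse 'range(a,b)' or a list literal such as '[9]' into a list of ints."""
    spec = spec.strip()
    match = re.fullmatch(r"range\(\s*(\d+)\s*,\s*(\d+)\s*\)", spec)
    if match:
        return list(range(int(match.group(1)), int(match.group(2))))
    # literal_eval only accepts literals, so arbitrary code is rejected.
    return list(ast.literal_eval(spec))


class SubtestSelector:
    """Number subtests from 1 and skip those outside the selection."""

    def __init__(self, spec):
        self.selected = set(parse_subtest_range(spec))
        self.count = 0

    def check(self):
        self.count += 1
        if self.count not in self.selected:
            raise unittest.SkipTest(f"subtest #{self.count} not selected")


def selector_from_env():
    # Assumption: the range is passed through the SUBTEST_RANGE variable.
    return SubtestSelector(os.environ.get("SUBTEST_RANGE", "[]"))


# Each subtest body has to be guarded, which is the limitation noted above.
selector = SubtestSelector("range(1,9)")
executed = []
for name in [f"test_{i}" for i in range(1, 36)]:
    try:
        selector.check()
    except unittest.SkipTest:
        continue
    executed.append(name)
print(executed, selector.count)
```

The try/except around every subtest body is what would have to be repeated at each of the 100+ subTest call sites.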
msg229030 - Author: STINNER Victor (haypo) * (Python committer) Date: 2014-10-10 20:07
Please open a new issue; this one is closed.
History
Date User Action Args
2014-10-10 20:07:47 haypo set messages: + msg229030
2014-10-10 19:08:19 xdegaye set files: + subtest_in_range.diff; keywords: + patch
2014-10-10 19:07:53 xdegaye set messages: + msg229022
2014-10-10 08:39:44 haypo set status: open -> closed; resolution: fixed; messages: + msg228968
2014-10-10 08:37:25 xdegaye set messages: + msg228967
2014-10-09 20:21:29 haypo set nosy: + haypo; messages: + msg228905
2014-10-09 20:16:53 python-dev set nosy: + python-dev; messages: + msg228904
2014-10-09 20:14:27 skrah set nosy: + skrah; messages: + msg228902
2014-10-09 18:10:59 xdegaye create