classification
Title: 3.8.7rc1 regression: 'free(): invalid pointer' after running backports-zoneinfo test suite
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: pablogsal Nosy List: brandtbucher, lukasz.langa, mgorny, nascheme, p-ganssle, pablogsal, tim.peters
Priority: Keywords:

Created on 2020-12-20 21:56 by mgorny, last changed 2020-12-21 16:41 by p-ganssle. This issue is now closed.

Messages (13)
msg383464 - (view) Author: Michał Górny (mgorny) * Date: 2020-12-20 21:56
I'm still investigating the problem and I will include more information shortly.  However, I'm filing the bug early, as I'd like to prevent this regression from hitting 3.8.7 release.

When running backports-zoneinfo-0.2.1 test suite using cpython 3.8.7rc1, all tests pass, then python segfaults:

```
----------------------------------------------------------------------
Ran 233 tests in 2.200s

OK (skipped=27)
free(): invalid pointer
/var/tmp/portage/dev-python/backports-zoneinfo-0.2.1/temp/environment: line 3054:   167 Aborted                 (core dumped) "${EPYTHON}" -m unittest discover -v
```


The backtrace I got doesn't seem very useful:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007fd4b6c79536 in __GI_abort () at abort.c:79
#2  0x00007fd4b6cd2bf7 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fd4b6de53b5 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007fd4b6cdaa7a in malloc_printerr (str=str@entry=0x7fd4b6de3593 "free(): invalid pointer") at malloc.c:5389
#4  0x00007fd4b6cdbe5c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4201
#5  0x00007fd4b6f00aaa in ?? () from /usr/lib64/libpython3.8.so.1.0
#6  0x00007fd4b6eb8745 in ?? () from /usr/lib64/libpython3.8.so.1.0
#7  0x00007fd4b6ece115 in ?? () from /usr/lib64/libpython3.8.so.1.0
#8  0x00007fd4b6ece2f2 in ?? () from /usr/lib64/libpython3.8.so.1.0
#9  0x0000562239cd1a60 in ?? ()
#10 0x00007fd4b7086967 in ?? () from /usr/lib64/libpython3.8.so.1.0
#11 0x00007fd4b7167e20 in ?? () from /usr/lib64/libpython3.8.so.1.0
#12 0x0000562239cd1a60 in ?? ()
#13 0x00007fd4b6f05d26 in ?? () from /usr/lib64/libpython3.8.so.1.0
#14 0x00007fd4b6fccf0d in ?? () from /usr/lib64/libpython3.8.so.1.0
#15 0x00007fd4b6fcdc1d in PyGC_Collect () from /usr/lib64/libpython3.8.so.1.0
#16 0x000056223996c670 in ?? ()
#17 0x00007fd4b6f93e8a in PyImport_Cleanup () from /usr/lib64/libpython3.8.so.1.0
#18 0x00007fd4b6faa55c in Py_NewInterpreter () from /usr/lib64/libpython3.8.so.1.0
#19 0x0000000000000000 in ?? ()


I'm going to start by trying to bisect this, and let you know the results when I manage them.
msg383465 - (view) Author: Michał Górny (mgorny) * Date: 2020-12-20 22:17
aeb66c1abbf4ec214e2e80eb972546996d1a1571 is the first bad commit
commit aeb66c1abbf4ec214e2e80eb972546996d1a1571
Author: Miss Skeleton (bot) <31488909+miss-islington@users.noreply.github.com>
Date:   Thu Oct 15 08:51:48 2020 -0700

    bpo-41984: GC track all user classes (GH-22701/GH-22707)
    
    (cherry picked from commit c13b847a6f913b72eeb71651ff626390b738d973)

 Lib/test/test_finalization.py                      | 23 ++++++++++++++++++++--
 Lib/test/test_gc.py                                |  6 +++---
 .../2020-10-14-16-19-43.bpo-41984.SEtKMr.rst       |  2 ++
 Modules/_testcapimodule.c                          | 20 +++++++++++++++++++
 Objects/typeobject.c                               | 22 ++++++---------------
 5 files changed, 52 insertions(+), 21 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2020-10-14-16-19-43.bpo-41984.SEtKMr.rst
msg383466 - (view) Author: Michał Górny (mgorny) * Date: 2020-12-20 22:23
A more complete backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007fa633b20536 in __GI_abort () at abort.c:79
#2  0x00007fa633b79bf7 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fa633c8c3b5 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007fa633b81a7a in malloc_printerr (str=str@entry=0x7fa633c8a593 "free(): invalid pointer") at malloc.c:5389
#4  0x00007fa633b82e5c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4201
#5  0x000055ea845c6170 in _PyMem_RawFree (ctx=0x0, ptr=0x55ea848c1a60 <small_ints+192>) at Objects/obmalloc.c:127
#6  0x000055ea845c6f46 in PyMem_RawFree (ptr=0x55ea848c1a60 <small_ints+192>) at Objects/obmalloc.c:595
#7  0x000055ea845c7fa0 in _PyObject_Free (ctx=0x0, p=0x55ea848c1a60 <small_ints+192>) at Objects/obmalloc.c:1898
#8  0x000055ea845c726e in PyObject_Free (ptr=0x55ea848c1a60 <small_ints+192>) at Objects/obmalloc.c:709
#9  0x000055ea845dda33 in object_dealloc (self=0x55ea848c1a60 <small_ints+192>) at Objects/typeobject.c:3784
#10 0x000055ea845c60a5 in _Py_Dealloc (op=0x55ea848c1a60 <small_ints+192>) at Objects/object.c:2215
#11 0x000055ea845d3b55 in _Py_DECREF (filename=0x55ea847a8de0 "./Include/object.h", lineno=541, op=0x55ea848c1a60 <small_ints+192>)
    at ./Include/object.h:478
#12 0x000055ea845d3b83 in _Py_XDECREF (op=0x55ea848c1a60 <small_ints+192>) at ./Include/object.h:541
#13 0x000055ea845d4627 in tupledealloc (op=0x7fa63195d6c0) at Objects/tupleobject.c:247
#14 0x000055ea845c60a5 in _Py_Dealloc (op=0x7fa63195d6c0) at Objects/object.c:2215
#15 0x000055ea84743afb in _Py_DECREF (filename=0x55ea8482b9a0 "./Include/object.h", lineno=541, op=0x7fa63195d6c0) at ./Include/object.h:478
#16 0x000055ea84743b29 in _Py_XDECREF (op=0x7fa63195d6c0) at ./Include/object.h:541
#17 0x000055ea84745b5a in code_dealloc (co=0x7fa631b68450) at Objects/codeobject.c:552
#18 0x000055ea845c60a5 in _Py_Dealloc (op=0x7fa631b68450) at Objects/object.c:2215
#19 0x000055ea847525d2 in _Py_DECREF (filename=0x55ea8482e47c "Objects/funcobject.c", lineno=576, op=0x7fa631b68450) at ./Include/object.h:478
#20 0x000055ea84753b6a in func_clear (op=0x7fa631b768b0) at Objects/funcobject.c:576
#21 0x000055ea84753de0 in func_dealloc (op=0x7fa631b768b0) at Objects/funcobject.c:597
#22 0x000055ea845c60a5 in _Py_Dealloc (op=0x7fa631b768b0) at Objects/object.c:2215
#23 0x000055ea845addff in _Py_DECREF (filename=0x55ea847a3320 "./Include/object.h", lineno=541, op=0x7fa631b768b0) at ./Include/object.h:478
#24 0x000055ea845ade4f in _Py_XDECREF (op=0x7fa631b768b0) at ./Include/object.h:541
#25 0x000055ea845aeed8 in free_keys_object (keys=0x55ea84e72580) at Objects/dictobject.c:584
#26 0x000055ea845ae524 in dictkeys_decref (dk=0x55ea84e72580) at Objects/dictobject.c:324
#27 0x000055ea845b1c9c in PyDict_Clear (op=0x7fa63198b740) at Objects/dictobject.c:1729
#28 0x000055ea845dd683 in type_clear (type=0x55ea84e12ba0) at Objects/typeobject.c:3594
#29 0x000055ea845d8720 in subtype_clear (self=0x55ea84e12ba0) at Objects/typeobject.c:1147
#30 0x000055ea846ac4aa in delete_garbage (state=0x55ea848dd6f8 <_PyRuntime+344>, collectable=0x7fff13eb2e10, old=0x55ea848dd740 <_PyRuntime+416>)
    at Modules/gcmodule.c:948
#31 0x000055ea846aca23 in collect (state=0x55ea848dd6f8 <_PyRuntime+344>, generation=2, n_collected=0x0, n_uncollectable=0x0, nofail=1)
    at Modules/gcmodule.c:1123
#32 0x000055ea846ae0da in _PyGC_CollectNoFail () at Modules/gcmodule.c:1866
#33 0x000055ea846616da in PyImport_Cleanup () at Python/import.c:599
#34 0x000055ea8467df93 in Py_FinalizeEx () at Python/pylifecycle.c:1233
#35 0x000055ea84577e8a in Py_RunMain () at Modules/main.c:691
#36 0x000055ea84577f22 in pymain_main (args=0x7fff13eb2ff0) at Modules/main.c:719
#37 0x000055ea84577fe2 in Py_BytesMain (argc=6, argv=0x7fff13eb3138) at Modules/main.c:743
#38 0x000055ea84576785 in main (argc=6, argv=0x7fff13eb3138) at ./Programs/python.c:16
msg383469 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-12-21 00:30
Raising this to a release blocker
msg383470 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-12-21 00:41
The symptom is that we are now trying to free something in the small integer cache that we shouldn't. Running this under the address sanitizer shows a bit more of the problem:

==190303==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x564456357fe0 in thread T0
    #0 0x7f18ad2200e9 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:123
    #1 0x564455b03283 in _PyMem_RawFree Objects/obmalloc.c:127
    #2 0x564455b0538d in PyObject_Free Objects/obmalloc.c:709
    #3 0x564455b3b820 in object_dealloc Objects/typeobject.c:3797
    #4 0x564455b03180 in _Py_Dealloc Objects/object.c:2215
    #5 0x564455b21308 in _Py_DECREF Include/object.h:478
    #6 0x564455b21336 in _Py_XDECREF Include/object.h:541
    #7 0x564455b2305f in tupledealloc Objects/tupleobject.c:247
    #8 0x564455b03180 in _Py_Dealloc Objects/object.c:2215
    #9 0x564455f1ef6b in _Py_DECREF Include/object.h:478
    #10 0x564455f1ef99 in _Py_XDECREF Include/object.h:541
    #11 0x564455f24ed4 in code_dealloc Objects/codeobject.c:552
    #12 0x564455b03180 in _Py_Dealloc Objects/object.c:2215
    #13 0x564455a85e3c in _Py_DECREF Include/object.h:478
    #14 0x564455a892f7 in func_clear Objects/funcobject.c:579
    #15 0x564455a89704 in func_dealloc Objects/funcobject.c:600
    #16 0x564455b03180 in _Py_Dealloc Objects/object.c:2215
    #17 0x564455ac025d in _Py_DECREF Include/object.h:478
    #18 0x564455ac02ad in _Py_XDECREF Include/object.h:541
    #19 0x564455ac3178 in free_keys_object Objects/dictobject.c:584
    #20 0x564455ac1f41 in dictkeys_decref Objects/dictobject.c:324
    #21 0x564455acb293 in PyDict_Clear Objects/dictobject.c:1729
    #22 0x564455b3b00b in type_clear Objects/typeobject.c:3607
    #23 0x564455d9afa6 in delete_garbage Modules/gcmodule.c:948
    #24 0x564455d9b9c3 in collect Modules/gcmodule.c:1123
    #25 0x564455d9ee3a in _PyGC_CollectNoFail Modules/gcmodule.c:1866
    #26 0x564455ce0fe3 in PyImport_Cleanup Python/import.c:599
    #27 0x564455d29245 in Py_FinalizeEx Python/pylifecycle.c:1229
    #28 0x564455a29e99 in Py_RunMain Modules/main.c:691
    #29 0x564455a2a236 in pymain_main Modules/main.c:719
    #30 0x564455a2a5e5 in Py_BytesMain Modules/main.c:743
    #31 0x564455a269b8 in main Programs/python.c:16
    #32 0x7f18ace5a151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
    #33 0x564455a268cd in _start (/home/pablogsal/github/python/3.8/python+0x1d48cd)

0x564456357fe0 is located 192 bytes inside of global variable 'small_ints' defined in 'Objects/longobject.c:43:21' (0x564456357f20) of size 8384
SUMMARY: AddressSanitizer

Still unclear how we have reached this situation
msg383473 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2020-12-21 01:34
On my phone right now, but this looks a tad suspicious:

https://github.com/pganssle/zoneinfo/blob/07ec80ad5dc7e7e4b4f861ddbb61a9b71e9f27c7/lib/zoneinfo_module.c#L596-L600
msg383477 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-12-21 01:49
Indeed. The module in the std lib is actually different:

https://github.com/python/cpython/blob/master/Modules/_zoneinfo.c#L570-L589

and uses _PyLong_GetOne() instead of _PyLong_One
msg383478 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-12-21 01:52
The version in 3.9 is also different but also uses _PyLong_One:

https://github.com/python/cpython/blob/3.9/Modules/_zoneinfo.c#L571-L590

Although that is passed directly to PyDict_SetItemString so the refcount should be correct.
msg383479 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-12-21 02:00
Oh, wait. This looks incorrect:

https://github.com/pganssle/zoneinfo/blob/07ec80ad5dc7e7e4b4f861ddbb61a9b71e9f27c7/lib/zoneinfo_module.c#L619-L621

That

#ifndef ATLEAST_37

should be

#ifdef ATLEAST_37
msg383480 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-12-21 02:04
I am quite sure that is the problem. I opened https://github.com/pganssle/zoneinfo/pull/97 to correct this in the backport.

The standard library versions are ok.

Brandt, if you agree that this is indeed the problem, let's close this as "not a bug".
msg383484 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2020-12-21 03:27
Almost certain. The number one is offset 192 bytes in small_ints on 3.8, which matches both of your backtraces:

>>> id(1) - id(-5)                                                                                                                                                                                                                            
192
msg383496 - (view) Author: Michał Górny (mgorny) * Date: 2020-12-21 09:18
Thank you for debugging this.  I can confirm that this patch resolves my issue.
msg383538 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-12-21 16:41
For future reference, this bug is triggered only when `.fromutc` is called on a subclass of `datetime` and the resulting date is the second ambiguous time (e.g. if there's a DST transition from 02:00 → 01:00, and the result of the `.fromutc` call is the *second* 01:30, you will trigger this issue).

Here's an MWE:

```
from backports.zoneinfo import ZoneInfo
from datetime import datetime, timezone

class SubDT(datetime):
    pass

LON = ZoneInfo("Europe/London")
d = SubDT(2020, 10, 25, 1, 30, tzinfo=timezone.utc)

# Each pass through the loop inappropriately reduces the reference count on the
# `1` object by 1. Since there are usually a large number of live references to
# `1`, this won't have any immediate noticeable effect unless you do it a lot.
for i in range(10000):
    d.astimezone(LON)
```
History
Date User Action Args
2020-12-21 16:41:30p-gansslesetmessages: + msg383538
2020-12-21 09:18:29mgornysetmessages: + msg383496
2020-12-21 04:15:32pablogsalsetpriority: release blocker ->
assignee: pablogsal
status: open -> closed
resolution: not a bug
stage: resolved
2020-12-21 03:27:56brandtbuchersetmessages: + msg383484
2020-12-21 02:04:20pablogsalsetmessages: + msg383480
2020-12-21 02:00:22pablogsalsetnosy: + p-ganssle
2020-12-21 02:00:09pablogsalsetmessages: + msg383479
2020-12-21 01:52:47pablogsalsetmessages: + msg383478
2020-12-21 01:49:21pablogsalsetmessages: + msg383477
2020-12-21 01:34:08brandtbuchersetmessages: + msg383473
2020-12-21 00:41:21pablogsalsetmessages: + msg383470
2020-12-21 00:30:54pablogsalsetpriority: normal -> release blocker

messages: + msg383469
versions: + Python 3.9, Python 3.10
2020-12-21 00:02:38Arfreversetnosy: + tim.peters, nascheme, pablogsal, brandtbucher
2020-12-20 22:23:40mgornysetmessages: + msg383466
2020-12-20 22:17:07mgornysetmessages: + msg383465
2020-12-20 21:59:00Arfreversetnosy: + lukasz.langa
2020-12-20 21:56:08mgornycreate