File name |
Uploaded |
Description |
Edit |
unicode_init_globals.patch
|
skrah,
2010-10-22 14:39
|
|
review
|
unicode_init_globals2.patch
|
skrah,
2010-10-24 10:21
|
|
review
|
unicode-leak.patch
|
skrah,
2011-04-11 07:31
|
patch by Daniel Stutzbach |
|
unicode_globals-2.7.patch
|
serhiy.storchaka,
2013-01-07 11:28
|
|
review
|
unicode_globals-3.2.patch
|
serhiy.storchaka,
2013-01-07 11:28
|
|
review
|
unicode_globals-3.3.patch
|
serhiy.storchaka,
2013-01-07 11:28
|
|
review
|
unicode_globals-3.4.patch
|
serhiy.storchaka,
2013-01-07 11:28
|
|
review
|
unicode_globals-2.7_2.patch
|
serhiy.storchaka,
2013-01-24 20:21
|
|
review
|
unicode_globals-3.2_2.patch
|
serhiy.storchaka,
2013-01-24 20:21
|
|
review
|
unicode_globals-3.3_2.patch
|
serhiy.storchaka,
2013-01-24 20:21
|
|
|
unicode_globals-3.4_2.patch
|
serhiy.storchaka,
2013-01-24 20:21
|
|
|
unicode_globals-2.7_3.patch
|
serhiy.storchaka,
2013-01-25 19:51
|
|
review
|
unicode_globals-3.2_3.patch
|
serhiy.storchaka,
2013-01-25 19:51
|
|
review
|
unicode_globals-3.3_3.patch
|
serhiy.storchaka,
2013-01-25 19:51
|
|
review
|
unicode_globals-3.4_3.patch
|
serhiy.storchaka,
2013-01-25 19:51
|
|
review
|
msg119226 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2010-10-20 17:27 |
This is one of two remaining "definitely lost" leaks in py3k. It started
to appear in r70459. How to reproduce:
make distclean && ./configure OPT="-O0 -g" --without-pymalloc && make
valgrind --leak-check=full --suppressions=Misc/valgrind-python.supp ./python > VGOUT 2>&1
Then search for 'definitely'. This leak is not present in release-2.7.
==2058== 56 bytes in 1 blocks are definitely lost in loss record 918 of 2,136
==2058== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==2058== by 0x4167DE: _PyObject_New (object.c:243)
==2058== by 0x42C278: _PyUnicode_New (unicodeobject.c:341)
==2058== by 0x4306BD: PyUnicodeUCS2_DecodeUTF8Stateful (unicodeobject.c:2100)
==2058== by 0x430671: PyUnicodeUCS2_DecodeUTF8 (unicodeobject.c:2065)
==2058== by 0x42C8F7: PyUnicodeUCS2_FromStringAndSize (unicodeobject.c:541)
==2058== by 0x42C973: PyUnicodeUCS2_FromString (unicodeobject.c:559)
==2058== by 0x50B432: PyDict_SetItemString (dictobject.c:2088)
==2058== by 0x4258DF: PyType_Ready (typeobject.c:3844)
==2058== by 0x517B64: PyStructSequence_InitType (structseq.c:522)
==2058== by 0x4F3B4F: _PyFloat_Init (floatobject.c:1905)
==2058== by 0x4813CE: Py_InitializeEx (pythonrun.c:217)
|
msg119237 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-10-20 22:10 |
The stack corresponds to the allocation of type(sys.float_info).__doc__.
Why would only this object appear as a memory leak? It is certainly not deallocated, but all other types are in the same situation.
For example, sys.int_info is very similar, and happens to be defined in r70459. Why doesn't it appear in the report?
|
msg119238 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2010-10-20 22:24 |
To add to the mystery, the leak disappears if the key value is not
interned in PyDict_SetItemString:
Index: Objects/dictobject.c
===================================================================
--- Objects/dictobject.c (revision 70459)
+++ Objects/dictobject.c (working copy)
@@ -2088,7 +2088,6 @@
kv = PyUnicode_FromString(key);
if (kv == NULL)
return -1;
- PyUnicode_InternInPlace(&kv); /* XXX Should we really? */
err = PyDict_SetItem(v, kv, item);
Py_DECREF(kv);
return err;
|
msg119239 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2010-10-20 22:29 |
Stefan Krah wrote:
>
> Stefan Krah <stefan-usenet@bytereef.org> added the comment:
>
> To add to the mystery, the leak disappears if the key value is not
> interned in PyDict_SetItemString:
I'm not sure how you determine what is a leak and what not.
Interned Unicode objects stay alive until the interpreter
is finalized.
Are you suggesting that the finalization does not free the
interned Unicode strings or not all of them ?
> Index: Objects/dictobject.c
> ===================================================================
> --- Objects/dictobject.c (revision 70459)
> +++ Objects/dictobject.c (working copy)
> @@ -2088,7 +2088,6 @@
> kv = PyUnicode_FromString(key);
> if (kv == NULL)
> return -1;
> - PyUnicode_InternInPlace(&kv); /* XXX Should we really? */
> err = PyDict_SetItem(v, kv, item);
> Py_DECREF(kv);
> return err;
|
msg119241 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2010-10-20 22:47 |
Marc-Andre Lemburg <report@bugs.python.org> wrote:
> I'm not sure how you determine what is a leak and what not.
> Interned Unicode objects stay alive until the interpreter
> is finalized.
>
> Are you suggesting that the finalization does not free the
> interned Unicode strings or not all of them ?
No, Valgrind's "definitely lost" category means that all pointers
to an allocated region have been lost, so it would not be possible
to free the area. [1]
There are hundreds of "possibly lost" warnings as well, but I did
not report those.
My experience is that Valgrind is usually correct with "definitely
lost", see e.g. #10153. That said, of course it _could_ be a false
alarm.
[1] Last category from:
http://mail.python.org/pipermail/python-dev/2002-October/029758.html
|
msg119356 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2010-10-22 00:03 |
Re disabling interning in PyDict_SetItemString:
A comment in unicodeobject.c says that globals should not be used
before calling _PyUnicode_Init. But in Py_InitializeEx (pythonrun.c)
_PyUnicode_Init is called after _Py_ReadyTypes, _PyFrame_Init,
_PyLong_Init, PyByteArray_Init and _PyFloat_Init.
In fact, when I move _PyUnicode_Init up, the error concerning
_PyFloat_Init disappears.
Problem is, PyType_Ready also uses PyDict_SetItemString, but I
presume that _Py_ReadyTypes has to be called before anything else.
In that case it would be unavoidable that PyDict_SetItemString is
used before _PyUnicode_Init, and it might be a good idea to disable
interning.
|
msg119387 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2010-10-22 14:39 |
I've verified the leak manually. The cause is that global variables in
unicodeobject.c, e.g. free_list, are used before _PyUnicode_Init() is
called. Later on _PyUnicode_Init() sets these variables to NULL, losing
the allocated memory.
Here is an example of the earliest use of free_list during
_Py_ReadyTypes (),
well before _PyUnicode_Init():
Breakpoint 1, unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392
392 switch (PyUnicode_CHECK_INTERNED(unicode)) {
(gdb) bt
#0 unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392
#1 0x000000000044fc69 in PyUnicode_InternInPlace (p=0x7fff303852b8) at Objects/unicodeobject.c:9991
#2 0x000000000044fed3 in PyUnicode_InternFromString (cp=0x568861 "__len__") at Objects/unicodeobject.c:10025
#3 0x00000000004344d0 in init_slotdefs () at Objects/typeobject.c:5751
#4 0x0000000000434840 in add_operators (type=0x7be260) at Objects/typeobject.c:5905
#5 0x000000000042eec8 in PyType_Ready (type=0x7be260) at Objects/typeobject.c:3810
#6 0x000000000042edfc in PyType_Ready (type=0x7bde60) at Objects/typeobject.c:3774
#7 0x000000000041aa5f in _Py_ReadyTypes () at Objects/object.c:1514
#8 0x00000000004992ff in Py_InitializeEx (install_sigs=1) at Python/pythonrun.c:232
#9 0x000000000049957f in Py_Initialize () at Python/pythonrun.c:321
#10 0x00000000004b289f in Py_Main (argc=1, argv=0x1afa010) at Modules/main.c:590
#11 0x0000000000417dcc in main (argc=1, argv=0x7fff30385758) at ./Modules/python.c:59
(gdb) n
411 if (PyUnicode_CheckExact(unicode) &&
(gdb)
414 if (unicode->length >= KEEPALIVE_SIZE_LIMIT) {
(gdb)
419 if (unicode->defenc) {
(gdb)
423 *(PyUnicodeObject **)unicode = free_list;
(gdb) n
424 free_list = unicode;
(gdb) n
425 numfree++;
(gdb) n
411 if (PyUnicode_CheckExact(unicode) &&
A possible fix could be to initialize the globals right at the start
in main.c. Note that there are still several Unicode API functions in
main.c before PyType_Ready has been called on the Unicode type.
With the patch, Valgrind does not show the leak any longer.
|
msg119396 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-10-22 18:04 |
About the patch: why should _PyUnicode_Init() try to call _PyUnicode_InitGlobals() again?
|
msg119503 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2010-10-24 10:21 |
> why should _PyUnicode_Init() try to call _PyUnicode_InitGlobals() again?
For the embedding scenario (when only Py_Initialize() is called) I wanted
to preserve the old behavior of _PyUnicode_Init().
But this is not really enough. I wrote a new patch that also calls
_PyUnicode_InitGlobals() at the beginning of Py_Initialize().
I don't like the fact that even more clutter is added to Py_Main(). Perhaps
Py_Initialize() could be moved up or the Unicode functions could be moved
down.
|
msg133502 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2011-04-11 07:31 |
[Merging with issue 11402]
Daniel's patch is much simpler, but I think that unicode_empty and
unicode_latin1 would need to be protected before _PyUnicode_Init
is called.
Is the module initialization procedure documented somewhere? I get
the impression that unicodeobject.c depends on dict.c and dict.c
depends on unicodeobject.c.
|
msg133504 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2011-04-11 07:36 |
Stefan Krah <report@bugs.python.org> wrote:
> Is the module initialization procedure documented somewhere? I get
> the impression that unicodeobject.c depends on dict.c and dict.c
> depends on unicodeobject.c.
s/dict.c/dictobject.c/
|
msg144749 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2011-10-01 20:59 |
The PEP-393 changes apparently fix this leak; at least I can't reproduce
it in default any longer (but still in 3.2).
|
msg172103 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2012-10-05 17:24 |
See also #16143.
|
msg179221 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-06 20:33 |
Daniel's patch looks good for me. 2.7 looks affected too.
|
msg179223 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-06 20:43 |
unicode-leak.patch doesn't fix #16143 though. unicode_empty and
unicode_latin1 need to be initialized, too.
Actually we could close this in favor of #16143.
|
msg179234 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-06 22:29 |
> unicode-leak.patch doesn't fix #16143 though. unicode_empty and
> unicode_latin1 need to be initialized, too.
Indeed. I'll upload patches tomorrow.
|
msg179256 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-07 11:28 |
Here are patches for all four Python versions. They fixes possible usage of the
followed non-initialized global variables: free_list, numfree, interned,
unicode_empty, static_strings, unicode_latin1, bloom_linebreak,
unicode_default_encoding.
|
msg179494 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-09 22:54 |
Nick, I'm adding you to the nosy list since this issue seems related to PEP 432.
Quick summary: Globals are used in unicodeobject.c before they are initialized.
Also, Unicode objects are created before PyType_Ready(&PyUnicode_Type) has been
called.
This happens during startup:
_Py_InitializeEx_Private():
_Py_ReadyTypes():
PyType_Ready(&PyType_Type);
[...]
Many Unicode objects like "" or "__add__" are created. Uninitialized
globals have led to a crash (#16143). This is fixed by Serhiy's patch,
which always dynamically checks all globals for NULL before using them.
However, Unicode objects are still created at this point.
[...]
PyType_Ready(&PyUnicode_Type); /* Called for the first time */
[...]
_PyUnicode_Init:
for (i = 0; i < 256; i++) /* Could leak if latin1 strings
unicode_latin1[i] = NULL; have already been created. */
PyType_Ready(&PyUnicode_Type); /* Called a second time! */
So, considering PEP 432: Are these "pre-type-ready" Unicode objects
safe to use, or should something be done about it?
|
msg179498 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-01-09 23:10 |
There should still be a check in tp_new (IIRC) that calls PyType_Ready on
unready types.
While doing something systematic about this kind of problem is part of the
rationale of PEP 432, that won't help earlier versions.
|
msg179504 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-10 00:21 |
Nick Coghlan <report@bugs.python.org> wrote:
> There should still be a check in tp_new (IIRC) that calls PyType_Ready on
> unready types.
Indeed there is one in type_new(), but that isn't used here AFAICS. If
you apply this patch and start up python, there are many "str: not ready"
instances:
diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -14282,6 +14282,10 @@
PyUnicode_InternFromString(const char *cp)
{
PyObject *s = PyUnicode_FromString(cp);
+
+ fprintf(stderr, "%s: %s\n", PyUnicode_Type.tp_name,
+ (PyUnicode_Type.tp_flags & Py_TPFLAGS_READY) ? "ready" : "not ready");
+
if (s == NULL)
return NULL;
PyUnicode_InternInPlace(&s);
|
msg180546 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-24 20:21 |
There is a set of updated patches.
|
msg180547 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-01-24 20:42 |
Serhiy's general approach here looks good to me (although there seem to be some unrelated changes to the re module in the current 3.2 patch).
For PEP 432, I want to try to rearrange things so that _PyUnicode_Init is one of the *first* calls made in Py_BeginInitialization (even before the general call to Py_ReadyTypes), but that still won't invalidate the work done here.
|
msg180573 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-25 12:29 |
Since Rietveld didn't mail me this time: I left some comments on the 2.7 patch.
|
msg180579 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-25 13:40 |
The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch
(without the _sre changes :) looks good to me.
|
msg180617 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-25 19:51 |
> The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch
> (without the _sre changes :) looks good to me.
Patches updated addressing new Stefan's comments. Unicode globals no longer
reinitialized in _PyUnicode_Init(). Note that I have added a consistency check
into the macro in 3.3+.
I hope Rietveld will accept this set.
|
msg180623 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-25 21:12 |
Nice. I think the latest patches are commit-ready.
|
msg180652 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-01-26 10:21 |
New changeset 7c8ad0d02664 by Serhiy Storchaka in branch '2.7':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/7c8ad0d02664
New changeset f7eda8165e6f by Serhiy Storchaka in branch '3.2':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/f7eda8165e6f
New changeset 01d4dd412581 by Serhiy Storchaka in branch '3.3':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/01d4dd412581
New changeset cb12d642eed2 by Serhiy Storchaka in branch 'default':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/cb12d642eed2
|
msg180654 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-26 10:33 |
Committed. Thank you for review, Stefan. Close this issue if the work is finished.
|
msg180721 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-26 23:21 |
Buildbots etc. look all good. Thanks for fixing this.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:07 | admin | set | github: 54365 |
2013-01-26 23:21:43 | skrah | set | resolution: fixed stage: patch review -> resolved |
2013-01-26 23:21:26 | skrah | set | status: open -> closed |
2013-01-26 23:21:19 | skrah | set | messages:
+ msg180721 |
2013-01-26 10:33:56 | serhiy.storchaka | set | messages:
+ msg180654 |
2013-01-26 10:21:50 | python-dev | set | nosy:
+ python-dev messages:
+ msg180652
|
2013-01-25 21:12:47 | skrah | set | messages:
+ msg180623 |
2013-01-25 19:51:33 | serhiy.storchaka | set | files:
+ unicode_globals-2.7_3.patch, unicode_globals-3.2_3.patch, unicode_globals-3.3_3.patch, unicode_globals-3.4_3.patch
messages:
+ msg180617 |
2013-01-25 13:40:39 | skrah | set | messages:
+ msg180579 |
2013-01-25 12:29:15 | skrah | set | messages:
+ msg180573 versions:
+ Python 3.3, Python 3.4 |
2013-01-24 20:42:08 | ncoghlan | set | messages:
+ msg180547 |
2013-01-24 20:21:09 | serhiy.storchaka | set | files:
+ unicode_globals-2.7_2.patch, unicode_globals-3.2_2.patch, unicode_globals-3.3_2.patch, unicode_globals-3.4_2.patch
messages:
+ msg180546 |
2013-01-10 00:21:25 | skrah | set | messages:
+ msg179504 |
2013-01-09 23:10:52 | ncoghlan | set | messages:
+ msg179498 |
2013-01-09 22:54:13 | skrah | set | nosy:
+ ncoghlan messages:
+ msg179494
|
2013-01-07 23:30:53 | skrah | set | priority: high -> critical |
2013-01-07 23:30:38 | skrah | set | nosy:
+ georg.brandl, pitrou, franck, Gregory.Andersen
|
2013-01-07 23:29:13 | skrah | link | issue16143 superseder |
2013-01-07 11:34:18 | serhiy.storchaka | set | stage: commit review -> patch review |
2013-01-07 11:28:43 | serhiy.storchaka | set | files:
+ unicode_globals-2.7.patch, unicode_globals-3.2.patch, unicode_globals-3.3.patch, unicode_globals-3.4.patch
messages:
+ msg179256 |
2013-01-06 22:29:46 | serhiy.storchaka | set | messages:
+ msg179234 |
2013-01-06 20:43:48 | skrah | set | messages:
+ msg179223 |
2013-01-06 20:33:36 | serhiy.storchaka | set | versions:
+ Python 2.7, - Python 3.3, Python 3.4 nosy:
+ serhiy.storchaka
messages:
+ msg179221
stage: patch review -> commit review |
2013-01-04 23:41:18 | Arfrever | set | nosy:
+ Arfrever
|
2012-10-05 17:24:52 | skrah | set | messages:
+ msg172103 versions:
+ Python 3.3, Python 3.4 |
2012-04-20 17:58:50 | mark.dickinson | set | nosy:
- mark.dickinson
|
2011-10-01 20:59:19 | skrah | set | messages:
+ msg144749 |
2011-04-11 07:36:29 | skrah | set | messages:
+ msg133504 |
2011-04-11 07:32:27 | skrah | link | issue11402 superseder |
2011-04-11 07:31:07 | skrah | set | files:
+ unicode-leak.patch nosy:
+ stutzbach messages:
+ msg133502
|
2010-10-24 10:21:44 | skrah | set | files:
+ unicode_init_globals2.patch
messages:
+ msg119503 |
2010-10-22 18:04:51 | amaury.forgeotdarc | set | messages:
+ msg119396 |
2010-10-22 14:39:03 | skrah | set | files:
+ unicode_init_globals.patch priority: normal -> high title: Memory leak (r70459) -> Initialization of globals in unicodeobject.c messages:
+ msg119387
keywords:
+ patch stage: patch review |
2010-10-22 00:03:30 | skrah | set | messages:
+ msg119356 |
2010-10-20 22:47:11 | skrah | set | messages:
+ msg119241 |
2010-10-20 22:29:25 | lemburg | set | nosy:
+ lemburg messages:
+ msg119239
|
2010-10-20 22:24:43 | skrah | set | messages:
+ msg119238 |
2010-10-20 22:10:02 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg119237
|
2010-10-20 17:58:55 | belopolsky | set | nosy:
+ vstinner
|
2010-10-20 17:27:05 | skrah | create | |