classification
Title: Same constants in tuples are not merged while compile()
Type: resource usage Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: inada.naoki Nosy List: Dan Rose, brett.cannon, inada.naoki, mark.dickinson, miss-islington, serhiy.storchaka, terry.reedy, tim.peters, vstinner
Priority: normal Keywords: needs review

Created on 2018-07-11 19:08 by Dan Rose, last changed 2018-11-28 16:05 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 8341 merged inada.naoki, 2018-07-19 09:24
PR 10743 merged vstinner, 2018-11-27 13:33
PR 10760 merged inada.naoki, 2018-11-28 10:18
PR 10762 closed vstinner, 2018-11-28 11:26
Messages (24)
msg321495 - (view) Author: Dan Rose (Dan Rose) * Date: 2018-07-11 19:08
In the Python 3.7.0 interpreter, the following evaluates to False. In 3.6.4, it was True:

    c,d = 500,500
    c is d

This seems to be because, in some cases, Python 3.7 fails to intern integers inside tuples:

    a = (500,500)
    print(a[0] is a[1]) # False
    
    a = (500,500,42)
    print(a[0] is a[1]) # False
    
    a = (500,500,'42')
    print(a[0] is a[1]) # False
    
    answer = 42
    a = (500,500,answer)
    print(a[0] is a[1]) # True
    
    a = (500,500,[42])
    print(a[0] is a[1]) # True

    a = [500,500]
    print(a[0] is a[1]) # True

I believe the above should all return True.
msg321497 - (view) Author: Dan Rose (Dan Rose) * Date: 2018-07-11 19:32
Another curious case:

    a = (500,500); b = (500,500)
    print(a[0] is b[0]) # True
    print(a[0] is b[1]) # False
    print(a[1] is b[0]) # False
    print(a[1] is b[1]) # True
msg321498 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-07-11 20:38
This was a side-effect of #29469.
msg321499 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-11 20:54
Should the AST optimizer "interns" constants?
msg321513 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-07-12 03:11
Thank you for finding it.

I had worked on constant merging. [1]
It didn't fix this case for now.  I'll continue the suspended work.

[1]: https://github.com/methane/cpython/pull/14/files

But it may be a too big to fix only this regression.
How is this regression critical?
If it's important, I'll try to create minimal version of the patch to backport.
msg321515 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-12 04:52
This is not only with integers.

>>> a = ((1, 2), (1, 2))
>>> a[0] is a[1]
False
>>> a = ('@#$', '@#$')
>>> a[0] is a[1]
False
>>> a = (1.0, 1.0)
>>> a[0] is a[1]
False

The only exception is short ASCII identifier-like strings (as a side effect of interning them):

>>> a = ('foo', 'foo')
>>> a[0] is a[1]
True

I'm not sure this is a problem which should be resolved.
msg321516 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-12 04:58
The language doesn't define anything about this - any program relying on accidental identity is in error itself.

Still, it's nice if a code object's co_consts vector is as short as reasonably possible.  That's a matter of pragmatics, not of correctness.
msg321517 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-07-12 05:02
OK, unless example code this regression affects seriously is shown, I target only 3.8.
msg321518 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-12 05:08
The co_consts vector is already as short as possible, except cases when tuples are created at code generation time, but this is not related to this issue (see issue33318 and issue33325).

>>> def f():
...     a = (1.0, 1.0)
...     b = (1.0, 1.0)
... 
>>> f.__code__.co_consts
(None, (1.0, 1.0))
>>> f.__code__.co_consts[1][0] is f.__code__.co_consts[1][1]
False
msg321519 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-12 05:21
Fine, Serhiy, so reword it a tiny bit:  it's nice if a code object's co_consts vector references as few distinct objects as possible.  Still a matter of pragmatics, not of correctness.
msg321520 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-07-12 05:22
FYI, same constants are shared even when they are in different tuples on 3.6.

For modules having large constant table containing large integer or floats, non name-like string, and bytes are affected significantly.
msg321526 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-12 08:27
Honestly, I don't think that it's a bug. We might enhance Python 3.8 to reduce constants duplication, but there is no need to "fix" Python 3.7.
msg321576 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-07-12 18:56
I also agree that there's no bug here, so I'm marking this an enhancement and removing the regression label.
msg321633 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-07-13 20:06
I think (space) 'performance' would be a better label, as this is strictly an implementation improvement, not a language change.  But we often (usually? sometimes?) limit performance improvements to the 'next version' so we have the alpha/beta/candidate releases to discover possible regressions.

I think this is worth considering just because the pattern is so odd.  But if this is ironed out as part of a broader and better patch, great.
msg321943 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-07-19 09:32
Counting object types in logging/__pycache__/__init__.cpython-38.pyc:

master:
[('r', 1815), (')', 467), ('Z', 339), ('s', 314), ('z', 273), ('c', 157), ('N', 154), ('a', 24), ('F', 14), ('i', 11), ('T', 8)]

GH-8341:
[('r', 1737), (')', 375), ('Z', 339), ('s', 314), ('z', 264), ('c', 157), ('N', 138), ('a', 24), ('F', 14), ('i', 11), ('T', 8)]

It reduced about 5% objects.

I chose logging/__init__ because it's large except tests, and it's written in OO-based. (Large application has many OO-based code).
msg321989 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-07-20 08:34
This is another memory overhead comparison.
It seems merging constants reduces 2~3% memory usage.

import sys, django, flask
sys._debugmallocstats()

### master branch

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0      8           1             312           194
    1     16           1             135           118
    2     24           2             198           138
    3     32          22            2690            82
    4     40          62            6214            48
    5     48         149           12465            51
    6     56         254           18260            28
    7     64         354           22300             2
    8     72         208           11589            59
    9     80         124            6173            27
   10     88          59            2708             6
...
# bytes in allocated blocks        =            9,261,952


### merge-const branch

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0      8           1             312           194
    1     16           1             135           118
    2     24           1             161             7
    3     32          22            2657           115
    4     40          62            6202            60
    5     48         102            8501            67
    6     56         235           16879            41
    7     64         346           21751            47
    8     72         204           11387            37
    9     80         122            6083            17
   10     88          58            2656            12
...
# bytes in allocated blocks        =            8,919,824
msg330428 - (view) Author: miss-islington (miss-islington) Date: 2018-11-26 12:23
New changeset c2e1607a51d7a17f143b5a34e8cff7c6fc58a091 by Miss Islington (bot) (INADA Naoki) in branch 'master':
bpo-34100: Merge constants recursively (GH-8341)
https://github.com/python/cpython/commit/c2e1607a51d7a17f143b5a34e8cff7c6fc58a091
msg330519 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 13:39
> New changeset c2e1607a51d7a17f143b5a34e8cff7c6fc58a091 by Miss Islington (bot) (INADA Naoki) in branch 'master'

This change introduced a lot of memory leaks (reference leaks):
https://buildbot.python.org/all/#/builders/1/builds/422

The following change fix "make && ./python -m test -R 3:3 test_code":

diff --git a/Python/compile.c b/Python/compile.c
index acb5cfe29b..cb3e73740d 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -1277,6 +1277,7 @@ merge_consts_recursive(struct compiler *c, PyObject *o)
             else {
                 u = k;
             }
+            Py_DECREF(k);
             Py_INCREF(u);
             PyTuple_SET_ITEM(tuple, i, u);
             i++;
@@ -1288,6 +1289,7 @@ merge_consts_recursive(struct compiler *c, PyObject *o)
             Py_DECREF(key);
             return NULL;
         }
+        Py_DECREF(PyTuple_GET_ITEM(key, 1));
         PyTuple_SET_ITEM(key, 1, new);
     }
 

But I dislike the frozenset branch of merge_consts_recursive(): modifying a tuple is a bad idea. For example, if you replace PyTuple_SET_ITEM() with PyTuple_SetItem(), you get a PyErr_BadInternalCall() because the reference count is greater than 1.

I don't see how to rewrote properly the code, so I reverted (removed) it to let someone else fix the code: PR 10743.
msg330521 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 14:12
New changeset 1005c84535191a72ebb7587d8c5636a065b7ed79 by Victor Stinner in branch 'master':
bpo-34100: Partially revert merge_consts_recursive() (GH-10743)
https://github.com/python/cpython/commit/1005c84535191a72ebb7587d8c5636a065b7ed79
msg330522 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 14:13
merge_consts_recursive(): rather than modifying the 'key' tuple, would it make sense to *add* the new key to the cache?
msg330572 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-11-28 03:13
I agree that modifying tuple is bad idea in general case.

But in case of constants, in-place modify is the easiest approach.

Our purpose is "merging" constant.  On the other hand, "build new tuple and
replace old tuple" approach makes two same constant tuples.
If old tuple is referenced from somewhere, we fail to "merging".

So I don't think we need to follow the general principle unless it reduces code.

FYI, we have similar in-place editing for interning already.
https://github.com/python/cpython/blob/master/Objects/codeobject.c#L47-L95
msg330582 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-28 09:16
Well, it's up to you. I will review your PR ;-)
msg330605 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-28 15:58
New changeset f7e4d3642fbb88f4e6243c952a0e223fb5df1c65 by Victor Stinner (INADA Naoki) in branch 'master':
bpo-34100: compile: Re-enable frozenset merging (GH-10760)
https://github.com/python/cpython/commit/f7e4d3642fbb88f4e6243c952a0e223fb5df1c65
msg330607 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-28 16:05
Thanks INADA Naoki! Thanks a simple and cool optimization!

Maybe you might want to document it at https://docs.python.org/dev/whatsnew/3.8.html#optimizations ?
History
Date User Action Args
2018-11-28 16:05:45vstinnersetmessages: + msg330607
2018-11-28 15:58:49vstinnersetmessages: + msg330605
2018-11-28 11:26:23vstinnersetpull_requests: + pull_request10011
2018-11-28 10:18:03inada.naokisetpull_requests: + pull_request10008
2018-11-28 09:16:33vstinnersetmessages: + msg330582
2018-11-28 03:14:00inada.naokisetmessages: + msg330572
2018-11-27 14:13:40vstinnersetmessages: + msg330522
2018-11-27 14:12:51vstinnersetmessages: + msg330521
2018-11-27 13:39:21vstinnersetmessages: + msg330519
2018-11-27 13:33:44vstinnersetpull_requests: + pull_request9990
2018-11-26 12:24:08inada.naokisetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2018-11-26 12:23:32miss-islingtonsetnosy: + miss-islington
messages: + msg330428
2018-07-23 05:22:57inada.naokilinkissue29336 superseder
2018-07-20 08:34:13inada.naokisetmessages: + msg321989
2018-07-19 09:32:56inada.naokisetkeywords: + needs review, - 3.7regression
type: enhancement -> resource usage
stage: needs patch -> patch review
2018-07-19 09:32:05inada.naokisetkeywords: - patch

messages: + msg321943
stage: patch review -> needs patch
2018-07-19 09:24:55inada.naokisetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request7875
2018-07-13 20:06:24terry.reedysetnosy: + terry.reedy

messages: + msg321633
stage: needs patch
2018-07-12 18:56:15brett.cannonsettype: resource usage -> enhancement

messages: + msg321576
nosy: + brett.cannon
2018-07-12 08:35:29inada.naokisettitle: Same integers in a tuple of constant literals are not merged -> Same constants in tuples are not merged while compile()
2018-07-12 08:27:46vstinnersetmessages: + msg321526
2018-07-12 05:22:53inada.naokisetmessages: + msg321520
2018-07-12 05:21:03tim.peterssetmessages: + msg321519
2018-07-12 05:08:56serhiy.storchakasetmessages: + msg321518
2018-07-12 05:02:26inada.naokisetmessages: + msg321517
versions: - Python 3.7
2018-07-12 04:58:09tim.peterssetnosy: + tim.peters
messages: + msg321516
2018-07-12 04:52:16serhiy.storchakasetmessages: + msg321515
2018-07-12 04:00:52inada.naokisetkeywords: + 3.7regression
title: Python doesn't intern integers in a tuple of constant literals -> Same integers in a tuple of constant literals are not merged
type: behavior -> resource usage
components: + Interpreter Core
versions: + Python 3.8
2018-07-12 03:11:37inada.naokisetassignee: inada.naoki
messages: + msg321513
2018-07-11 20:54:32vstinnersetmessages: + msg321499
2018-07-11 20:38:14mark.dickinsonsetnosy: + mark.dickinson, vstinner, serhiy.storchaka, inada.naoki
messages: + msg321498
2018-07-11 19:32:57Dan Rosesetmessages: + msg321497
2018-07-11 19:08:31Dan Rosecreate