classification
Title: Same constants in tuples are not merged while compile()
Type: resource usage Stage: patch review
Components: Interpreter Core Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: inada.naoki Nosy List: Dan Rose, brett.cannon, inada.naoki, mark.dickinson, serhiy.storchaka, terry.reedy, tim.peters, vstinner
Priority: normal Keywords: needs review

Created on 2018-07-11 19:08 by Dan Rose, last changed 2018-07-19 09:32 by inada.naoki.

Pull Requests
URL Status Linked Edit
PR 8341 open inada.naoki, 2018-07-19 09:24
Messages (15)
msg321495 - (view) Author: Dan Rose (Dan Rose) Date: 2018-07-11 19:08
In the Python 3.7.0 interpreter, the following evaluates to False. In 3.6.4, it was True:

    c,d = 500,500
    c is d

This seems to be because, in some cases, Python 3.7 fails to intern integers inside tuples:

    a = (500,500)
    print(a[0] is a[1]) # False
    
    a = (500,500,42)
    print(a[0] is a[1]) # False
    
    a = (500,500,'42')
    print(a[0] is a[1]) # False
    
    answer = 42
    a = (500,500,answer)
    print(a[0] is a[1]) # True
    
    a = (500,500,[42])
    print(a[0] is a[1]) # True

    a = [500,500]
    print(a[0] is a[1]) # True

I believe the above should all return True.
msg321497 - (view) Author: Dan Rose (Dan Rose) Date: 2018-07-11 19:32
Another curious case:

    a = (500,500); b = (500,500)
    print(a[0] is b[0]) # True
    print(a[0] is b[1]) # False
    print(a[1] is b[0]) # False
    print(a[1] is b[1]) # True
msg321498 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-07-11 20:38
This was a side-effect of #29469.
msg321499 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-11 20:54
Should the AST optimizer "interns" constants?
msg321513 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-07-12 03:11
Thank you for finding it.

I had worked on constant merging. [1]
It didn't fix this case for now.  I'll continue the suspended work.

[1]: https://github.com/methane/cpython/pull/14/files

But it may be a too big to fix only this regression.
How is this regression critical?
If it's important, I'll try to create minimal version of the patch to backport.
msg321515 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-12 04:52
This is not only with integers.

>>> a = ((1, 2), (1, 2))
>>> a[0] is a[1]
False
>>> a = ('@#$', '@#$')
>>> a[0] is a[1]
False
>>> a = (1.0, 1.0)
>>> a[0] is a[1]
False

The only exception is short ASCII identifier-like strings (as a side effect of interning them):

>>> a = ('foo', 'foo')
>>> a[0] is a[1]
True

I'm not sure this is a problem which should be resolved.
msg321516 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-12 04:58
The language doesn't define anything about this - any program relying on accidental identity is in error itself.

Still, it's nice if a code object's co_consts vector is as short as reasonably possible.  That's a matter of pragmatics, not of correctness.
msg321517 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-07-12 05:02
OK, unless example code this regression affects seriously is shown, I target only 3.8.
msg321518 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-12 05:08
The co_consts vector is already as short as possible, except cases when tuples are created at code generation time, but this is not related to this issue (see issue33318 and issue33325).

>>> def f():
...     a = (1.0, 1.0)
...     b = (1.0, 1.0)
... 
>>> f.__code__.co_consts
(None, (1.0, 1.0))
>>> f.__code__.co_consts[1][0] is f.__code__.co_consts[1][1]
False
msg321519 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-12 05:21
Fine, Serhiy, so reword it a tiny bit:  it's nice if a code object's co_consts vector references as few distinct objects as possible.  Still a matter of pragmatics, not of correctness.
msg321520 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-07-12 05:22
FYI, same constants are shared even when they are in different tuples on 3.6.

For modules having large constant table containing large integer or floats, non name-like string, and bytes are affected significantly.
msg321526 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-12 08:27
Honestly, I don't think that it's a bug. We might enhance Python 3.8 to reduce constants duplication, but there is no need to "fix" Python 3.7.
msg321576 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-07-12 18:56
I also agree that there's no bug here, so I'm marking this an enhancement and removing the regression label.
msg321633 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-07-13 20:06
I think (space) 'performance' would be a better label, as this is strictly an implementation improvement, not a language change.  But we often (usually? sometimes?) limit performance improvements to the 'next version' so we have the alpha/beta/candidate releases to discover possible regressions.

I think this is worth considering just because the pattern is so odd.  But if this is ironed out as part of a broader and better patch, great.
msg321943 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-07-19 09:32
Counting object types in logging/__pycache__/__init__.cpython-38.pyc:

master:
[('r', 1815), (')', 467), ('Z', 339), ('s', 314), ('z', 273), ('c', 157), ('N', 154), ('a', 24), ('F', 14), ('i', 11), ('T', 8)]

GH-8341:
[('r', 1737), (')', 375), ('Z', 339), ('s', 314), ('z', 264), ('c', 157), ('N', 138), ('a', 24), ('F', 14), ('i', 11), ('T', 8)]

It reduced about 5% objects.

I chose logging/__init__ because it's large except tests, and it's written in OO-based. (Large application has many OO-based code).
History
Date User Action Args
2018-07-19 09:32:56inada.naokisetkeywords: + needs review, - 3.7regression
type: enhancement -> resource usage
stage: needs patch -> patch review
2018-07-19 09:32:05inada.naokisetkeywords: - patch

messages: + msg321943
stage: patch review -> needs patch
2018-07-19 09:24:55inada.naokisetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request7875
2018-07-13 20:06:24terry.reedysetnosy: + terry.reedy

messages: + msg321633
stage: needs patch
2018-07-12 18:56:15brett.cannonsettype: resource usage -> enhancement

messages: + msg321576
nosy: + brett.cannon
2018-07-12 08:35:29inada.naokisettitle: Same integers in a tuple of constant literals are not merged -> Same constants in tuples are not merged while compile()
2018-07-12 08:27:46vstinnersetmessages: + msg321526
2018-07-12 05:22:53inada.naokisetmessages: + msg321520
2018-07-12 05:21:03tim.peterssetmessages: + msg321519
2018-07-12 05:08:56serhiy.storchakasetmessages: + msg321518
2018-07-12 05:02:26inada.naokisetmessages: + msg321517
versions: - Python 3.7
2018-07-12 04:58:09tim.peterssetnosy: + tim.peters
messages: + msg321516
2018-07-12 04:52:16serhiy.storchakasetmessages: + msg321515
2018-07-12 04:00:52inada.naokisetkeywords: + 3.7regression
title: Python doesn't intern integers in a tuple of constant literals -> Same integers in a tuple of constant literals are not merged
type: behavior -> resource usage
components: + Interpreter Core
versions: + Python 3.8
2018-07-12 03:11:37inada.naokisetassignee: inada.naoki
messages: + msg321513
2018-07-11 20:54:32vstinnersetmessages: + msg321499
2018-07-11 20:38:14mark.dickinsonsetnosy: + mark.dickinson, vstinner, serhiy.storchaka, inada.naoki
messages: + msg321498
2018-07-11 19:32:57Dan Rosesetmessages: + msg321497
2018-07-11 19:08:31Dan Rosecreate