classification
Title: merge tuples in module
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Same constants in tuples are not merged while compile()
View: 34100
Assigned To: Nosy List: methane, rhettinger, serhiy.storchaka, vstinner
Priority: low Keywords: patch

Created on 2017-01-21 03:30 by methane, last changed 2018-07-23 05:22 by methane. This issue is now closed.

Files
File name Uploaded Description Edit
merge-tuples.patch methane, 2017-01-21 03:30 review
tuplemem.py methane, 2017-01-21 03:37
mergetuples2.patch methane, 2017-01-21 05:45 Ignore bytes objects review
merge-constants.patch serhiy.storchaka, 2017-01-22 17:23 review
Messages (15)
msg285933 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-21 03:30
Tuples consists of None, True, False, str and bytes can be merged safely.
To avoid memory leak, this patch shares such tuples in compile unit (module, in most case.)
msg285934 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-21 03:37
I tried this patch with attached script.

```
$ venv/bin/pip install django flask sqlalchemy
$ PYTHONTRACEMALLOC=5 venv/bin/python3 tuplemem.py > tuples.txt
$ sort tuples.txt | uniq -c | sort -nr > tuplecount
```

## default
memory: (32254693, 32292635)
tuples: 64968

head -n10 tuplecount-default
   5479 (None,)
   3069 ('self',)
    727 (<class 'object'>,)
    688 ('__class__',)
    321 ('NotImplementedError',)
    287 ('self', 'other')
    264 (None, None)
    207 (False,)
    193 (None, 0)
    176 (None, False)

## patched
memory: (31224697, 31261892)
tuples: 51298

head -n10 tuplecount-patched
   1437 (None,)
    727 (<class 'object'>,)
    328 ('self',)
    264 (None, None)
    207 (False,)
    193 (None, 0)
    138 ('__class__',)
    114 (True,)
    112 (None, False)
    110 ('target', 'fn')


I'll try again with my company's private codebase in next week.
msg285938 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-21 05:11
$ ./python -Wa -b -c "lambda: 'a'; lambda: b'a'"
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string

Look at _PyCode_ConstantKey(), it may help.
msg285939 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-21 05:18
Oh, thanks.  I hadn't checked the warning.
Since bytes are not important in this time, I'll remove bytes support.
msg285944 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-21 08:48
Following patch uses _PyCode_ConstantKey() as a key. This allows supporting integers, floats, bytes, etc.

But I think we can go further and merge constants recursively.

See also issue28813.
msg286082 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-23 13:00
Thanks.  Your patch reduced memory consumption by 2%,
and number of tuples by 15%.

$ cat invtuple.py
import app
import sys
import traceback
import tracemalloc

print(tracemalloc.get_traced_memory())
allobj = sys.getobjects(0, tuple)
print(len(allobj))

$ PYTHONTRACEMALLOC=1 venv/bin/python invtuple.py
(51491189, 51506311)
87143

$ PYTHONTRACEMALLOC=1 venv-mt/bin/python  invtuple.py
(50551519, 50566641)
74308
msg286154 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-24 08:45
merge-constants.patch LGTM
msg286160 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-24 09:17
merge-constants.patch is rather a proof of concept. I think it may be more efficient after removing unneeded folded constants (issue28813). But this can be done better with the AST optimizer (issue1346238, issue11549). It would be worth also to merge nested constants. This is similar to interning string constants (see in Objects/codeobject.c).

The benefit is small and we should check that merging constants doesn't have too large cost (CPU time or temporally consumed memory) at compile time.

I think we should first implement the AST optimizer, and then try to combine merging constants with interning string constants.
msg286165 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-24 09:48
> But this can be done better with the AST optimizer (issue1346238, issue11549).

There are also the PEP 511 (Code transformers) which was created to implement *external* AST optimizers, and the FAT Python which implements an AST optimizer.

Hum, after "-o no_annotation" discussed on the mailing list, maybe it's time to look again at this PEP? ;-)
msg286167 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-24 09:56
> Thanks.  Your patch reduced memory consumption by 2%,

merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it?

Since code objects loaded by import are likely for stay for the whole lifetime of a process, I would be interested to experiment interning all constant objects (the tuple of objects, but also each object of these tuples, so support "nested interning") of all code objets (consts, names, varnames, freevars, cellvars) in the marshal module.

To intern constants, we need to generate an unique key to not merge 1 int and 1.0 float. The risk of such global intern dict is to increase the memory usage of the unique keys uses more memory than the decreased caused by the removal of duplicate objects.

More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch?
msg286170 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-24 09:57
I don't know what is the most efficient technic to reduce the memory usage of module, but at least I can say imports are the top #1 memory consumer. Look at tracemalloc examples:

https://docs.python.org/dev/library/tracemalloc.html#display-the-top-10

"We can see that Python loaded 4855 KiB data (bytecode and constants) from modules"
msg286229 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-25 07:00
> More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch?

No. I didn't do it because I don't know it's worth enough.

> merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it?

Maybe, our application is somewhat special.
I'll check memory usage again with only loading major OSS framework/libraries for this issue in next time.
(BTW, it's 3% if -OO is used)

But issue28813 or other optimizations may reduce tuple size and make it possible to share more tuples.
So I think we should evaluate this patch after them.

But I don't familiar with frontend (parser, AST).
I usually look bytecode and runtime.
msg286230 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-25 07:45
> merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it?

I'm not impressed too. And merging constants can take time.

> Since code objects loaded by import are likely for stay for the whole lifetime of a process, I would be interested to experiment interning all constant objects (the tuple of objects, but also each object of these tuples, so support "nested interning") of all code objets (consts, names, varnames, freevars, cellvars) in the marshal module.

This could slow down marhalling. Some patches in issue20416 implements this, but current code is faster and much simpler and the benefit of deduplicating constants is too small.

> More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch?

All patches in this issue shares constants only in one compilation unit (module). Otherwise the global intern dict would increase the memory usage too much.

See also the statistics in issue16475. The most memory is spent by strings, and the majority of strings already are interned.
msg286231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-01-25 07:51
I vote +0 on merge-constants.patch: LGTM, may be useful, but not impressed :-)

Hopefully, .py => .pyc compilation "should" occur only once. Except of the main script, but more and more applications use a tiny entry point of 3 lines. I don't expect any significant slowdown. But even if there is a slowdown, IMHO it's worth it.
msg286246 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-25 12:40
As I said on ML, now I think embedding some co_* tuples into code object
is better way to reduce number of tuples and memory usage.
So I close this issue for now.

Thanks for review and comments.
History
Date User Action Args
2018-07-23 05:22:57methanesetstatus: pending -> closed
superseder: Same constants in tuples are not merged while compile()
resolution: postponed -> duplicate
stage: resolved
2017-01-25 12:40:25methanesetstatus: open -> pending
resolution: postponed
messages: + msg286246
2017-01-25 07:51:54vstinnersetmessages: + msg286231
2017-01-25 07:45:19serhiy.storchakasetmessages: + msg286230
2017-01-25 07:00:47methanesetmessages: + msg286229
2017-01-24 09:57:54vstinnersetmessages: + msg286170
2017-01-24 09:56:12vstinnersetmessages: + msg286167
2017-01-24 09:48:17vstinnersetmessages: + msg286165
2017-01-24 09:17:12serhiy.storchakasetpriority: normal -> low

messages: + msg286160
2017-01-24 08:45:01methanesetmessages: + msg286154
2017-01-23 13:00:59methanesetmessages: + msg286082
2017-01-22 17:23:55serhiy.storchakasetfiles: + merge-constants.patch
2017-01-22 17:23:27serhiy.storchakasetfiles: - merge-constants.patch
2017-01-21 08:48:06serhiy.storchakasetfiles: + merge-constants.patch
nosy: + rhettinger
messages: + msg285944

2017-01-21 05:45:55methanesetfiles: + mergetuples2.patch
2017-01-21 05:18:28methanesetmessages: + msg285939
2017-01-21 05:11:26serhiy.storchakasetversions: + Python 3.7
nosy: + serhiy.storchaka

messages: + msg285938

components: + Interpreter Core
type: enhancement
2017-01-21 03:37:58methanesetfiles: + tuplemem.py

messages: + msg285934
2017-01-21 03:30:45methanecreate