Issue 29336: merge tuples in module

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/73522

classification

Title:	merge tuples in module
Type:	enhancement	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.7

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	Same constants in tuples are not merged while compile() View: 34100
Assigned To:		Nosy List:	methane, rhettinger, serhiy.storchaka, vstinner
Priority:	low	Keywords:	patch

Created on 2017-01-21 03:30 by methane, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
merge-tuples.patch	methane, 2017-01-21 03:30		review
tuplemem.py	methane, 2017-01-21 03:37
mergetuples2.patch	methane, 2017-01-21 05:45	Ignore bytes objects	review
merge-constants.patch	serhiy.storchaka, 2017-01-22 17:23		review

Messages (15)
msg285933 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-21 03:30
Tuples consists of None, True, False, str and bytes can be merged safely. To avoid memory leak, this patch shares such tuples in compile unit (module, in most case.)
msg285934 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-21 03:37
I tried this patch with attached script. ``` $ venv/bin/pip install django flask sqlalchemy $ PYTHONTRACEMALLOC=5 venv/bin/python3 tuplemem.py > tuples.txt $ sort tuples.txt \| uniq -c \| sort -nr > tuplecount ``` ## default memory: (32254693, 32292635) tuples: 64968 head -n10 tuplecount-default 5479 (None,) 3069 ('self',) 727 (<class 'object'>,) 688 ('__class__',) 321 ('NotImplementedError',) 287 ('self', 'other') 264 (None, None) 207 (False,) 193 (None, 0) 176 (None, False) ## patched memory: (31224697, 31261892) tuples: 51298 head -n10 tuplecount-patched 1437 (None,) 727 (<class 'object'>,) 328 ('self',) 264 (None, None) 207 (False,) 193 (None, 0) 138 ('__class__',) 114 (True,) 112 (None, False) 110 ('target', 'fn') I'll try again with my company's private codebase in next week.
msg285938 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-21 05:11
$ ./python -Wa -b -c "lambda: 'a'; lambda: b'a'" sys:1: BytesWarning: Comparison between bytes and string sys:1: BytesWarning: Comparison between bytes and string sys:1: BytesWarning: Comparison between bytes and string sys:1: BytesWarning: Comparison between bytes and string sys:1: BytesWarning: Comparison between bytes and string sys:1: BytesWarning: Comparison between bytes and string sys:1: BytesWarning: Comparison between bytes and string Look at _PyCode_ConstantKey(), it may help.
msg285939 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-21 05:18
Oh, thanks. I hadn't checked the warning. Since bytes are not important in this time, I'll remove bytes support.
msg285944 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-21 08:48
Following patch uses _PyCode_ConstantKey() as a key. This allows supporting integers, floats, bytes, etc. But I think we can go further and merge constants recursively. See also issue28813.
msg286082 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-23 13:00
Thanks. Your patch reduced memory consumption by 2%, and number of tuples by 15%. $ cat invtuple.py import app import sys import traceback import tracemalloc print(tracemalloc.get_traced_memory()) allobj = sys.getobjects(0, tuple) print(len(allobj)) $ PYTHONTRACEMALLOC=1 venv/bin/python invtuple.py (51491189, 51506311) 87143 $ PYTHONTRACEMALLOC=1 venv-mt/bin/python invtuple.py (50551519, 50566641) 74308
msg286154 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-24 08:45
merge-constants.patch LGTM
msg286160 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-24 09:17
merge-constants.patch is rather a proof of concept. I think it may be more efficient after removing unneeded folded constants (issue28813). But this can be done better with the AST optimizer (issue1346238, issue11549). It would be worth also to merge nested constants. This is similar to interning string constants (see in Objects/codeobject.c). The benefit is small and we should check that merging constants doesn't have too large cost (CPU time or temporally consumed memory) at compile time. I think we should first implement the AST optimizer, and then try to combine merging constants with interning string constants.
msg286165 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-24 09:48
> But this can be done better with the AST optimizer (issue1346238, issue11549). There are also the PEP 511 (Code transformers) which was created to implement external AST optimizers, and the FAT Python which implements an AST optimizer. Hum, after "-o no_annotation" discussed on the mailing list, maybe it's time to look again at this PEP? ;-)
msg286167 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-24 09:56
> Thanks. Your patch reduced memory consumption by 2%, merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it? Since code objects loaded by import are likely for stay for the whole lifetime of a process, I would be interested to experiment interning all constant objects (the tuple of objects, but also each object of these tuples, so support "nested interning") of all code objets (consts, names, varnames, freevars, cellvars) in the marshal module. To intern constants, we need to generate an unique key to not merge 1 int and 1.0 float. The risk of such global intern dict is to increase the memory usage of the unique keys uses more memory than the decreased caused by the removal of duplicate objects. More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch?
msg286170 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-24 09:57
I don't know what is the most efficient technic to reduce the memory usage of module, but at least I can say imports are the top #1 memory consumer. Look at tracemalloc examples: https://docs.python.org/dev/library/tracemalloc.html#display-the-top-10 "We can see that Python loaded 4855 KiB data (bytecode and constants) from modules"
msg286229 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-25 07:00
> More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch? No. I didn't do it because I don't know it's worth enough. > merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it? Maybe, our application is somewhat special. I'll check memory usage again with only loading major OSS framework/libraries for this issue in next time. (BTW, it's 3% if -OO is used) But issue28813 or other optimizations may reduce tuple size and make it possible to share more tuples. So I think we should evaluate this patch after them. But I don't familiar with frontend (parser, AST). I usually look bytecode and runtime.
msg286230 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-01-25 07:45
> merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it? I'm not impressed too. And merging constants can take time. > Since code objects loaded by import are likely for stay for the whole lifetime of a process, I would be interested to experiment interning all constant objects (the tuple of objects, but also each object of these tuples, so support "nested interning") of all code objets (consts, names, varnames, freevars, cellvars) in the marshal module. This could slow down marhalling. Some patches in issue20416 implements this, but current code is faster and much simpler and the benefit of deduplicating constants is too small. > More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch? All patches in this issue shares constants only in one compilation unit (module). Otherwise the global intern dict would increase the memory usage too much. See also the statistics in issue16475. The most memory is spent by strings, and the majority of strings already are interned.
msg286231 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-01-25 07:51
I vote +0 on merge-constants.patch: LGTM, may be useful, but not impressed :-) Hopefully, .py => .pyc compilation "should" occur only once. Except of the main script, but more and more applications use a tiny entry point of 3 lines. I don't expect any significant slowdown. But even if there is a slowdown, IMHO it's worth it.
msg286246 - (view)	Author: Inada Naoki (methane) *	Date: 2017-01-25 12:40
As I said on ML, now I think embedding some co_* tuples into code object is better way to reduce number of tuples and memory usage. So I close this issue for now. Thanks for review and comments.

History
Date	User	Action	Args
2022-04-11 14:58:42	admin	set	github: 73522
2018-07-23 05:22:57	methane	set	status: pending -> closed superseder: Same constants in tuples are not merged while compile() resolution: postponed -> duplicate stage: resolved
2017-01-25 12:40:25	methane	set	status: open -> pending resolution: postponed messages: + msg286246
2017-01-25 07:51:54	vstinner	set	messages: + msg286231
2017-01-25 07:45:19	serhiy.storchaka	set	messages: + msg286230
2017-01-25 07:00:47	methane	set	messages: + msg286229
2017-01-24 09:57:54	vstinner	set	messages: + msg286170
2017-01-24 09:56:12	vstinner	set	messages: + msg286167
2017-01-24 09:48:17	vstinner	set	messages: + msg286165
2017-01-24 09:17:12	serhiy.storchaka	set	priority: normal -> low messages: + msg286160
2017-01-24 08:45:01	methane	set	messages: + msg286154
2017-01-23 13:00:59	methane	set	messages: + msg286082
2017-01-22 17:23:55	serhiy.storchaka	set	files: + merge-constants.patch
2017-01-22 17:23:27	serhiy.storchaka	set	files: - merge-constants.patch
2017-01-21 08:48:06	serhiy.storchaka	set	files: + merge-constants.patch nosy: + rhettinger messages: + msg285944
2017-01-21 05:45:55	methane	set	files: + mergetuples2.patch
2017-01-21 05:18:28	methane	set	messages: + msg285939
2017-01-21 05:11:26	serhiy.storchaka	set	versions: + Python 3.7 nosy: + serhiy.storchaka messages: + msg285938 components: + Interpreter Core type: enhancement
2017-01-21 03:37:58	methane	set	files: + tuplemem.py messages: + msg285934
2017-01-21 03:30:45	methane	create