msg285933 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-21 03:30 |
Tuples consists of None, True, False, str and bytes can be merged safely.
To avoid memory leak, this patch shares such tuples in compile unit (module, in most case.)
|
msg285934 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-21 03:37 |
I tried this patch with attached script.
```
$ venv/bin/pip install django flask sqlalchemy
$ PYTHONTRACEMALLOC=5 venv/bin/python3 tuplemem.py > tuples.txt
$ sort tuples.txt | uniq -c | sort -nr > tuplecount
```
## default
memory: (32254693, 32292635)
tuples: 64968
head -n10 tuplecount-default
5479 (None,)
3069 ('self',)
727 (<class 'object'>,)
688 ('__class__',)
321 ('NotImplementedError',)
287 ('self', 'other')
264 (None, None)
207 (False,)
193 (None, 0)
176 (None, False)
## patched
memory: (31224697, 31261892)
tuples: 51298
head -n10 tuplecount-patched
1437 (None,)
727 (<class 'object'>,)
328 ('self',)
264 (None, None)
207 (False,)
193 (None, 0)
138 ('__class__',)
114 (True,)
112 (None, False)
110 ('target', 'fn')
I'll try again with my company's private codebase in next week.
|
msg285938 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-21 05:11 |
$ ./python -Wa -b -c "lambda: 'a'; lambda: b'a'"
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
sys:1: BytesWarning: Comparison between bytes and string
Look at _PyCode_ConstantKey(), it may help.
|
msg285939 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-21 05:18 |
Oh, thanks. I hadn't checked the warning.
Since bytes are not important in this time, I'll remove bytes support.
|
msg285944 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-21 08:48 |
Following patch uses _PyCode_ConstantKey() as a key. This allows supporting integers, floats, bytes, etc.
But I think we can go further and merge constants recursively.
See also issue28813.
|
msg286082 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-23 13:00 |
Thanks. Your patch reduced memory consumption by 2%,
and number of tuples by 15%.
$ cat invtuple.py
import app
import sys
import traceback
import tracemalloc
print(tracemalloc.get_traced_memory())
allobj = sys.getobjects(0, tuple)
print(len(allobj))
$ PYTHONTRACEMALLOC=1 venv/bin/python invtuple.py
(51491189, 51506311)
87143
$ PYTHONTRACEMALLOC=1 venv-mt/bin/python invtuple.py
(50551519, 50566641)
74308
|
msg286154 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-24 08:45 |
merge-constants.patch LGTM
|
msg286160 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-24 09:17 |
merge-constants.patch is rather a proof of concept. I think it may be more efficient after removing unneeded folded constants (issue28813). But this can be done better with the AST optimizer (issue1346238, issue11549). It would be worth also to merge nested constants. This is similar to interning string constants (see in Objects/codeobject.c).
The benefit is small and we should check that merging constants doesn't have too large cost (CPU time or temporally consumed memory) at compile time.
I think we should first implement the AST optimizer, and then try to combine merging constants with interning string constants.
|
msg286165 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-01-24 09:48 |
> But this can be done better with the AST optimizer (issue1346238, issue11549).
There are also the PEP 511 (Code transformers) which was created to implement *external* AST optimizers, and the FAT Python which implements an AST optimizer.
Hum, after "-o no_annotation" discussed on the mailing list, maybe it's time to look again at this PEP? ;-)
|
msg286167 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-01-24 09:56 |
> Thanks. Your patch reduced memory consumption by 2%,
merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it?
Since code objects loaded by import are likely for stay for the whole lifetime of a process, I would be interested to experiment interning all constant objects (the tuple of objects, but also each object of these tuples, so support "nested interning") of all code objets (consts, names, varnames, freevars, cellvars) in the marshal module.
To intern constants, we need to generate an unique key to not merge 1 int and 1.0 float. The risk of such global intern dict is to increase the memory usage of the unique keys uses more memory than the decreased caused by the removal of duplicate objects.
More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch?
|
msg286170 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-01-24 09:57 |
I don't know what is the most efficient technic to reduce the memory usage of module, but at least I can say imports are the top #1 memory consumer. Look at tracemalloc examples:
https://docs.python.org/dev/library/tracemalloc.html#display-the-top-10
"We can see that Python loaded 4855 KiB data (bytecode and constants) from modules"
|
msg286229 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-25 07:00 |
> More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch?
No. I didn't do it because I don't know it's worth enough.
> merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it?
Maybe, our application is somewhat special.
I'll check memory usage again with only loading major OSS framework/libraries for this issue in next time.
(BTW, it's 3% if -OO is used)
But issue28813 or other optimizations may reduce tuple size and make it possible to share more tuples.
So I think we should evaluate this patch after them.
But I don't familiar with frontend (parser, AST).
I usually look bytecode and runtime.
|
msg286230 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-01-25 07:45 |
> merge-constants.patch looks simple enought, but I'm not really impressed by such result. Is 2% worth it?
I'm not impressed too. And merging constants can take time.
> Since code objects loaded by import are likely for stay for the whole lifetime of a process, I would be interested to experiment interning all constant objects (the tuple of objects, but also each object of these tuples, so support "nested interning") of all code objets (consts, names, varnames, freevars, cellvars) in the marshal module.
This could slow down marhalling. Some patches in issue20416 implements this, but current code is faster and much simpler and the benefit of deduplicating constants is too small.
> More generally, would it be possible to share co_consts (None,) tuples between code objects of two different modules? Or is it already the case with merge-constants.patch?
All patches in this issue shares constants only in one compilation unit (module). Otherwise the global intern dict would increase the memory usage too much.
See also the statistics in issue16475. The most memory is spent by strings, and the majority of strings already are interned.
|
msg286231 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-01-25 07:51 |
I vote +0 on merge-constants.patch: LGTM, may be useful, but not impressed :-)
Hopefully, .py => .pyc compilation "should" occur only once. Except of the main script, but more and more applications use a tiny entry point of 3 lines. I don't expect any significant slowdown. But even if there is a slowdown, IMHO it's worth it.
|
msg286246 - (view) |
Author: Inada Naoki (methane) * |
Date: 2017-01-25 12:40 |
As I said on ML, now I think embedding some co_* tuples into code object
is better way to reduce number of tuples and memory usage.
So I close this issue for now.
Thanks for review and comments.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:42 | admin | set | github: 73522 |
2018-07-23 05:22:57 | methane | set | status: pending -> closed superseder: Same constants in tuples are not merged while compile() resolution: postponed -> duplicate stage: resolved |
2017-01-25 12:40:25 | methane | set | status: open -> pending resolution: postponed messages:
+ msg286246
|
2017-01-25 07:51:54 | vstinner | set | messages:
+ msg286231 |
2017-01-25 07:45:19 | serhiy.storchaka | set | messages:
+ msg286230 |
2017-01-25 07:00:47 | methane | set | messages:
+ msg286229 |
2017-01-24 09:57:54 | vstinner | set | messages:
+ msg286170 |
2017-01-24 09:56:12 | vstinner | set | messages:
+ msg286167 |
2017-01-24 09:48:17 | vstinner | set | messages:
+ msg286165 |
2017-01-24 09:17:12 | serhiy.storchaka | set | priority: normal -> low
messages:
+ msg286160 |
2017-01-24 08:45:01 | methane | set | messages:
+ msg286154 |
2017-01-23 13:00:59 | methane | set | messages:
+ msg286082 |
2017-01-22 17:23:55 | serhiy.storchaka | set | files:
+ merge-constants.patch |
2017-01-22 17:23:27 | serhiy.storchaka | set | files:
- merge-constants.patch |
2017-01-21 08:48:06 | serhiy.storchaka | set | files:
+ merge-constants.patch nosy:
+ rhettinger messages:
+ msg285944
|
2017-01-21 05:45:55 | methane | set | files:
+ mergetuples2.patch |
2017-01-21 05:18:28 | methane | set | messages:
+ msg285939 |
2017-01-21 05:11:26 | serhiy.storchaka | set | versions:
+ Python 3.7 nosy:
+ serhiy.storchaka
messages:
+ msg285938
components:
+ Interpreter Core type: enhancement |
2017-01-21 03:37:58 | methane | set | files:
+ tuplemem.py
messages:
+ msg285934 |
2017-01-21 03:30:45 | methane | create | |