classification
Title: Excessive memory use or memory fragmentation when unpickling many small objects
Type: resource usage Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Ellenbogen, alexandre.vassalotti, methane, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2019-04-21 17:11 by Ellenbogen, last changed 2022-04-11 14:59 by admin.

Files
File name    Uploaded
load.py      Ellenbogen, 2019-04-21 17:11
common.py    Ellenbogen, 2019-04-21 17:11
dump.py      Ellenbogen, 2019-04-21 18:58
Pull Requests
URL        Status  Linked
PR 13036   open    serhiy.storchaka, 2019-05-01 12:28
Messages (8)
msg340615 - (view) Author: Paul Ellenbogen (Ellenbogen) Date: 2019-04-21 17:11
Python encounters significant memory fragmentation when unpickling many small objects.

I have attached two scripts that I believe demonstrate the issue. When you run "dump.py" it will generate a large list of namedtuples, then write that list to a file using pickle. Before it does so, it pauses for user input; while the script is paused you can view its memory usage in htop or with whatever method you prefer.

The "load.py" script loads the file written by dump.py. After loading the data is complete, it waits for user input. The memory usage at the point where the script is waiting for user input is (more than) twice as much in the "load" case as the "dump" case.

The small objects in the list each hold 3 values, and I have tested three alternative representations: tuple, namedtuple, and a custom class. The namedtuple and the custom class both exhibit the memory use/fragmentation issue; the built-in tuple type does not. Using pickletools.optimize doesn't seem to make a difference.
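
In outline, the scripts do something like the following (a simplified sketch; the attached files are the authoritative versions, and the names and count here are illustrative):

import pickle
from collections import namedtuple

Point = namedtuple("Point", ["x", "y", "z"])

def dump(path="data.pkl", n=10_000_000):
    # Build a large list of small 3-field objects.
    data = [Point(0.0, 0.0, 0.0) for _ in range(n)]
    input("check memory usage, then press Enter to continue")  # pause to inspect
    with open(path, "wb") as f:
        pickle.dump(data, f)

def load(path="data.pkl"):
    # Load the pickled list back, then pause so memory can be inspected.
    with open(path, "rb") as f:
        data = pickle.load(f)
    input("check memory usage, then press Enter to exit")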

Matthew Cowles from the Python help list had some good suggestions and found that the object sizes themselves, as observed by sys.getsizeof, were different before and after pickling. Perhaps this is something other than memory fragmentation, or something in addition to it.

Although the high-water mark is similar for both scripts, the pickling script settles down to a noticeably smaller memory footprint. I would still consider the long-run memory waste of unpickling a bug. For example, in my use case I will run one instance of the equivalent of the pickling script, then run many instances of the script that unpickles.


These scripts were run with Python 3.6.7 (GCC 8.2.0) on Ubuntu 18.10.
msg340616 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-21 17:32
The difference is because in the first case all floats are the same float object 0.0, but in the second case they are different objects. For a more realistic comparison, use distinct floats (for example, random()).
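
You can see the sharing directly (an illustration; on CPython, floats are pickled by value and are not memoized):

import pickle

xs = [0.0 for _ in range(3)]
print(len({id(x) for x in xs}))   # 1 -- every element is the same 0.0 constant

ys = pickle.loads(pickle.dumps(xs))
print(len({id(y) for y in ys}))   # 3 -- unpickling creates distinct float objects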
msg340617 - (view) Author: Paul Ellenbogen (Ellenbogen) Date: 2019-04-21 18:58
Good point. I have created a new version of dump.py that uses random() instead. Float reuse explains the getsizeof difference, but there is still a significant difference in memory usage. This makes sense to me because the original code in which I saw this issue is more analogous to random().
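
The change amounts to something like this (again a sketch, mirroring the simplified reproducer above):

from random import random
from collections import namedtuple

Point = namedtuple("Point", ["x", "y", "z"])
n = 10_000_000
# Each namedtuple now holds three distinct float objects instead of a
# shared 0.0, so unpickling must allocate 3*n floats.
data = [Point(random(), random(), random()) for _ in range(n)]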
msg341179 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-05-01 04:41
Memory allocation pattern is:

alloc 24  # float
alloc 24
alloc 24
alloc 64  # temporary tuple
alloc 72
<repeat>
free 64  # free temporary tuples
free 64
free 64
<repeat>

This causes some fragmentation: some pools in the arenas are unused,
which prevents pymalloc from returning those arenas to the OS.
(Note that pymalloc manages memory as arenas (256 KiB) > pools (4 KiB) > blocks (requested sizes <= 512 bytes). pymalloc can return memory to the OS only when an arena is completely clean.)

But this is not too bad, because many pools are entirely free. Any allocation
of size < 512 bytes can reuse the free pools.
If you run some code after unpickling, the pools will be reused efficiently.

(In the case of very bad fragmentation, many pools are dirty: some blocks in a pool
are used while many others are free, so only allocation requests of the same size class can use that pool.)
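
For reference, the arena/pool state can be inspected with sys._debugmallocstats():

import sys

# Prints pymalloc statistics to stderr.  After load.py, the
# "unused pools" line shows memory sitting in free pools inside
# arenas that cannot yet be returned to the OS.
sys._debugmallocstats()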


There are two approaches to fixing this problem:

1. Investigate why the temporary tuples are not freed until the last stage of unpickling.
2. When there are too many free pools, return some of them to the OS via MADV_FREE or MADV_DONTNEED.

I think (1) should be considered first, but (2) is the more general solution.
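
For illustration only, a Linux-only ctypes sketch of the madvise() mechanism that (2) refers to; the real change would be C code inside pymalloc, and the constant value here is the Linux one:

import ctypes
import mmap

ARENA_SIZE = 256 * 1024                      # pymalloc arena size
buf = mmap.mmap(-1, ARENA_SIZE)              # anonymous mapping, stand-in for a clean arena
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

libc = ctypes.CDLL(None, use_errno=True)
libc.madvise.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

MADV_DONTNEED = 4                            # Linux value
# The kernel may now reclaim the physical pages; the mapping stays
# valid and reads back as zero-filled.
libc.madvise(addr, ARENA_SIZE, MADV_DONTNEED)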
msg341181 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-05-01 07:35
I confirmed that this fragmentation is caused by the memo in Unpickler.

The Pickler memoizes "reduce"-ed tuples even though they are just temporary objects.
I am not sure this behavior is good.
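
To illustrate (my sketch, not code from the PR): under protocol 2, a namedtuple reduces to a callable plus a freshly built argument tuple, and it is that short-lived tuple that gets memoized:

from collections import namedtuple

Point = namedtuple("Point", ["x", "y", "z"])
p = Point(1.0, 2.0, 3.0)

func, args = p.__reduce_ex__(2)[:2]
print(func.__name__)  # __newobj__
print(args)           # (<class '__main__.Point'>, 1.0, 2.0, 3.0) -- a temporary tuple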
msg341193 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-05-01 12:30
PR 13036 makes the C implementation no longer memoize temporary objects. This decreases memory fragmentation and peak memory consumption during both pickling and unpickling.
msg341251 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-05-02 06:26
I'm using a 1/10-size version of dump.py.
I removed total_size() because it creates some garbage.

sys._debugmallocstats() after load.py:

master:

# arenas allocated total           =                1,223
# arenas reclaimed                 =                    5
# arenas highwater mark            =                1,218
# arenas allocated current         =                1,218
1218 arenas * 262144 bytes/arena   =          319,291,392

# bytes in allocated blocks        =          218,026,128
# bytes in available blocks        =              149,024
23835 unused pools * 4096 bytes    =           97,628,160


PR 13036:

# arenas allocated total           =                  849
# arenas reclaimed                 =                    3
# arenas highwater mark            =                  846
# arenas allocated current         =                  846
846 arenas * 262144 bytes/arena    =          221,773,824

# bytes in allocated blocks        =          217,897,968
# bytes in available blocks        =              140,096
61 unused pools * 4096 bytes       =              249,856

Now "arena allocated current" is same to after dump.py:

# arenas allocated total           =                  847
# arenas reclaimed                 =                    1
# arenas highwater mark            =                  846
# arenas allocated current         =                  846
846 arenas * 262144 bytes/arena    =          221,773,824

# bytes in allocated blocks        =          217,998,792
# bytes in available blocks        =              131,112
38 unused pools * 4096 bytes       =              155,648


It looks nice.  Additionally, both of "time python dump.py"
and "time python load.py" become slightly faster.

master dump (note that this time includes not only the dump, but also constructing the data):
real    0m3.539s
user    0m3.266s
sys     0m0.196s

master load:
real    0m1.408s
user    0m1.292s
sys     0m0.116s

PR-13036 dump:
real    0m2.758s
user    0m2.598s
sys     0m0.088s

PR-13036 load:
real    0m1.239s
user    0m1.183s
sys     0m0.056s


Could the pickle experts review the PR?
msg341284 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-05-02 16:43
Great work, Inada-san! Thank you for your investigation!

PR 13036 increases the chance of using borrowed references during pickling. Since this bug exists in the current code too (it just surfaces in a smaller number of cases), it should be fixed in any case. So I am going to fix this bug before merging PR 13036, and fix it in a way that does not prevent the optimization.
History
Date                 User              Action  Args
2022-04-11 14:59:14  admin             set     github: 80875
2019-05-02 16:43:18  serhiy.storchaka  set     messages: + msg341284
2019-05-02 06:27:04  methane           set     nosy: + pitrou
2019-05-02 06:26:50  methane           set     messages: + msg341251
2019-05-01 12:30:38  serhiy.storchaka  set     messages: + msg341193
2019-05-01 12:28:15  serhiy.storchaka  set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request12956
2019-05-01 07:35:28  methane           set     messages: + msg341181
2019-05-01 04:41:39  methane           set     versions: + Python 3.8, - Python 3.6
2019-05-01 04:41:31  methane           set     messages: + msg341179
2019-04-27 08:41:19  methane           set     nosy: + methane
2019-04-21 18:58:54  Ellenbogen        set     files: - dump.py
2019-04-21 18:58:47  Ellenbogen        set     files: + dump.py
2019-04-21 18:58:29  Ellenbogen        set     files: - dump.py
2019-04-21 18:58:07  Ellenbogen        set     files: + dump.py
                                               messages: + msg340617
2019-04-21 17:32:32  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg340616
2019-04-21 17:11:43  Ellenbogen        set     files: + common.py
2019-04-21 17:11:38  Ellenbogen        set     files: + load.py
2019-04-21 17:11:20  Ellenbogen        create