msg260704 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-02-22 23:06 |
I have a one-line module that assigns a tuple->int dictionary:
holo_table = {(0, 0, 0, 0, 0, 0, 1, 41, 61, 66, 89): 9, (0, 0, 0, 70, 88, 98, 103, 131, 147, 119, 93): 4, [35MB skipped], (932, 643, 499, 286, 326, 338, 279, 200, 280, 262, 115): 5}
When I try to import this module, Python pegs my CPU at 100% for about half an hour, then ultimately crashes with a MemoryError.
How much memory can it need to parse 35MB of data of such a simple structure?
Attaching the module, zipped to 10MB.
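For anyone who prefers not to download the attachment, a module of roughly the same shape and size can be generated with a sketch like the following (random values and a hypothetical file name, not the real data):
import random
# Generate ~35 MB of source in the same shape as crash.py:
# a single dict literal mapping 11-tuples of ints to small ints.
with open("crash_repro.py", "w") as f:
    f.write("holo_table = {")
    for _ in range(700000):
        key = ", ".join(str(random.randrange(1000)) for _ in range(11))
        f.write("(%s): %d, " % (key, random.randrange(10)))
    f.write("}\n")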
|
msg260840 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-02-25 06:08 |
A practical note: if, instead of importing crash.py, I do a json.loads, with a few extra transformations:
with open("crash.py") as f: holo_table={tuple(int(z) for z in k.split(', ')):v for k,v in json.loads(f.readlines()[0][13:].replace('(','"').replace(')','"')).iteritems()}
--the whole data structure loads in a jiffy.
Makes me wonder why this roundabout approach is so much more efficient than the native parsing.
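For reference, a rough Python 3 equivalent of the same workaround (it assumes crash.py really is the single assignment shown above; dict.iteritems() no longer exists on Python 3):
import json
with open("crash.py") as f:
    line = f.readline()
# Strip the "holo_table = " prefix and turn the parenthesized keys into JSON strings.
raw = line[len("holo_table = "):].replace("(", '"').replace(")", '"')
holo_table = {tuple(int(z) for z in key.split(", ")): value
              for key, value in json.loads(raw).items()}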
|
msg260859 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2016-02-25 14:13 |
It takes about 12 seconds to byte compile crash.py on my computer and less than half a second to import about 28 MB of byte code:
$ rm -rf crash.pyc __pycache__/
$ time python2.7 -c 'import crash'
real 0m11.930s
user 0m9.859s
sys 0m2.085s
$ time python2.7 -c 'import crash'
real 0m0.484s
user 0m0.401s
sys 0m0.083s
$ time python3.4 -c 'import crash'
real 0m12.327s
user 0m10.106s
sys 0m2.236s
$ time python3.4 -c 'import crash'
real 0m0.435s
user 0m0.367s
sys 0m0.069s
|
msg260860 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2016-02-25 14:16 |
$ python2.7 --version
Python 2.7.10
$ python3.4 --version
Python 3.4.3
$ uname -r -o
4.3.5-300.fc23.x86_64 GNU/Linux
Intel(R) Core(TM) i7-4900MQ CPU @ 2.80GHz
|
msg260865 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-02-25 16:23 |
Mine is on Windows. I've now installed both 2.7.10 and 3.4.3 to reconfirm, and the behaviour is the same on both, except that on 3.4.3 it crashes with a MemoryError much faster (within a couple of minutes).
|
msg260868 - (view) |
Author: Eryk Sun (eryksun) * |
Date: 2016-02-25 17:08 |
I don't think this is Windows-related. Are you using 32-bit Python? On Linux, if I limit the process address space to 2 GB, it crashes almost immediately:
$ ulimit -v 2000000
$ python-dbg -c 'import crash'
Segmentation fault
It runs out of memory while parsing the file:
Program received signal SIGSEGV, Segmentation fault.
0x000000000048ecb4 in PyObject_Malloc (nbytes=72) at ../Objects/obmalloc.c:932
932 ../Objects/obmalloc.c: No such file or directory.
(gdb) bt
#0 0x000000000048ecb4 in PyObject_Malloc (nbytes=72) at ../Objects/obmalloc.c:932
#1 0x000000000048f8be in _PyObject_DebugMallocApi (id=111 'o', nbytes=40) at ../Objects/obmalloc.c:1469
#2 0x000000000048fa2b in _PyObject_DebugReallocApi (api=111 'o', p=0x0, nbytes=40) at ../Objects/obmalloc.c:1520
#3 0x000000000048f83c in _PyObject_DebugRealloc (p=0x0, nbytes=40) at ../Objects/obmalloc.c:1441
#4 0x0000000000418a02 in PyNode_AddChild (n1=0x7fff85cbffb8, type=318, str=0x0, lineno=1, col_offset=6446977)
at ../Parser/node.c:98
#5 0x0000000000418f53 in push (s=0xa6b680, type=318, d=0x8bfc70 <dfas+2480>, newstate=1, lineno=1, col_offset=6446977)
at ../Parser/parser.c:126
#6 0x000000000041946c in PyParser_AddToken (ps=0xa6b680, type=262144, str=0x7fff85cba720 "11", lineno=1,
col_offset=6446977, expected_ret=0x7fffffffd324) at ../Parser/parser.c:252
#7 0x0000000000419f19 in parsetok (tok=0xa5f650, g=0x8c0ac0 <_PyParser_Grammar>, start=257, err_ret=0x7fffffffd300,
flags=0x7fffffffd2ec) at ../Parser/parsetok.c:198
#8 0x0000000000419cb6 in PyParser_ParseFileFlagsEx (fp=0xa19b70, filename=0xa53b00 "crash.py",
g=0x8c0ac0 <_PyParser_Grammar>, start=257, ps1=0x0, ps2=0x0, err_ret=0x7fffffffd300, flags=0x7fffffffd2ec)
at ../Parser/parsetok.c:106
|
msg260870 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-02-25 17:23 |
My Python is 64-bit, but my computer only has 2GB physical RAM.
|
msg260874 - (view) |
Author: Eryk Sun (eryksun) * |
Date: 2016-02-25 17:47 |
> My Python is 64-bit, but my computer only has 2GB physical RAM.
That explains why it takes half an hour to crash. It's thrashing on page faults. Adding another paging file or increasing the size of your current paging file should allow this to finish parsing... eventually in maybe an hour or two.
The design of the parser isn't something I've delved into very much, but possibly the dynamic nature of Python prevents optimizing the memory footprint here. Or maybe no one has seen the need to optimize the parsing of containers (dicts, sets, lists, tuples) that hold only constant literals. In any case, a Python source file is an inefficient way to store 35 MiB of data, compared to XML, JSON, or a binary format.
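As an illustration of the latter, a minimal sketch of storing the same table in a binary format (pickle chosen purely as an example, file name hypothetical), so that loading it never goes through the parser:
import pickle
# Write the already-built dict out once, in a compact binary form...
with open("holo_table.pickle", "wb") as f:
    pickle.dump(holo_table, f, protocol=pickle.HIGHEST_PROTOCOL)
# ...and read it back without parsing any Python source.
with open("holo_table.pickle", "rb") as f:
    holo_table = pickle.load(f)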
|
msg260889 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-02-26 06:47 |
Yes, I understand that this is a matter of memory consumption, which is why I submitted this ticket as "resource usage".
What I don't understand is, what could possibly require gigabytes of memory for this task?
|
msg260930 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-27 08:00 |
I thought this might be a tokenizer issue, but it is just large memory consumption for the AST tree. A simpler example:
./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'
It takes over 450MB of memory on a 32-bit system, i.e. over 450 bytes per tuple item. Increasing the multiplier to 10000000 leads to swapping and, eventually, a MemoryError.
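A rough in-process way to observe the same peak, assuming a Unix build (the resource module is not available on Windows; ru_maxrss is reported in kilobytes on Linux and in bytes on macOS):
import ast
import resource
ast.parse("0," * 1000000, mode="eval")
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS: %d MB" % (peak_kb // 1024))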
Of course it would be nice to decrease the memory consumption, but this looks more like a new feature.
|
msg261334 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-03-08 08:54 |
OK, I've now looked into it with a fresh build of 3.6 trunk on Linux x64.
Peak memory usage is about 3KB per node:
$ /usr/bin/time -v ./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'
Command being timed: "./python -c import ast; ast.parse("0,"*1000000, mode="eval")"
...
Maximum resident set size (kbytes): 3015552
...
Out of the 2945 MB total peak memory usage, only 330 MB is attributable to heap use:
$ valgrind ./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'
==21232== ...
==21232== HEAP SUMMARY:
==21232== in use at exit: 3,480,447 bytes in 266 blocks
==21232== total heap usage: 1,010,171 allocs, 1,009,905 frees, 348,600,304 bytes allocated
==21232== ...
So, apparently, it's not the nodes themselves taking up a disproportionate amount of memory -- it's the heap getting so badly fragmented that 89% of its memory allocation is wasted.
gprof confirms that there are lots of mallocs/reallocs going on, up to 21 per node:
$ gprof python
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
17.82 0.31 0.31 2000020 0.00 0.00 PyParser_AddToken
13.79 0.55 0.24 2 0.12 0.16 freechildren
12.64 0.77 0.22 21039125 0.00 0.00 _PyMem_RawMalloc
6.32 0.88 0.11 17000101 0.00 0.00 PyNode_AddChild
5.75 0.98 0.10 28379846 0.00 0.00 visit_decref
5.75 1.08 0.10 1000004 0.00 0.00 ast_for_expr
4.60 1.16 0.08 2867 0.00 0.00 collect
4.02 1.23 0.07 20023405 0.00 0.00 _PyObject_Free
2.30 1.27 0.04 3031305 0.00 0.00 _PyType_Lookup
2.30 1.31 0.04 3002234 0.00 0.00 _PyObject_GenericSetAttrWithDict
2.30 1.35 0.04 1 0.04 0.05 ast2obj_expr
1.72 1.38 0.03 28366858 0.00 0.00 visit_reachable
1.72 1.41 0.03 12000510 0.00 0.00 subtype_traverse
1.72 1.44 0.03 3644 0.00 0.00 list_traverse
1.44 1.47 0.03 3002161 0.00 0.00 _PyObjectDict_SetItem
1.15 1.49 0.02 20022785 0.00 0.00 PyObject_Free
1.15 1.51 0.02 15000763 0.00 0.00 _PyObject_Realloc
So, I suppose what needs to be done is to try reducing the number of reallocs involved in handling an AST node; the representation of the nodes themselves doesn't need to change.
|
msg261345 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-03-08 11:09 |
> So, apparently, it's not the nodes themselves taking up a disproportionate amount of memory -- it's the heap getting so badly fragmented that 89% of its memory allocation is wasted.
Yeah, the Python parser+compiler uses the memory allocator badly. See my "benchmark" for the memory allocator: python_memleak.py.
The classical pattern of memory fragmentation is:
* allocate a lot of small objects
* allocate a few objects
* allocate more small objects
* free *all* small objects
All objects must be allocated on the heap, not via mmap(); so the maximum size of a single object must stay below 128 KB (the usual threshold at which malloc() switches between heap memory and mmap).
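A rough illustration of this pattern with ordinary Python objects (my own sketch, Linux-only because it reads /proc/self/statm; the sizes are arbitrary):
import os

def rss_mb():
    # Current (not peak) resident set size in MB.
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // (1024 * 1024)

small, pinned = [], []
for i in range(1000000):
    small.append([i])        # lots of short-lived small objects
    if i % 1000 == 0:
        pinned.append([i])   # a few long-lived objects scattered between them

print("before freeing: %d MB" % rss_mb())
del small                    # free *all* the small objects at once
print("after freeing: %d MB" % rss_mb())
# The RSS stays far above what the surviving objects need, because the scattered
# long-lived objects keep most of the allocator's arenas from being released.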
We can try various hacks to reduce the fragmentation, but IMHO the only real fix is to use a different memory allocator for the compiler and then free everything allocated by the parser+compiler at once.
We already have an "arena" memory allocator: Include/pyarena.h, Python/pyarena.c. It is already used by the parser+compiler, but in practice it is only used for AST objects; the parser itself uses the PyMem allocator API (e.g. PyMem_Malloc).
|
msg261346 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-03-08 11:10 |
My misc notes about memory fragmentation: https://haypo-notes.readthedocs.org/heap_fragmentation.html
|
msg261347 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-03-08 11:15 |
Apache's libapr library is designed to group all the memory allocations required to handle an HTTP request, so that *all* allocations of the request can be released at once.
See the "pool" object:
* https://apr.apache.org/docs/apr/2.0/group__apr__pools.html
* https://apr.apache.org/docs/apr/2.0/group__apr__allocator.html
|
msg261349 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-03-08 11:35 |
On my computer the peak memory usage is about 450 MB in a non-debug build and about 730 MB in a debug build. I'm not sure this is due to memory fragmentation.
|
msg261362 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-03-08 14:31 |
@Serhiy: if your build is 32-bit, then every node is half the size, as it mostly consists of pointers.
The amount of heap fragmentation can also depend on gcc/glibc version.
|
msg261427 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-03-09 14:07 |
The attached patch for the parser reduces "Maximum resident set size (kbytes)" threefold, for the degenerate example of 'import ast; ast.parse("0,"*1000000, mode="eval")', by eliminating many CST nodes that have a single child.
According to the comment in node.c -- "89% of PyObject_REALLOC calls in PyNode_AddChild passed 1 for the size" -- the memory saving should be generally applicable, and not limited just to this degenerate case.
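Those single-child chains are easy to see by dumping the CST of a trivial input (my own illustration, using the legacy parser module, which exposes the same CST):
import parser
# Each nested list is one CST node; for "0," almost all of them have a single
# child: eval_input -> testlist -> test -> or_test -> ... -> atom -> NUMBER.
cst = parser.expr("0,").tolist()
node, depth = cst, 0
while isinstance(node, list):
    depth += 1
    node = node[1]           # descend into the first child
print("CST nodes on the path to the first token: %d" % depth)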
Modules/parsermodule.c is not yet updated to match. Please tell me if you want me to do that, should my proposed change to the parser turn out to be acceptable.
|
msg261503 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-03-10 14:39 |
I've now tried it with "perf.py -r -m", and the memory savings are as follows:
### 2to3 ###
Mem max: 45976.000 -> 47440.000: 1.0318x larger
### chameleon_v2 ###
Mem max: 436968.000 -> 401088.000: 1.0895x smaller
### django_v3 ###
Mem max: 23808.000 -> 22584.000: 1.0542x smaller
### fastpickle ###
Mem max: 10768.000 -> 9248.000: 1.1644x smaller
### fastunpickle ###
Mem max: 10988.000 -> 9328.000: 1.1780x smaller
### json_dump_v2 ###
Mem max: 10892.000 -> 10612.000: 1.0264x smaller
### json_load ###
Mem max: 11012.000 -> 9908.000: 1.1114x smaller
### nbody ###
Mem max: 8696.000 -> 7944.000: 1.0947x smaller
### regex_v8 ###
Mem max: 12504.000 -> 9432.000: 1.3257x smaller
### tornado_http ###
Mem max: 27636.000 -> 27608.000: 1.0010x smaller
So, on these benchmarks, the saving is not threefold, of course, but it is still quite substantial (up to 30%).
The run time difference, on these benchmarks, is between "1.04x slower" and "1.06x faster", for reasons beyond my understanding (variability of background load, possibly?)
|
msg265412 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-05-12 15:22 |
Ping? This patch is two months old now.
|
msg267534 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-06-06 15:09 |
Now that #26526 landed (thanks to everybody involved!), I'm requesting a review of an updated version of my patch, which addresses the excessive memory consumption by the parser.
The description of my original patch still applies:
> The attached patch for the parser reduces "Maximum resident set size (kbytes)" threefold, for the degenerate example of 'import ast; ast.parse("0,"*1000000, mode="eval")', by eliminating many CST nodes that have a single child.
>
> According to the comment in Parser/node.c -- "89% of PyObject_REALLOC calls in PyNode_AddChild passed 1 for the size" -- the memory saving should be generally applicable, and not limited just to this degenerate case.
> I've now tried it with "perf.py -r -m", and the memory savings are as follows:
> ...
> on these benchmarks, the saving is not threefold, of course; but still quite substantial (up to 30%).
My new patch updates Modules/parsermodule.c to accept such "compressed" nodes, so that everything still builds cleanly and passes the tests.
|
msg267585 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2016-06-07 04:21 |
It seems to me a simpler solution would be allocate all nodes for a parse tree in an arena.
|
msg267602 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-06-07 07:01 |
An arena might help reclaim the memory once the parsing is complete, but it wouldn't reduce the peak memory consumption by the parser, and so it wouldn't prevent a MemoryError when parsing a 35MB source on a PC with 2GB of RAM.
|
msg267618 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-06-07 10:19 |
Benjamin Peterson: "It seems to me a simpler solution would be allocate all nodes for a parse tree in an arena."
Exactly, that's the real fix. Just make sure that deallocating this arena doesn't leave holes in the "heap memory"; on Linux, for example, that means using mmap() rather than sbrk() to allocate the memory.
A. Skrobov: "An arena might help reclaim the memory once the parsing is complete, but it wouldn't reduce the peak memory consumption by the parser, and so it wouldn't prevent a MemoryError when parsing a 35MB source on a PC with 2GB of RAM."
Parsing a 35 MB source doesn't seem like a good idea :-) I think that it's ok to have a memory peak, but it's not ok to not release the memory later.
Do you have a solution that avoids the memory peak *and* doesn't create memory fragmentation?
|
msg267620 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-06-07 10:37 |
My current patch avoids the memory peak *and* doesn't add any memory fragmentation on top of whatever is already there.
In other words, it makes the parser better in this one aspect, and it doesn't make it worse in any aspect.
|
msg267622 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-06-07 11:01 |
(Updating the issue title, to avoid confusion between two orthogonal concerns)
|
msg270002 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-07-08 18:10 |
Fixing whitespace in the patch, and including an update for the docs
|
msg272772 - (view) |
Author: Xavier Combelle (xcombelle) * |
Date: 2016-08-15 15:47 |
It looks like there are two sub-bugs that are partial solutions to the main bug, which is reducing the memory consumption of the parser:
- the node compression
- the reduction of heap fragmentation
Could each sub-bug get its own tracker issue, marked as blocking this main bug?
A. Skrobov, can you describe the big picture of your patch? Without that, I don't understand the logic of your modifications.
|
msg272786 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-08-15 17:40 |
Xavier, the big picture description of my patch is in http://bugs.python.org/file43665/devguide_patch
The heap fragmentation was observed by Victor, not by myself.
Victor, could you please create a new ticket for your python_memleak.py reproducer?
|
msg274393 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2016-09-05 07:32 |
Updated the comment for Init_ValidationGrammar()
|
msg299022 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2017-07-24 21:48 |
To bump this year-old issue, I have delivered a talk at EuroPython 2017, explaining what my patch does, how it does what it does, and why it's a good thing to do.
You can watch my talk at https://www.youtube.com/watch?v=dyRDmcsTwhE&t=1h52m38s
|
msg330197 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2018-11-21 12:48 |
Would you mind creating a pull request for review? I cannot promise that it will be merged, but it will have a chance.
For reference, an article about this patch (in Russian): https://habr.com/post/314062/ .
|
msg331240 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2018-12-06 17:34 |
@Serhiy: incredibly, this patch from 2.5 years ago required very minor changes to apply to the latest master. Shows how ossified the parser is :-)
Now posted as https://github.com/python/cpython/pull/10995
|
msg331848 - (view) |
Author: A. Skrobov (A. Skrobov) * |
Date: 2018-12-14 18:29 |
I've run pyperformance (0.7.0) with my updated patch, and posted results at the PR page. They look encouraging enough.
|
msg369673 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2020-05-23 00:28 |
Closing as we have a new parser in 3.9 and the current one will be removed in Python 3.10.
|
Date | User | Action | Args
2022-04-11 14:58:27 | admin | set | github: 70603 |
2020-05-23 00:28:30 | pablogsal | set | status: open -> closed resolution: rejected messages:
+ msg369673
stage: patch review -> resolved |
2019-04-04 12:25:01 | methane | set | nosy:
+ methane
|
2019-02-12 23:27:01 | pablogsal | set | nosy:
+ pablogsal
|
2018-12-19 10:03:34 | serhiy.storchaka | set | assignee: serhiy.storchaka |
2018-12-14 18:29:03 | A. Skrobov | set | messages:
+ msg331848 |
2018-12-06 17:34:00 | A. Skrobov | set | messages:
+ msg331240 |
2018-12-06 17:00:16 | python-dev | set | keywords:
+ patch pull_requests:
+ pull_request10236 |
2018-11-21 21:03:03 | brett.cannon | set | nosy:
- brett.cannon
|
2018-11-21 13:14:27 | giampaolo.rodola | set | nosy:
- giampaolo.rodola
|
2018-11-21 12:48:59 | serhiy.storchaka | set | messages:
+ msg330197 versions:
+ Python 3.8, - Python 3.6 |
2017-07-24 21:48:57 | A. Skrobov | set | messages:
+ msg299022 |
2017-07-13 12:29:26 | cstratak | set | nosy:
+ cstratak
|
2017-07-12 09:43:35 | petr.viktorin | set | nosy:
+ petr.viktorin
|
2016-09-05 07:32:14 | A. Skrobov | set | files:
+ patch
messages:
+ msg274393 |
2016-08-15 17:40:09 | A. Skrobov | set | messages:
+ msg272786 |
2016-08-15 15:47:13 | xcombelle | set | nosy:
+ xcombelle messages:
+ msg272772
|
2016-07-08 18:11:10 | A. Skrobov | set | files:
+ devguide_patch |
2016-07-08 18:10:52 | A. Skrobov | set | files:
+ patch
messages:
+ msg270002 |
2016-06-07 11:01:45 | A. Skrobov | set | messages:
+ msg267622 title: Fragmentation of the heap memory in the Python parser -> Excessive peak memory consumption by the Python parser |
2016-06-07 10:37:35 | A. Skrobov | set | messages:
+ msg267620 |
2016-06-07 10:19:14 | vstinner | set | messages:
+ msg267618 |
2016-06-07 07:01:24 | A. Skrobov | set | messages:
+ msg267602 |
2016-06-07 04:21:00 | benjamin.peterson | set | messages:
+ msg267585 |
2016-06-06 15:09:50 | A. Skrobov | set | files:
+ patch nosy:
+ fdrake, brett.cannon, giampaolo.rodola, benjamin.peterson, berker.peksag messages:
+ msg267534
|
2016-05-12 15:22:58 | A. Skrobov | set | messages:
+ msg265412 |
2016-03-10 14:39:57 | A. Skrobov | set | messages:
+ msg261503 |
2016-03-09 14:08:27 | serhiy.storchaka | set | stage: patch review |
2016-03-09 14:07:03 | A. Skrobov | set | files:
+ patch
messages:
+ msg261427 |
2016-03-08 14:31:34 | A. Skrobov | set | messages:
+ msg261362 |
2016-03-08 11:35:37 | serhiy.storchaka | set | messages:
+ msg261349 |
2016-03-08 11:15:12 | vstinner | set | messages:
+ msg261347 |
2016-03-08 11:10:25 | vstinner | set | messages:
+ msg261346 |
2016-03-08 11:09:51 | vstinner | set | title: Out of memory, trying to parse a 35MB dict -> Fragmentation of the heap memory in the Python parser |
2016-03-08 11:09:22 | vstinner | set | files:
+ python_memleak.py nosy:
+ vstinner messages:
+ msg261345
|
2016-03-08 08:54:52 | A. Skrobov | set | messages:
+ msg261334 |
2016-02-27 08:00:59 | serhiy.storchaka | set | assignee: serhiy.storchaka -> (no value) messages:
+ msg260930 versions:
+ Python 3.6, - Python 2.7, Python 3.4 |
2016-02-26 06:47:02 | A. Skrobov | set | messages:
+ msg260889 |
2016-02-25 17:47:34 | eryksun | set | messages:
+ msg260874 |
2016-02-25 17:23:28 | A. Skrobov | set | messages:
+ msg260870 |
2016-02-25 17:08:44 | eryksun | set | nosy:
+ eryksun messages:
+ msg260868 components:
- Windows
|
2016-02-25 16:23:29 | A. Skrobov | set | versions:
+ Python 3.4 nosy:
+ tim.golden, paul.moore, zach.ware, steve.dower
messages:
+ msg260865
components:
+ Windows |
2016-02-25 14:16:07 | christian.heimes | set | messages:
+ msg260860 |
2016-02-25 14:13:16 | christian.heimes | set | nosy:
+ christian.heimes messages:
+ msg260859
|
2016-02-25 06:13:00 | rhettinger | set | nosy:
+ rhettinger
|
2016-02-25 06:08:53 | A. Skrobov | set | messages:
+ msg260840 |
2016-02-24 12:09:40 | serhiy.storchaka | set | assignee: serhiy.storchaka
nosy:
+ serhiy.storchaka |
2016-02-22 23:06:20 | A. Skrobov | create | |