Message 294050 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	dalke
Recipients	dalke
Date	2017-05-20.21:01:36
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1495314097.2.0.680001526088.issue30416@psf.upfronthosting.co.za>
In-reply-to

Content
Others have reported issues like #21074 where the peephole compiler generates and discards large strings, and #30293 where it generates multi-MB integers and stores them in the .pyc. This is a different issue. The code: def tuple20(): return ((((((((1,)20,)20,)20,)20,)20,)20,)20,)20 takes over four minutes (257 seconds) to compile on my machine. The seemingly larger: def tuple30(): return ((((((((1,)30,)30,)30,)30,)30,)30,)30,)30 takes a small fraction of a second to compile, and is equally fast to run. Neither code generates a large data structure. In fact, they only needs about 1K. A sampling profiler of the first case, taken around 30 seconds into the run, shows that nearly all of the CPU time is spent in computing the hash of the highly-nested tuple20, which must visit a very large number of elements even though there are only a small number of unique elements. The call chain is: Py_Main -> PyRun_SimpleFileExFlags -> PyAST_CompileObject -> compiler_body -> compiler_function -> compiler_make_closure -> compiler_add_o -> PyDict_GetItem and then into the tuple hash code. It appears that the peephole optimizer converts the highly-nested tuple20 into a constant value. The compiler then creates the code object with its co_consts. Specifically, compiler_make_closure uses a dictionary to ensure that the elements in co_consts are unique, and mapped to the integer used by LOAD_CONST. It takes about 115 seconds to compute hash(tuple20). I believe the hash is computed twice, once to check if the object is present, and the second to insert it. I suspect most of the other 26 seconds went to computing the intermediate constants in the tuple. Based on the previous issues I highlighted in my first paragraph, I believe this report will be filed under "Doctor, doctor, it hurts when I do this"/"Then don't do it." I see no easy fix, and cannot think of how it would come about in real-world use. I point it out because in reading the various issues related to the peephole optimizer there's a subset of people who propose a look-before-you-leap technical solution of avoiding an optimization where the estimated result is too large. While it does help, it does not avoid all of the negatives of the peephole optimizer, or any AST-based optimizer with similar capabilities. I suspect even most core developers aren't aware of this specific negative.

Others have reported issues like #21074 where the peephole compiler generates and discards large strings, and #30293 where it generates multi-MB integers and stores them in the .pyc.

This is a different issue. The code:

  def tuple20():
    return ((((((((1,)*20,)*20,)*20,)*20,)*20,)*20,)*20,)*20

takes over four minutes (257 seconds) to compile on my machine. The seemingly larger:

  def tuple30():
    return ((((((((1,)*30,)*30,)*30,)*30,)*30,)*30,)*30,)*30

takes a small fraction of a second to compile, and is equally fast to run.

Neither code generates a large data structure. In fact, they only needs about 1K.

A sampling profiler of the first case, taken around 30 seconds into the run, shows that nearly all of the CPU time is spent in computing the hash of the highly-nested tuple20, which must visit a very large number of elements even though there are only a small number of unique elements. The call chain is:

Py_Main -> PyRun_SimpleFileExFlags -> PyAST_CompileObject -> compiler_body -> compiler_function -> compiler_make_closure -> compiler_add_o -> PyDict_GetItem and then into the tuple hash code.

It appears that the peephole optimizer converts the highly-nested tuple20 into a constant value. The compiler then creates the code object with its co_consts. Specifically, compiler_make_closure uses a dictionary to ensure that the elements in co_consts are unique, and mapped to the integer used by LOAD_CONST.

It takes about 115 seconds to compute hash(tuple20). I believe the hash is computed twice, once to check if the object is present, and the second to insert it. I suspect most of the other 26 seconds went to computing the intermediate constants in the tuple.

Based on the previous issues I highlighted in my first paragraph, I believe this report will be filed under "Doctor, doctor, it hurts when I do this"/"Then don't do it." I see no easy fix, and cannot think of how it would come about in real-world use.

I point it out because in reading the various issues related to the peephole optimizer there's a subset of people who propose a look-before-you-leap technical solution of avoiding an optimization where the estimated result is too large. While it does help, it does not avoid all of the negatives of the peephole optimizer, or any AST-based optimizer with similar capabilities. I suspect even most core developers aren't aware of this specific negative.

History
Date	User	Action	Args
2017-05-20 21:01:37	dalke	set	recipients: + dalke
2017-05-20 21:01:37	dalke	set	messageid: <1495314097.2.0.680001526088.issue30416@psf.upfronthosting.co.za>
2017-05-20 21:01:37	dalke	link	issue30416 messages
2017-05-20 21:01:36	dalke	create