Author dalke
Recipients alex, benjamin.peterson, dalke, eryksun, holdenweb, methane, ncoghlan, pitrou, rhettinger, vstinner
Date 2017-05-22.02:24:29
Content
I know this issue was closed many years ago, and I don't propose re-opening it. I write this comment because some of the earlier comments here make it sound like only a foolish or perverse programmer might be affected by this 'too aggressive constant folding'. I'll provide a real-world example of how it affected me. It took me several hours to track it down, and even longer to decide that the fault shouldn't be solely attributed to poor coding practices on my side.

I recently updated a code base from Python 2.7 to Python 3.5+. It contains a C extension which handles 64-bit indexing. One of the test files, "test_large.py", exercises the API with multi-gigabyte strings. It typically takes about 10 minutes to run, so it isn't part of the normal test suite. Instead, it's decorated with a @unittest.skipUnless(), and only enabled if the file is executed directly or if an environment variable is set.

The file creates about 25 multi-GB string constants, like 's = b"\xfe" * (2**32+1)'. Those alone require a minute to create, but that's acceptable overhead because these tests are usually skipped, and when not skipped are only 10% of the total run-time. Here is an example extracted from my code; this tests the population count on a byte string:

import os
import unittest

import bitops  # the C extension under test

RUN_ALL = "LARGE_TESTS" in os.environ
if __name__ == "__main__":
    RUN_ALL = True

@unittest.skipUnless(RUN_ALL, "large tests not enabled; set LARGE_TESTS")
class LargeTests(unittest.TestCase):
    def test_popcount(self):
        s = b"\xfe\xff" * (2**31 + 1)
        self.assertEqual(bitops.byte_popcount(s), 15 * (2**31 + 1))

if __name__ == "__main__":
    unittest.main()

As part of the update I did a 'move function' refactoring across the code base and re-ran the tests. Unit test discovery seemed to hang, and ^C didn't work. Investigation showed it was stuck in __import__("test_large"), which had to recompile the module because I had changed its code. I finally figured out the cause was constant folding: the compiler created each multi-GB string, found it was too large to keep as a constant, and discarded it. Test discovery took a minute, even though all of the tests were marked as 'skip' and would never be called.

Once compilation finished, the compiler wrote a .pyc file. That is why I hadn't noticed the problem earlier: the .py file rarely changed, so it rarely needed to be recompiled. The problem would have been worse had I run test_large.py directly, since the main script is never cached, so every direct run pays the one-minute compilation, even if I specify a single test which doesn't use the large strings. (There were no bugs in the 64-bit handling during the update, so I never needed to do that.)
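To make the folding visible, here is a small sketch using a tiny constant so it is safe to run (the exact bytecode differs across CPython versions):

import dis

def small():
    return b"x" * 5

# The peephole optimizer evaluates the multiplication at compile time,
# so the function body reduces to loading one pre-built constant.
dis.dis(small)
# On CPython 3.5/3.6 this prints something like:
#   LOAD_CONST    1 (b'xxxxx')
#   RETURN_VALUE

The same mechanism runs on b"\xfe\xff" * (2**31 + 1); the only difference is that the folded result is then judged too large to keep.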

I was going to report the issue, then found that INADA Naoki had reported almost exactly the same issue here, back in 2014.

I was bewildered by some of the comments here, because they seemed to suggest I was at fault for writing such poor quality code in the first place. Others may be in the same boat as me, so I'll make a few points to add some counter-balance.

"Are we really supposed to protect programmers from their own folly by second-guessing when constant folding might be required and when it might not?"

If there is a 'folly', it is shared with the developers of Python's constant-folding implementation, who thought there wouldn't be a problem and provided no mechanism (like the ability to disable optimization, proposed in #2506 back in 2008 and again in #26145) which might help people like me diagnose the problem. But I don't think there was any folly. There was an engineering decision that the benefits of constant folding outweighed the negatives. Just as, in my case, there was an engineering decision that constant expressions which worked in Python 2.5-2.7 didn't need to be future-proofed against improved constant folding.

"How is the interpreter supposed to know the function isn't called?"

Perhaps a standard-library decorator which says that a function will be skipped when run as a unit test? But really, the question should be "how is the *byte-code compiler* supposed to know?" This highlights a shift from the Python I started with, which left everything to the run-time virtual machine, to the smarter compile-time optimizer we have now. As the optimizer gets smarter, we developers need to know more about how it works in order to avoid unwanted side-effects. Currently that knowledge is 'arcane'.

"simply declare a manifest constant and use that instead"

The fundamental problem is that there's no way for a programmer to create a large constant value which is safe from a sufficiently smart compiler, and nothing which outlines how smart the compiler will get. Instead, people figure out what works operationally, but that knowledge is specific to a given CPython version.

My code ran into problems because Python's constant folding improved out from under me. Even if I follow that advice, how do I avoid a future version of Python reintroducing the problem?

For example, the following function is currently not subject to constant folding.

def big_x():
    N = 4294967296
    return b'x' * N

A future version of Python (perhaps 3.9 with an AST optimizer) might implement constant propagation, making this identical to writing "b'x' * 4294967296" now, including the long compile time needed to create and discard that 4GB string.
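The current difference between the two spellings is easy to check with dis; this sketch uses a tiny constant, and the bytecode shown is from CPython 3.5/3.6 (details vary by version):

import dis

def folded():
    return b"x" * 5    # constant expression: folded at compile time

def not_folded():
    N = 5
    return b"x" * N    # would need constant propagation, so not folded today

dis.dis(folded)        # LOAD_CONST 1 (b'xxxxx'); RETURN_VALUE
dis.dis(not_folded)    # LOAD_CONST 5; STORE_FAST N; LOAD_CONST b'x';
                       # LOAD_FAST N; BINARY_MULTIPLY; RETURN_VALUE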

Of course it's harder to do constant folding on a value in the module namespace, so the following is less likely to run into future problems. (A hypothetical Python could do optimistic constant propagation during module compilation, then at run-time check if N has changed, and if not, use the pre-computed value.)

N = 4294967296
def big_x():
    return b'x' * N

This distinction is subtle and, I argue, demonstrates that it isn't all that simple.


"It would be ashamed to hack-up constant-folding to handle an arcane case of an uncalled function that does space intensive constant operation."

I don't think my example test case was arcane. To the contrary, I think many people would write the test the same way. I agree that few people construct test cases that large. An alternative series of questions might start with the Bayesian one: of those who need such constants, how many are negatively affected (e.g., #27695)? How long does it take them to figure out a workaround? And how many of them think to file a bug, read comments here which accuse them of folly and arcane practices, and stop? As I almost did.

Yes, there are several 'trivially easy workarounds', once the developer figures out the fundamental problem. But which workarounds are guaranteed to keep working, and how do non-core developers learn about the potential problem and its workarounds?
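For illustration, one workaround which currently defeats the folding is to route the size through a run-time call. This is only a sketch (the hide() helper is hypothetical, not something from the standard library), and, as argued above, nothing guarantees a future optimizer won't see through it:

def hide(x):
    # Identity function; its only purpose is to make the multiplication
    # below a run-time expression the byte-code compiler cannot fold.
    return x

class LargeTests(unittest.TestCase):
    def test_popcount(self):
        # Now built at run time, after the skipUnless() check applies,
        # instead of at compile time.
        s = b"\xfe\xff" * hide(2**31 + 1)
        self.assertEqual(bitops.byte_popcount(s), 15 * (2**31 + 1))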

People do get confused by this behavior. One comment from http://stackoverflow.com/questions/34113609/why-does-python-preemptively-hang-when-trying-to-calculate-a-very-large-number/, which tests RLIMIT_CPU by computing 10**(10**10), is "This is spooky". Nor is it easy to figure out. Others on the same SO page write: "I'm guessing that Python is trying to optimize by calculating constants beforehand...which backfires. But I have no proof of this.", "Do you know of a place in the documentation or something where this precalculation of constants is discussed? I tried Googling, but I'm not turning up anything useful", and "It took quite a bit of searching, but I have discovered evidence that Python 3 does precompute constant literals".
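That precomputation is straightforward to demonstrate: compile() alone pays the cost, before any code runs. A small sketch, with an exponent chosen to be slow but not pathological (timings are illustrative and machine-dependent):

import time

t0 = time.perf_counter()
code = compile("x = 10**1000000", "<demo>", "exec")  # nothing executed yet
print("compile time:", time.perf_counter() - t0)
# The power is evaluated during compilation, so this takes noticeable
# time; with 10**(10**10), as in the SO question, it effectively hangs.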

To be clear, I know nothing will change in the current peephole implementation. I have read most of the issues on bugs.python.org and stackoverflow.com related to it. The most frequent assessment is that the current peephole optimizer is good enough, yet too fragile to improve, and that the hope is that an AST optimizer -- something first proposed at least 9 years ago (#2506) -- is the right solution. That work progresses slowly at #11549 (started in 2011), with the same INADA Naoki recently picking it up again because "it was suspended (because of lack of reviewers, maybe)".

What I don't want is to see the same useless behavior of creating and discarding large constant strings replicated in the AST branch and justified by an argument that current experience with the peephole optimizer shows that it's not a problem in real-world code.

At the very least, I think there should be some documentation about this arcane corner of the CPython implementation.

"Special cases aren't special enough to break the rules."

The peephole optimizer is nothing but support for the special case of evaluating dynamic code at compile-time. In an aphorism challenge, the peephole optimizer is almost exactly the '*right* now' in "Although never is often better than *right* now." I am not saying the peephole optimizer should not have been added, rather, the Zen of Python gives different answers depending on where one draws the line.