
Too aggressive constant folding #65273

Closed
methane opened this issue Mar 27, 2014 · 25 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@methane (Member) commented Mar 27, 2014

BPO 21074
Nosy @ncoghlan, @vstinner, @benjaminp, @alex, @methane, @serhiy-storchaka, @eryksun
PRs
  • bpo-30416: Protect the optimizer during constant folding. #4860
  • [3.6] bpo-30416: Protect the optimizer during constant folding. #4865
  • [2.7] bpo-30416: Protect the optimizer during constant folding. (GH-4865) #4896
Superseder
  • bpo-30416: constant folding opens compiler to quadratic time hashing

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-03.14:33:28.122>
    created_at = <Date 2014-03-27.02:26:36.519>
    labels = ['interpreter-core', 'performance']
    title = 'Too aggressive constant folding'
    updated_at = <Date 2019-03-03.14:33:28.121>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2019-03-03.14:33:28.121>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-03.14:33:28.122>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2014-03-27.02:26:36.519>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 21074
    keywords = ['patch']
    message_count = 25.0
    messages = ['214928', '214932', '214934', '214941', '214953', '215021', '215023', '215030', '215035', '215080', '215082', '215170', '215174', '215214', '294119', '294123', '294157', '294184', '294211', '294231', '294258', '294259', '308385', '308387', '337031']
    nosy_count = 9.0
    nosy_names = ['holdenweb', 'dalke', 'ncoghlan', 'vstinner', 'benjamin.peterson', 'alex', 'methane', 'serhiy.storchaka', 'eryksun']
    pr_nums = ['4860', '4865', '4896']
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = '30416'
    type = 'resource usage'
    url = 'https://bugs.python.org/issue21074'
    versions = ['Python 3.4', 'Python 3.5']

    @methane (Member, Author) commented Mar 27, 2014

    When I run following script:

    def uncalled():
        x = b'x' * (2**32)
    print('Hello')

    Python 3.4 consumes a huge amount of memory even though the uncalled() function is never called.

    $ /usr/bin/time  -l /usr/local/bin/python2 constant_folding.py
    Hello
            0.02 real         0.01 user         0.00 sys
       4337664  maximum resident set size
    
    $ /usr/bin/time  -l /usr/local/bin/python3 constant_folding.py
    Hello
            2.76 real         1.36 user         1.39 sys
    4300234752  maximum resident set size

    Both Python 2.7.6 and Python 3.4.0 were built with Homebrew.
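    The folding can be observed directly, without running anything, by inspecting the constants of a compiled expression. A minimal sketch (the small 2**4 repeat count is chosen so the demo itself stays cheap; behavior as observed in CPython 3.x):

    ```python
    # Constant folding happens at compile time, so the folded result
    # appears in the code object's constants before any code runs.
    code = compile("b'x' * (2 ** 4)", "<demo>", "eval")

    # The literal expression was folded down to the 16-byte result.
    print(b'x' * 16 in code.co_consts)
    ```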

    @methane methane added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Mar 27, 2014
    @benjaminp (Contributor)
    Note the memory is allocated but is immediately thrown away.

    @eryksun (Contributor) commented Mar 27, 2014

    The redesigned peephole optimizer in PY3 improves constant folding. Limiting this would be a step back. Plus you can get the same behavior in PY2 if you first expand the power. For example:

        # using 2**30, for a 32-bit process
        def uncalled():
            x = b'x' * 1073741824

    At issue is the design of fold_binops_on_constants in peephole.c:

    http://hg.python.org/cpython/file/04f714765c13/Python/peephole.c#l138

    Some LBYL may be helpful here. It could get the size and integer value of the two objects before evaluating the switch statement. Then use these values to skip certain operations, such as skipping PyNumber_Multiply in the case of BINARY_MULTIPLY.
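    In Python terms, the kind of LBYL guard described above might look like the following sketch. (The MAX_FOLD_SIZE limit and the function name are hypothetical illustrations, not CPython API.)

    ```python
    MAX_FOLD_SIZE = 20  # hypothetical limit on folded sequence length

    def try_fold_multiply(left, right):
        """Fold left * right only if the estimated result is small.

        Returns the folded value, or None to signal "leave it to runtime".
        """
        # Look Before You Leap: estimate the result size from the operand
        # sizes instead of computing the product and checking afterwards.
        if isinstance(left, (str, bytes, tuple)) and isinstance(right, int):
            if right < 0 or len(left) * right > MAX_FOLD_SIZE:
                return None
            return left * right
        return None

    print(try_fold_multiply(b'x', 4))        # b'xxxx': small enough to fold
    print(try_fold_multiply(b'x', 2 ** 32))  # None: folding skipped
    ```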

    @vstinner (Member)
    Hi, I have a project called "astoptimizer" which does the same job as the CPython peephole optimizer, but it is implemented in Python.

    To avoid this issue (creating a huge object and then discarding it because it is too large), I have a "check_binop_cst" function which checks whether the result will fit within the configured constraints:

    https://bitbucket.org/haypo/astoptimizer/src/024c05ae410458813c20fafe8fe5b8d3cf980f52/astoptimizer/optimizer.py?at=default#cl-598

    Check for "a * b" :
    -------------

            if isinstance(op, ast.Mult):
                if isinstance(right, INT_TYPES):
                    # str * int
                    if isinstance(left, STR_TYPES):
                        return (len(left) * right <= self.config.max_string_length)
                    # tuple * int
                    if isinstance(left, tuple):
                        return (len(left) * right <= self.config.max_tuple_length)
    
                if isinstance(left, INT_TYPES):
                    # int * str
                    if isinstance(right, STR_TYPES):
                        return (left * len(right) <= self.config.max_string_length)
                    # int * tuple
                    if isinstance(right, tuple):
                        return (left * len(right) <= self.config.max_tuple_length)

    Constraints in the config:

    https://bitbucket.org/haypo/astoptimizer/src/024c05ae410458813c20fafe8fe5b8d3cf980f52/astoptimizer/config.py?at=default#cl-187

    Astoptimizer supports the following "constant" types: int, float, complex, bytes, str, tuple, "name constant" (True, False, None). The frozenset type is also supported if the builtins are declared as "constant" too.

    @pitrou (Member) commented Mar 27, 2014

    Not only is a lot of memory allocated but it also eats quite a bit of CPU time.

    @holdenweb (Member)
    Is a fix really required? Are we really supposed to protect programmers from their own folly by second-guessing when constant folding might be required and when it might not? How is the interpreter supposed to know the function isn't called?

    The simple solution to this problem is available in Python: simply declare a manifest constant and use that instead:

    UNCALLED_SIZE = 2**32
    def uncalled():
        x = b'x' * UNCALLED_SIZE
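    That workaround relies on the optimizer folding literal expressions but not propagating global names. A quick check against the compiled code objects illustrates the difference (a sketch of behavior observed in current CPython; small sizes are used so the demo stays cheap):

    ```python
    UNCALLED_SIZE = 2 ** 4  # small stand-in for the 2**32 in the report

    def with_literal():
        return b'x' * (2 ** 4)       # literal operands: folded at compile time

    def with_name():
        return b'x' * UNCALLED_SIZE  # global name: resolved at run time

    print(b'x' * 16 in with_literal.__code__.co_consts)  # folded result stored
    print(any(isinstance(c, bytes) and len(c) == 16
              for c in with_name.__code__.co_consts))    # nothing precomputed
    ```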

    I'd recommend closing this issue. Though I'm glad that people are concerned with optimization, I don't think an optimizer should be too concerned with the rare cases where its optimization doesn't optimize.

    But I'm not going to close it: wiser heads than mine are required.

    @vstinner (Member)
    Is a fix really required?

    Yes.

    @methane (Member, Author) commented Mar 28, 2014

    For example, I want to write a test like this:

    @unittest.skip("This test requires huge memory")
    def test_largebuf(self):
        client = self.create_client()
        data = b'x' * (2**32 - 1)
        client.send(data)

    @vstinner (Member)
    For example. I want to write test like this:

    @unittest.skip("This test requires huge memory")

    There is a @test.support.bigmemtest(size=_2G, memuse=1) decorator which is more convenient than the skip decorators. See Lib/test/test_bigmem.py.

    You can use tracemalloc to get the memory peak:
    tracemalloc.get_traced_memory()[1] is the peak.
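    The tracemalloc pattern mentioned above, as a minimal sketch:

    ```python
    import tracemalloc

    tracemalloc.start()
    data = bytearray(10 ** 6)  # allocate roughly 1 MB
    del data                   # ... and release it again
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # The second value is the high-water mark, so it still reflects
    # memory that has already been released by the time it is read.
    print(peak >= 10 ** 6)
    ```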

    @rhettinger (Contributor)
    FWIW, I concur with Steve Holden and recommend closing this.

    It would be a shame to hack up constant folding to handle the arcane case of an uncalled function that performs a space-intensive constant operation.

    @rhettinger rhettinger self-assigned this Mar 28, 2014
    @rhettinger (Contributor)
    "Special cases aren't special enough to break the rules."

    @rhettinger rhettinger removed their assignment Mar 29, 2014
    @ncoghlan (Contributor)
    I'm marking this as a duplicate of bpo-11549 (which is about building out an AST optimiser), mostly because I don't think this kind of parameter tweaking makes sense with the current peephole optimiser.

    With a more sophisticated Python implemented optimiser earlier in the pipeline, it becomes more reasonable to make smarter decisions about space/speed trade-offs when dealing with sequence repetition, and even just really large integer values.

    @vstinner (Member)
    The change that I proposed (check parameters before computing the result) is simple, whereas implementing a new optimizer working on the AST requires much more work.

    @rhettinger (Contributor)
    check parameters before computing the result is simple

    Handling the general case (all sequences and all operators) isn't really simple. It also involves doing more computations and making more assumptions than we want to do in this code (it is not a great place to be running code that could raise exceptions, trigger GC, etc).

    I admire your inclination to "fix" this but doing so makes it slower and more fragile. This code has worked well for a long time and we should be reluctant to muck with it just to handle an arcane case that has a trivially easy work-around.

    @dalke (Mannequin) commented May 22, 2017

    I know this issue was closed many years ago, and I don't propose re-opening it. I write this comment because some of the earlier comments here make it sound like only a foolish or perverse programmer might be affected by this 'too aggressive constant folding'. I'll provide a real-world example of how it affected me. It took me several hours to track it down, and even longer to decide that the fault shouldn't be solely attributed to poor coding practices on my side.

    I recently updated a code base from Python 2.7 to Python 3.5+. It contains a C extension which handles 64-bit indexing. One of the test files, "test_large.py", exercises the API with multi-gigabyte strings. It typically takes about 10 minutes to run so it isn't part of the normal test suite. Instead, it's decorated with a @unittest.skipUnless(), and only enabled if the file is executed directly or if an environment variable is set.

    The file creates about 25 multi-GB string constants, like 's = b"\xfe" * (2**32+1)'. Those alone require a minute to create, but that's acceptable overhead because these tests are usually skipped, and when not skipped are only 10% of the total run-time. Here is an example extracted from my code; this tests the population count on a byte string:

    RUN_ALL = "LARGE_TESTS" in os.environ
    if __name__ ==  "__main__":
        RUN_ALL = True

    @unittest.skipUnless(RUN_ALL, "large tests not enabled; set LARGE_TESTS")
    class LargeTests(unittest.TestSuite):
    def test_popcount(self):
    s = b"\xfe\xff" * (2**31 + 1)
    self.assertEqual(bitops.byte_popcount(s), 15*(2**31 + 1))

    if __name__ ==  "__main__":
        unittest.main()

    As part of the update I did a 'move function' refactoring across the code base and re-ran the tests. Unit test discovery seemed to hang and ^C didn't work. Investigation showed it was in __import__("test_large"), which was needed because I had changed code in test_large.py. I finally figured out it was due to constant folding, which created the string, found it was too large, and discarded it. Test discovery took a minute, even though all of the tests were marked as 'skip' and would not be called.

    Once done, the compiler generated a .pyc file. I hadn't noticed the problem earlier because the .py file rarely changed, so it rarely needed to be recompiled. It would have been a bigger issue if I had run test_large.py directly, as that always triggers the one-minute compilation, even if I specified a test which didn't use those constants. (There were no bugs in 64-bit handling during the update, so I didn't need to do that.)

    I was going to report the issue, then found that INADA Naoki had reported almost exactly the same issue here, back in 2014.

    I was bewildered by some of the comments here, because they seemed to suggest I was at fault for writing such poor quality code in the first place. Others may be in the same boat as me, so I'll make a few points to add some counter-balance.

    "Are we really supposed to protect programmers from their own folly by second-guessing when constant folding might be required and when it might not?"

    If there is a 'folly', it is shared with the developers of Python's constant-folding implementation who thought there wouldn't be a problem, and provide no mechanisms (like bpo-2506 proposed in 2008 to disable optimization; also available in bpo-26145) which might help people like me diagnose a problem. But I don't think there was any folly. There was an engineering decision that the benefits of constant folding outweighed the negatives. Just like in my case there was an engineering decision that constant expressions which worked in Python 2.5-2.7 didn't need to be made future-proof against improved constant-folding.

    "How is the interpreter supposed to know the function isn't called?"

    Perhaps a standard-library decorator which says that a function will be skipped when run as a unit test? But really, the question should be "how is the *byte-code compiler* supposed to know". This highlights a shift between the Python I started with, which left everything up to the run-time virtual machine interpreter, and the smarter compile-time optimizer we have now. As it gets smarter, we developers seem to need to know more about how the optimizer works in order to avoid unwanted side-effects. Currently this knowledge is 'arcane'.

    "simply declare a manifest constant and use that instead"

    The fundamental problem is that there's no way for a programmer to create a large constant value which is safe from a sufficiently smart compiler, and nothing which outlines how smart the compiler will get. Instead, people figure out what works operationally, but that's specific to a given CPython version.

    My code ran into problems because Python's constant folding improved from under me. Even if I follow that advice, how do I avoid having a future version of Python restore the problem?

    For example, the following function is currently not subject to constant folding.

    def big_x():
        N = 4294967296
        return b'x' * N

    A future version of Python (perhaps 3.9 with an AST optimizer) might implement constant propagation, making this identical to writing "b'x' * 4294967296" now, including the long compile time needed to create and discard that 4GB string.

    Of course it's harder to do constant folding on a value in the module namespace, so the following is less likely to run into future problems. (A hypothetical Python could do optimistic constant propagation during module compilation, then at run-time check if N has changed, and if not, use the pre-computed value.)

    N = 4294967296
    def big_x():
        return b'x' * N

    This distinction is subtle and, I argue, demonstrates that it isn't all that simple.

    "It would be ashamed to hack-up constant-folding to handle an arcane case of an uncalled function that does space intensive constant operation."

    I don't think my example test case was arcane. To the contrary, I think many people would write the test the same way. I agree that few people construct test cases that large. An alternate series of questions might start with the Bayesian: of those who need such constants, how many are negatively affected (eg, bpo-27695)? How long does it take them to figure out a workaround? And how many of them think to file a bug, read comments here which accuse them of folly and arcane practices, and stop? As I almost did.

    Yes, it has several 'trivially easy work-around's - once the developer figures out the fundamental problem. Which workarounds are guaranteed to continue to work, and how do non-core developers learn about the possible problem and workarounds?

    People do get confused about this behavior. One comment from http://stackoverflow.com/questions/34113609/why-does-python-preemptively-hang-when-trying-to-calculate-a-very-large-number/ , which tests a RLIMIT_CPU by computing 10**(10**10), is "This is spooky". Nor is it easy to figure out. Others from the same SO page write: "I'm guessing that Python is trying to optimize by calculating constants beforehand...which backfires. But I have no proof of this.", "Do you know of a place in the documentation or something where this precalculation of constants is discussed? I tried Googling, but I'm not turning up anything useful", and "It took quite a bit of searching, but I have discovered evidence that Python 3 does precompute constant literals".

    To be clear, I know nothing will change in the current peephole implementation. I have read most of the issues on bugs.python.org and stackoverflow.com related to it. The most frequent assessment is that the current peephole optimizer is good enough, and too fragile, and the hope is that an AST optimizer -- something first proposed at least 9 years ago (bpo-2506) -- is the right solution. This progresses slowly at bpo-11549 (started in 2011), with the same INADA Naoki recently picking it up again because "it was suspended (because of lack of reviewers, maybe)".

    What I don't want is to see the same useless behavior of creating and discarding large constant strings replicated in the AST branch and justified by an argument that current experience with the peephole optimizer shows that it's not a problem in real-world code.

    At the very least, I think there should be some documentation about this arcane corner of the CPython implementation.

    "Special cases aren't special enough to break the rules."

    The peephole optimizer is nothing but support for the special case of evaluating dynamic code at compile-time. In an aphorism challenge, the peephole optimizer is almost exactly the '*right* now' in "Although never is often better than *right* now." I am not saying the peephole optimizer should not have been added, rather, the Zen of Python gives different answers depending on where one draws the line.

    @vstinner (Member)
    (I reopen the issue since it got a long new comment.)

    @vstinner vstinner reopened this May 22, 2017
    @rhettinger (Contributor)
    (I reopen the issue since it got a long new comment.)

    Andrew specifically said, "I don't propose re-opening it".

    FWIW, the zen of python also says, "practicality beats purity". For most users, most of the time, the peephole optimizer is a net win and it encourages us to write clearer code like "x & (2 ** 32 - 1)" rather than "x & 4294967295".
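    The upside described here is easy to check: the readable power expression compiles down to the precomputed constant, so there is no runtime cost for writing it clearly (a sketch of current CPython behavior):

    ```python
    def mask_low_32(x):
        # Written for readability; the optimizer folds the expression,
        # so the compiled function carries the precomputed constant.
        return x & (2 ** 32 - 1)

    print(4294967295 in mask_low_32.__code__.co_consts)  # folded at compile time
    print(mask_low_32(2 ** 40 + 7))                      # only low 32 bits survive
    ```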

    @dalke (Mannequin) commented May 22, 2017

    I do not think quoting the Zen of Python helps anything. As I wrote, "it gives different answers depending on where one draws the line." This includes "practicality beats purity".

    From my viewpoint, the peephole optimizer isn't going to change because the core developers prioritize the purity of not adding special cases over the practicality of supporting reasonable real-world code. Or the purity of the long-awaited AST optimizer over the practicality of changing the existing, fragile peephole optimizer.

    I also appreciate the viewpoint that the practicality of a maintainable peephole optimizer beats the impossible purity of trying to support all use cases gracefully. My line, of course, only wants it to handle my use case, which is the issue reported here.

    My goal from this is not to re-open the topic. It is to provide a counter-balance to opinions expressed here that place all blame onto the programmer whose 'folly' led to 'arcane' and 'insane' code. (The 'insane' is used at http://bugs.python.org/issue30293#msg293172 as "Burdening the optimizer with insanity checks just slows down the compilation of normal, sane code.")

    The use case pulled from my project, which is very near to the original report by INADA Naoki, seems entirely sane and not at all arcane. How else might one test 64-bit addressing than by constructing values which are over 4GB in length? Indeed, Python itself has similar test code. Quoting Lib/test/test_zlib.py:

    # Issue python/cpython#54485 - check that inputs >=4GB are handled correctly.
    class ChecksumBigBufferTestCase(unittest.TestCase):
    @bigmemtest(size=_4G + 4, memuse=1, dry_run=False)
    def test_big_buffer(self, size):
        data = b"nyan" * (_1G + 1)
        self.assertEqual(zlib.crc32(data), 1044521549)
        self.assertEqual(zlib.adler32(data), 2256789997)
    

    Is the difference between happiness and "folly" really the difference between writing "_1G" and "2**30"? If so, how are people supposed to learn the true path? Is that not exactly the definition of 'arcane'?

    The Code of Conduct which governs comments here requests that we be considerate and respectful. Terms like 'folly' and 'arcane', at least for what I think is an entirely reasonable use case, seem to run counter to that spirit.

    @rhettinger (Contributor)
    I do not think quoting the Zen of Python helps anything.

    Instead of practicality-beats-purity, the better aphorism is perfect-is-the-enemy-of-good.

    The Code of Conduct which governs comments here requests
    that we be considerate and respectful

    The term "sanity check" was used in its normal technical sense. It is a technical description of a kind of guard that is sometimes placed in code. It isn't a judgment about a user or their code.

    In bpo-30293, the StackOverflow OP constructed a hypothetical example that explicitly asked the machine to compute a large object using 1 << 500000000. The OP was merely curious about why the optimized code was so much faster than the unoptimized code.

    On the tracker, this gave rise to question about whether sanity checks should be added to the peephole optimizer. Apparently, the use of the word sanity is offensive to you and hence the reference to the code-of-conduct. Though I believe this is a profound misreading of my words and my intent, I apologize if you were offended.

    Even though I'm a subject-matter expert in this area, after the code-of-conduct reference I no longer feel comfortable or safe in participating further in this discussion, so I am bowing out.

    To move these issues to resolution, please be clear about what is being asked for. If you don't want the issue re-opened, please close it. If you want the peephole optimizer to be disabled because it is interfering with your work, please say so -- we can take it out and give up whatever benefits it is offering to other users. If you want some specific guards to be added, please say so.

    My opinion is that this is difficult to do in the general case and that it is difficult to decide what the desirable behavior should be for the uncommon cases. ISTM that constant folding is already too complex and would benefit from simplification rather than growing additional special cases. That said, your opinion may differ from mine, and it is just as valid.

    @serhiy-storchaka (Member)
    See the patch in bpo-30416. I'm not sure these checks should be added to the peephole optimizer, but this isn't very difficult.
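    After the bpo-30416 fix landed, oversized results are no longer materialized at compile time, which can be checked quickly: compiling the original reproducer now returns immediately, and no multi-gigabyte constant appears in the compiled code. A sketch (assumes Python 3.7 or later, where the fix is in place):

    ```python
    # Compiling the original reproducer is now fast: the optimizer
    # refuses to fold b'x' * (2**32) because the result would be huge.
    source = "def uncalled():\n    x = b'x' * (2 ** 32)\n"
    module_code = compile(source, "<demo>", "exec")

    # Inspect the nested code object for uncalled(): no oversized
    # bytes constant was created and stored.
    inner = next(c for c in module_code.co_consts
                 if hasattr(c, "co_consts"))
    print(any(isinstance(c, bytes) and len(c) > 4096
              for c in inner.co_consts))  # False
    ```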

    @dalke (Mannequin) commented May 23, 2017

    Again, I do not propose any changes to the existing optimizer. I do not need anything changed for my code to work.

    My goal is to counter-balance comments which suggest that perfectly normal code is somehow folly and arcane. These caused me some bewilderment and self-doubt as I tried to establish that my test suite was not, in fact, poorly written. Others with the same issue should not face the same confusion.

    I especially do not want to see the years of experience with the current optimizer used to justify repeating the same decisions in some future AST-based optimizer. http://bugs.python.org/issue2506#msg64764 gives an example of how the lack of complaints over several years is used to argue against changing compiler behavior.

    Terms like "folly" and "arcane" also suggest an outright rejection of considering to support in the future what seems like totally reasonable code.

    I realize now that there is a more immediately actionable item. I have just added bpo-30440 as a request to document these effects. I have removed my name from its nosy list in hopes of reducing Raymond Hettinger's concerns about comfort and safety, and thus perhaps increase the likelihood that this will be documented.

    "I apologize if you were offended", which I will take as being sincere, happens to also be one of the most common examples of an insincere apology. Bowing out when there is a reference to the CoC gives undue power to others, and hinders the ability to apply its spirit to all but the most egregious situations.

    Even if I accept the idea that "sane" and "insane" have technical meanings, that does not exempt their use from questions about being considerate and respectful. Django and others replaced their use of the technical terms "master" and "slave", following a trend which is at least 13 years old; see http://edition.cnn.com/2003/TECH/ptech/11/26/master.term.reut/ . Note that I am not proposing to avoid using the terms "sane" and "insane", only asserting that there is no clean exception for words which also have a technical sense or meaning, even when used for that technical sense.

    @pitrou (Member) commented May 23, 2017

    "I apologize if you were offended", which I will take as being sincere, happens to also be one of the most common examples of an insincere apology. Bowing out when there is a reference to the CoC gives undue power to others, and hinders the ability to apply its spirit to all but the most egregious situations.

    I'm not interested in reading this kind of rhetoric, sorry.

    @serhiy-storchaka (Member)
    New changeset 2e3f570 by Serhiy Storchaka in branch 'master':
    bpo-30416: Protect the optimizer during constant folding. (GH-4860)

    @serhiy-storchaka (Member)
    New changeset b580f4f by Serhiy Storchaka in branch '3.6':
    [3.6] bpo-30416: Protect the optimizer during constant folding. (GH-4865)

    @serhiy-storchaka (Member)

    Fixed in bpo-30416.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022