This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: document peephole optimizer effects
Type: Stage:
Components: Documentation Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: emilyemorehouse Nosy List: docs@python, emilyemorehouse, rhettinger
Priority: normal Keywords:

Created on 2017-05-23 12:52 by dalke, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg294248 - (view) Author: Andrew Dalke (dalke) * (Python committer) Date: 2017-05-23 12:52
The peephole optimizer is an overall benefit to Python but it has some side-effects that occasionally cause problems. These are well-known in the issue tracker, but there is no other documentation which will help a Python programmer figure out which constructs may be a problem.

1) it will compute large integer constants and save them in the .pyc file. The following results in a 19M .pyc file. 

def compute_largest_known_prime():
  return 2**74207281 - 1

As an example of someone affected by the compile-time evaluation, see https://stackoverflow.com/questions/34113609/why-does-python-preemptively-hang-when-trying-to-calculate-a-very-large-number/ . Note the many people who struggled to find definitive documentation.

2) it will create and discard large strings. Consider this variant of the code in test_zlib.py, where I have replaced the imported module variable "_1G" with its value:

    @bigmemtest(size=_4G + 4, memuse=1, dry_run=False)
    def test_big_buffer(self, size):
        data = b"nyan" * (2**30 + 1)  # the original uses "_1G"
        self.assertEqual(zlib.crc32(data), 1044521549)
        self.assertEqual(zlib.adler32(data), 2256789997)

The byte compiler will create the ~4GB string then discard it, even though the function will not be called on machines with insufficient RAM.

As an example of how I was affected by this, see #21074 .

3) The optimizer affects control flow such that the coverage.py gives false positives about unreached code.

As examples of how people are affected, see #2506 and https://bitbucket.org/ned/coveragepy/issues/198/continue-marked-as-not-covered . The last item on the coverage.py tracker asks "Is this limitation documented anywhere?"

I do not believe that the current peephole optimizer should be changed to support these use cases, only that there should be documentation about how the optimizer may negatively affect real-world code.

The domain expert on this topic is Raymond Hettinger. He does not feel safe in issues where I am involved. As I believe my continued presence on this issue will inhibit the documentation changes which I think are needed, I will remove my name from this issue and not be further involved.
msg294266 - (view) Author: Emily Morehouse (emilyemorehouse) * (Python committer) Date: 2017-05-23 17:37
Raymond, I'm willing to take over this issue with your guidance if you're open to it.
msg294306 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-05-24 01:11
Thanks Emily.  

This should be probably be a FAQ entry (cpython/Doc/faq) because the peephole optimizer is a CPython implementation detail, because we don't normally document internal details such as which opcodes are generated or big-oh runtimes etc, because isn't really a good place for this elsewhere, and because the low level details are frequently changing.

Ideally, the FAQ entry should be brief and focus on the two user visible effects. 1) The jump-to-jump optimization can cause a coverage analysis tool to mark as uncovered a line that is logically executed but is bypassed because it has no effect.  2) Constant folding transfers run-time costs to compile-time costs -- this may matter when the expression is expensive to compute or if it results in the creation of a large object.

Follow the documentation advice given in the dev guide:
* https://docs.python.org/devguide/documenting.html#affirmative-tone
* https://docs.python.org/devguide/documenting.html#economy-of-expression
* https://docs.python.org/devguide/documenting.html#audience

The key part of the first link is to keep it upbeat, focusing on what the tool does, the related costs and benefits, and giving recommendations on how to control it.  We don't want the docs to read as a litany of warnings and suggestions of danger.

In particular, I recommend discussing the OP's compute_largest_known_prime() example and explaining:
1) the intended calculation is expensive in terms of both space and time 
2) the constant folding step of code generation does this computation at compile-time
3) since this computation is slow, it noticeably slows down the compilation (appears to hang at compile-time rather than at run-time as it normally would)
4) since this computation creates a large object, the pyc file will be large
5) due to pre-computation, the run-time call will be fast
6) if that effect is not desired, it is easily turned-off by storing part of the expression in a variable:
    >>> def compute_largest_known_prime():
    ...     exp = 74207281
    ...     return 2 ** exp - 1

The key part of the second link is to be brief and terse.  It would be easy to write an entire blog post or chapter in a book about constant folding, but that wouldn't be efficient with the reader's time or really help them out in a meaningful way.

The key part of the third link is address what you think is helpful to new readers of the FAQ who don't already understand the issue.  We don't make doc entries to appease or vindicate someone.  The docs aren't a vehicle for expressing opposition to design decision; instead, it is a place to explain what the tools do and how to use them.

Try not to over-specify the implementation.  Note that constant folding takes place but don't guarantee specific transformations.  These are all subject to change, have changed over time (including backports), and changing even now.
History
Date User Action Args
2022-04-11 14:58:46adminsetgithub: 74625
2017-05-24 01:11:54rhettingersetassignee: docs@python -> emilyemorehouse
messages: + msg294306
2017-05-23 17:37:48emilyemorehousesetnosy: + rhettinger, emilyemorehouse
messages: + msg294266
2017-05-23 12:52:37dalkesetnosy: - dalke
2017-05-23 12:52:21dalkecreate