classification
Title: Remove unneeded folded consts after peephole
Type: resource usage
Stage: patch review
Components: Interpreter Core
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Adrian Wielgosik, rhettinger, serhiy.storchaka
Priority: low
Keywords: patch

Created on 2016-11-27 11:05 by Adrian Wielgosik, last changed 2016-11-27 18:09 by rhettinger.

Files
File name | Uploaded
indices_tweak.patch | Adrian Wielgosik, 2016-11-27 11:06
clean_co_consts.patch | Adrian Wielgosik, 2016-11-27 11:07
clean_co_consts_squashed.patch | Adrian Wielgosik, 2016-11-27 13:13
Messages (4)
msg281820 - Author: Adrian Wielgosik (Adrian Wielgosik) Date: 2016-11-27 11:05
The attached patch adds new logic to the peephole optimizer to remove constants that are no longer needed after the main peephole pass.

For example:

    def f():
        var = 'te' + 'xt'
        num = -12
        num = -6 * 2
        return (1, (3, 4), 6)
    print(f.__code__.co_consts)

Without the patch:

    (None, 'te', 'xt', 12, 6, 2, 1, 3, 4, 'text', -12, -6, -12, (3, 4), (1, (3, 4), 6))

With patch:

    (None, 'text', -12, -12, (1, (3, 4), 6))

(unfortunately, I couldn't get rid of None because that would make 'text' a docstring)
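
One way to see which entries are dead weight is to compare co_consts with the constant indices that the bytecode actually loads. The helper below is only a sketch for illustration and is not part of the patch; `unreferenced_consts` is a hypothetical name, and the tuples it prints depend on the interpreter version, because later CPython releases fold constants differently.

```python
import dis


def unreferenced_consts(code):
    """Return the entries of code.co_consts that no instruction refers to."""
    # Opcodes listed in dis.hasconst take an index into co_consts as argument.
    used = {
        instr.arg
        for instr in dis.get_instructions(code)
        if instr.opcode in dis.hasconst
    }
    return [c for i, c in enumerate(code.co_consts) if i not in used]


def f():
    var = 'te' + 'xt'
    num = -12
    num = -6 * 2
    return (1, (3, 4), 6)


print(f.__code__.co_consts)
print(unreferenced_consts(f.__code__))
```

On an interpreter without the patch, the second line should list the leftover operands of the folded expressions.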

For convenience, I've written the patch in two parts.
The first one just changes the CONST_STACK_* macros to store the co_consts indices instead of the constants themselves; the second one is the actual implementation of the new logic.

Aside from simply having to keep fewer objects around, this also packs the contents of co_consts closer together. This may help the cache a little bit.
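
The cleanup step can be pictured as "collect the indices still in use, then renumber". The following is a toy Python model under stated assumptions, not the C code from the patch: `compact_consts` is a hypothetical name, instructions are plain `(opname, arg)` pairs rather than real bytecode, and index 0 is always kept because of the docstring issue mentioned above.

```python
def compact_consts(instructions, consts, keep=(0,)):
    """Drop unreferenced constants and renumber the remaining ones.

    instructions: list of (opname, arg) pairs, a toy stand-in for bytecode
    consts:       tuple of constants, a stand-in for co_consts
    keep:         indices preserved even if nothing loads them
    """
    used = set(keep)
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            used.add(arg)

    # Build the compacted table and a map from old index to new index.
    remap = {}
    new_consts = []
    for old_index, value in enumerate(consts):
        if old_index in used:
            remap[old_index] = len(new_consts)
            new_consts.append(value)

    new_instructions = [
        (opname, remap[arg]) if opname == "LOAD_CONST" else (opname, arg)
        for opname, arg in instructions
    ]
    return new_instructions, tuple(new_consts)


# Constants table from the "Without the patch" output above.
old_consts = (None, 'te', 'xt', 12, 6, 2, 1, 3, 4,
              'text', -12, -6, -12, (3, 4), (1, (3, 4), 6))
# Assumed instruction stream after folding (illustrative only).
old_instructions = [
    ("LOAD_CONST", 9), ("STORE_FAST", 0),
    ("LOAD_CONST", 10), ("STORE_FAST", 1),
    ("LOAD_CONST", 12), ("STORE_FAST", 1),
    ("LOAD_CONST", 14), ("RETURN_VALUE", None),
]
new_instructions, new_consts = compact_consts(old_instructions, old_consts)
print(new_consts)
```

With this input, the compacted table equals the "With patch" tuple shown earlier.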

---------

I did run benchmarks multiple times, but it looked like all the results were random noise. That makes sense, since I didn't directly affect the runtime.
The only consistently faster benchmark is:

- regex_dna: 288 ms +- 7 ms -> 275 ms +- 5 ms: 1.05x faster

I tried to measure the difference in compile time, but it too was lost in the noise.

---------

I also compared the size of the compiled .pyc files in the Lib/ directory (sizes in bytes, before -> after).
The gains are mostly very small.

_compat_pickle.cpython-37.pyc     | 6554   -> 5851   | -10.7%
sre_compile.cpython-37.pyc        | 10275  -> 10025  | -2.43%
hashlib.cpython-37.pyc            | 6624   -> 6514   | -1.66%
pstats.cpython-37.pyc             | 21755  -> 21435  | -1.47%
_markupbase.cpython-37.pyc        | 7979   -> 7864   | -1.44%
pydoc.cpython-37.pyc              | 83899  -> 82712  | -1.41%
_strptime.cpython-37.pyc          | 15951  -> 15751  | -1.25%
__future__.cpython-37.pyc         | 4155   -> 4105   | -1.2%
opcode.cpython-37.pyc             | 5401   -> 5341   | -1.11%
colorsys.cpython-37.pyc           | 3299   -> 3263   | -1.09%
signal.cpython-37.pyc             | 2503   -> 2478   | -0.999%
_osx_support.cpython-37.pyc       | 9663   -> 9568   | -0.983%
gettext.cpython-37.pyc            | 13990  -> 13854  | -0.972%
getpass.cpython-37.pyc            | 4223   -> 4183   | -0.947%
compare.cpython-37.pyc            | 541    -> 536    | -0.924%
warnings.cpython-37.pyc           | 13328  -> 13208  | -0.9%
platform.cpython-37.pyc           | 27931  -> 27681  | -0.895%
imaplib.cpython-37.pyc            | 42019  -> 41653  | -0.871%
webbrowser.cpython-37.pyc         | 15836  -> 15702  | -0.846%
this.cpython-37.pyc               | 1253   -> 1243   | -0.798%
rlcompleter.cpython-37.pyc        | 5768   -> 5723   | -0.78%
zipfile.cpython-37.pyc            | 48024  -> 47672  | -0.733%
imghdr.cpython-37.pyc             | 4138   -> 4108   | -0.725%
turtle.cpython-37.pyc             | 131600 -> 130653 | -0.72%
timeit.cpython-37.pyc             | 11676  -> 11596  | -0.685%
lzma.cpython-37.pyc               | 11980  -> 11900  | -0.668%
bz2.cpython-37.pyc                | 11270  -> 11195  | -0.665%
aifc.cpython-37.pyc               | 25821  -> 25651  | -0.658%
gzip.cpython-37.pyc               | 16215  -> 16110  | -0.648%
uuid.cpython-37.pyc               | 20382  -> 20260  | -0.599%
plistlib.cpython-37.pyc           | 27354  -> 27191  | -0.596%
cProfile.cpython-37.pyc           | 4199   -> 4174   | -0.595%
tarfile.cpython-37.pyc            | 62437  -> 62076  | -0.578%
sysconfig.cpython-37.pyc          | 15819  -> 15728  | -0.575%
profile.cpython-37.pyc            | 13889  -> 13814  | -0.54%
random.cpython-37.pyc             | 19177  -> 19074  | -0.537%
_threading_local.cpython-37.pyc   | 6609   -> 6574   | -0.53%
_dummy_thread.cpython-37.pyc      | 4839   -> 4814   | -0.517%
datetime.cpython-37.pyc           | 53722  -> 53445  | -0.516%
tracemalloc.cpython-37.pyc        | 17218  -> 17131  | -0.505%
// remaining 129 files are < 0.5% smaller, 33 of them didn't change their size
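
The percentages above appear to be the plain relative size change, printed to three significant digits. A minimal check, assuming that reading (`percent_change` is a hypothetical helper, not something from the patch):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# Three rows taken from the table above: (file, size before, size after).
rows = [
    ("_compat_pickle", 6554, 5851),
    ("sre_compile", 10275, 10025),
    ("__future__", 4155, 4105),
]
formatted = [f"{percent_change(before, after):.3g}%" for _, before, after in rows]
for (name, _, _), text in zip(rows, formatted):
    print(name, text)
```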
msg281822 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-11-27 11:59
Thank you for your patch, Adrian. I haven't taken a close look yet, but at first glance your patch looks correct, and the idea looks great.

But moving constant folding from the peephole optimizer to the AST level (issue1346238, issue11549) would totally eliminate the need for your patch. I'll push your patch if the AST optimizer is not implemented in 3.7.

On the other hand, your patch looks simple enough, and may be pushed first.

It would be easier to review if you provided your changes as one patch.
msg281826 - Author: Adrian Wielgosik (Adrian Wielgosik) Date: 2016-11-27 13:13
Attached squashed patch.

> But moving constant folding from the peephole optimizer to the AST level (...) would totally eliminate the need in your patch.

I'm aware of that and I'm okay with it. I chose an unfortunate moment for implementing this :)
msg281830 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2016-11-27 18:09
FWIW, we intentionally decided not to do this when constant folding was added.  The idea was to keep the peephole optimizer simple and to have it do the minimum work necessary to get its job done (optimizing the constants table takes extra time to do but doesn't result in faster code).  

Another reason was that aside from contrived examples (such as the OP's example), very little real-world code gets any benefit and the benefit tends to be very small.  (In other words, no one will actually notice or benefit from this patch, but their compilation times will all slow down slightly).

Lastly, the intention is to stop building out constant folding.  The correct place for constant folding is upstream, using AST prior to code generation.
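
To illustrate what folding "upstream, using AST" means, here is a minimal sketch built on `ast.NodeTransformer`. It is not CPython's AST optimizer: `ConstantFolder` is a hypothetical class that only handles a few operators and applies no size limits. The point is that operands such as 'te' and 'xt' are replaced before code generation, so they never reach co_consts in the first place.

```python
import ast
import operator

# Only a handful of operators, to keep the sketch short.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_UNARYOPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


class ConstantFolder(ast.NodeTransformer):
    """Replace operations on literal constants with their computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fold = _BINOPS.get(type(node.op))
        if (fold and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            try:
                value = fold(node.left.value, node.right.value)
            except Exception:
                return node  # leave the error to be raised at run time
            return ast.copy_location(ast.Constant(value=value), node)
        return node

    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        fold = _UNARYOPS.get(type(node.op))
        if fold and isinstance(node.operand, ast.Constant):
            try:
                value = fold(node.operand.value)
            except Exception:
                return node
            return ast.copy_location(ast.Constant(value=value), node)
        return node


source = "var = 'te' + 'xt'\nnum = -6 * 2\n"
tree = ast.fix_missing_locations(ConstantFolder().visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<folded>", "exec"), namespace)
print(namespace["var"], namespace["num"])
```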
History
Date | User | Action | Args
2016-11-27 18:09:45 | rhettinger | set | nosy: + rhettinger; messages: + msg281830
2016-11-27 13:13:30 | Adrian Wielgosik | set | files: + clean_co_consts_squashed.patch; messages: + msg281826
2016-11-27 11:59:24 | serhiy.storchaka | set | priority: normal -> low; nosy: + serhiy.storchaka; messages: + msg281822; assignee: serhiy.storchaka; stage: patch review
2016-11-27 11:07:26 | Adrian Wielgosik | set | files: + clean_co_consts.patch
2016-11-27 11:06:08 | Adrian Wielgosik | set | files: + indices_tweak.patch; keywords: + patch
2016-11-27 11:05:46 | Adrian Wielgosik | create |