classification
Title: Peephole creates duplicate and unused constants
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder: Build-out an AST optimizer, moving some functionality out of the peephole optimizer
View: 11549
Assigned To: rhettinger Nosy List: belopolsky, eltoder, mark.dickinson, pitrou, rhettinger, terry.reedy, twouters
Priority: low Keywords: patch

Created on 2011-03-11 00:55 by eltoder, last changed 2011-03-19 05:23 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
dedup_const_plana.patch eltoder, 2011-03-11 01:01
dedup_const_planb.patch eltoder, 2011-03-11 01:04
unused_consts.patch eltoder, 2011-03-11 01:07
consts_test.patch eltoder, 2011-03-11 01:10
Messages (24)
msg130537 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 00:55
Peephole optimizer performs constant folding, however
1) When it replaces operation with LOAD_CONST it always adds a new constant to co_consts, even if constant with the same value is already there. It also can add the same constant multiple times.
2) It does not remove constants that are no longer used after the operation was folded.
The result is that code object after folding has more constants that it needs and so uses more memory.

Attached are patches to address this. Patch for 1) comes in 2 versions. PlanA is simple (it only needs changes in peephole.c), but does linear searches through co_consts and duplicates some logic from compiler.c. PlanB needs changes in both peephole.c and compiler.c, but is free from such problems. I favour PlanB.

Patch for 2) can be applied on top of either A or B.
msg130538 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 01:04
(either plana or planb should be applied)
msg130539 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 01:10
(test case)
msg130543 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-11 04:37
Sorry Eugene, but I'm not willing to add that much complexity to save a few bytes.  This was a known issue and long ago we choose to leave it as-is.  In the future when constant folding is done at the AST level, we'll get more extensive optimizations and they won't produce duplicate or unused constants as a by-product.
msg130544 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 04:53
I think the changes are fairly trivial. dedup_const_planb.patch is about 10 lines of new code with all the rest being trivial plubming. unused_consts.patch may look big, but only because I factored out fix ups into a separate function; there are only about 25 lines of new code.
msg130562 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-11 12:30
Style nit: you have tabs in your code. Please only use spaces for indentation.
On the principle, looks worthwhile, and the changes don't look really complicated.
msg130574 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 14:04
Antoine, sure, I'll fix it with any other suggested changes if patches will be accepted.
msg130584 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-11 16:30
I'm disinclined to accept this patch.  There is only minor space savings and no actual speed-up.

When constant folding was introduced, a design decision was made to ignore the issue of orphan constants.  The rationale behind that decision still stands.

The peepholer has aspired to be very conservative and is considered a high-risk area for changes.  Since its introduction, Guido has constantly teetered on the edge of entirely ripping out the peepholer.  Right now, the stability of that code is its greatest asset and we should be reluctant to modify it.

Also, the peephole optimizer was developed before the AST branch was added to Python.  Constant folding more properly belongs at the AST level, not in the peepholer itself.  Our strategy is to move as much as possible into an AST optimizer and to stop building out the peephole optimizer.

Right now, the pattern is tokenize -> parse -> AST -> genbytecode -> peephole optimization (which disassembles the bytecode, analyzed it and rewrites it) -> final bytecode.   The correct pattern is tokenize -> parse -> AST -> optimize -> genbytecode -> peephole optimization with minimal disassembly, analysis, and opcode rewrites -> final bytecode.
msg130585 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-11 16:35
About dedup_const_planb.patch: _PyCode_AddObj() should be added to Include/code.h, even if it's private (rather than declared extern in peephole.c).

About consts_test.patch: "def mapping" deserves a better name, and perhaps a comment or explanation. Same for "def fix".

About unused_consts.patch:

- code like
    constuse = (int *)PyMem_Malloc(numconst * sizeof(int));
is better spelt
    constuse = PyMem_NEW(int, numconst);

- why the "#ifndef NDEBUG" path?

You probably need to regenerate this patch against latest hg, by the way...
Thanks for posting this.
msg130586 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-11 16:38
> Also, the peephole optimizer was developed before the AST branch was
> added to Python.  Constant folding more properly belongs at the AST
> level, not in the peepholer itself.  Our strategy is to move as much
> as possible into an AST optimizer and to stop building out the
> peephole optimizer.

Who is working on this "strategy"?
Until someone takes it to task to implement the "correct pattern", I
think iterative improvement of the peepholer is a reasonable thing.
msg130587 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 16:51
> - why the "#ifndef NDEBUG" path?
These entries in the table should not be used, but if something slips through and uses one of them, it's much easier to tell if we remap to invalid value. As this is an internal check, I didn't want it in release mode. If this is deemed unnecessary or confusing I can remove it.
msg130588 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-11 16:52
> > - why the "#ifndef NDEBUG" path?
> These entries in the table should not be used, but if something slips
> through and uses one of them, it's much easier to tell if we remap to
> invalid value. As this is an internal check, I didn't want it in
> release mode. If this is deemed unnecessary or confusing I can remove
> it.

I think it is confusing indeed. If you want to do a check in debug mode
only, just use assert() (and perhaps add a comment if the check doesn't
look obvious).
msg130590 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 17:03
assert() doesn't quite work here. I need to check that this entry in the table is not used in the next loop. I'd need to put assert in that loop, but by that time I have no easy way to tell which entries are bad, unless I mark them in the first loop. so I mark them under !NDEBUG.
msg130591 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 17:19
> _PyCode_AddObj() should be added to Include/code.h
Should it be declared as PyAPI_FUNC too? This will make it unnecessarily exported (when patch in Issue11410 is merged), so I wanted to avoid that.

btw, that you for reviewing the patch.
msg130592 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-11 17:23
> > _PyCode_AddObj() should be added to Include/code.h
> Should it be declared as PyAPI_FUNC too? This will make it
> unnecessarily exported (when patch in Issue11410 is merged), so I
> wanted to avoid that.

I guess we don't usually try to optimize that, but you can omit
PyAPI_FUNC indeed.
msg130595 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 17:32
> Right now, the pattern is tokenize -> parse -> AST -> genbytecode ->
> peephole optimization (which disassembles the bytecode, analyzed it
> and rewrites it) -> final bytecode.   The correct pattern is tokenize
> -> parse -> AST -> optimize -> genbytecode -> peephole optimization
> with minimal disassembly, analysis, and opcode rewrites -> final bytecode.

Actually, optimizing on AST is not ideal too. Ideally you should convert it into a specialized IR, preferably in SSA form and with explicit control flow.

Re size saving: I've ran make test with and without my patch and measured total size of all generated pyc files:
without patch: 16_619_340
with patch: 16_467_867
So it's about 150KB or 1% of the size, not just a few bytes.
msg130621 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2011-03-11 21:49
What is the actual performance impact of this change? As far as I can see the difference is almost entirely in .pyc (on-disk) size, as when loading most of these constants are interned anyway. Even the on-disk change is minimal - and I say that as a person who actually cares a lot about the size of .pyc files. Considering the extra work that needs to be done, I expect planA to be a measurable performance drop, and planB to be indistinguishable. Perhaps I'm wrong, so feel free to prove it so by providing some (unladen-swallow) benchmarks :)

I'm not sure this change is worth it unless there's a clear performance benefit. The peephole optimizer is very limited, fragile, and an uncomfortable place to edit (and I say *that* as a person who has introduced way too many subtle, stupid bugs in the past.) As Raymond says, the right thing to do with the peepholer is to rip it out and replace it with an AST-based optimizer, which wouldn't bypass or duplicate all manner of sanity- and safety-checks. Perhaps this should be a GSoC project.

About the patches themselves: the plan-A patch is a bad idea, I wouldn't consider it; duplicating the compiler checks seems pointless. In the plan-B patch, PyCode_Optimize is an exported symbol so it should not change; you need to change the name and provide the old interface in a macro. The extern declaration of _PyCode_AddObj in peephole.c is now unnecessary. The name of that function is rather non-obvious, since it doesn't do anything with a code object at all -- but I don't have a better suggestion.
msg130631 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-11 22:26
Sorry Eugene.  As the person who designed, implemented, and maintains the peephole optimizer, I need to reject the patch.

If better constant folding is wanted, the work needs to be done with AST, not with bytecode.  I would welcome a patch.

Thomas, thanks for your input.
msg130634 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-11 22:31
Thomas, can you clarify, does loading interns all constants in co_consts, or do you mean that they are mostly small numbers and the like?

Without interning I think that in-memory size difference is even bigger than on-disk, as you get one entry in tuple and the object itself. I'm sure I can cook up a test that will show some perf difference, because of cache misses or paging. You can say that this is not real world code, and you will likely be right. But in real world (before you add inlining and constant propagation) constant folding doesn't make a lot of difference too, yet people asked for it and peepholer does it. Btw, Antoine just improved it quite a bit (Issue11244), so size difference with my patch should increase.

My rationale for the patch is that 1) it's real very simple 2) it removes feeling of half-done job when you look at the bytecode.
msg130656 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-03-12 03:36
Eugene, how does your optimization differ from the one proposed in issue2493?
msg130659 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-12 03:49
Alexander, my patch does 2 optimizations: doesn't insert a new constant if one already exist and removes unused constants after peephole is done. You patch seems to do only the latter. It's very similar, from a quick look at your patch:
- My patch doesn't introduce any additional passes over the code (you added 2 passes).
- It preserves doc string.
- It's less code, because I reuse more of existing code.

Feel free to look at the patch and tell me if you don't agree.
msg130661 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-12 04:20
Please cease development of this patch.  It has been rejected.

The peephole optimizer should be considered somewhat off-limits at this point and any further development efforts directed at an AST optimizer.
msg130662 - (view) Author: Eugene Toder (eltoder) Date: 2011-03-12 04:24
Raymond, you can be assured that I'm not developing this patch, unless I'm told it has chances to be accepted. I don't like to waste my time. On a related note, your review of my other patches is appreciated.
msg131391 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-03-19 05:23
Eugene has started work on AST optimizer in #11549
History
Date User Action Args
2011-03-19 05:23:26terry.reedysetsuperseder: Build-out an AST optimizer, moving some functionality out of the peephole optimizer

messages: + msg131391
nosy: + terry.reedy
2011-03-12 04:24:48eltodersetnosy: twouters, rhettinger, mark.dickinson, belopolsky, pitrou, eltoder
messages: + msg130662
2011-03-12 04:20:51rhettingersetresolution: rejected
messages: + msg130661
nosy: twouters, rhettinger, mark.dickinson, belopolsky, pitrou, eltoder
2011-03-12 03:49:12eltodersetnosy: twouters, rhettinger, mark.dickinson, belopolsky, pitrou, eltoder
messages: + msg130659
2011-03-12 03:36:46belopolskysetnosy: + belopolsky
messages: + msg130656
2011-03-11 22:31:07eltodersetnosy: twouters, rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130634
2011-03-11 22:26:43rhettingersetstatus: open -> closed
nosy: + rhettinger
messages: + msg130631

2011-03-11 21:49:04twouterssetnosy: twouters, mark.dickinson, pitrou, eltoder
messages: + msg130621
2011-03-11 20:21:11twouterssetfiles: - nonmethod-warn-nongit.diff
nosy: twouters, mark.dickinson, pitrou, eltoder
2011-03-11 20:20:57twouterssetfiles: + nonmethod-warn-nongit.diff
nosy: twouters, mark.dickinson, pitrou, eltoder
2011-03-11 19:05:45rhettingersetnosy: + twouters, - rhettinger
2011-03-11 17:32:15eltodersetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130595
2011-03-11 17:23:51pitrousetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130592
2011-03-11 17:19:59eltodersetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130591
2011-03-11 17:03:11eltodersetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130590
2011-03-11 16:52:58pitrousetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130588
2011-03-11 16:51:29eltodersetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130587
2011-03-11 16:38:12pitrousetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130586
2011-03-11 16:35:12pitrousetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130585
2011-03-11 16:30:53rhettingersetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130584
2011-03-11 14:04:41eltodersetnosy: rhettinger, mark.dickinson, pitrou, eltoder
messages: + msg130574
2011-03-11 12:30:04pitrousetnosy: + mark.dickinson
messages: + msg130562
2011-03-11 04:53:40eltodersetnosy: rhettinger, pitrou, eltoder
messages: + msg130544
2011-03-11 04:37:58rhettingersetpriority: normal -> low

nosy: + rhettinger
messages: + msg130543

assignee: rhettinger
2011-03-11 01:10:39eltodersetfiles: + consts_test.patch
nosy: + pitrou
messages: + msg130539

2011-03-11 01:07:44eltodersetfiles: + unused_consts.patch
2011-03-11 01:04:04eltodersetfiles: + dedup_const_planb.patch

messages: + msg130538
2011-03-11 01:01:11eltodersetfiles: + dedup_const_plana.patch
keywords: + patch
2011-03-11 00:55:58eltodercreate