
optimize bytecode for conditional branches #48965

Closed

pitrou opened this issue Dec 21, 2008 · 21 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@pitrou
Member

pitrou commented Dec 21, 2008

BPO 4715
Nosy @rhettinger, @jcea, @pitrou
Files
  • condbranches-plus.patch: Aggregate patch for POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE, POP_OR_JUMP, JUMP_OR_POP
  • pybench.txt
  • condbranches.patch: Original patch for POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE
  • condbranches-plus2.patch: Updated patch with POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE, JUMP_IF_TRUE_OR_POP, JUMP_IF_FALSE_OR_POP
  • trunk-opt-cond-jump.patch: Backport to trunk
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-03-01.20:50:45.940>
    created_at = <Date 2008-12-21.22:58:05.131>
    labels = ['interpreter-core', 'performance']
    title = 'optimize bytecode for conditional branches'
    updated_at = <Date 2009-10-17.04:33:00.842>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2009-10-17.04:33:00.842>
    actor = 'esam'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-03-01.20:50:45.940>
    closer = 'jyasskin'
    components = ['Interpreter Core']
    creation = <Date 2008-12-21.22:58:05.131>
    creator = 'pitrou'
    dependencies = []
    files = ['12421', '12719', '12730', '13170', '13208']
    hgrepos = []
    issue_num = 4715
    keywords = ['patch']
    message_count = 21.0
    messages = ['78159', '78165', '79747', '79750', '79790', '79794', '79796', '79799', '79806', '79823', '79824', '79833', '79848', '79850', '82687', '82689', '82694', '82736', '82742', '82888', '82987']
    nosy_count = 7.0
    nosy_names = ['collinwinter', 'rhettinger', 'jcea', 'pitrou', 'jyasskin', 'blaisorblade', 'esam']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue4715'
    versions = ['Python 3.1', 'Python 2.7']

    @pitrou
    Member Author

    pitrou commented Dec 21, 2008

    This patch optimizes the bytecode for conditional branches.
    For example, the list comprehension "[x for x in l if not x]" produced
    the following bytecode:

  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                23 (to 32)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 JUMP_IF_TRUE            10 (to 28)
             18 POP_TOP
             19 LOAD_FAST                1 (x)
             22 LIST_APPEND              2
             25 JUMP_ABSOLUTE            6
        >>   28 POP_TOP
             29 JUMP_ABSOLUTE            6
        >>   32 RETURN_VALUE

    but after the patch it produces the following bytecode:

  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                18 (to 27)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 POP_JUMP_IF_TRUE         6
             18 LOAD_FAST                1 (x)
             21 LIST_APPEND              2
             24 JUMP_ABSOLUTE            6
        >>   27 RETURN_VALUE

Notice that not only is the code shorter, but the conditional jump
(POP_JUMP_IF_TRUE) jumps straight to the start of the loop instead of
going through the JUMP_ABSOLUTE at the end. "continue" statements are
helped similarly.

    Furthermore, the old jump opcodes (JUMP_IF_FALSE, JUMP_IF_TRUE) could
    profitably be replaced by two new opcodes:

    • one which jumps if true and pops otherwise
    • one which jumps if false and pops otherwise
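The proposed semantics can be sketched as a toy stack machine (opcode names are taken from the discussion above; this is only an illustration, not CPython's actual ceval.c implementation):

```python
# Toy model of the two kinds of conditional jump discussed in this issue.
# Each handler returns the next instruction offset to execute.

def pop_jump_if_true(stack, target, next_ip):
    """POP_JUMP_IF_TRUE: always pop; jump when the popped value is true."""
    value = stack.pop()
    return target if value else next_ip

def jump_or_pop(stack, target, next_ip):
    """JUMP_OR_POP (later renamed JUMP_IF_TRUE_OR_POP): jump and *keep*
    the value if it is true, otherwise pop it and fall through."""
    if stack[-1]:
        return target  # value stays on the stack: it is the expression result
    stack.pop()
    return next_ip

stack = [0]
assert pop_jump_if_true(stack, target=6, next_ip=18) == 18  # falsy: fall through
assert stack == []                                          # ...but popped anyway

stack = ["x"]
assert jump_or_pop(stack, target=21, next_ip=18) == 21      # truthy: jump
assert stack == ["x"]                                       # value kept as result
```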

    @pitrou pitrou added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Dec 21, 2008
    @pitrou
    Member Author

    pitrou commented Dec 22, 2008

    Here is an optional patch which provides the two opcodes I was talking
    about (I've called them POP_OR_JUMP and JUMP_OR_POP). Together with a
    bit of work on the peepholer they make the bytecode expression of
    boolean calculations very concise.

    A somewhat contrived example: "f = lambda x,y,z,v: (z if x else y) or v"

    Without the patch:

  1           0 LOAD_FAST                0 (x)
              3 JUMP_IF_FALSE            7 (to 13)
              6 POP_TOP
              7 LOAD_FAST                2 (z)
             10 JUMP_FORWARD             4 (to 17)
        >>   13 POP_TOP
             14 LOAD_FAST                1 (y)
        >>   17 JUMP_IF_TRUE             4 (to 24)
             20 POP_TOP
             21 LOAD_FAST                3 (v)
        >>   24 RETURN_VALUE

    With the patch:

  1           0 LOAD_FAST                0 (x)
              3 POP_JUMP_IF_FALSE       12
              6 LOAD_FAST                2 (z)
              9 JUMP_FORWARD             3 (to 15)
        >>   12 LOAD_FAST                1 (y)
        >>   15 JUMP_OR_POP             21
             18 LOAD_FAST                3 (v)
        >>   21 RETURN_VALUE

    @jyasskin
    Mannequin

    jyasskin mannequin commented Jan 13, 2009

    Those look nice, although I need to look at the patches in more detail.
    What speedup do they give you?

    @pitrou
    Member Author

    pitrou commented Jan 13, 2009

    pybench runtimes (attached) are almost the same. The big win is on list
    comprehensions with an "if" clause.

    @collinwinter
    Mannequin

    collinwinter mannequin commented Jan 13, 2009

    I've backported condbranches-plus.patch to trunk, and I'm getting these
    results:

    PyBench: 1.84-2.21% faster
    2to3: 3.83% faster
    Spitfire: 6.13-6.23% faster

    PyBench was run with -w=1; 2to3 is translating its entire source
    directory five times; Spitfire is measured by the "Spitfire -O4" line
    from tests/perf/bigtable.py, run 15 times. This is on a Core 2 system.
    My AMD system (otherwise identical) shows somewhat less improvement,
    but still an improvement.

I haven't tested condbranches.patch vs condbranches-plus.patch; what
difference are you seeing, Antoine?

    I like these changes. Folding POP into JUMP_IF_{TRUE,FALSE} should have
    been done years ago (I think I proposed it and Raymond shot me down).
    I'm also in favor of making POP_JUMP_IF_* absolute jumps.

This patch mostly looks good, though you still need to change
Doc/library/dis.rst and the pure-Python compiler package. I'd get
someone else to look over the patch, too, though.

    @pitrou
    Member Author

    pitrou commented Jan 13, 2009

Hello,

> I've backported condbranches-plus.patch to trunk, and I'm getting these
> results:

Thanks!

> PyBench: 1.84-2.21% faster
> 2to3: 3.83% faster
> Spitfire: 6.13-6.23% faster

What is Spitfire?

> I haven't tested condbranches.patch vs condbranches-plus.patch; what
> difference are you seeing, Antoine?

condbranches.patch is the earlier version (without POP_OR_JUMP and
JUMP_OR_POP); it can be ignored.

> This patch mostly looks good, though you still need to change
> Doc/library/dis.rst and the pure-Python compiler package.

Well, the pure-Python compiler package doesn't exist in py3k, for which
I initially made the patch.
(And it's in a very sorry state in 2.x...)

    @collinwinter
    Mannequin

    collinwinter mannequin commented Jan 13, 2009

On Tue, Jan 13, 2009 at 3:25 PM, Antoine Pitrou <report@bugs.python.org> wrote:

> Hello,
>
> > I've backported condbranches-plus.patch to trunk, and I'm getting these
> > results:
>
> Thanks!
>
> > PyBench: 1.84-2.21% faster
> > 2to3: 3.83% faster
> > Spitfire: 6.13-6.23% faster
>
> What is Spitfire?

http://code.google.com/p/spitfire/. It's a template system designed
for performance that I have some experience with.

> > I haven't tested condbranches.patch vs condbranches-plus.patch; what
> > difference are you seeing, Antoine?
>
> condbranches.patch is the earlier version (without POP_OR_JUMP and
> JUMP_OR_POP); it can be ignored.

I was mostly curious whether the POP_OR_JUMP and JUMP_OR_POP opcodes
had a noticeable performance impact, i.e., whether they make things fast
enough to warrant their inclusion over the old JUMP_IF_FALSE
implementation.

> > This patch mostly looks good, though you still need to change
> > Doc/library/dis.rst and the pure-Python compiler package.
>
> Well, the pure-Python compiler package doesn't exist in py3k, for which
> I initially made the patch.
> (And it's in a very sorry state in 2.x...)

I'll update the compiler package in 2.x and post my patch once I have
it working. There are also some test failures in 2.x that I'll take
care of.

    @pitrou
    Member Author

    pitrou commented Jan 13, 2009

    Sorry, hadn't seen your message before removing the file. Here it is again.

    @pitrou
    Member Author

    pitrou commented Jan 13, 2009

> http://code.google.com/p/spitfire/. It's a template system designed
> for performance that I have some experience with.

6% faster on a template system simply by optimizing conditional jumps is
quite neat.
(The Spitfire homepage itself is rather intriguing: they claim to be
faster than plain string joining...)

> I was mostly curious whether the POP_OR_JUMP and JUMP_OR_POP opcodes
> had a noticeable performance impact, i.e., whether they make things fast
> enough to warrant their inclusion over the old JUMP_IF_FALSE
> implementation.

Well, I don't really remember, although this is only a few weeks old.
POP_OR_JUMP / JUMP_OR_POP will be used when the "and" and "or" keywords
are involved. That's basically the only situation where you don't want
to pop the top of the stack after a conditional jump.
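A minimal sketch of why that is: under the value-preserving jump, the left operand of "x or v" stays on the stack when it is truthy, because x itself is the expression's result (a toy model, not the real interpreter loop):

```python
# How "x or v" evaluates under the JUMP_OR_POP scheme (toy model).
# A truthy left operand must NOT be popped: it becomes the result.

def eval_or(x_value, v_value):
    stack = [x_value]          # LOAD_FAST x
    if stack[-1]:              # JUMP_OR_POP: truthy -> jump, value kept
        pass
    else:                      # falsy -> pop, fall through
        stack.pop()
        stack.append(v_value)  # LOAD_FAST v
    return stack.pop()         # RETURN_VALUE

assert eval_or([1], "unused") == [1]   # truthy left operand returned as-is
assert eval_or(0, "fallback") == "fallback"
assert eval_or(0, "") == ""            # matches Python's own "or"
```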

    @jyasskin
    Mannequin

    jyasskin mannequin commented Jan 14, 2009

    Looking through the patch...

    I don't really like the names for JUMP_OR_POP and POP_OR_JUMP. (They
    don't really indicate to me that the choice depends on the truth of the
    top of the stack.) How about JUMP_IF_FALSE_OR_POP and
    JUMP_IF_TRUE_OR_POP (or s/OR/ELSE/)?

    If the old opcodes weren't called JUMP_IF_XXX, I'd suggest the
    always-popping variants just use those names. Many other opcodes
    implicitly consume their arguments so these could too. But since this
    would be confusing with both the old meanings and the other new pair,
    your names are probably better.

    The comments in opcode.h and opcode.py are inconsistent.

    I now get a warning that PRED_POP_TOP is defined but not used, so you
    should remove the PREDICTED macro for it.

    I wonder if BINARY_AND and BINARY_OR should get predictions ... not for
    this patch.

    I would probably put the POP_JUMP_IF_XXX definitions next to the
    JUMP_OR_POP definitions to emphasize their similarity.

    You missed a comment referring to JUMP_IF_FALSE before
    compiler_try_except().

    POP_JUMP_IF_TRUE is only used in one place: assertions. I wonder if
    anyone would cry if we compiled assertions to UNARY_NOT;
    POP_JUMP_IF_FALSE instead... Anyway, also not for this patch.

    In compiler_comprehension_generator, "compiler_use_next_block(c, skip);"
    is now always followed by "compiler_use_next_block(c, if_cleanup);".
    Should you remove the use(skip) call?

    In peephole.c, s/JUMP_SIGN/JUMPS_ON_TRUE/ ?

    test_peepholer fails for me, which makes sense because it still refers
    to JUMP_IF_XXX.

    Whoo, the peephole.c changes are complex. I'll do those in a second round.

    @rhettinger
    Contributor

    FWIW, the goal is to replace the opcode peephole optimizer with a more
    powerful equivalent working on the AST just prior to code generation.

    @jyasskin
    Mannequin

    jyasskin mannequin commented Jan 14, 2009

    In peephole.c:

    _optimize isn't a great label name, but I don't have a great
    replacement. Maybe reoptimize_current_index?

Your change to the "LOAD_CONST trueconst JUMP_IF_FALSE xx POP_TOP"
optimization doesn't preserve the optimization to:

    def f():
        return 1 and a

I suspect we don't care since "0 or a" wasn't optimized.

    I wonder what the "POP_TOP JUMP_FORWARD 1 POP_TOP" was ever for. Why did
    compiler_comprehension_generator() emit it in the first place?

    After "if (JUMP_SIGN(j) == JUMP_SIGN(opcode)) {", it might be nice to
    have a comment like, "/* The second jump will always be taken if the
    first was. */" and similarly for the else branch with an explanation why
    the POP should become unconditional.

    Otherwise looks good.

    @pitrou
    Member Author

    pitrou commented Jan 14, 2009

> Your change to the "LOAD_CONST trueconst JUMP_IF_FALSE xx POP_TOP"
> optimization doesn't preserve the optimization to:
>
>     def f():
>         return 1 and a
>
> I suspect we don't care since "0 or a" wasn't optimized.

Yes, this optimization seems meant mainly for "while 1" and "while True"
(which my patch preserves, but I might add a comment).

> I wonder what the "POP_TOP JUMP_FORWARD 1 POP_TOP" was ever for. Why did
> compiler_comprehension_generator() emit it in the first place?

I'm as clueless as you...

    @pitrou
    Member Author

    pitrou commented Jan 14, 2009

On Wednesday, 14 January 2009 at 02:48 +0000, Jeffrey Yasskin wrote:

> Looking through the patch...
>
> I don't really like the names for JUMP_OR_POP and POP_OR_JUMP.

I don't like them either; they're the only relatively short ones I could
come up with. I'll change them to your proposal
(JUMP_IF_{TRUE,FALSE}_OR_POP).

> I wonder if BINARY_AND and BINARY_OR should get predictions ... not for
> this patch.

With the patch, I don't think they need predictions anymore.

> POP_JUMP_IF_TRUE is only used in one place: assertions. I wonder if
> anyone would cry if we compiled assertions to UNARY_NOT;
> POP_JUMP_IF_FALSE instead...

No, POP_JUMP_IF_TRUE is also used when optimizing the sequence
"UNARY_NOT; POP_JUMP_IF_FALSE" (think "if not x: ...").

> In compiler_comprehension_generator, "compiler_use_next_block(c, skip);"
> is now always followed by "compiler_use_next_block(c, if_cleanup);".
> Should you remove the use(skip) call?

I'll look at this.

> In peephole.c, s/JUMP_SIGN/JUMPS_ON_TRUE/ ?

Great, another stupid name disappears :)
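The UNARY_NOT folding pitrou mentions can be sketched as a tiny peephole pass over symbolic instructions (a toy model; the real transformation lives in CPython's Python/peephole.c):

```python
# Toy peephole pass: the pair "UNARY_NOT; POP_JUMP_IF_FALSE target" is
# equivalent to the single instruction "POP_JUMP_IF_TRUE target", since
# jumping when "not x" is false means jumping when x is true.

def peephole(instructions):
    out, i = [], 0
    while i < len(instructions):
        op, arg = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if op == "UNARY_NOT" and nxt and nxt[0] == "POP_JUMP_IF_FALSE":
            out.append(("POP_JUMP_IF_TRUE", nxt[1]))  # fold the pair
            i += 2
        else:
            out.append((op, arg))
            i += 1
    return out

code = [("LOAD_FAST", "x"), ("UNARY_NOT", None), ("POP_JUMP_IF_FALSE", 20)]
assert peephole(code) == [("LOAD_FAST", "x"), ("POP_JUMP_IF_TRUE", 20)]
```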

    @jyasskin
    Mannequin

    jyasskin mannequin commented Feb 25, 2009

    I've updated Antoine's patch to incorporate my comments. This interacts
    with bpo-2459, but I haven't yet looked at its patch to figure out
    how. As a first cut, I'll propose committing this patch, backporting it
    to trunk, syncing it into the bpo-2459 patch, and then committing that
    patch. Let me know if that's crazy.

    http://codereview.appspot.com/20064

    @pitrou
    Member Author

    pitrou commented Feb 25, 2009

On Wednesday, 25 February 2009 at 00:51 +0000, Jeffrey Yasskin wrote:

> I've updated Antoine's patch to incorporate my comments. This interacts
> with bpo-2459, but I haven't yet looked at its patch to figure out
> how. As a first cut, I'll propose committing this patch, backporting it
> to trunk, syncing it into the bpo-2459 patch, and then committing that
> patch. Let me know if that's crazy.

Doesn't sound crazier than doing it in another order :-))
There'll be a bit of work to reconcile both patches anyway (and also to
adapt the 2459 patch so that it applies cleanly against current trunk).

2459 defines its own JUMP_ABS_IF_TRUE / JUMP_ABS_IF_FALSE (which are the
same as this patch's POP_JUMP_IF_TRUE / POP_JUMP_IF_FALSE), but it also
keeps the old relative non-popping conditional jumps, which, as this
issue shows, is sub-optimal.

    Thank you for taking this, Jeffrey, my own priority right now being the
    io-c branch.

    @jyasskin
    Mannequin

    jyasskin mannequin commented Feb 25, 2009

    Committed as r69961. I'll post the backport to trunk here at least a day
    before I commit it.

    @jyasskin jyasskin mannequin added performance Performance or resource usage and removed performance Performance or resource usage labels Feb 25, 2009
    @jyasskin
    Mannequin

    jyasskin mannequin commented Feb 26, 2009

    Oh, and no problem with picking up the patches. Thanks for writing them
    in the first place.

    Here's the backport to trunk. I particularly enjoyed the bit in
    pyassem.py where I had to teach the pure-Python compiler that you can
    get to a block without going through a relative jump or a fallthrough.

    The patch is also #2 at http://codereview.appspot.com/20064.

    I'll post performance numbers as soon as I've measured them.

    @jyasskin
    Mannequin

    jyasskin mannequin commented Feb 26, 2009

    The numbers are:

    Intel Core 2, gcc-4.3, 32-bit
    2to3:
    25.24 -> 24.89: 1.38% faster

    Django:
    Min: 0.618 -> 0.607: 1.90% faster
    Avg: 0.621 -> 0.615: 1.04% faster

    PyBench:
    Min: 5324 -> 5280: 0.83% faster
    Avg: 5456 -> 5386: 1.30% faster

    Pickle:
    Min: 1.424 -> 1.376: 3.48% faster
    Avg: 1.427 -> 1.378: 3.55% faster

    Spitfire:
    Min: 0.701 -> 0.718: 2.32% slower
    Avg: 0.710 -> 0.721: 1.47% slower

    Unpickle:
    Min: 0.667 -> 0.651: 2.33% faster
    Avg: 0.668 -> 0.652: 2.38% faster

    Intel Core 2, gcc-4.3, 64-bit

    2to3:
    22.40 -> 22.59: 0.81% slower

    Django:
    Min: 0.575 -> 0.565: 1.74% faster
    Avg: 0.577 -> 0.567: 1.76% faster

    PyBench:
    Min: 4332 -> 4433: 2.28% slower
    Avg: 4393 -> 4519: 2.79% slower

    Pickle:
    Min: 1.177 -> 1.204: 2.25% slower
    Avg: 1.180 -> 1.205: 2.14% slower

    Spitfire:
    Min: 0.622 -> 0.629: 1.22% slower
    Avg: 0.623 -> 0.631: 1.26% slower

    Unpickle:
    Min: 0.576 -> 0.563: 2.25% faster
    Avg: 0.596 -> 0.564: 5.55% faster

    On my MacBook, gcc-4.0, 32-bit:
    2to3:
    29.82 -> 29.39: 1.46% faster

    Django:
    Min: 0.727 -> 0.720: 0.98% faster
    Avg: 0.746 -> 0.736: 1.45% faster

    PyBench:
    Min: 6303 -> 6432: 2.01% slower
    Avg: 6471 -> 6563: 1.40% slower

    Pickle:
    Min: 1.564 -> 1.564: 0.00% faster
    Avg: 1.609 -> 1.592: 1.07% faster

    Spitfire:
    Min: 0.902 -> 0.909: 0.78% slower
    Avg: 0.924 -> 0.920: 0.41% faster

    Unpickle:
    Min: 0.784 -> 0.763: 2.73% faster
    Avg: 0.794 -> 0.776: 2.26% faster

    The performance isn't as good as I'd like, especially on 64-bits. I
    suspect the difference from the py3k branch is that trunk doesn't have
    Antoine's dispatch patch, and POP_TOP is predicted after
    JUMP_IF_{TRUE,FALSE}, which means without computed-goto-dispatch, this
    patch usually only saves a predictable if(). The skipped JUMP_ABSOLUTEs
    may not happen enough in my benchmarks to matter much.

    On the other hand, "./python.exe -m timeit -s 'x=range(500)' '[y+3 for y
    in x if y%5 <2]'" shows the following differences on my MacBook

    For py3k:
    Min: 196.000 -> 172.000: 13.95% faster
    Avg: 200.000 -> 178.600: 11.98% faster
    Significant (t=5.339997, a=0.95)

    For trunk:
    Min: 108.000 -> 88.200: 22.45% faster
    Avg: 114.571 -> 97.571: 17.42% faster
    Significant (t=5.518236, a=0.95)

    That list comprehension definitely takes advantage of skipping the
    JUMP_ABSOLUTE.
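For reference, the same micro-benchmark can be driven from the timeit module directly rather than the command line (a sketch; absolute numbers will vary by machine and interpreter build):

```python
# Python-API equivalent of:
#   ./python.exe -m timeit -s 'x=range(500)' '[y+3 for y in x if y%5 <2]'
import timeit

t = timeit.Timer("[y+3 for y in x if y%5 < 2]", setup="x = range(500)")
# Take the best of 5 runs of 1000 loops, like timeit's CLI does.
best_usec = min(t.repeat(repeat=5, number=1000)) / 1000 * 1e6
print("best: %.1f usec per loop" % best_usec)
```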

    @jyasskin
    Mannequin

    jyasskin mannequin commented Feb 28, 2009

    Collin made some comments at http://codereview.appspot.com/20094. Here's
    a new patch that fixes them. I plan to commit it over the weekend and
    then start on bpo-2459.

    @jyasskin
    Mannequin

    jyasskin mannequin commented Mar 1, 2009

    Backported as r70071. I also fixed a couple things I missed in the py3k
    branch in r70076. Thanks all!

    @jyasskin jyasskin mannequin closed this as completed Mar 1, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022