Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quadratic-time compilation in the number of 'and' or 'or' expressions #65722

Closed
dalke mannequin opened this issue May 18, 2014 · 9 comments
Closed

quadratic-time compilation in the number of 'and' or 'or' expressions #65722

dalke mannequin opened this issue May 18, 2014 · 9 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@dalke
Copy link
Mannequin

dalke mannequin commented May 18, 2014

BPO 21523
Nosy @brettcannon, @birkenfeld, @rhettinger, @ncoghlan, @pitrou, @benjaminp
Files
  • quadratic_shortcircuit_compilation.py: Demo quadratic-time compilation with 'and' expressions
  • stackdepth.patch
  • stackdepth2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-05-23.10:00:06.143>
    created_at = <Date 2014-05-18.12:31:45.846>
    labels = ['interpreter-core', 'performance']
    title = "quadratic-time compilation in the number of 'and' or 'or' expressions"
    updated_at = <Date 2014-05-23.10:00:06.142>
    user = 'https://bugs.python.org/dalke'

    bugs.python.org fields:

    activity = <Date 2014-05-23.10:00:06.142>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-05-23.10:00:06.143>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2014-05-18.12:31:45.846>
    creator = 'dalke'
    dependencies = []
    files = ['35279', '35315', '35316']
    hgrepos = []
    issue_num = 21523
    keywords = ['patch']
    message_count = 9.0
    messages = ['218742', '218886', '218889', '218907', '218909', '218911', '218922', '218955', '218956']
    nosy_count = 8.0
    nosy_names = ['brett.cannon', 'georg.brandl', 'rhettinger', 'dalke', 'ncoghlan', 'pitrou', 'benjamin.peterson', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue21523'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @dalke
    Copy link
    Mannequin Author

    dalke mannequin commented May 18, 2014

    Python's compiler has quadratic-time time behavior based on the number of "and" or "or" expressions. A profile shows that stackdepth_walk is calling itself in a stack at least 512 levels deep. (My profiler doesn't go higher than that.)

    I've reduced it to a simple test case. Compiling functions of the form

        def f(x):
            x * x  # Repeat N times

    takes linear time in the number of lines N, while functions of the form

    def g(x):
        x and x  # Repeat N times
    

    takes quadratic time in N. Here's an example of running the attached demonstration code on a fresh build of Python from version control:

    Results from 3.5.0a0 (default:de01f7c37b53, May 18 2014, 13:18:43)  
    
      num    using  using
     tests    '*'   'and'
       100   0.002  0.002
       200   0.003  0.004
       400   0.005  0.010
       800   0.012  0.040
      1600   0.023  0.133
      3200   0.042  0.906
      6400   0.089  5.871
     12800   0.188  27.581
     25600   0.404  120.800
    

    The same behavior occurs when I replace 'and' with 'or'.

    The same behavior also occurs under Python 2.7.2, 3.3.5, 3.4.0. (I don't have builds of 3.1 or 3.2 for testing.)

    However, the demonstration code shows linear time under Python 2.6.6:

    Results from 2.6.6 (r266:84374, Aug 31 2010, 11:00:51)  
    
      num    using  using
     tests    '*'   'and'
       100   0.003  0.001
       200   0.002  0.002
       400   0.006  0.008
       800   0.010  0.010
      1600   0.019  0.022
      3200   0.039  0.045
      6400   0.085  0.098
     12800   0.176  0.203
     25600   0.359  0.423
     51200   0.726  0.839
    

    I came across this problem because my code imports a large machine-generated module. It was originally written for Python 2.6, where it worked just fine. When I tried to import it under new Pythons, the import appeared to hang, and I killed it after a minute or so.

    As a work-around, I have re-written the code generator to use chained if-statements instead of the short-circuit and operator.

    @dalke dalke mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels May 18, 2014
    @pitrou
    Copy link
    Member

    pitrou commented May 21, 2014

    Andrew, have you tried to bisect the Mercurial repository to find where the regression was introduced?

    @dalke
    Copy link
    Mannequin Author

    dalke mannequin commented May 22, 2014

    Live and learn. I did my first bisect today.

    The first bad revision is:
    changeset: 51920:ef8fe9088696
    branch: legacy-trunk
    parent: 51916:4e1556012584
    user: Jeffrey Yasskin <jyasskin@gmail.com>
    date: Sat Feb 28 19:03:21 2009 +0000
    summary: Backport r69961 to trunk, replacing JUMP_IF_{TRUE,FALSE} with

    I confirmed that the parent did not have the problem.

    If you want me to diagnose this further, then I'll need some hints on what to do next.

    @pitrou
    Copy link
    Member

    pitrou commented May 22, 2014

    You're right, CPU time is burnt by stackdepth_walk().
    The underlying issue is that max stacksize is computed a bit pessimistically with the new opcodes (JUMP_IF_{TRUE,FALSE}_OR_POP). On normal functions there wouldn't be a sizable difference, but on pathological cases such as yours, a code object could end up claiming a large stack size (as large of the number of such opcodes) rather than a very small number.

    Still, of course, stackdepth_walk() should not blow up when computing a large stack size, but fixing the opcode's stack effect in compile.c seems to fix the issue anyway.

    @pitrou
    Copy link
    Member

    pitrou commented May 22, 2014

    Here is a patch.

    @pitrou
    Copy link
    Member

    pitrou commented May 22, 2014

    Updated patch adding some tests.

    @rhettinger
    Copy link
    Contributor

    The patch looks like the correct solution.

    That said, I'm more impressed that you were able to test it so cleanly :-)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 23, 2014

    New changeset cd62cc331572 by Antoine Pitrou in branch '3.4':
    Issue bpo-21523: Fix over-pessimistic computation of the stack effect of some opcodes in the compiler.
    http://hg.python.org/cpython/rev/cd62cc331572

    New changeset e61462e18112 by Antoine Pitrou in branch 'default':
    Issue bpo-21523: Fix over-pessimistic computation of the stack effect of some opcodes in the compiler.
    http://hg.python.org/cpython/rev/e61462e18112

    New changeset 49588a510ed4 by Antoine Pitrou in branch '2.7':
    Issue bpo-21523: Fix over-pessimistic computation of the stack effect of some opcodes in the compiler.
    http://hg.python.org/cpython/rev/49588a510ed4

    @pitrou
    Copy link
    Member

    pitrou commented May 23, 2014

    This should be fixed now!

    @pitrou pitrou closed this as completed May 23, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants