
Peephole binops folding can lead to memory and bytecache ballooning #74479

Closed

mjpieters mannequin opened this issue May 6, 2017 · 4 comments
Labels
3.7 (EOL) end of life · performance (Performance or resource usage)

Comments

mjpieters mannequin commented May 6, 2017

BPO 30293
Nosy @mjpieters

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2017-05-09.09:01:32.062>
created_at = <Date 2017-05-06.19:04:17.805>
labels = ['3.7', 'invalid', 'performance']
title = 'Peephole binops folding can lead to memory and bytecache ballooning'
updated_at = <Date 2017-05-23.01:52:46.731>
user = 'https://github.com/mjpieters'

bugs.python.org fields:

activity = <Date 2017-05-23.01:52:46.731>
actor = 'rhettinger'
assignee = 'none'
closed = True
closed_date = <Date 2017-05-09.09:01:32.062>
closer = 'rhettinger'
components = []
creation = <Date 2017-05-06.19:04:17.805>
creator = 'mjpieters'
dependencies = []
files = []
hgrepos = []
issue_num = 30293
keywords = []
message_count = 4.0
messages = ['293167', '293172', '293292', '293295']
nosy_count = 1.0
nosy_names = ['mjpieters']
pr_nums = []
priority = 'low'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue30293'
versions = ['Python 3.7']

mjpieters mannequin commented May 6, 2017

The following expression produces 127MB of constants in co_consts, due to two 63.5MB integer objects created when folding:

((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
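
For illustration, here is a minimal sketch (not part of the original report) that measures the effect. It assumes an affected interpreter (e.g. Python 3.6, or the 3.7 development builds current when this was filed); later releases cap the size of folded constants, so the totals there stay small.

    import sys

    code = compile(
        "((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)",
        "<example>",
        "eval",
    )
    # On an affected interpreter the folded intermediates land in co_consts
    # as full int objects, roughly 63.5MB each.
    total = sum(sys.getsizeof(const) for const in code.co_consts)
    print(f"{len(code.co_consts)} constants, ~{total / 2**20:.1f} MiB in co_consts")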

The optimizer already does not store optimized *sequences* of more than 20 elements to avoid making bytecode files too large:

If the new constant is a sequence, only folds when the size
is below a threshold value. That keeps pyc files from
becoming large in the presence of code like: (None,)*1000.

Perhaps the same should be extended to number objects?
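
To illustrate the existing sequence threshold (again a sketch, not from the report; the exact limit, and whether folding happens in the peepholer or the AST optimizer, depends on the CPython version):

    import dis

    # A small repetition is folded into a single tuple constant...
    dis.dis(compile("(1, 2, 3) * 2", "<small>", "eval"))

    # ...while a large one is left as a runtime multiply (BINARY_MULTIPLY /
    # BINARY_OP), so the .pyc never carries a 1000-element constant.
    dis.dis(compile("(None,) * 1000", "<large>", "eval"))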

Context: https://stackoverflow.com/questions/43823807/why-does-using-arguments-make-this-function-so-much-slower

@rhettinger
Contributor

A few thoughts:

  • This is very unlikely to come up in real code. Accordingly, I've marked this a low priority -- the peepholer is over a decade old and seems to work out fine in actual practice.

  • The user explicitly asked for 1 << 500000000 to be computed. At some point, either during compilation or during execution, a large object is going to be produced.

  • The binops folding in peephole.c was originally much simpler and did not recursively combine results. The patch that added the recursive application of constant folding added a lot of complexity, but it demonstrated very little real-world benefit, and it opened the door to examples like this where the computer is being explicitly told to do a lot of work. Perhaps constant folding should be reverted to its simpler state.

  • We could do a size check after each folding step and abandon folding when the size hits some limit (a rough sketch follows this list). I'm not sure that is worth it, though. Someone can just add more zeros and then complain that the compilation step took a lot of time and memory without producing any benefit.

  • We could try to detect an expensive computation before it is run, but in general that is hard to do right, and it adds yet more complexity to something that started out simple.

  • One of the design objectives for the peephole transformations was to keep it fast and slim rather than trying to be all things to all people. Burdening the optimizer with insanity checks just slows down the compilation of normal, sane code.

  • With sum(), Guido and Alex took the position that users should be blocked from using it with list-of-strings, but we didn't block other unfavorable possibilities like a list-of-lists. The rationale was that the list-of-strings case was too tempting and likely but that other cases weren't common or easy to detect. Likewise for Tim and me, the early decision for the peephole optimizer (the decision being second-guessed by this issue) was that large strings were easy to detect and likely enough to warrant extra code, but that numeric folding issues were much less likely and didn't warrant extra code.

  • Constant folding was introduced before the AST branch. Since that branch landed, it has been a long-standing objective to move constant folding out of the peepholer and into the AST stage. Accordingly, we've tried to refrain from further build-outs of constant folding. It is a pit of snakes. If someone really cares about this, they should work on an AST transformation.

  • Programming languages are different from applications. Users are allowed to explicitly create code that tells the computer to do something resource intensive and there are a myriad of ways to do so. In general, it will be difficult to prevent the execution of such code without unduly burdening execution of normal code.
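
For concreteness, the size check mentioned in the fourth bullet could look roughly like the Python sketch below (the real change would live in C in the compiler; MAX_FOLDED_SIZE is a hypothetical limit, not an actual CPython constant). Note that, as the bullet points out, the oversized result is still computed before being thrown away.

    import sys

    MAX_FOLDED_SIZE = 4096  # hypothetical byte limit, not an actual CPython constant

    def maybe_fold(op, left, right):
        """Fold a binop on two constants, or return None to leave it for runtime."""
        result = op(left, right)  # the large object is still built here
        if sys.getsizeof(result) > MAX_FOLDED_SIZE:
            return None  # abandon folding; the original instructions stay in place
        return result

With a guard like this, 200*200 - 2 would still fold, while (1 << 500000000) - 1 would not be stored.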

@rhettinger added the 3.7 (EOL) and performance labels on May 6, 2017
@rhettinger
Contributor

Looking back at the OP's timings in the referenced SO question, I would expect that if someone "fixed" this issue, it wouldn't be long before someone else filed a performance regression bug claiming a 63,000x slowdown in exactly the same code.

I'm marking this as closed because if this ever did arise in real code, it is unclear whether the desirable behavior is to eat memory but run fast, or to save memory upfront but run dog slow and eat memory later when the function is called. Either way, the situation is likely to be very rare.

Your guess is as good as mine regarding which behavior would be more desirable to the user. Presumably, if they direct the computer to build a large object, they won't be surprised if a large object is created.

mjpieters mannequin commented May 9, 2017

Thanks, Raymond, for the response. I agree, we can't prevent all possible misuse, and avoiding the memory issue would require overly costly checks as to what is being multiplied or added.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022