Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid bytecode offsets in co_lnotab #82296

Closed
Yhg1s opened this issue Sep 11, 2019 · 8 comments
Closed

Invalid bytecode offsets in co_lnotab #82296

Yhg1s opened this issue Sep 11, 2019 · 8 comments
Assignees
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-bug An unexpected behavior, bug, or error

Comments

@Yhg1s
Copy link
Member

Yhg1s commented Sep 11, 2019

BPO 38115
Nosy @Yhg1s, @gpshead, @ambv, @serhiy-storchaka, @pablogsal
PRs
  • bpo-38115: fix invalid lnotab after optimization #15970
  • bpo-38115: Deal with invalid bytecode offsets in lnotab #16079
  • [3.8] bpo-38115: Deal with invalid bytecode offsets in lnotab (GH-16079) #16464
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/Yhg1s'
    closed_at = <Date 2019-09-28.15:22:28.778>
    created_at = <Date 2019-09-11.14:27:50.966>
    labels = ['interpreter-core', 'type-bug', '3.8', 'release-blocker']
    title = 'Invalid bytecode offsets in co_lnotab'
    updated_at = <Date 2019-09-28.15:22:28.777>
    user = 'https://github.com/Yhg1s'

    bugs.python.org fields:

    activity = <Date 2019-09-28.15:22:28.777>
    actor = 'gregory.p.smith'
    assignee = 'twouters'
    closed = True
    closed_date = <Date 2019-09-28.15:22:28.778>
    closer = 'gregory.p.smith'
    components = ['Interpreter Core']
    creation = <Date 2019-09-11.14:27:50.966>
    creator = 'twouters'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38115
    keywords = ['patch']
    message_count = 8.0
    messages = ['351902', '351904', '352149', '352255', '352257', '353458', '353461', '353462']
    nosy_count = 5.0
    nosy_names = ['twouters', 'gregory.p.smith', 'lukasz.langa', 'serhiy.storchaka', 'pablogsal']
    pr_nums = ['15970', '16079', '16464']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue38115'
    versions = ['Python 3.8']

    @Yhg1s
    Copy link
    Member Author

    Yhg1s commented Sep 11, 2019

    The peephole optimizer in Python 2.7 and later (and probably a *lot* earlier) has a bug where if the optimizer entirely optimizes away the last line(s) of a function, the lnotab references invalid bytecode offsets:

    >>> def f(cond1, cond2):
    ...     while 1:
    ...         return 3
    ...     while 1:
    ...         return 5
    ...     return 6
    ... 
    >>> list(dis.findlinestarts(f.__code__))
    [(0, 3), (4, 5), (8, 6)]
    >>> len(f.__code__.co_code)
    8
    >>> f.__code__.co_code[8]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: index out of range

    The problem is that the lnotab-readjustment in Python/peephole.c doesn't account for trailing NOPs in a bytecode string. I haven't been able to reproduce this before Python 3.8, probably because the optimizer wasn't capable of optimizing things aggressively enough to end a bytecode string with NOPs.

    I have a fix for this bug already.

    @Yhg1s Yhg1s added 3.8 only security fixes release-blocker labels Sep 11, 2019
    @Yhg1s Yhg1s self-assigned this Sep 11, 2019
    @Yhg1s Yhg1s added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Sep 11, 2019
    @Yhg1s
    Copy link
    Member Author

    Yhg1s commented Sep 11, 2019

    There's also a bug where the optimizer may bail out on optimizing a code object *after* updating the lnotab (the last 'goto exitUnchanged' in Python/peephole.c). That bug has existed since Python 3.6, but it's not clear to me how much this actually affects.

    @Yhg1s
    Copy link
    Member Author

    Yhg1s commented Sep 12, 2019

    As mentioned in the PR (GH-15970), I don't think we should fix this bug. We can, but it involves replacing PyCode_Optimize() (which is public but undocumented, with confusing refcount effects) with a stub, and very careful surgery on the code of the peephole optimizer. I tried three different ways and I keep running into unexpected side-effects of my changes, because of how the optimizer is called by the compiler.

    It is the case that other changes in 3.8 make this bug more apparent, but it's always been around (at least since lnotab was introduced). At this point I think the best thing to do is to document that lnotab can have invalid bytecode offsets, and then reconsider serious refactoring and redesign of the peephole optimizer if it's going to be kept around in 3.9. (Right now there's talk about replacing it with a more sensible CFG-based optimizer.)

    @serhiy-storchaka
    Copy link
    Member

    Since we modify the content of the bytes object in any case, we can shrink it in-place by setting its Py_SIZE(). But it would be better to fill the end of it with "no-op" fillers.

    @Yhg1s
    Copy link
    Member Author

    Yhg1s commented Sep 13, 2019

    Setting Py_SIZE of the bytes object is possible, but gross and not how you're supposed to operate on bytes.

    I'm also not entirely convinced lnotab isn't reused in ways it shouldn't. The peephole optimizer already does gross things and is tied very intimately into the compiler and assembler structs, and any change I tried caused weird side-effects. I'm not comfortable making these changes without extensive rewrites of those bits of the code, which Mark Shannon is already working on for different reasons.

    The current lnotab format doesn't really have the concept of 'no-op fillers', because zero-increment entries are used to add to previous entries. Adding the concept could mean breaking third-party consumers of lnotab. Of all the uses of lnotab that I could find, dis.findlinestarts() was the only one that didn't ignore the invalid entries. I think just documenting the current behaviour (which, just as a reminder, has been around forever, but is just more obvious in Python 3.8) and fixing dis.findlinestarts() is enough of a fix for the foreseeable future. See #60283.

    @gpshead
    Copy link
    Member

    gpshead commented Sep 28, 2019

    New changeset c816503 by Gregory P. Smith (T. Wouters) in branch 'master':
    bpo-38115: Deal with invalid bytecode offsets in lnotab (GH-16079)
    c816503

    @gpshead
    Copy link
    Member

    gpshead commented Sep 28, 2019

    New changeset 36c6fa9 by Gregory P. Smith in branch '3.8':
    bpo-38115: Deal with invalid bytecode offsets in lnotab (GH-16079) (GH-16464)
    36c6fa9

    @gpshead
    Copy link
    Member

    gpshead commented Sep 28, 2019

    thanks Thomas!

    @gpshead gpshead closed this as completed Sep 28, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants