Issue 38115: Invalid bytecode offsets in co_lnotab

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/82296

classification

Title:	Invalid bytecode offsets in co_lnotab
Type:	behavior	Stage:	commit review
Components:	Interpreter Core	Versions:	Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	twouters	Nosy List:	gregory.p.smith, lukasz.langa, pablogsal, serhiy.storchaka, twouters
Priority:	release blocker	Keywords:	patch

Created on 2019-09-11 14:27 by twouters, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 15970	closed	twouters, 2019-09-11 14:48
PR 16079	merged	twouters, 2019-09-13 09:24
PR 16464	merged	gregory.p.smith, 2019-09-28 14:59

Messages (8)
msg351902 - (view)	Author: Thomas Wouters (twouters) *	Date: 2019-09-11 14:27
The peephole optimizer in Python 2.7 and later (and probably a lot earlier) has a bug where if the optimizer entirely optimizes away the last line(s) of a function, the lnotab references invalid bytecode offsets: >>> def f(cond1, cond2): ... while 1: ... return 3 ... while 1: ... return 5 ... return 6 ... >>> list(dis.findlinestarts(f.__code__)) [(0, 3), (4, 5), (8, 6)] >>> len(f.__code__.co_code) 8 >>> f.__code__.co_code[8] Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: index out of range The problem is that the lnotab-readjustment in Python/peephole.c doesn't account for trailing NOPs in a bytecode string. I haven't been able to reproduce this before Python 3.8, probably because the optimizer wasn't capable of optimizing things aggressively enough to end a bytecode string with NOPs. I have a fix for this bug already.
msg351904 - (view)	Author: Thomas Wouters (twouters) *	Date: 2019-09-11 14:35
There's also a bug where the optimizer may bail out on optimizing a code object after updating the lnotab (the last 'goto exitUnchanged' in Python/peephole.c). That bug has existed since Python 3.6, but it's not clear to me how much this actually affects.
msg352149 - (view)	Author: Thomas Wouters (twouters) *	Date: 2019-09-12 12:55
As mentioned in the PR (GH-15970), I don't think we should fix this bug. We can, but it involves replacing PyCode_Optimize() (which is public but undocumented, with confusing refcount effects) with a stub, and very careful surgery on the code of the peephole optimizer. I tried three different ways and I keep running into unexpected side-effects of my changes, because of how the optimizer is called by the compiler. It is the case that other changes in 3.8 make this bug more apparent, but it's always been around (at least since lnotab was introduced). At this point I think the best thing to do is to document that lnotab can have invalid bytecode offsets, and then reconsider serious refactoring and redesign of the peephole optimizer if it's going to be kept around in 3.9. (Right now there's talk about replacing it with a more sensible CFG-based optimizer.)
msg352255 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-09-13 09:28
Since we modify the content of the bytes object in any case, we can shrink it in-place by setting its Py_SIZE(). But it would be better to fill the end of it with "no-op" fillers.
msg352257 - (view)	Author: Thomas Wouters (twouters) *	Date: 2019-09-13 09:35
Setting Py_SIZE of the bytes object is possible, but gross and not how you're supposed to operate on bytes. I'm also not entirely convinced lnotab isn't reused in ways it shouldn't. The peephole optimizer already does gross things and is tied very intimately into the compiler and assembler structs, and any change I tried caused weird side-effects. I'm not comfortable making these changes without extensive rewrites of those bits of the code, which Mark Shannon is already working on for different reasons. The current lnotab format doesn't really have the concept of 'no-op fillers', because zero-increment entries are used to add to previous entries. Adding the concept could mean breaking third-party consumers of lnotab. Of all the uses of lnotab that I could find, dis.findlinestarts() was the only one that didn't ignore the invalid entries. I think just documenting the current behaviour (which, just as a reminder, has been around forever, but is just more obvious in Python 3.8) and fixing dis.findlinestarts() is enough of a fix for the foreseeable future. See GH-16079.
msg353458 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2019-09-28 14:49
New changeset c8165036f374cd2ee64d4314eeb2514f7acb5026 by Gregory P. Smith (T. Wouters) in branch 'master': bpo-38115: Deal with invalid bytecode offsets in lnotab (GH-16079) https://github.com/python/cpython/commit/c8165036f374cd2ee64d4314eeb2514f7acb5026
msg353461 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2019-09-28 15:22
New changeset 36c6fa968016a46a39c3cdbd0a17ea5490dfa343 by Gregory P. Smith in branch '3.8': bpo-38115: Deal with invalid bytecode offsets in lnotab (GH-16079) (GH-16464) https://github.com/python/cpython/commit/36c6fa968016a46a39c3cdbd0a17ea5490dfa343
msg353462 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2019-09-28 15:22
thanks Thomas!

History
Date	User	Action	Args
2022-04-11 14:59:20	admin	set	github: 82296
2019-09-28 15:22:28	gregory.p.smith	set	status: open -> closed resolution: fixed messages: + msg353462 stage: patch review -> commit review
2019-09-28 15:22:03	gregory.p.smith	set	messages: + msg353461
2019-09-28 14:59:10	gregory.p.smith	set	pull_requests: + pull_request16049
2019-09-28 14:49:18	gregory.p.smith	set	nosy: + gregory.p.smith messages: + msg353458
2019-09-13 09:35:08	twouters	set	messages: + msg352257
2019-09-13 09:28:53	serhiy.storchaka	set	messages: + msg352255
2019-09-13 09:24:38	twouters	set	keywords: + patch pull_requests: + pull_request15701
2019-09-12 12:55:25	twouters	set	keywords: - patch messages: + msg352149
2019-09-11 18:35:21	serhiy.storchaka	set	nosy: + serhiy.storchaka
2019-09-11 14:48:56	twouters	set	stage: patch review pull_requests: + pull_request15603
2019-09-11 14:35:56	twouters	set	messages: + msg351904
2019-09-11 14:27:50	twouters	create