
classification
Title: traceback & inspect modules should verify that the .py source file matches the one that the running process is using
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, gvanrossum, iritkatriel
Priority: normal Keywords:

Created on 2021-05-09 17:15 by gregory.p.smith, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg393328 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2021-05-09 17:15
A long-standing wart in Python is that once a module is loaded, when rendering a traceback and including source lines, we do not verify if the source file we're loading is the same as the one representing the code we are running.

It could have been replaced, as is normal during software upgrades.

If our code was loaded from .py source, we should be recording the timestamp/size||hash of the source file and referencing that from each code object.  If our code was loaded from a .pyc source, the .pyc already contains a timestamp/size||hash for the corresponding .py source file that could be referenced.
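
A minimal sketch of the kind of metadata described above, assuming it is captured when the module is loaded. The `SourceFingerprint` class and the `fingerprint` and `source_matches` helpers are hypothetical names used only for illustration; nothing like them exists on code objects today.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFingerprint:
    """Hypothetical record of a source file's state at load time."""
    mtime: int
    size: int
    sha256: str


def fingerprint(path):
    """Capture mtime, size and a content hash for the file at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    st = os.stat(path)
    return SourceFingerprint(
        mtime=int(st.st_mtime),
        size=st.st_size,
        sha256=hashlib.sha256(data).hexdigest(),
    )


def source_matches(path, recorded):
    """Return True only if the file on disk still matches `recorded`."""
    try:
        return fingerprint(path) == recorded
    except OSError:
        # The file was removed or became unreadable: treat as a mismatch.
        return False
```

A loader could store the result of `fingerprint(path)` next to the code it compiled, and a later `source_matches(path, recorded)` call would then say whether the file on disk can still be trusted.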

When traceback.StackSummary and FrameSummary use the linecache module, we should plumb this source metainfo in from the relevant code object.

A traceback being rendered with potentially modified source code could choose to omit the source lines, or at least annotate them with a "  ## this source {timestamp/size||hash} does not match the running code {timestamp/size||hash}." marker so that anyone seeing the traceback knows the displayed line may not be trustworthy.  (If the pyc was written using the "unchecked-hash" mode, no source/pyc synchronization check should be made)
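
The annotation idea could look roughly like the sketch below. It assumes the hash recorded at load time is already available to the renderer, which is exactly the plumbing this issue asks for. `content_hash` and `render_source_line` are hypothetical helpers, not part of `traceback` or `linecache`.

```python
import hashlib
import linecache


def content_hash(path):
    """Hash the current on-disk contents of `path`."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def render_source_line(filename, lineno, recorded_hash):
    """Return the source line, annotated if the file changed since load.

    `recorded_hash` stands in for metadata that would have to be stored
    with the code object (an assumption; no such attribute exists today).
    Returns None when the source cannot be read at all.
    """
    try:
        current = content_hash(filename)
    except OSError:
        return None  # source unavailable: omit the line
    # Drop a stale cache entry if the file's size or mtime changed.
    linecache.checkcache(filename)
    line = linecache.getline(filename, lineno).rstrip("\n")
    if current != recorded_hash:
        return (
            f"{line}  ## this source {current[:12]} does not match "
            f"the running code {recorded_hash[:12]}."
        )
    return line
```

Whether to annotate the line or omit it entirely is a policy choice; the sketch annotates on a hash mismatch and omits only when the file cannot be read.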

The inspect module also needs the ability to indicate this to the caller.
msg400866 - Author: Irit Katriel (iritkatriel) (Python committer) Date: 2021-09-01 20:46
While we're changing code objects, perhaps consider this as well?
msg400867 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2021-09-01 20:59
This sounds like a project requiring a considerable amount of plumbing to get the info from where it's available to where it's needed. For example, importlib reads the PYC file, checks the header, and then passes the rest of the file to marshal.loads(), which creates the (nested) code objects. Similarly, when reading the PY file, the compile() builtin is called to create the code objects without access to metadata other than filename.
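
For reference, the header importlib checks before handing the rest of the file to marshal.loads() can be read as in the sketch below. It assumes the 16-byte layout defined by PEP 552 (Python 3.7 and later); `read_pyc_source_info` is a hypothetical helper, not an importlib API.

```python
import importlib.util
import struct


def read_pyc_source_info(pyc_path):
    """Return the source metadata stored in a .pyc header (PEP 552 layout)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("not a .pyc written by this interpreter version")
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0b01:
        # Hash-based pyc: 8 bytes of source hash follow the flags field.
        return {
            "kind": "hash",
            "check_source": bool(flags & 0b10),  # False means unchecked-hash
            "source_hash": header[8:16],
        }
    # Timestamp-based pyc: 32-bit mtime and 32-bit source size.
    mtime, size = struct.unpack("<II", header[8:16])
    return {"kind": "timestamp", "mtime": mtime, "size": size}
```

The metadata is therefore available at load time, but it is consumed and discarded before any code object exists, which is the plumbing gap described above.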

I also question whether the software updates that fall prey to this issue are being done the right way -- maybe the server should be stopped before moving the new files in place. :-)
msg400869 - Author: Irit Katriel (iritkatriel) (Python committer) Date: 2021-09-01 21:06
It's not only a software update issue. These kinds of problems also show up for developers who change the code on disk while a program is running: a traceback or pdb then shows the code from disk rather than the code being executed, which is confusing.

I also saw an issue about changing directory in pdb and picking up a module of the same name in a different path (because the filename is relative).
msg400870 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2021-09-01 21:08
All legit cases, I agree, but are they worth the considerable work?
msg400956 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2021-09-02 20:42
FWIW I don't remember the context that led me to just file the issue this year.  The most serious frequent instances of this I remember happening were all many years ago when a less capable software distribution mechanism was in use.

A scenario where I could imagine it happening today, aside from the developer workflow Irit mentioned:

People using an OS distro's Python interpreter (and even OS distro supplied Python packages instead of pip in a virtualenv) in order to run their own potentially long running code.

The OS distro does not know about their running processes as they haven't created OS packages with startup/restart/shutdown and dependency relationships expressed.  So the OS updating Python packages does not trigger a restart of their software after updating a dependency out from underneath it.

I know this happens, but I don't know how often it actually bites anyone.  And there are workarounds if deemed serious (tie in with the OS package management).

I personally wouldn't prioritize work on this issue unless it fits in naturally with other work going on to plumb the information through, or unless someone can demonstrate a compelling, frequently encountered user-confusion scenario.  It's a "nice to have" more than a "need".
History
Date                 User             Action  Args
2022-04-11 14:59:45  admin            set     github: 88257
2021-09-02 20:42:48  gregory.p.smith  set     messages: + msg400956
2021-09-01 21:08:17  gvanrossum       set     messages: + msg400870
2021-09-01 21:06:16  iritkatriel      set     messages: + msg400869
2021-09-01 20:59:53  gvanrossum       set     messages: + msg400867
2021-09-01 20:46:12  iritkatriel      set     nosy: + gvanrossum, iritkatriel; messages: + msg400866
2021-05-09 17:15:30  gregory.p.smith  create