classification
Title: Relative path in co_filename for zipped modules
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Mariatta, brett.cannon, christian.heimes, ncoghlan, serhiy.storchaka, twouters, vmurashev
Priority: normal Keywords: patch

Created on 2013-06-26 14:27 by vmurashev, last changed 2018-09-19 07:34 by serhiy.storchaka.

Files
File name Uploaded Description Edit
test_zipimport_co_filename.py vmurashev, 2013-08-24 00:23
zipimport.diff vmurashev, 2013-09-15 20:26
pyzipimport-code-filename.diff serhiy.storchaka, 2018-08-31 22:32
Pull Requests
URL Status Linked Edit
PR 7138 open python-dev, 2018-05-27 14:24
PR 7139 closed python-dev, 2018-05-27 14:26
PR 7140 closed python-dev, 2018-05-27 14:27
Messages (15)
msg191907 - (view) Author: Vitaly Murashev (vmurashev) * Date: 2013-06-26 14:27
Recently I found out that it is not possible to debug Python code if it is part of a zipped module.
The Python version being used is 3.3.0.

Well-known GUI debuggers like Eclipse+PyDev or PyCharm are unable to start debugging and give the following warning:
---
pydev debugger: CRITICAL WARNING: This version of python seems to be incorrectly compiled (internal generated filenames are not absolute)
pydev debugger: The debugger may still function, but it will work slower and may miss breakpoints.
---
So I started my own investigation of this issue, and the results are the following.
First I used the traditional Python debugger 'pdb' to analyze how it behaves while debugging a zipped module.
'pdb' showed me some backtraces in which the filename part of the stack entries looks malformed.
I expected something like 'full-path-to-zip-dir/my_zipped_module.zip/subdir/test_module.py'
but in reality it looks like 'full-path-to-current-dir/subdir/test_module.py'

The source code in pdb.py and bdb.py (both part of the Python stdlib) explains why this happens.

The root cause is inside Bdb.format_stack_entry() + Bdb.canonic()

Please take a look at the following line inside 'format_stack_entry' method:
    filename = self.canonic(frame.f_code.co_filename)

For a zipped module, 'frame.f_code.co_filename' holds a _relative_ file path starting from the root of the zip archive, like 'subdir/test_module.py'.
As a result, Bdb.canonic() resolves it against the current directory, which gives 'full-path-to-current-dir/subdir/test_module.py'.
---
This looks like a bug either:
- in the Python core subsystem responsible for loading zipped modules
- or in the pdb debugger
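
The path resolution described above can be reproduced without a zip file. This is a minimal sketch, assuming only that Bdb.canonic() turns a relative name into an absolute one based on the current working directory; the helper name canonic_of is made up for illustration.

```python
import bdb
import os


def canonic_of(filename):
    """Return the name bdb would use for a frame whose co_filename is `filename`."""
    return bdb.Bdb().canonic(filename)


# A relative name, as reported for code loaded from a zip archive.
relative = os.path.join("subdir", "test_module.py")
resolved = canonic_of(relative)

# The result is anchored at the current directory, not at the zip archive.
print(resolved)
```

Because the relative name carries no information about the archive, the debugger has nothing better to anchor it to than the current directory.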
msg196062 - (view) Author: Vitaly Murashev (vmurashev) * Date: 2013-08-24 00:23
Unit test attached.
It contains 3 tests: 2 fail, 1 succeeds.

According to these test results, the issue reproduces only with zip modules that contain precompiled bytecode (pyc files).
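
A rough way to try this scenario by hand (this is not the attached test file) is sketched below. The module name zipped_demo_mod, the archive name bundle.zip and both helper functions are hypothetical. The value printed for co_filename depends on the Python version in use, so the sketch only reports it.

```python
import importlib
import os
import py_compile
import sys
import tempfile
import zipfile

MODULE_NAME = "zipped_demo_mod"  # hypothetical module name


def build_pyc_only_zip(workdir):
    """Create a zip archive that contains only precompiled bytecode."""
    src = os.path.join(workdir, MODULE_NAME + ".py")
    with open(src, "w") as f:
        f.write("def hello():\n    return 'hi'\n")
    pyc = os.path.join(workdir, MODULE_NAME + ".pyc")
    # dfile is the name recorded in the code objects; a relative name
    # mimics what was observed in the report.
    py_compile.compile(src, cfile=pyc, dfile=MODULE_NAME + ".py", doraise=True)
    zip_path = os.path.join(workdir, "bundle.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(pyc, arcname=MODULE_NAME + ".pyc")
    return zip_path


def import_from_zip(zip_path):
    """Import the demo module with the archive temporarily on sys.path."""
    sys.path.insert(0, zip_path)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(zip_path)


workdir = tempfile.mkdtemp()
zip_path = build_pyc_only_zip(workdir)
mod = import_from_zip(zip_path)
recorded = mod.hello.__code__.co_filename

print("module __file__:", mod.__file__)
print("co_filename:    ", recorded)
print("absolute?       ", os.path.isabs(recorded))
```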
msg197830 - (view) Author: Vitaly Murashev (vmurashev) * Date: 2013-09-15 20:26
Patch suggested (against the 3.3.0 code base).
Without the patch the test fails; with the patch it passes.
msg317253 - (view) Author: Vitaly Murashev (vmurashev) * Date: 2018-05-21 21:43
Guys, a couple of questions ...
I want to suggest new patches for python3.7 and python2.7 with regression tests included.
What is the proper way to do that now, in 2018?
May I do it on github.com? Should I submit a new issue for that there?
Or am I still supposed to attach new patches here, on bugs.python.org?
msg317258 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-22 05:51
In 2018 you have to create a pull request on GitHub. See https://devguide.python.org/.
msg317514 - (view) Author: Mariatta Wijaya (Mariatta) * (Python committer) Date: 2018-05-24 02:34
Vitaly, in the future please use gender-neutral words such as "folks" and "y'all" instead of "guys". Thanks.
msg317796 - (view) Author: Vitaly Murashev (vmurashev) * Date: 2018-05-27 14:54
> Vitaly, in the future please use gender-neutral words
Mariatta, OK, got it, I am sorry for that. I am not a native speaker.
msg317798 - (view) Author: Vitaly Murashev (vmurashev) * Date: 2018-05-27 17:14
Pull requests for 2.7, 3.7 and master have been submitted on GitHub,
all tests appear to pass, so

Python dev-team,
please, take a look.
msg317828 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-28 07:29
Only a PR for master is needed. Changes to other branches will be ported after reviewing and merging the PR for master.
msg317835 - (view) Author: Vitaly Murashev (vmurashev) * Date: 2018-05-28 09:53
> Only a PR for master is needed.
Serhiy Storchaka, thanks for the advice,
I have closed the unnecessary PRs.
msg323389 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-08-10 20:05
This will be a semantic change to the value of co_filename so I don't think this can safely be backported.
msg324448 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-08-31 22:32
The patch for the Python implementation of zipimport (see issue25711) is much simpler (see the attached sample patch). But it requires adding a way of modifying co_filename. Currently code objects are immutable.

This issue looks like part of a larger problem. Zipimport is not the only source of precompiled bytecode that needs its co_filename updated. For example, a tree of py and pyc files can be moved to another place. Also, since co_filename contains a system-dependent path, it doesn't make sense when pyc files are loaded on another system (for example, if they were created on Linux and run on Windows). On the other hand, the import machinery tries to load both pyc and py files, so it should know the path of the corresponding py file when it loads a pyc file. Maybe it would be better not to save co_filename in a pyc file at all (note that all code objects in a file have the same co_filename, but it is saved for every code object), and instead have the import machinery set co_filename after unmarshalling the module bytecode, in all loaders.
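
The idea of setting co_filename after unmarshalling can be sketched in pure Python. This is only an illustration, not the attached patch and not what the import machinery does: since code objects are immutable, the hypothetical helper with_filename builds new code objects with CodeType.replace() (available from Python 3.8) and recurses into nested code objects stored in co_consts.

```python
import types


def with_filename(code, new_filename):
    """Return a copy of `code` whose co_filename, and that of every
    nested code object, is `new_filename`."""
    new_consts = tuple(
        with_filename(const, new_filename)
        if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_filename, co_consts=new_consts)


# Compile-time name differs from the name wanted at import time.
original = compile("def f():\n    return 1\n", "old/location.py", "exec")
fixed = with_filename(original, "/new/location.py")
nested = [c for c in fixed.co_consts if isinstance(c, types.CodeType)]

print(original.co_filename, "->", fixed.co_filename)
```

The recursion is needed because each function or class body is a separate code object carrying its own copy of the filename.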
msg324481 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-09-02 16:47
Serhiy's analysis sounds right to me - for precompiled bytecode files, we really want co_filename to reflect the import time filename *not* the compile time filename.

While zipimport is one way to get the compile time and import time paths to differ, there are others:

- move an existing directory to another location
- copy a directory tree to a different machine
- move an implicitly cached pyc file to another location

Concrete example from a recent'ish build where I convert an implicitly cached pyc to a standalone one, but co_filename still references the original source file:

===========
$ ./python -c "import contextlib; print(contextlib.__file__); print(contextlib.contextmanager.__code__.co_filename)"
/home/ncoghlan/devel/cpython/Lib/contextlib.py
/home/ncoghlan/devel/cpython/Lib/contextlib.py
$ mkdir -p /tmp/example
$ cd /tmp/example
$ cp /home/ncoghlan/devel/cpython/Lib/__pycache__/contextlib.cpython-38.pyc contextlib.pyc
$ /home/ncoghlan/devel/cpython/python -c "import contextlib; print(contextlib.__file__); print(contextlib.contextmanager.__code__.co_filename)"
/tmp/example/contextlib.pyc
/home/ncoghlan/devel/cpython/Lib/contextlib.py
===========
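
The same effect can be seen by reading the stored value straight out of a pyc file. This is a small sketch, assuming the 16-byte pyc header used since Python 3.7; the file name demo.py and the helper stored_co_filename are made up for illustration.

```python
import marshal
import os
import py_compile
import shutil
import tempfile

PYC_HEADER_SIZE = 16  # assumes the pyc layout used since Python 3.7


def stored_co_filename(pyc_path):
    """Unmarshal a pyc file and return the co_filename saved inside it."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    code = marshal.loads(data[PYC_HEADER_SIZE:])
    return code.co_filename


build_dir = tempfile.mkdtemp()
other_dir = tempfile.mkdtemp()

src = os.path.join(build_dir, "demo.py")
with open(src, "w") as f:
    f.write("x = 1\n")

# Compile next to the source, then copy only the pyc somewhere else.
pyc = os.path.join(build_dir, "demo.pyc")
py_compile.compile(src, cfile=pyc, doraise=True)
moved = shutil.copy(pyc, os.path.join(other_dir, "demo.pyc"))

after = stored_co_filename(moved)
print("pyc now lives at:  ", moved)
print("stored co_filename:", after)
```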
msg324965 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-09-11 00:37
So it sounds like we may need to decide whether we are going to replace zipimport with Serhiy's Python version. If we do decide to go with the Python code, then it should get updated to use _imp._fix_co_filename().

Regardless of the decision, obviously thanks to Vitaly for sparking this change.
msg325726 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-19 07:34
zipimport has been rewritten in pure Python (issue25711).
History
Date User Action Args
2018-09-19 07:34:58 serhiy.storchaka set messages: + msg325726
2018-09-11 00:37:04 brett.cannon set messages: + msg324965
2018-09-02 16:47:12 ncoghlan set messages: + msg324481
2018-08-31 22:32:40 serhiy.storchaka set files: + pyzipimport-code-filename.diff; messages: + msg324448
2018-08-10 20:05:41 brett.cannon set messages: + msg323389; versions: - Python 2.7, Python 3.7
2018-08-10 20:02:05 brett.cannon set nosy: + twouters
2018-05-28 09:53:47 vmurashev set messages: + msg317835
2018-05-28 07:29:57 serhiy.storchaka set messages: + msg317828
2018-05-27 17:14:54 vmurashev set messages: + msg317798
2018-05-27 17:10:09 vmurashev set versions: + Python 2.7, Python 3.7, Python 3.8, - Python 3.3, Python 3.4
2018-05-27 14:54:42 vmurashev set messages: + msg317796
2018-05-27 14:27:47 python-dev set pull_requests: + pull_request6774
2018-05-27 14:26:32 python-dev set pull_requests: + pull_request6773
2018-05-27 14:24:21 python-dev set stage: test needed -> patch review; pull_requests: + pull_request6772
2018-05-24 02:34:17 Mariatta set nosy: + Mariatta; messages: + msg317514
2018-05-22 05:51:10 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg317258
2018-05-21 21:43:01 vmurashev set messages: + msg317253
2013-09-15 23:28:22 ncoghlan set nosy: + ncoghlan
2013-09-15 20:26:40 vmurashev set files: + zipimport.diff; keywords: + patch; messages: + msg197830; components: - Library (Lib)
2013-08-24 00:23:14 vmurashev set files: + test_zipimport_co_filename.py; messages: + msg196062
2013-06-26 17:38:21 christian.heimes set nosy: + brett.cannon, christian.heimes; stage: test needed; versions: + Python 3.4
2013-06-26 14:27:09 vmurashev create