This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Rewrite getpath.c in Python
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: FFY00, benjamin.peterson, christian.heimes, eric.snow, lukasz.langa, miss-islington, ncoghlan, neonene, steve.dower
Priority: normal Keywords: patch

Created on 2021-10-22 23:26 by steve.dower, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 29041 merged steve.dower, 2021-10-22 23:27
PR 29902 merged christian.heimes, 2021-12-03 08:47
PR 29906 merged neonene, 2021-12-03 20:27
PR 29907 merged benjamin.peterson, 2021-12-03 22:56
PR 29921 merged christian.heimes, 2021-12-05 16:14
PR 29930 merged neonene, 2021-12-06 00:57
PR 29941 closed neonene, 2021-12-06 14:23
PR 29944 merged christian.heimes, 2021-12-06 17:44
PR 29948 merged steve.dower, 2021-12-06 23:43
PR 29954 merged christian.heimes, 2021-12-07 10:32
PR 29979 merged steve.dower, 2021-12-08 01:38
PR 30014 merged neonene, 2021-12-09 18:32
PR 31699 merged christian.heimes, 2022-03-05 20:33
Messages (30)
msg404841 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-10-22 23:26
As discussed in issue42260, combining the two getpath implementations into a single Python implementation would make it more maintainable and modifiable (particularly where distros need to patch to support alternative layouts).
msg404843 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-10-22 23:34
The PR has more work to do, but the overall layout/changes are more or less there, so happy to discuss feedback/etc.

Obviously there are a lot of edge cases here, but they seem to be mostly tested already. And I think we're early enough in alpha to find any major issues (or absorb any necessary minor changes - seems like trailing slashes might change on some paths).

There are also some changes/hacks into the new frozen module support, so that I can freeze getpath.py without turning it into a module. I really just want to execute the bytecode - no reason for any of its contents to stick around - and this works out pretty neatly. But if the changes to frozen modules seem off then maybe we can split it into totally separate freezing support?
msg405348 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-10-29 21:27
So I think I've found my first completely unavoidable API break: PyConfig_Read(config) has to work before initialisation, but is also supposed to fill out all the fields (including the search path). But because we need at least an interpreter state, we now can't calculate everything.

The only test that seems to be affected here is test_embed.test_init_read_set(), which does a PyConfig_Read() and then inserts new paths into module_search_paths before initialising. With that one skipped, I think everything else can be handled.
msg405640 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-03 19:21
Last remaining test failure is one that I can't figure out on my own - the freeze test is rerunning a CPython build (on Linux) and is apparently building getpath.c with the ".c.o" rule rather than the "Modules/getpath.o" rule.

Any tips as to what I should be looking at to figure this one out?
msg405641 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-11-03 20:15
On Wed, Nov 3, 2021 at 1:21 PM Steve Dower <report@bugs.python.org> wrote:
> Last remaining test failure is one that I can't figure out on my own - the freeze test is rerunning a CPython build (on Linux) and is apparently building getpath.c with the ".c.o" rule rather than the "Modules/getpath.o" rule.
>
> Any tips as to what I should be looking at to figure this one out?

That test does an out-of-tree build.  Might that be related?
msg405653 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-03 22:25
I'm betting the out-of-tree (actually just deeper within the same tree) bit is related, but I just can't see how. Modules/getbuildinfo.c takes extra parameters and they seem to be being used, so I can't tell why getpath.c's are not (those rules are listed right next to each other, but well above the .c.o rule).
msg405658 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-04 00:24
Unsurprisingly, it was a bad edit that I made to the Makefile myself. The commit that undoes it is https://github.com/python/cpython/pull/29041/commits/aedebcc45a638f5cf65d17046ae09b5cac97cebf but since I made the initial change as part of this PR, it was never merged in.

Now to find out why the old getpath could somehow locate the stdlib but new getpath cannot... (I'm guessing it is finding the "original" stdlib rather than the fresh clone, since AFAICT there's no reference at all to the original source dir)
msg405731 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-11-04 15:58
On Wed, Nov 3, 2021 at 6:25 PM Steve Dower <report@bugs.python.org> wrote:
> Now to find out why the old getpath could somehow locate the stdlib but new getpath cannot... (I'm guessing it is finding the "original" stdlib rather than the fresh clone, since AFAICT there's no reference at all to the original source dir)

What fresh clone do you mean?  test_embed is failing, not test_freeze.
So there is no out-of-tree build involved.
msg405742 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-04 18:05
> What fresh clone do you mean?  test_embed is failing, not test_freeze.

test_freeze is passing, but it shouldn't be able to locate a valid Lib/ directory to load modules from. So it's somehow managing to do it against the "official" logic (none of the searches are going to find `root/python-build/Lib` starting from `root/python-installation/python`).

I haven't dug into it yet, but I suspect if the root used for this test is moved outside of the main tree then it will fail again.
msg406183 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-12 01:26
I'm expecting another dumb error (on my part) or two in the PR, but I'm very close to having this working.

Reviews would be appreciated! Bear in mind that I'm trying to match the current (quirky) behaviour, rather than streamline anything by changing it (yet). We can do those once we know we've got something working.
msg406184 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-11-12 01:35
On Thu, Nov 11, 2021 at 6:27 PM Steve Dower <report@bugs.python.org> wrote:
> rather than streamline anything by changing it (yet). We can do those once we know we've got something working.

+1
msg406384 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-16 01:17
I have tests passing now, so reviews would be appreciated.

There's definitely scope for optimising this algorithm both for speed and clarity, but I'd prefer to get the main translation in first so that any further changes have a reliable baseline (especially since we'll likely end up changing the behaviour slightly if we touch anything at all in getpath, so it'd be good to capture those as individual commits/bugs).
msg407113 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-26 23:54
Status update on this: I owe everyone a perf comparison of the before/after with this change.

I don't particularly want to block on a regression unless it's significant (honestly still have no idea what to expect), but open to others' thoughts on this point. How big a perf impact is this change worth?

(Obviously once I have some numbers the discussion can be more concrete, but I don't have them yet, and I have to catch up on other issues for a while as this one took so long to get this far.)
msg407322 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-29 20:01
It's one data point (well, statistics over 1000 points), but it looks like it's actually a slight improvement in performance over the previous code on Windows :)

    before   after
min 23.103   22.154
25% 25.069   23.59925
50% 25.8125  24.2715
75% 26.65175 24.89575
max 147.567  138.997

Going to run a Linux test as well, since that was a completely different code path, but assuming it's not drastically different then I'll go ahead and merge.
msg407326 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-11-29 21:30
Basically unchanged on Debian/WSL as well.

There's a new conflict arisen, so I'll resolve that and then merge.
msg407561 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-12-03 00:08
New changeset 99fcf1505218464c489d419d4500f126b6d6dc28 by Steve Dower in branch 'main':
bpo-45582: Port getpath[p].c to Python (GH-29041)
https://github.com/python/cpython/commit/99fcf1505218464c489d419d4500f126b6d6dc28
msg407562 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-12-03 00:10
It's done! Those were some of the hardest memory leaks I've had to track down, but it should be clear now.

As I mentioned in the commit message, this change attempts to preserve every known quirk. However, I think these ought to be streamlined across platforms to improve the code and/or startup performance.

But at least we know now that any changes in the future that require test changes are expected, and not the result of this change :)
msg407565 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-12-03 01:32
Hurray!  Thanks, Steve!
msg407587 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-12-03 13:47
New changeset ccb73a0d50dd03bc8455fe210cb83e41a6dc91d8 by Christian Heimes in branch 'main':
bpo-45582: Fix out-of-tree build issues with new getpath (GH-29902)
https://github.com/python/cpython/commit/ccb73a0d50dd03bc8455fe210cb83e41a6dc91d8
msg407604 - (view) Author: neonene (neonene) * Date: 2021-12-03 21:35
PGO-instrumented binary seems not to specify the stdlib directory on PR29041. I can run it with PYTHONPATH set.


Python path configuration:
  PYTHONHOME = 'C:\Py311\'
  PYTHONPATH = (not set)
  program name = 'C:\Py311\PCbuild\amd64\instrumented\python.exe'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  is in build tree = 1
  stdlib dir = 'C:\Py311\PCbuild\Lib'
  sys._base_executable = 'C:\\py311\\PCbuild\\amd64\\instrumented\\python.exe'
  sys.base_prefix = 'C:\\py311\\'
  sys.base_exec_prefix = 'C:\\py311\\'
  sys.platlibdir = 'DLLs'
  sys.executable = 'C:\\py311\\PCbuild\\amd64\\instrumented\\python.exe'
  sys.prefix = 'C:\\py311\\'
  sys.exec_prefix = 'C:\\py311\\'
  sys.path = [
    'C:\\py311\\PCbuild\\amd64\\instrumented\\python311.zip',
    'C:\\py311\\PCbuild\\Lib',
    'C:\\py311\\PCbuild\\amd64\\instrumented',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'
msg407609 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-12-03 22:04
New changeset 7d7c91a8e8c0bb04105a21a17d1061ffc1c04d80 by neonene in branch 'main':
bpo-45582: Add a NOT operator to the condition in getpath_isxfile (GH-29906)
https://github.com/python/cpython/commit/7d7c91a8e8c0bb04105a21a17d1061ffc1c04d80
msg407616 - (view) Author: miss-islington (miss-islington) Date: 2021-12-03 23:22
New changeset 0ae4e0c959bbc90ec18180ef3cc957759d346ced by Benjamin Peterson in branch 'main':
bpo-45582 Fix prototype of _Py_Get_Getpath_CodeObject. (GH-29907)
https://github.com/python/cpython/commit/0ae4e0c959bbc90ec18180ef3cc957759d346ced
msg407720 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-12-05 17:41
New changeset 628abe4463ed40cd54ca952a2b4cc2d6e74073f7 by Christian Heimes in branch 'main':
bpo-45582: Fix signature of _Py_Get_Getpath_CodeObject (GH-29921)
https://github.com/python/cpython/commit/628abe4463ed40cd54ca952a2b4cc2d6e74073f7
msg407846 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-12-06 17:25
New changeset af1db4eb555e02d2bff3476f99f7a653764203b0 by neonene in branch 'main':
bpo-45582: Fix getpath_isxfile() and test_embed on Windows (GH-29930)
https://github.com/python/cpython/commit/af1db4eb555e02d2bff3476f99f7a653764203b0
msg407852 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-12-06 18:13
New changeset f16f93e5279f957ca25dd8b91233a44833167a8a by Christian Heimes in branch 'main':
bpo-45582: framework build: modPath must not be const (GH-29944)
https://github.com/python/cpython/commit/f16f93e5279f957ca25dd8b91233a44833167a8a
msg407881 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-12-07 00:07
New changeset b7ef27bc084665ce58d89fc69530c6f9d2d37754 by Steve Dower in branch 'main':
bpo-45582: Ensure PYTHONHOME still overrides detected build prefixes (GH-29948)
https://github.com/python/cpython/commit/b7ef27bc084665ce58d89fc69530c6f9d2d37754
msg407960 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-12-07 18:10
New changeset 06c4ae8b1380eec1c5f3cd8faa21102d1c940bab by Christian Heimes in branch 'main':
bpo-45582: Fix framework path and bootstrap build (GH-29954)
https://github.com/python/cpython/commit/06c4ae8b1380eec1c5f3cd8faa21102d1c940bab
msg407994 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-12-08 02:18
New changeset b0b30862796e97b3f0ee358bcc61d21f0cc98441 by Steve Dower in branch 'main':
bpo-45582: Write empty pybuilddir.txt on Windows to allow relocatable build directories (GH-29979)
https://github.com/python/cpython/commit/b0b30862796e97b3f0ee358bcc61d21f0cc98441
msg408226 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-12-10 17:14
New changeset 3f398a77d37b5dfd51dabbc362d482a482fa885a by neonene in branch 'main':
bpo-45582: Fix test_embed failure during a PGO build on Windows (GH-30014)
https://github.com/python/cpython/commit/3f398a77d37b5dfd51dabbc362d482a482fa885a
msg414622 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-03-06 19:49
New changeset 55a5e17d19fecb6a7af85a1a9d44304e5fcb19c7 by Christian Heimes in branch 'main':
bpo-45582: Don't fail if ENV_PATH is None in getpath.py (GH-31699)
https://github.com/python/cpython/commit/55a5e17d19fecb6a7af85a1a9d44304e5fcb19c7
History
Date User Action Args
2022-04-11 14:59:51adminsetgithub: 89745
2022-03-06 19:49:35christian.heimessetmessages: + msg414622
2022-03-05 20:33:38christian.heimessetpull_requests: + pull_request29819
2021-12-10 17:14:07steve.dowersetmessages: + msg408226
2021-12-09 18:32:24neonenesetpull_requests: + pull_request28238
2021-12-08 02:18:30steve.dowersetmessages: + msg407994
2021-12-08 01:38:10steve.dowersetpull_requests: + pull_request28205
2021-12-07 21:42:45vstinnersetnosy: - vstinner
2021-12-07 18:10:05lukasz.langasetnosy: + lukasz.langa
messages: + msg407960
2021-12-07 10:32:55christian.heimessetpull_requests: + pull_request28179
2021-12-07 00:07:45steve.dowersetmessages: + msg407881
2021-12-06 23:43:09steve.dowersetpull_requests: + pull_request28173
2021-12-06 18:13:21christian.heimessetmessages: + msg407852
2021-12-06 17:44:27christian.heimessetpull_requests: + pull_request28170
2021-12-06 17:25:33steve.dowersetmessages: + msg407846
2021-12-06 14:23:32neonenesetpull_requests: + pull_request28166
2021-12-06 00:57:31neonenesetpull_requests: + pull_request28154
2021-12-05 17:41:54christian.heimessetmessages: + msg407720
2021-12-05 16:14:40christian.heimessetpull_requests: + pull_request28145
2021-12-03 23:22:05miss-islingtonsetnosy: + miss-islington
messages: + msg407616
2021-12-03 22:56:53benjamin.petersonsetnosy: + benjamin.peterson

pull_requests: + pull_request28131
2021-12-03 22:04:21steve.dowersetmessages: + msg407609
2021-12-03 21:35:04neonenesetmessages: + msg407604
2021-12-03 20:27:52neonenesetnosy: + neonene

pull_requests: + pull_request28130
2021-12-03 13:47:16christian.heimessetmessages: + msg407587
2021-12-03 08:47:11christian.heimessetnosy: + christian.heimes

pull_requests: + pull_request28126
2021-12-03 01:32:33eric.snowsetmessages: + msg407565
2021-12-03 00:10:47steve.dowersetstatus: open -> closed
resolution: fixed
messages: + msg407562

stage: patch review -> resolved
2021-12-03 00:08:49steve.dowersetmessages: + msg407561
2021-11-29 21:30:21steve.dowersetmessages: + msg407326
2021-11-29 20:01:50steve.dowersetmessages: + msg407322
2021-11-26 23:54:06steve.dowersetmessages: + msg407113
2021-11-16 01:17:24steve.dowersetmessages: + msg406384
2021-11-12 01:35:38eric.snowsetmessages: + msg406184
2021-11-12 01:26:43steve.dowersetmessages: + msg406183
2021-11-04 18:05:08steve.dowersetmessages: + msg405742
2021-11-04 15:58:05eric.snowsetmessages: + msg405731
2021-11-04 00:24:59steve.dowersetmessages: + msg405658
2021-11-03 22:25:35steve.dowersetmessages: + msg405653
2021-11-03 20:15:04eric.snowsetmessages: + msg405641
2021-11-03 19:21:29steve.dowersetmessages: + msg405640
2021-10-29 21:27:28steve.dowersetmessages: + msg405348
2021-10-23 12:25:37FFY00setnosy: + FFY00
2021-10-22 23:34:33steve.dowersetmessages: + msg404843
2021-10-22 23:27:07steve.dowersetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request27452
2021-10-22 23:26:55steve.dowercreate