Delay-load ShellExecute
Currently, pythonXY.dll has a dependency on shell32.dll solely for the os.startfile (Modules/posixmodule.c) function. This is quite a heavy dependency that many would rather not have to load (e.g. lightweight server configurations).

It would be nice to delay load the DLL and fail the operation if it is not available.
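
For illustration only, here is a minimal sketch of the "load on first use, fail the operation if unavailable" idea. It is written in Python with ctypes rather than in the C that posixmodule.c would need, and it is not the attached patch. The names LazyFunction, _load_shell32 and shell_execute are hypothetical; only ctypes.WinDLL and the shell32 export ShellExecuteW are real.

```python
import ctypes


class LazyFunction:
    """Resolve a library export on first call instead of at import time."""

    def __init__(self, loader, name):
        self._loader = loader    # callable that returns a loaded library object
        self._name = name        # name of the export to look up
        self._func = None
        self._checked = False    # so a failed lookup is not retried on every call

    def _resolve(self):
        if not self._checked:
            self._checked = True
            try:
                self._func = getattr(self._loader(), self._name)
            except (OSError, AttributeError):
                # Library missing, export missing, or loader unsupported here.
                self._func = None
        return self._func

    def __call__(self, *args):
        func = self._resolve()
        if func is None:
            # Fail only the operation that needs the library.
            raise OSError(f"{self._name} is not available on this system")
        return func(*args)


def _load_shell32():
    # ctypes.WinDLL exists only on Windows; elsewhere this raises
    # AttributeError, which LazyFunction treats as "not available".
    return ctypes.WinDLL("shell32")


# Nothing is loaded here; shell32 is touched only when shell_execute is called.
shell_execute = LazyFunction(_load_shell32, "ShellExecuteW")
```

The point of the sketch is that creating shell_execute costs nothing, so a process that never opens a file through the shell never pays for loading shell32.dll, and one that tries on a system without it gets an OSError from that single call.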

(This is as much a reminder for myself as anything else, but if someone wants to do it then feel free.)
Attached a patch.

Comparing the time for "python.exe -c '0'" with PowerShell's Measure-Command tool, it looks like there's a 3-4ms (~8-10%) improvement in startup time too. That measurement isn't at all robust, but startup is certainly no worse. (I'm not surprised - shell32.dll is a horrendously big dependency and we're better off without it.)
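
For anyone wanting to repeat a rough comparison like this without PowerShell, a sketch along the following lines would do. It times repeated launches of whichever interpreter runs the script. The function name time_startup and the default run count are my own choices, and wall-clock timing of a subprocess includes process-creation noise, so treat the numbers as indicative at best.

```python
import statistics
import subprocess
import sys
import time


def time_startup(args=("-c", "0"), runs=10):
    """Return wall-clock launch times in seconds, one per run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Launch the same interpreter that is running this script.
        subprocess.run([sys.executable, *args], check=True)
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    samples = time_startup()
    print(f"median: {statistics.median(samples) * 1000:.1f} ms")
    print(f"min:    {min(samples) * 1000:.1f} ms")
```

Run it once with each build and compare the medians; the minimum is often the more stable figure, since it is least affected by other activity on the machine.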
I'm +0.75. I think the idea's fine in principle and the patch (by inspection) seems to do the right things.

My only concerns are: that posixmodule.c becomes even longer and more involved; and that the benefit might not quite be great enough to justify the added complexity.
Yeah, I hate touching posixmodule.c for the same reason. It'd be nice to split it up into separate platform files, but nobody is volunteering for that.

If you focus on the performance, then yeah, this change probably isn't worth it. OTOH, the number of Windows platforms keeps increasing (e.g. ARM, phone/tablet/various sandboxes, etc.) and the fewer dependencies we have the more likely Python will Just Work in them. (And the more value that a split up posixmodule.c will have... guess it'll have to happen eventually.)
If you want a robust measurement of startup impact, the benchmark suite has two benchmarks specifically for startup (w/ and w/o
I assume you're referring to normal_startup and startup_nosite at h.p.o/benchmarks? Handy to know about (I need to explore our top-level repos more often, obviously), but they're probably still not going to measure time in the Windows PE loader as accurately as would be needed to conclusively prove a speed advantage. I'd probably need to hit up the Windows team for some of their profiling tools to get good numbers here.

Still, it's indisputable that this change will reduce the initial memory overhead, so I'll take Tim's 0.75 and run with it :)
New changeset 5bff604a864e by Steve Dower in branch 'default':
Closes #23253: Delay-load ShellExecute
