Author vstinner
Date 2021-01-18.10:08:33
Some use cases require knowing whether a module comes from the stdlib. For example, in bpo-42923 ("Py_FatalError(): dump the list of extension modules"), I would like to dump only the extension modules which don't come from the stdlib.

Stdlib modules are special. For example, their maintenance and updates are tied to the Python lifecycle. Stdlib modules cannot be updated with "pip install --upgrade". They are shipped with the system ("system" Python). They are usually "read only": on Unix, only the root user can write into the /usr directory where the stdlib is installed, whereas modules installed with "pip install --user" can be modified by the current user.

There is a third party project on PyPI which contains the list of stdlib modules:

There is already sys.builtin_module_names:
"A tuple of strings giving the names of all modules that are compiled into this Python interpreter."
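
As a reminder of how the existing attribute is used, here is a minimal sketch. The helper name is_builtin_module() is only an illustration, it is not an existing API:

```python
import sys


def is_builtin_module(name):
    """Return True if `name` is compiled into this Python interpreter."""
    return name in sys.builtin_module_names


# "sys" is always compiled into the interpreter.
print(is_builtin_module("sys"))
# A name that is not a built-in module.
print(is_builtin_module("not_a_real_module_xyz"))
```

Note that this only covers modules compiled into the interpreter. Stdlib modules written in Python and stdlib extensions built as shared libraries are not in this tuple.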

I propose to add a similar sys.module_names tuple of strings (module names).

There are different constraints:

* If we add a public sys attribute, users will likely expect the list to be (1) exhaustive and (2) correct.
* Some extensions are not built if there are missing dependencies. Dependencies are checked after Python "core" (the sys module) is built.
* Some extensions are not available on some platforms.
* This list should be maintained.

Should we only list top-level packages, or also submodules? For example, only list "asyncio", or list the 31 submodules (asyncio.base_events, asyncio.futures, ...)? Maybe it can be decided on a case-by-case basis. For example, I consider that "os.path" is a stdlib module, even if it's just an alias to "posixpath" or "ntpath" depending on the platform.

I propose to include all extensions in the list, even if they are not built/available on some platforms. For example, "winsound" would also be listed on Linux, even if the extension is specific to Windows.

I also propose to include stdlib module names even if they are overridden at runtime using PYTHONPATH with a different implementation. For example, "asyncio" would be in the list, even if a user creates "" file. The list would not depend on sys.path.
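
To illustrate how such a list could be consumed, here is a rough sketch. It assumes the list only contains top-level names, so submodules are matched on their first component. SAMPLE_STDLIB_NAMES is a tiny hand-written stand-in for the proposed tuple, not the real list, and third_party_modules() is a hypothetical helper:

```python
# Hypothetical stand-in for the proposed sys.module_names tuple.
# It is a small hand-written sample, NOT the real list of stdlib modules.
SAMPLE_STDLIB_NAMES = frozenset({"asyncio", "os", "sys", "winsound"})


def top_level(name):
    """Return the top-level package of a dotted module name."""
    return name.partition(".")[0]


def third_party_modules(loaded_names, stdlib_names):
    """Return the sorted names whose top-level package is not in stdlib_names."""
    return sorted(
        name for name in loaded_names
        if top_level(name) not in stdlib_names
    )


loaded = ["os", "asyncio.futures", "numpy", "numpy.linalg"]
print(third_party_modules(loaded, SAMPLE_STDLIB_NAMES))
# prints ['numpy', 'numpy.linalg']
```

The lookup only depends on names, not on sys.path, so a module shadowing "asyncio" would still be classified as stdlib, which matches the behavior proposed above.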


Another option is to add an attribute to modules to mark them as coming from the stdlib. The API would be an attribute: module.__stdlib__ (bool).

The attribute could be set directly in the module code. For example, add "__stdlib__ = True" in Python modules. Similar idea for C extension modules.

Or the attribute could be set after importing the module, at the import site, but we don't control how stdlib modules are imported.
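
A consumer of this attribute could look like the following sketch. __stdlib__ is only the attribute proposed here and does not exist today, so the code has to treat a missing attribute as "not stdlib". is_marked_stdlib() and the two fake modules are made up for the example:

```python
import types


def is_marked_stdlib(module):
    """Return True if the module carries a true __stdlib__ attribute.

    __stdlib__ is the proposed marker, not an existing attribute, so a
    module without it is treated as not coming from the stdlib.
    """
    return bool(getattr(module, "__stdlib__", False))


# Fake modules, created only to exercise the helper.
marked = types.ModuleType("fake_stdlib_module")
marked.__stdlib__ = True
unmarked = types.ModuleType("fake_third_party_module")

print(is_marked_stdlib(marked), is_marked_stdlib(unmarked))
# prints True False
```

One drawback visible in the sketch: any third-party module can set the attribute itself, so the marker is only a hint and not a guarantee.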


For the specific case of bpo-42923, another option is to use a list of stdlib paths and compare module.__file__ against it to decide whether a module is a stdlib module, in combination with sys.builtin_module_names. That approach would not require adding any API.
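
A minimal sketch of this last option, written in Python for readability. It assumes that sysconfig.get_paths() reports the stdlib directories ("stdlib", "platstdlib") and the site-packages directories ("purelib", "platlib"). The site-packages directories must be excluded explicitly, since they are usually located inside the stdlib directory. The helper names are made up, and distributions with a customized layout may need additional directories:

```python
import os
import sys
import sysconfig
import types


def _norm(path):
    """Normalize a path so that prefix comparisons are reliable."""
    return os.path.normcase(os.path.realpath(path))


def _dirs(*keys):
    """Return the normalized sysconfig install paths for the given keys."""
    paths = sysconfig.get_paths()
    return {_norm(paths[key]) for key in keys if paths.get(key)}


STDLIB_DIRS = _dirs("stdlib", "platstdlib")
# site-packages usually lives inside the stdlib directory: exclude it.
SITE_DIRS = _dirs("purelib", "platlib")


def _is_under(path, directory):
    """Return True if `path` is located below `directory`."""
    return path.startswith(directory.rstrip(os.sep) + os.sep)


def is_stdlib_module(module):
    """Guess whether a module comes from the stdlib, without any new API."""
    if module.__name__ in sys.builtin_module_names:
        return True
    filename = getattr(module, "__file__", None)
    if not filename:
        # No file to inspect: cannot be classified by path.
        return False
    path = _norm(filename)
    if any(_is_under(path, d) for d in SITE_DIRS):
        return False
    return any(_is_under(path, d) for d in STDLIB_DIRS)


print(is_stdlib_module(sys))  # True: "sys" is a built-in module
print(is_stdlib_module(types.ModuleType("no_file")))  # False: no __file__
```

Unlike a fixed list of names, this check depends on where a module was actually loaded from, so a module shadowing a stdlib name through PYTHONPATH would not be reported as stdlib.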
