This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients vstinner
Date 2021-01-18.10:08:33
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1610964514.06.0.696047252341.issue42955@roundup.psfhosted.org>
In-reply-to
Content
Some use cases require to know if a module comes from the stdlib or not. For example, I would like to only dump extension modules which don't come from the stdlib in bpo-42923: "Py_FatalError(): dump the list of extension modules".

Stdlib modules are special. For example, the maintenance and updates are connected to the Python lifecycle. Stdlib modules cannot be updated with "pip install --upgrade". They are shipped with the system ("system" Python). They are usually "read only": on Unix, only the root user can write into /usr directory where the stdlib is installed, whereas modules installed with "pip install --user" can be modified by the current user.

There is a third party project on PyPI which contains the list of stdlib modules:
https://pypi.org/project/stdlib-list/

There is already sys.builtin_module_names:
"A tuple of strings giving the names of all modules that are compiled into this Python interpreter."
https://docs.python.org/dev/library/sys.html#sys.builtin_module_names

I propose to add a similar sys.module_names tuple of strings (module names).

There are different constraints:

* If we add a public sys attribute, users will likely expect the list to be (1) exhaustive (2) correct
* Some extensions are not built if there are missing dependencies. Dependencies are checked after Python "core" (the sys module) is built.
* Some extensions are not available on some platforms.
* This list should be maintained.

Should we only list top level packages, or also submodules? For example, only list "asyncio", or list the 31 submodules (asyncio.base_events, asyncio.futures, ...)? Maybe it can be decided on a case by case basis. For example, I consider that "os.path" is a stdlib module, even it's just an alias to "posixpath" or "ntpath" depending on the platform.

I propose to include all extensions in the list, even if they are not built/available on some platforms. For example, "winsound" would also be listed on Linux, even if the extension is specific to Windows.

I also propose to include stdlib module names even if they are overridden at runtime using PYTHONPATH with a different implementation. For example, "asyncio" would be in the list, even if an user creates "asyncio.py" file. The list would not depend on sys.path.

--

Another option is to add an attribute to modules to mark them as coming from the stdlib. The API would be an attribute: module.__stdlib__ (bool).

The attribute could be set directly in the module code. For example, add "__stdlib__ = True" in Python modules. Similar idea for C extension modules.

Or the attribute could be set after importing the module, in the import site. But we don't control how stdlib modules are imported.

--

For the specific case of bpo-42923, another option is to use a list of stdlib paths, and check module.__file__ to check if a module is a stdlib module, and also use sys.builtin_module_names. And so don't add any API.
History
Date User Action Args
2021-01-18 10:08:34vstinnersetrecipients: + vstinner
2021-01-18 10:08:34vstinnersetmessageid: <1610964514.06.0.696047252341.issue42955@roundup.psfhosted.org>
2021-01-18 10:08:34vstinnerlinkissue42955 messages
2021-01-18 10:08:33vstinnercreate