This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free before the module state is allocated
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Dormouse759, miss-islington, ncoghlan, pablogsal, petr.viktorin, scoder, shihai1991, vstinner
Priority: normal Keywords: patch

Created on 2020-03-02 09:52 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18738 merged vstinner, 2020-03-02 10:51
PR 19076 merged shihai1991, 2020-03-19 16:53
Messages (20)
msg363145 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-02 09:52
Currently, when a module implements m_traverse(), m_clear() or m_free(), these methods can be called with md_state=NULL even if the module implements the "Multi-phase extension module initialization" API (PEP 489).

I'm talking about these module methods:

* tp_traverse: module_traverse() calls md_def->m_traverse() if m_traverse is not NULL
* tp_clear: module_clear() calls md_def->m_clear() if m_clear is not NULL
* tp_dealloc: module_dealloc() calls md_def->m_free() is m_free is not NULL

Because of that, the implementation of these methods must check manually if md_state is NULL or not.

I propose to change module_traverse(), module_clear() and module_dealloc() to not call m_traverse(), m_clear() and m_free() if md_state is NULL and m_size > 0.

"m_size > 0" is an heuristic to check if the module implements the "Multi-phase extension module initialization" API (PEP 489). For example, the _pickle module doesn't fully implements the PEP 489: m_size > 0, but PyInit__pickle() uses PyModule_Create().

See bpo-32374 which documented that "m_traverse may be called with m_state=NULL" (GH-5140).
msg363146 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-02 09:53
The question came up while I reviewed PR 18608 which ports the audioop extension module to multiphase initialization (PEP 489):
https://github.com/python/cpython/pull/18608/files#r386061746

Petr Viktorin referred to m_clear() documentation which says that md_state *can* be NULL when m_clear() is called:
https://docs.python.org/3/c-api/module.html#c.PyModuleDef.m_clear
msg363151 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-02 10:50
There are different kinds of extension modules:

(1) no module state (m_size <= 0): **not affected** by this change. Example: _asyncio which implements m_free() to clear global variables and free lists.

(2) Have a module state but PyInit_xxx() calls PyModule_Create(): PyModule_Create() always allocates md_state. I understand that such extension module is **not affected** by this change.

(3) Multi-phase extension: PyInit_xxx() function calls PyModuleDef_Init(). Such extension module **is affected** if m_traverse, m_clear or m_free() is not NULL. Example: atexit module implements m_traverse, m_clear and m_free.


PyModuleObject structure contains Python objects like md_dict (dict), md_name (str) or md_weaklist (list):

* module_traverse() always traverses md_dict: m_traverse() is no needed for that.
* module_clear() always clears md_dict: m_clear() is no needed for that.
* module_dealloc() always deallocates md_dict, md_name and md_weaklist: m_free() is no needed for that. 
* md_name is a string, it cannot be involved in a reference cycle.


I don't think that it's possible to extend PyModuleObject structure (as done by PyListObject for PyObject) to add other Python objects: md_state is designed for that. PyModule_Create() allocates exactly sizeof(PyModuleObject) bytes.


If an extension module has a module state, stores Python objects *outside* this state and uses m_traverse, m_clear and m_free to handle these objects: the GC will no longer be able to handle these objects before the module is executed with this change. If such extension module exists, I consider that it's ok to only handle objects stored outside the module state once the module is executed. The window between <the module is created> and <the module is executed> is very short.
msg363152 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-02 10:52
I wrote PR 18738 to implement this change.
msg363153 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-02 10:56
> I propose to change module_traverse(), module_clear() and module_dealloc() to not call m_traverse(), m_clear() and m_free() if md_state is NULL and m_size > 0.

Note: This change also means that m_traverse, m_clear and m_free are no longer called if md_state is set to NULL. But it never occurs in practice.

module_dealloc() calls PyMem_FREE(m->md_state) but it doesn't set md_state to NULL. It's not needed, since the module memory is deallocated anyway.
msg363155 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-02 11:09
Proposed PR checking if the module state is NULL:

* PR 18358: _locale module
* PR 18608: audioop module
* PR 18613: binascii
msg363591 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2020-03-07 12:22
One of the intended use cases for Py_mod_create is to return instances of ModuleType subclasses rather than straight ModuleType instances. And those are definitely legal to define:

>>> import __main__
>>> class MyModule(type(__main__)): pass
... 
>>> m = MyModule('example')
>>> m
<module 'example'>

So it isn't valid to skip calling the cleanup functions just because md_state is NULL - we have no idea what Py_mod_create might have done that needs to be cleaned up.

It would *probably* be legitimate to skip calling the cleanup functions when there's no Py_mod_create slot defined, but then the rules for "Do I need to account for md_state potentially being NULL or not?" are getting complicated enough that the safest option for a module author is to always assume that md_state might be NULL and handle that case appropriately.
msg363599 - (view) Author: Hai Shi (shihai1991) * (Python triager) Date: 2020-03-07 15:16
> we have no idea what Py_mod_create might have done that needs to be cleaned up.

Looks like no extension module author use `Py_mod_create` slots now.
msg363708 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-09 09:15
> So it isn't valid to skip calling the cleanup functions just because md_state is NULL - we have no idea what Py_mod_create might have done that needs to be cleaned up.

In your example, I don't see what m_clear/m_free would be supposed to clear/free.

I don't see how a module can store data without md_state which would require m_clear/m_free to clear/free such data. module_clear() continue to clear Python objects of the PyModuleObject structure with PR 18738.

Would you mind to elaborate?

The intent of PR 18738 is to simplify the implementation of C extension modules which implements the PEP 489 and uses a module state (md_state > 0).
msg363912 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-03-11 12:15
If you use a module subclass that needs some additional C-level infrastructure, it would be more appropriate to override tp_clear/tp_free directly.

IMO limiting m_clear/m_free to work just with the module state won't hurt. But it is an API change.
msg363938 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-11 16:51
> Proposed PR checking if the module state is NULL:
> 
> * PR 18358: _locale module
> * PR 18608: audioop module
> * PR 18613: binascii

Petr suggested to not hold these PRs with this controversial issue. I agree, so I merged these 3 PRs.
msg363939 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-11 16:52
Stefan Behnel: as the 3rd author of the PEP 489, what's your call on this issue?
msg364280 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2020-03-16 03:20
Petr's point that any subclass state should be managed in the subclass cleanup functions is a good one, so I withdraw my concern:

* custom module subclasses should clean up like any other class instance
* the module slots are then only needed if the module state actually gets populated, so we can safely skip them if it never even gets allocated
msg364318 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-16 13:35
I updated PR 18738 to document the incompatible change in What's New In Python 3.9. Sadly, I expect that almost no third-party extension module implement the PEP 489 yet. So I expect that little or no third-party code is impacted in pratice.


> the module slots are then only needed if the module state actually gets populated, so we can safely skip them if it never even gets allocated

That's also my understanding.

> custom module subclasses should clean up like any other class instance

That sounds like a reasonable compromise to me.
msg364348 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-03-16 18:09
I think Cython makes use of PEP-489 so unless I am missing something all generated extensions use PEP-489 structures.
msg364454 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-17 17:09
New changeset 5b1ef200d31a74a9b478d0217d73ed0a659a8a06 by Victor Stinner in branch 'master':
bpo-39824: module_traverse() don't call m_traverse if md_state=NULL (GH-18738)
https://github.com/python/cpython/commit/5b1ef200d31a74a9b478d0217d73ed0a659a8a06
msg364455 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-17 17:10
Thanks Petr and Nick for the review ;-)

Pablo Galindo Salgado:
> I think Cython makes use of PEP-489 so unless I am missing something all generated extensions use PEP-489 structures.

Alright. I still consider that my change is correct and will no harm anyone ;-)
msg364616 - (view) Author: miss-islington (miss-islington) Date: 2020-03-19 17:11
New changeset 13397ee47d23fda2e8d4bef40c1df986457673d1 by Hai Shi in branch 'master':
bpo-39824: Convert PyModule_GetState() to get_module_state() (GH-19076)
https://github.com/python/cpython/commit/13397ee47d23fda2e8d4bef40c1df986457673d1
msg364738 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2020-03-21 11:39
> I think Cython makes use of PEP-489 so unless I am missing something all generated extensions use PEP-489 structures.

Cython doesn't make complete use of PEP-489 yet, specifically not of the module state feature (nor module subclasses). This change looks good from my side. Good idea, Victor.
msg364758 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-21 16:40
> Cython doesn't make complete use of PEP-489 yet, specifically not of the module state feature (nor module subclasses). This change looks good from my side. Good idea, Victor.

Thanks for the confirmation Stefan ;-)
History
Date User Action Args
2022-04-11 14:59:27adminsetgithub: 84005
2020-03-21 16:40:23vstinnersetmessages: + msg364758
2020-03-21 11:39:56scodersetmessages: + msg364738
2020-03-19 17:11:37miss-islingtonsetnosy: + miss-islington
messages: + msg364616
2020-03-19 16:53:34shihai1991setpull_requests: + pull_request18432
2020-03-17 17:10:56vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg364455

stage: patch review -> resolved
2020-03-17 17:09:50vstinnersetmessages: + msg364454
2020-03-16 18:09:47pablogsalsetmessages: + msg364348
2020-03-16 13:35:45vstinnersetmessages: + msg364318
2020-03-16 03:20:23ncoghlansetmessages: + msg364280
2020-03-11 16:52:45vstinnersetnosy: + scoder
messages: + msg363939
2020-03-11 16:51:39vstinnersetmessages: + msg363938
2020-03-11 12:15:27petr.viktorinsetmessages: + msg363912
2020-03-09 09:15:56vstinnersetmessages: + msg363708
2020-03-07 15:16:49shihai1991setmessages: + msg363599
2020-03-07 12:22:22ncoghlansetmessages: + msg363591
2020-03-02 11:09:21vstinnersetmessages: + msg363155
2020-03-02 10:56:00vstinnersetmessages: + msg363153
2020-03-02 10:53:03vstinnersettitle: Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free if md_state is NULL -> Multi-phase extension module (PEP 489): don't call m_traverse, m_clear nor m_free before the module state is allocated
2020-03-02 10:52:47vstinnersetnosy: + ncoghlan, petr.viktorin, Dormouse759, pablogsal, shihai1991
messages: + msg363152
2020-03-02 10:51:37vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request18093
2020-03-02 10:50:53vstinnersetmessages: + msg363151
2020-03-02 09:53:06vstinnersetmessages: + msg363146
2020-03-02 09:52:22vstinnercreate