classification
Title: Add PyState_GetModuleAttr
Type: enhancement Stage: resolved
Components: Extension Modules, Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, eli.bendersky, loewis, ncoghlan, petr.viktorin, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2013-08-12 12:40 by pitrou, last changed 2020-12-15 15:17 by petr.viktorin. This issue is now closed.

Files
File name Uploaded Description Edit
getmodattr.patch pitrou, 2013-08-12 12:40 review
Messages (13)
msg194949 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-12 12:40
Attached patch adds PyState_GetModuleAttr() and converts the _csv module to use it (as an example).

As you can see, the _csv module grows a little but it now has proper error handling (previously, it didn't check for PyState_FindModule() returning NULL).

(A few lines would be removed if we added PyErr_FormatV as well...)
msg194953 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-08-12 12:50
I don't understand your change. Why do we need to change the _csv module and why storing module global variables in the module dict is better than storing them in a simple C structure?
msg194954 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-12 12:51
> I don't understand your change. Why do we need to change the _csv
> module and why storing module global variables in the module dict is
> better than storing them in a simple C structure?

I won't repeat what was already said in the python-dev thread:
http://mail.python.org/pipermail/python-dev/2013-August/127862.html
msg194955 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-08-12 12:55
The patch contains the new function and a patch for the CSV module. How about you split it up across two patches: one for the new feature and one for the CSV module? It makes review easier.
msg194956 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-12 12:56
Bundling the two was aimed at showcasing the effect the new function can have. The function itself is trivial, there's not much point in reviewing it alone.
msg194957 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-08-12 13:01
+PyObject *
+PyState_GetModuleAttr(struct PyModuleDef *def,
+                      const char *name,
+                      PyObject *restrict_type)

When the char* type is used, the function has usually the suffix String. I prefer the PyIdentifier API because it avoids to create a temporary Python Unicode object.

It's the first type that I see a function getting an attribute which also checks for its type. I don't know if the check should be done in PyState_GetModuleAttr() or in the _csv module.
msg194960 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-12 13:51
> When the char* type is used, the function has usually the suffix
> String. I prefer the PyIdentifier API because it avoids to create a
> temporary Python Unicode object.

This is a convenience API, not a performance "optimization".

> It's the first type that I see a function getting an attribute which
> also checks for its type. I don't know if the check should be done
> in PyState_GetModuleAttr() or in the _csv module.

The aim is to help writing extension modules, which is why the type
check is done inside the helper function.
msg194962 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-12 13:55
The patched code looks better than the original one in several respects, but I think it raises some questions.

I agree with Victor that the type-checking API is unnatural, but I also think there may be a deeper issue hiding behind. You felt compelled to add the type-checking to provide some level of safety to the C code - but is it enough? Previously, the only way to add a dialect was through register_dialect that does type checking to make sure it gets a legit dialect object. Now, the _dialects dict is directly accessible to Python code and it can add arbitrary objects to it (both as keys and as values). Does this mean that the C code now has to do type checking in all internal code that accesses _dialects? I haven't had time to find a crasher example, but it's plausible that it's possible to crash the interpreter now by assigning un-expected stuff to members of _dialects.

Maybe it's not a problem in practice for _csv but something worth thinking about if we're going to switch other methods to this approach.

Also, the problem with http://mail.python.org/pipermail/python-dev/2013-August/127862.html does not go away with this patch, as expected. However, it's a step in the right direction in case we do have multiple instances of the extension module alive at the same time in the future. Although then it would be interesting to consider how to find the actually correct module instance from internal functions.
msg194963 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-12 14:00
> Previously, the only way to add a dialect was through register_dialect that does
> type checking to make sure it gets a legit dialect object. Now, the _dialects dict is 
> directly accessible to Python code and it can add arbitrary objects to it (both as
> keys and as values). Does this mean that the C code now has to do type checking in all 
> internal code that accesses _dialects?

You are right, I forgot about that part. That will make the patch significantly more complicated.

> However, it's a step in the right direction in case we do have multiple instances of
> the extension module alive at the same time in the future. Although then it would be
> interesting to consider how to find the actually correct module instance from internal 
> functions.

This sounds basically impossible (which is why we *can't* have multiple instances of an extension module alive in a single interpreter :-)).
msg194964 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-12 14:07
> Previously, the only way to add a dialect was through register_dialect
that does

>  > type checking to make sure it gets a legit dialect object. Now, the
> _dialects dict is
> > directly accessible to Python code and it can add arbitrary objects to
> it (both as
> > keys and as values). Does this mean that the C code now has to do type
> checking in all
> > internal code that accesses _dialects?
>
> You are right, I forgot about that part. That will make the patch
> significantly more complicated.
>

:-/

Turns out there's real merit in having a clear dividing line between the
safe world of Python and unsafe world of C. Within the C code, some
invariants are assumed to be correct (and can be efficiently assert()-ed) -
this makes the code simpler and more performant.

> > However, it's a step in the right direction in case we do have multiple
> instances of
> > the extension module alive at the same time in the future. Although then
> it would be
> > interesting to consider how to find the actually correct module instance
> from internal
> > functions.
>
> This sounds basically impossible (which is why we *can't* have multiple
> instances of an extension module alive in a single interpreter :-)).
>

Well, it surely isn't possible without significant code rewriting. But I
can envision carrying such "state" around in extension-specific objects.
Although it's not entirely clear whether the memory cost is justified.

But without it, it's not clear to me how we plan to have multiple living
extension modules at all. Perhaps this is a discussion outside this
specific issue though (maybe for the PEP Stefan writes)
msg194965 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-12 14:08
Breaking the 1:1 interpreter <-> extension module mapping involves
adding custom types to sys.modules rather than module objects. Making
that work sensibly will involve larger changes to the extension
initialisation APIs. import-sig already has plans for this :)
msg352319 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2019-09-13 13:24
PEP 573 proposes an alternative to PyState_FindModule.
msg383065 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-12-15 15:17
Let me summarize the original issue with the following quote from the python-dev thead:

> doing some sys.modules acrobatics and re-importing suddenly
changes the internal state of a previously imported [_csv] module.

The issue is now solved; both reproducers from that thread pass.

If you're interested, please read PEP 630 for a general overview of where we're at w.r.t. C-API changes that started with PEP 3121.
For the API used to fix this issue, see PEP 573; for the specific fix in _csv, see GH-23224.

Thanks to everyone involved, and especially to Nick Coghlan for helping me start this journey way back in 2013 :)
History
Date User Action Args
2020-12-15 15:17:34petr.viktorinsetstatus: open -> closed
resolution: fixed
messages: + msg383065

stage: patch review -> resolved
2019-09-13 13:24:04petr.viktorinsetnosy: + petr.viktorin
messages: + msg352319
2013-08-12 14:08:35ncoghlansetmessages: + msg194965
2013-08-12 14:07:46eli.benderskysetmessages: + msg194964
2013-08-12 14:00:18pitrousetmessages: + msg194963
2013-08-12 13:55:27eli.benderskysetmessages: + msg194962
2013-08-12 13:51:16pitrousetmessages: + msg194960
2013-08-12 13:01:31vstinnersetmessages: + msg194957
2013-08-12 12:56:51pitrousetmessages: + msg194956
2013-08-12 12:55:21christian.heimessetnosy: + christian.heimes
messages: + msg194955
2013-08-12 12:51:43pitrousetmessages: + msg194954
2013-08-12 12:50:28vstinnersetnosy: + vstinner
messages: + msg194953
2013-08-12 12:40:41pitroucreate