classification
Title: PEP 3121, 384 refactoring applied to elementtree module
Type: resource usage Stage: patch review
Components: Extension Modules Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: eli.bendersky Nosy List: Arfrever, Robin.Schreiber, asvetlov, effbot, eli.bendersky, pitrou, python-dev
Priority: normal Keywords: patch, pep3121

Created on 2012-08-14 18:25 by Robin.Schreiber, last changed 2013-08-11 00:45 by eli.bendersky.

Files
File name Uploaded Description Edit
_elementtree_pep3121-384_v0.patch Robin.Schreiber, 2012-08-14 18:25
_elementtree_pep3121-384_v1.patch Robin.Schreiber, 2012-12-14 16:37
etree_3121.patch pitrou, 2013-08-06 20:29 review
pystate-findmodule-clarify.1.patch eli.bendersky, 2013-08-11 00:45 review
Messages (24)
msg168217 - (view) Author: Robin Schreiber (Robin.Schreiber) * (Python triager) Date: 2012-08-14 18:25
Changes proposed in PEP3121 and PEP384 have now been applied to the elementtree module!
msg168231 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-14 19:12
See issue15390 review comments :)
msg177465 - (view) Author: Robin Schreiber (Robin.Schreiber) * (Python triager) Date: 2012-12-14 16:37
Patch updated to work with current 3.4 Branch version of elementtree.
msg178501 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-12-29 15:12
Thanks for the patch. I'll take a look.
msg179899 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-13 22:52
I looked at the patch a bit more in depth and must admit that I'm reluctant to apply it. It's a very large patch with very little documentation about what steps are taken and why, and I just don't see the motivation. 

The way I see it, PEP 384 is great for compatibility of third-party extensions and embeddings of Python, but much less critical for a module that's always distributed as part of stdlib and thus is kept in exact sync with the ABI of the Python version it comes with. Correct me if I'm wrong.

That said, I won't object to some refactoring if it improves the code. But when such large changes are proposed, I really prefer to see small, incremental patches that replace just a part of the code. Such patches should come with an explanation of why the change is made (i.e. which part of PEP 384 does it adhere to).
msg194570 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-06 20:29
Here is a simplified patch tackling only the PEP 3121 compliance. Eli, I think this would be good to go.
msg194582 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-06 21:26
Bless you Antoine, I've been just planning to do this myself to tackle the re-importing troubles I was having in tests the other day :-)

I'll take a look at this soon, promise!
msg194607 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-07 13:11
Antoine, some questions about the patch:

First, I think it omits expat_capi from the state. Is that intentional?

Second, I'm not sure if this approach is fully aligned with PEP 3121. A global, shared state is still used. Instead of actually having a different module state per subinterpreter, this patch will have shared state. Another problem seems to be using PyModule_FindModule without using PyModule_AddModule first. 

These problems could be shared to all of Robin's original patches. Of course, there's also the possibility that I don't fully understand PEP 3121 yet :)
msg194608 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-07 13:28
> First, I think it omits expat_capi from the state. Is that
> intentional?

What would it do in the state? There's nothing to release.

> Second, I'm not sure if this approach is fully aligned with PEP 3121.
> A global, shared state is still used. Instead of actually having a
> different module state per subinterpreter, this patch will have
> shared state.

I don't understand what you are talking about. Perhaps you haven't looked
what PyState_FindModule() does?
msg194610 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-07 13:47
On Wed, Aug 7, 2013 at 6:28 AM, Antoine Pitrou <report@bugs.python.org>wrote:

>
> Antoine Pitrou added the comment:
>
> > First, I think it omits expat_capi from the state. Is that
> > intentional?
>
> What would it do in the state? There's nothing to release.
>

That's true, but I thought one of the goals of PEP 3121 is to separate
states between sub-interpreters. So that one can't corrupt another. I'm not
sure how much it matters in practice in this case of the pyexpat capsule;
need to look into it more.

> > Second, I'm not sure if this approach is fully aligned with PEP 3121.
> > A global, shared state is still used. Instead of actually having a
> > different module state per subinterpreter, this patch will have
> > shared state.
>
> I don't understand what you are talking about. Perhaps you haven't looked
> what PyState_FindModule() does?
>

I did not look at the implementation yet. But the documentation says:

"""Returns the module object that was created from *def* for the current
interpreter. This method requires that the module object has been attached
to the interpreter state with
PyState_AddModule()<http://docs.python.org/dev/c-api/module.html?highlight=pymoduledef_base#PyState_AddModule>beforehand.
In case the corresponding module object is not found or has not
been attached to the interpreter state yet, it returns NULL."""

I don't see a call to PyState_AddModule. What am I missing?
msg194612 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-07 14:00
> That's true, but I thought one of the goals of PEP 3121 is to
> separate
> states between sub-interpreters. So that one can't corrupt another.
> I'm not
> sure how much it matters in practice in this case of the pyexpat
> capsule;
> need to look into it more.

pyexpat's "capi" object is a static struct inside pyexpat.c, so that
wouldn't change anything.
Separating states between sub-interpreters only matters when said state
is mutable, which it isn't here.

> I don't see a call to PyState_AddModule. What am I missing?

It is called implicitly when an extension module is imported.
msg194674 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-08 13:56
Thanks Antoine. I think I understand the patch better now. Just a couple small questions and otherwise LGTM

This code in the beginning in PyInit__elementtree:

    m = PyState_FindModule(&elementtreemodule);
    if (m) {
        Py_INCREF(m);
        return m;
    }

Can you explain what use case it tries to cover? I couldn't find similar code in other modules we have that implement PEP 3121 (_csv, readline, io, etc.)

This code has at least one adverse effect, for testing. The problem with re-importing _elementtree I raised in http://mail.python.org/pipermail/python-dev/2013-August/127766.html is solved by moving to PEP 3121, but this piece of code above ruins it. This is because I want to set sys.modules['pyexpat'] = None and re-import _elementtree (this is what support.import_fresh_module does). But with this code in place, if _elementtree was imported any time in the past (say, in a previous test), I'll just get the instance back without attempting to do the full module initialization.


>> I don't see a call to PyState_AddModule. What am I missing?
>It is called implicitly when an extension module is imported.

Do you think this should be documented in the C API docs? The way they read now, it seems that calling PyState_AddModule is needed manually by extension writers.
msg194675 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-08 13:58
> Can you explain what use case it tries to cover?

What I'm trying to say is that although I think I understand its effect (if the same sub-interpreter re-imports _elementtree bypassing the Python module cache, this is an additional level of caching), I'm not sure what *real* use cases it aims for.
msg194677 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-08 14:00
> This code in the beginning in PyInit__elementtree:
> 
>     m = PyState_FindModule(&elementtreemodule);
>     if (m) {
>         Py_INCREF(m);
>         return m;
>     }
> 
> Can you explain what use case it tries to cover? I couldn't find
> similar code in other modules we have that implement PEP 3121 (_csv,
> readline, io, etc.)

I don't know :-) I just re-used Robin's original patch.

> >> I don't see a call to PyState_AddModule. What am I missing?
> >It is called implicitly when an extension module is imported.
> 
> Do you think this should be documented in the C API docs? The way
> they read now, it seems that calling PyState_AddModule is needed
> manually by extension writers.

Well, how to deal with module state should probably be better
documented. Not sure how, though.
msg194679 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-08 14:08
On Thu, Aug 8, 2013 at 7:00 AM, Antoine Pitrou <report@bugs.python.org>wrote:

>
> Antoine Pitrou added the comment:
>
> > This code in the beginning in PyInit__elementtree:
> >
> >     m = PyState_FindModule(&elementtreemodule);
> >     if (m) {
> >         Py_INCREF(m);
> >         return m;
> >     }
> >
> > Can you explain what use case it tries to cover? I couldn't find
> > similar code in other modules we have that implement PEP 3121 (_csv,
> > readline, io, etc.)
>
> I don't know :-) I just re-used Robin's original patch.
>

Would you mind removing it from the patch, due to the case described above?
ISTM that in real scenarios the sys.modules cache kicks in anyway. It
should not be really bypassed for any given sub-interpreter in sane code.

>
> > >> I don't see a call to PyState_AddModule. What am I missing?
> > >It is called implicitly when an extension module is imported.
> >
> > Do you think this should be documented in the C API docs? The way
> > they read now, it seems that calling PyState_AddModule is needed
> > manually by extension writers.
>
> Well, how to deal with module state should probably be better
> documented. Not sure how, though.
>

I'll think about it some more and will try to propose a documentation
patch. This can be done incrementally; we don't have to go to perfect docs
on the first try ;-)
msg194680 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-08 14:14
> Would you mind removing it from the patch, due to the case described
> above?
> ISTM that in real scenarios the sys.modules cache kicks in anyway. It
> should not be really bypassed for any given sub-interpreter in sane
> code.

Yeah, I think you're right. I'll submit an updated patch or, if it's
the only issue with it, perhaps you can simply remove the 3 offending
lines?
msg194684 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-08 15:16
On Thu, Aug 8, 2013 at 7:14 AM, Antoine Pitrou <report@bugs.python.org>wrote:

>
> Antoine Pitrou added the comment:
>
> > Would you mind removing it from the patch, due to the case described
> > above?
> > ISTM that in real scenarios the sys.modules cache kicks in anyway. It
> > should not be really bypassed for any given sub-interpreter in sane
> > code.
>
> Yeah, I think you're right. I'll submit an updated patch or, if it's
> the only issue with it, perhaps you can simply remove the 3 offending
> lines?
>

Sure, I'll do that. Thanks.
msg194797 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-08-10 15:01
New changeset 8a060e2de608 by Eli Bendersky in branch 'default':
Issue #15651: PEP 3121 refactoring for _elementtree
http://hg.python.org/cpython/rev/8a060e2de608
msg194798 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-10 15:07
Antoine, I committed your patch (with a bit of comments added), *leaving the module caching in*. This is because removing it breaks the tests, unfortunately. The _elementtree tests are so crooked that they manage to create a situation in which the module under test throws ParseError which is a different class from ET.ParseError. This is "achieved" by multiple invocations of import_fresh_module with various fresh & blocked parameters, and I still haven't fully traced all the problems yet. Since this caching only potentially harms other tests, it's ok to leave in.

A longer term solution to all this will be http://mail.python.org/pipermail/python-dev/2013-August/127766.html - I want to eventually run all "monkey-patch the import environment to simulate some situation" sub-tests of ET in different subprocesses to they are kept independent.
msg194822 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-10 17:38
> Antoine, I committed your patch (with a bit of comments added),
> *leaving the module caching in*.

Thanks!

> A longer term solution to all this will be
> http://mail.python.org/pipermail/python-dev/2013-August/127766.html -
> I want to eventually run all "monkey-patch the import environment to
> simulate some situation" sub-tests of ET in different subprocesses to
> they are kept independent.

I find it useful that the test suite stresses module unloading or
reloading. There's probably a bug either in ET or in the ET tests. I'm
not saying it's a very important issue of course, but IMHO it would be
better if we don't try to swipe it under the carpet :-)
msg194835 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-10 21:12
On Sat, Aug 10, 2013 at 10:38 AM, Antoine Pitrou <report@bugs.python.org>wrote:

>
> Antoine Pitrou added the comment:
>
> > Antoine, I committed your patch (with a bit of comments added),
> > *leaving the module caching in*.
>
> Thanks!
>
> > A longer term solution to all this will be
> > http://mail.python.org/pipermail/python-dev/2013-August/127766.html -
> > I want to eventually run all "monkey-patch the import environment to
> > simulate some situation" sub-tests of ET in different subprocesses to
> > they are kept independent.
>
> I find it useful that the test suite stresses module unloading or
> reloading. There's probably a bug either in ET or in the ET tests. I'm
> not saying it's a very important issue of course, but IMHO it would be
> better if we don't try to swipe it under the carpet :-)
>

I have no intention swiping things under the carpet. I'll get to the bottom
of this to understand the exact flow that causes this to happen.

But I still think ET tests should be logically separated into subprocesses.
If we want to stress test module unloading and reloading, let's have
specific, targeted tests for that.
msg194837 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-10 21:36
> If we want to stress test module unloading and reloading, let's have
> specific, targeted tests for that.

Agreed.
msg194855 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-11 00:30
Found some problems in the interaction of PEP 3121 and import_fresh_module: http://mail.python.org/pipermail/python-dev/2013-August/127862.html

I'd still like to see the in-PyInit__elementtree caching deleted eventually, without harming test coverage. So this issue will remain open for now, until we decide what to do.
msg194856 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-11 00:44
BTW, Antoine, w.r.t documentation - I agree that the whole PyState_* sequence needs better documentation and examples, but in the meantime I'm attaching a simple patch for c-api/module.rst. It clarifies that PyState_AddModule doesn't really have to be called explicitly in extensions.
History
Date User Action Args
2013-08-11 00:45:03eli.benderskysetfiles: + pystate-findmodule-clarify.1.patch
2013-08-11 00:44:52eli.benderskysetmessages: + msg194856
2013-08-11 00:30:09eli.benderskysetmessages: + msg194855
2013-08-10 21:36:21pitrousetmessages: + msg194837
2013-08-10 21:12:21eli.benderskysetmessages: + msg194835
2013-08-10 17:38:42pitrousetmessages: + msg194822
2013-08-10 15:07:26eli.benderskysetmessages: + msg194798
2013-08-10 15:01:21python-devsetnosy: + python-dev
messages: + msg194797
2013-08-08 15:16:39eli.benderskysetmessages: + msg194684
2013-08-08 14:14:32pitrousetmessages: + msg194680
2013-08-08 14:08:32eli.benderskysetmessages: + msg194679
2013-08-08 14:00:10pitrousetmessages: + msg194677
2013-08-08 13:58:13eli.benderskysetmessages: + msg194675
2013-08-08 13:56:34eli.benderskysetmessages: + msg194674
2013-08-07 14:00:07pitrousetmessages: + msg194612
2013-08-07 13:47:16eli.benderskysetmessages: + msg194610
2013-08-07 13:28:36pitrousetmessages: + msg194608
2013-08-07 13:11:17eli.benderskysetmessages: + msg194607
2013-08-06 21:26:01eli.benderskysetmessages: + msg194582
2013-08-06 20:29:38pitrousetfiles: + etree_3121.patch

messages: + msg194570
2013-01-13 22:52:36eli.benderskysetmessages: + msg179899
2012-12-29 15:12:05eli.benderskysetassignee: eli.bendersky
messages: + msg178501
2012-12-14 17:44:45Arfreversetnosy: + Arfrever
2012-12-14 16:37:24Robin.Schreibersetfiles: + _elementtree_pep3121-384_v1.patch
keywords: + patch
messages: + msg177465
2012-11-08 13:39:49Robin.Schreibersetkeywords: + pep3121, - patch
2012-08-27 03:41:57belopolskylinkissue15787 dependencies
2012-08-17 16:36:58asvetlovsetnosy: + asvetlov
2012-08-14 19:12:39pitrousetnosy: + eli.bendersky, pitrou

messages: + msg168231
stage: patch review
2012-08-14 18:33:34Robin.Schreibersetnosy: + effbot
2012-08-14 18:25:20Robin.Schreibercreate