PEP 3121, 384 refactoring applied to elementtree module #59856

Closed
RobinSchreiber mannequin opened this issue Aug 14, 2012 · 26 comments
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage

Comments

@RobinSchreiber
Mannequin

RobinSchreiber mannequin commented Aug 14, 2012

BPO 15651
Nosy @pitrou, @vstinner, @asvetlov
Superseder
  • bpo-15651: PEP 3121, 384 refactoring applied to elementtree module
Files
  • _elementtree_pep3121-384_v0.patch
  • _elementtree_pep3121-384_v1.patch
  • etree_3121.patch
  • pystate-findmodule-clarify.1.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-12-08.15:25:16.711>
    created_at = <Date 2012-08-14.18:25:20.879>
    labels = ['extension-modules', 'performance']
    title = 'PEP 3121, 384 refactoring applied to elementtree module'
    updated_at = <Date 2021-12-08.15:25:16.711>
    user = 'https://bugs.python.org/RobinSchreiber'

    bugs.python.org fields:

    activity = <Date 2021-12-08.15:25:16.711>
    actor = 'iritkatriel'
    assignee = 'eli.bendersky'
    closed = True
    closed_date = <Date 2021-12-08.15:25:16.711>
    closer = 'iritkatriel'
    components = ['Extension Modules']
    creation = <Date 2012-08-14.18:25:20.879>
    creator = 'Robin.Schreiber'
    dependencies = []
    files = ['26805', '28311', '31179', '31226']
    hgrepos = []
    issue_num = 15651
    keywords = ['patch', 'pep3121']
    message_count = 26.0
    messages = ['168217', '168231', '177465', '178501', '179899', '194570', '194582', '194607', '194608', '194610', '194612', '194674', '194675', '194677', '194679', '194680', '194684', '194797', '194798', '194822', '194835', '194837', '194855', '194856', '383280', '383286']
    nosy_count = 8.0
    nosy_names = ['effbot', 'pitrou', 'vstinner', 'Arfrever', 'eli.bendersky', 'asvetlov', 'python-dev', 'Robin.Schreiber']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '15651'
    type = 'resource usage'
    url = 'https://bugs.python.org/issue15651'
    versions = ['Python 3.4']

    @RobinSchreiber
    Mannequin Author

    RobinSchreiber mannequin commented Aug 14, 2012

    Changes proposed in PEP-3121 and PEP-384 have now been applied to the elementtree module!

    @RobinSchreiber RobinSchreiber mannequin added extension-modules C modules in the Modules dir performance Performance or resource usage labels Aug 14, 2012
    @pitrou
    Member

    pitrou commented Aug 14, 2012

    See bpo-15390 review comments :)

    @RobinSchreiber
    Mannequin Author

    RobinSchreiber mannequin commented Dec 14, 2012

    Patch updated to work with current 3.4 Branch version of elementtree.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Dec 29, 2012

    Thanks for the patch. I'll take a look.

    @elibendersky elibendersky mannequin self-assigned this Dec 29, 2012
    @elibendersky
    Mannequin

    elibendersky mannequin commented Jan 13, 2013

    I looked at the patch a bit more in depth and must admit that I'm reluctant to apply it. It's a very large patch with very little documentation about what steps are taken and why, and I just don't see the motivation.

    The way I see it, PEP-384 is great for compatibility of third-party extensions and embeddings of Python, but much less critical for a module that's always distributed as part of stdlib and thus is kept in exact sync with the ABI of the Python version it comes with. Correct me if I'm wrong.

    That said, I won't object to some refactoring if it improves the code. But when such large changes are proposed, I really prefer to see small, incremental patches that each replace just a part of the code. Such patches should come with an explanation of why the change is made (i.e. which part of PEP-384 it adheres to).

    @pitrou
    Member

    pitrou commented Aug 6, 2013

    Here is a simplified patch tackling only the PEP-3121 compliance. Eli, I think this would be good to go.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 6, 2013

    Bless you Antoine, I was just planning to do this myself to tackle the re-importing troubles I was having in tests the other day :-)

    I'll take a look at this soon, promise!

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 7, 2013

    Antoine, some questions about the patch:

    First, I think it omits expat_capi from the state. Is that intentional?

    Second, I'm not sure if this approach is fully aligned with PEP-3121. A global, shared state is still used. Instead of actually having a different module state per subinterpreter, this patch will have shared state. Another problem seems to be using PyState_FindModule without calling PyState_AddModule first.

    These problems may be shared by all of Robin's original patches. Of course, there's also the possibility that I don't fully understand PEP-3121 yet :)

    @pitrou
    Member

    pitrou commented Aug 7, 2013

    > First, I think it omits expat_capi from the state. Is that
    > intentional?

    What would it do in the state? There's nothing to release.

    > Second, I'm not sure if this approach is fully aligned with PEP-3121.
    > A global, shared state is still used. Instead of actually having a
    > different module state per subinterpreter, this patch will have
    > shared state.

    I don't understand what you are talking about. Perhaps you haven't looked
    at what PyState_FindModule() does?

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 7, 2013

    On Wed, Aug 7, 2013 at 6:28 AM, Antoine Pitrou <report@bugs.python.org> wrote:

    > Antoine Pitrou added the comment:
    >
    >> First, I think it omits expat_capi from the state. Is that
    >> intentional?
    >
    > What would it do in the state? There's nothing to release.

    That's true, but I thought one of the goals of PEP-3121 is to separate
    states between sub-interpreters, so that one can't corrupt another. I'm not
    sure how much it matters in practice in this case of the pyexpat capsule;
    I need to look into it more.

    >> Second, I'm not sure if this approach is fully aligned with PEP-3121.
    >> A global, shared state is still used. Instead of actually having a
    >> different module state per subinterpreter, this patch will have
    >> shared state.
    >
    > I don't understand what you are talking about. Perhaps you haven't looked
    > at what PyState_FindModule() does?

    I did not look at the implementation yet. But the documentation says:

    """Returns the module object that was created from *def* for the current
    interpreter. This method requires that the module object has been attached
    to the interpreter state with PyState_AddModule()
    <http://docs.python.org/dev/c-api/module.html?highlight=pymoduledef_base#PyState_AddModule>
    beforehand. In case the corresponding module object is not found or has not
    been attached to the interpreter state yet, it returns NULL."""

    I don't see a call to PyState_AddModule. What am I missing?

    @pitrou
    Member

    pitrou commented Aug 7, 2013

    > That's true, but I thought one of the goals of PEP-3121 is to separate
    > states between sub-interpreters. So that one can't corrupt another. I'm not
    > sure how much it matters in practice in this case of the pyexpat capsule;
    > need to look into it more.

    pyexpat's "capi" object is a static struct inside pyexpat.c, so that
    wouldn't change anything.
    Separating states between sub-interpreters only matters when said state
    is mutable, which it isn't here.

    > I don't see a call to PyState_AddModule. What am I missing?

    It is called implicitly when an extension module is imported.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 8, 2013

    Thanks Antoine. I think I understand the patch better now. Just a couple small questions and otherwise LGTM

    This code at the beginning of PyInit__elementtree:

        m = PyState_FindModule(&elementtreemodule);
        if (m) {
            Py_INCREF(m);
            return m;
        }

    Can you explain what use case it tries to cover? I couldn't find similar code in other modules we have that implement PEP-3121 (_csv, readline, io, etc.)

    This code has at least one adverse effect, for testing. The problem with re-importing _elementtree I raised in http://mail.python.org/pipermail/python-dev/2013-August/127766.html is solved by moving to PEP-3121, but this piece of code above ruins it. This is because I want to set sys.modules['pyexpat'] = None and re-import _elementtree (this is what support.import_fresh_module does). But with this code in place, if _elementtree was imported any time in the past (say, in a previous test), I'll just get the instance back without attempting to do the full module initialization.

    >> I don't see a call to PyState_AddModule. What am I missing?
    > It is called implicitly when an extension module is imported.

    Do you think this should be documented in the C API docs? The way they read now, it seems that calling PyState_AddModule is needed manually by extension writers.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 8, 2013

    > Can you explain what use case it tries to cover?

    What I'm trying to say is that although I think I understand its effect (if the same sub-interpreter re-imports _elementtree bypassing the Python module cache, this is an additional level of caching), I'm not sure what *real* use cases it aims for.
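
The "additional level of caching" can be illustrated with a toy model in pure Python. Everything here is an assumption made for illustration: `FakeInterpreter`, `module_by_def` and the method names are hypothetical stand-ins, not CPython APIs, and the real lookup happens in C. The model only shows why an early return in the init function defeats a "fresh" import.

```python
import types


class FakeInterpreter:
    """Toy stand-in for one (sub)interpreter; not a CPython API."""

    def __init__(self):
        self.sys_modules = {}    # models sys.modules
        self.module_by_def = {}  # models the lookup behind PyState_FindModule
        self.full_inits = 0      # how many times the full init path ran

    def init_module(self, name, early_return):
        # Models the module init function, optionally with the early return.
        if early_return and name in self.module_by_def:
            return self.module_by_def[name]
        self.full_inits += 1
        module = types.ModuleType(name)
        self.module_by_def[name] = module  # models the implicit registration on import
        return module

    def import_module(self, name, early_return):
        if name not in self.sys_modules:
            self.sys_modules[name] = self.init_module(name, early_return)
        return self.sys_modules[name]

    def import_fresh(self, name, early_return):
        # Models a "fresh" import: drop the sys.modules entry, then import again.
        self.sys_modules.pop(name, None)
        return self.import_module(name, early_return)


def count_full_inits(early_return):
    interp = FakeInterpreter()
    interp.import_module("_elementtree", early_return)
    interp.import_fresh("_elementtree", early_return)
    return interp.full_inits


print(count_full_inits(early_return=True), count_full_inits(early_return=False))
```

In this model the fresh import reruns full initialization only when the early return is absent, which matches the testing concern described above.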

    @pitrou
    Member

    pitrou commented Aug 8, 2013

    > This code at the beginning of PyInit__elementtree:
    >
    >     m = PyState_FindModule(&elementtreemodule);
    >     if (m) {
    >         Py_INCREF(m);
    >         return m;
    >     }
    >
    > Can you explain what use case it tries to cover? I couldn't find
    > similar code in other modules we have that implement PEP-3121 (_csv,
    > readline, io, etc.)

    I don't know :-) I just re-used Robin's original patch.

    >>> I don't see a call to PyState_AddModule. What am I missing?
    >> It is called implicitly when an extension module is imported.
    >
    > Do you think this should be documented in the C API docs? The way
    > they read now, it seems that calling PyState_AddModule is needed
    > manually by extension writers.

    Well, how to deal with module state should probably be better
    documented. Not sure how, though.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 8, 2013

    On Thu, Aug 8, 2013 at 7:00 AM, Antoine Pitrou <report@bugs.python.org> wrote:

    > Antoine Pitrou added the comment:
    >
    >> This code at the beginning of PyInit__elementtree:
    >>
    >>     m = PyState_FindModule(&elementtreemodule);
    >>     if (m) {
    >>         Py_INCREF(m);
    >>         return m;
    >>     }
    >>
    >> Can you explain what use case it tries to cover? I couldn't find
    >> similar code in other modules we have that implement PEP-3121 (_csv,
    >> readline, io, etc.)
    >
    > I don't know :-) I just re-used Robin's original patch.

    Would you mind removing it from the patch, due to the case described above?
    ISTM that in real scenarios the sys.modules cache kicks in anyway. It
    should not be really bypassed for any given sub-interpreter in sane code.

    >>>> I don't see a call to PyState_AddModule. What am I missing?
    >>> It is called implicitly when an extension module is imported.
    >>
    >> Do you think this should be documented in the C API docs? The way
    >> they read now, it seems that calling PyState_AddModule is needed
    >> manually by extension writers.
    >
    > Well, how to deal with module state should probably be better
    > documented. Not sure how, though.

    I'll think about it some more and will try to propose a documentation
    patch. This can be done incrementally; we don't have to go to perfect docs
    on the first try ;-)

    @pitrou
    Member

    pitrou commented Aug 8, 2013

    > Would you mind removing it from the patch, due to the case described above?
    > ISTM that in real scenarios the sys.modules cache kicks in anyway. It
    > should not be really bypassed for any given sub-interpreter in sane code.

    Yeah, I think you're right. I'll submit an updated patch or, if it's
    the only issue with it, perhaps you can simply remove the 3 offending
    lines?

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 8, 2013

    On Thu, Aug 8, 2013 at 7:14 AM, Antoine Pitrou <report@bugs.python.org> wrote:

    > Antoine Pitrou added the comment:
    >
    >> Would you mind removing it from the patch, due to the case described
    >> above?
    >> ISTM that in real scenarios the sys.modules cache kicks in anyway. It
    >> should not be really bypassed for any given sub-interpreter in sane
    >> code.
    >
    > Yeah, I think you're right. I'll submit an updated patch or, if it's
    > the only issue with it, perhaps you can simply remove the 3 offending
    > lines?

    Sure, I'll do that. Thanks.

    @python-dev
    Mannequin

    python-dev mannequin commented Aug 10, 2013

    New changeset 8a060e2de608 by Eli Bendersky in branch 'default':
    Issue bpo-15651: PEP-3121 refactoring for _elementtree
    http://hg.python.org/cpython/rev/8a060e2de608

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 10, 2013

    Antoine, I committed your patch (with a bit of comments added), *leaving the module caching in*. This is because removing it breaks the tests, unfortunately. The _elementtree tests are so crooked that they manage to create a situation in which the module under test throws ParseError which is a different class from ET.ParseError. This is "achieved" by multiple invocations of import_fresh_module with various fresh & blocked parameters, and I still haven't fully traced all the problems yet. Since this caching only potentially harms other tests, it's ok to leave in.
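
The "different ParseError class" symptom can be reproduced in miniature without ElementTree at all. The sketch below is a simulation under stated assumptions, not the traced cause of the test failures: `SOURCE`, `load_copy` and the module name `fake_et` are hypothetical. It only shows that executing the same module source twice yields two distinct exception classes, so a handler written against one copy does not catch the other copy's exception.

```python
import types

# Hypothetical module source standing in for a parser module.
SOURCE = """
class ParseError(SyntaxError):
    pass

def parse(text):
    raise ParseError("bad input: " + text)
"""


def load_copy(name):
    """Execute SOURCE in a brand-new module object, as a fresh import would."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


def caught_by(handler_module, raising_module):
    """Return True if handler_module.ParseError catches raising_module's error."""
    try:
        raising_module.parse("<a>")
    except handler_module.ParseError:
        return True
    except SyntaxError:
        return False


first = load_copy("fake_et")
second = load_copy("fake_et")

print(caught_by(first, first), caught_by(first, second))
```

Both classes are named `ParseError` and both derive from `SyntaxError`, which is why this kind of mismatch is easy to miss when reading a traceback.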

    A longer term solution to all this will be http://mail.python.org/pipermail/python-dev/2013-August/127766.html - I want to eventually run all "monkey-patch the import environment to simulate some situation" sub-tests of ET in different subprocesses so they are kept independent.

    @pitrou
    Member

    pitrou commented Aug 10, 2013

    > Antoine, I committed your patch (with a bit of comments added),
    > *leaving the module caching in*.

    Thanks!

    > A longer term solution to all this will be
    > http://mail.python.org/pipermail/python-dev/2013-August/127766.html -
    > I want to eventually run all "monkey-patch the import environment to
    > simulate some situation" sub-tests of ET in different subprocesses so
    > they are kept independent.

    I find it useful that the test suite stresses module unloading or
    reloading. There's probably a bug either in ET or in the ET tests. I'm
    not saying it's a very important issue of course, but IMHO it would be
    better if we don't try to sweep it under the carpet :-)

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 10, 2013

    On Sat, Aug 10, 2013 at 10:38 AM, Antoine Pitrou <report@bugs.python.org> wrote:

    > Antoine Pitrou added the comment:
    >
    >> Antoine, I committed your patch (with a bit of comments added),
    >> *leaving the module caching in*.
    >
    > Thanks!
    >
    >> A longer term solution to all this will be
    >> http://mail.python.org/pipermail/python-dev/2013-August/127766.html -
    >> I want to eventually run all "monkey-patch the import environment to
    >> simulate some situation" sub-tests of ET in different subprocesses so
    >> they are kept independent.
    >
    > I find it useful that the test suite stresses module unloading or
    > reloading. There's probably a bug either in ET or in the ET tests. I'm
    > not saying it's a very important issue of course, but IMHO it would be
    > better if we don't try to sweep it under the carpet :-)

    I have no intention of sweeping things under the carpet. I'll get to the bottom
    of this to understand the exact flow that causes this to happen.

    But I still think ET tests should be logically separated into subprocesses.
    If we want to stress test module unloading and reloading, let's have
    specific, targeted tests for that.
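
A minimal sketch of the subprocess idea, assuming nothing about how the ET tests would actually be restructured: the scenario that tampers with the import environment runs in a child interpreter, so the parent's `sys.modules` is never touched. `CHILD_CODE` and `run_isolated` are hypothetical names chosen for this illustration.

```python
import subprocess
import sys

# Code run in the child: block pyexpat there, then report whether the import failed.
CHILD_CODE = """
import sys
sys.modules['pyexpat'] = None
try:
    import pyexpat
except ImportError:
    print('blocked')
else:
    print('imported')
"""


def run_isolated(code):
    """Run `code` in a fresh interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(run_isolated(CHILD_CODE))
```

The trade-off is startup cost per scenario, in exchange for each scenario starting from a clean import state.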

    @pitrou
    Member

    pitrou commented Aug 10, 2013

    > If we want to stress test module unloading and reloading, let's have
    > specific, targeted tests for that.

    Agreed.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 11, 2013

    Found some problems in the interaction of PEP-3121 and import_fresh_module: http://mail.python.org/pipermail/python-dev/2013-August/127862.html

    I'd still like to see the in-PyInit__elementtree caching deleted eventually, without harming test coverage. So this issue will remain open for now, until we decide what to do.

    @elibendersky
    Mannequin

    elibendersky mannequin commented Aug 11, 2013

    BTW, Antoine, w.r.t documentation - I agree that the whole PyState_* sequence needs better documentation and examples, but in the meantime I'm attaching a simple patch for c-api/module.rst. It clarifies that PyState_AddModule doesn't really have to be called explicitly in extensions.

    @vstinner
    Member

    Fixed by:

    commit a6109ef
    Author: Erlend Egeberg Aasland <erlend.aasland@innova.no>
    Date: Fri Nov 20 13:36:23 2020 +0100

    bpo-1635741: Convert _sre types to heap types and establish module state (PEP-384) (GH-23393)
    

    @vstinner
    Member

    > Fixed by: (...) bpo-1635741: Convert _sre types to heap types...

    Oops, I commented on the wrong issue. I'm reopening it.

    @vstinner vstinner reopened this Dec 18, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022