Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-implement parts of imp in pure Python #58167

Closed
brettcannon opened this issue Feb 7, 2012 · 76 comments
Closed

Re-implement parts of imp in pure Python #58167

brettcannon opened this issue Feb 7, 2012 · 76 comments
Assignees
Labels
release-blocker stdlib Python modules in the Lib dir

Comments

@brettcannon
Copy link
Member

BPO 13959
Nosy @brettcannon, @birkenfeld, @pitrou, @vstinner, @benjaminp, @merwok, @meadori, @ericsnowcurrently, @berkerpeksag
PRs
  • bpo-32030: pass interp to _PyImport_Init() #4736
  • Dependencies
  • bpo-2377: Replace import w/ importlib.import
  • bpo-14605: Make import machinery explicit
  • bpo-14657: Avoid two importlib copies
  • bpo-15166: Implement imp.get_tag() using sys.implementation
  • bpo-15167: Re-implement imp.get_magic() in pure Python
  • Files
  • imp.diff
  • imp.diff
  • use_of_imp.txt: stdlib modules that use imp (could be refactored to use importlib)
  • imp_find_module.diff: sketch of find_module() implementation
  • issue13959_reload.diff: port of _imp.reload() to imp.py
  • issue13959_magic_and_tag.diff: one approach to moving the magic and tag code out of import.c
  • explicit_meta_path.diff
  • explicit_path_hooks.diff
  • issue13959_magic.diff: porting MAGIC to importlib
  • issue13959_tag.diff: porting TAG to importlib
  • imptest.py
  • issue13959_get_tag.diff: new TAG implementation using sys.implementation
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/brettcannon'
    closed_at = <Date 2012-07-09.20:23:48.007>
    created_at = <Date 2012-02-07.00:51:29.679>
    labels = ['library', 'release-blocker']
    title = 'Re-implement parts of imp in pure Python'
    updated_at = <Date 2017-12-06.16:25:57.617>
    user = 'https://github.com/brettcannon'

    bugs.python.org fields:

    activity = <Date 2017-12-06.16:25:57.617>
    actor = 'vstinner'
    assignee = 'brett.cannon'
    closed = True
    closed_date = <Date 2012-07-09.20:23:48.007>
    closer = 'brett.cannon'
    components = ['Library (Lib)']
    creation = <Date 2012-02-07.00:51:29.679>
    creator = 'brett.cannon'
    dependencies = ['2377', '14605', '14657', '15166', '15167']
    files = ['24846', '24979', '24980', '25250', '25302', '25303', '25308', '25309', '25335', '25336', '25834', '26058']
    hgrepos = []
    issue_num = 13959
    keywords = ['patch']
    message_count = 76.0
    messages = ['152800', '155790', '156473', '156474', '156475', '156527', '156530', '156964', '158269', '158362', '158364', '158365', '158370', '158371', '158377', '158383', '158398', '158474', '158522', '158531', '158555', '158575', '158593', '158594', '158622', '158628', '158704', '158895', '158908', '158930', '158931', '158939', '158941', '158951', '158955', '158968', '158971', '158985', '158986', '158995', '158998', '158999', '159118', '159119', '159628', '159630', '159631', '159641', '159644', '159757', '159758', '159960', '159962', '159969', '160424', '160433', '160434', '160439', '160493', '160495', '160524', '162276', '162356', '162357', '162364', '162384', '162940', '163212', '163232', '163569', '163798', '164102', '165116', '165120', '165148', '307747']
    nosy_count = 12.0
    nosy_names = ['brett.cannon', 'georg.brandl', 'jpe', 'pitrou', 'vstinner', 'benjamin.peterson', 'eric.araujo', 'Arfrever', 'meador.inge', 'python-dev', 'eric.snow', 'berker.peksag']
    pr_nums = ['4736']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue13959'
    versions = ['Python 3.3']

    @brettcannon
    Copy link
    Member Author

    A bunch of code in Python/import.c exists purely for the imp module, but a large chunk of it does not need to be implemented in C but instead can be implemented in Python code.

    Once importlib is bootstrapped in then an attempt to scale back the C code should be done by re-implementing parts of imp in pure Python (either in some _imp module or directly in _importlib itself).

    @brettcannon brettcannon added the stdlib Python modules in the Lib dir label Feb 7, 2012
    @brettcannon
    Copy link
    Member Author

    Attached is a start for re-implementing imp. Still need code for cache_from_source, source_from_cache, find_module, load_module, reload, load_compiled, load_source, and load_package.

    @ericsnowcurrently
    Copy link
    Member

    Here's an incomplete (though essentially test-passing) patch that flips more of imp into importlib. It builds on Brett's patch. The gist is that imp.py is a minimal wrapper around importlib, which uses _imp directly.

    Ultimately the things implemented in Lib/importlib/_bootstrap.py should be yanked out of Python/import.c (and Python/dynload_*.c). I expect that this will wait for the bootstrap merge (bpo-2377) to land.

    @ericsnowcurrently
    Copy link
    Member

    Here's a listing of the places where import-ish modules are used in CPython. Once things all the import stuff currently on the table is wrapped up, I'll probably work on switching things over to using importlib directly.

    @ericsnowcurrently
    Copy link
    Member

    Of note for my patch is the addition of SUFFIXES and MODE to the 3 main "file loader" classes. I did this to reverse the dependence on imp.get_suffixes().

    As well, a couple of extra functions are added to Python/importlib/_bootstrap.py for a similar reason, but should also be meaningful externally in their own right.

    load_module_from_file(), is one of these, though the current incarnation relies on a mythical new method on loaders that may not pass muster.

    (FWIW, I accidently included the use_of_imp.txt file in the patch and don't intend it to get included)

    @merwok
    Copy link
    Member

    merwok commented Mar 22, 2012

    Once things all the import stuff currently on the table is wrapped up,
    I'll probably work on switching things over to using importlib directly.

    Please leave distutils unchanged; we don’t clean it up or improve it in any way, it’s bugfixes only. Changing packaging is fine; I’ll just keep using the imp module in the distutils2-python3 backport.

    @ericsnowcurrently
    Copy link
    Member

    Sounds good. It will be a while before we get there, but at that point I was already planning on getting in touch with the maintainers of relevant modules before messing with them.

    @ericsnowcurrently
    Copy link
    Member

    (Éric, I'd reply in the review if reitveld recognized my current username -- see http://psf.upfronthosting.co.za/roundup/meta/issue402. FYI, I also did not get an email for your review, which is likely related to that issue.)

    Regarding __all__, I didn't realize that about being a tuple. I'll fix it.

    As to the "dm" in the SOABI value, that is definitely something we'll have to factor in. There are actually a few such places in the current patch that rely on problematic data. It's either hidden in C or not available during bootstrapping. That's something we'll have to iron out before the solution for this issue gets committed.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 14, 2012

    I think this should be a blocker for 3.3.

    @brettcannon
    Copy link
    Member Author

    Just because I was thinking about it, I wonder if necessarily all the frozen stuff really needs to stay in import.c. I mean a frozen module is really just an entry in an array of structs that has a name of an char*[]. I don't see why one couldn't simply have a get_frozen_bytes() method to convert that char*[] into a bytes object and use that to construct a module in pure Python code.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 15, 2012

    New changeset d777f854a66e by Brett Cannon in branch 'default':
    Issue bpo-13959: Rename imp to _imp and add Lib/imp.py and begin
    http://hg.python.org/cpython/rev/d777f854a66e

    @brettcannon
    Copy link
    Member Author

    OK, so I have started to check this stuff in, but I think it's best to do it piecemeal. Going forward I would like to commit in units of functions being replaced, and prioritize stuff that cuts out C code (e.g. the load_*() methods, find_module(), etc.). That way it's clear that progress is being made. Obviously the best way to tell if code is hanging on just because of imp is to comment out the interface code and see what your compiler complains about in terms of dead code (or at least clang is good at this).

    @brettcannon
    Copy link
    Member Author

    It looks like in order to get a clear sign of what it will take to remove various parts of import.c, imp.load_module() needs to go along with imp.load_package() (since they call each other in the C code). You also have to take care of imp.reload(), but I am simplifying the C code greatly and will take care of referencing other code in the module.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 15, 2012

    New changeset 4dce3afc392c by Brett Cannon in branch 'default':
    Issue bpo-13959: Simplify imp.reload() by relying on a module's
    http://hg.python.org/cpython/rev/4dce3afc392c

    @brettcannon
    Copy link
    Member Author

    I am seeing how this is going to go down. the load_dynamic, load_source, etc. family of functions are simply dispatched to by load_module(). So to keep some semblance of backwards-compatibility, each of those modules need to be implemented and then have load_module() simply dispatch to them based on the "type" of module it is.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 16, 2012

    New changeset 2df37938b8e1 by Brett Cannon in branch 'default':
    Issue bpo-13959: Re-implement imp.load_module() in imp.py.
    http://hg.python.org/cpython/rev/2df37938b8e1

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 16, 2012

    New changeset 4256df44023b by Brett Cannon in branch 'default':
    Issue bpo-13959: Re-implement imp.load_package() in imp.py.
    http://hg.python.org/cpython/rev/4256df44023b

    @brettcannon
    Copy link
    Member Author

    From Eric Smith on python-dev:

    • suffix, mode, type_ = details
    • if mode and (not mode.startswith(('r', 'U'))) or '+' in mode:
    •    raise ValueError('invalid file open mode {!r}'.format(mode))
      

    Should this be:
    if mode and (not mode.startswith(('r', 'U')) or '+' in mode):

    to match:

    • if (*mode) {
      ...
    •    if (!(*mode == 'r' || \*mode == 'U') || strchr(mode, '+')) {
      
    •        PyErr_Format(PyExc_ValueError,
      
    •                     "invalid file open mode %.200s", mode);
      

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 17, 2012

    New changeset 3b5b4b4bb43c by Brett Cannon in branch 'default':
    Issue bpo-13959: Re-implement imp.load_source() in imp.py.
    http://hg.python.org/cpython/rev/3b5b4b4bb43c

    @ericsnowcurrently
    Copy link
    Member

    This is a mostly-working sketch of find_module() in imp.py. I've run out of time tonight, but wanted to get it up in case it's useful to anyone else (a.k.a Brett). If nobody's touched it before then, I'll finish it up tomorrow.

    @brettcannon
    Copy link
    Member Author

    I doubt I will beat you to it, Eric, but I did want to say that your overall design was what I had in my head when I was thinking about how to re-implement the function, so keep at it!

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 17, 2012

    New changeset 66bd85bcf916 by Brett Cannon in branch 'default':
    Issue bpo-13959: Re-implement imp.load_compiled() in imp.py.
    http://hg.python.org/cpython/rev/66bd85bcf916

    @ericsnowcurrently
    Copy link
    Member

    2 "problems" with that last patch:

    • the types of the loaders that get returned by _bootstrap._find_module() are not the classes in _bootstrap (e.g. _frozen_importlib._SourceFileLoader). That doesn't smell right.
    • tokenize.get_encoding? is saying that Lib/test/badsyntax_pep3120.py has an encoding of utf-8, when test_find_module_encoding is expecting it to not. So does PyTokenizer_FindEncodingFilename (which the old import code used) behaving differently than tokenize.get_encoding()? FYI, tokenize.get_encoding() is what importlib uses...

    @ericsnowcurrently
    Copy link
    Member

    rather, tokenize.detect_encoding()

    @brettcannon
    Copy link
    Member Author

    You could change Lib/imp.py to have import _frozen_importlib as _bootstrap if you want the same modules coming from imp, but I would argue against changing importlib itself being changed as that complicates development since if you screw up importlib._bootstrap when you compile it becomes a major pain to revert the importlib.h change, recompile, and continue to do that until you get it right. Plus you would only care about this if you are doing isinstance() checks on what kind of loader you have which you shouldn't care about since we have clearly defined ABCs to test against.

    As for Lib/test/badsyntax_pep3120.py, it *does* have a source encoding of UTF-8 since it does not explicitly specify an encoding. Based on the name I'm assuming the file itself has bad UTF-8 characters, but that doesn't mean that the file is not supposed to be interpreted as UTF-8.

    @ericsnowcurrently
    Copy link
    Member

    On the _frozen_importlib point, I'm fine with that. It was just counter-intuitive that the importlib._bootstrap functions were returning loaders whose classes weren't also defined there.

    I'll have another look at that test tonight and run the patch through the full test suite, but otherwise I think find_module() is basically done.

    @ericsnowcurrently
    Copy link
    Member

    Looking it over, I'm confident that tokenizer.detect_encoding() does not raise a SyntaxError where PyTokenizer_FindEncodingFilename() does. I've run out of time tonight, but I'll look at it more tomorrow.

    Once find_module() is done, I'd like to move on to reload(), which I expect will be pretty straightforward at this point. Then the feasibility of bpo-14618 should be clear.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 20, 2012

    New changeset c820aa9c0c00 by Brett Cannon in branch 'default':
    Issue bpo-13959: Keep imp.get_magic() in C code, but cache in importlib
    http://hg.python.org/cpython/rev/c820aa9c0c00

    @brettcannon
    Copy link
    Member Author

    OK, I'm waiting on issue bpo-14657 (avoiding the _frozen_importlib/importlib dichotomy) is resolved before I document the new imp.extension_suffixes() as it would be good to know where I should pull in the source and bytecode suffixes.

    I think this only leaves get_magic() and get_tag() as the only things to move + trying to figure out how to handle PyImport_ExecCodeModuleObject() and its grip on various bits of C code.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 4, 2012

    New changeset 22b0689346f9 by Brett Cannon in branch 'default':
    Issue bpo-13959: Move module type constants to Lib/imp.py.
    http://hg.python.org/cpython/rev/22b0689346f9

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 11, 2012

    New changeset b81ddaf0db47 by Brett Cannon in branch 'default':
    Issue bpo-13959: Deprecate imp.get_suffixes() for new attributes on
    http://hg.python.org/cpython/rev/b81ddaf0db47

    @ericsnowcurrently
    Copy link
    Member

    Question on this one:

    <snip>
    @@ -126,7 +131,7 @@ def load_compiled(name, pathname, file=N
     # XXX deprecate
     def load_package(name, path):
         if os.path.isdir(path):
    -        extensions = _bootstrap._SOURCE_SUFFIXES + [_bootstrap._BYTECODE_SUFFIX]
    +        extensions = machinery.SOURCE_SUFFIXES[:] + [machinery.BYTECODE_SUFFIXES]
             for extension in extensions:
                 path = os.path.join(path, '__init__'+extension)
                 if os.path.exists(path):
    </snip>

    Should that be the following?

      extensions = machinery.SOURCE_SUFFIXES[:] + machinery.BYTECODE_SUFFIXES[:]

    Also, why the "[:]"?

    Finally, in a couple spots you use the first element of the list (like the old case of "machinery.SOURCE_SUFFIXES[0]" in source_from_cache() and the new one in find_module()). Will this be a problem where the source file has one of the other suffixes? I'm not sure it's a big enough deal for the moment to worry about, but thought I'd ask. :)

    @brettcannon
    Copy link
    Member Author

    Yes.

    And the [:] copies the list so I don't accidentally mutate in-place (did that once already, so now I'm just being paranoid; I doubt I need it hardly anywhere I don't do +=).

    As for using SOURCE_SUFFIXES[0], I'm done following the ported code. I really don't care if .pyw isn't supported in that case. But in general people shouldn't assume just .py (or .py at all).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 11, 2012

    New changeset 626d5c6fbd95 by Brett Cannon in branch 'default':
    Issue bpo-13959: Have
    http://hg.python.org/cpython/rev/626d5c6fbd95

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 12, 2012

    New changeset 7bf8ac742d2f by Brett Cannon in branch 'default':
    Issue bpo-13959: Introduce importlib.find_loader().
    http://hg.python.org/cpython/rev/7bf8ac742d2f

    @brettcannon
    Copy link
    Member Author

    I have importlib.find_loader() coded up, but getting rid of find_module() is going to be a pain thanks to pkgutil, multiprocessing.forking, modulefinder, and idlelib.EditorWindow.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 13, 2012

    New changeset 59870239813c by Brett Cannon in branch 'default':
    Issue bpo-13959: Document imp.find_module/load_module as deprecated.
    http://hg.python.org/cpython/rev/59870239813c

    @brettcannon
    Copy link
    Member Author

    Need to update the docstrings for imp.find_module() and load_module() to mention the functions are deprecated since there is no specific raised deprecation, only documented deprecation.

    @jpe
    Copy link
    Mannequin

    jpe mannequin commented Jun 5, 2012

    This may be a known problem, but imp.load_module fails when trying to load a C extension. This is with the 3.3a4 release on Windows, though I suspect the problem is cross-platform.

    @brettcannon
    Copy link
    Member Author

    Does it work in Python 3.2, John? If it does then it's just an oversight thanks to the lack of tests in test_imp and it shouldn't be too difficult to support.

    But do realize I have deprecated the function. =)

    @jpe
    Copy link
    Mannequin

    jpe mannequin commented Jun 5, 2012

    On 6/5/12 1:08 PM, Brett Cannon wrote:

    Brett Cannon<brett@python.org> added the comment:

    Does it work in Python 3.2, John? If it does then it's just an oversight thanks to the lack of tests in test_imp and it shouldn't be too difficult to support.

    But do realize I have deprecated the function. =)

    Attached is a short test file to demonstrate this. It assumes the
    standard win32 layout, but shouldn't be hard to modify. It does work in
    Python 3.2, but the else: clause in 3.3's imp.load_module function
    raises an ImportError. I think the fix is to add an elif C_EXTENSION:
    clause that loads the .so / .pyd.

    I've already rewritten my code to use importlib when running in Python 3.3.

    @brettcannon
    Copy link
    Member Author

    You're right about the solution, John. I (or someone else) just needs to code it up.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 15, 2012

    New changeset 034c814eb187 by Brett Cannon in branch 'default':
    Issue bpo-13959: Add to imp.find_module() and load_module's docstrings
    http://hg.python.org/cpython/rev/034c814eb187

    @birkenfeld
    Copy link
    Member

    Which parts are still missing here?

    @ericsnowcurrently
    Copy link
    Member

    Here are the things I'm aware of:

    • implement imp.get_tag() using sys.implementation
    • *maybe* implement imp.get_magic() in pure Python (patch already attached here)
    • (low priority) find a way to rip out the bulk of PyImport_ExecCodeModuleObject() from imp.c, if practical

    From my discussions with Brett, there was a laundry list he was aiming to get in for 3.3 (for import stuff in general). I know much of it got done, but not all of it. Of the remainder, I'm not sure how much of that he still wants to see get in. Some of it pertains to this issue, though not all.

    @birkenfeld
    Copy link
    Member

    OK, sounds like none of it would block beta1.

    @brettcannon
    Copy link
    Member Author

    Sorry, been on vacation the past week.

    Georg is right, nothing left is a b1 blocker, just stuff I want to get in before 3.3 ships.

    @birkenfeld
    Copy link
    Member

    Moving back to blocker for beta2.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 9, 2012

    New changeset efb5e6ab10f4 by Brett Cannon in branch 'default':
    Issue bpo-15167 (as part of bpo-13959): imp.get_magic() is no implemented in
    http://hg.python.org/cpython/rev/efb5e6ab10f4

    @brettcannon
    Copy link
    Member Author

    Since the final issue is just a nicety, I am considering this issue closed.

    @ericsnowcurrently
    Copy link
    Member

    hurray!

    @vstinner
    Copy link
    Member

    vstinner commented Dec 6, 2017

    New changeset 672b6ba by Victor Stinner in branch 'master':
    bpo-32030: pass interp to _PyImport_Init() (bpo-4736)
    672b6ba

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    release-blocker stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants