classification
Title: Restructure import.c into PEP 302 importer objects
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, brett.cannon, christian.heimes, dgreiman, ncoghlan, pje
Priority: normal Keywords: patch

Created on 2008-02-17 23:11 by dgreiman, last changed 2008-04-22 00:58 by brett.cannon. This issue is now closed.

Files
File name Uploaded Description Edit
builtin_importer-20080217.diff dgreiman, 2008-02-17 23:11 Patch against py3k r60881
Messages (9)
msg62510 - (view) Author: Douglas Greiman (dgreiman) * Date: 2008-02-17 23:11
This patch reorganizes import.c to move functionality into two new PEP
302-style Importer objects.  I attempted to change as little as
feasible, but the patch is still ~4700 lines long, about 1000 of which
is new tests.

BuiltinImporter: handles C_BUILTIN and PY_FROZEN modules
DirectoryImporter: handles PY_SOURCE, PY_COMPILED, PKG_DIRECTORY,
C_EXTENSION modules

BuiltinImporter is put on sys.meta_path, DirectoryImporter is put on
sys.path_hooks.

To preserve backward compatibility of methods like imp.find_module(),
they use new variables sys.builtin_meta_path and sys.builtin_path_hooks
which are analogous to sys.meta_path and sys.path_hook but contain only
the two importer objects above.

Character encoding issues were substantial.  The existing code was
somewhat inscrutable in this regard.  The tests disabled in issue 1377
were re-added with more safeguards and harder tests.  It is possible to
import modules with non-ascii names from non-ascii paths.  Tested on
Windows XP and Linux.

Areas for discussion:

Naming: Names of the importer classes, names of variables, etc

sys: I added three variables to sys.  Is there a better alternative?

ModuleInfo: The importers use a somewhat tricky way to store an open
file between calls to importer.find_module() and importer.load_module(),
so that the new code doesn't do any more system calls that the old code.

semantics: There are many import corner cases where the semantics are
unknown or unclear.  

Frozen packages: the __path__ of a frozen package is a string rather
than a list, and it is possible to replace that string to change how
imports inside that package are resolved.  I have attempted to keep that
functionality intact but it's not clear that it's a feature rather than
a bug.
msg62697 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-02-22 20:38
I've noticed that the proposed patch adds const qualifier to many C-API 
functions' arguments.  While at the first glance these changes are 
reasonable, they add clutter to the already massive patch.

I would recommend to separate const qualifier changes into a separate 
patch that can be reviewed separately from the substantive changes.

I promise, I will make substantive comments soon. :-)
msg62698 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-02-22 20:47
The patch breaks test_import on Mac OS 10.4:

$ ./python.exe -E -tt -bb ./Lib/test/regrtest.py test_import
test_import
test test_import failed -- Traceback (most recent call last):
  File "./Lib/test/regrtest.py", line 1199, in <module>
    main()
  File "./Lib/test/regrtest.py", line 405, in main
    huntrleaks)
  File "./Lib/test/regrtest.py", line 562, in runtest
    huntrleaks, debug)
  File "./Lib/test/regrtest.py", line 614, in runtest_inner
    print("test", test, "failed --", msg)
  File "/Users/sasha/Work/python-svn/py3k/Lib/io.py", line 1247, in 
write
    b = encoder.encode(s)
  File "/Users/sasha/Work/python-svn/py3k/Lib/encodings/ascii.py", line 
22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode characters in position 
211-214: ordinal not in range(128)
msg62704 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2008-02-22 21:23
Sorry I have not commented on this sooner; been swamped.

First, the error Alexander is seeing is probably caused by a source file
that has an encoding other than ASCII (which is fine as the default
encoding in Python 3.0 is UTF-8). But chances are the file has an
encoding marker at the top and that is not being picked up by Douglas'
code. Look at PyTokenizer_FindEncoding() for a way to find out the encoding.

Second, I am the wrong person to be reviewing this as I have something
under development that competes with this. =) I am trying to get my pure
Python implementation bootstrapped into Python 3.0 to completely replace
the C code. But I don't know if I will pull this off in time so this
work could still be useful. But if I have to choose time between this
patch and my stuff, I am going to be biased. =)
msg62706 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-02-22 21:46
First, let me state that I like the idea of a uniform approach to 
importing, but given the complexity of the task, Brett's approach (which 
I understand as implementing the builtin importer classes in Python) 
would make sense (as long as he can solve the chicken and egg problem).

This said, here is my first question:

Is there a reason why new importer classes are not subclassable while 
zipimporter is?

>>> class x(imp.BuiltinImporter):pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'imp.BuiltinImporter' is not an acceptable base type

>>> class x(imp.DirectoryImporter): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

>>> class x(zipimport.zipimporter): pass
... 


Users might want to overload find_module while reusing load_module from 
a built-in importer class.

In fact, I see a great deal of code duplication between the new importer 
classes that can be reduced by deriving from a common base class.
msg62710 - (view) Author: Douglas Greiman (dgreiman) * Date: 2008-02-23 01:21
Brett,

I wrote my patch thinking that the next step would be to rewrite
DirectoryImporter in Python.  If you're already working on that and
just want to skip straight from point A to point C and skip this point
B, I'm fine with that.  Basically, tell me if you want me to pursue
this further or drop it.  Are you also reimplementing the code for
builtin and frozen modules in Python?

Alexander,

The const thing: I went on a bit of a const rampage after an
especially irritating debugging session involving function side
effects.  They could be removed.

The non-subclassable aspect was an oversight.  I thought about having
a common base class but ended up creating the ModuleInfo helper class
instead.  There's still a few bits of code duplication between the two
importer classes.  16x2 lines in  *_find_module() could be pulled into
a common place, 4x2 lines of *_require_module(), and a few lines in
the various methods.  I wanted to err on the side of introducing as
few new classes as possible to minimize the general scariness of this
patch.

I don't have a Mac setup, but I think the error you're seeing is
actually an error in my test code.
msg62714 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2008-02-23 02:16
On Fri, Feb 22, 2008 at 5:21 PM, Douglas Greiman <report@bugs.python.org> wrote:
>
>  Douglas Greiman added the comment:
>
>  Brett,
>
>  I wrote my patch thinking that the next step would be to rewrite
>  DirectoryImporter in Python.  If you're already working on that and
>  just want to skip straight from point A to point C and skip this point
>  B, I'm fine with that.  Basically, tell me if you want me to pursue
>  this further or drop it.

You can drop it if you want, but I just don't know how long it will
take for me to finish the bootstrap work (for instance I am having to
rewrite the warnings module in C). So there is a chance it might still
be useful, but I can't guarantee it.

>  Are you also reimplementing the code for
>  builtin and frozen modules in Python?
>

I use the imp module to handle the actual loading, but the importers
and loaders are all in Python. You can see the implementation in the
sandbox under import_in_py.
msg62716 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-02-23 02:23
Interesting - I'll try to find the time to take a look at this. (I also
added PJE to the nosy list - hopefully he will get a chance to look at it)
msg65667 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2008-04-22 00:58
Closed at Doug's request. My importlib work, once it lands, will supplant 
all of this.
History
Date User Action Args
2008-04-22 00:58:24brett.cannonsetstatus: open -> closed
resolution: rejected
messages: + msg65667
2008-02-23 02:23:50ncoghlansetnosy: + ncoghlan, pje
messages: + msg62716
2008-02-23 02:16:24brett.cannonsetmessages: + msg62714
2008-02-23 01:21:34dgreimansetmessages: + msg62710
2008-02-22 21:46:06belopolskysetmessages: + msg62706
2008-02-22 21:23:44brett.cannonsetassignee: brett.cannon ->
messages: + msg62704
2008-02-22 20:47:19belopolskysetmessages: + msg62698
2008-02-22 20:38:16belopolskysetnosy: + belopolsky
messages: + msg62697
2008-02-19 09:07:41christian.heimessetpriority: normal
assignee: brett.cannon
keywords: + patch
nosy: + christian.heimes, brett.cannon
2008-02-17 23:11:18dgreimancreate