Full unicode import system #47330

amauryfa · 2008-06-11T18:24:56Z

BPO	3080
Nosy	@brettcannon, @birkenfeld, @terryjreedy, @amauryfa, @ncoghlan, @abalkin, @pitrou, @vstinner, @benjaminp, @merwok, @bitdancer, @asvetlov
Files	issue3080-5.patch issue3080.py typo.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/vstinner'
closed_at = <Date 2011-03-21.00:01:35.123>
created_at = <Date 2008-06-11.18:24:56.218>
labels = ['interpreter-core', 'type-bug']
title = 'Full unicode import system'
updated_at = <Date 2011-03-23.15:58:18.116>
user = 'https://github.com/amauryfa'

bugs.python.org fields:

activity = <Date 2011-03-23.15:58:18.116>
actor = 'vstinner'
assignee = 'vstinner'
closed = True
closed_date = <Date 2011-03-21.00:01:35.123>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2008-06-11.18:24:56.218>
creator = 'amaury.forgeotdarc'
dependencies = []
files = ['20477', '21296', '21305']
hgrepos = []
issue_num = 3080
keywords = ['patch']
message_count = 60.0
messages = ['68005', '68015', '109844', '112028', '119107', '123963', '123993', '124756', '125752', '126514', '126515', '126516', '126606', '126608', '126612', '126613', '126672', '126673', '126676', '126678', '126680', '126681', '126695', '126705', '126706', '126708', '126752', '126755', '126756', '126760', '127591', '127674', '129141', '129143', '129185', '129196', '130050', '130473', '130492', '130507', '130645', '130935', '131457', '131464', '131472', '131473', '131474', '131483', '131484', '131516', '131547', '131571', '131572', '131575', '131606', '131612', '131625', '131664', '131711', '131890']
nosy_count = 14.0
nosy_names = ['brett.cannon', 'georg.brandl', 'terry.reedy', 'amaury.forgeotdarc', 'ncoghlan', 'belopolsky', 'pitrou', 'vstinner', 'benjamin.peterson', 'eric.araujo', 'Arfrever', 'r.david.murray', 'asvetlov', 'python-dev']
pr_nums = []
priority = 'high'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue3080'
versions = ['Python 3.3']

amauryfa · 2008-06-11T18:24:52Z

This is the most difficult part of bpo-1342:
"""
On Windows, don't use the FileSystemEncoding on Windows for sys.path items.
Instead, it should use the wide API to perform all system calls. Py3k
shouldn't ever use the file system encoding for anything on Windows.
"""

This implies rewriting all functions in import.c and replacing all char*
arguments with unicode variables.

benjaminp · 2008-06-11T20:27:35Z

I suspect importlib may help with this.

birkenfeld · 2010-07-10T10:34:04Z

Victor is working on this.

vstinner · 2010-07-30T00:20:34Z

I posted a patch: bpo-9425.

vstinner · 2010-10-19T02:19:12Z

With bpo-8611 and bpo-9425, I patched a lot of functions and modules, including the NullImporter and zipimport, but not the core of the import machinery.

In my import_unicode SVN branch, I patched the import machinery to manipulate unicode strings instead of byte strings. But the patch is huge and the import machinery is fragile. Since Python 3.2 now works in a non-ASCII directory with an ASCII locale (filesystem) encoding, I don't plan to merge the patch into py3k.

The patch is still useful on Windows, because Python uses the mbcs encoding to encode/decode filenames, and this encoding is usually a very small subset of Unicode (e.g. cp1252 has 256 codes whereas Unicode 6.0 has 109,449 characters).
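To make the size gap concrete, here is a minimal sketch that counts how many single-byte values a code page maps to a character. It uses cp1252 only as an example ANSI code page, and `count_decodable_bytes` is a hypothetical helper written for this illustration, not something from the patch.

```python
def count_decodable_bytes(encoding):
    """Count the single-byte values that `encoding` maps to a character."""
    count = 0
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            # This byte has no assigned character in the code page.
            continue
        count += 1
    return count


cp1252_size = count_decodable_bytes("cp1252")
print("cp1252 maps", cp1252_size, "byte values to characters")
```

Whatever the exact count, a single-byte code page cannot exceed 256 characters, which is a tiny fraction of the Unicode repertoire.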

bitdancer · 2010-12-14T17:58:37Z

With bpo-1342 fixed, it seems that this issue is no longer critical (Haypo describes his complicated patch as "useful on Windows", but not critical). So I'm downgrading it to 'high'. Perhaps it is even 'normal'. It also seems as though it is currently languishing unless someone wants to pick it up.

vstinner · 2010-12-14T23:47:56Z

Haypo describes his complicated patch as "useful on Windows",
but not critical

Use case on Windows: your Japanese friend gives you a USB key (e.g. created on Windows with code page 932) with his Python project. You cannot run it on your English-language Windows (e.g. code page 1252), because it loads Python modules with Japanese characters in their paths.

It works if all paths are encodable to your ANSI code page. It doesn't work if at least one character of one path is not encodable to your ANSI code page.
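The encodability condition can be sketched in a few lines. The path below is made up for the example, and `encodable` is a hypothetical helper; the code pages are the two mentioned above.

```python
def encodable(path, code_page):
    """Return True if every character of `path` exists in `code_page`."""
    try:
        path.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical path on the USB key, containing katakana characters.
japanese_path = "E:\\プロジェクト\\モジュール.py"

print("cp932: ", encodable(japanese_path, "cp932"))
print("cp1252:", encodable(japanese_path, "cp1252"))
```

A byte-oriented import system fails in the second case even though the file system itself stores the name without loss.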

I don't know if this usecase is common or not.

Note: the FAT file system of the USB key stores filenames as UTF-16 (and not in the user code page).

vstinner · 2010-12-28T02:51:41Z

Issue bpo-10785 prepares the work for this issue: store input filename as a unicode string, instead of a byte string, in the parser.

terryjreedy · 2011-01-08T06:23:10Z

If I edit a file with IDLE, save it, and successfully run it (perhaps to test it), then when I edit a second file that imports the first, I expect the import to work. It does not always (see bpo-10828).

Import is part of the core definition of the language. Unicode identifiers are supposedly part of Python3. Given the existence of <identifier>.py in the current directory, 'import identifier' should work. If it does not, the 3.1 message '<identifier> not found' is more truthful than the current 'no module named <identifier>', when there is one.
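The expectation can be written down as a small check. This is only a sketch: the module name is arbitrary, `import_from_directory` and `demo` are hypothetical helpers, and it assumes the file system and locale can represent the name at all.

```python
import importlib
import os
import sys
import tempfile


def import_from_directory(directory, module_name):
    """Import `module_name` with `directory` temporarily first on sys.path."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # Leave the interpreter state as we found it.
        sys.path.remove(directory)
        sys.modules.pop(module_name, None)


def demo(module_name="модуль"):
    """Create <module_name>.py in a temporary directory and import it."""
    with tempfile.TemporaryDirectory() as directory:
        filename = os.path.join(directory, module_name + ".py")
        with open(filename, "w", encoding="utf-8") as stream:
            stream.write("VALUE = 42\n")
        module = import_from_directory(directory, module_name)
        return module.__name__, module.VALUE


if __name__ == "__main__":
    print(demo())
```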

The doc says "identifier ::= (identifier ".")* identifier". As long as that is not true, some indication of the restriction that most people can understand would be nice. (And I suspect that a majority of Windows users, at least in the US, have no idea of what an 'ANSI code page' is.)

vstinner · 2011-01-19T01:22:03Z

Here is a work-in-progress patch: issue3080-3.patch. The patch is HUGE and written for Python 3.3.

$ diffstat issue3080-3.patch 
 Doc/c-api/module.rst   |   24 
 Include/import.h       |   73 +
 Include/moduleobject.h |    2 
 Include/pycapsule.h    |    4 
 Modules/zipimport.c    |  272 +++

As expected, most of the work is done in import.c.

Decode the module name earlier and encode it later. Try to manipulate PyUnicodeObject objects instead of char* buffers (so we have directly the string length).
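The "decode early, encode late" idea can be illustrated at the Python level. This is a sketch, not the C code of the patch: it assumes an ASCII locale encoding with the surrogateescape error handler so the result is deterministic, and both helper names are hypothetical.

```python
def decode_filename(raw, encoding="ascii"):
    """Decode at the boundary; undecodable bytes become lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_filename(text, encoding="ascii"):
    """Encode only when a byte-oriented system call needs the name."""
    return text.encode(encoding, "surrogateescape")


raw = b"caf\xe9/module.py"          # byte 0xE9 is not valid ASCII
text = decode_filename(raw)

# Between the two boundaries the code handles one str object, whose
# length is known without scanning for a terminating NUL byte.
print(ascii(text), len(text))
print(encode_filename(text) == raw)
```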

Split the huge and very complex find_module() function into 3 functions (find_module, find_module_filename and find_module2) and document them. Drop OS/2 support in find_module() (it can be kept, but it was easier for me to drop it and the OS/2 maintainer wrote that Python 3 is far from being compatible with OS/2).

The patch creates some functions: PyModule_GetNameObject(), PyImport_ExecCodeModuleUnicode(), PyImport_AddModuleUnicode(), PyImport_ImportFrozenModuleUnicode(), PyModule_NewUnicode(), ...

Use "U" format to parse a module name, and "%R" to format a module name (to escape surrogate characters and add quotes, instead of "... '%.200s' ...").

PyWin_FindRegisteredModule() is now private. Remove fqname argument from _PyImport_GetDynLoadFunc(), it wasn't used.

Replace open_exclusive() by fopen(name, "wb") on Windows: is it correct?

TODO:

rename xxxobj => xxx to keep original names and have a short patch (eg. I renamed name to nameobj during the transition to detect bugs)
catch encoding errors in case_ok()
don't encode in case_ok() if case_ok() does nothing (eg. on Linux)
find a better name for find_module2()
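For readers who don't know case_ok(): a rough Python sketch of the check, assuming its job is to confirm that the name stored on disk matches the requested name exactly, including case. The real function is C code with per-platform branches; this `case_ok` and `demo` are only illustrations.

```python
import os
import tempfile


def case_ok(directory, filename):
    """Return True if `directory` holds an entry spelled exactly `filename`."""
    # os.listdir() reports names as stored on disk, so the comparison is
    # case-sensitive even when the file system itself is not.
    return filename in os.listdir(directory)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        open(os.path.join(directory, "Spam.py"), "w").close()
        return case_ok(directory, "Spam.py"), case_ok(directory, "spam.py")


if __name__ == "__main__":
    print(demo())
```

On a case-sensitive file system the check is redundant, which is why the TODO item suggests skipping the encoding step there.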

The patch contains a tiny script, issue3080.py, to test the patch using an ISO-8859-1 locale.

I will open a thread on the mailing list (python-dev) to decide if this patch is needed or not. If we agree that this issue should be fixed, I will split the patch into smaller parts and start a review process.

vstinner · 2011-01-19T01:25:18Z

This patch changes more lines of code than my previous crazy unicode patch (msg103663, issue bpo-8242 bpo-8611 bpo-9425), but it changes fewer files.

vstinner · 2011-01-19T01:27:45Z

Oh, msg103663 was not the final patch. A more recent version of my patch for bpo-8611 / bpo-9425 is http://codereview.appspot.com/1874048:

 Doc/library/sys.rst         |    6
 Include/Python.h            |    4
 Include/fileobject.h        |   20
 Include/import.h            |   21
 Include/moduleobject.h      |    1
 Include/sysmodule.h         |    5
 Include/warnings.h          |    2
 Lib/distutils/file_util.py  |    2
 Lib/platform.py             |   50 +-
 Lib/test/test_import.py     |    7
 Lib/test/test_sax.py        |    5
 Lib/test/test_subprocess.py |   14
 Lib/test/test_sys.py        |    5
 Lib/test/test_urllib.py     |    8
 Lib/test/test_urllib2.py    |    5
 Lib/test/test_xml_etree.py  |    6
 Modules/getpath.c           |  209 +++++----
 Modules/main.c              |   99 +++-
 Modules/zipimport.c         |  202 +++++----
 Objects/codeobject.c        |   17
 Objects/fileobject.c        |   32 +
 Objects/moduleobject.c      |   25 -
 Objects/object.c            |    6
 Objects/typeobject.c        |   12
 Objects/unicodeobject.c     |   11
 PC/import_nt.c              |   18
 Parser/tokenizer.c          |   12
 Python/_warnings.c          |   69 ++-
 Python/ast.c                |   16
 Python/bltinmodule.c        |   24 -
 Python/ceval.c              |    7
 Python/compile.c            |   14
 Python/errors.c             |    2
 Python/import.c             |  958 ++++++++++++++++++++++++++------------------
 Python/importdl.c           |   27 -
 Python/importdl.h           |    2
 Python/pythonrun.c          |  169 +++++++
 Python/sysmodule.c          |   60 ++
 38 files changed, 1404 insertions(+), 748 deletions(-)

So, issue3080-3.patch and issue1874048_1.diff are close :-)

ncoghlan · 2011-01-20T13:22:45Z

Victor, could you please create a Rietveld review for this? The auto-review creator can't cope with the Git diffs.

vstinner · 2011-01-20T13:24:24Z

Victor, could you please create a Rietveld review for this?

Yes, but not yet. I first have to clean up the patch.

ncoghlan · 2011-01-20T13:35:20Z

OK - I'll wait until that is ready before digging into this.

vstinner · 2011-01-20T13:52:10Z

Use "U" format to parse a module name, and "%R" to format a module name
(to escape surrogates characters and add quotes, instead of
"... '%.200s' ...").

See also bpo-8754: repr() is better than str() for other reasons, e.g. to see a space at the end of a module name (__import__('space ')) thanks to the quotes.
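Both effects can be seen from Python, where "%r" plays the role of the C-level "%R". The message text below is only illustrative, and `quoted` and `unquoted` are hypothetical helpers.

```python
def quoted(name):
    """Format the name with repr(), the Python-level analogue of "%R"."""
    return "No module named %r" % (name,)


def unquoted(name):
    """Format the name with str(), as a plain "%s" would."""
    return "No module named %s" % (name,)


# A trailing space and a lone surrogate: both are easy to miss with str().
for candidate in ("space ", "caf\udce9"):
    print(ascii(unquoted(candidate)), "->", quoted(candidate))
```

With repr(), the trailing space sits visibly inside the quotes and the lone surrogate is written as an escape sequence, so the message can always be printed.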

vstinner · 2011-01-21T01:21:14Z

Version 4 of the patch.

vstinner · 2011-01-21T01:25:26Z

Same patch (version 4) generated by svn.

vstinner · 2011-01-21T01:37:11Z

You can review the patch with Rietveld:
http://codereview.appspot.com/3972045

vstinner · 2011-01-21T02:08:20Z

Oops, there is a dumb typo in imp_init_builtin() that makes test_importlib crash (which proves that importlib has good coverage :-)): replace "s:" by "U:" in if (!PyArg_ParseTuple(args, "s:init_builtin", &name)).

vstinner · 2011-01-21T02:13:42Z

test_reprlib fails on Windows because repr(module) quotes each '\' in the filename as '\\'. Workaround:

*******
index b0dc4d7..e476941 100644
--- a/Lib/test/test_reprlib.py
+++ b/Lib/test/test_reprlib.py
@@ -234,7 +234,7 @@ class LongReprTest(unittest.TestCase):
         touch(os.path.join(self.subpkgname, self.pkgname + '.py'))
         from areallylongpackageandmodulenametotestreprtruncation.areallylongpackageandmodulenametotestreprtruncation import areallylongpackageandmodulenametotestreprtruncation
         eq(repr(areallylongpackageandmodulenametotestreprtruncation),
-           "<module '%s' from '%s'>" % (areallylongpackageandmodulenametotestreprtruncation.__name__, areallylongpackageandmodulenametotestreprtruncation.__file__))
+           "<module %r from %r>" % (areallylongpackageandmodulenametotestreprtruncation.__name__, areallylongpackageandmodulenametotestreprtruncation.__file__))
         eq(repr(sys), "<module 'sys' (built-in)>")
 
     def test_type(self):
*******

It is maybe not a good idea to use %R to format the filename in module.__repr__().
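The difference between the two formats is easy to reproduce without a Windows machine. A minimal sketch, with a made-up path and two hypothetical helpers standing in for the two spellings of the expected string:

```python
def module_repr_with_r(name, filename):
    """Expected string built with %r: backslashes are doubled by repr()."""
    return "<module %r from %r>" % (name, filename)


def module_repr_with_s(name, filename):
    """Expected string built with '%s': the path is copied verbatim."""
    return "<module '%s' from '%s'>" % (name, filename)


windows_path = r"C:\build\pkg\mod.py"
posix_path = "/build/pkg/mod.py"

print(module_repr_with_r("mod", windows_path))
print(module_repr_with_s("mod", windows_path))
print(module_repr_with_r("mod", posix_path) == module_repr_with_s("mod", posix_path))
```

The two formats only diverge when the path contains characters that repr() escapes, which explains why the test passed on Linux.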

vstinner · 2011-01-21T02:18:01Z

test_runpy fails on Windows on make_legacy_pyc() (of test.support), I don't know why.

ncoghlan · 2011-01-21T06:14:14Z

After applying the patch, doing a make clean and rebuild, I found that test_importlib fails with a segmentation fault, but the default test suite otherwise runs without error (that's on Linux with a UTF-8 filesystem, though).

I'll see how a -uall run fares.

ncoghlan · 2011-01-21T07:59:47Z

As for the more limited run, I get a clean run with -uall except for the segfault in test_importlib.

I'll switch to a pydebug build and see how a verbose run of that test fares.

ncoghlan · 2011-01-21T08:19:12Z

I haven't investigated in detail yet, but this is the final line showing the failing test:

test_module (importlib.test.builtin.test_loader.LoaderTests) ... Segmentation fault

vstinner · 2011-01-21T09:19:29Z

except for the segfault in test_importlib.

Yes, as reported in my previous comment :-) Let's update the patch for practical reasons. But I don't want to touch http://codereview.appspot.com/1874048 (based on patch version 4).

ncoghlan · 2011-01-21T16:43:15Z

Oops, missed that post - that was indeed the problem. With that fixed, tests are all good on this system. I'll give the patch a look anyway, but I'm going to have trouble diagnosing things that don't fail on my development machine.

As far as the test_reprlib failure goes, I seem to recall addressing a similar problem elsewhere in the standard lib by replacing a "%r" code with "'%s'" to get the single quotes without the backslash escaping. A similar change should probably do the trick here.

vstinner · 2011-02-22T23:46:01Z

I started to commit some parts of the huge patch:

r88515: Mark PyWin_FindRegisteredModule() as private
r88516: Remove unused argument of _PyImport_GetDynLoadFunc()
r88517 (3.3), r88518 (3.2): document encoding used by import functions

vstinner · 2011-02-23T00:24:39Z

r88519: Mark _PyImport_FindBuiltin() argument as constant
r88520: Add PyModule_GetNameObject()

pitrou · 2011-02-23T12:52:58Z

This new failure is perhaps related:

http://www.python.org/dev/buildbot/all/builders/AMD64%20Windows%20Server%202008%203.x/builds/572/steps/test/logs/stdio

======================================================================
FAIL: test_module (test.test_reprlib.LongReprTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "c:\buildslave-py3k\3.x.curtin-win2008-amd64\build\lib\test\test_reprlib.py", line 237, in test_module
    "<module '%s' from '%s'>" % (areallylongpackageandmodulenametotestreprtruncation.__name__, areallylongpackageandmodulenametotestreprtruncation.__file__))
AssertionError: "<module 'areallylongpackageandmodulenametotestreprtruncation.areallylongpackage [truncated]... != "<module 'areallylongpackageandmodulenametotestreprtruncation.areallylongpackage [truncated]...
Diff is 825 characters long. Set self.maxDiff to None to see it.

vstinner · 2011-02-23T14:18:08Z

This new failure is perhaps related: (...) test_reprlib

Ah yes, yesterday, I tried to remember which test was impacted by the module change, but all tests passed on Linux. Anyway, it's now fixed by r88533.

vstinner · 2011-03-04T12:58:03Z

r88746: Add PyModule_NewObject() function
r88747: Add PyImport_AddModuleObject() and PyImport_ExecCodeModuleObject()

vstinner · 2011-03-09T22:51:11Z

I created the features/unicode_import repository with a "unicode_import" branch:
http://hg.python.org/features/unicode_import/

It's my huge patch split into small and atomic commits.

amauryfa · 2011-03-10T08:01:21Z

Nice work! Is there a specific place for comments? Here are some of them already:

Modules/zipimport.c::make_filename: remove the limit buffer, the code could
look like:

  pathsize = PyUnicode_GET_SIZE(prefix) + PyUnicode_GET_SIZE(name);
  result = PyUnicode_FromUnicode(NULL, pathsize);
  path = PyUnicode_AS_UNICODE(result);
  ...
  return result;

Python/importdl.c::_PyImport_LoadDynamicModule: shortnameobj is not necessary:

  lastdot = Py_UNICODE_strrchr(nameuni, '.');
  if (lastdot == NULL)
      shortname = nameuni;
  else
      shortname = lastdot + 1;

_PyImport_GetDynLoadFunc still takes char* arguments. Can this fail on win32
for example, in case the pathname cannot be encoded to mbcs?
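For reference, the shortname logic suggested above (take everything after the last dot, or the whole name if there is none) looks like this in Python. `short_module_name` is a hypothetical name for the illustration.

```python
def short_module_name(fullname):
    """Return the last component of a dotted module name."""
    # rpartition() returns ('', '', fullname) when there is no dot,
    # so index 2 is the whole name in that case.
    return fullname.rpartition(".")[2]


for fullname in ("pkg.sub.mod", "mod"):
    print(fullname, "->", short_module_name(fullname))
```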

vstinner · 2011-03-10T14:34:23Z

Is there a specific place for comments?

Yes, but my work is not done. I still have parts to commit.

_PyImport_GetDynLoadFunc still takes char* arguments.

Oh. This one is not easy because this function has many implementations and all implementations have the same prototype. I will maybe fix it later.

vstinner · 2011-03-11T23:40:43Z

See also bpo-9319: once this issue is fixed, it will be easier to fix bpo-9319.

vstinner · 2011-03-15T00:59:51Z

I finished splitting the huge patch into smaller commits. You can now test the unicode_import Mercurial branch. In particular, it should be tested on Windows.

I don't know if I should merge the branch as a single commit or as multiple commits. Some of them can simply be merged.

You can try issue3080.py (file attached to this issue, extracted from the patch): a short script testing this issue.

--

The parser and _PyImport_GetDynLoadFunc() (on Windows) do still store the filename as byte strings, and so I don't think that Python is ready to use full Unicode range for filenames on Windows. But at least, it should now support non-ASCII module names and paths which are encodable to the ANSI code page.

Issue bpo-10785 should improve the situation at least for the parser.

But for _PyImport_GetDynLoadFunc(), I don't know if there is a Unicode version of LoadLibraryEx().

--

Modules/zipimport.c::make_filename: remove the limit buffer

Implemented in f286d3b514e0.

Python/importdl.c::_PyImport_LoadDynamicModule: shortnameobj is not necessary

Done in 76907d413b99

vstinner · 2011-03-19T23:01:44Z

Replace open_exclusive() by fopen(name, "wb") on Windows: is it correct?

I reverted this change in my Mercurial branch (unicode_import).

rename xxxobj => xxx to keep original names and have a short patch

done

catch encoding errors in case_ok()

done

don't encode in case_ok() if case_ok() does nothing (eg. on Linux)

done

find a better name for find_module2()

done: find_module_path_list() and find_module_path()

vstinner · 2011-03-19T23:26:59Z

test_runpy fails on Windows on make_legacy_pyc() (of test.support),
I don't know why.

Gotcha: I replaced mkdir() by CreateDirectoryW(), but the "directory already exists" error was not ignored. Fixed by 2debe178697b.
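The intended behaviour, expressed in Python rather than in the C code of the fix, is roughly the following sketch. `ensure_directory` and `demo` are hypothetical names.

```python
import os
import tempfile


def ensure_directory(path):
    """Create `path`, treating an already existing directory as success."""
    try:
        os.mkdir(path)
    except FileExistsError:
        # Only swallow the error if the existing entry really is a directory.
        if not os.path.isdir(path):
            raise


def demo():
    with tempfile.TemporaryDirectory() as parent:
        target = os.path.join(parent, "__pycache__")
        ensure_directory(target)
        ensure_directory(target)   # second call must not fail
        return os.path.isdir(target)


if __name__ == "__main__":
    print(demo())
```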

python-dev · 2011-03-20T03:13:16Z

New changeset 6c80ac44ae9c by Victor Stinner in branch 'default':
Issue bpo-3080: zipimport has a full unicode suppport
http://hg.python.org/cpython/rev/6c80ac44ae9c

New changeset b50a0d44545a by Victor Stinner in branch 'default':
Issue bpo-3080: PyImport_Cleanup() uses Unicode
http://hg.python.org/cpython/rev/b50a0d44545a

New changeset e7c1019b27b9 by Victor Stinner in branch 'default':
Issue bpo-3080: Add PyImport_ImportFrozenModuleObject()
http://hg.python.org/cpython/rev/e7c1019b27b9

New changeset 2425717c6430 by Victor Stinner in branch 'default':
Issue bpo-3080: Import builtins using Unicode strings
http://hg.python.org/cpython/rev/2425717c6430

New changeset ced52fcd95f6 by Victor Stinner in branch 'default':
Issue bpo-3080: Use PyUnicode_InternFromString() for builtins
http://hg.python.org/cpython/rev/ced52fcd95f6

New changeset e63a583ec689 by Victor Stinner in branch 'default':
Issue bpo-3080: Document the name attribute of the _inittab structure
http://hg.python.org/cpython/rev/e63a583ec689

New changeset bab42673674a by Victor Stinner in branch 'default':
Issue bpo-3080: _PyWin_FindRegisteredModule() returns the path as Unicode
http://hg.python.org/cpython/rev/bab42673674a

New changeset ef2b6305d395 by Victor Stinner in branch 'default':
Issue bpo-3080: _PyImport_LoadDynamicModule() uses Unicode for name and path
http://hg.python.org/cpython/rev/ef2b6305d395

New changeset d52f471fbbeb by Victor Stinner in branch 'default':
Issue bpo-3080: find_module() initialize buf and *p_fp
http://hg.python.org/cpython/rev/d52f471fbbeb

New changeset bdf5820f5a39 by Victor Stinner in branch 'default':
Issue bpo-3080: Remove useless name buffer from find_module()
http://hg.python.org/cpython/rev/bdf5820f5a39

New changeset a4d797b9ff63 by Victor Stinner in branch 'default':
Issue bpo-3080: Create find_module_path_list() subfunction
http://hg.python.org/cpython/rev/a4d797b9ff63

New changeset 09aaac73d9cf by Victor Stinner in branch 'default':
Issue bpo-3080: Create find_module_path() subfunction
http://hg.python.org/cpython/rev/09aaac73d9cf

New changeset f6507eb8e689 by Victor Stinner in branch 'default':
Issue bpo-3080: get_sourcefile(), make_source_pathname(), load_package()
http://hg.python.org/cpython/rev/f6507eb8e689

New changeset d24decc8c97e by Victor Stinner in branch 'default':
Issue bpo-3080: Use Unicode to import source and compiled modules
http://hg.python.org/cpython/rev/d24decc8c97e

New changeset 64c21f364519 by Victor Stinner in branch 'default':
Issue bpo-3080: load_module() expects name and path as Unicode
http://hg.python.org/cpython/rev/64c21f364519

New changeset e55e7f197649 by Victor Stinner in branch 'default':
Issue bpo-3080: PyImport_ImportModuleNoBlock() uses Unicode
http://hg.python.org/cpython/rev/e55e7f197649

New changeset 7c67aa3ab531 by Victor Stinner in branch 'default':
Issue bpo-3080: Use Unicode for the "The Magnum Opus of dotted-name import"
http://hg.python.org/cpython/rev/7c67aa3ab531

New changeset 23fe237afa81 by Victor Stinner in branch 'default':
Issue bpo-3080: Use %R to format module name in error messages
http://hg.python.org/cpython/rev/23fe237afa81

New changeset 2ee0ab9d2e8a by Victor Stinner in branch 'default':
Issue bpo-3080: Reindent and simplify import_submodule()
http://hg.python.org/cpython/rev/2ee0ab9d2e8a

New changeset 340f76a6a792 by Victor Stinner in branch 'default':
Issue bpo-3080: Drop OS/2 support for the import machinery
http://hg.python.org/cpython/rev/340f76a6a792

New changeset 156818529636 by Victor Stinner in branch 'default':
Issue bpo-3080: find_module() expects module fullname and subname as Unicode
http://hg.python.org/cpython/rev/156818529636

New changeset fe1d421ca3fa by Victor Stinner in branch 'default':
Issue bpo-3080: Rename some path variables to path_list
http://hg.python.org/cpython/rev/fe1d421ca3fa

New changeset c1a5a7dca1ec by Victor Stinner in branch 'default':
Issue bpo-3080: find_module() sets an empty path for builtin and frozen modules
http://hg.python.org/cpython/rev/c1a5a7dca1ec

New changeset c4ccf02456d6 by Victor Stinner in branch 'default':
Issue bpo-3080: Refactor find_module_path(), use return instead of break
http://hg.python.org/cpython/rev/c4ccf02456d6

New changeset 298a70b27497 by Victor Stinner in branch 'default':
Issue bpo-3080: find_init_module() expects Unicode
http://hg.python.org/cpython/rev/298a70b27497

New changeset 066b399a8477 by Victor Stinner in branch 'default':
Issue bpo-3080: case_ok() expects Unicode strings
http://hg.python.org/cpython/rev/066b399a8477

New changeset 9aec6f0e4076 by Victor Stinner in branch 'default':
Issue bpo-3080: find_module() returns the path as Unicode
http://hg.python.org/cpython/rev/9aec6f0e4076

New changeset c17bc2026145 by Victor Stinner in branch 'default':
Issue bpo-3080: imp.new_module() uses Unicode
http://hg.python.org/cpython/rev/c17bc2026145

New changeset c4361bab6914 by Victor Stinner in branch 'default':
Issue bpo-3080: Use repr() to format the module name on error
http://hg.python.org/cpython/rev/c4361bab6914

New changeset 80f4bd647695 by Victor Stinner in branch 'default':
Issue bpo-3080: Add PyImport_ImportModuleLevelObject() function
http://hg.python.org/cpython/rev/80f4bd647695

New changeset cc7c0f6f60bf by Victor Stinner in branch 'default':
Issue bpo-3080: skip test_bdist_rpm if sys.executable is not encodable to UTF-8
http://hg.python.org/cpython/rev/cc7c0f6f60bf

python-dev · 2011-03-20T03:29:01Z

New changeset f8d6f6797909 by Victor Stinner in branch 'default':
Issue bpo-3080: Fix case_ok() using case_bytes()
http://hg.python.org/cpython/rev/f8d6f6797909

python-dev · 2011-03-20T04:00:28Z

New changeset dc38c4d65cd9 by Victor Stinner in branch 'default':
Issue bpo-3080: Fix call to case_ok() in find_init_module()
http://hg.python.org/cpython/rev/dc38c4d65cd9

vstinner · 2011-03-20T11:26:50Z

http://www.python.org/dev/buildbot/all/builders/PPC%20Tiger%203.x/builds/1599/steps/test/logs/stdio

======================================================================
ERROR: testImpWrapper (test.test_importhooks.ImportHooksTestCase)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 239, in testImpWrapper
    m = __import__(mname, globals(), locals(), ["__dummy__"])
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/distutils/core.py", line 19, in <module>
    from distutils.cmd import Command
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/distutils/cmd.py", line 11, in <module>
    from distutils import util, dir_util, file_util, archive_util, dep_util
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/distutils/dir_util.py", line 8, in <module>
    import errno
  File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
TypeError: 'NoneType' object is not iterable

vstinner · 2011-03-20T11:35:13Z

mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
TypeError: 'NoneType' object is not iterable

The problem is that imp.find_module() now returns None as the filename, but imp.load_module() doesn't support None.
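The underlying reason is that a built-in module has no file at all. A small sketch using only attribute lookups (it does not use the imp functions above; `module_path` and `describe` are hypothetical helpers):

```python
import json
import sys


def module_path(module):
    """Return the module's file path, or None if it has no file."""
    return getattr(module, "__file__", None)


def describe(module):
    """Describe a module while tolerating a missing path."""
    path = module_path(module)
    if path is None:
        return "%s (no file)" % module.__name__
    return "%s from %s" % (module.__name__, path)


print(describe(sys))    # sys is compiled into the interpreter
print(describe(json))   # json is loaded from a file
```

Any consumer of the path therefore has to accept None.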

merwok · 2011-03-20T17:09:42Z

Attached patch fixes a typo in Doc/c-api/import.rst. You can merge it in your next commit.

python-dev · 2011-03-20T21:38:25Z

New changeset 7f4a4e393058 by Victor Stinner in branch 'default':
Issue bpo-3080: imp.load_module() accepts None for the module path
http://hg.python.org/cpython/rev/7f4a4e393058

vstinner · 2011-03-20T23:34:01Z

Ok. Python 3.3 does now support non-ASCII characters in module paths and names on Windows, but only characters encodable to the ANSI code page. To support the full Unicode range, we should remove all calls to PyUnicode_EncodeFSDefault() on Windows:

a) parse_source_module() has to encode the filename because the parser has no function expecting a filename as a Python object. It uses currently PyParser_ASTFromFile().

b) write_compiled_module() encodes the filename to call open_exclusive(). I don't know how to implement open_exclusive() for Windows using Unicode filename: open() expects the filename as a byte string. Can we use _Py_fopen() (_wfopen)? Or do you need the O_EXCL flag?

c) _PyImport_LoadDynamicModule() encodes the filename for _PyImport_GetDynLoadFunc(). The prototype should be changed, but only on Windows, to accept a filename as a Unicode string.

Issue bpo-10785 is the right fix for (a). Once bpo-10785 is fixed, it will be easier to fix the bpo-9319 crash.
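For (b), the semantics that O_EXCL provides can be shown from Python. This is a sketch of exclusive creation only, not the C open_exclusive() function; `create_exclusive` and `demo` are hypothetical names.

```python
import os
import tempfile


def create_exclusive(path):
    """Create `path` for binary writing, failing if it already exists."""
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    flags |= getattr(os, "O_BINARY", 0)   # O_BINARY only exists on Windows
    fd = os.open(path, flags, 0o666)
    return os.fdopen(fd, "wb")


def demo():
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "module.pyc")
        with create_exclusive(target) as stream:
            stream.write(b"data")
        try:
            create_exclusive(target)
        except FileExistsError:
            return True    # the second creation is refused
        return False


if __name__ == "__main__":
    print(demo())
```

A plain fopen(name, "wb") would silently truncate the existing file instead of refusing, which is the difference the question is about.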

vstinner · 2011-03-20T23:49:40Z

c) _PyImport_LoadDynamicModule() encodes the filename for
_PyImport_GetDynLoadFunc(). The prototype should be changed,
but only on Windows, to accept a filename as a Unicode string.

Hum, the difficult part is to use Unicode in _PyImport_GetDynLoadFunc() for:

hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH);

There is a LoadLibraryW() function, but it doesn't have a flag argument. And I suppose that the LOAD_WITH_ALTERED_SEARCH_PATH option is important.

vstinner · 2011-03-21T00:01:35Z

Ok, I think that the most important part is now implemented in Python 3.3: use Unicode for module names and paths in the import machinery. The remaining parts are specific to Windows, so I opened a new issue: bpo-11619. Let's close this 3-year-old issue.

python-dev · 2011-03-21T02:22:59Z

New changeset ee4e780a6b7a by Éric Araujo in branch 'default':
Fix a typo (see bpo-3080)
http://hg.python.org/cpython/rev/ee4e780a6b7a

asvetlov · 2011-03-21T04:08:23Z

As I see Victor has dropped OS/2 support from Python/import.c
Perhaps file Python/dynload_os2.c should be removed also.
Not sure about other dynload_* files.

vstinner · 2011-03-21T09:34:11Z

As I see Victor has dropped OS/2 support from Python/import.c
Perhaps file Python/dynload_os2.c should be removed also.
Not sure about other dynload_* files.

340f76a6a792 just removes a few lines in import.c: they can easily be rewritten. And this commit doesn't completely drop OS/2 support from the import machinery, as you wrote: dynload_os2.c still exists.

If we completely drop OS/2 support, it should be done properly using a PEP (I don't remember its number), and it should be discussed. At least with Andrew I MacIntyre :-)

asvetlov · 2011-03-21T15:06:31Z

Understood. Sorry.
I thought Python supports only Windows and POSIX (Linux, BSD, Mac OS X, etc.) systems now, and that all other OSes are not maintained.

Anyway please don't care about that.

python-dev · 2011-03-22T00:22:49Z

New changeset 15f9eca5e956 by Victor Stinner in branch 'default':
Issue bpo-3080: On DJGPP, case_bytes() returns -1 to signal an error if the file
http://hg.python.org/cpython/rev/15f9eca5e956

vstinner · 2011-03-23T15:58:18Z

test the fixed nosy list

amauryfa added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jun 11, 2008

pitrou added the type-bug An unexpected behavior, bug, or error label Aug 9, 2008

birkenfeld assigned vstinner Jul 10, 2010

vstinner closed this as completed Mar 21, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022
