Full unicode import system #47330

Closed
amauryfa opened this issue Jun 11, 2008 · 60 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@amauryfa
Member

BPO 3080
Nosy @brettcannon, @birkenfeld, @terryjreedy, @amauryfa, @ncoghlan, @abalkin, @pitrou, @vstinner, @benjaminp, @merwok, @bitdancer, @asvetlov
Files
  • issue3080-5.patch
  • issue3080.py
  • typo.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/vstinner'
    closed_at = <Date 2011-03-21.00:01:35.123>
    created_at = <Date 2008-06-11.18:24:56.218>
    labels = ['interpreter-core', 'type-bug']
    title = 'Full unicode import system'
    updated_at = <Date 2011-03-23.15:58:18.116>
    user = 'https://github.com/amauryfa'

    bugs.python.org fields:

    activity = <Date 2011-03-23.15:58:18.116>
    actor = 'vstinner'
    assignee = 'vstinner'
    closed = True
    closed_date = <Date 2011-03-21.00:01:35.123>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2008-06-11.18:24:56.218>
    creator = 'amaury.forgeotdarc'
    dependencies = []
    files = ['20477', '21296', '21305']
    hgrepos = []
    issue_num = 3080
    keywords = ['patch']
    message_count = 60.0
    messages = ['68005', '68015', '109844', '112028', '119107', '123963', '123993', '124756', '125752', '126514', '126515', '126516', '126606', '126608', '126612', '126613', '126672', '126673', '126676', '126678', '126680', '126681', '126695', '126705', '126706', '126708', '126752', '126755', '126756', '126760', '127591', '127674', '129141', '129143', '129185', '129196', '130050', '130473', '130492', '130507', '130645', '130935', '131457', '131464', '131472', '131473', '131474', '131483', '131484', '131516', '131547', '131571', '131572', '131575', '131606', '131612', '131625', '131664', '131711', '131890']
    nosy_count = 14.0
    nosy_names = ['brett.cannon', 'georg.brandl', 'terry.reedy', 'amaury.forgeotdarc', 'ncoghlan', 'belopolsky', 'pitrou', 'vstinner', 'benjamin.peterson', 'eric.araujo', 'Arfrever', 'r.david.murray', 'asvetlov', 'python-dev']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue3080'
    versions = ['Python 3.3']

    @amauryfa
    Member Author

    This is the most difficult part of bpo-1342:
    """
    On Windows, don't use the FileSystemEncoding on Windows for sys.path items.
    Instead, it should use the wide API to perform all system calls. Py3k
    shouldn't ever use the file system encoding for anything on Windows.
    """

    This implies rewriting all functions in import.c and replacing all char*
    arguments with unicode variables.

    @amauryfa amauryfa added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jun 11, 2008
    @benjaminp
    Contributor

    I suspect importlib may help with this.

    @pitrou pitrou added the type-bug An unexpected behavior, bug, or error label Aug 9, 2008
    @birkenfeld
    Member

    Victor is working on this.

    @vstinner
    Member

    I posted a patch: bpo-9425.

    @vstinner
    Member

    With bpo-8611 and bpo-9425, I patched a lot of functions and modules, including the NullImporter and zipimport, but not the core of the import machinery.

    In my import_unicode SVN branch, I patched the import machinery to manipulate unicode strings instead of byte strings. But the patch is huge and the import machinery is fragile. Since Python 3.2 now works in a non-ASCII directory with an ASCII locale (filesystem) encoding, I don't plan to merge the patch into py3k.

    The patch is still useful on Windows, because Python uses the mbcs encoding to encode/decode filenames, and this encoding usually covers only a very small subset of Unicode (e.g. cp1252 has 256 codes whereas Unicode 6.0 has 109,449 characters).

    @bitdancer
    Member

    With bpo-1342 fixed, it seems that this issue is no longer critical (Haypo describes his complicated patch as "useful on Windows", but not critical). So I'm downgrading it to 'high'. Perhaps it is even 'normal'. It also seems as though it is currently languishing unless someone wants to pick it up.

    @vstinner
    Member

    Haypo describes his complicated patch as "useful on Windows",
    but not critical

    Use case on Windows: your Japanese friend gives you a USB key (e.g. created on Windows with code page 932) with his Python project. You cannot run it on your English-language Windows (e.g. code page 1252), because it loads Python modules with Japanese characters in their paths.

    It works if all paths are encodable to your ANSI code page. It doesn't work if at least one character of one path is not encodable to your ANSI code page.

    I don't know if this use case is common or not.

    Note: the FAT file system of the USB key stores filenames as UTF-16 (and not in the user code page).
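
The encodability condition described above can be checked with a minimal Python sketch. It assumes Python's standard `cp932` and `cp1252` codecs as stand-ins for the two ANSI code pages; the helper name `encodable` and the sample path are hypothetical.

```python
def encodable(path, codepage):
    """Return True if every character of path can be encoded to codepage."""
    try:
        path.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical module path containing Japanese (katakana) characters.
path = "E:\\プロジェクト\\モジュール.py"

print(encodable(path, "cp932"))   # Japanese ANSI code page
print(encodable(path, "cp1252"))  # Western European ANSI code page
```

A single character outside the code page is enough to make the whole path unusable through the bytes API, which is why the wide API is needed for the general case.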

    @vstinner
    Member

    Issue bpo-10785 prepares the work for this issue: store input filename as a unicode string, instead of a byte string, in the parser.

    @terryjreedy
    Member

    If I edit a file with IDLE, save it, and successfully run it (perhaps to test it), then when I edit a second file that imports the first, I expect the import to work. It does not always (see bpo-10828).

    Import is part of the core definition of the language. Unicode identifiers are supposedly part of Python3. Given the existence of <identifier>.py in the current directory, 'import identifier' should work. If it does not, the 3.1 message '<identifier> not found' is more truthful than the current 'no module named <identifier>', when there is one.

    The doc says "identifier ::= (identifier ".")* identifier". As long as that is not true, some indication of the restriction that most people can understand would be nice. (And I suspect that a majority of Windows users, at least in the US, have no idea of what an 'ANSI code page' is.)
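
The expectation stated above ("given `<identifier>.py`, `import identifier` should work") can be tried with a small Python sketch. It assumes a filesystem and locale that can represent the name; the function name `import_non_ascii_module` and the module name `café` are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_non_ascii_module(name="café"):
    """Create <name>.py in a temporary directory and import it by name."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name + ".py"), "w", encoding="utf-8") as f:
            f.write("VALUE = 42\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
        finally:
            # Leave sys.path and sys.modules as we found them.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
    return module.VALUE


print(import_non_ascii_module())
```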

    @vstinner
    Member

    Here is a work-in-progress patch: issue3080-3.patch. The patch is HUGE and written for Python 3.3.

    $ diffstat issue3080-3.patch
     Doc/c-api/module.rst   |   24
     Include/import.h       |   73 +
     Include/moduleobject.h |    2
     Include/pycapsule.h    |    4
     Modules/zipimport.c    |  272 +++
     Objects/moduleobject.c |   52 -
     PC/import_nt.c         |   84 +-
     Python/dynload_aix.c   |    2
     Python/dynload_dl.c    |    2
     Python/dynload_hpux.c  |    2
     Python/dynload_next.c  |    4
     Python/dynload_os2.c   |    2
     Python/dynload_shlib.c |    2
     Python/dynload_win.c   |    2
     Python/import.c        | 1910 +++++++++++++++++++++++++++----------------------
     Python/importdl.c      |   79 +-
     Python/importdl.h      |    2
     issue3080.py           |   29
     18 files changed, 1484 insertions(+), 1063 deletions(-)

    As expected, most of the work is done in import.c.

    Decode the module name earlier and encode it later. Try to manipulate PyUnicodeObject objects instead of char* buffers (so the string length is directly available).
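
The "decode early, encode late" idea can be sketched in Python. This is only an illustration, not the C patch itself: it assumes UTF-8 with the `surrogateescape` error handler as a stand-in for the filesystem encoding, and the helper names `decode_name` and `encode_name` are hypothetical.

```python
def decode_name(raw):
    """Decode once at the boundary; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def encode_name(name):
    """Encode only when a bytes API has to be called."""
    return name.encode("utf-8", "surrogateescape")


raw = b"caf\xe9/mod.py"          # byte 0xE9 is not valid UTF-8 here
name = decode_name(raw)

# All intermediate work happens on the str object: its length is known
# and it can be split without scanning for a NUL terminator.
package, _, filename = name.rpartition("/")
print(len(name), filename)

# The original bytes are recovered unchanged when encoding at the end.
print(encode_name(name) == raw)
```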

    Split the huge and very complex find_module() function into 3 functions (find_module, find_module_filename and find_module2) and document them. Drop OS/2 support in find_module() (it can be kept, but it was easier for me to drop it and the OS/2 maintainer wrote that Python 3 is far from being compatible with OS/2).

    The patch creates some functions: PyModule_GetNameObject(), PyImport_ExecCodeModuleUnicode(), PyImport_AddModuleUnicode(), PyImport_ImportFrozenModuleUnicode(), PyModule_NewUnicode(), ...

    Use "U" format to parse a module name, and "%R" to format a module name (to escape surrogate characters and add quotes, instead of "... '%.200s' ...").

    PyWin_FindRegisteredModule() is now private. Remove fqname argument from _PyImport_GetDynLoadFunc(), it wasn't used.

    Replace open_exclusive() by fopen(name, "wb") on Windows: is it correct?

    TODO:

    • rename xxxobj => xxx to keep original names and have a short patch (eg. I renamed name to nameobj during the transition to detect bugs)
    • catch encoding errors in case_ok()
    • don't encode in case_ok() if case_ok() does nothing (eg. on Linux)
    • find a better name for find_module2()

    The patch contains a tiny script, issue3080.py, to test the patch using an ISO-8859-1 locale.

    I will open a thread on the mailing list (python-dev) to decide if this patch is needed or not. If we agree that this issue should be fixed, I will split the patch into smaller parts and start a review process.

    @vstinner
    Member

    This patch changes more lines of code than my previous crazy unicode patch (msg103663, issues bpo-8242, bpo-8611, bpo-9425), but it changes fewer files.

    @vstinner
    Member

    @ncoghlan
    Contributor

    Victor, could you please create a Rietveld review for this? The auto-review creator can't cope with the Git diffs.

    @vstinner
    Member

    Victor, could you please create a Rietveld review for this?

    Yes, but not yet. I first have to clean up the patch.

    @ncoghlan
    Contributor

    OK - I'll wait until that is ready before digging into this.

    @vstinner
    Member

    Use "U" format to parse a module name, and "%R" to format a module name
    (to escape surrogate characters and add quotes, instead of
    "... '%.200s' ...").

    See also bpo-8754: repr() is better than str() for other reasons, e.g. to see a space at the end of a module name (__import__('space ')) thanks to the quotes.
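
The effect of formatting a module name with repr() can be seen in a short Python sketch. The message text and the helper name `not_found_message` are hypothetical; `%r` in Python plays the role that `%R` plays in `PyUnicode_FromFormat()`.

```python
def not_found_message(name):
    """Format an error message with repr(), so quotes and escapes are visible."""
    return "No module named %r" % name


# The trailing space is visible because the name is quoted.
print(not_found_message("space "))

# A lone surrogate (from an undecodable byte) is escaped instead of
# breaking the output stream when the message is printed.
print(not_found_message("caf\udce9"))
```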

    @vstinner
    Member

    Version 4 of the patch.

    @vstinner
    Member

    Same patch (version 4) generated by svn.

    @vstinner
    Member

    You can review the patch with Rietveld:
    http://codereview.appspot.com/3972045

    @vstinner
    Member

    Oops, there is a silly typo in imp_init_builtin() that makes test_importlib crash (which proves that importlib has good coverage :-)): replace "s:" by "U:" in if (!PyArg_ParseTuple(args, "s:init_builtin", &name)).

    @vstinner
    Member

    test_reprlib fails on Windows, because '\' is quoted as '\\' in the filename in repr(module). Workaround:

    *******
    index b0dc4d7..e476941 100644
    --- a/Lib/test/test_reprlib.py
    +++ b/Lib/test/test_reprlib.py
    @@ -234,7 +234,7 @@ class LongReprTest(unittest.TestCase):
             touch(os.path.join(self.subpkgname, self.pkgname + '.py'))
             from areallylongpackageandmodulenametotestreprtruncation.areallylongpackageandmodulenametotestreprtruncation import areallylongpackageandmodulenametotestreprtruncation
             eq(repr(areallylongpackageandmodulenametotestreprtruncation),
    -           "<module '%s' from '%s'>" % (areallylongpackageandmodulenametotestreprtruncation.__name__, areallylongpackageandmodulenametotestreprtruncation.__file__))
    +           "<module %r from %r>" % (areallylongpackageandmodulenametotestreprtruncation.__name__, areallylongpackageandmodulenametotestreprtruncation.__file__))
             eq(repr(sys), "<module 'sys' (built-in)>")
     
         def test_type(self):
    *******

    It is maybe not a good idea to use %R to format the filename in module.__repr__().

    @vstinner
    Member

    test_runpy fails on Windows on make_legacy_pyc() (of test.support), I don't know why.

    @ncoghlan
    Contributor

    After applying the patch, doing a make clean and rebuild, I found that test_importlib fails with a segmentation fault, but the default test suite otherwise runs without error (that's on Linux with a UTF-8 filesystem, though).

    I'll see how a -uall run fares.

    @ncoghlan
    Contributor

    As with the more limited run, I get a clean run with -uall except for the segfault in test_importlib.

    I'll switch to a pydebug build and see how a verbose run of that test fares.

    @ncoghlan
    Contributor

    I haven't investigated in detail yet, but this is the final line showing the failing test:

    test_module (importlib.test.builtin.test_loader.LoaderTests) ... Segmentation fault

    @vstinner
    Member

    except for the segfault in test_importlib.

    Yes, as reported in my previous comment :-) Let's update the patch for practical reasons. But I don't want to touch http://codereview.appspot.com/1874048 (based on patch version 4).

    @ncoghlan
    Contributor

    Oops, missed that post - that was indeed the problem. With that fixed, tests are all good on this system. I'll give the patch a look anyway, but I'm going to have trouble diagnosing things that don't fail on my development machine.

    As far as the test_reprlib failure goes, I seem to recall addressing a similar problem elsewhere in the standard lib by replacing a "%r" code with "'%s'" to get the single quotes without the backslash escaping. A similar change should probably do the trick here.
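
The difference between the two format codes on a Windows-style path can be shown with a short Python sketch. The module name and path are hypothetical sample values.

```python
# Hypothetical Windows path with three backslashes.
path = "C:\\build\\lib\\mod.py"

# %r applies repr(), which doubles every backslash inside the quotes.
with_repr = "<module %r from %r>" % ("mod", path)

# '%s' adds the quotes by hand and leaves the backslashes untouched.
with_quotes = "<module '%s' from '%s'>" % ("mod", path)

print(with_repr)
print(with_quotes)
```

The two strings are identical on POSIX paths, which is why the mismatch only shows up on the Windows buildbots.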

    @vstinner
    Member

    I started to commit some parts of the huge patch:

    r88515: Mark PyWin_FindRegisteredModule() as private
    r88516: Remove unused argument of _PyImport_GetDynLoadFunc()
    r88517 (3.3), r88518 (3.2): document encoding used by import functions

    @vstinner
    Member

    r88519: Mark _PyImport_FindBuiltin() argument as constant
    r88520: Add PyModule_GetNameObject()

    @pitrou
    Member

    pitrou commented Feb 23, 2011

    This new failure is perhaps related:

    http://www.python.org/dev/buildbot/all/builders/AMD64%20Windows%20Server%202008%203.x/builds/572/steps/test/logs/stdio

    ======================================================================
    FAIL: test_module (test.test_reprlib.LongReprTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "c:\buildslave-py3k\3.x.curtin-win2008-amd64\build\lib\test\test_reprlib.py", line 237, in test_module
        "<module '%s' from '%s'>" % (areallylongpackageandmodulenametotestreprtruncation.__name__, areallylongpackageandmodulenametotestreprtruncation.__file__))
    AssertionError: "<module 'areallylongpackageandmodulenametotestreprtruncation.areallylongpackage [truncated]... != "<module 'areallylongpackageandmodulenametotestreprtruncation.areallylongpackage [truncated]...
    Diff is 825 characters long. Set self.maxDiff to None to see it.

    @vstinner
    Member

    This new failure is perhaps related: (...) test_reprlib

    Ah yes, yesterday, I tried to remember which test was impacted by the module change, but all tests passed on Linux. Anyway, it's now fixed by r88533.

    @vstinner
    Member

    vstinner commented Mar 4, 2011

    r88746: Add PyModule_NewObject() function
    r88747: Add PyImport_AddModuleObject() and PyImport_ExecCodeModuleObject()

    @vstinner
    Member

    vstinner commented Mar 9, 2011

    I created the features/unicode_import repository with a "unicode_import" branch:
    http://hg.python.org/features/unicode_import/

    It's my huge patch split into small and atomic commits.

    @amauryfa
    Member Author

    Nice work! Is there a specific place for comments? Here are some of them already:

      pathsize = PyUnicode_GET_SIZE(prefix) + PyUnicode_GET_SIZE(name);
      result = PyUnicode_FromUnicode(NULL, pathsize);
      path = PyUnicode_AS_UNICODE(result);
      ...
      return result;

      lastdot = Py_UNICODE_strrchr(nameuni, '.');
      if (lastdot == NULL)
          shortname = nameuni;
      else
          shortname = lastdot + 1;
    • _PyImport_GetDynLoadFunc still takes char* arguments. Can this fail on win32
      for example, in case the pathname cannot be encoded to mbcs?

    @vstinner
    Member

    Is there a specific place for comments?

    Yes, but my work is not done. I still have parts to commit.

    _PyImport_GetDynLoadFunc still takes char* arguments.

    Oh. This one is not easy because this function has many implementations and all implementations have the same prototype. I will maybe fix it later.

    @vstinner
    Member

    See also bpo-9319: once this issue is fixed, it will be easier to fix bpo-9319.

    @vstinner
    Member

    I finished splitting the huge patch into smaller commits. You can now test the unicode_import Mercurial branch. In particular, it should be tested on Windows.

    I don't know if I should merge the branch as a single commit or as multiple commits. Some of them can simply be merged.

    You can try issue3080.py (file attached to this issue, extracted from the patch): a short script testing this issue.

    --

    The parser and _PyImport_GetDynLoadFunc() (on Windows) still store the filename as byte strings, so I don't think that Python is ready to use the full Unicode range for filenames on Windows. But at least it should now support non-ASCII module names and paths which are encodable to the ANSI code page.

    Issue bpo-10785 should improve the situation at least for the parser.

    But for _PyImport_GetDynLoadFunc(), I don't know if there is a Unicode version of LoadLibraryEx().

    --

    Modules/zipimport.c::make_filename: remove the limit buffer

    Implemented in f286d3b514e0.

    Python/importdl.c::_PyImport_LoadDynamicModule: shortnameobj is not necessary

    Done in 76907d413b99

    @vstinner
    Member

    Replace open_exclusive() by fopen(name, "wb") on Windows: is it correct?

    I reverted this change in my Mercurial branch (unicode_import).

    rename xxxobj => xxx to keep original names and have a short patch

    done

    catch encoding errors in case_ok()

    done

    don't encode in case_ok() if case_ok() does nothing (eg. on Linux)

    done

    find a better name for find_module2()

    done: find_module_path_list() and find_module_path()

    @vstinner
    Member

    test_runpy fails on Windows on make_legacy_pyc() (of test.support),
    I don't know why.

    Gotcha: I replaced mkdir() by CreateDirectoryW(), but the "directory already exists" error was not ignored. Fixed by 2debe178697b.
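
The fix described above amounts to treating "directory already exists" as success. A minimal Python sketch of that pattern follows; it is not the C code from the changeset, and the helper name `ensure_dir` is hypothetical.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path, treating 'already exists' as success."""
    try:
        os.mkdir(path)
    except FileExistsError:
        # Another import (or a previous run) already created it.
        pass


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = os.path.join(tmp, "__pycache__")
    ensure_dir(cache_dir)
    ensure_dir(cache_dir)  # second call must not raise
    print(os.path.isdir(cache_dir))
```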

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 20, 2011

    New changeset 6c80ac44ae9c by Victor Stinner in branch 'default':
    Issue bpo-3080: zipimport has a full unicode suppport
    http://hg.python.org/cpython/rev/6c80ac44ae9c

    New changeset b50a0d44545a by Victor Stinner in branch 'default':
    Issue bpo-3080: PyImport_Cleanup() uses Unicode
    http://hg.python.org/cpython/rev/b50a0d44545a

    New changeset e7c1019b27b9 by Victor Stinner in branch 'default':
    Issue bpo-3080: Add PyImport_ImportFrozenModuleObject()
    http://hg.python.org/cpython/rev/e7c1019b27b9

    New changeset 2425717c6430 by Victor Stinner in branch 'default':
    Issue bpo-3080: Import builtins using Unicode strings
    http://hg.python.org/cpython/rev/2425717c6430

    New changeset ced52fcd95f6 by Victor Stinner in branch 'default':
    Issue bpo-3080: Use PyUnicode_InternFromString() for builtins
    http://hg.python.org/cpython/rev/ced52fcd95f6

    New changeset e63a583ec689 by Victor Stinner in branch 'default':
    Issue bpo-3080: Document the name attribute of the _inittab structure
    http://hg.python.org/cpython/rev/e63a583ec689

    New changeset bab42673674a by Victor Stinner in branch 'default':
    Issue bpo-3080: _PyWin_FindRegisteredModule() returns the path as Unicode
    http://hg.python.org/cpython/rev/bab42673674a

    New changeset ef2b6305d395 by Victor Stinner in branch 'default':
    Issue bpo-3080: _PyImport_LoadDynamicModule() uses Unicode for name and path
    http://hg.python.org/cpython/rev/ef2b6305d395

    New changeset d52f471fbbeb by Victor Stinner in branch 'default':
    Issue bpo-3080: find_module() initialize buf and *p_fp
    http://hg.python.org/cpython/rev/d52f471fbbeb

    New changeset bdf5820f5a39 by Victor Stinner in branch 'default':
    Issue bpo-3080: Remove useless name buffer from find_module()
    http://hg.python.org/cpython/rev/bdf5820f5a39

    New changeset a4d797b9ff63 by Victor Stinner in branch 'default':
    Issue bpo-3080: Create find_module_path_list() subfunction
    http://hg.python.org/cpython/rev/a4d797b9ff63

    New changeset 09aaac73d9cf by Victor Stinner in branch 'default':
    Issue bpo-3080: Create find_module_path() subfunction
    http://hg.python.org/cpython/rev/09aaac73d9cf

    New changeset f6507eb8e689 by Victor Stinner in branch 'default':
    Issue bpo-3080: get_sourcefile(), make_source_pathname(), load_package()
    http://hg.python.org/cpython/rev/f6507eb8e689

    New changeset d24decc8c97e by Victor Stinner in branch 'default':
    Issue bpo-3080: Use Unicode to import source and compiled modules
    http://hg.python.org/cpython/rev/d24decc8c97e

    New changeset 64c21f364519 by Victor Stinner in branch 'default':
    Issue bpo-3080: load_module() expects name and path as Unicode
    http://hg.python.org/cpython/rev/64c21f364519

    New changeset e55e7f197649 by Victor Stinner in branch 'default':
    Issue bpo-3080: PyImport_ImportModuleNoBlock() uses Unicode
    http://hg.python.org/cpython/rev/e55e7f197649

    New changeset 7c67aa3ab531 by Victor Stinner in branch 'default':
    Issue bpo-3080: Use Unicode for the "The Magnum Opus of dotted-name import"
    http://hg.python.org/cpython/rev/7c67aa3ab531

    New changeset 23fe237afa81 by Victor Stinner in branch 'default':
    Issue bpo-3080: Use %R to format module name in error messages
    http://hg.python.org/cpython/rev/23fe237afa81

    New changeset 2ee0ab9d2e8a by Victor Stinner in branch 'default':
    Issue bpo-3080: Reindent and simplify import_submodule()
    http://hg.python.org/cpython/rev/2ee0ab9d2e8a

    New changeset 340f76a6a792 by Victor Stinner in branch 'default':
    Issue bpo-3080: Drop OS/2 support for the import machinery
    http://hg.python.org/cpython/rev/340f76a6a792

    New changeset 156818529636 by Victor Stinner in branch 'default':
    Issue bpo-3080: find_module() expects module fullname and subname as Unicode
    http://hg.python.org/cpython/rev/156818529636

    New changeset fe1d421ca3fa by Victor Stinner in branch 'default':
    Issue bpo-3080: Rename some path variables to path_list
    http://hg.python.org/cpython/rev/fe1d421ca3fa

    New changeset c1a5a7dca1ec by Victor Stinner in branch 'default':
    Issue bpo-3080: find_module() sets an empty path for builtin and frozen modules
    http://hg.python.org/cpython/rev/c1a5a7dca1ec

    New changeset c4ccf02456d6 by Victor Stinner in branch 'default':
    Issue bpo-3080: Refactor find_module_path(), use return instead of break
    http://hg.python.org/cpython/rev/c4ccf02456d6

    New changeset 298a70b27497 by Victor Stinner in branch 'default':
    Issue bpo-3080: find_init_module() expects Unicode
    http://hg.python.org/cpython/rev/298a70b27497

    New changeset 066b399a8477 by Victor Stinner in branch 'default':
    Issue bpo-3080: case_ok() expects Unicode strings
    http://hg.python.org/cpython/rev/066b399a8477

    New changeset 9aec6f0e4076 by Victor Stinner in branch 'default':
    Issue bpo-3080: find_module() returns the path as Unicode
    http://hg.python.org/cpython/rev/9aec6f0e4076

    New changeset c17bc2026145 by Victor Stinner in branch 'default':
    Issue bpo-3080: imp.new_module() uses Unicode
    http://hg.python.org/cpython/rev/c17bc2026145

    New changeset c4361bab6914 by Victor Stinner in branch 'default':
    Issue bpo-3080: Use repr() to format the module name on error
    http://hg.python.org/cpython/rev/c4361bab6914

    New changeset 80f4bd647695 by Victor Stinner in branch 'default':
    Issue bpo-3080: Add PyImport_ImportModuleLevelObject() function
    http://hg.python.org/cpython/rev/80f4bd647695

    New changeset cc7c0f6f60bf by Victor Stinner in branch 'default':
    Issue bpo-3080: skip test_bdist_rpm if sys.executable is not encodable to UTF-8
    http://hg.python.org/cpython/rev/cc7c0f6f60bf

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 20, 2011

    New changeset f8d6f6797909 by Victor Stinner in branch 'default':
    Issue bpo-3080: Fix case_ok() using case_bytes()
    http://hg.python.org/cpython/rev/f8d6f6797909

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 20, 2011

    New changeset dc38c4d65cd9 by Victor Stinner in branch 'default':
    Issue bpo-3080: Fix call to case_ok() in find_init_module()
    http://hg.python.org/cpython/rev/dc38c4d65cd9

    @vstinner
    Member

    http://www.python.org/dev/buildbot/all/builders/PPC%20Tiger%203.x/builds/1599/steps/test/logs/stdio

    ======================================================================
    ERROR: testImpWrapper (test.test_importhooks.ImportHooksTestCase)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 239, in testImpWrapper
        m = __import__(mname, globals(), locals(), ["__dummy__"])
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
        mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/distutils/core.py", line 19, in <module>
        from distutils.cmd import Command
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
        mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/distutils/cmd.py", line 11, in <module>
        from distutils import util, dir_util, file_util, archive_util, dep_util
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
        mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/distutils/dir_util.py", line 8, in <module>
        import errno
      File "/Users/buildbot/buildarea/3.x.parc-tiger-1/build/Lib/test/test_importhooks.py", line 132, in load_module
        mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
    TypeError: 'NoneType' object is not iterable

    @vstinner
    Member

    mod = imp.load_module(fullname, self.file, self.filename, self.stuff)
    

    TypeError: 'NoneType' object is not iterable

    The problem is that imp.find_module() now returns None as the filename, but imp.load_module() doesn't support None.

    @merwok
    Member

    merwok commented Mar 20, 2011

    Attached patch fixes a typo in Doc/c-api/import.rst. You can merge it in your next commit.

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 20, 2011

    New changeset 7f4a4e393058 by Victor Stinner in branch 'default':
    Issue bpo-3080: imp.load_module() accepts None for the module path
    http://hg.python.org/cpython/rev/7f4a4e393058

    @vstinner
    Member

    Ok. Python 3.3 now supports non-ASCII characters in module paths and names on Windows, but only characters encodable to the ANSI code page. To support the full Unicode range, we should remove all calls to PyUnicode_EncodeFSDefault() on Windows:

    a) parse_source_module() has to encode the filename because the parser has no function expecting a filename as a Python object. It currently uses PyParser_ASTFromFile().

    b) write_compiled_module() encodes the filename to call open_exclusive(). I don't know how to implement open_exclusive() for Windows using Unicode filename: open() expects the filename as a byte string. Can we use _Py_fopen() (_wfopen)? Or do you need the O_EXCL flag?

    c) _PyImport_LoadDynamicModule() encodes the filename for _PyImport_GetDynLoadFunc(). The prototype should be changed, but only on Windows, to accept a filename as a Unicode string.

    Issue bpo-10785 is the right fix for (a). Once bpo-10785 is fixed, it will be easier to fix the bpo-9319 crash.
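
For point (b), the behaviour that O_EXCL adds over a plain "wb" open can be sketched in Python. This is an illustration under assumptions, not the C open_exclusive() from import.c: the helper name `create_exclusive` is hypothetical, and it relies on `os.open()` accepting a str path (which Python passes to the wide API on Windows).

```python
import os
import tempfile


def create_exclusive(path, mode=0o666):
    """Create path for binary writing; fail if it already exists."""
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    flags |= getattr(os, "O_BINARY", 0)  # only defined on Windows
    fd = os.open(path, flags, mode)
    return os.fdopen(fd, "wb")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "módulo.pyc")  # hypothetical non-ASCII name
    with create_exclusive(target) as f:
        f.write(b"data")
    try:
        create_exclusive(target)
    except FileExistsError:
        print("second creation refused")
```

A plain `open(name, "wb")` would silently truncate an existing file instead, which is the behaviour difference the question about O_EXCL is getting at.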

    @vstinner
    Member

    c) _PyImport_LoadDynamicModule() encodes the filename for
    _PyImport_GetDynLoadFunc(). The prototype should be changed,
    but only on Windows, to accept a filename as a Unicode string.

    Hum, the difficult part is to use Unicode in _PyImport_GetDynLoadFunc() for:

    hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH);

    There is a LoadLibraryW() function, but it doesn't have a flag argument. And I suppose that the LOAD_WITH_ALTERED_SEARCH_PATH option is important.
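
For reference, the Windows API also provides LoadLibraryExW(), the wide-character variant of LoadLibraryEx(), which takes the same flags argument. The Python sketch below calls it through ctypes; it is an illustration only, not the change made in CPython, and the helper name `load_library_wide` is hypothetical. It can only load anything on Windows, and LOAD_WITH_ALTERED_SEARCH_PATH expects an absolute path.

```python
import sys

# Flag value as defined in the Windows SDK headers.
LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008


def load_library_wide(path):
    """Load a DLL from a str path through the wide-character API."""
    if sys.platform != "win32":
        raise OSError("LoadLibraryExW is only available on Windows")
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.LoadLibraryExW.argtypes = (
        wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD)
    kernel32.LoadLibraryExW.restype = wintypes.HMODULE
    handle = kernel32.LoadLibraryExW(path, None, LOAD_WITH_ALTERED_SEARCH_PATH)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle
```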

    @vstinner
    Member

    Ok, I think that the most important part is now implemented in Python 3.3: use Unicode for module names and paths in the import machinery. The remaining parts are specific to Windows, so I opened a new issue: bpo-11619. Let's close this three-year-old issue.

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 21, 2011

    New changeset ee4e780a6b7a by Éric Araujo in branch 'default':
    Fix a typo (see bpo-3080)
    http://hg.python.org/cpython/rev/ee4e780a6b7a

    @asvetlov
    Contributor

    As I see Victor has dropped OS/2 support from Python/import.c
    Perhaps file Python/dynload_os2.c should be removed also.
    Not sure about other dynload_* files.

    @vstinner
    Member

    As I see Victor has dropped OS/2 support from Python/import.c
    Perhaps file Python/dynload_os2.c should be removed also.
    Not sure about other dynload_* files.

    340f76a6a792 just removes a few lines in import.c: they can easily be rewritten. And this commit doesn't completely drop OS/2 support from the import machinery, as you wrote: dynload_os2.c still exists.

    If we completely drop OS/2 support, it should be done completely, using a PEP (I don't remember its number), and it should be discussed. At least with Andrew I MacIntyre :-)

    @asvetlov
    Contributor

    Understood. Sorry.
    I thought Python supports only Windows and POSIX (Linux, BSD, Mac OS X etc.) systems now, and that all other OSes are not maintained.

    Anyway please don't care about that.

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 22, 2011

    New changeset 15f9eca5e956 by Victor Stinner in branch 'default':
    Issue bpo-3080: On DJGPP, case_bytes() returns -1 to signal an error if the file
    http://hg.python.org/cpython/rev/15f9eca5e956

    @vstinner
    Member

    test the fixed nosy list

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022