This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: imp.find_module() mixes UTF8 and MBCS
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: asvetlov, brett.cannon, ezio.melotti, flox, gvanrossum, loewis, ncoghlan, ocean-city
Priority: normal Keywords: buildbot

Created on 2009-03-30 14:26 by gvanrossum, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
import.zip asvetlov, 2009-03-30 19:48 patch for support non-ascii characters in module names. With unittest
import.zip asvetlov, 2009-03-30 20:42
import.zip asvetlov, 2009-03-31 19:07 updated patch for fixing error messages in generated exception
import_patch_4th_edition.zip asvetlov, 2009-04-05 04:58 DONT APPLY IT. Works only for windows now.
Messages (15)
msg84548 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-03-30 14:26
There's a path in imp.find_module that mixes encodings.  The module name
is encoded to char* using UTF-8 by the 's' format passed to
PyArg_ParseTuple().  But the path name is converted (in the loop over
the path in find_module()) to char* using the filesystem encoding.  On
Windows this ends up constructing a char* that mixes MBCS and UTF8 in
one string.

(We discovered this when researching bug 4352, but this is not the cause
of the problem reported there -- the module name in that bug is ASCII.)

Andrew Svetlov is looking into producing a patch.
msg84626 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-30 19:48
Problem fixed, patch attached
I inserted conversion path parameters to using Py_FileSystemDefaultEncoding for:

* load_module
* load_compiled
* load_dynamic
* load_source
* load_package

find_module is already has conversion.
msg84634 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-30 20:13
PyMem_Free is needed when "es" is used with PyArg_ParseTuple. See other
part of import.c. I did same mistake before. ;-)
msg84636 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-30 20:20
Thank you.

On Mon, Mar 30, 2009 at 3:13 PM, Hirokazu Yamamoto
<report@bugs.python.org> wrote:
>
> Hirokazu Yamamoto <ocean-city@m2.ccsnet.ne.jp> added the comment:
>
> PyMem_Free is needed when "es" is used with PyArg_ParseTuple. See other
> part of import.c. I did same mistake before. ;-)
>
> ----------
> nosy: +ocean-city
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue5604>
> _______________________________________
>
msg84644 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-30 20:42
According to Hirokazu Yamamoto memory cleanup added.
Patch is updated.
msg84663 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-03-30 21:49
Thanks Andrew!  Committed to 3.0.2 as 70756.

Should be merged into 3.1, but should *not* be backported to 2.x.
msg84862 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-31 19:07
Continuing work over import.c I fixed bad error message encoding in 
generated ImportError exception.
Tests for checking in case of non-ascii characters added.
msg84865 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-31 19:09
Martin von Loewis added to nosy list
msg85220 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-04-02 16:27
Martin, can you review latest patch and apply it if this one is correct.

I want to start working on conversion import.c to use unicode strings 
(we spoke about Tuesday) this weekend.
It will be nice if I will have synchronized svn before making new 
changes.
msg85464 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-04-05 04:58
Continuing work on problem I figured out:
* on Windows it's impossible to convert filenames to file system 
encoding without and don't miss something.
* Windows can work properly only with unicode (wchar_t) characters.
* all other systems feels itself good using utf-8 (or another filesystem 
encoding).
* it's very errorprone to change 'char*' to 'PyUnicode*'.

To solve this problem I assume: 
* all char* in Python API is utf-8.
* if there are need call to operation system api like fopen - call 
imp_fopen, this function will do need conversions. Inside import.c there 
are 4 calls: fopen, open_exclusive, unlink, stat. I want to write stubs 
for this calls. 
* also loaders for dynamic modules aka 'C extensions' have to expect 
utf-8 as  pathname parameter, not 'filesystem encoded'.

Patch for windows is applied (STILL NOT CONVERTED TO OTHER OS).
But for Windows it works (regression tests passed).

If this solution is applicable for 3.1 (as I know Cannon works on excellent importlib but this library will replace imp functionality only 
in 3.2) - I can continue switching. Unfortunately I cannot test py3k 
trunk on non-windows machines - but I can 'make all OS calls as 
expected' and wait for buildbot answer.

Please review import_patch_4th_edition.zip and if I ran in wrong way - 
let me know.
msg99499 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-18 12:09
Still an issue for some buildbot:

http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x/builds/1487
http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x/builds/1491


It is loosely related with #7712, because now the tests are run in TEMP (on C:) instead of the build dir (on D:).
msg99503 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-02-18 13:23
Also the test has a few problems:
1) the keys of known_locales are lowercase, but locale_encoding = locale.getpreferredencoding() can return uppercase encodings (e.g. UTF-8);
2) this masks another error: the b'\xe4' is not a valid utf-8 byte and it can be decoded;
3) the test should be skipped properly if the preferred encoding is not among the ones of the known_locales dict;
4) the 'encoded_char' should be 'decoded_char'.

It seems that in the failure linked by Florent, find_module tries to encode the filename with the wrong encoding and with error='replace' and the char at the end of 'test_imp_helper_' is converted to U+FFFD.
If the file is created with the correct name (e.g. 'test_imp_helper_\xc0') and find_module tries to search the wrong name (i.e. 'test_imp_helper_\ufffd'), then the error is raised (but then cp1252 used by the terminal can't encode that char and the second exception is raised).
Now the tests are run on C: and the filesystem encoding might be different, so it might not match anymore the encoding returned by locale.getpreferredencoding().
msg100505 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-03-05 23:45
I fixed all the things listed in the previous message in r78689, but that just enabled the test on several Linux buildbots and some started to fail too.
In r78696 (and r78697) I tried to use sys.getfilesystemencoding() instead of locale.getpreferredencoding() and that turned at least the Windows buildbots green, but there are still a few linux bots that fail:
http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%20wide%203.x/builds/545/steps/test/logs/stdio
http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%203.x/builds/561/steps/test/logs/stdio
http://www.python.org/dev/buildbot/builders/x86%20Ubuntu%203.x/builds/525/steps/test/logs/stdio
http://www.python.org/dev/buildbot/builders/amd64%20gentoo%203.x/builds/513/steps/test/logs/stdio
msg100528 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-06 14:50
Thanks for fixing this. Now Win7 buildbot is green on trunk.
msg100529 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-03-06 14:54
The Linux buildbots were running the tests using ./python ./Lib/test/regrtest.py instead of ./python -m test.regrtest and '' was missing from sys.path, so imp.find_module couldn't find the module.
This is now fixed in r78711 and backported r78716.
Thanks to Florent that noticed that those buildbots were using a different command to run the tests.
History
Date User Action Args
2022-04-11 14:56:47adminsetgithub: 49854
2010-03-06 14:54:18ezio.melottisetstatus: open -> closed
assignee: ezio.melotti
resolution: fixed
messages: + msg100529
2010-03-06 14:50:28floxsetmessages: + msg100528
stage: needs patch -> resolved
2010-03-05 23:45:40ezio.melottisetnosy: + ncoghlan
messages: + msg100505
2010-02-18 14:45:26floxsetversions: + Python 3.2, - Python 3.0
2010-02-18 13:24:48ezio.melottisetstatus: closed -> open
resolution: fixed -> (no value)
2010-02-18 13:23:16ezio.melottisetstatus: open -> closed
resolution: fixed
messages: + msg99503
2010-02-18 12:09:46floxsetstatus: closed -> open

nosy: + ezio.melotti, flox
messages: + msg99499

keywords: + buildbot
resolution: fixed -> (no value)
2009-04-05 04:59:09asvetlovsetfiles: + import_patch_4th_edition.zip
nosy: + brett.cannon
messages: + msg85464

2009-04-02 16:27:54asvetlovsetmessages: + msg85220
2009-03-31 19:09:28asvetlovsetnosy: + loewis
messages: + msg84865
2009-03-31 19:07:14asvetlovsetfiles: + import.zip

messages: + msg84862
2009-03-30 21:49:01gvanrossumsetstatus: open -> closed
resolution: fixed
messages: + msg84663
2009-03-30 20:42:48asvetlovsetfiles: + import.zip

messages: + msg84644
2009-03-30 20:20:42asvetlovsetmessages: + msg84636
2009-03-30 20:13:02ocean-citysetnosy: + ocean-city
messages: + msg84634
2009-03-30 19:48:45asvetlovsetfiles: + import.zip

messages: + msg84626
2009-03-30 14:26:40gvanrossumcreate