classification
Title: Python 3 doesn't support non-ASCII module names with a locale encoding different than UTF-8
Type: behavior Stage:
Components: Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: duplicate
Dependencies: Superseder: On Windows, don't encode filenames in the import machinery (#11619)
Assigned To: Nosy List: ingemar, r.david.murray, terry.reedy, vstinner
Priority: normal Keywords:

Created on 2011-01-04 19:44 by ingemar, last changed 2011-03-21 00:56 by terry.reedy. This issue is now closed.

Messages (15)
msg125360 - (view) Author: ingemar (ingemar) Date: 2011-01-04 19:44
I have a set of programs written for Python 3.1 that run well on Kubuntu. The source files are located on a Samba server on a Kubuntu box. Several of the programs contain Python/PyQt code to start other programs in the set ( QtCore.QProcess().startDetached(kommando) ).
I have had no problems using non-ascii filenames in the Linux environment.


When I tried to check the programs in an MS Windows environment (Win2K with Python 3.1.2 in a VirtualBox on a Kubuntu box), Python complained:
ImportError: module xxx not found..

The ugly solution has been to refrain from using non-ASCII characters in the names of imported files. This meant renaming the imported file and changing one line of code in the importing file.

Example: 
1) rename  "gui_jämföra.py"   --->   "gui_jamfora.py"
2) in the importing file  "jämföra.py"  change one line:
"from  gui_jämföra  import  * "   --->   "from  gui_jamfora  import  gui_Jämföra"

Is there a beautiful solution that will let me use non-ASCII (UTF-8) characters in the names of imported files as well?
msg125366 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-01-04 21:44
Have you tried 3.2b2?
msg125381 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-04 22:59
I think that this issue is a duplicate of #8611 (and #9425); it should be fixed in Python 3.2.
msg125408 - (view) Author: ingemar (ingemar) Date: 2011-01-05 04:26
Have I tried 3.2b2?

No. I will have to wait for 3.2, or more exactly for a Windows installer for PyQt for 3.2 to become available.
Compiling that on Windows is beyond my resources and experience.
I will make a point to tell you then.
msg125739 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-08 01:14
(Ingemar: one can easily test import statements without pyqt, let alone qt ;-)

With 3.2b2 on our Win7, 64 bit machine, files with a Japanese name run but apparently cannot be imported.

a.py: print('something')
^|.py: print('other') # ^| == imitation of katakana name
c.py: import a; import ^|
something
ImportError: No module named ^|

Tried in both japanese- and then ascii-named directories.
So I am not convinced that #9425 is finished. What might I have misunderstood?
msg125745 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-08 03:04
> With 3.2b2 on our Win7, 64 bit machine, files with a Japanese name...

What is your ANSI code page? If it is not a Japanese code page, this is issue #3080.

On Windows, #8611 (and #9425) permit the use of non-ASCII characters in the module path... but only characters encodable to your ANSI code page.
msg125753 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-08 06:34
ANSI code page? I have no idea how to find out, and many would not even know that such a thing exists. It is an HP laptop sold in the US.

I think bugs in core syntax should have high priority. I appreciate your work toward fixing it.
msg125754 - (view) Author: ingemar (ingemar) Date: 2011-01-08 06:37
Terry: Thanks for the hint
In a pure ascii path I created files very similar to yours with Swedish "ä" instead of your katakana character.
I also got the same result.

a.py:
print ('something')

ä.py:
print ('other')

c.py:
# -*- coding: utf-8 -*-
import a
import ä

I ran the files with 3.2b2:
    
c:\Python32\python.exe a.py
something

c:\Python32\python.exe ä.py
other

c:\Python32\python.exe c.py
something
Traceback (most recent call last):
  File "c.py", line 3, in <module>
    import ä
ImportError: No module named ä


Victor: How do I determine what code page my old w2k is using?
Would that be 8859-1 or some older variant for western Europe or Sweden?
msg125786 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-08 16:13
> Victor: How do I determine what code page my old w2k is using?.

python.exe -c "import locale; print('ANSI code page: {}'.format(locale.getpreferredencoding()))"


> On Windows, #8611 (and #9425) permit the use of non-ASCII characters
> in the module path... but only characters encodable to your
> ANSI code page.

If you would like to check if your path is encodable to your ANSI code page, try:

python.exe -c "import os; fn=os.fsencode('ä'); print(ascii(fn))"

If fsencode() raises an error, the filename is not encodable to your ANSI code page and you have to wait until #3080 is fixed :-)
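
The same check can be sketched without depending on the code page of the machine it runs on. This is only an illustration: the helper name `encodable` is hypothetical, and the codec names (`cp1252`, `cp932`) are passed explicitly as stand-ins for an ANSI code page instead of being read from the system as os.fsencode() does.

```python
def encodable(name, encoding):
    """Return True if `name` can be encoded with `encoding` without error."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# 'ä' exists in cp1252 (Western European); katakana 'ア' does not,
# but it does exist in cp932 (a Japanese code page).
print(encodable("ä", "cp1252"))
print(encodable("ア", "cp1252"))
print(encodable("ア", "cp932"))
```

A name for which this returns False under the machine's actual ANSI code page is in the situation described above.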
msg125787 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-08 16:16
> I think bugs in core syntax should have high priority.

It took me 7 months to implement the first part (#8611 and #9425). I plan to do the second part (#3080) in Python 3.3 (it's too late for Python 3.2; the final release is planned for February 5, 2011). I already have a huge patch somewhere (in an SVN branch, import_unicode), but I have to update it and split it into small, simple patches.
msg125795 - (view) Author: ingemar (ingemar) Date: 2011-01-08 19:34
python.exe -c "import locale; print('ANSI code page: {}'.format(locale.getpreferredencoding()))"
ANSI code page: cp1252


python.exe -c "import os; fn=os.fsencode('ä'); print(ascii(fn))"
b'\xe4'
and no error raised
msg125819 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-09 02:40
> ANSI code page: cp1252 ...os.fsencode('ä') => b'\xe4'

Hum, I ran your example with a debugger, and ok, I now remember the whole thing.

I fixed Python to support non-ASCII characters (on Windows, only those encodable to the ANSI code page) in the *search path*, not in the module name.

The import machinery encodes each search path to the filesystem encoding, but it encodes the module name to UTF-8. Concatenating two byte strings encoded with different encodings doesn't work (it leads to mojibake).
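
A minimal sketch of that mojibake, under stated assumptions: the directory name `C:\källa\` is hypothetical, and `cp1252` is hard-coded to stand in for the filesystem encoding. It is not the import machinery's actual code, only the byte-level effect of mixing the two encodings.

```python
search_path = "C:\\källa\\"   # hypothetical directory name
module_name = "gui_jämföra"

path_bytes = search_path.encode("cp1252")  # stand-in for the filesystem encoding
name_bytes = module_name.encode("utf-8")   # the module name was encoded to UTF-8

# Concatenating bytes produced by two different encodings.
mixed = path_bytes + name_bytes + b".py"

# Interpreted with the filesystem encoding, the module name is garbled,
# so the resulting path does not name the real file.
garbled = mixed.decode("cp1252")
print(garbled)

# Using one encoding for both parts round-trips cleanly.
consistent = (search_path + module_name + ".py").encode("cp1252")
print(consistent.decode("cp1252"))
```

The second half shows why encoding both parts with the same encoding avoids the problem for names that the code page can represent.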

To fix this problem, there are two solutions:

 a) encode the module name to the filesystem encoding
 b) manipulate paths as unicode strings; to access the filesystem: use the wide character (unicode) API of Windows and encode paths to the filesystem encoding on UNIX/BSD

It is easier to implement (a) than (b), but (a) only gives you the support of paths and module names encodable to the ANSI code page.

(b) gives you full Unicode support because it never *encodes* paths to the filesystem encoding, although it may *decode* paths from the filesystem encoding. Encoding a path raises a UnicodeEncodeError on the first character not encodable to the ANSI code page, whereas decoding a path never fails (unless the user manually changed the code page to a rare ANSI code page like UTF-8).

I implemented (b) in my import_unicode SVN branch, but as I wrote, I still have some work to merge this branch into py3k, and anyway I will wait for Python 3.3.
msg125822 - (view) Author: ingemar (ingemar) Date: 2011-01-09 04:47
Thanks Victor for the explanation.

Py3 is still far better than Py2, letting me use utf-8 as much as it does.

I will be able to live with this bug being known. I can understand though, that people in some places of the world may feel more concerned.
msg131577 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-21 00:03
I closed #3080: Python 3.3 is now able to handle non-ASCII characters in module names and paths. But it is only able to handle non-ASCII characters encodable to the ANSI code page. To support all characters, I opened the issue #11619 (see also #10785).
msg131588 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-03-21 00:56
As Victor noted, this issue is essentially a duplicate of #3080 (and others) and now #11619 and needs no independent action apart from the latter. Since the discussion with ingemar seems finished, I am now closing.
History
Date User Action Args
2011-03-21 00:56:42 terry.reedy set status: open -> closed
superseder: On Windows, don't encode filenames in the import machinery
resolution: duplicate
messages: + msg131588
2011-03-21 00:03:37 vstinner set messages: + msg131577
2011-01-19 13:09:49 vstinner set title: Cannot use nonascii utf8 in names of files imported from -> Python 3 doesn't support non-ASCII module names with a locale encoding different than UTF-8
2011-01-09 04:47:13 ingemar set messages: + msg125822
2011-01-09 02:40:20 vstinner set messages: + msg125819
2011-01-08 19:34:15 ingemar set messages: + msg125795
2011-01-08 16:16:25 vstinner set messages: + msg125787
2011-01-08 16:13:16 vstinner set messages: + msg125786
2011-01-08 06:37:31 ingemar set messages: + msg125754
2011-01-08 06:34:28 terry.reedy set messages: + msg125753
versions: + Python 3.2
2011-01-08 03:04:30 vstinner set messages: + msg125745
2011-01-08 01:14:21 terry.reedy set nosy: + terry.reedy
messages: + msg125739
2011-01-05 04:26:39 ingemar set nosy: vstinner, r.david.murray, ingemar
messages: + msg125408
2011-01-04 22:59:45 vstinner set nosy: vstinner, r.david.murray, ingemar
messages: + msg125381
2011-01-04 21:44:10 r.david.murray set nosy: + r.david.murray, vstinner
type: behavior
messages: + msg125366
2011-01-04 19:44:14 ingemar create