Issue 10828: Python 3 doesn't support non-ASCII module names with a locale encoding different than UTF-8

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/55037

classification

Title: Python 3 doesn't support non-ASCII module names with a locale encoding different than UTF-8
Type: behavior
Stage:
Components:
Versions: Python 3.1, Python 3.2

process

Status: closed
Resolution: duplicate
Dependencies:
Superseder: On Windows, don't encode filenames in the import machinery (issue 11619)
Assigned To:
Nosy List: ingemar, r.david.murray, terry.reedy, vstinner
Priority: normal
Keywords:

Created on 2011-01-04 19:44 by ingemar, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (15)
msg125360 - (view)	Author: ingemar (ingemar)	Date: 2011-01-04 19:44
I have a set of programs written for Python 3.1 and running well on Kubuntu. The source files are located on a Samba server on a Kubuntu box. Several of the programs contain Python/PyQt code to start other programs in the set (QtCore.QProcess().startDetached(kommando)). I have had no problems using non-ASCII filenames in the Linux environment.

When I tried to check the programs in an MS Windows environment (Win2K with Python 3.1.2 in a VirtualBox on a Kubuntu box), Python complained: ImportError: module xxx not found.

The ugly solution has been to refrain from using non-ASCII characters in the names of files imported from. This involved the filename of the imported file and also one changed line of code in the importing file. Example:

1) rename "gui_jämföra.py" ---> "gui_jamfora.py"
2) in the importing file "jämföra.py" change one line: "from gui_jämföra import *" ---> "from gui_jamfora import gui_Jämföra"

Is there a beautiful solution that will permit me to use non-ASCII UTF-8 also in the file names of files imported from?
msg125366 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2011-01-04 21:44
Have you tried 3.2b2?
msg125381 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-04 22:59
I think that this issue is a duplicate of #8611 (and #9425), it should be fixed in Python 3.2.
msg125408 - (view)	Author: ingemar (ingemar)	Date: 2011-01-05 04:26
Have I tried 3.2b2? No. I will have to wait for 3.2, or more exactly for a Windows installer for PyQt for 3.2 to become available. Compiling that on Windows is beyond my resources and experience. I will make a point to tell you then.
msg125739 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2011-01-08 01:14
(Ingemar: one can easily test import statements without pyqt, let alone qt ;-)

With 3.2b2 on our Win7, 64 bit machine, files with a Japanese name run but apparently cannot be imported.

    a.py: print('something')
    ^\|.py: print('other')  # ^\| == imitation of katakana name
    c.py: import a; import ^\|

    something
    ImportError: No module named ^\|

Tried in both japanese- and then ascii-named directories. So I am not convinced that #9425 is finished. What might I have misunderstood?
msg125745 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-08 03:04
> With 3.2b2 on our Win7, 64 bit machine, files with a Japanese name...

What is your ANSI code page? If it is not a Japanese code page, this is issue #3080.

On Windows, #8611 (and #9425) permit to use non-ASCII characters in the module path... but only characters encodable to your ANSI code page.
msg125753 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2011-01-08 06:34
ANSI code page? I have no idea how to find out and many would not even know that such a thing exists. It is an HP laptop sold in the US. I think bugs in core syntax should have high priority. I appreciate your work toward fixing it.
msg125754 - (view)	Author: ingemar (ingemar)	Date: 2011-01-08 06:37
Terry: Thanks for the hint. In a pure ASCII path I created files very similar to yours, with Swedish "ä" instead of your katakana character. I also got the same result.

    a.py:
        print ('something')
    ä.py:
        print ('other')
    c.py:
        # -- coding: utf-8 --
        import a
        import ä

I ran the files with 3.2b2:

    c:\Python32\python.exe a.py
    something
    c:\Python32\python.exe ä.py
    other
    c:\Python32\python.exe c.py
    something
    Traceback (most recent call last):
      File "c.py", line 3, in <module>
        import ä
    ImportError: No module named ä

Victor: How do I determine what code page my old w2k is using? Would that be 8859-1 or some older variant for western Europe or Sweden?
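
The reproduction above can also be scripted so that it needs no hand-made files. The following is a minimal sketch, not something posted in this thread: the helper name try_import is hypothetical, and it assumes the temporary directory lives on a filesystem that can store the name "ä.py". On Python 3.2 under Windows the second import is the one reported to fail here; on current Python versions both are expected to succeed.

```python
import importlib
import os
import sys
import tempfile


def try_import(module_name, source):
    """Write source to <tmp>/<module_name>.py and report whether it imports.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        sys.path.insert(0, tmp)
        try:
            path = os.path.join(tmp, module_name + ".py")
            with open(path, "w", encoding="utf-8") as f:
                f.write(source)
            # Make sure the import system notices the freshly written file.
            importlib.invalidate_caches()
            importlib.import_module(module_name)
            return "ok"
        except (ImportError, UnicodeError, OSError) as exc:
            return "{}: {}".format(type(exc).__name__, ascii(str(exc)))
        finally:
            # Leave the interpreter state as it was found.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)


# Same module names and bodies as a.py and ä.py in the message above.
for name, source in (("a", "print('something')\n"), ("ä", "print('other')\n")):
    print(ascii(name), "->", try_import(name, source))
```

The result is printed through ascii() so that the script itself does not depend on the console encoding.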
msg125786 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-08 16:13
> Victor: How do I determine what code page my old w2k is using?

    python.exe -c "import locale; print('ANSI code page: {}'.format(locale.getpreferredencoding()))"

> On Windows, #8611 (and #9425) permit to use non-ASCII characters
> in the module path... but only characters encodable to your
> ANSI code page.

If you would like to check if your path is encodable to your ANSI code page, try:

    python.exe -c "import os; fn=os.fsencode('ä'); print(ascii(fn))"

If fsencode() raises an error, the filename is not encodable to your ANSI code page and you have to wait until #3080 is fixed :-)
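
The same encodability check can be tried against a named code page rather than the one the current machine happens to use. This is a rough sketch under assumptions: the helper name encodable is hypothetical, and it uses str.encode() with explicit codec names (cp1252 and cp932 are chosen only as examples of a Western European and a Japanese code page) instead of os.fsencode(), whose result depends on the platform and Python version.

```python
def encodable(name, encoding):
    """Return True if every character of name can be encoded to encoding.

    Hypothetical helper: a strict encode either succeeds or raises
    UnicodeEncodeError on the first character the code page lacks.
    """
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# An ASCII name, the Swedish name from this issue, and a katakana letter.
for name in ("gui_jamfora", "gui_jämföra", "\u30a2"):
    print(ascii(name), encodable(name, "cp1252"), encodable(name, "cp932"))
```

A name that is not encodable to the code page in use is the case that issue #3080 refers to.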
msg125787 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-08 16:16
> I think bugs in core syntax should have high priority.

It took me 7 months to implement the first part (#8611 and #9425). I plan to do the second part (#3080) in Python 3.3 (it's too late for Python 3.2, final is planned for February 5, 2011). I already have a huge patch somewhere (in an SVN branch, import_unicode), but I have to update the patch and split it into small and simple patches.
msg125795 - (view)	Author: ingemar (ingemar)	Date: 2011-01-08 19:34
    python.exe -c "import locale; print('ANSI code page: {}'.format(locale.getpreferredencoding()))"
    ANSI code page: cp1252

    python.exe -c "import os; fn=os.fsencode('ä'); print(ascii(fn))"
    b'\xe4'

and no error raised
msg125819 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-01-09 02:40
> ANSI code page: cp1252 ...os.fsencode('ä') => b'\xe4'

Hum, I ran your example with a debugger, and ok, I now remember the whole thing. I fixed Python to support non-ASCII characters (... only non-ASCII characters encodable to the ANSI code page for Windows) in the search path, not in the module name. The import machinery encodes each search path to the filesystem encoding, but it encodes the module name to UTF-8. Concatenating two byte strings encoded to different encodings doesn't work (it leads to mojibake).

To fix this problem, there are two solutions:

a) encode the module name to the filesystem encoding
b) manipulate paths as unicode strings; to access the filesystem: use the wide character (unicode) API of Windows and encode paths to the filesystem encoding on UNIX/BSD

It is easier to implement (a) than (b), but (a) only gives you the support of paths and module names encodable to the ANSI code page. (b) gives you the full unicode support because it never encodes paths to the filesystem encoding, but it may decode paths from the filesystem encoding. Encoding a path raises a UnicodeEncodeError on the first character not encodable to the ANSI code page, whereas decoding a path never fails (except if the user manually changed its code page to a rare ANSI code page like UTF-8).

I implemented (b) in my import_unicode SVN branch, but as I wrote, I still have some work to merge this branch into py3k, and anyway I will wait for Python 3.3.
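
The encoding mismatch described above can be sketched in a few lines. This is only an illustration, not the import machinery's actual code: the directory name, the helper functions build_path_mixed and build_path_consistent, and the hard-coded cp1252 encoding are assumptions chosen to mirror the values reported in this thread.

```python
# Assumed for illustration: the ANSI code page reported earlier in the thread.
FS_ENCODING = "cp1252"

search_path = "C:\\projekt\\jämföra"  # hypothetical directory name
module_name = "gui_jämföra"


def build_path_mixed(directory, name):
    """Mimic the bug: directory in the filesystem encoding, name in UTF-8."""
    return directory.encode(FS_ENCODING) + b"\\" + name.encode("utf-8") + b".py"


def build_path_consistent(directory, name):
    """Sketch of solution (a): encode both parts to the filesystem encoding."""
    return (directory + "\\" + name + ".py").encode(FS_ENCODING)


mixed = build_path_mixed(search_path, module_name)
consistent = build_path_consistent(search_path, module_name)

# Decode both results the way the filesystem layer would interpret them.
print(ascii(mixed.decode(FS_ENCODING)))
print(ascii(consistent.decode(FS_ENCODING)))
```

In the first printed path each of ä and ö in the module name shows up as two escaped characters (its UTF-8 bytes reinterpreted as cp1252), so the lookup targets a file that does not exist; the second path is the intended one. Solution (b) avoids the encode step on Windows altogether, which this sketch does not attempt to show.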
msg125822 - (view)	Author: ingemar (ingemar)	Date: 2011-01-09 04:47
Thanks Victor for the explanation. Py3 is still far better than Py2, letting me use utf-8 as much as it does. I will be able to live with this bug being known. I can understand though, that people in some places of the world may feel more concerned.
msg131577 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-03-21 00:03
I closed #3080: Python 3.3 is now able to handle non-ASCII characters in module names and paths. But it is only able to handle non-ASCII characters encodable to the ANSI code page. To support all characters, I opened the issue #11619 (see also #10785).
msg131588 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2011-03-21 00:56
As Victor noted, this issue is essentially a duplicate of #3080 (and others) and now #11619 and needs no independent action apart from the latter. Since the discussion with ingemar seems finished, I am now closing.

History
Date	User	Action	Args
2022-04-11 14:57:10	admin	set	github: 55037
2011-03-21 00:56:42	terry.reedy	set	status: open -> closed superseder: On Windows, don't encode filenames in the import machinery resolution: duplicate messages: + msg131588
2011-03-21 00:03:37	vstinner	set	messages: + msg131577
2011-01-19 13:09:49	vstinner	set	title: Cannot use nonascii utf8 in names of files imported from -> Python 3 doesn't support non-ASCII module names with a locale encoding different than UTF-8
2011-01-09 04:47:13	ingemar	set	messages: + msg125822
2011-01-09 02:40:20	vstinner	set	messages: + msg125819
2011-01-08 19:34:15	ingemar	set	messages: + msg125795
2011-01-08 16:16:25	vstinner	set	messages: + msg125787
2011-01-08 16:13:16	vstinner	set	messages: + msg125786
2011-01-08 06:37:31	ingemar	set	messages: + msg125754
2011-01-08 06:34:28	terry.reedy	set	messages: + msg125753 versions: + Python 3.2
2011-01-08 03:04:30	vstinner	set	messages: + msg125745
2011-01-08 01:14:21	terry.reedy	set	nosy: + terry.reedy messages: + msg125739
2011-01-05 04:26:39	ingemar	set	nosy: vstinner, r.david.murray, ingemar messages: + msg125408
2011-01-04 22:59:45	vstinner	set	nosy: vstinner, r.david.murray, ingemar messages: + msg125381
2011-01-04 21:44:10	r.david.murray	set	nosy: + r.david.murray, vstinner type: behavior messages: + msg125366
2011-01-04 19:44:14	ingemar	create