msg125360 - (view) |
Author: ingemar (ingemar) |
Date: 2011-01-04 19:44 |
I have a set of programs written for Python3.1 and running well on Kubuntu. The source files are located on a Samba server on a Kubuntu box. Several of the programs contain Python/PyQt code to start other programs in the set ( QtCore.QProcess().startDetached(kommando) )
I have had no problems using non-ascii filenames in the Linux environment.
When I tried to check the programs in a MS Windows environment (Win2K with Python 3.1.2 in a VirtualBox in a Kubuntu box) then Python complained:
ImportError: module xxx not found..
The ugly solution has been to refrain from the use of non-ascii characters in the names of files imported from. This involved the filename of the imported file and also one line of code changed in the importing file.
Example:
1) rename "gui_jämföra.py" ---> "gui_jamfora.py"
2) in the importing file "jämföra.py" change one line:
"from gui_jämföra import * " ---> "from gui_jamfora import gui_Jämföra"
Is there a beautiful solution that will permit me to use non-ascii utf-8 also in the file names of files imported from?
|
msg125366 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-01-04 21:44 |
Have you tried 3.2b2?
|
msg125381 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-04 22:59 |
I think that this issue is a duplicate of #8611 (and #9425); it should be fixed in Python 3.2.
|
msg125408 - (view) |
Author: ingemar (ingemar) |
Date: 2011-01-05 04:26 |
Have I tried 3.2b2?
No. I will have to wait for 3.2, or more exactly for a Windows installer for PyQt for 3.2 to become available.
Compiling that on Windows is beyond my resources and experience.
I will make a point to tell you then.
|
msg125739 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2011-01-08 01:14 |
(Ingemar: one can easily test import statements without pyqt, let alone qt ;-)
With 3.2b2 on our Win7, 64 bit machine, files with a Japanese name run but apparently cannot be imported.
a.py: print('something')
^|.py: print('other') # ^| == imitation of katakana name
c.py: import a; import ^|
something
ImportError: No module named ^|
Tried in both japanese- and then ascii-named directories.
So I am not convinced that #9425 is finished. What might I have misunderstood?
|
msg125745 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-08 03:04 |
> With 3.2b2 on our Win7, 64 bit machine, files with a Japanese name...
What is your ANSI code page? If it is not a Japanese code page, this is issue #3080.
On Windows, #8611 (and #9425) permit the use of non-ASCII characters in the module path... but only characters encodable to your ANSI code page.
|
msg125753 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2011-01-08 06:34 |
ANSI code page? I have no idea how to find out, and many would not even know that such a thing exists. It is an HP laptop sold in the US.
I think bugs in core syntax should have high priority. I appreciate your work toward fixing it.
|
msg125754 - (view) |
Author: ingemar (ingemar) |
Date: 2011-01-08 06:37 |
Terry: Thanks for the hint
In a pure ascii path I created files very similar to yours with Swedish "ä" instead of your katakana character.
I also got the same result.
a.py:
print ('something')
ä.py:
print ('other')
c.py:
# -*- coding: utf-8 -*-
import a
import ä
I ran the files with 3.2b2:
c:\Python32\python.exe a.py
something
c:\Python32\python.exe ä.py
other
c:\Python32\python.exe c.py
something
Traceback (most recent call last):
  File "c.py", line 3, in <module>
    import ä
ImportError: No module named ä
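The reproduction above can also be scripted so that it needs no hand-made files. The following is only a sketch: the helper name `can_import` and the module names `repro_a` and `repro_ä` are made up for illustration, and the outcome for the non-ASCII name depends on the Python version and platform it runs on.

```python
import importlib
import os
import sys
import tempfile


def can_import(module_name):
    """Write <module_name>.py to a temporary directory and try to import it.

    Returns True if the import works, False if creating or importing the
    file fails (hypothetical helper, not part of any library).
    """
    with tempfile.TemporaryDirectory() as tmp:
        sys.path.insert(0, tmp)
        try:
            path = os.path.join(tmp, module_name + ".py")
            with open(path, "w", encoding="utf-8") as f:
                f.write("value = 'imported'\n")
            # Make sure the import system notices the freshly created file.
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            return module.value == "imported"
        except (ImportError, UnicodeError, OSError):
            return False
        finally:
            # Leave sys.path and sys.modules as we found them.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)


for name in ("repro_a", "repro_\u00e4"):
    print(ascii(name), can_import(name))
```

A `False` for the second name while the first gives `True` would correspond to the failure shown in the traceback above.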
Victor: How do I determine what code page my old w2k is using?
Would that be 8859-1 or some older variant for western Europe or Sweden?
|
msg125786 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-08 16:13 |
> Victor: How do I determine what code page my old w2k is using?
python.exe -c "import locale; print('ANSI code page: {}'.format(locale.getpreferredencoding()))"
> On Windows, #8611 (and #9425) permit to use non-ASCII characters
> in the module path... but only characters encodable to your
> ANSI code page.
If you would like to check if your path is encodable to your ANSI code page, try:
python.exe -c "import os; fn=os.fsencode('ä'); print(ascii(fn))"
If fsencode() raises an error, the filename is not encodable to your ANSI code page and you have to wait until #3080 is fixed :-)
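As a rough illustration of the same check, the sketch below tests names against explicitly named codecs instead of going through os.fsencode(), so the result does not depend on the machine it runs on. The helper name `encodable` and the sample names are assumptions made for this example only.

```python
def encodable(name, encoding):
    """Return True if every character of name can be encoded with encoding."""
    try:
        # The default "strict" error handler raises on the first
        # character that the codec cannot represent.
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


swedish = "gui_j\u00e4mf\u00f6ra"       # gui_jämföra
katakana = "\u30ab\u30bf\u30ab\u30ca"   # a katakana sample name

for name in (swedish, katakana):
    for codec in ("cp1252", "cp932", "utf-8"):
        print(ascii(name), codec, encodable(name, codec))
```

A name that is not encodable to the machine's ANSI code page is the case covered by #3080.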
|
msg125787 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-08 16:16 |
> I think bugs in core syntax should have high priority.
It took me 7 months to implement the first part (#8611 and #9425). I plan to do the second part (#3080) in Python 3.3 (it's too late for Python 3.2; the final release is planned for February 5, 2011). I already have a huge patch somewhere (in an SVN branch, import_unicode), but I have to update the patch and split it into small and simple patches.
|
msg125795 - (view) |
Author: ingemar (ingemar) |
Date: 2011-01-08 19:34 |
python.exe -c "import locale; print('ANSI code page: {}'.format(locale.getpreferredencoding()))"
ANSI code page: cp1252
python.exe -c "import os; fn=os.fsencode('ä'); print(ascii(fn))"
b'\xe4'
and no error raised
|
msg125819 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-09 02:40 |
> ANSI code page: cp1252 ...os.fsencode('ä') => b'\xe4'
Hum, I ran your example with a debugger, and ok, I now remember the whole thing.
I fixed Python to support non-ASCII characters (... only non-ASCII characters encodable to the ANSI code page for Windows) in the *search path*, not in the module name.
The import machinery encodes each search path to the filesystem encoding, but it encodes the module name to UTF-8. Concatenating two byte strings encoded with different encodings doesn't work (it leads to mojibake).
To fix this problem, there are two solutions:
a) encode the module name to the filesystem encoding
b) manipulate paths as unicode strings; to access the filesystem: use the wide character (unicode) API of Windows and encode paths to the filesystem encoding on UNIX/BSD
It is easier to implement (a) than (b), but (a) only gives you the support of paths and module names encodable to the ANSI code page.
(b) gives you full Unicode support because it never *encodes* paths to the filesystem encoding, although it may *decode* paths from the filesystem encoding. Encoding a path raises a UnicodeEncodeError on the first character not encodable to the ANSI code page, whereas decoding a path never fails (except if the user manually changed the code page to a rare ANSI code page like UTF-8).
I implemented (b) in my import_unicode SVN branch, but as I wrote, I still have some work to merge this branch into py3k, and anyway I will wait for Python 3.3.
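The mojibake described above can be reproduced in a few lines. This is only a sketch of the effect, not the actual import code: it assumes cp1252 as the filesystem (ANSI) encoding, as reported earlier in this thread, and the directory name is hypothetical.

```python
# Assumption: cp1252 stands in for the filesystem (ANSI) encoding.
FS_ENCODING = "cp1252"
search_path = "C:\\k\u00e4lla\\"          # hypothetical directory C:\källa\
module_name = "gui_j\u00e4mf\u00f6ra"     # gui_jämföra

# The problem: search path in the filesystem encoding, module name in UTF-8.
mixed = search_path.encode(FS_ENCODING) + module_name.encode("utf-8") + b".py"

# Solution (a): encode both parts to the filesystem encoding.
consistent = (search_path + module_name + ".py").encode(FS_ENCODING)

# How each byte string reads when interpreted in the ANSI code page.
seen_mixed = mixed.decode(FS_ENCODING)
seen_consistent = consistent.decode(FS_ENCODING)

print(seen_mixed)        # C:\källa\gui_jÃ¤mfÃ¶ra.py
print(seen_consistent)   # C:\källa\gui_jämföra.py
```

The first result names a file that does not exist, which is consistent with the ImportError even though fsencode('ä') succeeds. Solution (b) avoids the byte-level concatenation entirely by keeping paths as unicode strings.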
|
msg125822 - (view) |
Author: ingemar (ingemar) |
Date: 2011-01-09 04:47 |
Thanks Victor for the explanation.
Py3 is still far better than Py2, letting me use utf-8 as much as it does.
I will be able to live with this bug being known. I can understand though, that people in some places of the world may feel more concerned.
|
msg131577 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-03-21 00:03 |
I closed #3080: Python 3.3 is now able to handle non-ASCII characters in module names and paths. But it is only able to handle non-ASCII characters encodable to the ANSI code page. To support all characters, I opened the issue #11619 (see also #10785).
|
msg131588 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2011-03-21 00:56 |
As Victor noted, this issue is essentially a duplicate of #3080 (and others) and now #11619 and needs no independent action apart from the latter. Since the discussion with ingemar seems finished, I am now closing.
|
Date | User | Action | Args |
2022-04-11 14:57:10 | admin | set | github: 55037 |
2011-03-21 00:56:42 | terry.reedy | set | status: open -> closed; superseder: On Windows, don't encode filenames in the import machinery; resolution: duplicate; messages: + msg131588 |
2011-03-21 00:03:37 | vstinner | set | messages:
+ msg131577 |
2011-01-19 13:09:49 | vstinner | set | title: Cannot use nonascii utf8 in names of files imported from -> Python 3 doesn't support non-ASCII module names with a locale encoding different than UTF-8 |
2011-01-09 04:47:13 | ingemar | set | messages:
+ msg125822 |
2011-01-09 02:40:20 | vstinner | set | messages:
+ msg125819 |
2011-01-08 19:34:15 | ingemar | set | messages:
+ msg125795 |
2011-01-08 16:16:25 | vstinner | set | messages:
+ msg125787 |
2011-01-08 16:13:16 | vstinner | set | messages:
+ msg125786 |
2011-01-08 06:37:31 | ingemar | set | messages:
+ msg125754 |
2011-01-08 06:34:28 | terry.reedy | set | messages:
+ msg125753 versions:
+ Python 3.2 |
2011-01-08 03:04:30 | vstinner | set | messages:
+ msg125745 |
2011-01-08 01:14:21 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg125739
|
2011-01-05 04:26:39 | ingemar | set | nosy:
vstinner, r.david.murray, ingemar messages:
+ msg125408 |
2011-01-04 22:59:45 | vstinner | set | nosy:
vstinner, r.david.murray, ingemar messages:
+ msg125381 |
2011-01-04 21:44:10 | r.david.murray | set | nosy:
+ r.david.murray, vstinner type: behavior messages:
+ msg125366
|
2011-01-04 19:44:14 | ingemar | create | |