msg62458 - (view) |
Author: Giovanni Bajo (giovannibajo) |
Date: 2008-02-16 16:27 |
Under Windows, sys.argv is created through the Windows ANSI API.
When you have a file/directory which can't be represented in the
system encoding (e.g. a Japanese-named file or directory on a Western
Windows), Windows will encode the filename to the system encoding using
what we call the "replace" policy, and thus sys.argv[] will contain an
entry like "c:\\foo\\??????????????.dat".
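The lossy conversion described above can be imitated in Python. This is only a sketch: it assumes cp1252 stands in for the system code page, the file name is made up, and the real conversion happens inside Windows rather than in Python's codec.

```python
# Illustration only: imitate the lossy ANSI conversion with Python's codecs.
# Assumptions: cp1252 stands in for the system code page; the file name is made up.
original = "c:\\foo\\日本語.dat"

# errors="replace" substitutes "?" for every character the code page cannot represent.
ansi_bytes = original.encode("cp1252", errors="replace")
seen_by_script = ansi_bytes.decode("cp1252")

print(seen_by_script)
```

Once the name has been turned into question marks, the script has no way to recover the original characters, so it cannot open the file.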
My suggestion is that:
* At the Python level, we still expose a single sys.argv[], which will
contain unicode strings. I think this exactly matches what Py3k does now.
* At the C level, I believe it involves using GetCommandLineW() and
CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be
changed to also accept wchar_t** arguments? Or is it better to allow for
NULL to be passed (under Windows at least), so that the Windows
code-path in there can use GetCommandLineW()/CommandLineToArgvW() to get
the current process' arguments?
|
msg62460 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-02-16 16:54 |
The issue is related to #1342
Since we have dropped support for older versions of Windows (9x, ME,
NT4), I would like to get the Python interface to argv, env and files fixed.
|
msg62499 - (view) |
Author: Giovanni Bajo (giovannibajo) |
Date: 2008-02-17 18:57 |
I'm attaching a simple patch that seems to work under Py3k. The trick is
that Py3k already attempts (not sure how or why) to decode argv using
utf-8. So it's sufficient to set up argv as UTF-8-encoded strings.
Notice that this brings the output of "python ààààà" from this:
Fatal Python error: no mem for sys.argv
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
invalid data
to this:
TypeError: zipimporter() argument 1 must be string without null bytes,
not str
which is expected since zipimporter_init() doesn't even know to ignore
unicode strings (let alone handle them correctly...).
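To illustrate why handing over UTF-8 bytes gets past the first error, here is a small sketch. It assumes the failing argv entry held ANSI (cp1252) bytes, which is consistent with the UnicodeDecodeError above but not confirmed; try_utf8 is a hypothetical helper, not part of the patch.

```python
# Assumption: cp1252 stands in for the ANSI code page used to build argv.
arg = "ààààà"

ansi_bytes = arg.encode("cp1252")  # what an ANSI-API argv entry would hold
utf8_bytes = arg.encode("utf-8")   # what UTF-8-encoded argv setup would hand over


def try_utf8(raw):
    """Return the decoded string, or None when the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(try_utf8(ansi_bytes))
print(try_utf8(utf8_bytes))
```

The ANSI bytes are rejected because 0xE0 starts a multi-byte UTF-8 sequence that is never completed, while the UTF-8 bytes round-trip cleanly.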
|
msg62659 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-02-21 20:50 |
I dislike the double decoding, and would prefer if sys.argv would be
created directly from the wide command line.
In addition, I think the patch is incorrect: it ignores the arguments to
Py_Main, which is a documented API function.
One solution might be to declare all these functions (Py_Main,
SetProgramName, GetArgcArgv) to operate on Py_UNICODE*, and then
convert the POSIX callers of Py_Main to use mbstowcs when going
from the command line to Py_Main. WinMain could then become
recompiled for Unicode directly, likewise Modules/python.c
|
msg62660 - (view) |
Author: Giovanni Bajo (giovannibajo) |
Date: 2008-02-21 21:33 |
mbstowcs uses LC_CTYPE. Is that correct and consistent with the way
default encoding under UNIX is handled by Py3k?
Would a Py_MainW or similar wrapper be easier on the UNIX guys? I'm just
asking, I don't have a definite idea.
|
msg62664 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-02-21 22:01 |
> mbstowcs uses LC_CTYPE. Is that correct and consistent with the way
> default encoding under UNIX is handled by Py3k?
It's correct, but it's not consistent with the default encoding - there
isn't really any default encoding in Py3k. More specifically,
PyUnicode_FromString uses UTF-8, but not as a (changeable) default,
but as part of its API specification.
Command line arguments are in the locale's charset, so the LC_CTYPE
must be used to convert them.
> Would a Py_MainW or similar wrapper be easier on the UNIX guys? I'm just
> asking, I don't have a definite idea.
See above. The current POSIX implementation is incorrect also. It should
use the locale's encoding, but doesn't.
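As a rough Python analogue of converting arguments with the locale's charset (the C code would call mbstowcs; this sketch does not), the following assumes a hypothetical helper decode_args and uses locale.getpreferredencoding(False) to find the encoding when none is given.

```python
import locale


def decode_args(raw_args, encoding=None):
    """Decode byte-string arguments with the locale's charset (hypothetical helper)."""
    if encoding is None:
        # Ask for the locale's preferred encoding without changing the locale.
        encoding = locale.getpreferredencoding(False)
    return [raw.decode(encoding) for raw in raw_args]


# Explicit encodings are passed here so the example does not depend on the
# machine's locale: the same file name arrives as different bytes.
print(decode_args([b"caf\xe9.txt"], "iso-8859-1"))
print(decode_args(["café.txt".encode("utf-8")], "utf-8"))
```

The same bytes decode to different text (or fail) under a different charset, which is why a fixed UTF-8 assumption is wrong for command-line arguments on POSIX.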
|
msg63443 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-03-10 14:40 |
Here is a patch that redoes the entire argv handling, in terms of
wchar_t. As a side effect, it also changes the sys.path handling to use
wchar_t.
|
msg65005 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-04-05 20:42 |
This is now fixed in r62178 for Py3k. For 2.6, I don't think fixing it
is feasible.
|
msg65045 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-04-06 16:50 |
MvL's recent commit creates compiler warnings for Unicode UCS4 for the
same reason as #2388.
|
msg65061 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-04-07 03:27 |
What warnings precisely are you seeing? I didn't see anything in the 3k
branch (not even for #2388, as PyErr_Format doesn't have the GCC format
attribute in 3k, unlike 2.x).
|
msg65073 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-04-07 11:54 |
Martin, you are right that they are not for the same reason as that issue.
gcc -c -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk/
-fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
-I. -IInclude -I./Include -DPy_BUILD_CORE -o Modules/main.o Modules/main.c
Modules/main.c: In function 'Py_Main':
Modules/main.c:478: warning: passing argument 1 of 'Py_SetProgramName'
from incompatible pointer type
Modules/main.c: In function 'Py_Main':
Modules/main.c:478: warning: passing argument 1 of 'Py_SetProgramName'
from incompatible pointer type
|
msg125827 - (view) |
Author: David-Sarah Hopwood (davidsarah) |
Date: 2011-01-09 07:36 |
The following code is being used to work around this issue for Python 2.x in Tahoe-LAFS:
# This works around <http://bugs.python.org/issue2128>.
GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int)) \
                        (("CommandLineToArgvW", windll.shell32))

argc = c_int(0)
argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

argv = [argv_unicode[i].encode('utf-8') for i in range(0, argc.value)]

if not hasattr(sys, 'frozen'):
    # If this is an executable produced by py2exe or bbfreeze, then it will
    # have been invoked directly. Otherwise, argv_unicode[0] is the Python
    # interpreter, so skip that.
    argv = argv[1:]

    # Also skip option arguments to the Python interpreter.
    while len(argv) > 0:
        arg = argv[0]
        if not arg.startswith("-") or arg == "-":
            break
        argv = argv[1:]
        if arg == '-m':
            # sys.argv[0] should really be the absolute path of the module source,
            # but never mind
            break
        if arg == '-c':
            argv[0] = '-c'
            break
|
msg125829 - (view) |
Author: David-Sarah Hopwood (davidsarah) |
Date: 2011-01-09 07:39 |
Sorry, missed out the imports:
from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
from ctypes.wintypes import LPWSTR, LPCWSTR
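The option-skipping part of the workaround can be tried without Windows by factoring it into a function that takes the full command line as a list. This is a sketch only: strip_interpreter_args and its frozen parameter are hypothetical names, it is written for Python 3 strings rather than the UTF-8 byte strings used above, and it mirrors the loop as posted rather than improving on it.

```python
def strip_interpreter_args(argv, frozen=False):
    """Approximate sys.argv from a full command line (hypothetical helper)."""
    argv = list(argv)
    if frozen:
        # A frozen executable is invoked directly, so nothing needs skipping.
        return argv

    argv = argv[1:]  # drop the interpreter itself

    # Skip option arguments to the interpreter, as in the loop above.
    while argv:
        arg = argv[0]
        if not arg.startswith("-") or arg == "-":
            break
        argv = argv[1:]
        if arg == "-m":
            break
        if arg == "-c":
            argv[0] = "-c"
            break
    return argv


print(strip_interpreter_args(["python.exe", "-u", "script.py", "日本語.dat"]))
print(strip_interpreter_args(["python.exe", "-c", "print(1)", "x"]))
```

Like the original loop, this treats every interpreter option as a single token, so an option whose value is passed as a separate argument (for example "-W ignore") would be misread as the script name.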
|
msg179892 - (view) |
Author: Michael Herrmann (mherrmann.at) |
Date: 2013-01-13 20:23 |
Hi,
is it correct that this bug no longer appears in Python 2.7.3? I checked the changelogs of 2.7, but couldn't find anything.
Thanks!
Michael
|
msg179928 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-01-14 09:15 |
> is it correct that this bug no longer appears in Python 2.7.3?
Martin wrote that it cannot be fixed in Python 2: "For 2.6, I don't think fixing it is feasible."
The "fix" is to upgrade your application to Python 3.
|
Date | User | Action | Args
2022-04-11 14:56:30 | admin | set | github: 46381
2013-01-14 09:15:31 | vstinner | set | messages: + msg179928
2013-01-13 20:23:17 | mherrmann.at | set | nosy: + mherrmann.at; messages: + msg179892
2011-01-14 22:18:04 | vstinner | set | nosy: + vstinner
2011-01-09 07:39:42 | davidsarah | set | nosy: loewis, christian.heimes, giovannibajo, benjamin.peterson, davidsarah; messages: + msg125829
2011-01-09 07:36:51 | davidsarah | set | nosy: + davidsarah; messages: + msg125827; versions: + Python 2.6, Python 2.5, Python 2.7
2008-04-07 11:54:38 | benjamin.peterson | set | messages: + msg65073
2008-04-07 03:27:27 | loewis | set | messages: + msg65061
2008-04-06 16:50:36 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg65045
2008-04-05 20:42:42 | loewis | set | status: open -> closed; messages: + msg65005; resolution: fixed; versions: - Python 2.6
2008-03-10 14:40:50 | loewis | set | files: + wchar.diff; keywords: + patch; messages: + msg63443
2008-02-21 22:01:33 | loewis | set | messages: + msg62664
2008-02-21 21:33:17 | giovannibajo | set | messages: + msg62660
2008-02-21 20:50:58 | loewis | set | nosy: + loewis; messages: + msg62659
2008-02-17 18:58:00 | giovannibajo | set | files: + argv_unicode.patch; messages: + msg62499
2008-02-16 16:54:06 | christian.heimes | set | priority: high; nosy: + christian.heimes; messages: + msg62460; components: + Windows; versions: + Python 2.6
2008-02-16 16:27:45 | giovannibajo | create |