sys.argv is wrong for unicode strings #46381
Comments
Under Windows, sys.argv is created through the Windows ANSI API. When you have a file/directory which can't be represented in the My suggestion is that:
|
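To make the failure mode concrete, here is a minimal sketch of what happens when an argument has to pass through a legacy 8-bit code page. It uses Python's own `cp1252` codec as a stand-in for the Windows ANSI conversion, which is an assumption: the real Windows conversion depends on the active code page and may apply best-fit mappings rather than `?`. The helper name `ansi_round_trip` is hypothetical.

```python
def ansi_round_trip(arg: str, code_page: str = "cp1252") -> str:
    """Encode arg to a legacy code page and decode it back.

    Characters the code page cannot represent are replaced with '?',
    which approximates the information loss described above.
    """
    return arg.encode(code_page, errors="replace").decode(code_page)


# Representable in cp1252, so it survives unchanged.
print(ansi_round_trip("ààààà"))

# Not representable in cp1252, so the original name cannot be recovered.
print(ansi_round_trip("日本語.txt"))
```

Once the characters have been replaced, the program has no way to open the file the user actually named, which is why the argument has to be obtained as wide characters in the first place.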
The issue is related to bpo-1342 Since we have dropped support for older versions of Windows (9x, ME, |
I'm attaching a simple patch that seems to work under Py3k. The trick is Notice that brings the output of "python ààààà" from this: Fatal Python error: no mem for sys.argv to this: TypeError: zipimporter() argument 1 must be string without null bytes, which is expected since zipimporter_init() doesn't even know to ignore |
I dislike the double decoding, and would prefer if sys.argv would be In addition, I think the patch is incorrect: it ignores the arguments to One solution might be to declare all these functions (Py_Main, |
mbstowcs uses LC_CTYPE. Is that correct and consistent with the way Would a Py_MainW or similar wrapper be easier on the UNIX guys? I'm just |
It's correct, but it's not consistent with the default encoding - there
See above. The current POSIX implementation is incorrect also. It should |
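For the POSIX side of this discussion, the sketch below illustrates the `surrogateescape` error handler (PEP 383) that later Python 3 releases use when decoding command-line bytes, so that undecodable bytes survive a round trip. It hard-codes UTF-8 for illustration, which is an assumption: the interpreter uses the locale or filesystem encoding. The names `decode_arg` and `encode_arg` are hypothetical.

```python
def decode_arg(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode argument bytes, mapping undecodable bytes to lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_arg(text: str, encoding: str = "utf-8") -> bytes:
    """Reverse decode_arg, turning escaped surrogates back into the original bytes."""
    return text.encode(encoding, errors="surrogateescape")


raw = b"caf\xe9.txt"        # Latin-1 bytes, not valid UTF-8
text = decode_arg(raw)      # the 0xE9 byte becomes U+DCE9
assert encode_arg(text) == raw
```

The decoded string is not printable as-is, but it can be passed back to the operating system without losing the original bytes.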
Here is a patch that redoes the entire argv handling, in terms of |
This is now fixed in r62178 for Py3k. For 2.6, I don't think fixing it |
MvL's recent commit creates compiler warnings for Unicode UCS4 for the |
What warnings precisely are you seeing? I didn't see anything in the 3k |
Martin, you are right that they do not have the same cause as that issue. gcc -c -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk/ |
The following code is being used to work around this issue for Python 2.x in Tahoe-LAFS:

# This works around <http://bugs.python.org/issue2128>.
GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int)) \
                        (("CommandLineToArgvW", windll.shell32))

argc = c_int(0)
argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

argv = [argv_unicode[i].encode('utf-8') for i in range(0, argc.value)]

if not hasattr(sys, 'frozen'):
    # If this is an executable produced by py2exe or bbfreeze, then it will
    # have been invoked directly. Otherwise, unicode_argv[0] is the Python
    # interpreter, so skip that.
    argv = argv[1:]

    # Also skip option arguments to the Python interpreter.
    while len(argv) > 0:
        arg = argv[0]
        if not arg.startswith("-") or arg == "-":
            break
        argv = argv[1:]
        if arg == '-m':
            # sys.argv[0] should really be the absolute path of the module source,
            # but never mind
            break
        if arg == '-c':
            argv[0] = '-c'
            break
|
Sorry, missed out the imports:

from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
from ctypes.wintypes import LPWSTR, LPCWSTR
|
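The option-skipping part of the workaround above does not depend on Windows, so it can be pulled out and checked on any platform. The following is a sketch of that logic only, written for Python 3 and operating on `str`. The function name `strip_interpreter_args` and the `frozen` parameter are hypothetical and not part of the Tahoe-LAFS code.

```python
def strip_interpreter_args(argv, frozen=False):
    """Return the argument list a script would see as sys.argv.

    argv is the full process command line as a list of strings.
    frozen=True models a py2exe/bbfreeze executable, where argv[0]
    is already the program itself.
    """
    argv = list(argv)
    if frozen:
        return argv

    # argv[0] is the Python interpreter, so skip it.
    argv = argv[1:]

    # Skip option arguments to the interpreter.
    while argv:
        arg = argv[0]
        if not arg.startswith("-") or arg == "-":
            break
        argv = argv[1:]
        if arg == "-m":
            # The module name stands in for the script path.
            break
        if arg == "-c":
            # Mirror the original: the command string is replaced by '-c'.
            argv[0] = "-c"
            break
    return argv


print(strip_interpreter_args(["python.exe", "-u", "script.py", "ààààà"]))
```

Like the original loop, this sketch does not know which interpreter options take a separate value (for example `-W ignore`), so such a value would be mistaken for the script name.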
Hi, is it correct that this bug no longer appears in Python 2.7.3? I checked the changelogs of 2.7, but couldn't find anything. Thanks! |
Martin wrote that it cannot be fixed in Python 2: "For 2.6, I don't think fixing it is feasible." The "fix" is to upgrade your application to Python 3. |
The fix_encoding module within depot_tools was included back in the Python 2 [1] days as a be-all encoding fix boilerplate that is called across depot_tools scripts. However, now that depot_tools has officially deprecated support for Python 2 and supports >= 3.8 [2], the boilerplate is not needed anymore.

* `fix_win_codec()` [3]: the 'cp65001' codec issue this fixes is fixed in Python 3.3 [4].
* `fix_default_encoding()` [5]: Python 3 defaults to UTF-8.
* `fix_win_sys_argv()` [6]: the sys.argv Unicode issue is fixed in Python 3 [7].
* `fix_win_console()` [8]: fixed [9].

Benchmarking on Windows:
* Baseline (http://gpaste/6701096112750592): ~1min 41sec.

[1] https://codereview.chromium.org/6721029
[2] https://crrev.com/371aa997c04791d21e222ed43a1a0d55b450dd53/README.md
[3] https://source.chromium.org/chromium/chromium/tools/depot_tools/+/main:fix_encoding.py;l=123-132;drc=cfa826c9845122d445dce4f51f556381865dbed3
[4] python/cpython#57425 (comment)
[5] https://source.chromium.org/chromium/chromium/tools/depot_tools/+/main:fix_encoding.py;l=29-66;drc=cfa826c9845122d445dce4f51f556381865dbed3
[6] https://crsrc.org/d/fix_encoding.py;l=73-120;drc=cfa826c9845122d445dce4f51f556381865dbed3
[7] python/cpython#46381 (comment)
[8] https://source.chromium.org/chromium/chromium/tools/depot_tools/+/main:fix_encoding.py;l=315-344;drc=cfa826c9845122d445dce4f51f556381865dbed3
[9] python/cpython#45943 (comment)

Bug: 1501984
Change-Id: I1d512a4b1bfe14e680ac0aa08027849b999cc638
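Two of the claims in the commit message can be checked directly from a running interpreter. The snippet below is only a quick sanity check under Python 3, not the depot_tools implementation; the function name `py3_text_defaults` is hypothetical.

```python
import sys


def py3_text_defaults():
    """Report the Python 3 defaults that the removed shims used to patch."""
    return {
        # What fix_default_encoding() used to force under Python 2.
        "default_encoding": sys.getdefaultencoding(),
        # What fix_win_sys_argv() used to ensure: arguments are text, not bytes.
        "argv_is_text": all(isinstance(arg, str) for arg in sys.argv),
    }


print(py3_text_defaults())
```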