Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.) #53021
Comments
The file system encoding is hardcoded to UTF-8 on Mac OS X, whereas the locale encoding... depends on the locale. See issue bpo-4388 for the details. I think that we should use the locale encoding to encode and decode command line arguments. We have to create a new encoding variable used for the command line arguments:
This encoding should only be used on POSIX: the Windows native type is Unicode (wchar_t*). It should be used to decode sys.argv and to encode child process arguments (subprocess, os.exec*(), etc.). On Linux, it should not change anything because the file system encoding is the locale encoding. Said differently, Python 3 already uses the locale encoding for command line arguments on Linux. If you pass a filename on the command line and then open it, the filename is decoded with the locale encoding and then encoded with the file system encoding. I fear that it will fail if the two encodings are different...
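The failure mode described above can be illustrated with a minimal sketch. It does not reproduce CPython's actual startup code; the helper name `roundtrip` and the choice of ISO-8859-1 as the mismatched locale encoding are assumptions made purely for illustration.

```python
import locale
import sys


def roundtrip(raw: bytes, decode_enc: str, encode_enc: str) -> bytes:
    """Decode argv-style bytes with one encoding, then re-encode with another.

    Hypothetical helper: it mimics "decode with the locale encoding,
    encode with the file system encoding" using explicit codec names.
    """
    text = raw.decode(decode_enc, errors="surrogateescape")
    return text.encode(encode_enc, errors="surrogateescape")


# Bytes of a UTF-8 encoded file name as they would appear in argv.
raw_name = "é.txt".encode("utf-8")

# Both encodings agree: the bytes reaching the file system are unchanged.
same = roundtrip(raw_name, "utf-8", "utf-8")

# Encodings differ (assumed ISO-8859-1 locale, UTF-8 file system):
# the re-encoded name no longer matches the file on disk.
mixed = roundtrip(raw_name, "iso-8859-1", "utf-8")

# The two encodings the running interpreter actually uses.
print(sys.getfilesystemencoding(), locale.getpreferredencoding(False))
print(same == raw_name, mixed == raw_name)
```

When the two encodings match, the name survives the round trip; when they differ, `open()` would be handed a different byte sequence than the one the user typed.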
Fix the title: sys.argv is already decoded using the locale encoding on Unix; the problem is that Python uses a (possibly) different encoding to encode command line arguments: the file system encoding.
I disagree. IIUC, this is only about OSX. Now, we shouldn't take any I think the C locale is very poorly supported on OSX, and we shouldn't For file name arguments at least, it's very clear that the command line
@loewis: You restored the original (wrong) title "Use locale encoding to decode sys.argv, not the file system encoding", instead of the new (good) title "Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)". Was that intentional?
The attached patch is a draft adding a new encoding: the command line encoding. It is used to encode (subprocess) and decode (python) the command line arguments. It adds sys.getcmdlineencoding().
I'm still -1, failing to see the problem that is solved.
I know (and I agree), but I don't want to lose the patch :-)
This issue only seems to be relevant for OSX, and then only for OSX releases before 10.5, because in that release Apple made sure that the LANG variable and similar LC_* ones specify a UTF-8 encoding, and we're back at the common case where the filesystem encoding matches the locale encoding. A system where the filesystem encoding doesn't match the locale encoding is hard to get right. While it would be possible to add sys.cmdlineencoding, that doesn't actually solve the semantic problem because external tools might not cooperate. That is, most system tools seem to work with bytes internally and do not treat arguments as text encoded in the locale encoding that should be re-encoded in the filesystem encoding before passing them to the C APIs. For example, when calling "ls somefile" the "ls" command will pass the bytes in argv[1] to the POSIX routines for getting file information without trying to re-encode. In short, having a filesystem encoding that is different from the command-line encoding only works when all system tools cooperate and are Unicode-aware. To be honest, I'd say the behavior of OSX 10.4 is a bug and we might add a workaround on that platform that uses CFStringGetSystemEncoding() to fetch the actual system encoding when LANG=C. (And I'm -1 on adding the patch.) See also: bpo-9167
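As a rough illustration of the "tools work with bytes" point, the sketch below shows how Python on POSIX can carry an argument through `str` and back without altering its bytes, using `os.fsdecode()` and `os.fsencode()`. The sample byte string is an assumption chosen for illustration (a Latin-1 encoded name that is not valid UTF-8), and the round trip is only claimed for POSIX systems, where these functions use the `surrogateescape` error handler.

```python
import os

# Assumed example: argv-style bytes that are not valid UTF-8.
raw_arg = b"caf\xe9.txt"

# On POSIX, undecodable bytes are mapped to lone surrogates instead of
# raising, so the original bytes remain recoverable from the str.
as_text = os.fsdecode(raw_arg)

# Encoding back with the same file system encoding restores the bytes,
# which is what a bytes-oriented tool such as "ls" would expect to see.
back = os.fsencode(as_text)

print(back == raw_arg)
```

This only works because decoding and encoding use the same encoding; it does not help when the command-line and file system encodings genuinely differ, which is the objection raised above.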
It seems that everybody now agrees to close this issue as "won't fix".