classification
Title: Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)
Type:
Stage:
Components:
Versions: Python 3.2
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, loewis, piro, ronaldoussoren, vstinner
Priority: normal
Keywords: patch

Created on 2010-05-20 12:09 by vstinner, last changed 2010-07-24 11:26 by loewis. This issue is now closed.

Files
File name Uploaded Description
cmdline_encoding.patch vstinner, 2010-06-18 23:42
Messages (9)
msg106139 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-20 12:09
The file system is hardcoded to UTF-8 on Mac OS X, whereas the locale encoding... depends on the locale. See issue #4388 for the details.

I think that we should use the locale encoding to encode and decode command line arguments. We have to create a new encoding variable used for the command line arguments:
 * Py_CommandLineEncoding
 * sys.getcmdlineencoding()
 * (no sys.setcmdlineencoding() please!)
 * ...

This encoding should only be used on POSIX: the Windows native type is Unicode (wchar_t*). It should be used to decode sys.argv and to encode child process arguments (subprocess, os.exec*(), etc.).

On Linux, it should not change anything because the file system encoding is the locale encoding. Said differently, Python 3 already uses the locale encoding for command line arguments on Linux.

If you pass a filename on the command line and then open it, the filename is decoded with the locale encoding and then encoded with the file system encoding. I fear that it will fail if the two encodings differ...
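
A minimal sketch of the failure mode described above. Latin-1 stands in for a non-UTF-8 locale encoding and UTF-8 for the file system encoding; both choices, and the helper name `roundtrip`, are assumptions made for illustration and are not taken from this issue.

```python
import locale
import sys


def roundtrip(raw_name, cmdline_encoding, fs_encoding):
    """Decode argv bytes with one encoding, then re-encode with another."""
    text = raw_name.decode(cmdline_encoding)
    return text.encode(fs_encoding)


# Bytes of the file name "café" as a UTF-8 file system would store it.
on_disk = "caf\u00e9".encode("utf-8")

# Same encoding on both sides: the bytes survive unchanged.
same = roundtrip(on_disk, "utf-8", "utf-8")

# Different encodings: the re-encoded name no longer matches the file on disk.
mismatched = roundtrip(on_disk, "latin-1", "utf-8")

# The two encodings the running interpreter actually reports.
print("file system encoding:", sys.getfilesystemencoding())
print("locale encoding:", locale.getpreferredencoding(False))
print("same encodings preserve the name:", same == on_disk)
print("different encodings preserve the name:", mismatched == on_disk)
```

When the two reported encodings are equal, as is typically the case on Linux, the mismatch branch never arises in practice.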
msg106150 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-20 13:02
Fix the title: sys.argv is already decoded using the locale encoding on Unix. The problem is that Python uses a (possibly) different encoding, the file system encoding, to encode command line arguments.
msg106171 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-20 17:23
> I think that we should use the locale encoding to encode and decode command line arguments. 

I disagree. IIUC, this is only about OSX. Now, we shouldn't take any
action until either some OSX expert explains to us how command line
arguments are being passed on OSX, or we find some Apple documentation
that can be taken as a specification.

I think the C locale is very poorly supported on OSX, and we shouldn't
really use it for anything. What may be useful is the terminal encoding
(which may be different both from UTF-8 and the locale encoding),
however, it's not possible to find out what the terminal encoding is.
In addition, programs may be started "directly" (i.e. not from the
terminal), in which case the terminal encoding would be irrelevant.

For file name arguments at least, it's very clear that the command line
arguments also use the file system encoding.
msg106543 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-26 17:01
@loewis: You restored the original (wrong) title "Use locale encoding to decode sys.argv, not the file system encoding", instead of the new (good) title "Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)". Was that intended or not?
msg108151 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-18 23:42
The attached patch is a draft adding a new encoding: the command line encoding. It is used to encode (subprocess) and decode (python) the command line arguments. It adds sys.getcmdlineencoding().
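
The contents of cmdline_encoding.patch are not shown here, so the following is only a rough sketch of the idea, not the patch itself. `get_cmdline_encoding` and `encode_args` are hypothetical names; no `sys.getcmdlineencoding()` exists in released Python. The sketch assumes the command line encoding would simply be the locale's preferred encoding on POSIX.

```python
import locale
import sys


def get_cmdline_encoding():
    """Hypothetical stand-in for the proposed sys.getcmdlineencoding().

    Assumption: on POSIX this is the locale's preferred encoding. Windows
    passes arguments as wide strings, so the value returned there is only a
    placeholder.
    """
    if sys.platform == "win32":
        return sys.getfilesystemencoding()
    return locale.getpreferredencoding(False)


def encode_args(args, encoding=None):
    """Encode str arguments to bytes, leaving bytes arguments untouched."""
    encoding = encoding or get_cmdline_encoding()
    return [a if isinstance(a, bytes) else a.encode(encoding) for a in args]


# With an explicit Latin-1 "locale", "café" becomes four bytes.
print(encode_args(["ls", "caf\u00e9"], "latin-1"))
```

Arguments that are already bytes pass through unchanged, which mirrors how a caller can bypass any encoding choice entirely.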
msg108153 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-06-18 23:54
I'm still -1, failing to see the problem that is solved.
msg108154 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-18 23:55
> I'm still -1, failing to see the problem that is solved.

I know (and I agree), but I don't want to lose the patch :-)
msg111432 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-07-24 09:14
This issue only seems to be relevant for OSX, and then only for OSX releases before 10.5, because in that release Apple made sure that the LANG variable and similar LC_* ones specify a UTF-8 encoding and we're back at the common case where the filesystem encoding matches the locale encoding.

A system where the filesystem encoding doesn't match the locale encoding is hard to get right. While it would be possible to add sys.cmdlineencoding, that doesn't actually solve the semantic problem, because external tools might not cooperate.

That is, most system tools seem to work with bytes internally and do not treat arguments as text encoded in the locale encoding that should be re-encoded in the filesystem encoding before passing them to the C APIs.

That is, when calling "ls somefile" the "ls" command will pass the bytes in argv[1] to the POSIX routines for getting file information without trying to reencode.

In short, having a filesystem encoding that is different from the command-line encoding only works when all system tools cooperate and are Unicode-aware.

To be honest, I'd say the behavior of OSX 10.4 is a bug and we might add a workaround on that platform that uses CFStringGetSystemEncoding() to fetch the actual system encoding when LANG=C.

(And I'm -1 on adding the patch)

See also: issue9167
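
The point above about tools handing argv bytes straight to the POSIX routines can be sketched as follows. The three helpers are hypothetical illustrations, not code from any system tool. The `surrogateescape` error handler (PEP 383) is not discussed in this issue; it is included here only as an assumption about how a text-based program can still hand the original bytes back.

```python
def passthrough(argv_bytes):
    """What a bytes-oriented tool such as ls does: use argv as-is."""
    return argv_bytes


def reencode(argv_bytes, locale_encoding, fs_encoding):
    """What a tool would do if it treated argv as text in the locale
    encoding and re-encoded it for the file system."""
    return argv_bytes.decode(locale_encoding).encode(fs_encoding)


def lossless(argv_bytes, encoding="utf-8"):
    """Decode and re-encode with surrogateescape so that undecodable
    bytes survive the round trip unchanged."""
    text = argv_bytes.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


name = b"caf\xe9"  # Latin-1 bytes; not valid UTF-8

print(passthrough(name) == name)
print(reencode(name, "latin-1", "utf-8") == name)
print(lossless(name) == name)
```

Only the re-encoding variant changes the bytes, so it would look up a different file name than a bytes-oriented tool given the same argument.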
msg111456 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-07-24 11:26
It seems that everybody now agrees to close this issue as "won't fix".
History
Date User Action Args
2010-07-24 11:26:52 loewis set status: open -> closed
    resolution: wont fix
    messages: + msg111456
2010-07-24 09:14:40 ronaldoussoren set nosy: + ronaldoussoren
    messages: + msg111432
2010-07-07 02:02:28 piro set nosy: + piro
2010-06-18 23:55:55 vstinner set messages: + msg108154
2010-06-18 23:54:13 loewis set messages: + msg108153
2010-06-18 23:53:35 loewis set title: Use locale encoding to decode sys.argv, not the file system encoding -> Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)
2010-06-18 23:42:45 vstinner set files: + cmdline_encoding.patch
    keywords: + patch
    messages: + msg108151
2010-05-26 17:01:45 vstinner set messages: + msg106543
2010-05-20 17:23:04 loewis set nosy: + loewis
    title: Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.) -> Use locale encoding to decode sys.argv, not the file system encoding
    messages: + msg106171
2010-05-20 16:29:04 Arfrever set nosy: + Arfrever
2010-05-20 13:02:03 vstinner set messages: + msg106150
    title: Use locale encoding to decode sys.argv, not the file system encoding -> Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)
2010-05-20 12:09:24 vstinner create