Issue8514
Created on 2010-04-23 23:39 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
fsencode.patch | vstinner, 2010-05-06 23:13 |
Messages (36) | |||
---|---|---|---|
msg104063 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-23 23:39 | |
Python 3 uses unicode filenames on Windows and bytes filenames (while also supporting unicode filenames) on other operating systems. We have to support both types. On POSIX systems, bytes filenames can be stored as unicode filenames using sys.getfilesystemencoding() and the surrogateescape error handler (which stores undecodable bytes as unicode surrogates, see PEP 383). I would like to create fs_encode() and fs_decode() in os.path to ease the manipulation of filenames in both types (str and bytes). * Use fs_decode() to convert a filename from the OS native format to unicode * Use fs_encode() to convert a unicode filename to the OS native format On Windows, fs_decode() and fs_encode() don't touch the filename, but reject filenames of any type other than str (unicode) with a TypeError, in particular bytes filenames. Mac OS X rejects invalid UTF-8 filenames, so surrogateescape should maybe not be used on this OS. The attached patch implements this proposal. |
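The round trip described in this message can be sketched as follows. This is only an illustration, not the attached patch: the helper names `decode_filename` and `encode_filename` are hypothetical, and the encoding is passed explicitly (UTF-8 by default) instead of being read from sys.getfilesystemencoding(), so that the result does not depend on the locale.

```python
def decode_filename(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a native bytes filename, keeping undecodable bytes as lone surrogates (PEP 383)."""
    return raw.decode(encoding, "surrogateescape")


def encode_filename(name: str, encoding: str = "utf-8") -> bytes:
    """Encode a unicode filename back to bytes, turning escaped surrogates into the original bytes."""
    return name.encode(encoding, "surrogateescape")


raw = b"a\xff-"                  # 0xff is not valid UTF-8
name = decode_filename(raw)      # the invalid byte becomes the surrogate U+DCFF
restored = encode_filename(name) # the surrogate is turned back into 0xff
```

Because the undecodable byte is preserved as a surrogate, `restored` is byte-for-byte identical to `raw`, which is the property the proposal relies on.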
|||
msg104064 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-23 23:44 | |
Issue #8513 would benefit from these functions. |
|||
msg104068 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-04-24 08:33 | |
STINNER Victor wrote: > > New submission from STINNER Victor <victor.stinner@haypocalc.com>: > > Python3 uses unicode filenames in Windows and bytes filenames (but support also unicode filenames) on other OS. We have to support both types. On POSIX system, bytes filenames can be stored in unicode filenames using sys.getfilesystemencoding() and the surrogateescape error handler (to store undecodable bytes as unicode surrogates, see PEP 383). > > I would like to create fs_encode() and fs_decode() in os.path to ease the manipulation of filenames in the two bytes (str and bytes). > * Use fs_decode() to convert a filename from the OS native format to unicode > * Use fs_encode() to convert an unicode filename to the OS native format > > On Windows, fs_decode() and fs_encode() don't touch the filename, but reject filenames of types different than str (unicode) with a TypeError, especially bytes filename. > > Mac OS X rejects invalid UTF-8 filenames, and so surrogateescape should maybe not be used on this OS. > > Attached patch is an implementation of this issue. Please follow the naming convention used in os.path. The functions would have to be called os.path.fsencode() and os.path.fsdecode(). Other than that, I'm +0 on the patch: the sys.filesystemencoding logic doesn't really work well in practice - on Unix and BSD platforms, there's no such thing as a single system-wide file system and consequently, the file system encoding depends on the path you are looking at. For most of those file systems, the name is just a sequence of bytes with arbitrary encoding. |
|||
msg104147 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-25 16:01 | |
> Please follow the naming convention used in os.path. The functions > would have to be called os.path.fsencode() and os.path.fsdecode(). Ok > Other than that, I'm +0 on the patch: the sys.filesystemencoding > logic doesn't really work well in practice - on Unix and BSD > platforms, there's no such thing as a single system-wide file > system Today, most POSIX systems use utf8 by default for all partitions. If you mount a USB key, CD-ROM or network shared directory with the wrong options, you may get filenames in a different encoding. But this issue is not about fixing your OS configuration, it is about helping the most common case: a system using the same encoding everywhere (for the whole file system). You are still free to use the native OS type directly (unicode on Windows, bytes on other OS), i.e. to not use fsencode()/fsdecode(). Python3 prefers unicode, e.g. print expects a unicode string, not a byte string. I mean it's more practical to use unicode everywhere in Python, and so fsencode()/fsdecode() can be really useful on POSIX systems. |
|||
msg104185 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-25 23:20 | |
Update path: rename fs_encode/fs_decode to fsencode/fsdecode. |
|||
msg104186 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-25 23:20 | |
Oops, "Update path": I mean "Update patch" ;-) |
|||
msg104200 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2010-04-26 05:30 | |
i'm +0.7 on fsencode/fsdecode going into os.path. My bikeshed 0.7? They're also useful for dealing with environment variables which are not strictly filesystem (fs) related but also suffer from the same issue requiring surrogate escape. But other than just calling these os.encode and os.decode I don't have any brilliant alternate naming suggestions. thoughts? I could easily live with os.path.fsencode/fsdecode, I just wanted to point the other use out. |
|||
msg104210 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-26 10:37 | |
> They're also useful for dealing with environment variables > which are not strictly filesystem (fs) related but also suffer > from the same issue requiring surrogate escape. Yes, Python3 decodes environment variables using sys.getfilesystemencoding()+surrogateescape. And since my last fix on os.execve(), subprocess (and os.execv(p)e) uses also surrogateescape to encode environment variables. And yes again, I also patched os.getenv() to decode bytes name to unicode using sys.getfilesystemencoding()+surrogateescape. > But other than just calling these os.encode and os.decode *fs*encode() and *fs*decode() is a reference to the encoding: sys.get*filesystem*encoding(). > I just wanted to point the other use out See also issue #8513. |
|||
msg104214 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-26 10:49 | |
Oh! In Python3, ntpath.expanduser() supports bytes path and uses sys.getfilesystemencoding() to encode an unicode environment variable to a byte string. Should we remove bytes path support in ntpath.expanduser(), or support bytes in ntpath.fsencode()/.fsdecode()? (sys.getfilesystemencoding() is "mbcs" on Windows) |
|||
msg104218 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-04-26 11:06 | |
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > Oh! In Python3, ntpath.expanduser() supports bytes path and uses sys.getfilesystemencoding() to encode an unicode environment variable to a byte string. > > Should we remove bytes path support in ntpath.expanduser(), or support bytes in ntpath.fsencode()/.fsdecode()? > > (sys.getfilesystemencoding() is "mbcs" on Windows) I don't see what environment variables have to do with the file system. Those are two different contexts and thus also require two different approaches to the problem. Command line parameters are another area, where an encoding comes into play, but this again does not have to coincide with the file system encoding. Also note that "mbcs" on Windows is a meta-encoding. The implementation of that encoding depends on the locale used by the Windows user. It's just a coincidence that this may actually work for the environment variables on Windows as well, but there's no guarantee. On Unix, you often have the case that the environment variables use mixed encodings, e.g. the CGI interface is a good example where this happens per definition. The CGI environment can includes file system paths, data encoded in Latin-1 (or some other encoding), etc. See http://www.ietf.org/rfc/rfc3875.txt for details. Environment variables are also commonly used to interface to external programs from daemons, e.g. postfix, procmail and others use environment variables to communicate with external helper applications. |
|||
msg104220 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-04-26 11:16 | |
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > >> Please follow the naming convention used in os.path. The functions >> would have to be called os.path.fsencode() and os.path.fsdecode(). > > Ok > >> Other than that, I'm +0 on the patch: the sys.filesystemencoding >> logic doesn't really work well in practice - on Unix and BSD >> platforms, there's no such thing as a single system-wide file >> system > > Today, most POSIX system uses utf8 by default for all partitions. If you mount an USB key, CD-Rom or network shared directory with the wrong options, you may get filenames in a different encoding. But this issue is not about fixing your OS configuration, but helping the most common case: a system using the same encoding everywhere (for the whole file system). > > You are still free to use directly the native OS type (unicode on Windows, bytes on other OS), ie. don't use fsencode()/fsdecode(). Right, but if you start using those new API in standard lib functions, programmers no longer have that choice. In real life applications, you do run into these problems quite often, so instead of coding against an ideal world, we have to be aware of the problems and make it possible for the standard lib modules to deal with them. > Python3 prefers unicode, eg. print expects an unicode string, not a byte string. I mean it's more pratical to use unicode everywhere in Python, and so fsencode()/fsdecode() can be really useful on POSIX systems. Sure, but forcing UnicodeDecodeErrors upon Python3 programmers is not a good idea. Please keep that in mind. Thanks, -- Marc-Andre Lemburg |
|||
msg104224 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-26 11:44 | |
> In real life applications, you do run into these problems quite > often Yes, I agree 100% with you :-) > > Python3 prefers unicode, eg. print expects an unicode string, not a byte > > string. I mean it's more practical to use unicode everywhere in Python, > > and so fsencode()/fsdecode() can be really useful on POSIX systems. > > Sure, but forcing UnicodeDecodeErrors upon Python3 programmers is > not a good idea. Please keep that in mind. I proposed to reject bytes on Windows because Martin (who knows Windows better than me) decided to *not* support byte strings on Windows. The Windows native API uses unicode, and conversion between bytes and unicode on Windows using "mbcs" is not reliable (it depends on the locale, and it may lose some information). http://mail.python.org/pipermail/python-dev/2010-April/099556.html Rejecting byte strings on Windows is just a suggestion. To support byte strings on Windows, each Python function written in C would have to be fixed to use the ANSI version instead of the Wide version (e.g. CreateProcessA instead of CreateProcessW) if it gets byte arguments. The code would become twice as big, and it introduces new issues: which function should be chosen if there are two arguments, one a byte string and the other a unicode string? _subprocess.CreateProcess has 9 arguments... Since unicode is a superset of MBCS and MBCS has subtle bugs, it's preferable to use (force) unicode. -- But on POSIX, it's the opposite: I'm doing my best to support byte strings everywhere (filenames, environment variables, etc.). See the dependency list of my "meta" issue #8242. The first goal of fsencode() is to accept byte strings on POSIX systems. Maybe I didn't explain it correctly. |
|||
msg104225 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-26 12:00 | |
On Monday 26 April 2010 13:06:48, you wrote: > I don't see what environment variables have to do with the file > system. A POSIX system only offers *one* function about the encoding: nl_langinfo(CODESET), and Python3 uses it for filenames, environment variables and command line arguments. Are you suggesting that Python3 should support a different encoding for environment variables and for the file system? How would the user configure it? About filenames, Python3 chooses the encoding using the locale, but the user cannot change it: sys.setfilesystemencoding() is removed by the site module. > Also note that "mbcs" on Windows is a meta-encoding. The > implementation of that encoding depends on the locale used by > the Windows user. It's just a coincidence that this may actually > work for the environment variables on Windows as well, but there's > no guarantee. os.getenv() should raise a TypeError on Windows if key is a byte string. os.getenv() didn't support byte strings. I patched it to support byte strings (issue #8391, r80421). But I don't like my fix because we should reject byte strings *on Windows*. I would like to factor out the type check for all operations on the file system and environment variables into fsencode()/fsdecode(). > On Unix, you often have the case that the environment variables > use mixed encodings, e.g. the CGI interface is a good example > where this happens per definition. The CGI environment can > includes file system paths, data encoded in Latin-1 (or some > other encoding), etc. Since Python3 chose to store environment variables as unicode strings on Windows and POSIX, in this specific case you should convert the value back to a byte string using fsencode() and then manipulate byte strings. Because Python3 uses surrogateescape, you will get the original byte string values. My patch should help both cases: people using unicode objects and people using the native OS type (bytes on POSIX). 
As written in my previous message, you can still use byte strings if you want. My patch doesn't change that (on POSIX systems). |
|||
msg104236 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-26 14:05 | |
Version 3 of the patch: fix also os.getenv() which rejects now bytes on Windows (one of the goals of this issue). |
|||
msg104635 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-04-30 13:58 | |
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > Le lundi 26 avril 2010 13:06:48, vous avez écrit : >> I don't see what environment variables have to do with the file >> system. > > A POSIX system only offers *one* function about the encoding: > nl_langinfo(CODESET) and Python3 uses it for the filenames, environment > variables and the command line arguments. > > Are you suggesting that Python3 should support a encoding different for > environment variables and the file system? How would the user configure it? It's better to let the application decide how to solve this problem and in order to allow for this, the encodings must be adjustable. By using fsencode() and fsdecode() in stdlib functions, you basically prevent this kind of adjustment, since they hardcode the use of a single encoding which is guessed by looking at nl_langinfo(CODESET). Note that application may well use completely different encodings in the environment and for things like pipes than what the user setup for her GUI environment. In the end, this will only lead to the same kind of mess we've had with sys.setdefaultencoding() in Python 2.x, only this time with sys.setfilesystemencoding() and I'd like to avoid that. > Since Python3 choosed to store environment variables as unicode string on > Windows and POSIX, in this specific case you should reconvert the value to > byte strings using fsencode() and then manipulate byte strings. Because > Python3 uses surrogateescape, you will get the original byte string values. Well, yes, but that's a cludge isn't it ? If you know that e.g. your environment variables are going to have Latin-1 data (say some content-type variable has this information), but the user's default LANG setting is UTF-8, Python will fetch the data as broken Unicode data, you then have to convert it back to bytes and then back to Unicode using the correct Latin-1 encoding. 
It would be a lot better to have the application provide the encoding to the os.getenv() function and have Python do the correct decoding right from the start. |
|||
msg104648 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-30 16:05 | |
On Friday 30 April 2010 15:58:28, you wrote: > It's better to let the application decide how to solve this problem > and in order to allow for this, the encodings must be adjustable. On POSIX, use byte strings to avoid encoding issues. Examples: subprocess.call(['env'], env={b'TEST': b'a\xff-'}) # env subprocess.call(['echo', b'a\xff-']) # command line open(b'a\xff-') # filename os.getenv(b'a\xff-') # get env (result as unicode) Are you talking about issues on Windows? > By using fsencode() and fsdecode() in stdlib functions, you basically > prevent this kind of adjustment, ... Not if you use byte strings. On POSIX, a unicode string is always converted at the end for the system call (using sys.getfilesystemencoding()). > If you know that e.g. your environment variables are going to have > Latin-1 data (say some content-type variable has this information), > but the user's default LANG setting is UTF-8, Python will fetch the > data as broken Unicode data, you then have to convert it back to bytes > and then back to Unicode using the correct Latin-1 encoding. > > It would be a lot better to have the application provide the > encoding to the os.getenv() function and have Python do the > correct decoding right from the start. You mean that os.getenv() should have an optional argument? Something like: def getenv(key, default=None, encoding=None): value = environ.get(key, default) if encoding: value = value.encode(sys.getfilesystemencoding(), 'surrogateescape') value = value.decode(encoding, 'surrogateescape') return value There are many indirect calls to os.getenv() (e.g. by using os.environ.get()): - curses uses TERM - webbrowser uses PROGRAMFILES (path) - distutils.msvc9compiler uses "VS%0.f0COMNTOOLS" % version (path) - wsgiref.util uses HTTP_HOST, SERVER_NAME, SCRIPT_NAME, ... (url) - platform uses PROCESSOR_ARCHITEW6432 - sysconfig uses PYTHONUSERBASE, APPDATA, ... (path) - idlelib.PyShell uses IDLESTARTUP and PYTHONSTARTUP (path) - ... 
How would you specify the correct encoding in indirect calls? If your application gets variables in *mixed* encodings, I think that your program should start by reencoding variables: for name, encoding in (('PATH', 'latin1'), ...): value = os.getenv(name) value = value.encode(sys.getfilesystemencoding(), 'surrogateescape') value = value.decode(encoding, 'surrogateescape') os.setenv(name, value) |
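The re-encoding step proposed in this message can be sketched in isolation. This is a hedged illustration only: `redecode` is a hypothetical helper, and both encodings are passed explicitly (UTF-8 as the assumed file system encoding, Latin-1 as the assumed real encoding of the data) so the example does not depend on the locale or on the process environment.

```python
def redecode(value: str, fs_encoding: str, real_encoding: str) -> str:
    """Undo a decode done with fs_encoding + surrogateescape, then decode with the real encoding."""
    raw = value.encode(fs_encoding, "surrogateescape")   # recover the original bytes
    return raw.decode(real_encoding, "surrogateescape")  # decode them correctly this time


# Latin-1 data that was decoded as UTF-8: the byte 0xe9 became the surrogate U+DCE9.
wrong = b"caf\xe9".decode("utf-8", "surrogateescape")
fixed = redecode(wrong, "utf-8", "latin-1")
```

The sketch works only because surrogateescape keeps the undecodable bytes recoverable; a value decoded with errors="replace" could not be repaired this way.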
|||
msg104650 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-04-30 16:25 | |
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > Le vendredi 30 avril 2010 15:58:28, vous avez écrit : >> It's better to let the application decide how to solve this problem >> and in order to allow for this, the encodings must be adjustable. > > On POSIX, use byte strings to avoid encoding issues. Examples: > > subprocess.call(['env'], {b'TEST: b'a\xff-'}) # env > subprocess.call(['echo', b'a\xff-']) # command line > open('a\xff-') # filename > os.getenv(b'a\xff-') # get env (result as unicode) > > Are you talking about issues on Windows? The issues normally occur on the way in, not the way out of Python, so I don't see how using bytes would help. >> By using fsencode() and fsdecode() in stdlib functions, you basically >> prevent this kind of adjustment, ... > > Not if you use byte strings. On POSIX, an unicode string is always converted > at the end for the system call (using sys.getfilesystemencoding()). Right and that's a problem since the file system encoding doesn't need to have anything to do with what you have in the environment. >> If you know that e.g. your environment variables are going to have >> Latin-1 data (say some content-type variable has this information), >> but the user's default LANG setting is UTF-8, Python will fetch the >> data as broken Unicode data, you then have to convert it back to bytes >> and then back to Unicode using the correct Latin-1 encoding. >> >> It would be a lot better to have the application provide the >> encoding to the os.getenv() function and have Python do the >> correct decoding right from the start. > > You mean that os.getenv() should have an optionnal argument? Something like: Yes. 
> def getenv(key, default=None, encoding=None): > value = environ.get(key, default) > if encoding: > value = value.encode(sys.getfilesystemencoding(), 'surrogateescape') > value = value.decode(encoding, 'surrogateescape') > return value No, you store the environment data as bytes and only decode in getenv() based on the given encoding or using the file system encoding or default encoding (UTF-8) as default. It would probably also be worthwhile to add the encoding parameter to os.environ.get(). > There are many indirect calls to os.getenv() (eg. by using os.environ.get()): > - curses uses TERM > - webbrowser uses PROGRAMFILES (path) > - distutils.msvc9compiler uses "VS%0.f0COMNTOOLS" % version (path) > - wsgiref.util uses HTTP_HOST, SERVER_NAME, SCRIPT_NAME, ... (url) > - platform uses PROCESSOR_ARCHITEW6432 > - sysconfig uses PYTHONUSERBASE, APPDATA, ... (path) > - idlelib.PyShell uses IDLESTARTUP and PYTHONSTARTUP (path) > - ... > > How would you specify the correct encoding in indirect calls? In all of the above cases, the application (in this case the various modules) knows which encoding to expect and can add the right encoding parameter to the os.getenv() call. E.g. the cgi module can use the content-type passed in as environment parameter to determine the encoding; most other modules will just use ASCII or the file system encoding if they are dealing with paths or file names. > If your application gets variables in *mixed* encoding, I think that your > program should start by reencoding variables: > > for name, encoding in (('PATH', 'latin1'), ...): > value = os.getenv(name) > value = value.encode(sys.getfilesystemencoding(), 'surrogateescape') > value = value.decode(encoding, 'surrogateescape') > os.setenv(name, value) Which is a kludge as I mentioned in my previous comment: value = os.getenv(name, encoding=encoding) my_environ[name] = value reads much better. 
Also note that os.setenv() won't work since that'll use the file system encoding for encoding the value back into the C process environment array. You'd end up with mojibake in your C environment array. The point I want to make is that adding fsencode() and fsdecode() will help refactor the code a bit, but it shouldn't be used as excuse for not making the encoding explicit. |
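The alternative argued for in this message (keep the environment as bytes and decode only on lookup, with an encoding chosen by the caller) might look roughly like the sketch below. Everything here is an assumption for illustration: `getenv_decoded` is a hypothetical name, os.getenv() has no such `encoding` parameter, and a plain dict of bytes stands in for the process environment so the example is portable.

```python
import sys


def getenv_decoded(environ_bytes, key, default=None, encoding=None):
    """Look up a bytes key in a bytes environment and decode the value on demand."""
    raw = environ_bytes.get(key)
    if raw is None:
        return default
    if encoding is None:
        # Fall back to the file system encoding when the caller gives no encoding.
        encoding = sys.getfilesystemencoding()
    return raw.decode(encoding, "surrogateescape")


# Stand-in environment with mixed encodings, as in the CGI example from the thread.
env = {b"CONTENT": b"caf\xe9", b"HOME": b"/home/user"}
content = getenv_decoded(env, b"CONTENT", encoding="latin-1")
home = getenv_decoded(env, b"HOME")
```

With this shape the value is decoded once, with the right encoding, instead of being decoded with the wrong one and repaired afterwards.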
|||
msg104652 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-30 17:16 | |
> No, you store the environment data as bytes and only > decode in getenv() ... Yes, this is the best solution for POSIX. We need maybe also a os.getenvb()->bytes function, maybe only on POSIX. But I think that Windows should continue to use unicode environment variables. Should os.getenv(key, encoding=...) reencode the value on Windows? |
|||
msg104654 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-04-30 17:48 | |
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > >> No, you store the environment data as bytes and only >> decode in getenv() ... > > Yes, this is the best solution for POSIX. We need maybe also a os.getenvb()->bytes function, maybe only on POSIX. Yes, plus a os.setenvb() function to pass the data back to the C level array. > But I think that Windows should continue to use unicode environment variables. Should os.getenv(key, encoding=...) reencode the value on Windows? Good idea. That would make applications more easily portable between Windows and POSIX. |
|||
msg104672 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-30 23:03 | |
Ok, here is a first version of my patch to implement os.environb: - os.environb is the bytes version of os.environ, both are synchronized - os.environ(b).data stores bytes keys and values on POSIX (but unicode on Windows) - create os.getenvb()->bytes - os.environb and os.getenvb() are not available on Windows nor OS/2 - os.environ(b) and os.getenv(b)() accept both byte and unicode keys: that's maybe a stupid idea, I don't know yet :-) - fix #8513: subprocess: support bytes program name on POSIX - create os.fsencode() and os.fsdecode() The patch is not done (the documentation should be updated), but it's a new step to help the discussion. I didn't try it on Windows. I already tried twice to write os.environb some months ago, but I failed (it was too complex for me). os.environ and os.environb now share the same "data" dictionary, and their methods convert inputs and outputs if necessary. |
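The synchronization between the two views can be checked with a short sketch. It uses os.environb, os.getenvb(), os.fsdecode() and os.supports_bytes_environ as they exist in current Python releases, which may differ in detail from the draft patch described here; the helper name `environ_views_agree` and the variable name GR_DEMO_VAR are made up for the example.

```python
import os


def environ_views_agree(key: bytes, value: bytes):
    """Set a variable through os.environb and check it through os.environ.

    Returns None where the bytes environment is unavailable (e.g. Windows).
    """
    if not os.supports_bytes_environ:
        return None
    os.environb[key] = value
    try:
        same_str = os.environ[os.fsdecode(key)] == os.fsdecode(value)
        same_bytes = os.getenvb(key) == value
        return same_str and same_bytes
    finally:
        # Clean up so the demo variable does not leak into the process environment.
        del os.environb[key]


result = environ_views_agree(b"GR_DEMO_VAR", b"a\xff-")
```

On POSIX the undecodable byte 0xff shows up in os.environ as an escaped surrogate, while os.environb returns it unchanged.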
|||
msg104723 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-05-01 15:12 | |
In posixmodule.c, the following snippet doesn't make sense anymore: if (k == NULL) { PyErr_Clear(); continue; } If memory allocation of the bytes object fails, we should error out. (same for "if (v == NULL)" a bit later) |
|||
msg104802 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2010-05-02 20:34 | |
I really, really, REALLY think that it is bad to mix issues. This makes patch review impossible. This specific issue is about introducing an fsdecode and fsencode function; this is what the bug title says, and what the initial patch did. Whether or not byte-oriented access to environment variables is also needed is a *separate* issue. -1 on dealing with that in this report. FWIW, I'm +0 on adding these functions. MAL, please stop messing issue subjects. If you are fundamentally opposed to adding such functions, please request that a PEP be written or something. Otherwise, I accept the original patch. I'm -1 on issue8514.patch; it is out-of-scope of the issue. |
|||
msg104823 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-05-03 08:34 | |
I agree with Martin regarding the os.environ changes. Victor, please open a new ticket for this. Martin: As you probably know, these issues are managed as micro- mailing lists. Discussions on these lists often result in new aspects which then drift off to new issues. That's normal business and we are all well aware of this. Please stop yelling all about the place and change your tone ! Thanks. |
|||
msg104826 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-03 09:02 | |
loewis> I really, really, REALLY think that it is bad to mix issues. loewis> This makes patch review impossible. I tried to, but it looks difficult :-) Anyway, I opened #8603. > This specific issue is about introducing an fsdecode and fsencode > function; this is what the bug title says, and what the initial patch > did. I know, but the two topics (fs*code() and os.environb) are very close and related. My os.environb implementation uses fsencode()/fsdecode(). > FWIW, I'm +0 on adding these functions. MAL, please stop messing > issue subjects. (...) I think that we cannot decide correctly about fs*code() until we decided for os.environb. |
|||
msg104869 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2010-05-03 19:32 | |
> I think that we cannot decide correctly about fs*code() until we decided for os.environb. Why is that? In msg104063, you claim that you want to create these functions to deal with file names (not environment variables), in msg104064, you claim that #8513 (which is about the program name in subprocess) would benefit from these functions. Do these use cases become invalid if os.environb becomes available? |
|||
msg104874 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-03 20:12 | |
> Why is that? In msg104063, you claim that you want to create these > functions to deal with file names (not environment variables) Yes, but my os_path_fs_encode_decode-3.patch uses it in getenv() which is maybe a bad idea: os.environb may avoid this. > in msg104064, you claim that #8513 (which is about the program name in > subprocess) would benefit from these functions. Do these use cases > become invalid if os.environb becomes available? #8513 is also related to environment variables: subprocess._execute_child() calls os.get_exec_path() which search the PATH environment variable. It would be nice to support bytes environment variable in the env argument of Popen constructor (bytes key and/or value). |
|||
msg104876 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2010-05-03 20:18 | |
STINNER Victor wrote: > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > >> Why is that? In msg104063, you claim that you want to create these >> functions to deal with file names (not environment variables) > > Yes, but my os_path_fs_encode_decode-3.patch uses it in getenv() which > is maybe a bad idea: os.environb may avoid this. IIUC, that usage is an equivalent transformation, i.e. the code doesn't change its behavior. It is mere refactorization. So *if* these functions are accepted, this change is a good idea regardless of the os.environb introduction (unless I'm missing something, and there is indeed a behavior change). >> in msg104064, you claim that #8513 (which is about the program name in >> subprocess) would benefit from these functions. Do these use cases >> become invalid if os.environb becomes available? > > #8513 is also related to environment variables: subprocess._execute_child() > calls os.get_exec_path() which search the PATH environment variable. > It would be nice to support bytes environment variable in the env > argument of Popen constructor (bytes key and/or value). I still fail to see why this would make this issue block on the os.environb introduction. Whether this gets introduced or not, the program name issue remains, no? |
|||
msg104896 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-03 22:31 | |
> IIUC, that usage is an equivalent transformation, i.e. the code doesn't > change its behavior. It is mere refactorization. I changed os.getenv() to accept byte string keys (in a previous commit), but I don't like this hack. If we have os.environb, os.getenv() shouldn't support bytes anymore (but use str only, as before). -- I worked a little more on fsencode()/os.environb, trying to fix all issues. fsdecode() is no longer needed if we have os.environb, and fsencode() can be simplified to: def fsencode(value): return value.encode(sys.getfilesystemencoding(), 'surrogateescape') fsdecode() leads to mojibake. |
|||
msg104921 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-04 10:56 | |
I think that fsencode() (and fsdecode()) should be specific to POSIX. I don't know any good reason to encode a nice and correctly encoded unicode string to the ugly MBCS "encoding". |
|||
msg105171 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-06 23:13 | |
New short, simple and clean patch: add os.fsencode() for Unix only. -- Don't create it for Windows, to encourage the usage of unicode on Windows (using MBCS is a bad idea). fsdecode() was also a bad idea: it's better to keep bytes unchanged on Unix, and it's now possible thanks to os.environb and os.getenvb(). |
|||
msg105264 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2010-05-08 05:25 | |
+.. function:: fsencode(value) + + Encode *value* to bytes for use in the file system, environment variables or + the command line. Use :func:`sys.getfilesystemencoding` and + ``'surrogateescape'`` error handler for str, and keep bytes unchanged. I'd word the latter sentence as: Uses :func:`sys.getfilesystemencoding` and ``'surrogateescape'`` error handler for strings and returns bytes unchanged. Otherwise I think this patch looks good. +1 |
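The behaviour documented in this hunk can be sketched as below. This is not the committed code: the name `fsencode_sketch` and the TypeError for other types are assumptions made for the illustration, based only on the docstring quoted above.

```python
import sys


def fsencode_sketch(value):
    """Return bytes unchanged; encode str with the file system encoding and surrogateescape."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(sys.getfilesystemencoding(), "surrogateescape")
    # Assumption: anything that is neither str nor bytes is rejected.
    raise TypeError("expected str or bytes, not %s" % type(value).__name__)


raw = b"a\xff-"
# Decode with the same encoding and handler, then check that the bytes come back unchanged.
as_text = raw.decode(sys.getfilesystemencoding(), "surrogateescape")
round_tripped = fsencode_sketch(as_text)
```

A test written this way does not assume UTF-8: it round-trips through whatever encoding sys.getfilesystemencoding() reports.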
|||
msg105278 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-08 11:12 | |
Committed: r80971 (py3k), blocked by r80972 (3.1). |
|||
msg105301 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2010-05-08 15:35 | |
Why does this have no tests? |
|||
msg105363 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-09 02:23 | |
> Why does this have no tests? The function is trivial. Does it really need tests? What kind of tests? fsencode() is already tested indirectly by test_subprocess, and #8513 will add new tests. |
|||
msg105364 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2010-05-09 02:31 | |
2010/5/8 STINNER Victor <report@bugs.python.org>: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > >> Why does this have no tests? > > The function is trivial. Does it really need tests? What kind of tests? Check that it is equivalent to utf-8 with surrogateescape then. > > fsencode() is already tested indirectly by test_subprocess, and #8513 will add > new tests. Excuses, excuses! |
|||
msg105368 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-05-09 03:18 | |
> Check that it is equivalent to utf-8 with surrogateescape then. The file system encoding can be anything, not only utf-8. Anyway: r81014.
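A check along the lines discussed above might compare against whatever the filesystem encoding is, rather than hard-coding UTF-8. This is only a sketch and not the content of r81014; `expected_fsencode` and `check_fsencode` are hypothetical helper names, and `sys.getfilesystemencodeerrors()` is assumed to be available (Python 3.6 and later).

```python
import os
import sys


def expected_fsencode(name):
    # Encode with the interpreter's own filesystem encoding and error
    # handler, whatever they happen to be on this platform.
    return name.encode(sys.getfilesystemencoding(),
                       sys.getfilesystemencodeerrors())


def check_fsencode(name):
    # True when os.fsencode() agrees with the explicit encode call and
    # leaves already-encoded bytes untouched.
    encoded = os.fsencode(name)
    return encoded == expected_fsencode(name) and os.fsencode(encoded) == encoded


# ASCII-only sample name, so the check works with any filesystem encoding.
result = check_fsencode('example.txt')
```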
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:00 | admin | set | github: 52760 |
2010-05-09 03:18:11 | vstinner | set | messages: + msg105368 |
2010-05-09 02:31:38 | benjamin.peterson | set | messages: + msg105364 |
2010-05-09 02:23:44 | vstinner | set | messages: + msg105363 |
2010-05-08 15:35:16 | benjamin.peterson | set | nosy: + benjamin.peterson messages: + msg105301 |
2010-05-08 11:12:37 | vstinner | set | status: open -> closed resolution: accepted -> fixed messages: + msg105278 |
2010-05-08 10:58:34 | vstinner | set | title: Create fsencode() and fsdecode() functions in os.path -> Add fsencode() functions to os module |
2010-05-08 05:25:28 | gregory.p.smith | set | messages: + msg105264 |
2010-05-07 00:17:00 | vstinner | link | issue8513 dependencies |
2010-05-06 23:13:24 | vstinner | set | files: + fsencode.patch messages: + msg105171 |
2010-05-06 23:09:03 | vstinner | set | files: - issue8514.patch |
2010-05-06 23:09:00 | vstinner | set | files: - os_path_fs_encode_decode-3.patch |
2010-05-04 11:10:40 | vstinner | unlink | issue8513 dependencies |
2010-05-04 10:56:27 | vstinner | set | messages: + msg104921 |
2010-05-03 22:31:06 | vstinner | set | messages: + msg104896 |
2010-05-03 20:18:25 | loewis | set | messages: + msg104876 |
2010-05-03 20:12:08 | vstinner | set | messages: + msg104874 |
2010-05-03 19:32:06 | loewis | set | messages: + msg104869 |
2010-05-03 09:02:57 | vstinner | set | messages: + msg104826 |
2010-05-03 08:34:23 | lemburg | set | messages: + msg104823 |
2010-05-02 20:34:52 | loewis | set | resolution: accepted messages: + msg104802 |
2010-05-01 15:12:16 | pitrou | set | nosy: + pitrou messages: + msg104723 |
2010-04-30 23:03:50 | vstinner | set | files: + issue8514.patch messages: + msg104672 |
2010-04-30 17:48:24 | lemburg | set | messages: + msg104654 |
2010-04-30 17:16:06 | vstinner | set | messages: + msg104652 |
2010-04-30 16:25:39 | lemburg | set | messages: + msg104650 |
2010-04-30 16:05:27 | vstinner | set | messages: + msg104648 |
2010-04-30 13:58:24 | lemburg | set | messages: + msg104635 |
2010-04-26 14:05:07 | vstinner | set | files: - os_path_fs_encode_decode-2.patch |
2010-04-26 14:05:01 | vstinner | set | files: + os_path_fs_encode_decode-3.patch messages: + msg104236 |
2010-04-26 12:00:16 | vstinner | set | messages: + msg104225 title: Create fs_encode() and fs_decode() functions in os.path -> Create fsencode() and fsdecode() functions in os.path |
2010-04-26 11:44:54 | vstinner | set | messages: + msg104224 |
2010-04-26 11:16:18 | lemburg | set | messages: + msg104220 title: Create fsencode() and fsdecode() functions in os.path -> Create fs_encode() and fs_decode() functions in os.path |
2010-04-26 11:06:46 | lemburg | set | messages: + msg104218 |
2010-04-26 10:49:21 | vstinner | set | messages: + msg104214 |
2010-04-26 10:37:59 | vstinner | set | messages: + msg104210 |
2010-04-26 05:30:58 | gregory.p.smith | set | nosy: + gregory.p.smith messages: + msg104200 |
2010-04-25 23:20:56 | vstinner | set | messages: + msg104186 |
2010-04-25 23:20:36 | vstinner | set | files: - os_path_fs_encode_decode.patch |
2010-04-25 23:20:24 | vstinner | set | files: + os_path_fs_encode_decode-2.patch messages: + msg104185 title: Create fs_encode() and fs_decode() functions in os.path -> Create fsencode() and fsdecode() functions in os.path |
2010-04-25 16:01:17 | vstinner | set | messages: + msg104147 |
2010-04-24 14:51:46 | Arfrever | set | nosy: + Arfrever |
2010-04-24 08:43:13 | ezio.melotti | set | priority: normal nosy: + ezio.melotti type: enhancement stage: patch review |
2010-04-24 08:33:43 | lemburg | set | messages: + msg104068 |
2010-04-23 23:48:01 | vstinner | link | issue8513 dependencies |
2010-04-23 23:44:06 | vstinner | set | messages: + msg104064 |
2010-04-23 23:41:13 | vstinner | set | nosy: + lemburg, loewis |
2010-04-23 23:39:10 | vstinner | create |