Author lemburg
Recipients lemburg, loewis, vstinner
Date 2010-04-24.08:33:42
Message-id <4BD2ACE5.8060200@egenix.com>
In-reply-to <1272065952.1.0.987702075651.issue8514@psf.upfronthosting.co.za>
Content
STINNER Victor wrote:
> 
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
> 
> Python 3 uses unicode filenames on Windows and bytes filenames (while also supporting unicode filenames) on other OSes. We have to support both types. On POSIX systems, bytes filenames can be stored as unicode filenames using sys.getfilesystemencoding() and the surrogateescape error handler (which stores undecodable bytes as unicode surrogates, see PEP 383).
> 
> I would like to create fs_encode() and fs_decode() in os.path to ease the manipulation of filenames in the two types (str and bytes).
>  * Use fs_decode() to convert a filename from the OS native format to unicode
>  * Use fs_encode() to convert a unicode filename to the OS native format
> 
> On Windows, fs_decode() and fs_encode() don't touch the filename, but reject filenames of any type other than str (unicode), bytes filenames in particular, with a TypeError.
> 
> Mac OS X rejects invalid UTF-8 filenames, and so surrogateescape should maybe not be used on this OS.
> 
> Attached patch is an implementation of this issue.
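
For readers without the patch at hand, the POSIX behaviour described in the quoted proposal can be sketched roughly as follows. This is only an illustration, not the attached patch: the function names follow the proposal, the optional `encoding` parameter is a hypothetical addition so the example does not depend on the local configuration, and the Windows branch (reject bytes with a TypeError) is left out.

```python
import sys


def fs_decode(filename, encoding=None):
    """Convert an OS-native filename (bytes or str) to str.

    `encoding` is a hypothetical parameter for illustration; by default
    the value of sys.getfilesystemencoding() is used.
    """
    if isinstance(filename, str):
        return filename
    if isinstance(filename, bytes):
        encoding = encoding or sys.getfilesystemencoding()
        # Undecodable bytes become lone surrogates (PEP 383).
        return filename.decode(encoding, "surrogateescape")
    raise TypeError("expected str or bytes, got %s" % type(filename).__name__)


def fs_encode(filename, encoding=None):
    """Convert a str filename back to the bytes form used by the OS."""
    if isinstance(filename, bytes):
        return filename
    if isinstance(filename, str):
        encoding = encoding or sys.getfilesystemencoding()
        # Lone surrogates produced by fs_decode() turn back into raw bytes.
        return filename.encode(encoding, "surrogateescape")
    raise TypeError("expected str or bytes, got %s" % type(filename).__name__)


# A byte string that is not valid UTF-8 still survives the round trip.
raw = b"caf\xe9"
decoded = fs_decode(raw, "utf-8")
restored = fs_encode(decoded, "utf-8")
```

The point of the surrogateescape handler here is that `restored` equals `raw` even though `raw` cannot be decoded as UTF-8 in strict mode.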

Please follow the naming convention used in os.path. The functions
would have to be called os.path.fsencode() and os.path.fsdecode().

Other than that, I'm +0 on the patch: the sys.getfilesystemencoding()
logic doesn't really work well in practice - on Unix and BSD platforms,
there's no such thing as a single system-wide file system and consequently,
the file system encoding depends on the path you are looking at. For most
of those file systems, the name is just a sequence of bytes with an
arbitrary encoding.
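
To illustrate the point about arbitrary encodings, here is a small sketch. The file name is made up, and the two codecs are chosen only as examples of what two different mounted file systems might use: the same bytes read as different text depending on the assumed encoding, and only the bytes themselves are unambiguous.

```python
# The same name as it might be stored on two differently configured volumes.
name_latin1 = "é".encode("latin-1")   # b'\xe9'
name_utf8 = "é".encode("utf-8")       # b'\xc3\xa9'

# Decoding with the "wrong" single system-wide encoding either yields
# mojibake or needs an error handler to succeed at all.
mojibake = name_utf8.decode("latin-1")
escaped = name_latin1.decode("utf-8", "surrogateescape")

# surrogateescape keeps the original bytes recoverable, but the str value
# is not the text the user would expect to see.
recovered = escaped.encode("utf-8", "surrogateescape")
```

In other words, surrogateescape makes the round trip lossless, but it cannot tell which encoding a given path was really written in.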
History
Date User Action Args
2010-04-24 08:33:45 lemburg set recipients: + lemburg, loewis, vstinner
2010-04-24 08:33:43 lemburg link issue8514 messages
2010-04-24 08:33:42 lemburg create