Author dlitz
Recipients HWJ, amaury.forgeotdarc, benjamin.peterson, dlitz, gvanrossum, pitrou, vstinner, zegreek
Date 2008-09-27.15:12:03
Message-id <1222528326.7.0.602635776173.issue3187@psf.upfronthosting.co.za>
Content
On Sat, Sep 27, 2008 at 01:15:46AM +0000, Guido van Rossum wrote:
> I don't see the advantage over the existing rule bytes in -> bytes out...

Guido,

I figure I should say something since I have some experience in this area.

I wrote some automatic backup software in Python 2 earlier this year.  It
had to work on ext3/Linux (where filenames are natively octet strings) and
on NTFS/Win32 (where filenames are natively Unicode strings).  I had to be
ridiculously careful to always use unicode paths on Win32, and to always
use str paths on Linux, because otherwise Python would do the conversion
automatically, and poorly.

It was particularly bad on Win32, where if you used os.listdir() with a
non-Unicode path (a Python 2.x str object) in a directory that contained
non-ASCII filenames, Windows would invent filenames that looked similar but
couldn't actually be found when using open().  So, naive (Python 2) code
like this would break:

    for filename in os.listdir("."):
        f = open(filename, "rb")
        # ...

On Linux, it was bad too, since if you used unicode paths, the filenames
actually opened would depend on your LANG or LC_CTYPE or LC_ALL environment
variables, and those could vary from one system to another, or even from
one invocation of the program to another.
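
To make the locale dependence concrete, here is a small sketch.  It is
written for Python 3 rather than the Python 2 discussed above, and the
filename and the two encodings are illustrative assumptions standing in for
whatever LANG, LC_CTYPE, or LC_ALL might select.  It shows that one Unicode
name maps to different octet sequences depending on the encoding:

```python
# Illustrative only: the filename and both encodings are assumptions,
# standing in for whatever the locale settings would select.
name = "r\u00e9sum\u00e9.txt"  # "résumé.txt"

as_utf8 = name.encode("utf-8")      # bytes a UTF-8 locale would look up
as_latin1 = name.encode("latin-1")  # bytes an ISO-8859-1 locale would look up

# The two byte strings differ, so they name two different files on disk.
same_file = (as_utf8 == as_latin1)
print(as_utf8, as_latin1, same_file)
```

The same program run under two different locales would therefore look up
two different on-disk names for the same unicode path.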

The simple fact of the matter is that pathnames on Linux are _not_ Unicode,
and pathnames on Windows are _not_ octet strings.  They're fundamentally
incompatible types that can only be reconciled when you make assumptions
(e.g. specifying a character encoding) that allow you to convert from one
to the other.
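
A minimal sketch of that point, again in Python 3 and with a made-up octet
string: converting an octet filename to text only works once an encoding has
been assumed, and the wrong assumption can fail outright.

```python
# Hypothetical filename: valid as Latin-1, but not valid UTF-8.
raw = b"caf\xe9.txt"

# Assumption 1: the name is UTF-8.  The strict decode fails for these bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption 2: the name is Latin-1.  Every byte value decodes, but the
# result is only meaningful if that assumption happens to be right.
latin1_text = raw.decode("latin-1")

print(utf8_ok, latin1_text)
```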

Ideally, io.open(), os.listdir(), os.path.*, etc. would accept _only_
pathnames in their native format, and it would be the job of a wrapper to
provide a portable-but-less-robust interface on top of that.  Perhaps the
built-in functions would use the wrapper (with reasonable defaults), but
the native-only interface should be there for module-writers who want
robust pathname handling.
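
One way such a split could look is sketched below in Python 3.  All of the
names here (NATIVE_TYPE, listdir_native, to_native, listdir_portable) and
the UTF-8 default are hypothetical, chosen only for illustration; this is
not an existing or proposed standard-library API.

```python
import os

# Assumption for this sketch: the native pathname type is a Unicode string
# on Windows and an octet string everywhere else.
NATIVE_TYPE = str if os.name == "nt" else bytes


def listdir_native(path):
    """Strict interface: accept only the platform's native path type."""
    if not isinstance(path, NATIVE_TYPE):
        raise TypeError(
            "expected %s path, got %s"
            % (NATIVE_TYPE.__name__, type(path).__name__)
        )
    # os.listdir() returns entries of the same type as its argument.
    return os.listdir(path)


def to_native(path, encoding="utf-8"):
    """Convert a path to the native type using an explicit encoding."""
    if isinstance(path, NATIVE_TYPE):
        return path
    if NATIVE_TYPE is bytes:
        return path.encode(encoding)
    return path.decode(encoding)


def listdir_portable(path, encoding="utf-8"):
    """Portable-but-less-robust wrapper layered on the strict interface."""
    return listdir_native(to_native(path, encoding))
```

With this layering, the encoding assumption lives in one visible place
(the wrapper), and code that needs robust handling can call the strict
function directly and never trigger an implicit conversion.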