Author dlitz
Recipients HWJ, amaury.forgeotdarc, benjamin.peterson, dlitz, gvanrossum, pitrou, vstinner
Date 2008-08-26.18:15:14
Message-id <1219774521.19.0.793476495037.issue3187@psf.upfronthosting.co.za>
In-reply-to
Content
I think Guido already understands this, but I haven't seen it stated
very clearly here:

** Different systems use different "things" to identify files. **

On Linux/ext3, all filenames are *octet strings* (i.e. bytes), and
*only* the following caveats apply:
- a filename/pathname cannot contain the zero-octet (b"\x00").
- a filename/pathname cannot be empty.
- a filename cannot contain the slash (b"/"); in a pathname, the slash
is used to separate filenames.
- the filenames b"." and b".." have special meanings; they cannot be
created, deleted, or renamed.
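
The caveats above can be sketched as a small check on octet strings. This
is only an illustration: the function names is_valid_filename and
is_special_entry are hypothetical, and the sketch ignores length limits,
which the list above does not discuss.

```python
def is_valid_filename(name: bytes) -> bool:
    """Check one filename (not a whole path) against the caveats above."""
    if not isinstance(name, bytes):
        raise TypeError("filenames are treated as octet strings here")
    if name == b"":
        return False          # a filename cannot be empty
    if b"\x00" in name:
        return False          # no zero-octet
    if b"/" in name:
        return False          # the slash only separates filenames in a path
    return True               # every other byte sequence is accepted


def is_special_entry(name: bytes) -> bool:
    """b"." and b".." are valid names, but have special meanings."""
    return name in (b".", b"..")
```

Note that a name such as b"caf\xe9" passes the check even though it is not
valid UTF-8; no encoding is consulted at any point.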

All filenames that meet these criteria are valid, and calling them
"invalid" amounts to plugging one's ears and shouting "LA LA LA" while
imagining Unicode having pre-dated Unix.

It is sometimes convenient to imagine filenames on Linux/ext3 as
sequences of Unicode code points (where the encoding is specified by
LC_CTYPE---it's not necessarily UTF-8), but other times (e.g. in backup
tools that need to be robust in the face of mischievous users) it is an
unnecessary abstraction that introduces bugs.
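
As a rough illustration of the kind of bug meant here, the sketch below
takes a filename that is perfectly legal as bytes and pushes it through a
text decoding step. The name raw_name and the helper lossy_round_trip are
made up for the example, and UTF-8 is only assumed as the locale encoding.

```python
# A legal octet-string filename: "cafe" with an e-acute in Latin-1.
raw_name = b"caf\xe9.txt"

# Strict decoding as UTF-8 fails, because 0xE9 is not followed by
# continuation bytes.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False


def lossy_round_trip(name: bytes, encoding: str = "utf-8") -> bytes:
    """Decode with replacement characters, then encode again."""
    return name.decode(encoding, errors="replace").encode(encoding)


# The "repaired" name no longer identifies the original file.
damaged = lossy_round_trip(raw_name)
name_survived = damaged == raw_name
```

Here strict_ok and name_survived both end up False: the tool either has to
skip the file or ends up referring to a different name. Decoding the same
bytes as Latin-1 would succeed, which is why the choice of encoding matters.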

On Windows/NTFS, the situation is entirely different: Filenames are
actually sequences of Unicode code points, and if you pretend they are
octet strings, Windows will happily invent phantom filenames for you
that will show up in the output of os.listdir(), but that will return
"File not found" if you try to open them for reading (if you open them
for writing, you risk clobbering other files that happen to have the
same names).

To avoid bugs, it should be possible to work exclusively with filenames
in the platform's native representation.  It was possible in Python 2
(though you had to be very careful).  Ideally, Python 3 would recognize
and enforce the difference instead of trying to guess the translations;
"Explicit is better than implicit" and all that.