
Author loewis
Recipients HWJ, amaury.forgeotdarc, benjamin.peterson, dlitz, gvanrossum, loewis, pitrou, vstinner, zegreek
Date 2008-09-29.04:45:28
Message-id <48E05D64.4000501@v.loewis.de>
In-reply-to <1222649677.79.0.00352087354963.issue3187@psf.upfronthosting.co.za>
Content
> Consider this scenario.  On ext3/Linux, assume that UTF-8 is specified
> in the system locale.  What would happen if you have two files, named
> b"\xf3\xb3\x83\x80\x00" and b"\xc0\x00"?  Under your proposal, the first
> file would decode successfully as "\U000f30c0\x00", and the second file
> would decode unsuccessfully, so it would be mapped to
> "\U000f30c0\x00"---the same thing!

Correct.
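
To make the collision concrete, here is a minimal sketch. The escape scheme is an assumption inferred from the example quoted above (byte 0xC0 becoming U+F30C0, i.e. 0xF3000 plus the byte value), and the handler name "pua_escape_demo" is hypothetical; neither is taken from an actual patch.

```python
import codecs

# Assumption: an undecodable byte b is mapped to the private-use code
# point 0xF3000 + b, which matches 0xC0 -> U+F30C0 in the example.
PUA_BASE = 0xF3000


def pua_escape(exc):
    """Error handler that replaces each undecodable byte with a PUA character."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        replacement = "".join(chr(PUA_BASE + b) for b in bad)
        return replacement, exc.end  # resume decoding after the bad bytes
    raise exc


# Hypothetical handler name, registered only for this demonstration.
codecs.register_error("pua_escape_demo", pua_escape)

# Valid UTF-8 that already encodes U+F30C0, followed by a NUL byte.
first = b"\xf3\xb3\x83\x80\x00".decode("utf-8", "pua_escape_demo")
# Invalid UTF-8: the 0xC0 byte gets escaped to the same code point.
second = b"\xc0\x00".decode("utf-8", "pua_escape_demo")

print(first == second)
```

Two different byte strings decode to the same str, so the original bytes cannot be recovered from the decoded name alone.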

> Under your proposal, you could end up with multiple files having the
> same filename (from Python's perspective). Python shouldn't break if
> somebody deliberately created some weird filenames.

I'm not so sure about that. Practicality beats purity.

> Your proposal would
> make it impossible to write a robust remote backup tool in Python 3.

There could be an option to set the file system encoding via an API
to some known safe value, such as Latin-1, or ASCII. If you set the
file system encoding to Latin-1, this escaping would never happen;
if you set it to ASCII, it would happen uniformly for all non-ASCII
bytes. The robust backup tool would have to know to set this option
on POSIX systems.
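
A small sketch of why those two encodings behave as described. It uses only the standard codecs; the helper name undecodable_in_ascii is made up for illustration, and the API for setting the file system encoding is not shown because it is only a suggestion at this point.

```python
# Latin-1 assigns a character to each of the 256 byte values, so decoding
# cannot fail and encoding the result gives back the original bytes.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_trip_ok = as_text.encode("latin-1") == all_bytes


def undecodable_in_ascii(raw: bytes) -> list:
    """Return the byte values in raw that the ASCII codec rejects."""
    bad = []
    for b in raw:
        try:
            bytes([b]).decode("ascii")
        except UnicodeDecodeError:
            bad.append(b)
    return bad


# ASCII covers only 0x00-0x7F, so every byte from 0x80 up is rejected
# and would be escaped uniformly under the proposal.
escaped = undecodable_in_ascii(all_bytes)
print(round_trip_ok, escaped == list(range(0x80, 0x100)))
```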

> Pathnames on ext3/Linux *are not Unicode*.  Blindly pretending they're
> Unicode is a leaky abstraction at best, and a security hole at worst.

I think most Linux users would disagree, and claim that file names are
indeed character strings (which is synonymous with "being Unicode"). It is
technically true that it's possible to create file names which are not
text, but that's really a bug, not a feature - Unix and POSIX were never
intended to work this way. Also, in the overwhelming majority of Python
applications, consistent support for practically-existing systems
matters more than robustness against malicious users.
History
Date                 User    Action  Args
2008-09-29 04:45:32  loewis  set     recipients: + loewis, gvanrossum, amaury.forgeotdarc, pitrou, vstinner, benjamin.peterson, HWJ, dlitz, zegreek
2008-09-29 04:45:30  loewis  link    issue3187 messages
2008-09-29 04:45:28  loewis  create