Author vstinner
Recipients loewis, vstinner
Date 2010-09-12.21:14:53
Message-id <201009122314.46191.victor.stinner@haypocalc.com>
In-reply-to <4C8D09F0.7020901@v.loewis.de>
Content
It reminds me of the discussion on issue #3187. About undecodable filenames, 
Guido proposed to ignore them or to use errors="replace", and wrote "Failing 
the entire os.listdir() call is not acceptable". (... long discussion ...) And 
finally, os.listdir() ignored undecodable filenames on UNIX/BSD.

Then you introduced the ingenious PEP 383 (utf8b, later renamed surrogateescape), 
and os.listdir() now raises an error only if PyUnicode_FromEncodedObject(v, 
Py_FileSystemDefaultEncoding, "surrogateescape") fails... which cannot happen 
because of an undecodable byte sequence, only for other reasons such as a 
memory error.
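
To make the PEP 383 behaviour concrete, here is a minimal sketch of the round trip at the Python level. It assumes UTF-8 as a stand-in for the filesystem encoding, and the sample bytes are made up for illustration; it is not the code os.listdir() runs internally.

```python
# Sketch: how the "surrogateescape" error handler round-trips bytes
# that are not valid in the chosen encoding.
# Assumption: UTF-8 stands in for the filesystem encoding.
raw = b"caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8

# The invalid byte 0xE9 is mapped to the lone surrogate U+DCE9
# instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

# A strict decode of the same bytes fails, which is why the error
# handler is needed to list such a directory at all.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False
```

Because every byte sequence can be decoded this way, the decoding step itself no longer fails on odd filenames.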

About Windows: os.listdir(str) never fails, but my question is about 
os.listdir(bytes). Should os.listdir(bytes) return invalid filenames (encoded 
with "mbcs+replace", so not usable to open, rename or delete the file) or 
just ignore them?
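
A small sketch of why such a name is unusable. The "mbcs" codec exists only on Windows, so this example assumes "cp1252" as a stand-in for the ANSI code page, and the filename is invented for illustration.

```python
# Sketch: a lossy "replace" encoding produces a name that no longer
# identifies the original file.
# Assumption: "cp1252" stands in for the Windows ANSI code page ("mbcs").
ANSI_CODEC = "cp1252"

original = "\u65e5\u672c.txt"  # two characters that cp1252 cannot encode

# Each unencodable character is replaced by "?".
lossy = original.encode(ANSI_CODEC, "replace")

# Decoding the lossy bytes does not give the real name back, so a later
# open(), rename() or remove() on it would target a nonexistent file.
roundtrip = lossy.decode(ANSI_CODEC)
name_survived = (roundtrip == original)
```

The listing succeeds, but the failure is only postponed until the application tries to use the returned name.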

> Ok. Then I'm -1 on the patch: you can't know whether the application
> actually wants to open the file. Perhaps it only wants to display the
> file names, or perhaps it only wants to open some of the files, or
> only traverse into subdirectories.
>
> For backwards compatibility, I recommend to leave things as they are.
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).

"it only wants to open some of the files" is the typical reason why I 
hate Python 2 and its implicit conversion between bytes and characters: it 
works in most cases, but it fails "sometimes". The problem is to define (and 
explain) "sometimes".

The typical use case of listing a directory is a file chooser. On Windows, using 
the bytes API, it works in most cases, but it fails if the user picks the 
"wrong" file (a name containing "?"). That's the problem I would like to address.

--

The solution of ignoring unencodable filenames is compatible with the "traverse 
into subdirectories" case. It also keeps backward compatibility (except 
that unencodable files are hidden, which is a lesser problem, I think).

--

I proposed to raise an error on an unencodable filename. I changed my mind after 
reading your answer and the discussion on #3187. My patch breaks compatibility, 
and users don't care about unencodable filenames. E.g. glob("*.mp3") should not 
fail if the directory contains a temporary unencodable filename ("xxx.tmp").
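
The "ignore" behaviour can be sketched in pure Python as follows. The helper name encodable_names is hypothetical, "cp1252" is again assumed as a stand-in for "mbcs", and the sample names are invented; the real change would live in the C implementation of os.listdir().

```python
# Sketch: skip names that cannot be encoded losslessly instead of
# failing the whole listing or returning names containing "?".
def encodable_names(names, encoding):
    """Split names into (encoded names, names skipped as unencodable)."""
    kept, skipped = [], []
    for name in names:
        try:
            kept.append(name.encode(encoding))  # strict: no replacement
        except UnicodeEncodeError:
            skipped.append(name)
    return kept, skipped

# Invented directory content: one temporary file has an unencodable name.
names = ["song.mp3", "\u65e5\u672c.tmp", "caf\xe9.mp3"]
kept, skipped = encodable_names(names, "cp1252")

# A "*.mp3"-style filter over the bytes names is not disturbed by the
# unencodable temporary file.
mp3_files = [n for n in kept if n.endswith(b".mp3")]
```

Every name that is returned can be passed back to the bytes API, at the cost of hiding the names that cannot.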
History
Date                 User      Action  Args
2010-09-12 21:14:56  vstinner  set     recipients: + vstinner, loewis
2010-09-12 21:14:54  vstinner  link    issue9820 messages
2010-09-12 21:14:53  vstinner  create