Message116230
This reminds me of the discussion in issue #3187. About unencodable filenames,
Guido proposed to ignore them or to use errors="replace", and wrote "Failing
the entire os.listdir() call is not acceptable". (... long discussion ...) And
finally, os.listdir() ignored undecodable filenames on UNIX/BSD.
Then you introduced the ingenious PEP 383 (the error handler was first called
utf8b, then renamed surrogateescape), and os.listdir() now raises an error if
PyUnicode_FromEncodedObject(v, Py_FileSystemDefaultEncoding, "surrogateescape")
fails... which doesn't happen because of an undecodable byte sequence, but only
for other reasons such as a memory error.
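To make the surrogateescape behaviour concrete, here is a minimal sketch. It assumes UTF-8 as the filesystem encoding (the real value depends on the platform and locale) and uses a made-up filename:

```python
# Assumption: the filesystem encoding is UTF-8; the filename is invented.
raw = b"caf\xff.txt"  # 0xff can never appear in valid UTF-8

# Decoding does not fail: the undecodable byte becomes a lone surrogate.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes,
# so the name stays usable to open, rename or delete the file.
roundtrip = name.encode("utf-8", "surrogateescape")

# A strict encode of the same name fails, because lone surrogates
# are not valid in UTF-8.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The round trip is lossless, which is why decoding errors no longer need to be ignored on UNIX/BSD.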
On Windows, os.listdir(str) never fails, but my question is about
os.listdir(bytes). Should os.listdir(bytes) return invalid filenames (encoded
with "mbcs+replace", so the names cannot be used to open, rename or delete the
file) or just ignore them?
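A rough illustration of why such names are unusable. The "mbcs" codec only exists on Windows, so this sketch uses cp1252 as a stand-in for the ANSI code page (an assumption), with a made-up filename:

```python
# Assumption: cp1252 stands in for the Windows ANSI code page ("mbcs").
ANSI_CODE_PAGE = "cp1252"

original = "\u65e5\u672c\u8a9e.txt"  # invented name, not encodable to cp1252

# errors="replace" substitutes "?" for every unencodable character.
lossy = original.encode(ANSI_CODE_PAGE, "replace")

# Decoding the bytes gives a different name: the original is lost,
# so the bytes filename no longer identifies the file on disk.
recovered = lossy.decode(ANSI_CODE_PAGE)
```

Here `lossy` is `b"???.txt"`, and nothing in it allows the real name to be recovered.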
> Ok. Then I'm -1 on the patch: you can't know whether the application
> actually wants to open the file. Perhaps it only wants to display the
> file names, or perhaps it only wants to open some of the files, or
> only traverse into subdirectories.
>
> For backwards compatibility, I recommend to leave things as they are.
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).
"it only wants to open some of the files" is the typical reason why I hate
Python 2 and its implicit conversion between bytes and characters: it works in
most cases, but it fails "sometimes". The problem is to define (and explain)
"sometimes".
The typical use case of listing a directory is a file chooser. On Windows using
the bytes API, it works in most cases, but it fails if the user picks the
"wrong" file (name with "?"). That's the problem I would like to address.
--
The "ignore unencodable filenames" solution is compatible with the "traverse
into subdirectories" case. It also keeps backward compatibility (except that
unencodable files are hidden, which is a lesser problem I think).
--
I proposed to raise an error on unencodable filenames. I changed my mind after
reading your answer and the discussion on #3187. My patch breaks compatibility,
and users don't care about unencodable filenames. E.g. glob("*.mp3") should not
fail if the directory contains a temporary unencodable filename ("xxx.tmp").

Date | User | Action | Args
2010-09-12 21:14:56 | vstinner | set | recipients: + vstinner, loewis
2010-09-12 21:14:54 | vstinner | link | issue9820 messages
2010-09-12 21:14:53 | vstinner | create |