Message 71680
On Thursday 21 August 2008 18:17:47, Guido van Rossum wrote:
> The proper work-around is for the app to pass bytes into os.listdir();
> then it will return bytes.
In my case, I just want to remove a directory with shutil.rmtree(). I
don't know whether it contains bytes or character filenames :-)
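Here is a minimal sketch of the work-around. It assumes present-day Python 3 rather than the 2008 beta discussed here. Given a bytes path, os.listdir() returns raw bytes names that never need decoding, and shutil.rmtree() also accepts a bytes path. The helper name remove_tree_bytes is hypothetical.

```python
import os
import shutil


def remove_tree_bytes(path):
    """Remove a directory tree using only bytes paths (hypothetical helper)."""
    bpath = os.fsencode(path)   # str -> bytes with the filesystem encoding
    names = os.listdir(bpath)   # bytes in, bytes out: nothing is decoded
    shutil.rmtree(bpath)        # rmtree() also accepts a bytes path
    return names
```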
> It would be nice if open() etc. accepted
> bytes (as well as strings of course), at least on Unix, but not
> absolutely necessary -- the app could also just know the right encoding.
An invalid filename has no charset; it's just a "raw" byte string. So open(),
unlink(), etc. have to accept byte strings. Maybe not in the Python-level
version, but at least at the low level (the C version)?
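Present-day Python 3 does accept bytes in open() and os.unlink(). The minimal sketch below assumes a POSIX filesystem that stores names as raw bytes. That is typical on Linux, but some macOS filesystems reject non-UTF-8 names. The sketch creates, lists and deletes a file named b'caf\xe9' ("café" in Latin-1), which is not valid UTF-8. The helper create_and_remove is hypothetical.

```python
import os


def create_and_remove(directory, raw_name):
    """Create, list and delete a file using only a bytes filename."""
    path = os.path.join(os.fsencode(directory), raw_name)  # bytes + bytes
    with open(path, "wb") as f:   # open() accepts a bytes path
        f.write(b"data")
    listed = raw_name in os.listdir(os.fsencode(directory))
    os.unlink(path)               # so does os.unlink()
    return listed, os.path.exists(path)
```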
> I see two reasonable alternatives for what os.listdir() should return
> when the input is a string and one of the filenames can't be decoded:
> either omit it from the output list;
It's not a good option: rmtree() will fail because the directory is not
empty :-/
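To illustrate why omitting names breaks rmtree(), here is a sketch with a hypothetical listdir_omitting(). It drops every name that is not valid UTF-8; UTF-8 as the filesystem encoding is an assumption. A naive, flat-directory remover is built on top of it. The skipped file stays behind, so the final os.rmdir() fails.

```python
import os


def listdir_omitting(path):
    """Hypothetical listdir() that silently drops names it cannot decode."""
    names = []
    for raw in os.listdir(os.fsencode(path)):
        try:
            names.append(raw.decode("utf-8"))  # strict decoding
        except UnicodeDecodeError:
            continue                           # the name just disappears
    return names


def naive_rmtree(path):
    """Remove a flat directory using only the names listdir_omitting() reports."""
    for name in listdir_omitting(path):
        os.unlink(os.path.join(path, name))
    os.rmdir(path)  # raises OSError if a skipped file is still there
```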
> or use errors='replace' in the encoding.
It will also fail because the filenames will be invalid (valid Unicode strings,
but nonexistent file names :-/).
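The errors='replace' problem fits in two lines. Undecodable bytes become U+FFFD, and encoding the result back does not give the original name, so unlink() would look for a file that does not exist. This sketch assumes UTF-8, and replaced_name is a hypothetical helper.

```python
def replaced_name(raw_name, encoding="utf-8"):
    """Decode with errors='replace', then encode the result back to bytes."""
    text = raw_name.decode(encoding, errors="replace")  # bad bytes -> U+FFFD
    return text, text.encode(encoding)
```

Both b'caf\xe9' and b'caf\xff' decode to 'caf\ufffd', so two distinct files would also collapse into one name.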
> Failing the entire os.listdir() call is not acceptable, and
> neither is returning a mixture of str and bytes instances.
OK, I have another suggestion:
- *by default*, listdir() only returns str and raises an error (TypeError?)
on an invalid filename
- add an optional argument (a callback), e.g. "fallback_encoder", to catch
such errors (similar to "onerror" in shutil.rmtree())
Example of a new listdir() implementation (pseudo-code):

    charset = sys.getfilesystemencoding()
    dirobj = opendir(path)
    try:
        for bytesname in readdir(dirobj):   # raw bytes filenames
            try:
                name = str(bytesname, charset)
            except UnicodeDecodeError:
                # undecodable name: let the callback decide
                name = fallback_encoder(bytesname)
            yield name
    finally:
        closedir(dirobj)
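This is a runnable approximation of the pseudo-code above, assuming present-day Python 3. Here os.listdir() on a bytes path stands in for the opendir()/readdir()/closedir() loop. The names listdir_with_fallback and _default_fallback are hypothetical, not an existing os API. The default re-raises the UnicodeDecodeError instead of the TypeError suggested earlier.

```python
import os
import sys


def _default_fallback(name):
    # Called from inside the except clause below, so a bare raise
    # re-raises the UnicodeDecodeError currently being handled.
    raise


def listdir_with_fallback(path, fallback_encoder=_default_fallback):
    """Yield str filenames; pass undecodable raw names to fallback_encoder."""
    charset = sys.getfilesystemencoding()
    # A bytes argument makes os.listdir() return the raw bytes names.
    for bytesname in os.listdir(os.fsencode(path)):
        try:
            name = str(bytesname, charset)
        except UnicodeDecodeError:
            name = fallback_encoder(bytesname)
        yield name
```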
The default fallback_encoder:

    def fallback_encoder(name):
        raise   # re-raise the UnicodeDecodeError being handled

Keep the raw byte string:

    def fallback_encoder(name):
        return name

Create my custom filename object:

    class Filename:
        ...

    def fallback_encoder(name):
        return Filename(name)
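This sketch applies the three policies above to a plain list of raw names, so no special filesystem is needed. The names decode_names, reraise, keep_bytes, wrap_filename and the Filename body are all hypothetical. Giving Filename an __fspath__() method (PEP 519, Python 3.6+) is one assumed way to let open() and os.unlink() accept it.

```python
def decode_names(raw_names, fallback_encoder, charset="utf-8"):
    """Decode each raw filename; undecodable ones go to fallback_encoder."""
    names = []
    for raw in raw_names:
        try:
            names.append(str(raw, charset))
        except UnicodeDecodeError:
            names.append(fallback_encoder(raw))
    return names


def reraise(name):
    # Default policy: re-raise the UnicodeDecodeError being handled.
    raise


def keep_bytes(name):
    # Keep the raw bytes string as-is.
    return name


class Filename:
    """Hypothetical wrapper holding the raw bytes of an undecodable name."""

    def __init__(self, raw):
        self.raw = raw

    def __fspath__(self):
        # Lets os.fspath(), open(), os.unlink() etc. accept this object.
        return self.raw

    def __repr__(self):
        return f"Filename({self.raw!r})"


def wrap_filename(name):
    return Filename(name)
```

Only the callback changes; the decoding loop stays the same.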
If a callback is overkill, we could just add an option,
e.g. "keep_invalid_filename=True", asking listdir() to keep the byte string
when the conversion to Unicode fails.
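Here is the simpler flag variant, sketched under the same assumptions: present-day Python 3 and a hypothetical function name, listdir_flag. By default an undecodable name raises, and keep_invalid_filename=True keeps it as bytes.

```python
import os
import sys


def listdir_flag(path, keep_invalid_filename=False):
    """Hypothetical flag-based variant of the proposal (not an os API)."""
    charset = sys.getfilesystemencoding()
    names = []
    for raw in os.listdir(os.fsencode(path)):
        try:
            names.append(str(raw, charset))
        except UnicodeDecodeError:
            if not keep_invalid_filename:
                raise              # default: refuse undecodable names
            names.append(raw)      # keep the raw bytes string instead
    return names
```

The flag covers only the "keep raw bytes" policy; the callback also allows raising or wrapping.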
In any case, open(), unlink(), etc. have to accept byte strings so that we are
able to read, copy, and remove files with invalid names. In a perfect world,
all filenames would be valid UTF-8 strings, but in the real world (think of the
Matrix :-)), we have to support such strange cases...

Date                | User     | Action | Args
2008-08-21 20:55:34 | vstinner | set    | recipients: + vstinner, gvanrossum, amaury.forgeotdarc, pitrou, benjamin.peterson, HWJ
2008-08-21 20:55:33 | vstinner | link   | issue3187 messages
2008-08-21 20:55:32 | vstinner | create |