os.listdir can return byte strings #47437
Comments
The script below produces 1664 lines of output before it bails out with
Traceback (most recent call last):
File "WalkBug.py", line 5, in <module>
for Dir, SubDirs, Files in os.walk('/home/jarausch') :
File "/usr/local/lib/python3.0/os.py", line 278, in walk
for x in walk(path, topdown, onerror, followlinks):
File "/usr/local/lib/python3.0/os.py", line 268, in walk
if isdir(join(top, name)):
File "/usr/local/lib/python3.0/posixpath.py", line 64, in join
if b.startswith('/'):
TypeError: expected an object with the buffer interface
=========================
#!/usr/local/bin/python3.0
import os
for Dir, SubDirs, Files in os.walk('/home/jarausch') :
    print("processing {0:d} files in {1}".format(len(Files),Dir))
|
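The traceback above ends in posixpath.join, where a str directory is combined with a bytes entry name. The following is a minimal sketch of that mismatch on a current Python 3. The directory "/home/user" and the name b"caf\xe9" are hypothetical stand-ins for the undecodable entry, and the exception message on current versions differs from the 3.0 message in the report.

```python
import posixpath


def join_str_with_bytes():
    """Try to join a str directory with a bytes entry name."""
    try:
        # Both arguments are made-up examples, not taken from the report.
        return posixpath.join("/home/user", b"caf\xe9")
    except TypeError as exc:
        # Mixing str and bytes path components is rejected.
        return exc


result = join_str_with_bytes()
print(type(result).__name__)
```

The sketch only reproduces the type mismatch. It does not show how os.walk came to hold a bytes name in the first place, which is what the rest of the thread investigates.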
Could you tell us what this 1665th line should be? Can you try with an older version of python? |
It's failing because he's giving a string to bytes.startswith when it |
"he's giving a string"... the user simply called os.walk, which accepts We should discover what produced this bytestring. Does listdir() returns |
It seems the conversion to unicode strings (PyUnicode vs PyBytes) was |
The original problem seems to come from some Unix platform, but this
In the posix part of the function, there is the comment (2003-03-04): |
Yes, the next directory contains a filename with an iso-latin1 but non-
The patch (applied to SVN GMT 13:30) does NOT help. |
Hmm, I suppose that while the filename is latin1-encoded, |
Let's make this a release blocker for RCs. |
See bpo-3616 for a consequence of this. |
If the filename can not be encoded correctly in the system charset,
    orig = filename from the kernel (bytes)
    filename = filename from listdir() (str)
    dest = filename to the kernel (bytes)
The goal is to get orig == dest.
In my program Hachoir, to workaround
IMHO, the best solution is to create such class:
    class Filename:
        def __init__(self, orig):
            self.as_bytes = orig
            self.as_str = myformat(orig)
        def __str__(self):
            return self.as_str
        def __bytes__(self):
            return self.as_bytes
New problems:
I guess that functions operating on filenames |
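A runnable sketch of the proposed Filename class, to illustrate the orig == dest goal. The comment does not define myformat(), so decoding with errors="replace" is used here purely as an assumed stand-in. The encoding parameter and the sample name are also hypothetical.

```python
class Filename:
    """Keep the kernel's raw bytes next to a display-only str form."""

    def __init__(self, orig: bytes, encoding: str = "utf-8"):
        self.as_bytes = orig
        # Assumed stand-in for the undefined myformat(): undecodable
        # bytes become U+FFFD, so as_str is suitable for display only.
        self.as_str = orig.decode(encoding, errors="replace")

    def __str__(self):
        return self.as_str

    def __bytes__(self):
        return self.as_bytes


orig = b"caf\xe9.txt"   # hypothetical Latin-1 name, invalid as UTF-8
name = Filename(orig)
dest = bytes(name)      # what would be handed back to the kernel
```

Because the original bytes are stored untouched, the round trip is lossless even though the str form is not.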
According to STINNER Victor <report@bugs.python.org>:
I agree that logically it's the right solution. It's also the most invasive. If |
I wrote a Filename class. I tried different methods:
The idea is to encode str -> bytes (and not bytes -> str because we
I added an example of fixed os.listdir(): create Filename() object if |
I don't think that makes sense (especially under Windows which has Unicode file
Well, of course, if we create a filename type, then all os functions must be
All this is highly speculative of course, and if we really follow this course |
On Thursday 21 August 2008 14:55:43, Antoine Pitrou wrote:
If we use "class Filename(str): ...", we have to ensure that all operations
If Filename has no parent class but is convertible to bytes(), os functions |
This sounds highly optimistic. Also, I think it's wrong to introduce a string-like class with implicit |
The proper work-around is for the app to pass bytes into os.listdir(); I see two reasonable alternatives for what os.listdir() should return |
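The workaround mentioned above, passing bytes into os.listdir(), can be sketched as follows on a current Python 3. The temporary directory and the file name example.txt are made up for the demonstration. os.fsencode() was added after this thread (in Python 3.2) and is used here only to build the bytes form of the path.

```python
import os
import tempfile


def list_both_ways(directory: str):
    """Return the entries of `directory` as str names and as bytes names."""
    str_names = os.listdir(directory)
    # A bytes argument makes os.listdir() return bytes names.
    bytes_names = os.listdir(os.fsencode(directory))
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one hypothetical entry so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_names, bytes_names = list_both_ways(tmp)
```

With a bytes argument no decoding is attempted, so a name that is invalid in the filesystem encoding is still returned unchanged.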
On Thursday 21 August 2008 18:17:47, Guido van Rossum wrote:
In my case, I just would like to remove a directory with shutil.rmtree(). I
An invalid filename has no charset. It's just a "raw" byte string. So open(),
It's not a good option: rmtree() will fail because the directory is not
It will also fail because filenames will be invalid (valid unicode string but
Ok, I have another suggestion:
Example of new listdir implementation (pseudo-code):
    charset = sys.getfilesystemencoding()
    dirobj = opendir(path)
    try:
        for bytesname in readdir(dirobj):
            try:
                name = str(bytesname, charset)
            except UnicodeDecodeError:
                name = fallback_encoder(bytesname)
            yield name
    finally:
        closedir(dirobj)
The default fallback_encoder:
    def fallback_encoder(name):
        raise
Keep raw bytes string:
    def fallback_encoder(name):
        return name
Create my custom filename object:
    class Filename:
        ...
    def fallback_encoder(name):
        return Filename(name)
If a callback is overkill, we can just add an option,
In any case, open(), unlink(), etc. have to accept byte string to be accept to |
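The fallback idea in the pseudo-code above can be tried without opendir()/readdir() by decoding a list of raw names directly. This is only a sketch under assumptions: decode_names, its parameters and the sample names are hypothetical and not part of any Python API.

```python
def decode_names(raw_names, encoding="utf-8", fallback=None):
    """Decode each bytes name; hand undecodable names to `fallback`."""
    for raw in raw_names:
        try:
            yield raw.decode(encoding)
        except UnicodeDecodeError:
            if fallback is None:
                # Default behaviour: propagate the decoding error.
                raise
            yield fallback(raw)


# Hypothetical directory contents: one valid name, one Latin-1 name.
raw_names = [b"ok.txt", b"caf\xe9"]

# "Keep raw bytes string" variant of the fallback.
kept = list(decode_names(raw_names, fallback=lambda raw: raw))
```

The caller then receives a mixed list of str and bytes, which is exactly the situation the thread is debating.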
On Friday 03 October 2008 at 11:43 +0000, STINNER Victor wrote:
Then make it:
    path = path if isinstance(path, str) else bytes(path)
|
path = path is useless in most cases (unicode path); this code is
faster in both cases (bytes or unicode)!
    if not isinstance(path, str):
        path = bytes(path)
|
I've committed sys.setfilesystemencoding as r66769. Declaring it as a documentation issue now. Not sure whether it should |
Reducing priority to critical, it's just docs and tweaks from here.
You should also support bytearray() in ntpath:
No, you shouldn't. I changed my mind on this several times and in the
Amaury: I've reviewed your patch and ran test_ntpath.py on a Linux box.
======================================================================
Traceback (most recent call last):
File "Lib/test/test_ntpath.py", line 188, in test_relpath
tester('ntpath.relpath("a")', 'a')
File "Lib/test/test_ntpath.py", line 22, in tester
gotResult = eval(fn)
File "<string>", line 1, in <module>
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 530, in relpath
start_list = abspath(start).split(sep)
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 499, in abspath
path = join(os.getcwd(), path)
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 137, in join
if b[:1] in seps:
TypeError: 'in <string>' requires string as left operand, not bytes
The fix is to change the fallback abspath to this code:
    def abspath(path):
        """Return the absolute version of a path."""
        if not isabs(path):
            if isinstance(path, bytes):
                cwd = os.getcwdb()
            else:
                cwd = os.getcwd()
            path = join(cwd, path)
        return normpath(path)
Once you fix that please check it in! |
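The fix above picks the working directory in the same type as the argument. A self-contained sketch of the same idea, written against os.path rather than inside ntpath itself, could look like this. The name abspath_sketch is hypothetical.

```python
import os
import os.path


def abspath_sketch(path):
    """Absolute version of `path`, keeping the str/bytes type of the input."""
    if not os.path.isabs(path):
        # os.getcwdb() returns bytes, os.getcwd() returns str.
        cwd = os.getcwdb() if isinstance(path, bytes) else os.getcwd()
        path = os.path.join(cwd, path)
    return os.path.normpath(path)


str_result = abspath_sketch("a")
bytes_result = abspath_sketch(b"a")
```

Keeping both operands of join() in one type is what avoids the TypeError shown in the traceback.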
Assigning to Amaury for Windows fix first. |
Thanks for testing the non-Windows part of ntpath. Leaving the issue open: macpath.py should certainly be modified. |
Sorry Amaury, but there's another issue. test_ntpath now fails when run with -bb:
======================================================================
Traceback (most recent call last):
File "Lib/test/test_ntpath.py", line 151, in test_expandvars
tester('ntpath.expandvars("$foo bar")', "bar bar")
File "Lib/test/test_ntpath.py", line 10, in tester
gotResult = eval(fn)
File "<string>", line 1, in <module>
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 344, in expandvars
if c in ('\'', b'\''): # no expansion within single quotes
BytesWarning: Comparison between bytes and string
======================================================================
Traceback (most recent call last):
File "Lib/test/test_ntpath.py", line 120, in test_normpath
tester("ntpath.normpath('A//////././//.//B')", r'A\B')
File "Lib/test/test_ntpath.py", line 10, in tester
gotResult = eval(fn)
File "<string>", line 1, in <module>
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 465, in normpath
if comps[i] in ('.', '', b'.', b''):
BytesWarning: Comparison between bytes and string
======================================================================
Traceback (most recent call last):
File "Lib/test/test_ntpath.py", line 188, in test_relpath
tester('ntpath.relpath("a")', 'a')
File "Lib/test/test_ntpath.py", line 10, in tester
gotResult = eval(fn)
File "<string>", line 1, in <module>
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 534, in relpath
start_list = abspath(start).split(sep)
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 504, in abspath
return normpath(path)
File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 465, in normpath
if comps[i] in ('.', '', b'.', b''):
BytesWarning: Comparison between bytes and string |
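The warnings above come from membership tests against a tuple that mixes str and bytes constants, so a bytes component is compared with a str one, which -bb turns into an error. The thread does not show the change committed in r66779. The sketch below is only one possible pattern, with a hypothetical helper name, that selects constants of the same type as the input.

```python
def is_noop_component(comp):
    """True for '.' or empty path components, for str or bytes input."""
    # Compare only against constants of the same type as comp, so str
    # and bytes are never compared with each other.
    if isinstance(comp, bytes):
        return comp in (b".", b"")
    return comp in (".", "")


str_checks = [is_noop_component(c) for c in (".", "", "B")]
bytes_checks = [is_noop_component(c) for c in (b".", b"", b"B")]
```

Dispatching on the type once, before the comparison, keeps the code quiet under -bb for both kinds of path.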
FWIW, I don't see a need to change macpath.py -- it's only used for |
Committed r66779: test_ntpath now passes with the -bb option. It seems that the Windows buildbots do not set -bb. |
Thanks Amaury! On to Georg for doc tweaks. Summary:
Stuff that didn't change but that you might want to mention:
Martin already documented sys.setfilesystemencoding(). |
I have a patch for macpath.py nonetheless. I also added tests for three functions which were not exercised at all. |
Amaury, your patch looks good. |
Committed macpath.py in r66781. |
IIUC, these fixes are still not complete: they lack documentation As for test cases: it seems that those got waived, in the hurry. |
On Tuesday 07 October 2008 01:13:22, Martin v. Löwis wrote:
Most (or all) patches include new tests about bytes. Here is a patch for
I wrote a long document about bytes for filenames but not only. I'm still
Can you be more precise? Which tests have to be improved/rewritten? |
Thanks! Committed as r66829. I've added additional documentation in r66830, which should complete
We should discuss that on python-dev, of course - the question is
I was probably looking at the wrong patches (such as getcwd_bytes.patch, |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.