Issue 22862: os.walk fails on undecodable filenames

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67051

classification

Title:	os.walk fails on undecodable filenames
Type:	behavior	Stage:	resolved
Components:	Library (Lib), Unicode	Versions:	Python 2.7

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, fhoech, vstinner
Priority:	normal	Keywords:

Created on 2014-11-13 13:16 by fhoech, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg231110 - (view)	Author: Florian Höch (fhoech) *	Date: 2014-11-13 13:16
If 'top' is a unicode directory name, os.listdir can still return non-unicode (byte string) filenames when they cannot be decoded. The Python 2.x standard library version of os.walk does not handle this case, so join(top, name) fails on such filenames with a UnicodeDecodeError.
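The failure described above comes from Python 2 implicitly decoding the byte string with the default codec (normally ASCII) when it is joined to a unicode string. The sketch below makes that hidden step explicit so it runs on Python 2 and 3 alike; `py2_style_join` is a hypothetical helper written for illustration, not the real `os.path.join`, and it ignores details such as trailing slashes.

```python
def py2_style_join(top, name):
    """Mimic the coercion Python 2 performs when joining unicode and str.

    Illustrative only: the real posixpath.join does more than this.
    """
    if isinstance(name, bytes):
        # Python 2 does this implicitly, using the default codec (ASCII).
        name = name.decode("ascii")
    return top + u"/" + name


try:
    # b"caf\xe9.txt" stands in for a name os.listdir could not decode.
    py2_style_join(u"/tmp/data", b"caf\xe9.txt")
    failed = False
except UnicodeDecodeError:
    failed = True
```

A pure-ASCII byte string passes through the same path without error, which is why the problem only shows up for some filenames.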
msg231111 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-11-13 13:23
What is your OS?
msg231112 - (view)	Author: Florian Höch (fhoech) *	Date: 2014-11-13 13:30
This problem only affects Linux as far as I know (in my case I'm using Fedora 21 Beta).
msg231115 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-11-13 14:40
Your problem has two solutions:
1) Upgrade to Python 3, which handles your use case correctly (thanks to PEP 383 and its surrogateescape error handler).
2) Only process filenames as bytes, and encode/decode manually (so you can decide how to handle undecodable filenames).
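A minimal sketch of both options, assuming Python 3 and UTF-8 as the filename encoding (the os module actually uses the filesystem encoding, which may differ). The sample name and the helper `decode_or_none` are hypothetical, chosen only to illustrate the two approaches.

```python
raw = b"caf\xe9.txt"  # Latin-1 encoded name; not valid UTF-8

# Option 1 (Python 3): the surrogateescape handler from PEP 383 maps each
# undecodable byte to a lone surrogate, so decoding never fails and the
# original bytes can be recovered by encoding with the same handler.
text = raw.decode("utf-8", "surrogateescape")
restored = text.encode("utf-8", "surrogateescape")


# Option 2: keep filenames as bytes and decode explicitly, choosing a policy
# for names that do not decode (here: report them as None).
def decode_or_none(name, encoding="utf-8"):
    try:
        return name.decode(encoding)
    except UnicodeDecodeError:
        return None
```

With option 2 the caller decides what None means: skip the file, log it, or fall back to another encoding.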
msg231117 - (view)	Author: Florian Höch (fhoech) *	Date: 2014-11-13 14:50
1) Is not yet possible for me unfortunately, some libraries I require are not yet available for Python 3 (but in the long run, this would be my preferred solution)
2) Would necessitate too many changes in a carefully crafted, unicode-only application.
I think I'll just override os.listdir and filter out filenames that are not decodable, or override os.walk and do something equivalent.
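The filtering workaround mentioned above could look roughly like the following. This is a sketch under assumptions, not the code the reporter ended up using: `only_decoded` and `walk_skip_undecodable` are hypothetical names, and the walk is a simplified top-down version that does not follow symlinks or report errors.

```python
import os
import shutil
import sys
import tempfile

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def only_decoded(names):
    """Keep names that os.listdir returned as text; drop raw byte strings."""
    return [n for n in names if isinstance(n, TEXT_TYPE)]


def walk_skip_undecodable(top):
    """Simplified top-down walk that silently skips undecodable names."""
    dirs, files = [], []
    for name in only_decoded(os.listdir(top)):
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            files.append(name)
    yield top, dirs, files
    for name in dirs:
        path = os.path.join(top, name)
        if not os.path.islink(path):
            for item in walk_skip_undecodable(path):
                yield item


# Small demo on a temporary tree; 'top' must be a text (unicode) path.
root = tempfile.mkdtemp()
if isinstance(root, bytes):  # Python 2 returns a byte string here
    root = root.decode(sys.getfilesystemencoding())
try:
    os.mkdir(os.path.join(root, u"sub"))
    open(os.path.join(root, u"sub", u"a.txt"), "w").close()
    seen = [(dirs, files) for _, dirs, files in walk_skip_undecodable(root)]
finally:
    shutil.rmtree(root)
```

The trade-off is that skipped files become invisible to the application, which may or may not be acceptable. On Python 3 the filter drops nothing, because os.listdir returns text for every name when given a text path.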
msg231118 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-11-13 14:57
> 1) Is not yet possible for me unfortunately, some libraries I require are not yet available for Python 3 (but in the long run, this would be my preferred solution)
I'm curious, which libraries?
Oh, I forgot to say that it's not possible to fix this issue in Python 2. Backporting the PEP 383 in Python 2 requires deep changes in the Unicode machinery, starting by the UTF-8 codec. Currently, the UTF-8 encoder encodes surrogates, which violates the Unicode standard and makes it impossible to use this codec with the surrogateescape error handler.
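The point about surrogates can be observed from Python 3, where the strict UTF-8 encoder rejects them. The sketch below assumes Python 3; the `surrogatepass` handler is used only to approximate the byte sequence an encoder that accepts surrogates, as the message describes for Python 2, would emit.

```python
lone = "\udce9"  # what surrogateescape produces for the raw byte 0xE9

# Strict UTF-8 in Python 3 refuses lone surrogates. That refusal is what
# gives an error handler such as surrogateescape the chance to step in.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogateescape the original byte comes back.
escaped = lone.encode("utf-8", "surrogateescape")

# surrogatepass encodes the surrogate as if it were an ordinary code point,
# so no error is raised and the original byte is lost.
passed = lone.encode("utf-8", "surrogatepass")
```

If the encoder never raises an error for surrogates, an error handler is never invoked, which is why the codec itself would have to change first.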
msg231120 - (view)	Author: Florian Höch (fhoech) *	Date: 2014-11-13 15:16
> I'm curious, which libraries?
wxPython and wexpect (wexpect I could probably port myself, so the problem is mainly with wx)
> Oh, I forgot to say that it's not possible to fix this issue in Python 2. Backporting the PEP 383 in Python 2 requires deep changes in the Unicode machinery, starting by the UTF-8 codec.
Ok, that's understandable of course.

History
Date	User	Action	Args
2022-04-11 14:58:10	admin	set	github: 67051
2014-11-13 15:16:27	fhoech	set	messages: + msg231120
2014-11-13 15:15:54	r.david.murray	set	status: open -> closed resolution: wont fix stage: resolved
2014-11-13 14:57:11	vstinner	set	messages: + msg231118
2014-11-13 14:50:07	fhoech	set	messages: + msg231117
2014-11-13 14:40:44	vstinner	set	messages: + msg231115
2014-11-13 13:30:54	fhoech	set	messages: + msg231112
2014-11-13 13:23:11	vstinner	set	nosy: + ezio.melotti, vstinner messages: + msg231111 components: + Unicode
2014-11-13 13:16:20	fhoech	create