Issue 10039: python é.py fails with UnicodeEncodeError if PYTHONFSENCODING is used

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/54248

classification

Title:	python é.py fails with UnicodeEncodeError if PYTHONFSENCODING is used
Type:		Stage:
Components:	Interpreter Core, Unicode	Versions:	Python 3.2

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	eric.araujo, r.david.murray, vstinner
Priority:	normal	Keywords:	patch

Created on 2010-10-07 01:25 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
redecode_filename.patch	vstinner, 2010-10-07 01:25

Messages (7)
msg118089 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-10-07 01:25
If a program's name and/or full path contains a non-ASCII character and PYTHONFSENCODING is set to an encoding different from the locale encoding, Python fails to open the program. Example in the utf-8 locale:

    $ PYTHONFSENCODING=ascii ./python é.py
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)

This issue is similar to #9992 and #10014.

Solutions: remove the PYTHONFSENCODING environment variable, or redecode the filename from the locale encoding to the filesystem encoding. The attached patch implements the latter.

--

We may also redecode Py_GetProgramName().
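The failure and the proposed redecoding can be sketched in pure Python. This is only an illustration, not the attached C patch: it assumes the command-line bytes were decoded with the locale encoding (utf-8) and the surrogateescape error handler, and the helper name redecode is hypothetical.

```python
# Illustration only: mimics the mismatch between the locale encoding (utf-8)
# and a filesystem encoding forced to ascii. Not the code of the patch.

raw_argv = "\xe9.py".encode("utf-8")  # bytes of "é.py" as passed in a utf-8 locale
decoded = raw_argv.decode("utf-8", "surrogateescape")  # the str 'é.py'

# Encoding that str with the ascii filesystem encoding fails, as in the report.
error = None
try:
    decoded.encode("ascii", "surrogateescape")
except UnicodeEncodeError as exc:
    error = exc


def redecode(name: str, locale_encoding: str, fs_encoding: str) -> str:
    """Hypothetical helper: go back to the original bytes, then decode them
    with the filesystem encoding so that they can be encoded again later."""
    raw = name.encode(locale_encoding, "surrogateescape")
    return raw.decode(fs_encoding, "surrogateescape")


redecoded = redecode(decoded, "utf-8", "ascii")
# The non-ASCII bytes now travel as lone surrogates and round-trip unchanged.
roundtrip = redecoded.encode("ascii", "surrogateescape")
```

After redecoding, the name no longer displays as "é.py", but encoding it with the filesystem encoding yields the original bytes, which is what opening the file requires.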
msg118436 - (view)	Author: Éric Araujo (eric.araujo) *	Date: 2010-10-12 17:06
I don’t understand why reading a filename would not respect the envvar stating the filesystem encoding.
msg118444 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-10-12 17:29
Éric, if you are saying, "the user asked for it, it should fail", then that is indeed one of the arguments put forward in issue 9992 where this was discussed. But I think the emerging consensus is that it is better to just avoid the problem by always using the locale on Unix, and solve the problem that PYTHONFSENCODING was supposed to solve in a different way (by always using utf-8 on OSX and unicode on Windows).
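To see which encodings a given interpreter actually uses, the two values discussed here can be printed side by side. This is a minimal sketch; the reported names depend on the platform, the Python version and interpreter settings, so no particular output is assumed.

```python
import locale
import sys

# Encoding Python uses to convert between str filenames and OS-level names.
fs_encoding = sys.getfilesystemencoding()

# Encoding derived from the user's locale settings (False: do not call setlocale).
locale_encoding = locale.getpreferredencoding(False)

print(f"filesystem encoding: {fs_encoding}")
print(f"locale encoding:     {locale_encoding}")
```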
msg118445 - (view)	Author: Éric Araujo (eric.araujo) *	Date: 2010-10-12 17:39
> if you are saying, "the user asked for it, it should fail", then
> that is indeed one of the arguments put forward in issue 9992 where
> this was discussed.

You could put it that way, thanks for phrasing my thoughts :)

> But I think the emerging consensus is that it is better to just avoid
> the problem by always using the locale on Unix,

*displays his lack of knowledge* Is it always correct to decode a filename with the locale encoding on Unix? Can’t each filesystem have its own encoding?

> and solve the problem that PYTHONFSENCODING was supposed to solve in a
> different way (by always using utf-8 on OSX and unicode on Windows).

If there is a better alternate way, let’s go for it, and maybe remove PYTHONFSENCODING altogether, since it’s new in 3.2.

Thanks for explaining! I’ll repay your time by reviewing the doc patches.
msg118492 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-10-13 00:25
> Is it always correct to decode a filename with the locale encoding
> on Unix?

Do you know something better than the locale encoding? I don't.

> Can’t each filesystem have its own encoding?

Yes, but how do you get the encoding of each filesystem? I think that few or no applications support such a case without mojibake. Backup programs can use the "raw" (bytes) API of Python 3 to avoid all encoding issues.

--

As R. David Murray wrote, read issue #9992 if you would like to know more about this problem and the different proposed solutions. I voted for the removal of PYTHONFSENCODING, which fixes most issues.
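The "raw" (bytes) API mentioned above can be sketched as follows: when a path is passed as bytes, os.listdir() returns bytes names, so no decoding step is involved. The helper name list_raw and the sample filename are assumptions made for this example.

```python
import os
import tempfile


def list_raw(directory: bytes) -> list[bytes]:
    """Hypothetical helper: return directory entries as bytes, undecoded."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    tmp_raw = os.fsencode(tmp)  # bytes form of the directory path
    name = "caf\xe9.txt".encode("utf-8")  # sample non-ASCII name, as bytes
    # open() and os.path.join() both accept bytes paths.
    with open(os.path.join(tmp_raw, name), "wb") as f:
        f.write(b"data")
    entries = list_raw(tmp_raw)

print(entries)
```

A backup tool written this way can copy names byte for byte and only decode them, for instance with os.fsdecode(), when it needs to display them.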
msg118593 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-10-13 22:20
Fixed by r85430 (remove PYTHONFSENCODING), see #9992.
msg119039 - (view)	Author: Éric Araujo (eric.araujo) *	Date: 2010-10-18 17:03
> Do you know something better than the locale encoding? I don't.

Neither do I, sorry.

>> Can’t each filesystem have its own encoding?
> Yes, but how do you get the encoding of each filesystem?

If I really had to, on Linux I could parse the output of the mount command, but this could get messy quickly, and of course is not okay for official Python.

> Backup programs can use the "raw" (bytes) API of Python 3 to avoid
> all encoding issues.

Neat!

> As R. David Murray wrote, read issue #9992 if you would like to know
> more about this problem and the different proposed solutions.

I did so, thanks for the pointer and all the explanations.

History
Date	User	Action	Args
2022-04-11 14:57:07	admin	set	github: 54248
2010-10-18 17:03:42	eric.araujo	set	messages: + msg119039
2010-10-13 22:20:20	vstinner	set	status: open -> closed resolution: fixed messages: + msg118593
2010-10-13 00:25:38	vstinner	set	messages: + msg118492
2010-10-12 17:39:45	eric.araujo	set	messages: + msg118445
2010-10-12 17:29:26	r.david.murray	set	nosy: + r.david.murray messages: + msg118444
2010-10-12 17:06:01	eric.araujo	set	nosy: + eric.araujo messages: + msg118436
2010-10-07 11:33:05	vstinner	link	issue10014 dependencies
2010-10-07 01:25:31	vstinner	create