Setting the default filesystem-encoding #64046

Closed
Sworddragon mannequin opened this issue Nov 30, 2013 · 12 comments
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@Sworddragon
Mannequin

Sworddragon mannequin commented Nov 30, 2013

BPO 19847
Nosy @vstinner

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2013-12-13.10:24:56.924>
created_at = <Date 2013-11-30.22:29:52.577>
labels = ['type-feature', 'invalid', 'docs']
title = 'Setting the default filesystem-encoding'
updated_at = <Date 2017-12-18.14:32:57.602>
user = 'https://bugs.python.org/Sworddragon'

bugs.python.org fields:

activity = <Date 2017-12-18.14:32:57.602>
actor = 'vstinner'
assignee = 'docs@python'
closed = True
closed_date = <Date 2013-12-13.10:24:56.924>
closer = 'vstinner'
components = ['Documentation']
creation = <Date 2013-11-30.22:29:52.577>
creator = 'Sworddragon'
dependencies = []
files = []
hgrepos = []
issue_num = 19847
keywords = []
message_count = 12.0
messages = ['204853', '204998', '205000', '205001', '205002', '205006', '205008', '205058', '205982', '206049', '206113', '308563']
nosy_count = 3.0
nosy_names = ['vstinner', 'docs@python', 'Sworddragon']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue19847'
versions = ['Python 3.3', 'Python 3.4']

@Sworddragon
Mannequin Author

Sworddragon mannequin commented Nov 30, 2013

sys.getfilesystemencoding() says for Unix: On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or 'utf-8' if nl_langinfo(CODESET) failed.

In my opinion relying on the locale environment is risky since filesystem-encoding != locale. This is especially the case when working on a filesystem from an external medium such as an external hard disk drive. Operating on multiple media can also result in different filesystem-encodings.

It would be useful if the user can make his own checks and change the default filesystem-encoding if needed.
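For reference, the two values being compared in this report can be inspected as follows. This is only a minimal sketch: the variable names are illustrative, `locale.nl_langinfo()` exists only on Unix-like platforms (hence the guard), and the printed values depend entirely on the environment the interpreter runs in.

```python
import locale
import sys

# Encoding Python uses to convert between str filenames and OS-level bytes.
fs_encoding = sys.getfilesystemencoding()

# nl_langinfo() is not available on every platform, so guard the call.
if hasattr(locale, "nl_langinfo"):
    try:
        # Adopt the user's locale settings; without this the "C" locale applies.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale the system does not support.
        pass
    locale_codeset = locale.nl_langinfo(locale.CODESET)
else:
    locale_codeset = None

print("filesystem encoding:", fs_encoding)
print("locale codeset:     ", locale_codeset)
```

Neither value says anything about the encoding actually used for names on a particular mounted volume, which is the point raised above.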

@Sworddragon Sworddragon mannequin added topic-IO type-feature A feature request or enhancement labels Nov 30, 2013
@vstinner
Member

vstinner commented Dec 2, 2013

"sys.getfilesystemencoding() says for Unix: On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or 'utf-8' if nl_langinfo(CODESET) failed."

Oh, this documentation has been wrong since at least Python 3.2: if nl_langinfo(CODESET) fails, Python exits immediately with a (fatal) error.

There is no (more?) such fallback to "utf-8".

@vstinner
Member

vstinner commented Dec 2, 2013

I fixed the documentation, thanks for your report!

@vstinner vstinner added docs Documentation in the Doc dir and removed topic-IO labels Dec 2, 2013
@vstinner vstinner closed this as completed Dec 2, 2013
@vstinner
Member

vstinner commented Dec 2, 2013

@Sworddragon
Mannequin Author

Sworddragon mannequin commented Dec 2, 2013

It is nice that you could fix the documentation thanks to this report, but this was just a side effect - so closing this report and moving it to "Documentation" was maybe wrong.

@vstinner
Member

vstinner commented Dec 2, 2013

(Oops, I specified the wrong issue number in my commits.)

New changeset b231e0c3fd26 by Victor Stinner in branch '3.3':
Issue bpo-19728: Fix sys.getfilesystemencoding() documentation
http://hg.python.org/cpython/rev/b231e0c3fd26

New changeset e3c48bddf621 by Victor Stinner in branch 'default':
(Merge 3.3) Issue bpo-19728: Fix sys.getfilesystemencoding() documentation
http://hg.python.org/cpython/rev/e3c48bddf621

@vstinner
Member

vstinner commented Dec 2, 2013

"It is nice that you could fix the documentation thanks to this report, but this was just a side effect - so closing this report and moving it to "Documentation" was maybe wrong."

Oh sorry, I read the issue too quickly and stopped at the first sentence. I am reopening the issue to reply to the other points.

"In my opinion relying on the locale environment is risky since filesystem-encoding != locale. This is especially the case when working on a filesystem from an external medium such as an external hard disk drive. Operating on multiple media can also result in different filesystem-encodings."

This issue is not specific to Python. If you mount a USB key formatted as VFAT with the wrong encoding on Linux, you will get mojibake in your file explorer. You get the same issue if you connect to a network share (e.g. NFS) using a different encoding than the server. You can find many other examples (hint: Mac OS X and Unicode normalization).

There is no good compromise here. The only two safe options are:

(A) convert the filenames of your filesystem to the same encoding as your computer (there are tools for that, like convmv)

(B) use raw bytes instead of Unicode; Python 3 should accept bytes anywhere that OS data is expected (filenames, command line arguments, environment variables)
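
A minimal sketch of option (B), assuming a Unix-like system: the file name `example.txt` and the byte string `b"caf\xe9.txt"` are made up for illustration. Passing a bytes path to `os.listdir()` returns the names as undecoded bytes, and `os.fsdecode()` / `os.fsencode()` use the surrogateescape error handler on Unix, so undecodable bytes should survive a round trip through `str`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a file with a plain ASCII name so the sketch runs anywhere.
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("data")

    # A bytes path makes os.listdir() return bytes names, with no decoding step.
    raw_names = os.listdir(os.fsencode(tmp))

print(raw_names)

# Latin-1 bytes that are not valid UTF-8 (illustrative value).
undecodable = b"caf\xe9.txt"

# On Unix the invalid byte is mapped to a lone surrogate instead of raising.
as_text = os.fsdecode(undecodable)

# Encoding again restores the original bytes.
round_tripped = os.fsencode(as_text)
print(round_tripped == undecodable)
```

On Windows the filesystem encoding uses a different error handler, so the round-trip part of this sketch is not expected to apply there.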

All operating systems (except Windows) now use UTF-8 by default for the locale encoding, so mojibake in filenames should gradually become very rare.

"It would be useful if the user can make his own checks and change the default filesystem-encoding if needed."

This idea was already proposed in issue bpo-8622, but it was a big fail. Please read my following email for more information:
https://mail.python.org/pipermail/python-dev/2010-October/104509.html

@vstinner vstinner reopened this Dec 2, 2013
@Sworddragon
Mannequin Author

Sworddragon mannequin commented Dec 2, 2013

"This idea was already proposed in issue bpo-8622, but it was a big fail."

Not completely: if your locale is UTF-8 and you want to operate on a UTF-8 filesystem, all is fine. But what if you then want to operate on an NTFS (non-UTF-8) partition? As far as I know, there is no way to apply Python environment variables on the fly so that they affect the running interpreter. In my opinion, this is the reason why a setter is needed here.

Otherwise the user has to make sure to use .encode() on all filesystem operations. He must also ensure that .encode() doesn't throw any exception if the code must be robust. And with issue http://bugs.python.org/issue19846 this most likely has to be done for the content too. All the exception checking will greatly increase the number of lines.
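
To illustrate the kind of per-call handling described above, here is a small sketch. `encode_filename` is a hypothetical helper invented for this example, not an existing API, and the encodings passed to it are arbitrary choices.

```python
from typing import Optional


def encode_filename(name: str, encoding: str) -> Optional[bytes]:
    """Hypothetical helper: encode a name for a volume that uses `encoding`.

    Returns None when the name cannot be represented in that encoding.
    """
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None


# "caf\u00e9.txt" is representable in both UTF-8 and Latin-1.
print(encode_filename("caf\u00e9.txt", "utf-8"))
print(encode_filename("caf\u00e9.txt", "latin-1"))

# Japanese characters have no Latin-1 representation, so this yields None.
print(encode_filename("\u65e5\u672c\u8a9e.txt", "latin-1"))
```

Every filesystem call would need to go through a wrapper like this, and the caller still has to decide what to do with a `None` result.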

Is there a particular reason against such a setter? Its advantage would be a huge increase in the maintainability of Python scripts that rely on high stability.

@vstinner
Member

See also issue bpo-19846.

@vstinner
Member

I'm closing this issue as invalid for the same reason that I closed issue bpo-19846:
http://bugs.python.org/issue19846#msg205675

@vstinner
Member

I created issue bpo-19977 as a follow-up to this one: "Use surrogateescape error handler for sys.stdout on UNIX for the C locale".

@vstinner
Member

Follow-up: PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted and implemented in Python 3.7. Python 3.7 now uses UTF-8 by default for the POSIX locale, and the encoding can be forced to UTF-8 using the -X utf8 option.
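
A quick way to observe the effect of the -X utf8 option, sketched under the assumption that the current interpreter is Python 3.7 or later: it starts a child interpreter with the option and reads back `sys.flags.utf8_mode` and `sys.getfilesystemencoding()`. The variable names are illustrative.

```python
import subprocess
import sys

# Code run by the child interpreter: report the UTF-8 mode flag and the
# filesystem encoding it ended up with.
child_code = "import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())"

# Launch the same interpreter with UTF-8 mode forced on.
result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

utf8_flag, child_fs_encoding = result.stdout.split()
print("utf8_mode flag:     ", utf8_flag)
print("filesystem encoding:", child_fs_encoding)
```

The option only takes effect at interpreter startup, so it does not provide the runtime setter requested earlier in this issue.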

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022