Add a new 'surrogatereplace' output only error handler #66215

Closed
ncoghlan opened this issue Jul 20, 2014 · 3 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs); stdlib (Python modules in the Lib dir); type-feature (A feature request or enhancement)

Comments

@ncoghlan
Contributor

BPO 22016
Nosy @ncoghlan, @vstinner

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2014-08-21.12:35:21.094>
created_at = <Date 2014-07-20.11:19:14.619>
labels = ['interpreter-core', 'type-feature', 'library']
title = "Add a new 'surrogatereplace' output only error handler"
updated_at = <Date 2017-12-18.14:36:37.556>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2017-12-18.14:36:37.556>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2014-08-21.12:35:21.094>
closer = 'ncoghlan'
components = ['Interpreter Core', 'Library (Lib)']
creation = <Date 2014-07-20.11:19:14.619>
creator = 'ncoghlan'
dependencies = []
files = []
hgrepos = []
issue_num = 22016
keywords = []
message_count = 3.0
messages = ['223508', '225607', '308565']
nosy_count = 2.0
nosy_names = ['ncoghlan', 'vstinner']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue22016'
versions = ['Python 3.5']

@ncoghlan
Contributor Author

This would be along the same lines as xmlcharrefreplace and backslashreplace, but only affect surrogate escaped characters.

Unlike surrogateescape, which reproduces the escaped bytes directly in the output stream, this would follow the 'replace' error handler and insert an appropriately encoded '?' character in the output stream instead.

The use case would be any context where losing the escaped characters is preferred to potentially injecting arbitrary binary data into the output (surrogateescape), failing with an exception (strict), or the behaviour of any of the other existing error handlers.

It would differ from 'replace' in that normal code points that can't be encoded at all would still trigger an error.
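
For illustration only, the proposed behaviour can be approximated today with `codecs.register_error`. This is a minimal sketch, not an implementation from this issue: the handler name `surrogatereplace` is the one proposed above, while the function name `surrogatereplace_errors` is hypothetical. It assumes "surrogate escaped characters" means code points in the range U+DC80 to U+DCFF, which is what surrogateescape produces on decoding.

```python
import codecs


def surrogatereplace_errors(exc):
    """Hypothetical handler: emit '?' for surrogate-escaped bytes only."""
    if not isinstance(exc, UnicodeEncodeError):
        # Output only: decoding and translation errors are not handled.
        raise exc
    bad = exc.object[exc.start:exc.end]
    # surrogateescape maps undecodable bytes 0x80-0xFF to U+DC80-U+DCFF.
    if all(0xDC80 <= ord(ch) <= 0xDCFF for ch in bad):
        return ("?" * len(bad), exc.end)
    # Any other unencodable code point still triggers the error.
    raise exc


codecs.register_error("surrogatereplace", surrogatereplace_errors)
```

With this registered, `text.encode("ascii", "surrogatereplace")` replaces each escaped byte with `?`, while a string containing an ordinary unencodable character such as `é` still raises `UnicodeEncodeError`.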

@ncoghlan ncoghlan added interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir labels Jul 20, 2014
@ncoghlan
Contributor Author

Stephen Turnbull suggested on python-dev that this was a bad idea, and after reconsidering the current behaviour in Python 2, I realised that setting surrogateescape and letting the terminal deal with the consequences is exactly what we want.

What confused me is that ls replaces the unknown characters with question marks in the C locale:

$ ls
ニコラス.txt
$ LANG=C ls
????????????.txt

Python 2 passes the bytes through, regardless of locale:

$ python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
$ LANG=C python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt

Current Python 3 gets confused if the C locale is set, as the encoding on sys.stdout gets set to "ascii", which breaks roundtripping:

$ python3 -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
$ LANG=C python3 -c "import os; print(os.listdir('.')[0])"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

However, Python 3.5 will already set "surrogateescape" on sys.stdout by default, reproducing the behaviour of *Python 2*, rather than the behaviour of ls:

$ LANG=C ~/devel/py3k/python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
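
The three behaviours compared above can be reproduced without touching the locale by encoding explicitly. This is a rough sketch that assumes the file name is stored as UTF-8 bytes on disk and that the C locale implies the "ascii" codec; the variable names are illustrative only.

```python
# The file name as raw bytes, as a UTF-8 filesystem would store it.
raw = "ニコラス.txt".encode("utf-8")

# What os.listdir() yields under an ASCII locale: each undecodable byte
# becomes a lone surrogate in the range U+DC80 to U+DCFF.
name = raw.decode("ascii", errors="surrogateescape")

# surrogateescape on output restores the original bytes (Python 2-like).
roundtrip = name.encode("ascii", errors="surrogateescape")

# 'replace' on output yields one '?' per escaped byte (ls-like).
replaced = name.encode("ascii", errors="replace")

# The default strict handler fails, as in the traceback above.
try:
    name.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(roundtrip == raw, replaced, strict_failed)
```

Four katakana characters occupy three bytes each in UTF-8, which is why both the traceback (positions 0-11) and the ls output show twelve affected positions.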

@ncoghlan ncoghlan added the type-feature A feature request or enhancement label Aug 21, 2014
@vstinner
Member

Follow-up: PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted and implemented in Python 3.7!
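
To see which settings are in effect on a given interpreter, something like the following can be run under different locales. It is only a sketch: `describe_stdout` is a hypothetical helper name, and `sys.flags.utf8_mode` is available only on Python 3.7 and later.

```python
import sys


def describe_stdout():
    """Report how sys.stdout will encode text and handle encoding errors."""
    return {
        "encoding": getattr(sys.stdout, "encoding", None),
        "errors": getattr(sys.stdout, "errors", None),
        # Set by the UTF-8 mode from PEP 540 (Python 3.7+).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


print(describe_stdout())
```

Running it once normally and once with `LANG=C` shows whether the interpreter falls back to ASCII, coerces the locale, or enables UTF-8 mode.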

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022