classification
Title: Add support of UnicodeTranslateError in standard error handlers
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: doerwalter, lemburg, martin.panter, ncoghlan, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-03-15 21:59 by serhiy.storchaka, last changed 2015-03-26 22:45 by serhiy.storchaka.

Files
File name Uploaded Description
translate_error_handlers.patch serhiy.storchaka, 2015-03-15 21:59
translate_error_handlers_2.patch serhiy.storchaka, 2015-03-16 06:45
Messages (8)
msg238163 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-15 21:59
The proposed patch adds support for UnicodeTranslateError to the standard error handlers "xmlcharrefreplace", "namereplace" and "surrogatepass". Support in "backslashreplace" was added in issue22286, "strict", "ignore" and "replace" have always supported it, and support in "surrogateescape" is unlikely to be possible.

This can be used with issue18814.
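
For illustration, here is a minimal sketch of how the handlers that already accept UnicodeTranslateError behave when called directly. The exception instance and its reason string are made up for the example; nothing in the stdlib raises it here. It assumes Python 3.5 or later, where "backslashreplace" has the support added in issue22286.

```python
import codecs

# Hypothetical failure: pretend U+20AC at index 1 could not be translated.
exc = UnicodeTranslateError("a\u20acb", 1, 2, "illustrative failure")

# Each handler returns (replacement text, index at which to resume).
ignore_result = codecs.lookup_error("ignore")(exc)
replace_result = codecs.lookup_error("replace")(exc)
backslash_result = codecs.lookup_error("backslashreplace")(exc)

# "strict" raises the exception it was given instead of returning.
strict_raised = False
try:
    codecs.lookup_error("strict")(exc)
except UnicodeTranslateError:
    strict_raised = True

print(ignore_result, replace_result, backslash_result, strict_raised)
```

The handlers named in the patch ("xmlcharrefreplace", "namereplace", "surrogatepass") are left out on purpose, since their behaviour with UnicodeTranslateError depends on whether the patch is applied.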
msg238180 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-16 06:45
Fixed a bug in "surrogatepass" when translating and added the versionchanged directive.
msg238973 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-23 02:34
I think I saw that your patch for Issue 18814 proposes using UnicodeTranslateError. Is there any other case where it is used, either currently or in the past? All I know of it is the documentation, which says it is raised “during translating”.

Experimenting with the constructor reveals that the “object” attribute is only allowed to be a text string (not bytes). So perhaps “translating” actually means converting from text strings to text strings, like “rot-13”. It would be nice if this were documented somewhere, rather than just saying translating is now supported.
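
The constructor experiment described above can be reproduced with a short sketch like the following. The argument values are arbitrary placeholders chosen for the example.

```python
# UnicodeTranslateError takes (object, start, end, reason); there is no
# encoding argument, unlike UnicodeEncodeError and UnicodeDecodeError.
exc = UnicodeTranslateError("abc", 1, 2, "illustrative reason")
fields = (exc.object, exc.start, exc.end, exc.reason)

# Passing bytes as the "object" argument is rejected with TypeError.
try:
    UnicodeTranslateError(b"abc", 1, 2, "illustrative reason")
except TypeError:
    bytes_rejected = True
else:
    bytes_rejected = False

print(fields, bytes_rejected)
```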
msg239018 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-23 13:41
No, currently UnicodeTranslateError is not used in the stdlib in 3.x. But it is documented and supported by some error handlers. I think it should be used more widely in text-to-text translations, similar to what is proposed in issue18814.
msg239353 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-26 21:52
I'm sorry, I don't understand this issue. Could you please elaborate on the use case? Why do you want to support more error handlers? str.translate() calls _PyUnicode_TranslateCharmap() with errors="ignore", so it's not possible to choose the error handler.
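
A small sketch of the Python-level behaviour this refers to: str.translate() accepts only the table, so a caller has no way to pass an error handler, and characters missing from the table are copied through unchanged. The sample strings are arbitrary.

```python
# Characters without an entry in the table are left as they are.
unmapped = "abc".translate({})

# Mapping a code point to None deletes that character.
deleted = "abc".translate({ord("b"): None})

# There is no second parameter for an error handler name.
try:
    "abc".translate({}, "strict")
except TypeError:
    extra_argument_rejected = True
else:
    extra_argument_rejected = False

print(unmapped, deleted, extra_argument_rejected)
```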

Many codecs are implemented in Python and some of them are implemented with "charmap". Does this issue enhance the codecs implemented with "charmap"?

"a\udc80".encode("latin9", "surrogatepass") raises UnicodeEncodeError with and without the patch, b"\x81".decode("cp1252", "surrogatepass") raises UnicodeDecodeError with and without the patch.

Hum, I'm not sure that codecs.charmap_build() is related to str.translate().
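
The two calls quoted above can be checked with a sketch like this one. The helper name raised_by is introduced only for the example.

```python
def raised_by(func):
    """Return the name of the UnicodeError subclass func raises, or None."""
    try:
        func()
    except UnicodeError as exc:
        return type(exc).__name__
    return None

# "surrogatepass" only handles the UTF-8/16/32 codecs; for a charmap codec
# such as latin9 or cp1252 the original error is raised instead.
encode_error = raised_by(lambda: "a\udc80".encode("latin9", "surrogatepass"))
decode_error = raised_by(lambda: b"\x81".decode("cp1252", "surrogatepass"))

print(encode_error, decode_error)
```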
msg239355 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-26 22:20
str.encode, bytes.decode and str.translate are unrelated to UnicodeTranslateError. But str.transform could be.
msg239357 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-26 22:29
Serhiy Storchaka added the comment:
> str.encode, bytes.decode and str.translate are unrelated to UnicodeTranslateError. But str.transform could be.

Can you please give an example of Python code to show your change?
msg239358 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-26 22:45
issue18814
History
Date User Action Args
2015-03-26 22:45:41 serhiy.storchaka set messages: + msg239358
2015-03-26 22:29:08 vstinner set messages: + msg239357
2015-03-26 22:20:59 serhiy.storchaka set messages: + msg239355
2015-03-26 21:52:05 vstinner set nosy: + vstinner; messages: + msg239353
2015-03-23 13:41:39 serhiy.storchaka set assignee: serhiy.storchaka; messages: + msg239018
2015-03-23 02:34:53 martin.panter set messages: + msg238973
2015-03-20 02:59:30 martin.panter set nosy: + martin.panter
2015-03-16 06:45:11 serhiy.storchaka set files: + translate_error_handlers_2.patch; messages: + msg238180
2015-03-15 22:09:04 serhiy.storchaka link issue18814 dependencies
2015-03-15 21:59:08 serhiy.storchaka create