This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Add the "namereplace" error handler
Type: enhancement
Stage: resolved
Components: Unicode
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: amaury.forgeotdarc, ethan.furman, ezio.melotti, lemburg, ncoghlan, ned.deily, python-dev, serhiy.storchaka, steven.daprano, vstinner
Priority: normal
Keywords: needs review, patch

Created on 2013-11-21 07:41 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: namereplace_errors.patch
Uploaded: serhiy.storchaka, 2013-11-21 07:41
Messages (13)
msg203579 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-21 07:41
The proposed patch adds the "namereplace" error handler. This error handler is almost the same as the "backslashreplace" error handler, but it uses \N{...} escape sequences when the character has a name in the Unicode database. The result is a little more human-readable (but less portable) than with "backslashreplace".

>>> '∀ x∈ℜ'.encode('ascii', 'namereplace')
b'\\N{FOR ALL} x\\N{ELEMENT OF}\\N{BLACK-LETTER CAPITAL R}'
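
For comparison, here is a minimal sketch that encodes the same string with "namereplace" and with two existing handlers. It assumes Python 3.5 or later, where the handler is available. The private-use code point U+E000 is only an illustrative choice of a character that has no name, so the handler falls back to a numeric escape for it.

```python
# Compare "namereplace" with two existing error handlers on the same input.
text = '∀ x∈ℜ'

by_name = text.encode('ascii', 'namereplace')
by_code = text.encode('ascii', 'backslashreplace')
by_xml = text.encode('ascii', 'xmlcharrefreplace')

print(by_name)  # b'\\N{FOR ALL} x\\N{ELEMENT OF}\\N{BLACK-LETTER CAPITAL R}'
print(by_code)  # b'\\u2200 x\\u2208\\u211c'
print(by_xml)   # b'&#8704; x&#8712;&#8476;'

# U+E000 is a private-use code point with no name in the Unicode database,
# so "namereplace" produces the same output as "backslashreplace" here.
unnamed = '\ue000'.encode('ascii', 'namereplace')
print(unnamed)  # b'\\ue000'
```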

The proposal was discussed and bikeshedded on Python-Ideas: http://comments.gmane.org/gmane.comp.python.ideas/21296 .
msg203580 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-21 07:56
See also issue #18234.
msg231647 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-25 09:30
Ping.
msg231649 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2014-11-25 10:34
The patch looks good to me.
But it seems that the reverse operation is not possible in the general case: .decode('unicode_escape') assumes a latin-1 or ascii encoding.
Should we document this?
msg231650 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2014-11-25 10:54
The patch looks good.

One nit: the name buffer length should be NAME_MAXLEN instead of 100.
msg231652 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-11-25 10:58
Patch looks good to me, too.

As far as Amaury's question goes, isn't the general reverse operation the same as for the existing backslashreplace handler?

That is, decode with the appropriate ASCII compatible encoding (since ASCII compatibility is needed for the escape sequences to be valid), then run the result through ast.literal_eval?

(I'll grant we don't currently provide guidance on reversing backslashreplace either, but addressing that sounds like a separate question from this change)
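
The reverse operation described above could be sketched roughly as follows. This is an illustration, not guidance from the documentation: the helper name `undo_escapes` is hypothetical, and it assumes the original text contained no literal backslashes and no newlines, since the error handlers do not escape those and the result would be ambiguous or fail to parse.

```python
import ast


def undo_escapes(data, encoding='ascii'):
    """Hypothetical helper: reverse "namereplace"/"backslashreplace" output.

    Assumes the original text had no literal backslashes or newlines.
    """
    # Step 1: decode with the same ASCII-compatible encoding used to encode.
    text = data.decode(encoding)
    # Step 2: wrap the text in a string literal and let literal_eval
    # interpret the \N{...}, \xNN, \uNNNN and \UNNNNNNNN escapes.
    literal = '"' + text.replace('"', '\\"') + '"'
    return ast.literal_eval(literal)


original = 'café ∀'
encoded = original.encode('latin-1', 'namereplace')
print(encoded)                           # b'caf\xe9 \\N{FOR ALL}'
print(undo_escapes(encoded, 'latin-1'))  # café ∀
```

Unlike .decode('unicode_escape'), this approach decodes the non-escaped bytes with the encoding that was actually used, so it is not limited to latin-1 or ASCII data.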
msg231653 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-11-25 12:13
New changeset 32d08aacffe0 by Serhiy Storchaka in branch 'default':
Issue #19676: Added the "namereplace" error handler.
https://hg.python.org/cpython/rev/32d08aacffe0
msg231654 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-25 12:29
Thank you all for the reviews.

> One nit: the name buffer length should be NAME_MAXLEN instead of 100.

NAME_MAXLEN is a private name available only in Modules/unicodedata.c. Making it a public name would be a separate issue. I have increased the buffer size to 256.
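
As a rough sanity check of the 256 figure, the following sketch scans every code point and reports the longest character name known to the running interpreter. It only measures the names exposed by the `unicodedata` module of whichever Python version runs it, so it says nothing about the value of NAME_MAXLEN itself or about future Unicode versions.

```python
import unicodedata

# Find the longest character name in this interpreter's Unicode database.
longest_name = ''
for code_point in range(0x110000):
    # Code points without a name yield the empty-string default.
    name = unicodedata.name(chr(code_point), '')
    if len(name) > len(longest_name):
        longest_name = name

print(len(longest_name), longest_name)
```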
msg231672 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-11-25 16:59
New changeset b6fab008d63a by Berker Peksag in branch 'default':
Issue #19676: Tweak documentation a bit.
https://hg.python.org/cpython/rev/b6fab008d63a
msg231700 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-11-26 10:14
New changeset 21d1571c0533 by Serhiy Storchaka in branch 'default':
Issue #19676: Fixed integer overflow issue in "namereplace" error handler.
https://hg.python.org/cpython/rev/21d1571c0533
msg231701 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-26 10:14
Thank you Berker.
msg231727 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-11-26 20:08
../../source/Python/codecs.c:1022:16: error: use of undeclared identifier 'out'; did you
      mean 'outp'?
        assert(out == start + ressize);
               ^~~
               outp
msg231728 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-11-26 20:27
Fixed in ce8a8531d29a
History
Date | User | Action | Args
2022-04-11 14:57:53 | admin | set | github: 63875
2014-11-26 20:27:44 | ned.deily | set | status: open -> closed; nosy: lemburg, amaury.forgeotdarc, ncoghlan, vstinner, ned.deily, ezio.melotti, steven.daprano, ethan.furman, python-dev, serhiy.storchaka; messages: + msg231728
2014-11-26 20:08:52 | ned.deily | set | status: closed -> open; nosy: + ned.deily; messages: + msg231727
2014-11-26 10:14:28 | serhiy.storchaka | set | messages: + msg231701
2014-11-26 10:14:01 | python-dev | set | messages: + msg231700
2014-11-25 16:59:24 | python-dev | set | messages: + msg231672
2014-11-25 12:29:35 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg231654; stage: patch review -> resolved
2014-11-25 12:13:46 | python-dev | set | nosy: + python-dev; messages: + msg231653
2014-11-25 10:58:27 | ncoghlan | set | messages: + msg231652
2014-11-25 10:54:01 | lemburg | set | messages: + msg231650
2014-11-25 10:34:07 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg231649
2014-11-25 09:30:55 | serhiy.storchaka | set | keywords: + needs review; assignee: serhiy.storchaka; messages: + msg231647; versions: + Python 3.5, - Python 3.4
2013-11-23 14:46:13 | serhiy.storchaka | set | nosy: + lemburg, ncoghlan, steven.daprano, ethan.furman
2013-11-21 07:56:24 | vstinner | set | messages: + msg203580
2013-11-21 07:41:46 | serhiy.storchaka | create |