Issue 18614: Enhanced \N{} escapes for Unicode strings

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/62814

classification

Title:	Enhanced \N{} escapes for Unicode strings
Type:	enhancement	Stage:
Components:	Unicode	Versions:	Python 3.4

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, mrabarnett, steven.daprano, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2013-08-01 13:54 by steven.daprano, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description
issue18614.patch	mrabarnett, 2013-08-01 16:46

Messages (3)
msg194075 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2013-08-01 13:54
As per the discussion here:

http://mail.python.org/pipermail/python-ideas/2013-July/022419.html

\N{} escapes should support the Unicode code point notation U+xxxx (where there are four, five or six hex digits after the U+). E.g.

    '\N{U+03BB}' => 'λ'

unicodedata.lookup should also support such numeric names, e.g.:

    unicodedata.lookup('U+03BB') => 'λ'

As '+' is otherwise prohibited in Unicode character names, there should never be ambiguity between 'U+xxxx' as a code point and an actual name, and a single lookup function can handle both. (See http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf#G39 for details on characters allowed in names.)

Also add a function for the reverse:

    unicodedata.codepoint('λ') => 'U+03BB'

    def codepoint(c):
        return 'U+{:04X}'.format(ord(c))
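
For illustration only, the proposed behaviour can be approximated in pure Python today. This is a minimal sketch, not the attached patch: the wrapper name `lookup`, the module-level `codepoint` function and the `_U_NOTATION` pattern are hypothetical, and accepting lowercase hex digits is an assumption the proposal does not state.

```python
import re
import unicodedata

# Hypothetical pattern: 'U+' followed by four to six hex digits.
# Accepting lowercase hex digits is an assumption of this sketch.
_U_NOTATION = re.compile(r'U\+([0-9A-Fa-f]{4,6})\Z')


def lookup(name):
    """Resolve either U+xxxx notation or an ordinary character name."""
    match = _U_NOTATION.match(name)
    if match:
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(match.group(1), 16))
    # Fall back to the existing lookup, which raises KeyError
    # for unknown names.
    return unicodedata.lookup(name)


def codepoint(c):
    """Reverse direction, as written in the original message."""
    return 'U+{:04X}'.format(ord(c))
```

With this sketch, `lookup('U+03BB')` and `lookup('GREEK SMALL LETTER LAMDA')` both return 'λ', and `codepoint('λ')` returns 'U+03BB'. Because '+' cannot appear in a real character name, the pattern check never shadows a valid name.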
msg194087 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2013-08-01 16:46
I've attached a patch for this.
msg194123 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2013-08-01 22:04
I agree with the proposal. Some of the code seems redundant with code we already have. In Python, I would write

    def codepoint_from_U_notation(name, namelen):
        if not (4 <= namelen <= 6):
            raise <wrong length>
        return chr(int(name, 16))

maybe with try-except to re-write error messages like

    ValueError: invalid literal for int() with base 16: '99x3'
    ValueError: chr() arg not in range(0x110000)

My point is that we already have code to convert hex strings to int; I presume PyUnicode_FromOrdinal(code) is the C version of 'chr' that already checks the max value.
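
A runnable version of that idea might look like the sketch below. It is an illustration under assumptions, not the code from the patch: the function takes only the digits after 'U+', the error messages are invented, and an explicit hex-digit check is added because `int(s, 16)` is more permissive than the U+ notation (it also accepts a '0x' prefix, a sign, surrounding whitespace and, in recent Python versions, underscores).

```python
import string

_HEX_DIGITS = frozenset(string.hexdigits)


def codepoint_from_u_notation(digits):
    """Convert the hex digits that follow 'U+' into a one-character string."""
    if not (4 <= len(digits) <= 6):
        raise ValueError('expected 4 to 6 hex digits, got %d' % len(digits))
    # int() alone would accept strings such as '0x3B' or '+3BB',
    # so reject anything that is not a plain hex digit first.
    if not set(digits) <= _HEX_DIGITS:
        raise ValueError('invalid hex digits: %r' % digits)
    try:
        return chr(int(digits, 16))
    except ValueError:
        # chr() rejects values above 0x10FFFF; re-word its message.
        raise ValueError('code point out of range: U+%s' % digits) from None
```

The length test and the range test still rely on existing machinery (`int` and `chr`), which is the point of the comment above; only the digit check is extra.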

History
Date	User	Action	Args
2022-04-11 14:57:48	admin	set	github: 62814
2013-08-01 22:04:55	terry.reedy	set	nosy: + terry.reedy messages: + msg194123
2013-08-01 16:46:11	mrabarnett	set	files: + issue18614.patch nosy: + mrabarnett messages: + msg194087 keywords: + patch
2013-08-01 13:54:05	steven.daprano	create