Issue 13958: Comment _PyUnicode_FromId

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/58166

classification

Title:	Comment _PyUnicode_FromId
Type:		Stage:
Components:	Unicode	Versions:	Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Jim.Jewett, ezio.melotti, loewis, vstinner
Priority:	normal	Keywords:

Created on 2012-02-06 21:07 by Jim.Jewett, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg152775 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2012-02-06 21:07
Add a comment explaining why _PyUnicode_FromId can (and should) assume ASCII-only identifiers.

/* PEP3131 guarantees that all python-internal identifiers are ASCII-only.
   Violating this would break some supported C compilers. */

See http://mail.python.org/pipermail/python-dev/2012-February/116234.html
msg152778 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-02-06 21:25
This has nothing to do with PEP 3131. Python could (and does) support non-ASCII identifiers just fine, regardless of C compiler limitations.
msg152791 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2012-02-06 22:08
On Mon, Feb 6, 2012 at 4:25 PM, Martin v. Löwis <report@bugs.python.org> wrote:
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
> This has nothing to do with PEP 3131. Python could (and does)
> support non-ASCII identifiers just fine, regardless of C compiler
> limitations.

I think you're saying that the names usable with _Py_Identifier() form a smaller set than identifiers in general. Would the following be more accurate?

/* PEP3131 does allow non-ASCII identifiers in user code, but limits
   their use within the implementation itself. In particular, a
   _Py_Identifier may be passed directly to C code; such identifiers
   are restricted to ASCII to avoid breaking some supported C compilers. */
msg152792 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2012-02-06 22:10
And is there a way to characterize the compilers that would break? Is it a few specific compilers, or "compilers that do not implement UTF8, which is not required by the C standard", or ...
msg152793 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-02-06 22:13
Using _Py_static_string(), you can write literal UTF-8 strings using hexadecimal escape sequences. It works on any C compiler. E.g. _Py_static_string(ecute, "\xc3\xa9").
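As a rough illustration of the hex-escape approach described above, the sketch below keeps the C source file pure ASCII while still producing the two UTF-8 bytes of "é" (U+00E9) in the compiled string. It deliberately does not use `_Py_static_string()` or any other CPython header so that it compiles on its own; the names `e_acute`, `is_ascii_only`, `utf8_code_points`, and `check_escapes` are hypothetical helpers invented for this example and are not part of CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The source text below is ASCII-only; the two hex escapes become the
   UTF-8 encoding of U+00E9 in the compiled string literal. */
static const char e_acute[] = "\xc3\xa9";

/* Hypothetical helper: returns 1 if every byte of the NUL-terminated
   string s is in the ASCII range (0x00..0x7F), 0 otherwise. */
static int is_ascii_only(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: counts code points in a string that is assumed
   to be valid UTF-8, by skipping continuation bytes (10xxxxxx). */
static size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Self-check: the escaped literal is two bytes, one code point, and
   not ASCII, while an ordinary identifier-like name is ASCII-only. */
static void check_escapes(void)
{
    assert(strlen(e_acute) == 2);
    assert(utf8_code_points(e_acute) == 1);
    assert(!is_ascii_only(e_acute));
    assert(is_ascii_only("__name__"));
}
```

One caveat when writing such literals by hand: a C hex escape consumes as many hexadecimal digits as follow it, so an escape immediately followed by a letter in the range a-f or a digit needs to be split into adjacent string literals.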
msg152954 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2012-02-09 16:04
After clarification, the original change was backed out. These are C Identifiers, and nothing beyond ASCII is guaranteed, but other characters are in practice possible.

History
Date	User	Action	Args
2022-04-11 14:57:26	admin	set	github: 58166
2012-02-09 16:04:29	Jim.Jewett	set	status: open -> closed resolution: fixed messages: + msg152954
2012-02-06 22:13:11	vstinner	set	messages: + msg152793
2012-02-06 22:10:35	Jim.Jewett	set	messages: + msg152792
2012-02-06 22:08:42	Jim.Jewett	set	messages: + msg152791
2012-02-06 21:28:20	pitrou	set	nosy: + vstinner
2012-02-06 21:25:06	loewis	set	nosy: + loewis messages: + msg152778
2012-02-06 21:07:59	Jim.Jewett	create