
classification
Title: Comment _PyUnicode_FromId
Type:
Stage:
Components: Unicode
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Jim.Jewett, ezio.melotti, loewis, vstinner
Priority: normal
Keywords:

Created on 2012-02-06 21:07 by Jim.Jewett, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg152775 - Author: Jim Jewett (Jim.Jewett) (Python triager) Date: 2012-02-06 21:07
Add a comment explaining why _PyUnicode_FromId can (and should) assume ASCII-only identifiers.


	/* PEP3131 guarantees that all python-internal identifiers
	   are ASCII-only.  Violating this would break some supported
	   C compilers. */

See http://mail.python.org/pipermail/python-dev/2012-February/116234.html
msg152778 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2012-02-06 21:25
This has nothing to do with PEP 3131. Python could (and does) support non-ASCII identifiers just fine, regardless of C compiler limitations.
msg152791 - Author: Jim Jewett (Jim.Jewett) (Python triager) Date: 2012-02-06 22:08
On Mon, Feb 6, 2012 at 4:25 PM, Martin v. Löwis <report@bugs.python.org> wrote:

> Martin v. Löwis <martin@v.loewis.de> added the comment:

> This has nothing to do with PEP 3131. Python could (and does)
> support non-ASCII identifiers just fine, regardless of C compiler
> limitations.

I *think* you're saying that the names usable with _Py_Identifier() are a
smaller set than identifiers in general.  Would the following be more accurate?

	/* PEP3131 does allow non-ASCII identifiers in user code, but
	   limits their use within the implementation itself.
	   In particular, a _Py_Identifier may be passed directly to
	   C code; such identifiers are restricted to ASCII to avoid
	   breaking some supported C compilers. */
msg152792 - Author: Jim Jewett (Jim.Jewett) (Python triager) Date: 2012-02-06 22:10
And is there a way to characterize the compilers that would break?  Is
it a few specific compilers, or "compilers that do not implement UTF-8,
which the C standard does not require", or ...
msg152793 - Author: STINNER Victor (vstinner) (Python committer) Date: 2012-02-06 22:13
Using _Py_static_string(), you can write literal UTF-8 strings using hexadecimal escape sequences. It works on any C compiler. E.g. _Py_static_string(ecute, "\xc3\xa9").
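
The hex-escape technique can be checked in plain C without any Python headers. The sketch below is only an illustration: the names eacute_utf8 and decode_utf8_2byte are hypothetical and are not part of CPython. It assumes nothing beyond a hosted C compiler, and shows that "\xc3\xa9" is a two-byte string that decodes to U+00E9 (é) while the source file itself stays pure ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name, for illustration only: U+00E9 written as its two
   UTF-8 bytes with hexadecimal escapes, so the source stays pure ASCII. */
static const char eacute_utf8[] = "\xc3\xa9";

/* Decode a 2-byte UTF-8 sequence into its code point.
   Returns -1 if the input is not a well-formed 2-byte sequence. */
static long decode_utf8_2byte(const char *s)
{
    unsigned char b0 = (unsigned char)s[0];
    unsigned char b1;
    long cp;

    if ((b0 & 0xE0) != 0xC0)      /* lead byte must look like 110xxxxx */
        return -1;
    b1 = (unsigned char)s[1];
    if ((b1 & 0xC0) != 0x80)      /* continuation must look like 10xxxxxx */
        return -1;
    cp = ((long)(b0 & 0x1F) << 6) | (long)(b1 & 0x3F);
    if (cp < 0x80)                /* reject overlong encodings */
        return -1;
    return cp;
}

/* Small self-check: the literal is two bytes long and decodes to U+00E9. */
static void check_eacute_literal(void)
{
    assert(strlen(eacute_utf8) == 2);
    assert(decode_utf8_2byte(eacute_utf8) == 0xE9);
}
```

One caveat when writing such literals: a C hex escape consumes every hex digit that follows it, so a literal like "\xc3\xa9e" would not mean "é" followed by "e". Splitting the literal into adjacent pieces ("\xc3\xa9" "e") avoids that.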
msg152954 - Author: Jim Jewett (Jim.Jewett) (Python triager) Date: 2012-02-09 16:04
After clarification, the original change was backed out.

These are C identifiers: only ASCII is guaranteed to work, although other characters are possible in practice.
History
Date                 User        Action  Args
2022-04-11 14:57:26  admin       set     github: 58166
2012-02-09 16:04:29  Jim.Jewett  set     status: open -> closed; resolution: fixed; messages: + msg152954
2012-02-06 22:13:11  vstinner    set     messages: + msg152793
2012-02-06 22:10:35  Jim.Jewett  set     messages: + msg152792
2012-02-06 22:08:42  Jim.Jewett  set     messages: + msg152791
2012-02-06 21:28:20  pitrou      set     nosy: + vstinner
2012-02-06 21:25:06  loewis      set     nosy: + loewis; messages: + msg152778
2012-02-06 21:07:59  Jim.Jewett  create