Issue 526840: PEP 263 Implementation


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/36217

classification

Title:	PEP 263 Implementation
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 2.3

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	gvanrossum, lemburg, loewis
Priority:	high	Keywords:	patch

Created on 2002-03-07 08:55 by loewis, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name	Uploaded	Description
codings.diff	loewis, 2002-03-21 10:25	Version 2

Messages (9)
msg39158	Author: Martin v. Löwis (loewis)	Date: 2002-03-07 08:55
The attached patch implements PEP 263. The following differences from the PEP (rev. 1.8) are known:
- The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ", ', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP.
- The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii".
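For readers unfamiliar with PEP 263, the lookup of the declared encoding can be sketched in Python as follows. This is only an illustration, not the patch's C code: the function names `declared_encoding` and `check_source` are hypothetical, the pattern is the one from the published PEP text (which may differ from rev. 1.8), and the "ascii" default mirrors the check described above.

```python
import re

# Pattern from the published PEP 263 text; the declaration must be a comment
# on the first or second line of the file.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes, default: str = "ascii") -> str:
    """Return the encoding named in the first two lines, else `default`."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


def check_source(source: bytes) -> str:
    """Hypothetical helper: verify that every byte follows the encoding.

    Raises UnicodeDecodeError if the source does not decode cleanly.
    """
    encoding = declared_encoding(source)
    source.decode(encoding)
    return encoding
```

With no declaration, a source file containing a byte above 127 fails the check because it is not valid ASCII.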
msg39159	Author: Martin v. Löwis (loewis)	Date: 2002-03-07 09:11
A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an otherwise unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value.
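The wrapping idea can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the patch's C node structure: the `Node` class and the helpers `wrap_with_encoding` and `unwrap` are hypothetical names, and `str_value` stands in for what STR() returns.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node."""
    type: str
    str_value: Optional[str] = None          # what STR() would return
    children: List["Node"] = field(default_factory=list)


def wrap_with_encoding(root: Node, encoding: str) -> Node:
    """Put an encoding_decl pseudo node above the true root."""
    return Node("encoding_decl", str_value=encoding, children=[root])


def unwrap(tree: Node) -> Tuple[Optional[str], Node]:
    """Return (encoding, true root); encoding is None if no wrapper exists."""
    if tree.type == "encoding_decl":
        return tree.str_value, tree.children[0]
    return None, tree
```

A consumer of the tree only needs to check the type of the topmost node to recover the encoding, and can then continue with the true root as before.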
msg39160	Author: Marc-Andre Lemburg (lemburg)	Date: 2002-03-07 11:06
Thank you! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...).
msg39161	Author: Guido van Rossum (gvanrossum)	Date: 2002-03-07 14:06
I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch).
msg39162	Author: Marc-Andre Lemburg (lemburg)	Date: 2002-03-07 18:01
Ok, I've had a look at the patch. It looks good except for the overly complicated implementation of the unicode-escape codec. Even though there's a bit of code duplication, I'd prefer to have two separate functions here: one for the standard char* pointer type and another one for Py_UNICODE, i.e. PyUnicode_DecodeUnicodeEscape(char...) and PyUnicode_DecodeUnicodeEscapeFromUnicode(Py_UNICODE*...). This is easier to support and gives better performance, since the compiler can optimize the two functions under different assumptions. You'll also need to add name mangling for the new API at the top of the header.
msg39163	Author: Martin v. Löwis (loewis)	Date: 2002-03-07 18:24
Changing the decoding functions will not result in one additional function, but in two of them: you'll also get PyUnicode_DecodeRawUnicodeEscapeFromUnicode. That seems quite unmaintainable to me: any change now needs to propagate into four functions. OTOH, I don't think that the code that allows parsing variable-sized strings is overly complicated.
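As background to the count of four functions: there are two escape codecs, and each would need a char* and a Py_UNICODE* variant. The behavioural difference between the two codecs can be seen from Python. This is only an illustration using the codec names available in current Python 3, not the C API under discussion.

```python
# Input bytes: c a f \ u 0 0 e 9 \ n  (backslashes are literal characters)
data = b"caf\\u00e9\\n"

# unicode_escape interprets \uXXXX and the ordinary backslash escapes.
cooked = data.decode("unicode_escape")

# raw_unicode_escape interprets only \uXXXX and \UXXXXXXXX; the trailing
# backslash-n stays as two characters.
raw = data.decode("raw_unicode_escape")
```

Here `cooked` ends in a real newline, while `raw` ends in a backslash followed by the letter n.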
msg39164	Author: Martin v. Löwis (loewis)	Date: 2002-03-21 10:25
Version 2 of this patch implements revision 1.11 of the PEP (phase 1). The check of the complete source file for compliance with the declared encoding is implemented by decoding the input line by line; I believe that for all supported encodings, this gives the same result as decoding the entire source file at once.
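The equivalence claimed above can be tried out with a small Python sketch. The helper name `decode_by_line` is hypothetical, and the sketch splits on byte-level line endings, which is only safe for ASCII-compatible encodings; an encoding such as UTF-16 would not survive this split.

```python
def decode_by_line(source: bytes, encoding: str) -> str:
    """Decode each physical line separately and join the results.

    Raises UnicodeDecodeError on the first line that does not follow
    the given encoding.
    """
    lines = source.splitlines(keepends=True)  # keep the newline bytes
    return "".join(line.decode(encoding) for line in lines)


# Sample sources in two ASCII-compatible encodings.
utf8_source = "x = 'é'\ny = 'ü'\n".encode("utf-8")
latin1_source = "x = 'é'\ny = 'ü'\n".encode("latin-1")

same_utf8 = decode_by_line(utf8_source, "utf-8") == utf8_source.decode("utf-8")
same_latin1 = decode_by_line(latin1_source, "latin-1") == latin1_source.decode("latin-1")
```

Both comparisons hold for these samples, because no multi-byte sequence in these encodings contains a newline byte.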
msg39165	Author: Marc-Andre Lemburg (lemburg)	Date: 2002-04-11 16:23
Apart from the codec changes, the patch looks ok. I would still like two APIs for the two different codec tasks, though. I don't expect anything much to change in the codecs, so maintenance is not an issue.
msg39166	Author: Martin v. Löwis (loewis)	Date: 2002-08-04 16:59
This patch has been superseded by 534304.

History
Date	User	Action	Args
2022-04-10 16:05:04	admin	set	github: 36217
2002-03-07 08:55:40	loewis	create