Message 39338 - Python tracker



Author	loewis
Date	2002-03-25.13:23:00

Content

The patch looks good, but needs a number of improvements.

1. I have problems building this code. When trying to build
pgen, I get the following error:

Parser/parsetok.c: In function `parsetok':
Parser/parsetok.c:175: `encoding_decl' undeclared

The problem here is that graminit.h, which pgen itself
generates, hasn't been built yet, but parsetok.c already
refers to the symbol.

2. For some reason, error reporting for incorrect encodings
does not work: the traceback appears to show the wrong
line.

3. The escape processing in Unicode literals is incorrect.
For example, u"\<non-ASCII character>" should denote only
the non-ASCII character. However, your implementation
replaces the non-ASCII character with \u<hex>, resulting in
\\u<hex>, so the first backslash escapes the second one and
the \u<hex> sequence is never decoded.

4. I believe the escape processing in byte strings is also
incorrect for encodings that allow \ (0x5C) as the second
byte of a multi-byte character. Before processing escape
characters, you convert back into the source encoding. If
this produces a 0x5C byte, escape processing will
misinterpret that byte as the start of an escape sequence.
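
To illustrate point 3, here is a minimal sketch in Python rather than the C of the patch. The helper escape_non_ascii is hypothetical and only imitates the rewriting step as I understand it; it is not the patch's code. It shows how a backslash that precedes a non-ASCII character ends up escaping the backslash of the generated \u<hex> sequence.

```python
# Characters between the quotes of u"\<e-acute>": a backslash, then U+00E9.
body = "\\" + "\u00e9"


def escape_non_ascii(text):
    # Hypothetical stand-in for the rewriting step: each non-ASCII
    # character becomes a backslash, "u", and four hex digits.
    return "".join(c if ord(c) < 128 else "\\u%04x" % ord(c) for c in text)


# Two backslashes followed by "u00e9".
rewritten = escape_non_ascii(body)

# Escape processing now sees an escaped backslash followed by plain text,
# so the \u sequence is never decoded back into U+00E9.
decoded = rewritten.encode("ascii").decode("unicode_escape")

print(repr(rewritten))
print(repr(decoded))
```

The decoded result contains a literal backslash and the text u00e9, and the original character is lost.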
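
Point 4 can be sketched in a similar way, assuming Shift_JIS as the source encoding, where KATAKANA LETTER SO (U+30BD) is encoded as the bytes 0x83 0x5C. The function naive_unescape is a hypothetical, deliberately simplified byte-level escape processor, not the one in the patch; it handles only a few escapes, which is enough to show the failure.

```python
# Hypothetical, simplified escape table: the byte after 0x5C -> replacement.
SIMPLE_ESCAPES = {ord("n"): b"\n", ord("t"): b"\t", ord("\\"): b"\\"}


def naive_unescape(data: bytes) -> bytes:
    # Treats every 0x5C byte as a backslash, without checking whether it
    # is the trailing byte of a multi-byte character.
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C and i + 1 < len(data) and data[i + 1] in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[data[i + 1]]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


# KATAKANA LETTER SO followed by a plain "n"; no escape sequence intended.
text = "\u30bdn"
encoded = text.encode("shift_jis")   # second byte of SO is 0x5C
damaged = naive_unescape(encoded)    # 0x5C + "n" is taken for a newline escape

print(encoded)
print(damaged)
```

The trailing byte of the two-byte character is consumed together with the following "n", so the result no longer represents the original text. Processing escapes before converting back to the source encoding would avoid this.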

History
Date	User	Action	Args
2007-08-23 15:11:47	admin	link	issue534304 messages
2007-08-23 15:11:47	admin	create