Title: parser module chokes on unusual characters
Components: Library (Lib) Versions: Python 3.0
Status: closed Resolution: out of date
Created on 2008-03-12 18:33 by dbinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.

parsermodule.patch dbinger, 2008-03-12 18:33 patch with unit test and proposed change
msg63482 - Author: David Binger (dbinger) Date: 2008-03-12 18:33
This is with the current revision of py3k: 61353.

parser.suite('"\u1234"') fails with a TypeError.

Changing the argument format from "s" to "s#" works around this problem.

I added a unit test for this.  With "s" changed to "s#", the same
test exposes another bug: sequence2st() mangles a string literal
containing \u1234.

The last section of the patch seems to correct the second bug.
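
A minimal sketch of how the two symptoms above could be checked from Python. The helper name check_suite is hypothetical and is not taken from the attached patch or its unit test. The parser module was removed in Python 3.10, so the sketch returns None when it cannot be imported.

```python
import warnings

try:
    with warnings.catch_warnings():
        # parser emits a DeprecationWarning on import in Python 3.9.
        warnings.simplefilter("ignore", DeprecationWarning)
        import parser  # removed in Python 3.10
except ImportError:
    parser = None

# The source text from the report: a string literal containing U+1234.
SOURCE = '"\u1234"'


def check_suite(source=SOURCE):
    """Hypothetical helper, not part of the patch.

    Returns None if the parser module is unavailable, False if
    parser.suite() raises TypeError (the first symptom), and otherwise
    whether a round trip through sequence2st() preserves the tree
    (the second symptom).
    """
    if parser is None:
        return None
    try:
        st = parser.suite(source)
    except TypeError:
        return False
    original = st.totuple()
    rebuilt = parser.sequence2st(original).totuple()
    return rebuilt == original
```

On an interpreter where both problems are fixed, check_suite() should return True; on an affected build it would be expected to return False.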

(I think getargs.c's handling of "s" has a problem with a
unicode string containing a character whose encoding is longer
than 1 byte.  Its test for null bytes at the end does not work
correctly.)
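
To illustrate the suspicion above, here is a rough Python model of a length-based null-byte check. It is an assumption about the kind of comparison involved, not a transcription of the C code in getargs.c, and both function names are hypothetical.

```python
# The argument from the report: 3 characters, but 5 bytes in UTF-8,
# because U+1234 encodes to 3 bytes.
source = '"\u1234"'
char_count = len(source)
byte_count = len(source.encode("utf-8"))


def c_strlen(buf):
    """Length up to the first null byte, as C's strlen() would report."""
    return len(buf.split(b"\x00", 1)[0])


def has_embedded_null_naive(text):
    # Assumed faulty check: compares a byte length with a character count,
    # so any character longer than 1 byte looks like an embedded null.
    buf = text.encode("utf-8")
    return c_strlen(buf) != len(text)


def has_embedded_null(text):
    # Both lengths are measured in bytes, so only real nulls are reported.
    buf = text.encode("utf-8")
    return c_strlen(buf) != len(buf)
```

With this model, has_embedded_null_naive(source) reports a null byte that is not there, while has_embedded_null(source) does not. That would be consistent with "s" rejecting the string and "s#", which takes an explicit length, accepting it.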
msg69564 - Author: Kuba Fast (kfast) Date: 2008-07-11 21:35
I get no problem in 3.0b1. Should this be closed?

>>> parser.suite('"\u1234"')
< object at 0xb7ceebd0>
msg69578 - Author: David Binger (dbinger) Date: 2008-07-12 03:07
On Jul 11, 2008, at 5:35 PM, Kuba Fast wrote:

> I get no problem in 3.0b1. Should this be closed?

I think so.  It looks like this has been fixed.

