Issue599377
Created on 2002-08-23 19:16 by dcjim, last changed 2003-06-14 15:10 by loewis.
| Messages (4) | |||
|---|---|---|---|
| msg12145 | Author: Jim Fulton (dcjim) | Date: 2002-08-23 19:16 | |
With Python 2.2.1 or the CVS head as of this posting, when Python is configured for 4-byte Unicode (--enable-unicode=ucs4), searches with Unicode regular expressions that use characters above \xff don't seem to work. Here's an example: invalid_xml_char = re.compile(u'[\ud800-\udfff]'); invalid_xml_char.search(u'\ud800') returns None rather than a match. |
|||
| msg12146 | Author: Peter Schneider-Kamp (nowonder) | Date: 2002-08-27 16:49 | |
I could reproduce this behaviour exactly. No idea what is causing it, though. |
|||
| msg12147 | Author: Martin v. Löwis (loewis) | Date: 2002-09-26 16:53 | |
Added a workaround in sre_compile 1.44 and 1.41.14.2: it disables big charsets for UCS-4 builds. I am leaving this report open so that a proper fix can be designed. |
|||
| msg12148 | Author: Martin v. Löwis (loewis) | Date: 2003-06-14 15:10 | |
This is now fixed for Python 2.3, with _sre.c 2.89. |
|||
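The reproduction from the first message can be sketched as a small self-contained check. This is only an illustration written for a current Python 3 interpreter, where every string is Unicode and the `u''` prefix is optional; it is not the code that was run in 2002, and the helper name `has_invalid_xml_char` is hypothetical. On a build affected by this issue, the search would be expected to return None for the surrogate input.

```python
import re

# Character class covering the surrogate range U+D800..U+DFFF,
# taken from the pattern in the original report.
invalid_xml_char = re.compile('[\ud800-\udfff]')


def has_invalid_xml_char(text):
    """Hypothetical helper: True if text contains a code point in the class."""
    return invalid_xml_char.search(text) is not None


if __name__ == "__main__":
    # The failing case from the report: a lone surrogate should match.
    print(has_invalid_xml_char('\ud800'))
    # Plain ASCII text falls outside the class and should not match.
    print(has_invalid_xml_char('plain text'))
```

Wrapping the search in a boolean helper avoids printing the match object itself, since a lone surrogate cannot be encoded for output on most terminals.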
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2002-08-23 19:16:04 | dcjim | create | |