| File name | Uploaded | Description |
| re_unicode_escapes.diff | georg.brandl, 2008-08-24 20:33 | |
| 3665.patch | ishimoto, 2010-07-11 05:09 | |
| re_unicode_escapes.diff | serhiy.storchaka, 2012-06-01 06:43 | Regenerate georg.brandl's patch for review |
| 3665.patch | serhiy.storchaka, 2012-06-01 06:44 | Regenerate ishimoto's patch for review |
| re_unicode_escapes-2.patch | serhiy.storchaka, 2012-06-17 12:48 | + PEP 393, + cleanup, + tests |
| re_unicode_escapes-3.patch | serhiy.storchaka, 2012-06-18 08:02 | + byte patterns, + tests, + docs |
|
|
msg71861 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-08-24 20:33 |
Since \u and \U aren't interpolated in raw strings anymore, the re
module should support those escapes in addition to the \x and octal ones
it already does. Attached patch.
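
A minimal sketch of the requested behavior, assuming Python 3.3 or later. In a raw string the escape reaches `re` as six literal characters, and the module interprets it itself. The pattern and sample text are illustrative only.

```python
import re

# In a raw string, \u20ac stays as six characters: backslash, 'u', '2', '0', 'a', 'c'.
raw_pattern = r"\u20ac"
assert len(raw_pattern) == 6

# The re module interprets the escape itself, so it matches U+20AC (EURO SIGN).
match = re.search(raw_pattern, "price: \u20ac5")
euro = match.group()
```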
|
|
msg71864 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-24 20:49 |
- Check that it also works for chars > 0xFFFF (even in UCS2 builds, at
least when the chars are not part of [character range])
- What does happen with e.g. [\U00010000-\U00010001] on an UCS build?
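
For reference, a small sketch of how both cases behave on current CPython. It assumes Python 3.4 or later (for `re.fullmatch`), where every code point is a single string element. It says nothing about the UCS2 builds the question is about.

```python
import re

# A character range whose endpoints are both above 0xFFFF.
astral_range = re.compile(r"[\U00010000-\U00010001]")

in_range = astral_range.fullmatch("\U00010001") is not None
out_of_range = astral_range.fullmatch("\U00010002") is None

# A lone escape above 0xFFFF outside a character class matches as one character.
single = re.fullmatch(r"\U0001F600", "\U0001F600") is not None
```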
|
|
msg71865 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-24 20:49 |
(in the last sentence, I meant UCS2. Sorry)
|
|
msg71868 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-08-24 20:58 |
These concerns indeed must be handled: On narrow unicode builds, chars >
0xffff must be converted to surrogates. In ranges, they should raise an
error.
Additionally, this should at least raise an error too:
>>> re.compile("[\U00100000]").match("\U00100000").group()
'\udbc0'
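
The `'\udbc0'` result is the high surrogate of U+100000, so the match returned only half of the character. The arithmetic can be sketched as follows; `to_surrogate_pair` is a hypothetical helper written for illustration and is not part of the patch.

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the 20-bit offset
    return high, low


high, low = to_surrogate_pair(0x100000)
```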
|
|
msg109961 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2010-07-11 05:09 |
Here's an updated patch for py3k branch.
As per Georg's comment, I added a check on code points in character
ranges and conversion to surrogate pairs. I also added a check that raises
an exception if the code point is > 0x10ffff.
I would like English speakers to fix the error messages in the patch.
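
A sketch of the upper-bound check as it can be observed on current CPython (3.3 or later assumed). The `compiles` helper is hypothetical and only wraps `re.compile`; the exact error message is deliberately not shown.

```python
import re


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


highest_ok = compiles(r"\U0010FFFF")   # last valid Unicode code point
too_large = compiles(r"\U00110000")    # one past the Unicode range
```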
|
|
msg138219 - (view) |
Author: Éric Araujo (merwok) *  |
Date: 2011-06-12 20:30 |
FYI,
+ raise error("bogus escape: %s" % repr(escape))
can be written simply as
+ raise error("bogus escape: %r" % escape)
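
A quick check that the two spellings produce the same message. The value of `escape` here is made up for illustration.

```python
escape = "\\q"  # hypothetical offending escape text (backslash followed by 'q')

verbose = "bogus escape: %s" % repr(escape)
concise = "bogus escape: %r" % escape
```

The equivalence holds when `escape` is a string. If it were a tuple, the `%` operator would unpack it as the argument list instead.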
|
|
msg162052 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-06-01 06:25 |
I don't think it is worth targeting 2.7 and 3.2 (it's a new feature, not a bugfix), but for 3.3 it will be very useful.
Since PEP 393, conversion to surrogate pairs is no longer relevant.
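
To illustrate the PEP 393 point, here is the narrow-build example from msg71868 again, run under the assumption of Python 3.3 or later:

```python
import re

ch = "\U00100000"
length = len(ch)  # one element per code point under PEP 393

# The character class now matches the whole character, not a lone surrogate.
matched = re.compile(r"[\U00100000]").match(ch).group()
```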
|
|
msg162830 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-06-14 21:23 |
Georg, Atsuo, how are you?
|
|
msg163065 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-06-17 12:48 |
Here is an updated patch (conforming to PEP 393). In addition, the octal and hexadecimal escape handling is cleaned up, and the incorrect error message for hexadecimal escapes is fixed. New tests for octal and hexadecimal escapes are added.
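
A short sketch of the octal and hexadecimal escapes in question, as they behave on current CPython (3.4 or later assumed for `re.fullmatch`). It is not taken from the patch's tests.

```python
import re

# \x41 (hex) and \101 (three-digit octal) both denote 'A'.
both = re.fullmatch(r"\x41\101", "AA") is not None

# A hex escape needs exactly two digits; a single digit is rejected.
try:
    re.compile(r"\x4")
    incomplete_rejected = False
except re.error:
    incomplete_rejected = True
```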
|
|
msg163094 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-06-18 08:02 |
I forgot about byte patterns. Here is an updated patch.
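
The thread does not spell out how the patch treats byte patterns. As a hedged illustration, this is what current CPython does (Python 3.6 or later assumed, where unknown letter escapes are errors):

```python
import re

# \x and octal escapes work in bytes patterns.
hex_ok = re.fullmatch(br"\x41\101", b"AA") is not None

# \u is only meaningful in str patterns; in a bytes pattern it is rejected.
try:
    re.compile(br"\u20ac")
    unicode_escape_rejected = False
except re.error:
    unicode_escape_rejected = True
```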
|
|
msg163580 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-06-23 11:23 |
Any chance to commit the patch today and to get this feature in Python 3.3?
|
|
msg163584 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-06-23 11:32 |
New changeset b1dbd8827e79 by Antoine Pitrou in branch 'default':
Issue #3665: \u and \U escapes are now supported in unicode regular expressions.
http://hg.python.org/cpython/rev/b1dbd8827e79
|
|
msg163585 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-06-23 11:33 |
> Any chance to commit the patch today and to get this feature in Python
> 3.3?
Thanks for reminding us! It's now in 3.3.
|
|
msg163590 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-06-23 11:48 |
Thank you for the quick response.
|
|
| Date | User | Action | Args |
| 2012-06-23 11:48:13 | serhiy.storchaka | set | messages: + msg163590 |
| 2012-06-23 11:33:41 | pitrou | set | status: open -> closed; resolution: fixed; messages: + msg163585; stage: commit review -> resolved |
| 2012-06-23 11:32:54 | python-dev | set | nosy: + python-dev; messages: + msg163584 |
| 2012-06-23 11:28:36 | pitrou | set | assignee: pitrou; stage: patch review -> commit review |
| 2012-06-23 11:23:04 | serhiy.storchaka | set | messages: + msg163580 |
| 2012-06-18 08:02:24 | serhiy.storchaka | set | files: + re_unicode_escapes-3.patch; messages: + msg163094 |
| 2012-06-17 12:48:06 | serhiy.storchaka | set | files: + re_unicode_escapes-2.patch; messages: + msg163065 |
| 2012-06-14 21:23:59 | serhiy.storchaka | set | messages: + msg162830 |
| 2012-06-01 06:44:38 | serhiy.storchaka | set | files: + 3665.patch |
| 2012-06-01 06:43:52 | serhiy.storchaka | set | files: + re_unicode_escapes.diff |
| 2012-06-01 06:37:02 | serhiy.storchaka | set | files: - 3665.patch |
| 2012-06-01 06:36:47 | serhiy.storchaka | set | files: - re_unicode_escapes.diff |
| 2012-06-01 06:36:02 | serhiy.storchaka | set | files: + 3665.patch |
| 2012-06-01 06:35:08 | serhiy.storchaka | set | files: + re_unicode_escapes.diff |
| 2012-06-01 06:25:29 | serhiy.storchaka | set | versions: - Python 2.7, Python 3.2; nosy: + serhiy.storchaka; messages: + msg162052; components: + Regular Expressions, Unicode; type: behavior -> enhancement |
| 2011-11-29 06:16:10 | ezio.melotti | set | nosy: + mrabarnett |
| 2011-07-21 05:14:12 | ezio.melotti | set | keywords: + needs review; stage: patch review |
| 2011-06-12 20:30:55 | merwok | set | nosy: + merwok; messages: + msg138219 |
| 2011-06-12 18:32:20 | terry.reedy | set | versions: + Python 3.2, Python 3.3, - Python 3.1 |
| 2010-08-04 14:38:30 | ezio.melotti | set | nosy: + ezio.melotti |
| 2010-07-11 05:09:51 | ishimoto | set | files: + 3665.patch; nosy: + ishimoto; messages: + msg109961 |
| 2008-09-27 14:27:18 | timehorse | set | versions: + Python 3.1, Python 2.7, - Python 3.0 |
| 2008-09-27 14:20:42 | timehorse | set | nosy: + timehorse |
| 2008-08-24 20:58:27 | georg.brandl | set | messages: + msg71868 |
| 2008-08-24 20:49:33 | pitrou | set | messages: + msg71865 |
| 2008-08-24 20:49:11 | pitrou | set | nosy: + pitrou; messages: + msg71864 |
| 2008-08-24 20:33:51 | georg.brandl | create | |