This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Title: In re's named group the name cannot contain unicode characters
Type: enhancement Stage: resolved
Components: Documentation, Regular Expressions Versions: Python 3.2, Python 3.3, Python 3.4
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: docs@python, ezio.melotti, mrabarnett, py.user, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-04-01 08:16 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files:
re_unicode_identifiers.patch, uploaded by serhiy.storchaka, 2013-01-24 15:27
Messages (5)
msg157266 - Author: py.user (py.user) * Date: 2012-04-01 08:16

From the re documentation for (?P<name>...):

    "Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression."
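For readers unfamiliar with the quoted syntax, a minimal sketch of a named group using plain ASCII names (the pattern and sample string are made up for illustration). It shows retrieval by name, a backreference with (?P=name), and the "defined only once" rule:

```python
import re

# (?P<name>...) captures like a plain group, but the captured text can also
# be retrieved by name; (?P=name) refers back to it later in the same pattern.
doubled = re.compile(r'(?P<word>\w+) (?P=word)')

m = doubled.search('it is is here')
print(m.group('word'))  # the text captured by the named group
print(m.span())         # where the doubled word was found
print(m.groupdict())    # every named group, keyed by its name

# Each group name may be defined only once per pattern.
try:
    re.compile(r'(?P<x>a)(?P<x>b)')
    duplicate_rejected = False
except re.error as exc:
    duplicate_rejected = True
    print('rejected:', exc)
```

With this sample string the named group captures 'is', found at span (3, 8).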

>>> chr(255)
'ÿ'
>>> 'ÿ'.isidentifier()
True
>>> import re
>>> re.search('(?P<ÿ>a)', 'abc')
Traceback (most recent call last):
  File "/usr/local/lib/python3.2/", line 176, in wrapper
    result = cache[key]
KeyError: (<class 'str'>, '(?P<ÿ>a)', 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/", line 158, in search
    return _compile(pattern, flags).search(string)
  File "/usr/local/lib/python3.2/", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/local/lib/python3.2/", line 180, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/local/lib/python3.2/", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/local/lib/python3.2/", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/local/lib/python3.2/", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/local/lib/python3.2/", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/local/lib/python3.2/", line 552, in _parse
    raise error("bad character in group name")
sre_constants.error: bad character in group name
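After the fix recorded at the end of this issue, the same call is expected to succeed on Python 3 releases that include it. A minimal check, assuming such an interpreter:

```python
import re

# U+00FF is a lowercase letter, so it is a valid Python 3 identifier.
name = chr(255)
print(name, name.isidentifier())

# The pattern from the report: a named group whose name is 'ÿ'.
m = re.search('(?P<ÿ>a)', 'abc')
print(m.group('ÿ'))    # look the group up by its non-ASCII name
print(m.groupdict())
```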

msg159574 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-04-29 04:54
There are two options here:

1. fix the doc;
2. fix the code.

Matthew, do you have any opinion on this? Does this work in the regex module?
msg159616 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-04-29 15:25
It doesn't work in regex, but it probably should. IMHO, if it's a valid identifier, then it should be allowed.
msg180530 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-24 15:27
Here is a patch that makes re use the same rule for group names as for Python 3 identifiers. In Python 2 the implementation already conforms to the documentation.
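The difference between the two rules can be sketched in a few lines. Both helpers below are hypothetical: `ascii_only_name` is an assumed approximation of an ASCII-only check (letter or underscore first, then letters, digits or underscores), and `identifier_name` applies the Python 3 identifier rule via `str.isidentifier()`. Neither is the actual code from the patch.

```python
import string

# Hypothetical ASCII-only rule (an assumed approximation, not the patch).
FIRST_OK = set(string.ascii_letters + '_')
REST_OK = FIRST_OK | set(string.digits)

def ascii_only_name(name: str) -> bool:
    """Letter or underscore first, then ASCII letters, digits, underscores."""
    return bool(name) and name[0] in FIRST_OK and all(c in REST_OK for c in name[1:])

# Hypothetical helper applying the Python 3 identifier rule.
def identifier_name(name: str) -> bool:
    return name.isidentifier()

for candidate in ['group1', 'ÿ', 'имя', '1st', 'a-b']:
    print(f'{candidate!r:10} ascii={ascii_only_name(candidate)} '
          f'identifier={identifier_name(candidate)}')
```

Only the non-ASCII names ('ÿ', 'имя') change status between the two rules; names that are not identifiers at all are rejected by both.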
msg186900 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-14 09:40
New changeset 2fa27a3818a2 by Georg Brandl in branch '3.3':
Closes #14462: allow any valid Python identifier in sre group names, as documented.
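Allowing any valid identifier does not mean any string is accepted. A small sketch, assuming a Python 3 interpreter that includes this change and exercising only str patterns; the candidate names are arbitrary examples:

```python
import re

# Names that are valid identifiers compile, including non-ASCII ones.
for name in ['word', '_tmp', 'ÿ', 'naïve']:
    re.compile(f'(?P<{name}>x)')

# Names that are not identifiers are still rejected with re.error.
rejected = []
for name in ['1st', 'a-b', 'two words', '']:
    try:
        re.compile(f'(?P<{name}>x)')
    except re.error as exc:
        rejected.append(name)
        print(f'{name!r}: {exc}')
```

Bytes patterns may follow stricter rules than str patterns, so this sketch deliberately leaves them out.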
Date                 User              Action  Args
2022-04-11 14:57:28  admin             set     github: 58667
2013-04-14 09:40:09  python-dev        set     status: open -> closed
                                               nosy: + python-dev
                                               messages: + msg186900
                                               resolution: fixed
                                               stage: patch review -> resolved
2013-01-31 14:32:20  serhiy.storchaka  set     assignee: docs@python -> serhiy.storchaka
2013-01-24 15:27:10  serhiy.storchaka  set     files: + re_unicode_identifiers.patch
                                               versions: + Python 3.4, - Python 2.7
                                               keywords: + patch
                                               nosy: + serhiy.storchaka
                                               messages: + msg180530
                                               stage: needs patch -> patch review
2012-04-29 15:25:18  mrabarnett        set     messages: + msg159616
2012-04-29 04:54:16  ezio.melotti      set     type: behavior -> enhancement
                                               stage: needs patch
                                               messages: + msg159574
                                               versions: + Python 2.7, Python 3.3
2012-04-01 08:16:48  py.user           create