
classification
Title: In regex pattern long unicode character isn't recognized by repetition characters +, * and {}
Type: behavior Stage:
Components: Library (Lib), Regular Expressions Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, loewis, py.user, vstinner
Priority: normal Keywords:

Created on 2012-02-18 03:13 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (3)
msg153629 - Author: py.user (py.user) * Date: 2012-02-18 03:13
>>> import re
>>> '\U00000061'
'a'
>>> '\U00100061'
'\U00100061'
>>> re.search('\U00100061', '\U00100061' * 10).group()
'\U00100061'
>>> re.search('\U00100061+', '\U00100061' * 10).group()
'\U00100061'
>>> re.search('(\U00100061)+', '\U00100061' * 10).group()
'\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061'
>>> 
>>>
>>> re.search('\U00100061{3}', '\U00100061' * 10)
>>> re.search('(\U00100061){3}', '\U00100061' * 10).group()
'\U00100061\U00100061\U00100061'
>>>
msg153630 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-02-18 03:26
The re module doesn't support non-BMP characters in Python 3.2 when it is compiled in narrow mode (sys.maxunicode == 65535). This issue is already fixed in Python 3.3, which no longer has narrow or wide builds thanks to PEP 393.
msg153631 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-02-18 03:53
As Victor says, this issue is fixed in Python 3.3.
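The behavior reported above can be approximated on Python 3.3 or later by spelling out the UTF-16 surrogate pair that a narrow build stores internally for a non-BMP character. The sketch below is only an illustration under that assumption: the helper name utf16_surrogates is made up for this example, and it emulates the narrow-build view rather than running an actual 3.2 narrow build. It shows that a quantifier placed after the pair binds only to the second code unit, while grouping the pair (or using 3.3+, where the character is a single code point) gives the expected match.

```python
import re

def utf16_surrogates(ch):
    """Hypothetical helper: return the UTF-16 surrogate pair that a
    narrow build would store for a single non-BMP character."""
    offset = ord(ch) - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return chr(high) + chr(low)

char = '\U00100061'
pair = utf16_surrogates(char)         # two code units instead of one

# Emulated narrow-build view: the quantifier applies to the low surrogate only.
narrow_plus = re.search(pair + '+', pair * 10)
narrow_exact = re.search(pair + '{3}', pair * 10)
# Grouping the pair makes the quantifier cover both code units.
narrow_group = re.search('(?:' + pair + ')+', pair * 10)

# Python 3.3+ view: the character is one code point, so quantifiers work.
modern_plus = re.search(char + '+', char * 10)
modern_exact = re.search(char + '{3}', char * 10)

print(len(narrow_plus.group()))   # length in code units
print(narrow_exact)
print(len(narrow_group.group()))
print(len(modern_plus.group()))   # length in code points
print(len(modern_exact.group()))
```

On an interpreter where sys.maxunicode is 0x10FFFF (every build since 3.3), the last two searches match repeated characters directly, so the grouping workaround from the original report is only needed on narrow builds of older versions.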
History
Date                 User      Action  Args
2022-04-11 14:57:26  admin     set     github: 58253
2012-02-18 03:53:35  loewis    set     status: open -> closed
                                       resolution: fixed
                                       messages: + msg153631
2012-02-18 03:26:19  vstinner  set     nosy: + loewis, vstinner
                                       messages: + msg153630
2012-02-18 03:13:37  py.user   create