classification
Title: re doesn't work with big charsets
Type: behavior
Stage: resolved
Components: Library (Lib), Regular Expressions
Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: ezio.melotti, haypo, mrabarnett, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2013-10-21 11:24 by serhiy.storchaka, last changed 2013-10-24 19:25 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded
re_bigcharset.patch serhiy.storchaka, 2013-10-21 11:24
Messages (4)
msg200747 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-10-21 11:24
>>> import re
>>> re.compile('[%s]' % ''.join(map(chr, range(256, 2**16, 255))))
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 211, in _optimize_charset
    charmap[fixup(av)] = 1
IndexError: list assignment index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/re.py", line 213, in compile
    return _compile(pattern, flags)
  File "/home/serhiy/py/cpython/Lib/re.py", line 280, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 489, in compile
    code = _code(p, flags)
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 471, in _code
    _compile_info(code, p, flags)
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 459, in _compile_info
    _compile_charset(charset, flags, code)
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 177, in _compile_charset
    for op, av in _optimize_charset(charset, fixup):
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 220, in _optimize_charset
    return _optimize_unicode(charset, fixup)
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 342, in _optimize_unicode
    mapping = array.array('b', mapping).tobytes()
OverflowError: signed char is greater than maximum
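
The final OverflowError can be reproduced in isolation. The sketch below assumes (this is not confirmed in the report) that `_optimize_unicode` splits the character range into 256-character chunks and stores one small integer index per chunk in an `array.array('b', ...)`. The typecode `'b'` is a signed char and only holds -128..127, so any index above 127 overflows. The name `pack_block_indices` and the list `indices` are hypothetical, chosen only for illustration.

```python
import array

# Code points used by the reproducer above.
code_points = range(256, 2**16, 255)

# Number of distinct 256-character chunks those code points fall into.
chunks = {cp >> 8 for cp in code_points}


def pack_block_indices(indices, typecode):
    """Pack small integers into bytes with the given array typecode (hypothetical helper)."""
    return array.array(typecode, indices).tobytes()


# Assumption: one distinct index per chunk, so indices can exceed 127.
indices = list(range(256))

try:
    pack_block_indices(indices, 'b')   # signed char: -128..127
    signed_ok = True
except OverflowError:
    signed_ok = False

# Unsigned char covers 0..255, so the same indices fit.
unsigned_bytes = pack_block_indices(indices, 'B')
```

Under that assumption, the reproducer touches far more than 127 chunks, which is consistent with the signed typecode overflowing while an unsigned one would not.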
msg200748 - Author: STINNER Victor (haypo) (Python committer) Date: 2013-10-21 11:30
@Serhiy: Could you please take a look at issue #13100?
msg200751 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-10-21 11:40
I encountered this bug while writing a test for a fragment of my large patch, which cleans up and optimizes the re module (the patch is too large to be committed all at once).
msg201163 - Author: Roundup Robot (python-dev) Date: 2013-10-24 19:05
New changeset d2bb0da45c93 by Serhiy Storchaka in branch '2.7':
Issue #19327: Fixed the working of regular expressions with too big charset.
http://hg.python.org/cpython/rev/d2bb0da45c93

New changeset 4431fa917f22 by Serhiy Storchaka in branch '3.3':
Issue #19327: Fixed the working of regular expressions with too big charset.
http://hg.python.org/cpython/rev/4431fa917f22

New changeset 10081a0ca4bd by Serhiy Storchaka in branch 'default':
Issue #19327: Fixed the working of regular expressions with too big charset.
http://hg.python.org/cpython/rev/10081a0ca4bd
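
A minimal way to check whether a given interpreter includes the fix is to rerun the reproducer from the first message and confirm that the compiled pattern behaves like an ordinary character set. This is only a sketch for Python 3; the variable names are illustrative and it is not the test added by the changesets above.

```python
import re

# Same charset as in the original report: every 255th code point from 256 up.
chars = ''.join(map(chr, range(256, 2**16, 255)))

# On an affected interpreter this call raises; on a fixed one it compiles.
pattern = re.compile('[%s]' % chars)

# chr(256) is in the set, chr(257) is not.
in_set = pattern.match(chr(256)) is not None
out_of_set = pattern.match(chr(257)) is None
```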
History
Date User Action Args
2013-10-24 19:25:42 serhiy.storchaka set status: open -> closed, resolution: fixed, stage: patch review -> resolved
2013-10-24 19:05:08 python-dev set nosy: + python-dev, messages: + msg201163
2013-10-21 12:01:44 serhiy.storchaka link issue19329 dependencies
2013-10-21 11:40:58 serhiy.storchaka set messages: + msg200751
2013-10-21 11:30:01 haypo set messages: + msg200748
2013-10-21 11:27:30 haypo set nosy: + haypo
2013-10-21 11:24:02 serhiy.storchaka create