classification
Title: Regex with set of characters and groups raises error
Type:
Stage: resolved
Components: Regular Expressions
Versions: Python 3.3
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Isis.Binder, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal
Keywords:

Created on 2013-10-26 12:53 by Isis.Binder, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
traceback.txt Isis.Binder, 2013-10-26 12:53
Messages (3)
msg201351 - (view) Author: Isis Binder (Isis.Binder) Date: 2013-10-26 12:53
I was working on some SPOJ exercises when the re module raised an error related to '*' being used inside a character set.

I looked in the module docs, which say: Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'.

Traceback attached.

Offending code (inside IDLE):
import re
a = '73479*5152'
re.match(r'(\d+)([+-*])(\d+)', a).groups()

NOTE: if I write r'(\d+)([*])(\d+)', r'(\d+)([*+-])(\d+)' or r'(\d+)([+*-])(\d+)' it works. Shouldn't it simply work as described in the docs or should the docs be updated with an entry about proper character ordering in the character class?
msg201354 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-26 13:30
From re documentation:

"""Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If - is escaped (e.g. [a\-z]) or if it’s placed as the first or last character (e.g. [a-]), it will match a literal '-'."""

A Python exception is not a crash. A crash is a Segmentation Fault (*nix) or 'Your program stopped unexpectedly' (Windows).
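
For illustration, the rule quoted from the documentation can be applied to the pattern from the original report. This is a minimal sketch, not part of the original messages; the names `patterns` and `results` are made up for the example. Each variant keeps '-' from forming a range, either by escaping it or by placing it first or last in the set.

```python
import re

a = '73479*5152'

# Three ways to make '-' a literal inside the character set.
patterns = [
    r'(\d+)([+\-*])(\d+)',  # '-' escaped
    r'(\d+)([-+*])(\d+)',   # '-' placed first
    r'(\d+)([+*-])(\d+)',   # '-' placed last
]

# Every variant splits the input into operand, operator, operand.
results = [re.match(p, a).groups() for p in patterns]
for p, groups in zip(patterns, results):
    print(p, groups)  # each line ends with ('73479', '*', '5152')
```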
msg201374 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-10-26 17:06
The traceback says "bad character range" because ord('+') == 43 and ord('*') == 42. It's not surprising that it complains if the range isn't valid.
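
The code point comparison can be checked directly. The sketch below is an illustration only (the variable names are hypothetical): it confirms that '+' sorts after '*', so `[+-*]` is a descending range and is rejected at compile time with a regular Python exception. It also shows that the reversed form `[*-+]` does compile, but as the range '*' to '+', so it does not match a literal '-'.

```python
import re

# '+' has a higher code point than '*', so '+-*' is a descending range.
plus, star = ord('+'), ord('*')
print(plus, star)  # 43 42

# Compiling the set from the report raises re.error, not a crash.
try:
    re.compile(r'[+-*]')
    message = None
except re.error as exc:
    message = str(exc)
print(message)

# The ascending form is accepted, but it is the range '*'..'+',
# which leaves out a literal '-'.
ascending = re.compile(r'[*-+]')
matches_minus = ascending.fullmatch('-') is not None
matches_plus = ascending.fullmatch('+') is not None
print(matches_minus, matches_plus)  # False True
```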
History
Date User Action Args
2022-04-11 14:57:52 admin set github: 63607
2013-10-26 17:09:42 serhiy.storchaka set status: open -> closed; stage: resolved
2013-10-26 17:06:39 mrabarnett set messages: + msg201374
2013-10-26 13:30:47 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg201354; resolution: not a bug; type: crash ->
2013-10-26 12:53:41 Isis.Binder create