Author serhiy.storchaka
Recipients ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
Date 2014-09-17.16:09:47
Message-id <1410970188.53.0.105353417029.issue22434@psf.upfronthosting.co.za>
In-reply-to
Content
The regular expression parser parses a pattern into a tree, marking nodes with string identifiers. The regular expression compiler converts this tree into a plain list of integers, transforming the node identifiers into sequential integers. The resulting list is not human-readable. The proposed patch converts the string constants in the sre_constants module to named integer constants. These constants don't need to be converted to integers, because they already are integers, and when printed they look human-friendly. The intermediate result of the regular expression compiler is now much more readable.
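As a rough illustration of the idea (not necessarily the patch's actual code), a named integer constant can be sketched as an int subclass whose repr() is its name. The class name NamedInt is hypothetical; the numeric values are the ones visible in the before/after listings in this message.

```python
class NamedInt(int):
    """An int that prints as a symbolic name (hypothetical helper)."""

    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        # Lists call repr() on their elements, so a list of these
        # constants prints with names instead of bare numbers.
        return self.name


# Values taken from the "Before patch" / "After patch" listings.
FAILURE = NamedInt(0, "FAILURE")
SUCCESS = NamedInt(1, "SUCCESS")
LITERAL = NamedInt(19, "LITERAL")

code = [LITERAL, 95, FAILURE, SUCCESS]
print(code)                 # [LITERAL, 95, FAILURE, SUCCESS]
print([int(x) for x in code])  # [19, 95, 0, 1]
```

Because the constants are still ints, they compare, hash and do arithmetic like plain integers, so code that consumes the list needs no conversion step.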

Example.

>>> import re, sre_compile, sre_parse
>>> sre_compile._code(sre_parse.parse('[a-z_][a-z_0-9]+', re.I), re.I)

Before patch:

[17, 4, 0, 2, 2147483647, 16, 7, 27, 97, 122, 19, 95, 0, 29, 16, 1, 2147483647, 16, 11, 10, 0, 67043328, 2147483648, 134217726, 0, 0, 0, 0, 0, 1, 1]

After patch:

[INFO, 4, 0, 2, MAXREPEAT, IN_IGNORE, 7, RANGE, 97, 122, LITERAL, 95, FAILURE, REPEAT_ONE, 16, 1, MAXREPEAT, IN_IGNORE, 11, CHARSET, 0, 67043328, 2147483648, 134217726, 0, 0, 0, 0, FAILURE, SUCCESS, SUCCESS]

This patch also affects the debugging output when a regular expression is compiled with re.DEBUG (identifiers are uppercased and MAXREPEAT is displayed instead of 2147483647 in repeat statements).
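To inspect that debugging output, something like the following sketch can be used. The helper name debug_dump is hypothetical, and it assumes re.DEBUG writes its dump to standard output; the exact text depends on the Python version, so no expected output is shown.

```python
import contextlib
import io
import re


def debug_dump(pattern, flags=0):
    """Return the text printed while compiling `pattern` with re.DEBUG."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, flags | re.DEBUG)
    return buf.getvalue()


# Same pattern and flag as in the example above.
print(debug_dump('[a-z_][a-z_0-9]+', re.I))
```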

Apart from the debugging output, these changes are invisible to ordinary users. They are needed only for developing and debugging the re module itself. The patch doesn't affect performance and has almost no effect on memory consumption.
History
Date User Action Args
2014-09-17 16:09:48 serhiy.storchaka set recipients: + serhiy.storchaka, pitrou, ezio.melotti, mrabarnett
2014-09-17 16:09:48 serhiy.storchaka set messageid: <1410970188.53.0.105353417029.issue22434@psf.upfronthosting.co.za>
2014-09-17 16:09:48 serhiy.storchaka link issue22434 messages
2014-09-17 16:09:48 serhiy.storchaka create