Improve the repr for regular expression match objects #61289

rhettinger · 2013-01-30T23:53:02Z

BPO	17087
Nosy	@rhettinger, @ezio-melotti, @cjerdonek, @PCManticore, @serhiy-storchaka
Files	sre.patch sre_repr2.patch sre_repr3.patch sre_repr4.patch sre_repr5.patch sre_match_repr.patch sre_repr6.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2013-10-20.10:16:20.894>
created_at = <Date 2013-01-30.23:53:02.426>
labels = ['expert-regex', 'type-feature', 'library']
title = 'Improve the repr for regular expression match objects'
updated_at = <Date 2013-11-25.21:20:38.690>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2013-11-25.21:20:38.690>
actor = 'python-dev'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2013-10-20.10:16:20.894>
closer = 'serhiy.storchaka'
components = ['Library (Lib)', 'Regular Expressions']
creation = <Date 2013-01-30.23:53:02.426>
creator = 'rhettinger'
dependencies = []
files = ['31383', '31744', '31746', '32143', '32144', '32159', '32216']
hgrepos = []
issue_num = 17087
keywords = ['patch']
message_count = 24.0
messages = ['180999', '181001', '181002', '181004', '181043', '195687', '197546', '197579', '197609', '197610', '197618', '200042', '200053', '200055', '200059', '200065', '200111', '200157', '200356', '200378', '200549', '200558', '200559', '204412']
nosy_count = 7.0
nosy_names = ['rhettinger', 'ezio.melotti', 'mrabarnett', 'chris.jerdonek', 'Claudiu.Popa', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue17087'
versions = ['Python 3.4']

rhettinger · 2013-01-30T23:53:02Z

Experience teaching Python has shown that people have a hard time learning to work with match objects. A contributing cause is the opaque repr:

    >>> import re
    >>> s = 'On 3/14/2013, Python celebrate Pi day.'
    >>> mo = re.search(r'\d+/\d+/\d+', s)
    >>> mo
    <_sre.SRE_Match object at 0x100456100>

They could explore the match object with dir() and help(), and with the match object's methods and attributes:

    >>> dir(mo)
    ['__class__', '__copy__', '__deepcopy__', ...
     'end', 'endpos', 'expand', 'group', ... ]
     
    >>> mo.start()
    3
    >>> mo.end()
    12
    >>> mo.group(0)
    '3/14/2013'

However, this gets old when experimenting with alternative regular expressions. A better solution is to improve the repr:

    >>> re.search(r'\d+/\d+/\d+', s)
    <SRE Match object: start=3, stop=12, group(0)='3/14/2013'>

This would make the regular expression module much easier to work with.

ezio-melotti · 2013-01-31T00:07:35Z

Showing start and stop would be OK, but there might be many groups and they might contain a lot of text, so they can't simply be included in the repr as they are.
FWIW there was another issue about changing _sre.SRE_Match to something better, but I can't find it right now.

cjerdonek · 2013-01-31T00:59:44Z

Is this a duplicate of bpo-13592?

rhettinger · 2013-01-31T02:05:11Z

Just showing group(0) should be helpful. And perhaps the number of groups. If a string is really long, we can truncate it like reprlib does.

The main goal is to make it easier to work with match objects at the interactive prompt. They are currently too opaque.
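A minimal sketch of the truncation idea, assuming a hypothetical helper named `describe_match` and an arbitrary 30-character limit (neither comes from any patch on this issue). It only uses the public `reprlib.Repr` class and the match object's `span()`, `group()` and `groups()` methods:

```python
import re
import reprlib


def describe_match(mo, maxstring=30):
    """Hypothetical helper: summarize a match object, truncating group(0).

    The output format and the default limit are illustrative assumptions.
    """
    limiter = reprlib.Repr()
    limiter.maxstring = maxstring  # longest string repr before eliding the middle
    return "<Match: span=%r, group(0)=%s, groups=%d>" % (
        mo.span(),
        limiter.repr(mo.group(0)),
        len(mo.groups()),
    )


s = 'On 3/14/2013, Python celebrate Pi day.'
print(describe_match(re.search(r'\d+/\d+/\d+', s)))

# A very long match is shortened instead of flooding the prompt.
print(describe_match(re.search(r'a+', 'a' * 1000)))
```

Short matches are shown in full, while long ones keep their beginning and end with the middle elided, the same way reprlib treats long strings.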

ezio-melotti · 2013-01-31T20:57:33Z

bpo-13592 is indeed the issue I was thinking about, but apparently that's about _sre.SRE_Pattern, so it's not the same thing.

> Just showing group(0) should be helpful.

Often the interesting group is group(1), so showing only group(0) seems a bit arbitrary.

> And perhaps the number of groups.

If we show only group(0), this might be useful as an indication that there are(n't) other groups.

> If a string is really long, we can truncate it like reprlib does.

That's certainly an option.

FWIW I don't usually care about the start/end, and, if included, these values could be included as span=(3,12).

PCManticore · 2013-08-20T13:23:34Z

Here's my patch attempt. The repr of a match object has the following format:
(groups=\d+, span=(start, end), group0=...), where group0 is either the entire group or its first X characters. X is defined by a new constant in sre_constants.h, SRE_MATCH_REPR_SIZE.

serhiy-storchaka · 2013-09-13T04:16:39Z

What about such output?

    >>> re.search('p((a)|(b))(c)?', 'unpack')
    <SRE Match object: [2: 5]: 'p'(('a')())('c')>

Or maybe ('p', [['a'], []], ['c']) if you prefer a legal Python expression.

PCManticore · 2013-09-13T14:15:37Z

Serhiy, at first glance that repr doesn't make sense to me, so it seems a little difficult to comprehend.

serhiy-storchaka · 2013-09-13T16:23:15Z

Well, then we will first commit a simpler patch. I left comments on Rietveld.

PCManticore · 2013-09-13T16:53:36Z

Here's the new version. I added a few replies on Rietveld.

PCManticore · 2013-09-13T17:33:25Z

Added the new version.

PCManticore · 2013-10-16T09:02:33Z

Serhiy, are there any remaining issues with my latest patch? It would be nice if we could get this into 3.4.

PCManticore · 2013-10-16T11:28:34Z

Added the new patch, which addresses Serhiy's comments.
Also, this approach fails when bytes are involved:

    >>> import re
    >>> re.search(b"a", b"a")
    Assertion failed: (PyUnicode_Check(op)), function _PyUnicode_CheckConsistency, file Objects/unicodeobject.c, line 309.

Should a check be added for this also?

serhiy-storchaka · 2013-10-16T12:26:46Z

Use the correct first argument to getslice().

PCManticore · 2013-10-16T14:26:12Z

Latest patch attached.

serhiy-storchaka · 2013-10-16T17:45:56Z

It is too complicated (and perhaps erroneous). Why not use just self->pattern->logical_charsize?

PCManticore · 2013-10-17T06:19:16Z

I could use self->pattern->logical_charsize, but it seems that I still need the call to getstring for bytes & co, to obtain a view of the underlying buffer (otherwise the group0 part of the repr will contain random bytes). I didn't find a simpler way to achieve this.

serhiy-storchaka · 2013-10-17T19:57:36Z

Well. Here is a patch. I have changed repr() a little. repr() now contains the match type's qualified name (_sre.SRE_Match). "groups" now equals len(m.groups()). The "span" representation now contains a comma (like repr(m.span())).

Raymond, Ezio, does it look good to you?

ezio-melotti · 2013-10-19T01:40:32Z

I discussed this briefly with Serhiy on IRC and I think the repr can be improved.
Currently it looks like:

    >>> re.compile(r'[/\\]([.]svn)').match('/.svn')
    <_sre.SRE_Match object: groups=1, span=(0, 5), group0='/.svn'>

One problem is that the group count doesn't include group 0, so from the example repr one would expect the info to be about the one (and only) group counted in "groups=", whereas it is actually about group 0, and there's an additional group 1 that is not included in the repr.

A possible solution is to separate the group count from the info about group 0:

    <_sre.SRE_Match object (1 group); group0='/.svn', span=(0, 5)>

To make things even less confusing we could avoid calling it group0 and use something like "match=", or alternatively remove the group count (doesn't the count depend only on the regex, and not on the string?).
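The parenthetical question can be checked with a short sketch. It assumes only the documented `Pattern.groups` attribute and `Match.groups()` method; the helper name `group_counts` is hypothetical. The number of capturing groups is fixed when the pattern is compiled, and `len(m.groups())` matches it for every successful match, with groups that did not participate reported as `None`:

```python
import re


def group_counts(pattern, texts):
    """Hypothetical helper: len(m.groups()) for each text the pattern finds."""
    counts = []
    for text in texts:
        m = pattern.search(text)
        if m is not None:
            counts.append(len(m.groups()))
    return counts


svn = re.compile(r'[/\\]([.]svn)')
print(svn.groups, group_counts(svn, ['/.svn', '\\.svn']))

# Optional and alternative groups still count, even when they do not match.
pac = re.compile('p((a)|(b))(c)?')
print(pac.groups, group_counts(pac, ['unpack', 'pb']))
print(pac.search('pb').groups())
```

So under these assumptions the count says something about the pattern rather than about the particular match, which supports dropping it from the match repr.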

PCManticore · 2013-10-19T06:57:08Z

Added patch based on Serhiy's, which addresses your comments. It drops the group count and renames group0 to match.
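A small sketch of what this format looks like from the interactive side, assuming an interpreter that includes the change (Python 3.4 or later). The class name printed at the start of the repr differs between Python versions, so the checks below look only at the `span=` and `match=` parts discussed here:

```python
import re

# str pattern: the repr reports the span and the text of the whole match.
m = re.compile(r'[/\\]([.]svn)').match('/.svn')
summary = repr(m)
print(summary)

# bytes pattern: the case that triggered the assertion failure reported earlier.
bm = re.search(b"a", b"a")
bytes_summary = repr(bm)
print(bytes_summary)

# The group count is no longer part of the repr.
print("groups=" in summary)
```

The span shown agrees with `m.span()`, and the `match=` value agrees with `m.group(0)`, so the repr is a compact view of information the match object already exposes.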

serhiy-storchaka · 2013-10-20T07:33:33Z

LGTM (except for the unrelated empty line at the end of Modules/_sre.c).

python-dev · 2013-10-20T10:13:53Z

New changeset 29764a7bd6ba by Serhiy Storchaka in branch 'default':
Issue bpo-17087: Improved the repr for regular expression match objects.
http://hg.python.org/cpython/rev/29764a7bd6ba

serhiy-storchaka · 2013-10-20T10:16:21Z

Thanks to all participants for the discussion.

python-dev · 2013-11-25T21:20:39Z

New changeset 4ba7a29fe02c by Ezio Melotti in branch 'default':
bpo-13592, bpo-17087: add whatsnew entry about regex/match object repr improvements.
http://hg.python.org/cpython/rev/4ba7a29fe02c

rhettinger added the stdlib (Python modules in the Lib dir) and type-feature (A feature request or enhancement) labels Jan 30, 2013

ezio-melotti added the topic-regex label Jan 31, 2013

serhiy-storchaka self-assigned this Oct 17, 2013

serhiy-storchaka closed this as completed Oct 20, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022
