repr(regex) doesn't include actual regex #57801

Closed
dwt mannequin opened this issue Dec 13, 2011 · 29 comments
Labels
topic-regex type-feature A feature request or enhancement

Comments

@dwt
Mannequin

dwt mannequin commented Dec 13, 2011

BPO 13592
Nosy @rhettinger, @terryjreedy, @pitrou, @ezio-melotti, @alex, @cjerdonek, @ericsnowcurrently, @serhiy-storchaka
Files
  • issue13592_add_repr_to_regex.patch
  • issue13592_add_repr_to_regex_v2.patch
  • issue13592_add_repr_to_regex_v2_1.patch
  • issue13592_add_repr_to_regex_v3.patch
  • issue13592_add_repr_to_regex_v4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-11-23.20:50:09.328>
    created_at = <Date 2011-12-13.10:42:23.532>
    labels = ['expert-regex', 'type-feature']
    title = "repr(regex) doesn't include actual regex"
    updated_at = <Date 2016-06-20.14:36:06.345>
    user = 'https://bugs.python.org/dwt'

    bugs.python.org fields:

    activity = <Date 2016-06-20.14:36:06.345>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-11-23.20:50:09.328>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions']
    creation = <Date 2011-12-13.10:42:23.532>
    creator = 'dwt'
    dependencies = []
    files = ['26441', '26453', '26454', '32806', '32807']
    hgrepos = []
    issue_num = 13592
    keywords = ['patch']
    message_count = 29.0
    messages = ['149382', '149383', '149395', '149396', '149397', '149398', '149407', '149409', '149562', '149649', '150108', '150112', '150113', '150115', '150116', '165885', '165890', '165891', '165930', '165932', '168966', '181051', '204049', '204072', '204103', '204105', '204411', '268899', '268903']
    nosy_count = 13.0
    nosy_names = ['rhettinger', 'terry.reedy', 'pitrou', 'ezio.melotti', 'mrabarnett', 'alex', 'chris.jerdonek', 'python-dev', 'eric.snow', 'dwt', 'serhiy.storchaka', 'hltbra', 'Drekin']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13592'
    versions = ['Python 3.4']

    @dwt
    Mannequin Author

    dwt mannequin commented Dec 13, 2011

    When calling repr() on a compiled regex pattern like this:

    import re
    repr(re.compile('foo'))

    you don't get the pattern of the regex out of the compiled form. Also all my research has shown no getter to allow this.

    I noticed this in my application because I was unable to show good error messages for things involving regexes, which is a shame.

    So please add the actual regex to the repr() form of the compiled regex, or alternatively provide a getter / property to get at it.

    @dwt dwt mannequin added the type-bug An unexpected behavior, bug, or error label Dec 13, 2011
    @ezio-melotti
    Member

    I'm not sure having the pattern in the repr will make it more readable, since the regex might even be very long. You can use the .pattern attribute if you want to see the pattern.
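For the original use case (better error messages), the `.pattern` attribute works on any Python 3 version. Here is a minimal sketch. The date pattern and the `parse_date` helper are hypothetical illustrations:

```python
import re

# Every compiled pattern keeps its source text (.pattern) and flags (.flags),
# so an error message can show them even without a richer repr().
PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def parse_date(text):
    """Hypothetical helper: return the matched date or raise a descriptive error."""
    m = PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(
            f"{text!r} does not match pattern {PATTERN.pattern!r} "
            f"(flags={PATTERN.flags})"
        )
    return m.group()
```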

    @pitrou
    Member

    pitrou commented Dec 13, 2011

    I'm not sure having the pattern in the repr will make it more readable,
    since the regex might even be very long.

    Hmm, I think it's a reasonable feature request myself.
    Oops, I meant "enhancement", not "feature request" :)

    @pitrou pitrou added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Dec 13, 2011
    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Dec 13, 2011

    In reply to Ezio, the repr of a large string, list, tuple or dict is also long.

    The repr of a compiled regex should probably also show the flags, but should it just be the numeric value?

    @rhettinger
    Contributor

    ISTM that .pattern is the one way to do it.

    @pitrou
    Member

    pitrou commented Dec 13, 2011

    ISTM that .pattern is the one way to do it.

    To me this is like saying the repr() of functions should not show their
    name since .__name__ is the one way to do it. repr() is useful for
    debugging and logging, why not make it more useful?

    @rhettinger
    Contributor

    If you change the repr, it should at least be eval-able, so be sure to capture the flags and whatnot.

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Dec 13, 2011

    Actually, one possibility that occurs to me is to provide the flags within the pattern. The .pattern attribute gives the original pattern, but repr could give the flags in-line at the start of the pattern:

    >>> # Assuming Python 3.
    >>> r = re.compile("a", re.I)
    >>> r.flags
    34
    >>> r.pattern
    'a'
    >>> repr(r)
    "<_sre.SRE_Pattern '(?i)a'>"

    I'm not sure how to make it eval-able, unless you mean something more like:

    >>> repr(r)
    "re.Regex('(?i)a')"

    where re.Regex == re.compile, which would be more meaningful than:

    >>> repr(r)
    "re.compile('(?i)a')"
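The inline-flag idea can be sketched in a few lines. `pattern_with_inline_flags` and its letter table are hypothetical illustrations of this proposal. The `re` module exposes no such helper. The letters used are the standard inline-flag letters (`a`, `i`, `L`, `m`, `s`, `x`):

```python
import re

# Hypothetical table mapping flag bits to inline-flag letters.
# re.UNICODE is left out: str patterns get it by default and bytes
# patterns cannot use it, so it never needs to be spelled out.
_INLINE_LETTERS = [
    (re.ASCII, "a"),
    (re.IGNORECASE, "i"),
    (re.LOCALE, "L"),
    (re.MULTILINE, "m"),
    (re.DOTALL, "s"),
    (re.VERBOSE, "x"),
]


def pattern_with_inline_flags(p):
    """Return p.pattern with p's flags written as a leading (?...) group."""
    letters = "".join(ch for bit, ch in _INLINE_LETTERS if p.flags & bit)
    if not letters:
        return p.pattern
    prefix = "(?" + letters + ")"
    # bytes patterns need a bytes prefix
    if isinstance(p.pattern, bytes):
        return prefix.encode("ascii") + p.pattern
    return prefix + p.pattern


r = re.compile("a", re.I)
print(pattern_with_inline_flags(r))  # (?i)a
```

Recompiling the result yields the same `.flags`. One limitation: a pattern that already starts with inline flags would get a redundant prefix.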

    @ezio-melotti
    Member

    If an eval-able re.Regex is used, the flags can be shown as a second arg, like:
    re.Regex('a', re.I|re.S)
    instead of being added to the pattern as in
    re.Regex('(?is)a')

    The repr can be generated with something like
    're.Regex({r.pattern!r}, {r.flags})'.format(r=r)
    that currently prints
    re.Regex('abc', 50)
    but if bpo-11957 is fixed, the result will look like
    re.Regex('abc', re.I|re.S)
    for a regex created with
    r = re.compile('abc', re.I|re.S)
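The format string above can be run as-is. The code below also shows one way to turn the number back into names, using `re.RegexFlag`. That enum of the `re` flags was added in Python 3.6, long after this discussion. `re.Regex` is only the name proposed in this thread and does not exist in the `re` module:

```python
import re

r = re.compile("abc", re.I | re.S)

# r.flags is a plain int, so the flags come out as a number:
# IGNORECASE (2) + DOTALL (16) + the implicit UNICODE (32) = 50.
numeric = "re.Regex({r.pattern!r}, {r.flags})".format(r=r)
print(numeric)  # re.Regex('abc', 50)

# Decode the int into flag names via the RegexFlag enum (Python 3.6+).
names = [flag.name for flag in re.RegexFlag if flag & r.flags]
named = "re.Regex({!r}, {})".format(r.pattern, "|".join("re." + n for n in names))
print(named)
```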

    @terryjreedy
    Member

    but if bpo-11957 is fixed, the result will look like
    re.Regex('abc', re.I|re.S)

    That is what I would like to see.

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Dec 22, 2011

    I'm just adding this to the regex module and I've come up against a possible issue. The regex module supports named lists, which could be very big. Should the entire contents of those lists also be shown in the repr? They would have to be if the repr is to be eval-able.

    @pitrou
    Member

    pitrou commented Dec 22, 2011

    I'm just adding this to the regex module and I've come up against a
    possible issue. The regex module supports named lists, which could be
    very big. Should the entire contents of those lists also be shown in
    the repr? They would have to be if the repr is to be eval-able.

    I don't see how eval()able repr is a big deal. Most reprs aren't, and I
    think a readable and informative representation is the real goal.

    @rhettinger
    Contributor

    "I don't see how eval()able repr is a big deal. Most reprs aren't, ..."

    Sometimes, I wonder if we're even talking about the same programming language. Historically, a good deal of effort has gone into creating evalable reprs, if only because they accurately describe an object and because they teach users how to create similar objects.

    But it only takes one committer who doesn't care about evalable reprs to permanently break the pattern for everyone :-(
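The repr that was eventually committed for this issue (Python 3.4) takes the form of a `re.compile(...)` call, so it is eval-able for patterns short enough not to be truncated. The quick check below assumes Python 3.4 or later. It compares `.pattern` and `.flags` explicitly rather than relying on pattern-object equality:

```python
import re

original = re.compile(r"[a-z]+\d*", re.IGNORECASE | re.MULTILINE)
text = repr(original)  # a re.compile(...) call with named flags

# Evaluate with only the re module in scope to rebuild the pattern.
rebuilt = eval(text, {"re": re})

assert rebuilt.pattern == original.pattern
assert rebuilt.flags == original.flags
```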

    @alex
    Member

    alex commented Dec 22, 2011

    Raymond, Antoine: I don't see your claims as contradictory. It's definitely true that the Python standard library has historically tried to keep reprs eval-able, and I think Antoine's correct that the vast majority of 3rd-party code does not follow that trend.

    @pitrou
    Member

    pitrou commented Dec 22, 2011

    But it only takes one committer who doesn't care about evalable reprs
    to permanently break the pattern for everyone :-(

    So 95% of our datatypes were committed by a single person? :)

    @hltbra
    Mannequin

    hltbra mannequin commented Jul 19, 2012

    Hey, I started the patch on the default branch and got the following working:

        >>> import re
        >>> re.compile("foo")
        re.compile("foo", re.UNICODE)
        >>> re.compile("foo", re.DOTALL)
        re.compile("foo", re.DOTALL|re.UNICODE)
        >>> re.compile("foo", re.DOTALL|re.MULTILINE)
        re.compile("foo", re.MULTILINE|re.DOTALL|re.UNICODE)
        >>>

    Do you have any comments on it?

    I want to adapt the patch to make it work with Python 2.7 too. Do you think it is worthwhile?

    The attached patch was done after commit 3fbfa61634de.
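The repr that was finally committed differs from this early patch. According to Serhiy's later note, `re.UNICODE` is omitted for string patterns and long patterns are truncated. The pure-Python sketch below reproduces that final format. It assumes two things about CPython's C implementation: that it checks flags in the order listed, and that it cuts the pattern's repr at 200 characters. `pattern_repr` is a hypothetical name, not part of `re`:

```python
import re

# Flag names in the order CPython's C code appears to check them
# (assumption; TEMPLATE and unknown bits are ignored in this sketch).
_FLAG_ORDER = [
    ("re.IGNORECASE", re.IGNORECASE),
    ("re.LOCALE", re.LOCALE),
    ("re.MULTILINE", re.MULTILINE),
    ("re.DOTALL", re.DOTALL),
    ("re.UNICODE", re.UNICODE),
    ("re.VERBOSE", re.VERBOSE),
    ("re.DEBUG", re.DEBUG),
    ("re.ASCII", re.ASCII),
]


def pattern_repr(p, limit=200):
    """Hypothetical re-implementation of repr() for compiled patterns."""
    flags = int(p.flags)
    # str patterns are Unicode by default, so that flag is not shown
    if isinstance(p.pattern, str):
        flags &= ~int(re.UNICODE)
    names = [name for name, bit in _FLAG_ORDER if flags & int(bit)]
    # Cut the pattern's repr (opening quote included) at `limit` characters
    text = repr(p.pattern)[:limit]
    if names:
        return "re.compile({}, {})".format(text, "|".join(names))
    return "re.compile({})".format(text)
```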

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Jul 19, 2012

    Python 2.7 is the end of the Python 2 line, and it's closed except for security fixes.

    @terryjreedy
    Member

    2.7 is on extended maintenance for normal bugs, but does not get new features/enhancements. It is too late for 3.3 also.

    @hltbra
    Mannequin

    hltbra mannequin commented Jul 20, 2012

    Thanks for the review, ezio.melotti.

    He noticed a few things in my patch:

    • assertEquals is deprecated; should use assertEqual
    • the convention is assertEqual(result, expected), not assertEqual(expected, result)
    • it should handle quotes correctly
    • some lines were longer than 80 chars
    • add tests using inline flags (re.I instead of re.IGNORECASE)

    I also realized I was not covering the case where no flags are enabled (byte strings, for instance). I have fixed all these issues.

    I now think this patch should work against both py2.x and py3k.

    Attaching a new patch.
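The review points above can be written as tests like the following. This sketch targets the repr that shipped in Python 3.4+, where string patterns show no `re.UNICODE`. It does not test the output of this particular patch. The class and test names are hypothetical. Run it with `python -m unittest`:

```python
import re
import unittest


class PatternReprTests(unittest.TestCase):
    # Convention from the review: assertEqual(result, expected).

    def test_no_flags(self):
        self.assertEqual(repr(re.compile("foo")), "re.compile('foo')")

    def test_bytes_pattern(self):
        # bytes patterns carry no implicit re.UNICODE flag
        self.assertEqual(repr(re.compile(b"foo")), "re.compile(b'foo')")

    def test_quotes(self):
        # the pattern is shown via repr(), so str quoting rules apply
        self.assertEqual(repr(re.compile("it's")), "re.compile(\"it's\")")
        self.assertEqual(repr(re.compile('say "hi"')), "re.compile('say \"hi\"')")

    def test_short_flag_names(self):
        # re.I and re.IGNORECASE are the same flag, so the repr is identical
        self.assertEqual(repr(re.compile("foo", re.I)),
                         "re.compile('foo', re.IGNORECASE)")

    def test_multiple_flags(self):
        self.assertEqual(repr(re.compile("foo", re.DOTALL | re.MULTILINE)),
                         "re.compile('foo', re.MULTILINE|re.DOTALL)")
```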

    @hltbra
    Mannequin

    hltbra mannequin commented Jul 20, 2012

    Changed two test names to avoid misunderstanding.

    @hltbra
    Mannequin

    hltbra mannequin commented Aug 23, 2012

    Any news about this patch? Is it going to be merged?

    When is next CPython release?

    @cjerdonek
    Member

    See also bpo-17087 which is essentially the same issue but for match objects.
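The companion change for match objects (bpo-17087) was announced alongside this one. The small check below assumes Python 3.4+. Only the span/match part of the repr is compared, because the class name differs between versions (`_sre.SRE_Match` in 3.4 to 3.6, `re.Match` from 3.7):

```python
import re

m = re.search(r"o+", "foobar")
print(repr(m))  # ends with "span=(1, 3), match='oo'>"

assert m.span() == (1, 3)
assert "span=(1, 3), match='oo'" in repr(m)
```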

    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 27, 2013
    @serhiy-storchaka
    Member

    Here is a fixed and simplified patch.

    @serhiy-storchaka
    Member

    • re.UNICODE omitted for string patterns.
    • Long patterns are truncated.
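Both points can be observed directly on Python 3.4+. The omitted flag is still present on the object, and truncation affects only the repr; `.pattern` always holds the full text:

```python
import re

p = re.compile("foo")
# The flag is still set on the object; only the repr hides it.
assert p.flags == re.UNICODE
print(repr(p))  # re.compile('foo')

long_p = re.compile("a" * 1000)
print(len(repr(long_p)))  # far shorter than the 1000-character pattern
assert len(long_p.pattern) == 1000  # .pattern keeps the full text
```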

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 23, 2013

    New changeset 8c00677da6c0 by Serhiy Storchaka in branch 'default':
    Issue bpo-13592: Improved the repr for regular expression pattern objects.
    http://hg.python.org/cpython/rev/8c00677da6c0

    @serhiy-storchaka
    Member

    Thank you Hugo for your contribution. Thank you Thomas and Ezio for your reviews and suggestions.

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 25, 2013

    New changeset 4ba7a29fe02c by Ezio Melotti in branch 'default':
    bpo-13592, bpo-17087: add whatsnew entry about regex/match object repr improvements.
    http://hg.python.org/cpython/rev/4ba7a29fe02c

    @Drekin
    Mannequin

    Drekin mannequin commented Jun 20, 2016

    Isn't the truncation of long patterns too rough? Currently, repr(re.compile("a" * 1000)) returns something like "re.compile('aaaaaaaaaaaaa)", i.e. there is no closing quote and no indication that something was truncated (besides the missing quote). At first sight it looked like a bug to me.

    @serhiy-storchaka
    Member

    This looks weird, but it is not a bug. See bpo-26090. Once that feature is implemented, a truncated pattern will look more explicit.
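One way to make truncation explicit can be sketched with the standard library's `reprlib` module. It shortens long strings by cutting out the middle and inserting `...`, while keeping both quotes. `explicit_pattern_repr` and its 60-character limit are hypothetical, and flags are left out for brevity. This is not how `re` itself formats patterns:

```python
import re
import reprlib


def explicit_pattern_repr(p, limit=60):
    """Hypothetical helper: like the pattern repr, but truncation is visible."""
    shortener = reprlib.Repr()
    shortener.maxstring = limit  # limit used for str patterns
    shortener.maxother = limit   # bytes fall back to this generic limit
    return "re.compile({})".format(shortener.repr(p.pattern))


print(explicit_pattern_repr(re.compile("foo")))       # re.compile('foo')
print(explicit_pattern_repr(re.compile("a" * 1000)))  # middle replaced by ...
```

Keeping both quotes and putting the marker in the middle makes a shortened pattern recognizable at a glance, which addresses the "looks like a bug" confusion above.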

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    7 participants