Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re.groupindex is available for modification and continues to work, having incorrect data inside it #58468

Closed
py-user mannequin opened this issue Mar 12, 2012 · 18 comments
Assignees
Labels
topic-regex type-bug An unexpected behavior, bug, or error

Comments

@py-user
Copy link
Mannequin

py-user mannequin commented Mar 12, 2012

BPO 14260
Nosy @gvanrossum, @birkenfeld, @vstinner, @ezio-melotti, @merwok, @py-user, @ericsnowcurrently, @serhiy-storchaka
Files
  • re_groupindex_copy.patch
  • re_groupindex_proxy.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-03-30.07:08:36.635>
    created_at = <Date 2012-03-12.05:28:22.645>
    labels = ['expert-regex', 'type-bug']
    title = 're.groupindex is available for modification and continues to work, having incorrect data inside it'
    updated_at = <Date 2015-03-30.07:08:36.634>
    user = 'https://github.com/py-user'

    bugs.python.org fields:

    activity = <Date 2015-03-30.07:08:36.634>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-03-30.07:08:36.635>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions']
    creation = <Date 2012-03-12.05:28:22.645>
    creator = 'py.user'
    dependencies = []
    files = ['37098', '37099']
    hgrepos = []
    issue_num = 14260
    keywords = ['patch', 'needs review']
    message_count = 18.0
    messages = ['155442', '155459', '155484', '155546', '155560', '155570', '155593', '155702', '155734', '175494', '175497', '175502', '175505', '230450', '234878', '239041', '239094', '239526']
    nosy_count = 11.0
    nosy_names = ['gvanrossum', 'georg.brandl', 'vstinner', 'ezio.melotti', 'eric.araujo', 'mrabarnett', 'py.user', 'python-dev', 'eric.snow', 'serhiy.storchaka', 'Abel.Farias']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue14260'
    versions = ['Python 3.5']

    @py-user
    Copy link
    Mannequin Author

    py-user mannequin commented Mar 12, 2012

    >>> import re
    >>> p = re.compile(r'abc(?P<n>def)')
    >>> p.sub(r'\g<n>', 'abcdef123abcdef')
    'def123def'
    >>> p.groupindex['n'] = 2
    >>> p.sub(r'\g<n>', 'abcdef123abcdef')
    'def123def'
    >>> p.groupindex
    {'n': 2}
    >>>

    @py-user py-user mannequin added topic-regex type-bug An unexpected behavior, bug, or error labels Mar 12, 2012
    @ericvsmith ericvsmith changed the title regex.groupindex available for modification and continues to work, having incorrect data inside it re.groupindex available for modification and continues to work, having incorrect data inside it Mar 12, 2012
    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Mar 12, 2012

    The re module creates the dict purely for the benefit of the user, and as it's a normal dict, it's mutable.

    An alternative would to use an immutable dict or dict-like object, but Python doesn't have such a class, and it's probably not worth writing one just for this use-case.

    @py-user
    Copy link
    Mannequin Author

    py-user mannequin commented Mar 12, 2012

    Matthew Barnett wrote:

    The re module creates the dict purely for the benefit of the user

    this dict affects on regex.sub()

    >>> import re
    >>> p = re.compile(r'abc(?P<n>def)')
    >>> p.groupindex
    {'n': 1}
    >>> p.groupindex['n'] = 2
    >>> p.sub(r'\g<n>', 'abcdef')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.2/sre_parse.py", line 811, in expand_template
        literals[index] = s = g(group)
    IndexError: no such group
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.2/re.py", line 286, in filter
        return sre_parse.expand_template(template, match)
      File "/usr/local/lib/python3.2/sre_parse.py", line 815, in expand_template
        raise error("invalid group reference")
    sre_constants.error: invalid group reference
    >>>

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Mar 13, 2012

    It appears I was wrong. :-(

    The simplest solution in that case is for it to return a _copy_ of the dict.

    @merwok
    Copy link
    Member

    merwok commented Mar 13, 2012

    But regex.sub is affected only if you manually muck with the dict, right? If so, then it looks like a case of “it hurts when I do this” (the doctor’s reply: “Don’t do this.”)

    @py-user
    Copy link
    Mannequin Author

    py-user mannequin commented Mar 13, 2012

    the first message shows how it can work with a broken dict

    Éric Araujo wrote:

    But regex.sub is affected only if you manually muck with the dict, right?

    I can get code from anywhere

    @merwok
    Copy link
    Member

    merwok commented Mar 13, 2012

    I can get code from anywhere
    I am afraid I don’t understand. Could you start again and explain what bug you ran into, i.e. what behavior does not match what the docs say? At present this report looks like it is saying “when I put random things in an internal data structures then bad things happen”, and I don‘t think Python promises to not break when people do random editions to internal data structures.

    @py-user
    Copy link
    Mannequin Author

    py-user mannequin commented Mar 14, 2012

    I take someone's code
    make tests for its behavior
    all tests say "the code is working"
    I continue to write the code
    make tests for its behavior
    all tests say "the code is working"
    I install it somewhere and it crashes

    now it is depending on the cache, when this exception is raised

    Éric Araujo wrote:

    and I don‘t think Python promises to not break when people do random editions

    when people do something wrong, python should raise an exception

    @birkenfeld
    Copy link
    Member

    Looks like a case for a read-only dict/dictproxy :)

    @serhiy-storchaka
    Copy link
    Member

    I fully agree with Éric. Just don't do this.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Nov 13, 2012

    I'm not so sure. If dicts or classes are used for configuration
    or informational purposes, I prefer them to be locked down.

    An example of the first is the decimal context, where it was possible
    to write context.emax = 9 instead of context.Emax = 9 without getting
    an error. This is an easy mistake to make and can be hard to track
    down in a large program.

    The mistake here is maybe less likely, but I agree with Georg that
    it's a case for a read-only dict/dictproxy.

    @AbelFarias AbelFarias mannequin changed the title re.groupindex available for modification and continues to work, having incorrect data inside it re.grupindex available for modification and continues to work, having incorrect data inside it Nov 13, 2012
    @skrah skrah mannequin changed the title re.grupindex available for modification and continues to work, having incorrect data inside it re.groupindex available for modification and continues to work, having incorrect data inside it Nov 13, 2012
    @merwok
    Copy link
    Member

    merwok commented Nov 13, 2012

    I propose using a MappingProxy type in 3.4 and add an example to the docs for stable versions.

    @merwok merwok removed the invalid label Nov 13, 2012
    @serhiy-storchaka
    Copy link
    Member

    Copy or proxy may affect performance. We will need to make benchmarks to see how much.

    @py-user py-user mannequin changed the title re.groupindex available for modification and continues to work, having incorrect data inside it re.groupindex is available for modification and continues to work, having incorrect data inside it Nov 13, 2012
    @serhiy-storchaka
    Copy link
    Member

    Here are two patches which implement two alternative solutions. They are based on regex code.

    Dict copying patch matches current regex behavior and needs modifying other code to avoid small slowdown. Artificial example:

    $ ./python -m timeit -s 'import re; n = 100; m = re.match("".join("(?P<g%d>.)" % g for g in range(n)), "x" * n); t = ",".join(r"\g<g%d>" % g for g in range(n))' -- 'm.expand(t)'

    Without patch: 7.48 msec per loop
    With re_groupindex_copy.patch but without modifying _expand: 9.61 msec per loop
    With re_groupindex_copy.patch and with modifying _expand: 7.41 msec per loop

    While stdlib code can be modified, this patch can cause small slowdown of some third-party code.

    Dict proxying patch has no performance effect, but it is slightly less compatible. Some code can accept dict but not dict-like object.

    @serhiy-storchaka
    Copy link
    Member

    Ping.

    @serhiy-storchaka
    Copy link
    Member

    What approach looks better, a copy or a read-only proxy?

    @py-user
    Copy link
    Mannequin Author

    py-user mannequin commented Mar 24, 2015

    @serhiy Storchaka

    What approach looks better, a copy or a read-only proxy?

    ISTM, your proxy patch is better, because it expects an exception rather than silence.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 29, 2015

    New changeset 4d5826fa77a1 by Serhiy Storchaka in branch 'default':
    Issue bpo-14260: The groupindex attribute of regular expression pattern object
    https://hg.python.org/cpython/rev/4d5826fa77a1

    @serhiy-storchaka serhiy-storchaka self-assigned this Mar 30, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-regex type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants