memory leak (reference cycles) using re #69740

Closed
joente mannequin opened this issue Nov 5, 2015 · 4 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir topic-regex

Comments

@joente
Mannequin

joente mannequin commented Nov 5, 2015

BPO 25554
Nosy @ezio-melotti, @serhiy-storchaka
Files
  • fix_mem_sre_parse.patch: patched sre_parse.py
  • fix_mem_sre_parse_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-11-05.16:43:26.658>
    created_at = <Date 2015-11-05.08:27:44.737>
    labels = ['expert-regex', 'library', 'performance']
    title = 'memory leak (reference cycles) using re'
    updated_at = <Date 2015-11-05.16:43:26.656>
    user = 'https://bugs.python.org/joente'

    bugs.python.org fields:

    activity = <Date 2015-11-05.16:43:26.656>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-11-05.16:43:26.658>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2015-11-05.08:27:44.737>
    creator = 'joente'
    dependencies = []
    files = ['40948', '40952']
    hgrepos = []
    issue_num = 25554
    keywords = ['patch']
    message_count = 4.0
    messages = ['254092', '254099', '254114', '254115']
    nosy_count = 5.0
    nosy_names = ['ezio.melotti', 'mrabarnett', 'python-dev', 'serhiy.storchaka', 'joente']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue25554'
    versions = ['Python 3.5', 'Python 3.6']

    @joente
    Mannequin Author

    joente mannequin commented Nov 5, 2015

    When compiling a regular expression with groups (subpatterns),
    circular references are created.
    Here is an example to illustrate the problem:

    >>> import gc
    >>> import re
    >>> gc.disable() # disable garbage collector
    >>> gc.collect() # make sure we start with 0
    0
    >>> re.compile('(a|b)') # compile something with groups
    re.compile('(a|b)')
    >>> gc.collect() # collects x objects depending on the compiled string
    11

    To fix the issue a weakref object for p is used.
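A minimal sketch of the weakref idea described above, not the attached patch itself. The `State` and `SubPattern` classes and the `needs_cycle_collector` helper are hypothetical stand-ins for the parser internals; they only illustrate how a strong back-reference forms a cycle and how a weak one avoids it. The behaviour relies on CPython's reference counting.

```python
import gc
import weakref


class State:
    """Hypothetical parser state that records every group it sees."""

    def __init__(self):
        self.subpatterns = []


class SubPattern:
    """Hypothetical parsed group that keeps a handle back to its state."""

    def __init__(self, state, use_weakref):
        # A strong back-reference closes the loop state -> group -> state.
        # A weak one leaves the state owned only by its creator.
        self._state = weakref.ref(state) if use_weakref else state
        state.subpatterns.append(self)


def needs_cycle_collector(use_weakref):
    """Return True if the state survives after its last name is deleted."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure only reference counting is at work
    try:
        state = State()
        SubPattern(state, use_weakref)
        probe = weakref.ref(state)
        del state
        # If the probe is still alive, only gc.collect() can free the state.
        return probe() is not None
    finally:
        gc.collect()  # clean up any cycle left behind
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("strong back-reference leaks:", needs_cycle_collector(False))
    print("weak back-reference leaks:", needs_cycle_collector(True))
```

On CPython the strong variant should report a leftover object and the weak variant should not, which mirrors the non-zero `gc.collect()` result in the report.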

    @joente joente mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Nov 5, 2015
    @serhiy-storchaka
    Member

    Thank you for your report and patch, Jeroen.

    Indeed, there is a regression, and your patch fixes it. But I don't like the idea of using weakref. For now sre_parse has very few dependencies, but weakref depends on collections, which in turn depends on a number of modules. For now importing weakref works, but it would be too easy to create a dependency loop in the future.

    Here is an alternative patch that gets rid of the references altogether. The subpatterns list was added in the patch for bpo-9179 and is an implementation detail. We can replace it with a list of subpattern widths.
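The alternative described above can be sketched roughly as follows. This is an illustration under assumptions, not the committed change: `State`, `SubPattern`, `group_widths`, `close_group`, and `parse_one_group` are hypothetical names. The point is that the state stores only a plain `(min, max)` width tuple per group, so it never holds a reference to the group object and no cycle can form.

```python
import weakref


class State:
    """Hypothetical parser state; stores group widths, not group objects."""

    def __init__(self):
        self.group_widths = [None]  # index 0 is reserved for the whole pattern

    def close_group(self, subpattern):
        # Keep only the width tuple; the subpattern itself is not retained.
        self.group_widths.append(subpattern.width())


class SubPattern:
    """Hypothetical parsed group that still points back at its state."""

    def __init__(self, state, lo, hi):
        self.state = state  # one-way reference, so no cycle
        self._width = (lo, hi)

    def width(self):
        return self._width


def parse_one_group():
    """Simulate parsing a single group and return the state plus a probe."""
    state = State()
    group = SubPattern(state, 1, 1)  # e.g. '(a|b)' matches exactly one character
    state.close_group(group)
    return state, weakref.ref(group)


if __name__ == "__main__":
    state, probe = parse_one_group()
    print(state.group_widths)
    print("group freed without gc:", probe() is None)
```

Because the reference only runs from the group to the state, reference counting alone frees the group once the parser is done with it, and no extra module such as weakref needs to be imported by the parser.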

    @joente
    Mannequin Author

    joente mannequin commented Nov 5, 2015

    Thanks Serhiy,

    I totally agree with your solution. Using a list of subpattern widths is definitely better than using weakref.

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 5, 2015

    New changeset 7f4fca8f13a2 by Serhiy Storchaka in branch '3.5':
    Issue bpo-25554: Got rid of circular references in regular expression parsing.
    https://hg.python.org/cpython/rev/7f4fca8f13a2

    New changeset 8621727dd9f7 by Serhiy Storchaka in branch 'default':
    Issue bpo-25554: Got rid of circular references in regular expression parsing.
    https://hg.python.org/cpython/rev/8621727dd9f7

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022