classification
Title: memory leak (reference cycles) using re
Type: resource usage
Stage: resolved
Components: Library (Lib), Regular Expressions
Versions: Python 3.6, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: ezio.melotti, joente, mrabarnett, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2015-11-05 08:27 by joente, last changed 2015-11-05 16:43 by serhiy.storchaka. This issue is now closed.

Files
File name | Uploaded | Description
fix_mem_sre_parse.patch | joente, 2015-11-05 08:27 | patched sre_parse.py
fix_mem_sre_parse_2.patch | serhiy.storchaka, 2015-11-05 10:31 |
Messages (4)
msg254092 - Author: Jeroen van der Heijden (joente) Date: 2015-11-05 08:27
When compiling a regular expression with groups (subpatterns), 
circular references are created.
Here is an example to illustrate the problem:

>>> import gc
>>> import re
>>> gc.disable() # disable garbage collector
>>> gc.collect() # make sure we start with 0
0
>>> re.compile('(a|b)') # compile something with groups
re.compile('(a|b)')
>>> gc.collect() # collects x objects depending on the compiled string
11
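
The same measurement can be wrapped in a small script. This is only a sketch: `count_cyclic_garbage`, `Node`, `make_cycle` and `make_acyclic` are hypothetical names introduced here for illustration, and the number reported for `re.compile` depends on the Python version and on the pattern (it may well be 0 on versions that include the fix).

```python
import gc
import re


def count_cyclic_garbage(func):
    """Call func() with automatic gc disabled and return how many
    unreachable objects a manual collection finds afterwards."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()          # start from a clean slate
        func()                # the result is discarded immediately
        return gc.collect()   # objects that only the cycle collector can free
    finally:
        if was_enabled:
            gc.enable()


class Node:
    """Hypothetical helper used to build a known cycle for comparison."""


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a   # a <-> b: reference counting alone cannot free them


def make_acyclic():
    a, b = Node(), Node()
    a.other = b               # one-way reference: freed as soon as the call returns


if __name__ == "__main__":
    print("known cycle:   ", count_cyclic_garbage(make_cycle))
    print("no cycle:      ", count_cyclic_garbage(make_acyclic))
    print("re.compile():  ", count_cyclic_garbage(lambda: re.compile('(a|b)')))
```

A non-zero count for the `re.compile` line would indicate that compiling the pattern left reference cycles behind.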


To fix the issue, the attached patch uses a weakref object for p.
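
As a rough illustration of the general technique (not the contents of the attached patch), the sketch below shows how holding one side of a two-way relationship through `weakref.ref` avoids a cycle. `ParserState`, `Group`, `parse` and `leaked_objects` are hypothetical stand-ins, not names taken from sre_parse.

```python
import gc
import weakref


class ParserState:
    """Hypothetical stand-in for the shared parser state."""

    def __init__(self):
        self.groups = []


class Group:
    """Hypothetical stand-in for a parsed subpattern."""

    def __init__(self, state):
        self.state = state    # strong reference back to the shared state


def parse(use_weakref):
    state = ParserState()
    group = Group(state)
    if use_weakref:
        # The state only holds a weak reference, so there is no cycle.
        state.groups.append(weakref.ref(group))
    else:
        # state -> group -> state: a reference cycle.
        state.groups.append(group)


def leaked_objects(use_weakref):
    """Return how many objects are left for the cycle collector after parse()."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()
        parse(use_weakref)
        return gc.collect()
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("strong reference:", leaked_objects(use_weakref=False))
    print("weak reference:  ", leaked_objects(use_weakref=True))
```

With the strong reference the objects survive until a collection runs; with the weak reference they are released by reference counting as soon as `parse` returns.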
msg254099 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-11-05 10:31
Thank you for your report and patch, Jeroen.

Indeed, there is a regression, and your patch fixes it. But I don't like the idea of using weakref. For now sre_parse has very few dependencies, but weakref depends on collections, which in turn depends on a number of other modules. Importing weakref works for now, but it would be too easy to create a dependency loop in the future.

Here is an alternative patch that gets rid of the references altogether. The subpatterns list was added in the patch for issue9179 and is an implementation detail. We can replace it with a list of subpattern widths.
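
The idea can be sketched as follows, assuming the parser only needs each closed group's width later on. This is an illustration with hypothetical classes (`ParserState`, `Group`) and a hypothetical helper (`parse_and_count`), not the contents of the patch: the state records a plain `(min, max)` tuple per group instead of the group object itself, so nothing points back at the group.

```python
import gc


class ParserState:
    """Hypothetical stand-in for the shared parser state."""

    def __init__(self):
        # Index 0 is a placeholder so that groups are numbered from 1.
        self.groupwidths = [None]

    def closegroup(self, group):
        # Store only the width, not the group object: no back-reference.
        self.groupwidths.append(group.getwidth())
        return len(self.groupwidths) - 1


class Group:
    """Hypothetical stand-in for a parsed subpattern."""

    def __init__(self, state, min_width, max_width):
        self.state = state            # group -> state is still a strong reference
        self.min_width = min_width
        self.max_width = max_width

    def getwidth(self):
        return (self.min_width, self.max_width)


def parse_and_count():
    """Return the recorded width of group 1 and the number of objects
    left for the cycle collector once parsing is finished."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()
        state = ParserState()
        gid = state.closegroup(Group(state, 1, 1))
        width = state.groupwidths[gid]
        del state                     # drop the last reference to the state
        return width, gc.collect()
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(parse_and_count())
```

Because the state holds only tuples of integers, this approach needs neither a cycle nor an extra import.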
msg254114 - Author: Jeroen van der Heijden (joente) Date: 2015-11-05 15:13
Thanks Serhiy,

I totally agree with your solution. Using a list of subpattern widths is definitely better than using weakref.
msg254115 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-05 15:52
New changeset 7f4fca8f13a2 by Serhiy Storchaka in branch '3.5':
Issue #25554: Got rid of circular references in regular expression parsing.
https://hg.python.org/cpython/rev/7f4fca8f13a2

New changeset 8621727dd9f7 by Serhiy Storchaka in branch 'default':
Issue #25554: Got rid of circular references in regular expression parsing.
https://hg.python.org/cpython/rev/8621727dd9f7
History
Date | User | Action | Args
2015-11-05 16:43:26 | serhiy.storchaka | set | status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: - Python 2.7, Python 3.4
2015-11-05 15:52:03 | python-dev | set | nosy: + python-dev; messages: + msg254115
2015-11-05 15:13:59 | joente | set | messages: + msg254114
2015-11-05 10:31:24 | serhiy.storchaka | set | files: + fix_mem_sre_parse_2.patch; assignee: serhiy.storchaka; components: + Regular Expressions; versions: + Python 2.7, Python 3.4, Python 3.6; nosy: + serhiy.storchaka, ezio.melotti, mrabarnett; messages: + msg254099; stage: patch review
2015-11-05 08:27:44 | joente | create |