This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Unloading docstrings from memory if -OO is given
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 3.3
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Sworddragon, pconnell, r.david.murray, serhiy.storchaka
Priority: normal
Keywords:

Created on 2013-11-09 06:54 by Sworddragon, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description
test.py Sworddragon, 2013-11-09 06:54
Messages (8)
msg202465 - Author: (Sworddragon) Date: 2013-11-09 06:54
Running a script with -OO removes the __doc__ attributes, but the docstrings still remain in the process memory. The attached example script demonstrates this with a docstring of ~10 MiB (opening the file in an editor may take some time). Calling "python3 -OO test.py" results in a memory usage of ~16 MiB on my system (64-bit Linux), even though test.__doc__ is None.
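A minimal sketch of how the report could be reproduced without the attachment. The contents of the attached test.py are not shown in this thread, so the generated module, the function name `test`, and the helper `run_with_oo` are assumptions made for illustration. The child process reports its own peak resident set size where the `resource` module is available.

```python
import os
import subprocess
import sys
import tempfile


def run_with_oo(doc_size=1_000_000):
    """Write a script whose function has a doc_size-character docstring,
    run it under -OO, and return (doc_is_none, optimize_level, peak_rss)."""
    # Hypothetical stand-in for the attached test.py.
    source = (
        "def test():\n"
        '    """' + "x" * doc_size + '"""\n'
        "\n"
        "import sys\n"
        "try:\n"
        "    import resource\n"
        "    # ru_maxrss is in KiB on Linux and in bytes on macOS.\n"
        "    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss\n"
        "except ImportError:\n"
        "    peak = -1  # the resource module is not available on Windows\n"
        "print(test.__doc__ is None, sys.flags.optimize, peak)\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.py")
        with open(path, "w") as fh:
            fh.write(source)
        result = subprocess.run(
            [sys.executable, "-OO", path],
            capture_output=True,
            text=True,
            check=True,
        )
    doc_is_none, optimize, peak = result.stdout.split()
    return doc_is_none == "True", int(optimize), int(peak)


if __name__ == "__main__":
    print(run_with_oo())
```

Comparing the reported peak for a small and a large `doc_size` gives a rough idea of how much of the docstring's size shows up in the process footprint, although the exact numbers depend on the platform and the Python version.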
msg202485 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-11-09 16:06
Do realize this is a one-time memory cost, though, because next execution will load from the .pyo and thus will never load the docstring into memory. If you pre-compile all bytecode with -OO this will never even occur.
msg202486 - Author: (Sworddragon) Date: 2013-11-09 16:24
> Do realize this is a one-time memory cost, though, because next execution will load from the .pyo and thus will never load the docstring into memory.

Except in 2 cases:

- The bytecode was previously generated with -O.
- The bytecode couldn't be written (for example permission issues or Python was invoked with -B).
msg202491 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-11-09 18:35
So the question is, if there is no longer a reference to the docstring, why isn't it garbage collected?  (I tested adding a gc.collect(), and it didn't make any difference.)
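One detail that may help when reading this question: in CPython, string objects are released by reference counting and are not tracked by the cyclic garbage collector, so gc.collect() having no visible effect does not by itself show whether the docstring was freed. A small sketch (the variable name `doc` is just for illustration):

```python
import gc

doc = "squeamish ossifrage " * 1000

# str objects cannot contain references to other objects, so the cyclic
# collector does not track them; they are freed when their refcount hits zero.
print(gc.is_tracked(doc))

# Containers such as lists can form reference cycles and are tracked.
print(gc.is_tracked([doc]))
```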
msg202500 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-11-09 20:43
R. David Murray <report@bugs.python.org> wrote:
> So the question is, if there is no longer a reference to the docstring, why isn't it garbage collected?  (I tested adding a gc.collect(), and it didn't make any difference.)

I think it is probably garbage collected, but the freed memory is not returned to the OS by the memory allocator.
msg202501 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-11-09 21:12
Hmm.  If I turn on gc debugging before the def, I don't see anything get collected.  If I allocate a series of new 10K strings, the memory keeps growing.  Of course, that could still be down to the vagaries of OS memory management.  Time to break out Victor's tracemalloc, but I probably don't have that much ambition today :)
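A rough sketch of the kind of tracemalloc measurement mentioned above, assuming a Python version where tracemalloc is in the standard library (3.4 and later). It compiles a function with a large docstring at optimization level 2 and compares the memory still allocated afterwards with the peak reached during compilation. The helper name `measure_compile` and the generated source are illustrative only.

```python
import tracemalloc


def measure_compile(doc_size=1_000_000):
    """Compile a function with a doc_size-character docstring at optimize=2.

    Returns (docstring, current_bytes, peak_bytes), where the byte counts
    cover only allocations made while compiling.
    """
    # Build the source before tracing starts so it is not counted.
    source = 'def f():\n    """' + "x" * doc_size + '"""\n'

    tracemalloc.start()
    try:
        code = compile(source, "docstr.py", "exec", optimize=2)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    namespace = {}
    exec(code, namespace)
    return namespace["f"].__doc__, current, peak


if __name__ == "__main__":
    doc, current, peak = measure_compile()
    print("docstring:", doc)
    print("still allocated after compile:", current, "bytes")
    print("peak during compile:", peak, "bytes")
```

A large gap between the peak and the amount still allocated would suggest that the docstring is materialized temporarily and then released by the interpreter, which is a separate question from whether the allocator returns that memory to the OS.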
msg202526 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-11-10 14:19
It looks like the memory management is based directly on PyArena objects:

def f():
    """squeamish ossifrage"""
    pass

Breakpoint 1, PyArena_Free (arena=0x9a5120) at Python/pyarena.c:159
159         assert(arena);
(gdb) p arena->a_objects
$1 = ['f', 'squeamish ossifrage']
(gdb) bt
#0  PyArena_Free (arena=0x9a5120) at Python/pyarena.c:159
#1  0x0000000000425af5 in PyRun_FileExFlags (fp=0xa1b780, filename_str=0x7ffff7f37eb0 "docstr.py", start=257, globals=
    {'f': <function at remote 0x7ffff7f04058>, '__builtins__': <module at remote 0x7ffff7f6a358>, '__name__': '__main__', '__file__': 'docstr.py', '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='docstr.py') at remote 0x7ffff7ede608>, '__cached__': None, '__doc__': None}, locals=
    {'f': <function at remote 0x7ffff7f04058>, '__builtins__': <module at remote 0x7ffff7f6a358>, '__name__': '__main__', '__file__': 'docstr.py', '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='docstr.py') at remote 0x7ffff7ede608>, '__cached__': None, '__doc__': None}, closeit=1, flags=0x7fffffffe490) at Python/pythonrun.c:2114
#2  0x0000000000423a0c in PyRun_SimpleFileExFlags (fp=0xa1b780, filename=0x7ffff7f37eb0 "docstr.py", closeit=1, flags=
    0x7fffffffe490) at Python/pythonrun.c:1589
#3  0x000000000042289c in PyRun_AnyFileExFlags (fp=0xa1b780, filename=0x7ffff7f37eb0 "docstr.py", closeit=1, flags=0x7fffffffe490)
    at Python/pythonrun.c:1276
#4  0x000000000043bc83 in run_file (fp=0xa1b780, filename=0x9669b0 L"docstr.py", p_cf=0x7fffffffe490) at Modules/main.c:336
#5  0x000000000043c8c5 in Py_Main (argc=3, argv=0x964020) at Modules/main.c:780
#6  0x000000000041cdb5 in main (argc=3, argv=0x7fffffffe688) at ./Modules/python.c:69

So the string 'squeamish ossifrage' is still in arena->a_objects right until the end of PyRun_FileExFlags(), even with -OO.
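The same split between parsing and compiling can be seen from Python, as a rough analogue of the gdb session above (it does not inspect the C-level arena itself). In this sketch the parsed tree still carries the docstring, and it is only dropped when the tree is compiled with optimize=2:

```python
import ast

SOURCE = 'def f():\n    """squeamish ossifrage"""\n    pass\n'

# Parsing materializes the docstring as a constant in the syntax tree.
tree = ast.parse(SOURCE, filename="docstr.py")
parsed_doc = ast.get_docstring(tree.body[0])

# Compiling at optimization level 2 (the -OO level) drops it again.
code = compile(tree, "docstr.py", "exec", optimize=2)
namespace = {}
exec(code, namespace)
compiled_doc = namespace["f"].__doc__

print("in the parsed tree:", parsed_doc)
print("on the compiled function:", compiled_doc)
```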
msg363550 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-03-06 20:43
Do note that .pyc files now encode their optimization levels, so the only thing to potentially do here is change the compiler to toss docstrings out and make sure they are freed when they are parsed to avoid holding on to them.
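To illustrate how the optimization level is encoded in the cache file name, here is a small sketch using importlib.util.cache_from_source, assuming Python 3.5 or later. The source name "test.py" and the helper `cache_paths` are placeholders for illustration.

```python
import importlib.util


def cache_paths(source="test.py"):
    """Map each optimization level to the bytecode cache path for source."""
    # An empty string means "no optimization"; 1 and 2 correspond to -O and -OO.
    return {
        level: importlib.util.cache_from_source(source, optimization=level)
        for level in ("", 1, 2)
    }


if __name__ == "__main__":
    for level, path in cache_paths().items():
        print(repr(level), "->", path)
```

Because each level gets its own file, bytecode written under -O is no longer reused for a -OO run, which removes the first of the two exceptions raised earlier in the thread.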
History
Date | User | Action | Args
2022-04-11 14:57:53 | admin | set | github: 63732
2020-03-06 20:43:28 | brett.cannon | set | nosy: - brett.cannon
2020-03-06 20:43:24 | brett.cannon | set | messages: + msg363550
2014-05-22 22:09:30 | skrah | set | nosy: - skrah
2014-04-24 05:52:19 | pconnell | set | nosy: + pconnell
2013-11-10 14:19:40 | skrah | set | messages: + msg202526
2013-11-09 21:12:13 | r.david.murray | set | messages: + msg202501
2013-11-09 20:43:36 | skrah | set | messages: + msg202500
2013-11-09 18:35:28 | r.david.murray | set | nosy: + r.david.murray; messages: + msg202491
2013-11-09 16:24:18 | Sworddragon | set | messages: + msg202486
2013-11-09 16:06:12 | brett.cannon | set | nosy: + brett.cannon; messages: + msg202485
2013-11-09 09:09:31 | serhiy.storchaka | set | versions: + Python 3.3, - Python 2.7
2013-11-09 09:08:48 | serhiy.storchaka | set | messages: - msg202474
2013-11-09 09:08:37 | serhiy.storchaka | set | type: behavior -> enhancement; components: - Tests; versions: - Python 3.3, Python 3.4
2013-11-09 09:05:08 | serhiy.storchaka | set | versions: + Python 2.7, Python 3.4; nosy: + skrah, serhiy.storchaka; messages: + msg202474; components: + Tests; type: enhancement -> behavior
2013-11-09 06:54:19 | Sworddragon | create |