classification
Title: unnecessary overhead in tempfile
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Deric-W, benjamin.peterson, methane, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-10-26 21:07 by Deric-W, last changed 2020-11-01 13:48 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 22997 merged Deric-W, 2020-10-27 00:27
PR 23055 merged methane, 2020-10-31 01:17
PR 23068 merged methane, 2020-11-01 01:02
Messages (17)
msg379688 - (view) Author: Eric Wolf (Deric-W) * Date: 2020-10-26 21:19
The tempfile module contains the class _RandomNameSequence, which has the rng property.
This property checks os.getpid() every time and re-initializes a random number generator when it has changed.
However, this is only necessary on systems which allow the process to be cloned and should be solved on such systems with os.register_at_fork (see the random module).
msg379710 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2020-10-27 01:39
Wouldn't it be simpler to use random.SystemRandom instead?
msg379716 - (view) Author: Eric Wolf (Deric-W) * Date: 2020-10-27 02:32
SystemRandom seems to be slower:


from random import Random, SystemRandom
from timeit import timeit

user = Random()
system = SystemRandom()
characters = "abcdefghijklmnopqrstuvwxyz0123456789_"

timeit(lambda: user.choice(characters))
>>> 0.5491522020020057

timeit(lambda: system.choice(characters))
>>> 2.9195130389998667
msg379717 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2020-10-27 02:33
I also wonder if the overhead of getpid() is actually significant for anything.
msg379722 - (view) Author: Eric Wolf (Deric-W) * Date: 2020-10-27 03:21
It seems to be insignificant, however it would allow for easier monkey-patching: https://bugs.python.org/issue32276

Instead of changing _Random one could simply assign a new instance to _named_sequence
msg379916 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-30 04:56
New changeset 8e409cebad42032bb7d0f2cadd8b1e36081d98af by Eric W in branch 'master':
bpo-42160: tempfile: Reduce overhead of pid check. (GH-22997)
https://github.com/python/cpython/commit/8e409cebad42032bb7d0f2cadd8b1e36081d98af
msg379921 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-30 06:14
Thanks
msg379995 - (view) Author: Eric Wolf (Deric-W) * Date: 2020-10-30 19:59
Thanks
msg380030 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-31 01:03
The commit 8e409cebad42032bb7d0f2cadd8b1e36081d98af introduced a reference leak:
https://buildbot.python.org/all/#/builders/320/builds/71

$ make && ./python -m test test_tempfile -R 3:3
...
test_tempfile leaked [49, 49, 49] references, sum=147
test_tempfile leaked [28, 28, 28] memory blocks, sum=84
...

Single test reproducing the leak:

$ ./python -m test test_tempfile -R 3:3 -m test.test_tempfile.TestGetDefaultTempdir.test_no_files_left_behind
...
test_tempfile leaked [21, 21, 21] references, sum=63
test_tempfile leaked [12, 12, 12] memory blocks, sum=36
...
msg380032 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-31 01:16
Oh, _RandomNameSequence is not true singleton. Instance is used in tests actually.
Using `os.register_at_fork` was but idea. Let's reject it.
msg380034 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-31 02:15
New changeset 43ca084c88d1e46a44199f347c72e26db84903c9 by Inada Naoki in branch 'master':
Revert "bpo-42160: tempfile: Reduce overhead of pid check. (GH-22997)"
https://github.com/python/cpython/commit/43ca084c88d1e46a44199f347c72e26db84903c9
msg380035 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-31 02:16
@Deric-W Please create another PR to change from `choise` to `choises` if you want to optimize it.
msg380052 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-31 10:40
See also issue30030.
msg380062 - (view) Author: Eric Wolf (Deric-W) * Date: 2020-10-31 13:16
It would be possible to allow the GC to finalize the Random instances through weak references.
On the other hand, if many _RandomNameSequence instances were used temporarily, a lot of callbacks would be registered via os.register_at_fork(), could that cause problems?
msg380064 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-31 13:50
Instead of os.getpid() we can use a global variable, and the single callback can reset it.

But is it worth? Is os.getpid() so slow?
msg380067 - (view) Author: Eric Wolf (Deric-W) * Date: 2020-10-31 14:15
>>> timeit(os.getpid)
0.08990733299999931

Considering the reference leaks, os.getpid() seems to be the better solution.
msg380069 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-31 14:46
I have tested my idea, such optimization speeds up _RandomNameSequence by 16%. At the cost of more complex code. It is not worth.
History
Date User Action Args
2020-11-01 13:48:18vstinnersetnosy: - vstinner
2020-11-01 01:02:59methanesetpull_requests: + pull_request21987
2020-10-31 14:46:39serhiy.storchakasetstatus: open -> closed
resolution: rejected
messages: + msg380069

stage: patch review -> resolved
2020-10-31 14:15:00Deric-Wsetmessages: + msg380067
2020-10-31 13:50:11serhiy.storchakasetmessages: + msg380064
2020-10-31 13:16:11Deric-Wsetmessages: + msg380062
2020-10-31 10:40:32serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg380052
2020-10-31 02:16:55methanesetmessages: + msg380035
2020-10-31 02:15:45methanesetmessages: + msg380034
2020-10-31 01:17:28methanesetstage: resolved -> patch review
pull_requests: + pull_request21974
2020-10-31 01:16:55methanesetmessages: + msg380032
2020-10-31 01:03:35vstinnersetstatus: closed -> open

nosy: + vstinner
messages: + msg380030

resolution: fixed -> (no value)
2020-10-30 19:59:37Deric-Wsetmessages: + msg379995
2020-10-30 06:14:04methanesetstatus: open -> closed
versions: + Python 3.10, - Python 3.6, Python 3.7, Python 3.8, Python 3.9
messages: + msg379921

resolution: fixed
stage: patch review -> resolved
2020-10-30 04:56:40methanesetnosy: + methane
messages: + msg379916
2020-10-27 03:21:41Deric-Wsetmessages: + msg379722
2020-10-27 02:33:29benjamin.petersonsetmessages: + msg379717
2020-10-27 02:32:31Deric-Wsetmessages: + msg379716
2020-10-27 01:39:03benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg379710
2020-10-27 00:27:04Deric-Wsetkeywords: + patch
stage: patch review
pull_requests: + pull_request21911
2020-10-26 21:19:24Deric-Wsetmessages: + msg379688
2020-10-26 21:07:04Deric-Wcreate