classification
Title: there is no way to make tempfile reproducible (i.e. seed the used RNG)
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Yaroslav.Halchenko, mdk, r.david.murray, rhettinger, serhiy.storchaka
Priority: normal
Keywords:

Created on 2017-12-11 14:25 by Yaroslav.Halchenko, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (8)
msg308043 - Author: Yaroslav Halchenko (Yaroslav.Halchenko) Date: 2017-12-11 14:25
It is quite often desirable to reproduce the same failure identically. In many cases it is sufficient to seed the shared random._inst (via random.seed). tempfile, however, creates new instance(s) for its own operation and does not provide an API to seed them. I do not think it would be easy (unless I am missing some pattern) to make it deterministic/reproducible for multi-process apps, but I wonder why tempfile doesn't initially (for the main process) just reuse random._inst, creating a new _Random only in child processes?
An alternative solution would be to allow specifying a seed for all those mkstemp/mkdtemp/... functions and pass it all the way down to _RandomNameSequence, which would initialize _Random with it. This way, developers who need to seed it could do so.
msg308095 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2017-12-12 08:51
I suspect this would not be a wise thing to do. Have you seen any precedent for this in other languages?
msg308098 - Author: Julien Palard (mdk) (Python committer) Date: 2017-12-12 09:01
Or is there any issue that would have been easier to fix with reproducible temporary file names?
msg308132 - Author: Yaroslav Halchenko (Yaroslav.Halchenko) Date: 2017-12-12 14:21
I have spent too much time in Python to be able to compare it to other languages ;) but anywhere I have seen an RNG being used, there was a way to seed it or to provide a state. tempfile provides no such API.

My use case: comparing logs from two runs where I need to find the point at which execution diverges. Logs in our case (datalad) contain temporary directory names, so they always "diff", and I need to sift through them or come up with some obscure sed regex to unify them. In other projects of ours I have found it really handy to be able to seed the RNG globally so that two runs follow an identical execution path, which allows for easier reproducibility/comparison. But when it came to those temporary filenames, apparently I could not make it happen and would need to resort to some heavy monkey patching.
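The "sed regex" workaround mentioned above can also be done in Python. The following is only a rough sketch: it assumes the logs contain names produced with tempfile's default `tmp` prefix followed by 8 characters from lowercase letters, digits and `_` (an implementation detail, not a documented guarantee), and the helper name `normalize_temp_names` is made up for illustration. Custom prefixes would need a different pattern.

```python
import re

# Assumption: default tempfile names look like "tmp" + 8 characters drawn
# from lowercase letters, digits and "_". This is an implementation detail.
TMP_NAME = re.compile(r"\btmp[a-z0-9_]{8}\b")


def normalize_temp_names(lines):
    """Replace each distinct temp name with a placeholder numbered by first use."""
    seen = {}

    def replace(match):
        name = match.group(0)
        if name not in seen:
            seen[name] = f"<tmp{len(seen) + 1}>"
        return seen[name]

    return [TMP_NAME.sub(replace, line) for line in lines]
```

Numbering the placeholders by order of first appearance keeps the distinction between different temporary directories, so two normalized logs only differ where the runs themselves diverge.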
msg308458 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2017-12-16 07:55
This seems ill-advised:

* If a deterministic filename is needed, tempfile probably shouldn't be used.

* Mock objects are typically used when alternative behavior is needed for testing.

* The proposed feature would preclude a possible future development path to use SystemRandom which wouldn't be able to accept a seed.

* Having the same filename used on successive runs will likely cause other problems. Generally, a user of tempfile can assume that the filename doesn't already exist, and the proposed option would undermine that assumption.

* I suspect if there were a real need here, we would have heard about it a long time ago.  This module is over two decades old.
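
As an illustration of the mock-object point in the list above, here is a minimal sketch using `unittest.mock`. The function `start_job`, the wrapper `start_job_with_fixed_tempdir` and the path `/scratch/job-fixed` are hypothetical names invented for this example; only `tempfile.mkdtemp` and `mock.patch.object` are real APIs.

```python
import tempfile
from unittest import mock


def start_job():
    """Hypothetical code under test: makes a scratch dir and logs its path."""
    workdir = tempfile.mkdtemp(prefix="job-")
    return f"working in {workdir}"


def start_job_with_fixed_tempdir(fixed_path="/scratch/job-fixed"):
    # Replace tempfile.mkdtemp only for the duration of the call, so the
    # logged path is the same on every run. Nothing is created on disk.
    with mock.patch.object(tempfile, "mkdtemp", return_value=fixed_path) as fake:
        message = start_job()
    fake.assert_called_once_with(prefix="job-")
    return message
```

Because the patch is scoped to the `with` block, the real `mkdtemp` is restored afterwards and other code keeps getting unpredictable names.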
msg308461 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-12-16 08:28
I concur with Raymond.

Monkey patching the tempfile module looks like the right solution to this uncommon problem.
msg308471 - Author: Julien Palard (mdk) (Python committer) Date: 2017-12-16 10:44
I concur with Raymond and Serhiy here. Monkeypatching tempfile should not be that hard: maybe patch `tempfile._Random` with `partial(random.Random, seed)`, if done soon enough?
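
A sketch of what that monkeypatch might look like. It relies on private details of CPython's tempfile module (`_Random`, `_name_sequence` and `_get_candidate_names`), which are assumptions that may not hold in every version; the helper name `seeded_candidate_names` is made up for this example. Resetting `_name_sequence` is one way to deal with the "if done soon enough" caveat, since the module caches its name generator after first use.

```python
import functools
import random
import tempfile
from unittest import mock


def seeded_candidate_names(seed, count=3):
    """Return the first `count` candidate names tempfile would try for `seed`."""
    seeded_random = functools.partial(random.Random, seed)
    # Both attributes are private implementation details of tempfile.
    # Clearing _name_sequence forces a fresh generator that picks up the
    # patched _Random; mock restores both values on exit.
    with mock.patch.object(tempfile, "_Random", seeded_random), \
         mock.patch.object(tempfile, "_name_sequence", None):
        names = tempfile._get_candidate_names()
        return [next(names) for _ in range(count)]
```

This only makes the sequence of candidate names repeatable. Functions such as `mkdtemp` skip candidates that already exist on disk, so the name actually used can still differ between runs.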
msg308557 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-12-18 13:37
IMO it is better to have an API that can be used when, for example, writing tests, than to monkey patch. On the other hand, I've never had an occasion when I cared about the names of tempfiles (or directories) in my test code, and it is hard to imagine a circumstance when being able to reproduce the sequence of tempfile names chosen would matter for debugging...especially since which filenames are actually chosen from the randomly generated sequence can depend on other activity on the system. So I concur with the rejection.

I wouldn't object to some sort of API that allowed one to control the filename generation without worrying that later changes to the module would break one's code, but that isn't actually the use case here, so no one has actually asked for this feature ;)
History
Date                 User                Action  Args
2022-04-11 14:58:55  admin               set     github: 76457
2017-12-18 13:37:06  r.david.murray      set     resolution: fixed -> rejected; messages: + msg308557; nosy: + r.david.murray
2017-12-16 10:44:10  mdk                 set     messages: + msg308471
2017-12-16 08:28:39  serhiy.storchaka    set     status: open -> closed; nosy: + serhiy.storchaka; messages: + msg308461; resolution: fixed; stage: resolved
2017-12-16 07:55:51  rhettinger          set     messages: + msg308458; components: + Library (Lib); versions: + Python 3.7
2017-12-12 14:21:47  Yaroslav.Halchenko  set     messages: + msg308132
2017-12-12 09:01:07  mdk                 set     nosy: + mdk; messages: + msg308098
2017-12-12 08:51:31  rhettinger          set     nosy: + rhettinger; messages: + msg308095
2017-12-11 14:25:27  Yaroslav.Halchenko  create