classification
Title: Add `-r`, as opposed to `-R` to Python core interpreter
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, lepaperwan
Priority: normal Keywords:

Created on 2018-09-14 17:42 by lepaperwan, last changed 2018-12-20 21:29 by lepaperwan. This issue is now closed.

Messages (9)
msg325371 - (view) Author: Erwan Le Pape (lepaperwan) * Date: 2018-09-14 17:42
I'm attempting to leverage PEP 552 to make the core interpreter build process more deterministic. However, the standard Python Makefile uses `python -E` when calling compileall (and all other setup scripts), which forces hash randomization since it can only be turned off through environment variables (which in turn leads to nondeterministic behaviour as noted in PEP 552 [1],[2]).

Also, is adding a flag that disables randomization something that would be acceptable? Or are options to the core interpreter to be kept to a minimum and this does not represent a large enough use-case?

This would basically hold in Modules/main.c in something as short as:
+ case 'r':
+     config->use_hash_seed = 1;
+     config->hash_seed = 0;
+     break;
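
To make the interaction between `-E` and `PYTHONHASHSEED` concrete, here is a minimal sketch (not part of the proposed patch) that starts child interpreters and reports `sys.flags.hash_randomization` as they see it. The helper name `hash_randomization_flag` is made up for this illustration, and the sketch assumes `sys.executable` points at a usable interpreter.

```python
import os
import subprocess
import sys


def hash_randomization_flag(extra_args, hash_seed):
    """Return sys.flags.hash_randomization as reported by a child interpreter.

    extra_args: interpreter options such as ["-E"] (hypothetical helper API).
    hash_seed:  value exported as PYTHONHASHSEED for the child process.
    """
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c",
         "import sys; print(sys.flags.hash_randomization)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Without -E the variable is honoured and randomization is switched off.
    print("PYTHONHASHSEED=0        ->", hash_randomization_flag([], "0"))
    # With -E all PYTHON* variables are ignored, so randomization stays on.
    print("PYTHONHASHSEED=0 and -E ->", hash_randomization_flag(["-E"], "0"))
```

This is the gap the proposed `-r` option is meant to close: a way to fix the hash seed that still works when the environment is being ignored.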
msg325381 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-09-14 18:25
It would be preferable to fix build outputs to be deterministic even under randomization in the interpreter.
msg325387 - (view) Author: Erwan Le Pape (lepaperwan) * Date: 2018-09-14 19:30
How would you suggest going about doing that?

Without the proposed option, the alternative is to patch configure.ac as follows, which leaves the build process vulnerable to being broken by environment variables:
-PYTHON_FOR_BUILD='./$(BUILDPYTHON) -E'
+PYTHON_FOR_BUILD='PYTHONHASHSEED=0 ./$(BUILDPYTHON) -E'

Otherwise all environment variables affecting the Python interpreter would need to be cleared in addition to setting PYTHONHASHSEED=0.
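
As a rough illustration of why pinning the seed matters, the sketch below prints the iteration order of a small set of strings from child interpreters started with a given `PYTHONHASHSEED`. The helper `set_order` and the sample strings are made up for this example. The children are started without `-E` so that the variable is actually read.

```python
import os
import subprocess
import sys

# Code run in the child: iteration order of a set of strings depends on
# the string hashes, and therefore on the hash seed.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order(hash_seed):
    """Return the printed set order seen by a child run with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The same seed always yields the same order; different seeds may not.
    for seed in (0, 0, 1, 2):
        print(seed, set_order(seed))
```

Two runs with the same seed agree, which is the property a reproducible build relies on; runs with different (or random) seeds are free to disagree.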

Without these `hacks`, making build outputs deterministic means fixing marshal to essentially sort elements when dumping unordered objects. Would you rather see a patch going in that direction?
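
The idea of sorting elements before dumping can be sketched in pure Python as follows. This is only an illustration of the approach, not marshal's actual code: the function name `canonical_elements` and the sort key (type name, then `repr`) are assumptions chosen for simplicity, and the key is only hash-independent for elements whose `repr` is itself stable (strings, numbers, bytes), not for nested sets.

```python
def canonical_elements(unordered):
    """Return the elements of a set or frozenset in a hash-independent order.

    Sorting by (type name, repr) avoids comparing values of different types
    directly, which would raise TypeError for e.g. int versus str.
    """
    return sorted(unordered, key=lambda item: (type(item).__name__, repr(item)))


if __name__ == "__main__":
    # Regardless of the hash seed, the elements come out in the same order,
    # so a serializer that writes them in this order is reproducible.
    print(canonical_elements(frozenset(["b", "a", "c"])))
    print(canonical_elements({3, "a", 1}))
```

The cost is an extra sort for every unordered container that is written, which is where the later performance concern comes from.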
msg325413 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-09-14 23:08
On Fri, Sep 14, 2018, at 12:30, Erwan Le Pape wrote:
> Without these `hacks`, making build outputs deterministic means 
> fixing marshal to essentially sort elements when dumping unordered 
> objects. Would you rather see a patch going in that direction?

Yes.
msg325424 - (view) Author: Erwan Le Pape (lepaperwan) * Date: 2018-09-15 06:59
Great! My only concern with that is marshalling of untrusted data at runtime (I know, you shouldn't do that) can become a much more expensive operation.

Is there any internal use of marshal beyond .pycs used at runtime by the core interpreter that might be affected by such a change?

If not, it seems (to me) an acceptable modification of marshal and I'll submit a PR for it.
msg325450 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-09-15 17:24
On Fri, Sep 14, 2018, at 23:59, Erwan Le Pape wrote:
> 
> Erwan Le Pape <lepaperwan3@gmail.com> added the comment:
> 
> Great! My only concern with that is marshalling of untrusted data at 
> runtime (I know, you shouldn't do that) can become a much more expensive 
> operation.
> 
> Is there any internal use of marshal beyond .pycs used at runtime by the 
> core interpreter that might be affected by such a change?

Writing pycs is the only supported use of marshal.

> 
> If not, it seems (to me) an acceptable modification of marshal and I'll 
> submit a PR for it.

What exactly are you proposing?
msg325570 - (view) Author: Erwan Le Pape (lepaperwan) * Date: 2018-09-17 19:42
Given that marshal basically only dumps code objects, the only viable solution I can see is adding a flag that can be passed all the way to the AST from `Python/bltinmodule.c:builtin_compile_impl` and that would sort the elements of unordered types when creating code objects.

This could be enabled automatically when compiling a file, if we assume imported files are trustworthy, or exposed as a flag in the `Lib/compileall.py` module.

I only fear that this might break compiled code objects that make use of unordered types. I haven't sifted through the AST internals and the implications of such a change yet, so this is largely up for debate.
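
For context on where unordered types end up inside code objects, the sketch below compiles a membership test against a set literal and lists the `frozenset` constants stored in `co_consts`. The helper name `frozenset_constants` is made up for this example, and the behaviour shown (folding a constant set literal into a `frozenset` constant) is a CPython optimization rather than a language guarantee.

```python
def frozenset_constants(source):
    """Compile an expression and return the frozenset constants it contains."""
    code = compile(source, "<example>", "eval")
    return [const for const in code.co_consts if isinstance(const, frozenset)]


if __name__ == "__main__":
    # CPython folds the set literal into a frozenset constant; that constant
    # is what gets written into the .pyc file along with the code object.
    print(frozenset_constants("x in {'a', 'b', 'c'}"))
```

Constants like this one are the objects whose element order would have to be fixed for the resulting .pyc to be byte-for-byte reproducible.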
msg325839 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-09-20 05:16
See #34722.
msg332263 - (view) Author: Erwan Le Pape (lepaperwan) * Date: 2018-12-20 21:29
As mentioned by Benjamin Peterson, this is a duplicate of #34722, which already has an implementation proposal by Peter Ebden.

While implementation specifics change, the aim is the same and #34722 is the better solution (again, credit to Benjamin Peterson).
History
Date                 User               Action  Args
2018-12-20 21:29:51  lepaperwan         set     status: open -> closed; resolution: duplicate; messages: + msg332263; stage: resolved
2018-09-22 04:11:47  terry.reedy        set     versions: - Python 3.7
2018-09-20 05:16:25  benjamin.peterson  set     messages: + msg325839
2018-09-17 19:42:00  lepaperwan         set     messages: + msg325570
2018-09-15 17:24:30  benjamin.peterson  set     messages: + msg325450
2018-09-15 06:59:55  lepaperwan         set     messages: + msg325424
2018-09-14 23:08:57  benjamin.peterson  set     messages: + msg325413
2018-09-14 19:30:40  lepaperwan         set     messages: + msg325387
2018-09-14 18:25:38  benjamin.peterson  set     nosy: + benjamin.peterson; messages: + msg325381
2018-09-14 17:42:00  lepaperwan         create