classification
Title: Expose _PyCoreConfig structure to Python
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: barry Nosy List: barry, christian.heimes, emilyemorehouse, eric.snow, ncoghlan, vstinner
Priority: normal Keywords: patch

Created on 2018-06-20 20:06 by barry, last changed 2019-05-27 23:40 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 7867 closed barry, 2018-06-23 00:35
Messages (19)
msg320107 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-20 20:06
The _PyCoreConfig structure in pystate.h has some interesting fields that I don't think are exposed anywhere else to Python-land.  I was particularly interested recently in hash_seed and use_hash_seed.   I'm thinking that it may be useful to expose this structure in the sys module.
msg320108 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-06-20 20:28
hash_seed and use_hash_seed could be added to sys.hash_info. This would be the first place I'd look for the information. After all I implemented it. :)
msg320109 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-20 20:30
On Jun 20, 2018, at 13:28, Christian Heimes <report@bugs.python.org> wrote:
> 
> Christian Heimes <lists@cheimes.de> added the comment:
> 
> hash_seed and use_hash_seed could be added to sys.hash_info. This would be the first place I'd look for the information. After all I implemented it. :)

That was the first place I looked too :)
msg320111 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-20 20:53
Is is still a secret seed if it's public? :)
msg320198 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-21 23:39
I think the basic implementation problem is that by the time you get to get_hash_info() in sysmodule.c, you no longer have access to the _PyCoreConfig object, nor the _PyMain object that it's generally attached to.
msg320211 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-22 07:04
> I think the basic implementation problem is that by the time you get to get_hash_info() in sysmodule.c, you no longer have access to the _PyCoreConfig object, nor the _PyMain object that it's generally attached to.

An interpreter now keeps a copy of _PyCoreConfig and _PyMainInterpreterConfig.

See for example make_flags() in sysmodule.c:

_PyCoreConfig *core_config = &_PyGILState_GetInterpreterStateUnsafe()->core_config;
...
PyStructSequence_SET_ITEM(seq, pos++, PyBool_FromLong(core_config->dev_mode));

The interpreter really owns the copy of these configs and they are kept until the interpreter object is destroyed.

Another example:

static PyObject *
import_find_and_load(PyObject *abs_name)
{
    ...
    int import_time = interp->core_config.import_time;
    ...
}
msg320250 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-22 18:13
Thanks for the hint!  I had a feeling there had to be an API to get at it, but I couldn’t find it.  Maybe we should start documenting the Python Secret Underscore API? :)

On Jun 22, 2018, at 00:04, STINNER Victor <report@bugs.python.org> wrote:
> 
> _PyCoreConfig *core_config = &_PyGILState_GetInterpreterStateUnsafe()->core_config;
> ...
> PyStructSequence_SET_ITEM(seq, pos++, PyBool_FromLong(core_config->dev_mode));
> 
> The interpreter really owns the copy of these configs and they are kept until the interpreter object is destroyed.
msg320265 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-22 20:53
I think there's another thing I'd like to change, and it seems like it's "just" an implementation detail.  In _Py_HashRandomization_Init(), if use_hash_seed is 0, then we directly inject the random bits into the buffer, and then there's no hash_seed.  I'd like to change that so that if use_hash_seed is 0, then we create a random hash seed first, and then call lcg_urandom() for the hash secret.  That way, even if Python itself uses a random hash seed, we'll have a record of that in the runtime that can be used to reproduce the hashing.  In this case, I'd still leave use_hash_seed == 0, and that would tell you what combinations of env vars were used.
msg320267 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-22 20:54
Nick plans to finish his PEP 432 for Python 3.8 and make the API public.
See with him? The PEP should document these structures but I was ahead and
made changes which were not scheduled and the PEP is now outdated.
msg320269 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-22 20:56
Nosying Nick.  I agree there's some overlap with Python startup restructuring, but it feels kind of orthogonal too.  I really am only exposing (some elements) of that structure to Python.

What might be interesting though would be if we want to expose the entire structure and not just the hash seeds, as I'm leaning toward here (given that we already have sys.hash_info).
msg320270 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-22 20:57
Barry: generating a 32 bit seed gives less entropy and so makes Python
easier to crash. If you need reproducible Python: generate a seed and set
the env var before starting Python. Tox does that. Regrtest should do that.
msg320285 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-22 22:20
We could make the hash_seed 64 bits.
msg320289 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-22 22:30
Although I guess that would require modifications to lcg_urandom().  I don't feel qualified to change that function.
msg320299 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-23 08:32
> We could make the hash_seed 64 bits.

On my 64-bit Linux, _Py_HashSecret_t takes 24 bytes (192 bits).
msg320345 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-24 03:03
I'm thoroughly open to co-author requests for PEP 432 - the "Let's implement it as a private API for Python 3.7 and see what we learn from the experience" plan worked beautifully, but it *also* means the PEP text is now woefully out of date with reality :)

The pieces that are missing are:

- bring it up to date with what we actually did for 3.7
- decide which of those pieces we want to make public as-is, and which we want to tweak before making them generally available (e.g. does the "ConfigureMainInterpreter" naming still make sense? Or should we go back to the earlier "BeginInitialization" and "EndInitialization" pair?)
- now that we store this state in a more coherent way, what do we want to make public at the Python layer, and where should we make it public to avoid causing too many problems for other implementations? 

However, while I'd definitely be able to make time to review a PR to the PEP, I can't make any promises as to when I'd be able to sit down and actually draft that update myself.
msg320347 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-24 03:21
Back on the original hash seed topic:

1. The exact size of the seed ranges from 128 bits (SIPHash) to 32-bits depending on exactly which hash algorithm you're talking about (https://www.python.org/dev/peps/pep-0456/#hash-secret)

2. While PEP 456 doesn't state it explicitly, my recollection is that omitting the exact hash seed value from the Python level API was a deliberate decision, since one of the *purposes* of PEP 456 was to protect against seed recovery attacks like https://131002.net/siphash/poc.py. Being able to read the seed directly from the sys modules would rather simplify the task of seed recovery :)

Only exposing a `forced_hash_seed` (and hiding randomly generated ones as `forced_hash_seed=None`) seems reasonable though, since those can already be read from os.environ anyway.
msg320438 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-25 20:25
On Jun 23, 2018, at 20:21, Nick Coghlan <report@bugs.python.org> wrote:
> 
> Only exposing a `forced_hash_seed` (and hiding randomly generated ones as `forced_hash_seed=None`) seems reasonable though, since those can already be read from os.environ anyway.

Only mirroring $PYTHONHASHSEED probably makes the whole ask less useful.  Maybe I should abandon the PR, although it may still make sense to export the full _PyCoreConfig structure.
msg342530 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-15 02:07
Update. I implemented _testinternalcapi.get_configs() which exports *all* Python configuration used to initialize Python. It contains the hash seed for example. The function is only written for tests.

Moreover, I proposed the PEP 587 to expose the new _PyCoreConfig as a public C API.
msg343706 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-27 23:40
So far, there is no clear agreement to expose C PyConfig structure in Python, so I close the issue.

My PEP 587 has been accepted. I chose to not expose PyConfig in Python in the PEP. But I'm open to revisit this idea later, especially to move towards PEP 432: implement multi-phase initialization (only partially supported in my PEP 587).

But I would prefer to a different rationale than exposing hash_seed. For hash_seed alone, I don't think that it's worth it. Moreover, Christian wrote:

> hash_seed and use_hash_seed could be added to sys.hash_info. This would be the first place I'd look for the information. After all I implemented it. :)
History
Date User Action Args
2019-05-27 23:40:30vstinnersetstatus: open -> closed
resolution: rejected
messages: + msg343706

stage: patch review -> resolved
2019-05-15 02:07:27vstinnersetmessages: + msg342530
2018-06-25 20:25:45barrysetmessages: + msg320438
2018-06-24 03:21:08ncoghlansetmessages: + msg320347
2018-06-24 03:03:58ncoghlansetmessages: + msg320345
2018-06-23 08:32:15vstinnersetmessages: + msg320299
2018-06-23 00:35:33barrysetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request7475
2018-06-22 23:50:29eric.snowsetnosy: + eric.snow, emilyemorehouse
2018-06-22 22:30:52barrysetmessages: + msg320289
2018-06-22 22:20:26barrysetmessages: + msg320285
2018-06-22 20:57:38vstinnersetmessages: + msg320270
2018-06-22 20:56:31barrysetnosy: + ncoghlan
messages: + msg320269
2018-06-22 20:54:22vstinnersetmessages: + msg320267
2018-06-22 20:53:09barrysetmessages: + msg320265
2018-06-22 18:13:28barrysetmessages: + msg320250
2018-06-22 07:04:41vstinnersetmessages: + msg320211
2018-06-21 23:39:36barrysetmessages: + msg320198
2018-06-20 20:53:35vstinnersetnosy: + vstinner
messages: + msg320111
2018-06-20 20:30:22barrysetmessages: + msg320109
2018-06-20 20:28:55christian.heimessetnosy: + christian.heimes
messages: + msg320108
2018-06-20 20:06:15barrycreate