Author vstinner
Recipients FFY00, brett.cannon, christian.heimes, frenzy, hroncok, vstinner
Date 2020-05-11.15:07:23
Message-id <1589209644.49.0.0886233182745.issue40495@roundup.psfhosted.org>
In-reply-to
Content
Is it possible that the content of the optimization level 0 PYC file is modified when the PY file content changes, which would make the PYC files of optimization levels 1 and 2 inconsistent?

Christian Heimes:
> Python's import system is fully compatible with this approach. importlib never directly writes to a .pyc file. Instead it always creates a new temporary file next to the .pyc file and then overrides the .pyc file with an atomic file system operation. See _write_atomic() in Lib/importlib/_bootstrap_external.py.

It seems like importlib doesn't have the issue because it never opens an existing PYC file to write into it: instead, _write_atomic() creates a *new* temporary file and then calls os.replace() to rename the temporary file to the final PYC name.
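A minimal sketch of the same pattern, assuming the optimization level 0, 1 and 2 PYC files are hard links to one inode (the situation the question above seems to be about). write_atomic() is a hypothetical helper modeled on _write_atomic(), not its exact code: since os.replace() points the name at a brand-new inode, the other hard links keep the old content.

```python
import os


def write_atomic(path: str, data: bytes, mode: int = 0o666) -> None:
    """Hypothetical helper: write data to path via a temp file + os.replace()."""
    # The temporary file lives next to the target, so os.replace() stays on
    # one file system (a rename, not a copy).
    path_tmp = f"{path}.{id(path)}"
    # O_EXCL: fail instead of reusing a leftover temporary file.
    fd = os.open(path_tmp, os.O_EXCL | os.O_CREAT | os.O_WRONLY, mode & 0o666)
    try:
        with open(fd, "wb") as file:
            file.write(data)
        # Rebind the name to the new inode; other hard links to the old
        # inode (and already-open file descriptors) still see the old content.
        os.replace(path_tmp, path)
    except OSError:
        try:
            os.unlink(path_tmp)
        except OSError:
            pass
        raise
```

By contrast, open(path, "wb") followed by a write would modify the shared inode, so every hard link would see the new content.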

Alright, I think that I understood :-)

--

PYC files became more complicated with PEP 552. Here are my own notes to try to understand how they are supposed to be used.


Since Python 3.7 (PEP 552), Python has an _imp.check_hash_based_pycs string which can be overridden by the --check-hash-based-pycs command line option. It can have 3 values:
* "always"
* "never"
* "default"

These values are defined by the PEP 552:

* "never" causes the interpreter to always assume hash-based pycs are valid
* "default" means the check_source flag in hash-based pycs determines invalidation
* "always" causes the interpreter to hash the source file for invalidation regardless of the value of the check_source bit
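The effective value can be observed from a child interpreter. This sketch assumes sys.executable is CPython 3.7 or newer; _imp is a private module, so reading it is only for experimentation. effective_mode() is a hypothetical helper.

```python
import subprocess
import sys

CODE = "import _imp; print(_imp.check_hash_based_pycs)"


def effective_mode(option=None):
    """Return _imp.check_hash_based_pycs as seen by a child interpreter."""
    args = [sys.executable]
    if option is not None:
        args += ["--check-hash-based-pycs", option]
    args += ["-c", CODE]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


for option in (None, "default", "always", "never"):
    print(option, "->", effective_mode(option))
```

Without the option, the value is "default".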

When a hash-based PYC file is created, it gets a "check_source" bit:

* Bit set: If the check_source flag is set, Python will determine the validity of the pyc by hashing the source file and comparing the hash with the expected hash in the pyc. If the pyc needs to be regenerated, it will be regenerated as a hash-based pyc again with the check_source flag set.
* Bit unset: Python will simply load the pyc without checking the hash of the source file. The expectation in this case is that some external system (e.g., the local Linux distribution’s package manager) is responsible for keeping pycs up to date, so Python itself doesn’t have to check.
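Combining the two lists, whether the source file is hashed depends on both the option and the bit. must_hash_source() is a hypothetical restatement of that rule for a hash-based pyc (the same condition appears in SourceLoader.get_code()); the loop prints the whole truth table.

```python
def must_hash_source(check_source: bool, mode: str) -> bool:
    """Hypothetical helper: must the source be hashed to validate this pyc?

    mode is the --check-hash-based-pycs value: "default", "always" or "never".
    """
    if mode == "never":
        return False          # trust every hash-based pyc
    if mode == "always":
        return True           # hash the source whatever the bit says
    if mode == "default":
        return check_source   # let the check_source bit decide
    raise ValueError(f"invalid mode: {mode!r}")


for mode in ("default", "always", "never"):
    for bit in (True, False):
        print(f"{mode:8} check_source={bit!s:5} -> hash source: "
              f"{must_hash_source(bit, mode)}")
```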

I mostly copied/pasted from PEP 552 :-)

py_compile and compileall have a new invalidation_mode parameter which can take 3 values, the members of py_compile.PycInvalidationMode:

import enum

class PycInvalidationMode(enum.Enum):
    TIMESTAMP = 1
    CHECKED_HASH = 2
    UNCHECKED_HASH = 3
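To see what each mode writes, one can compile the same file three times and decode the 16-byte header: the magic number (4 bytes), a little-endian 32-bit flags word (bit 0 = hash-based, bit 1 = check_source), then either the 8-byte source hash or the source mtime and size. pyc_flags() and the file names are hypothetical, chosen for illustration.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile
from py_compile import PycInvalidationMode


def pyc_flags(pyc_path: str) -> int:
    """Return the 32-bit flags word stored right after the magic number."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc written by another Python version")
    (flags,) = struct.unpack("<I", header[4:8])
    return flags


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "example.py")
    with open(source, "w") as f:
        f.write("x = 1\n")
    for mode in PycInvalidationMode:
        pyc = py_compile.compile(
            source,
            cfile=os.path.join(tmp, f"example.{mode.name}.pyc"),
            doraise=True,
            invalidation_mode=mode,
        )
        print(mode.name, bin(pyc_flags(pyc)))
```

TIMESTAMP writes flags 0, CHECKED_HASH sets both bits, and UNCHECKED_HASH sets only the hash-based bit.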

The default is computed in py_compile by:

def _get_default_invalidation_mode():
    if os.environ.get('SOURCE_DATE_EPOCH'):
        return PycInvalidationMode.CHECKED_HASH
    else:
        return PycInvalidationMode.TIMESTAMP
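A quick way to check the effect of SOURCE_DATE_EPOCH: with the variable set and no explicit invalidation_mode, py_compile writes a checked hash-based pyc (flags 0b11). default_flags() is a hypothetical helper; it restores the environment variable afterwards.

```python
import os
import py_compile
import struct
import tempfile


def default_flags(source_date_epoch=None) -> int:
    """Compile a tiny file without invalidation_mode; return its header flags."""
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    try:
        if source_date_epoch is not None:
            os.environ["SOURCE_DATE_EPOCH"] = source_date_epoch
        with tempfile.TemporaryDirectory() as tmp:
            source = os.path.join(tmp, "example.py")
            with open(source, "w") as f:
                f.write("x = 1\n")
            pyc = py_compile.compile(source, cfile=source + "c", doraise=True)
            with open(pyc, "rb") as f:
                (flags,) = struct.unpack("<I", f.read(8)[4:8])
            return flags
    finally:
        # Restore whatever SOURCE_DATE_EPOCH was before the call.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved


print(bin(default_flags()))              # 0b0 (TIMESTAMP)
print(bin(default_flags("1589209643")))  # 0b11 (CHECKED_HASH)
```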

importlib: SourceLoader.get_code(fullname) uses:

    flags = _classify_pyc(data, fullname, exc_details)
    bytes_data = memoryview(data)[16:]
    hash_based = flags & 0b1 != 0
    if hash_based:
        check_source = flags & 0b10 != 0
        if (_imp.check_hash_based_pycs != 'never' and
            (check_source or
             _imp.check_hash_based_pycs == 'always')):
            source_bytes = self.get_data(source_path)
            source_hash = _imp.source_hash(
                _RAW_MAGIC_NUMBER,
                source_bytes,
            )
            _validate_hash_pyc(data, source_hash, fullname,
                               exc_details)
    else:
        _validate_timestamp_pyc(
            data,
            source_mtime,
            st['size'],
            fullname,
            exc_details,
        )
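The same decision can be reproduced outside importlib with public APIs. is_pyc_fresh() is a hypothetical helper, not an importlib function: it uses importlib.util.source_hash() (the public wrapper around _imp.source_hash()), takes the --check-hash-based-pycs value as a parameter instead of reading _imp, and returns a boolean where get_code() would raise ImportError or recompile.

```python
import importlib.util
import os
import struct


def is_pyc_fresh(pyc_path: str, source_path: str, mode: str = "default") -> bool:
    """Hypothetical helper mirroring the checks in SourceLoader.get_code()."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False  # truncated file or bad magic number: must recompile
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & ~0b11:
        return False  # unknown flag bits are rejected by importlib too
    if flags & 0b1:  # hash-based pyc
        check_source = (flags & 0b10) != 0
        if mode != "never" and (check_source or mode == "always"):
            with open(source_path, "rb") as f:
                return importlib.util.source_hash(f.read()) == header[8:16]
        return True  # hash not checked: trust the pyc
    # Timestamp-based pyc: compare source mtime and size, both on 32 bits.
    st = os.stat(source_path)
    mtime, size = struct.unpack("<II", header[8:16])
    return (mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and size == (st.st_size & 0xFFFFFFFF))
```

After the source changes, a CHECKED_HASH pyc becomes stale, while an UNCHECKED_HASH pyc is still accepted unless mode is "always".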
History
Date User Action Args
2020-05-11 15:07:24 vstinner set recipients: + vstinner, brett.cannon, christian.heimes, hroncok, frenzy, FFY00
2020-05-11 15:07:24 vstinner set messageid: <1589209644.49.0.0886233182745.issue40495@roundup.psfhosted.org>
2020-05-11 15:07:24 vstinner link issue40495 messages
2020-05-11 15:07:23 vstinner create