classification
Title: year 2038 problem in compileall.py
Type: compile error Stage: patch review
Components: Build Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ammar2, bmwiedemann, matrixise, vstinner, xtreak
Priority: normal Keywords: patch

Created on 2018-10-15 11:22 by bmwiedemann, last changed 2020-04-29 17:13 by vstinner.

Pull Requests
URL Status Linked Edit
PR 9892 open matrixise, 2018-10-15 18:40
PR 19651 closed ammar2, 2020-04-22 10:07
PR 19708 open ammar2, 2020-04-25 03:53
Messages (10)
msg327743 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2018-10-15 11:22
To reproduce:
touch -d 2038-01-20 /usr/lib/python3.6/site-packages/six.py
python3 /usr/lib64/python3.6/compileall.py


  File "/usr/lib64/python3.6/compileall.py", line 198, in compile_path
    legacy=legacy, optimize=optimize)
  File "/usr/lib64/python3.6/compileall.py", line 90, in compile_dir
    legacy, optimize):
  File "/usr/lib64/python3.6/compileall.py", line 138, in compile_file
    mtime)
struct.error: 'l' format requires -2147483648 <= number <= 2147483647

It could use either
a 64-bit int (requires a new .pyc format with a different magic number) or
an unsigned 32-bit int (gives us only another 68 years).
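
For illustration, the failure and the two options above can be sketched with struct alone. This is a minimal sketch, not the compileall code: MAGIC is a placeholder value rather than a real magic number, try_pack is a hypothetical helper, and the '<4sLL' and '<4sqq' formats are assumptions standing in for the "unsigned 32-bit" and "64-bit" variants.

```python
import struct

MAGIC = b"\x00\x00\r\n"   # placeholder magic number, for illustration only
MTIME_2038 = 2147558400   # 2038-01-20 00:00:00 UTC, past the signed 32-bit limit


def try_pack(fmt, mtime):
    """Return the packed header, or None if struct rejects the value."""
    try:
        return struct.pack(fmt, MAGIC, 0, mtime)
    except struct.error:
        return None


signed_32 = try_pack("<4sll", MTIME_2038)    # format from the traceback: rejected
unsigned_32 = try_pack("<4sLL", MTIME_2038)  # option: unsigned 32-bit fields
signed_64 = try_pack("<4sqq", MTIME_2038)    # option: 64-bit fields

print(signed_32)          # None, because struct.error was raised
print(len(unsigned_32))   # same header size as before
print(len(signed_64))     # larger header, hence a new .pyc format
```

The unsigned variant keeps the 12-byte layout, while the 64-bit variant changes the size of the header, which is why it would need a new magic number.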
msg327747 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-10-15 13:08
With 3.8a

Traceback (most recent call last):
  File "/home/stephane/src/github.com/python/cpython/Lib/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/stephane/src/github.com/python/cpython/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/stephane/src/github.com/python/cpython/Lib/compileall.py", line 326, in <module>
    exit_status = int(not main())
  File "/home/stephane/src/github.com/python/cpython/Lib/compileall.py", line 303, in main
    if not compile_file(dest, args.ddir, args.force, args.rx,
  File "/home/stephane/src/github.com/python/cpython/Lib/compileall.py", line 142, in compile_file
    expect = struct.pack('<4sll', importlib.util.MAGIC_NUMBER,
struct.error: 'l' format requires -2147483648 <= number <= 2147483647
msg327748 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-10-15 13:09
But by 2038, maybe there will be a new format for .pyc files.

Should we keep this issue open and try to fix it for 3.8 or 3.9?
msg327749 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2018-10-15 13:15
It does not need to be fixed tomorrow, but 2037 is too late, because by then there will be a lot of legacy systems around.
(Un)fortunately many systems live 10+ years now
msg327750 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-15 13:16
Timestamps with year >= 2038 are accepted: importlib._bootstrap_external._code_to_timestamp_pyc() uses (int(x) & 0xFFFFFFFF). This is not a bug; it is by design. compileall should just do the same. Sorry, I don't know if it's specified anywhere, but I know that it's done on purpose.
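
A rough sketch of what "do the same" could look like: compile a file whose mtime is past 2038 with py_compile, then build the expected header with a masked timestamp and compare. The file name example.py and the '<4sLL' format are assumptions made for this illustration; this is not the patch from any of the linked PRs.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "example.py")    # hypothetical file name
    with open(source, "w") as f:
        f.write("x = 1\n")
    os.utime(source, (2147558400, 2147558400))  # 2038-01-20 00:00:00 UTC

    # py_compile goes through importlib, which masks the mtime to 32 bits.
    pyc = py_compile.compile(
        source,
        cfile=os.path.join(tmp, "example.pyc"),
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc, "rb") as f:
        actual = f.read(12)  # magic number, flags, mtime

    # Build the expected header with the same mask instead of a signed 'l'.
    mtime = int(os.stat(source).st_mtime)
    expect = struct.pack("<4sLL", importlib.util.MAGIC_NUMBER, 0,
                         mtime & 0xFFFFFFFF)

headers_match = actual == expect
print(headers_match)
```

With the mask applied, the freshness check no longer raises struct.error, and the expected bytes line up with what the import machinery wrote.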
msg327751 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-10-15 13:18
So we need to fix compileall.py.

Maybe we could add the 'easy' label to this issue.
msg327753 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-15 13:29
A reproducer in Python that can be added to test_compileall if needed:

def test_compile_all_2038(self):
    # Set atime and mtime to 2038-01-20 00:00:00 UTC, as in the touch command above.
    os.utime(self.source_path, (2147558400, 2147558400))
    self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))


./python.exe -m unittest -v test.test_compileall.CompileallTestsWithSourceEpoch.test_compile_all_2038
test_compile_all_2038 (test.test_compileall.CompileallTestsWithSourceEpoch) ... ERROR

======================================================================
ERROR: test_compile_all_2038 (test.test_compileall.CompileallTestsWithSourceEpoch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/test/test_py_compile.py", line 30, in wrapper
    return fxn(*args, **kwargs)
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/test/test_compileall.py", line 114, in test_compile_all_2038
    self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/compileall.py", line 142, in compile_file
    expect = struct.pack('<4sll', importlib.util.MAGIC_NUMBER,
struct.error: 'l' format requires -2147483648 <= number <= 2147483647

----------------------------------------------------------------------


Thanks
msg327774 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-15 19:16
Victor, it seems there was some discussion about the 2038 problem in the original PR, but I don't know if it's related to this. Reference: https://github.com/python/cpython/pull/4575#discussion_r153376173

Thanks
msg360270 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2020-01-19 20:27
ping.
Another 19th of January passed.

I'd still like to see progress on this, because this hinders my other y2038 bug discovery work.
msg367679 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-29 17:13
I would prefer to mimic importlib._bootstrap_external, which uses:

def _pack_uint32(x):
    """Convert a 32-bit integer to little-endian."""
    return (int(x) & 0xFFFFFFFF).to_bytes(4, 'little')
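
To see how this helper behaves outside the signed 32-bit range, here is a small sketch using a local copy of it (pack_uint32 below is that copy, not an import of the private function). The sample timestamps are arbitrary values chosen for illustration.

```python
import struct


def pack_uint32(x):
    """Local copy of the helper quoted above: mask to 32 bits, little-endian."""
    return (int(x) & 0xFFFFFFFF).to_bytes(4, "little")


before_epoch = -86400      # 1969-12-31 00:00:00 UTC (negative timestamp)
after_2038 = 2147558400    # 2038-01-20 00:00:00 UTC
after_2106 = 2**32 + 5     # beyond the unsigned 32-bit range

# Every value packs to exactly 4 bytes; out-of-range values wrap around.
for ts in (before_epoch, after_2038, after_2106):
    print(ts, pack_uint32(ts).hex())

# The unsigned struct format rejects the negative value that masking accepts.
try:
    struct.pack("<L", before_epoch)
    unsigned_accepts_negative = True
except struct.error:
    unsigned_accepts_negative = False
print(unsigned_accepts_negative)
```

Masking never raises, at the cost of wrapping: after_2106 packs to the same bytes as the timestamp 5.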

Using a 64-bit timestamp (PR 19651) or treating the timestamp as unsigned (PR 9892 and PR 19708) both have drawbacks:

* a 64-bit timestamp makes .pyc files larger
* an unsigned timestamp no longer supports timestamps before 1970, which can cause practical issues

"& 0xFFFFFFFF" looks dead simple, uses a fixed size of 4 bytes and doesn't have any limitation of year 2038.

The timestamp doesn't have to be exact. In practice, it is very unlikely that two different timestamps compare equal under (ts1 & 0xFFFFFFFF) == (ts2 & 0xFFFFFFFF): I expect file modification times to differ by a few days, not by 2**32 seconds (136 years).
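
The arithmetic behind that estimate can be checked quickly. This is only a sketch: same_stored_mtime is a hypothetical name for the masked comparison, and the year length uses the 365.25-day average.

```python
MASK = 0xFFFFFFFF
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year length, an approximation


def same_stored_mtime(ts1, ts2):
    """Compare two timestamps the way a 32-bit masked .pyc field would."""
    return (ts1 & MASK) == (ts2 & MASK)


base = 2147558400                       # 2038-01-20 00:00:00 UTC
few_days_later = base + 3 * 86400       # typical gap between two edits
one_wrap_later = base + 2**32           # exactly one wrap-around later

wrap_years = 2**32 / SECONDS_PER_YEAR

print(same_stored_mtime(base, few_days_later))  # different stored values
print(same_stored_mtime(base, one_wrap_later))  # collision after a full wrap
print(round(wrap_years, 1))                     # length of one wrap in years
```

A false "up to date" result therefore needs two modification times that differ by an exact multiple of 2**32 seconds.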

Use hash-based .pyc files to avoid any issues with file modification times: this should make Python more deterministic (more "reproducible").
https://docs.python.org/dev/reference/import.html#pyc-invalidation
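
As a rough illustration of the hash-based route, the sketch below compiles a file with py_compile in CHECKED_HASH mode and inspects the flags field of the resulting header. The file name example.py is an assumption for the example; the mtime is set past 2038 only to show that it plays no role here.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "example.py")    # hypothetical file name
    with open(source, "w") as f:
        f.write("x = 1\n")
    os.utime(source, (2147558400, 2147558400))  # 2038-01-20 00:00:00 UTC

    # Hash-based .pyc files store a hash of the source instead of its mtime.
    pyc = py_compile.compile(
        source,
        cfile=os.path.join(tmp, "example.pyc"),
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc, "rb") as f:
        header = f.read(16)  # magic number, flags, 8-byte source hash

flags = int.from_bytes(header[4:8], "little")
is_hash_based = bool(flags & 0b01)  # lowest bit: hash-based .pyc
check_source = bool(flags & 0b10)   # second bit: validate against the source

print(is_hash_based, check_source)
```

Because no timestamp is packed at all, the modification time of the source file cannot overflow anything in this mode.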
History
Date User Action Args
2020-04-29 17:13:16 vstinner set messages: + msg367679
2020-04-25 03:53:51 ammar2 set pull_requests: + pull_request19029
2020-04-22 10:07:55 ammar2 set nosy: + ammar2; pull_requests: + pull_request18977
2020-01-19 20:27:16 bmwiedemann set messages: + msg360270; versions: + Python 3.5, Python 3.8, Python 3.9
2018-10-15 19:16:38 xtreak set messages: + msg327774
2018-10-15 18:40:08 matrixise set keywords: + patch; stage: patch review; pull_requests: + pull_request9256
2018-10-15 13:29:37 xtreak set messages: + msg327753
2018-10-15 13:18:02 matrixise set messages: + msg327751
2018-10-15 13:16:45 vstinner set nosy: + vstinner; messages: + msg327750
2018-10-15 13:15:46 bmwiedemann set messages: + msg327749
2018-10-15 13:09:21 matrixise set messages: + msg327748
2018-10-15 13:08:30 matrixise set nosy: + matrixise; messages: + msg327747
2018-10-15 12:50:29 xtreak set nosy: + xtreak
2018-10-15 11:22:16 bmwiedemann create