classification
Title: Use `statx(2)` system call on Linux for extended `os.stat` information
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, ntninja, scrool, slow franklin, vstinner
Priority: normal Keywords: patch

Created on 2020-02-03 04:26 by ntninja, last changed 2020-05-17 19:40 by christian.heimes.

Pull Requests
URL Status Linked Edit
PR 19125 open ntninja, 2020-03-23 19:43
Messages (4)
msg361265 - (view) Author: (ntninja) * Date: 2020-02-03 04:26
Background: For a long time, several Linux filesystems have tracked two extra pieces of information: the file attribute bits[1] and the file creation time (aka crtime, btime, or birthtime)[2]. Before Linux 4.11, accessing these required secret knowledge (ioctl numbers) or access to unstable interfaces (debugfs). Since that version, however, the statx(2) system call[3] has finally been available (it has a long history); it exposes these two fields and adds (struct) space for potentially more.

Since CPython already exposes `st_birthtime` on FreeBSD and friends, I think it would be fair to also expose this field on Linux. As the timestamp value is only available on some file systems and configurations, it is not guaranteed that the system call will return a value for btime at all. I suppose the field should be set to `None` in that case. In my opinion, it should also become a regular field (available on all platforms) since, with this addition, we now have a suitable value to return on every major platform CPython targets: `stx_btime` on Linux, `st_birthtime` on macOS/FreeBSD, and `st_ctime` on Windows.
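
To illustrate the proposed semantics, here is a minimal sketch built on what `os.stat` returns today. The helper name `birthtime_or_none` and its `platform` parameter are hypothetical, not part of CPython or the linked PR; the sketch only mirrors the mapping described above and returns `None` where no creation time is exposed.

```python
import os
import sys


def birthtime_or_none(st, platform=sys.platform):
    """Return a creation time from a stat result, or None if unavailable.

    Hypothetical helper: it mirrors the proposed mapping, not an existing API.
    """
    # macOS/FreeBSD (and any platform whose stat result has the field).
    birthtime = getattr(st, "st_birthtime", None)
    if birthtime is not None:
        return birthtime
    # On Windows, st_ctime has traditionally held the creation time.
    if platform == "win32":
        return st.st_ctime
    # No birth time exposed (as on Linux at the time of this issue).
    return None


if __name__ == "__main__":
    print(birthtime_or_none(os.stat(".")))
```

Returning `None` rather than omitting the attribute lets callers write one code path for every platform, which is the point of making it a regular field.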

`stx_attributes` could be exposed as a new `st_attributes` field specific to Linux, as there is no equivalent on other platforms to my knowledge (Windows' `st_file_attributes` is similar in some aspects but has a completely different format and content).

I wrote a Python script that calls statx(2) using ctypes: https://github.com/ipfs/py-datastore/blob/e566d40a8ca81d8628147e255fe7830b5f928a43/datastore/filesystem/util/statx.py
It may be useful as a reference when implementing this in C.

  [1]: https://man.cx/chattr(1)
  [2]: https://unix.stackexchange.com/a/50184/47938
  [3]: https://man.cx/statx(2)
msg369128 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-05-17 15:29
The statx call was introduced in kernel 4.11 in 2017. Major LTS Linux distributions like Debian 9, Ubuntu 16.04, and CentOS 7 use older kernels like Linux 4.9 LTS or 3.10 LTS.

In general, we try to support older kernel ABIs even when Python is compiled on a system with a more recent ABI. This means you have to perform runtime feature detection and fall back to the old stat call when the syscall fails.
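
A minimal sketch of that pattern, written in Python for brevity rather than the C the patch would need. `StatxProbe` and the `statx_call` argument are hypothetical names; `statx_call` stands in for whatever wrapper invokes statx(2) and raises `OSError` on failure. The sketch assumes that a missing syscall is reported as `ENOSYS`.

```python
import errno
import os


class StatxProbe:
    """Try a statx-style callable first; remember if the kernel lacks it."""

    def __init__(self, statx_call, fallback=os.stat):
        self._statx_call = statx_call  # hypothetical wrapper around statx(2)
        self._fallback = fallback      # the classic stat path
        self._unsupported = False      # set once ENOSYS has been seen

    def stat(self, path):
        if not self._unsupported:
            try:
                return self._statx_call(path)
            except OSError as exc:
                # Only "syscall not implemented" triggers the fallback;
                # real errors such as ENOENT must still reach the caller.
                if exc.errno != errno.ENOSYS:
                    raise
                self._unsupported = True
        return self._fallback(path)
```

Caching the negative result means the failing syscall is attempted once per process rather than on every call.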
msg369149 - (view) Author: (ntninja) * Date: 2020-05-17 19:34
I thought this might be the case. I'll look into adapting the patch accordingly.
msg369150 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-05-17 19:40
You can find an example in Python/bootstrap_hash.c, which deals with the getrandom syscall.
History
Date User Action Args
2020-05-17 19:40:44 christian.heimes set messages: + msg369150
2020-05-17 19:34:44 ntninja set messages: + msg369149
2020-05-17 15:29:10 christian.heimes set nosy: + christian.heimes; messages: + msg369128
2020-05-17 15:02:02 scrool set nosy: + scrool
2020-04-10 08:37:26 slow franklin set nosy: + slow franklin
2020-03-26 00:21:54 vstinner set nosy: + vstinner; components: + Library (Lib); versions: + Python 3.9
2020-03-23 19:43:43 ntninja set keywords: + patch; stage: patch review; pull_requests: + pull_request18486
2020-02-03 04:26:02 ntninja create