classification
Title: Open a file in text mode requires too many syscalls
Type: performance Stage: resolved
Components: IO Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, pitrou, serhiy.storchaka
Priority: normal Keywords:

Created on 2017-05-02 11:11 by haypo, last changed 2017-09-16 01:29 by haypo. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1384 merged haypo, 2017-05-02 11:31
PR 1385 closed haypo, 2017-05-02 12:06
Messages (6)
msg292744 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-02 11:11
Example:

with open("x", "w", encoding="utf-8") as fp:
    fp.write("HERE")
    fp.close()

syscalls:

14249 open("x", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3
14249 fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
14249 ioctl(3, TCGETS, 0x7fff07d43330)  = -1 ENOTTY (Inappropriate ioctl for device)
14249 lseek(3, 0, SEEK_CUR)             = 0
14249 lseek(3, 0, SEEK_CUR)             = 0
14249 lseek(3, 0, SEEK_CUR)             = 0
14249 write(3, "HERE", 4)               = 4
14249 close(3)                          = 0

I only expected 3 syscalls: open, write, close.

* fstat() is used by the FileIO constructor to check if the file is a directory or not, and to get the block size
* ioctl() comes from open() which checks if the file is a TTY or not, to decide how to configure buffering
* the first lseek() is used by the BuffererWriter constructor to initialize the private abs_pos attribute
* the second lseek() is used by the TextIOWrapper constructor to check if the underlying file object (buffered writer) is seekable or not
* the last lseek() is used to create the cookie object in TextIOWrapper constructor

Can we maybe reduce the number of lseek() to a single syscall?

For example, BuffererWriter constructor calls FileIO.tell(): can't this method set the seekable attribute depending on lseek() success, as the FileIO.seekable property?
msg292752 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-02 13:10
New changeset 999707373630ce090300c3c542066f493b12faa0 by Victor Stinner in branch 'master':
bpo-30228: FileIO seek() and tell() set seekable (#1384)
https://github.com/python/cpython/commit/999707373630ce090300c3c542066f493b12faa0
msg292759 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-02 13:56
I don't like PR 1385. abs_pos is a private attribute used only in _io._Buffered.seek() for readable streams when whence is SEEK_SET or SEEK_CUR. There is no guarantee that it contains relevant value for non-readable stream.

You could call buffer.seek(0, SEEK_CUR) rather than buffer.tell() for avoiding a system call for readable stream. But this looks as a shamanism too.

Or provide a function similar to the RAW_TELL macro but just checking if the current position is 0. If define it in bufferedio.c near _buffered_raw_tell() it is more chance that it is consistent with abs_pos and future changes don't break it.
msg292881 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-03 10:32
> You could call buffer.seek(0, SEEK_CUR) rather than buffer.tell() for avoiding a system call for readable stream. But this looks as a shamanism too.

Note: Buffered.seek(0, SEEK_CUR) only has a fast-path for readable file: it cannot be used to optimize open(filename, "w") (BufferedWriter.seek() isn't optimized).
msg292882 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-03 10:33
> Or provide a function similar to the RAW_TELL macro but just checking if the current position is 0.

I will try to implement such function and use it in textio.c.
msg302307 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-09-16 01:25
Microbenchmark on Fedora 26 for https://github.com/python/cpython/pull/1385

Working directly uses ext4, the filesystem operations are likely cached in memory, so syscalls should be very fast.

$ ./python -m perf timeit --inherit=PYTHONPATH 'open("x.txt", "w").close()' -o open_ref.json -v
$ ./python -m perf timeit --inherit=PYTHONPATH 'open("x.txt", "w").close()' -o open_patch.json -v
$ ./python -m perf compare_to open_ref.json open_patch.json 
Mean +- std dev: [open_ref] 18.6 us +- 0.2 us -> [open_patch] 18.2 us +- 0.2 us: 1.02x faster (-2%)


Microbenchmark using a btrfs filesystem mounted on NFS over wifi: not significant!

$ ./python -m perf timeit --inherit=PYTHONPATH 'open("nfs/x.txt", "w").close()' --append open_patch.json -v
$ ./python -m perf timeit --inherit=PYTHONPATH 'open("nfs/x.txt", "w").close()' --append open_patch.json -v
haypo@selma$ ./python -m perf compare_to open_ref.json open_patch.json  -v
Mean +- std dev: [open_ref] 17.8 ms +- 1.0 ms -> [open_patch] 17.8 ms +- 1.0 ms: 1.00x faster (-0%)
Not significant!

Note: open().close() is 1000x slower over NFS!

According to strace, on NFS, open() and close() are slow, but syscalls in the middle are as fast as syscalls on a local filesystem.

Well, it's hard to see a significant speedup, even on NFS. So I abandon my change.
History
Date User Action Args
2017-09-16 01:29:24hayposetresolution: rejected -> fixed
2017-09-16 01:28:23hayposetstatus: open -> closed
resolution: rejected
stage: resolved
2017-09-16 01:25:31hayposetmessages: + msg302307
2017-05-03 10:33:51hayposetmessages: + msg292882
2017-05-03 10:32:43hayposetmessages: + msg292881
2017-05-02 13:56:48serhiy.storchakasetmessages: + msg292759
2017-05-02 13:10:41hayposetmessages: + msg292752
2017-05-02 12:17:20hayposetcomponents: + IO
2017-05-02 12:06:25hayposetpull_requests: + pull_request1492
2017-05-02 11:31:28hayposetpull_requests: + pull_request1491
2017-05-02 11:11:25haypocreate