Open a file in text mode requires too many syscalls #74414
Example:
    with open("x", "w", encoding="utf-8") as fp:
        fp.write("HERE")
        fp.close()
syscalls:
    14249 open("x", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3
I only expected 3 syscalls: open, write, close.
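For comparison, the three calls the reporter expected can be issued directly through the os module, which wraps the underlying POSIX calls without creating any buffered or text layer. This is a minimal sketch: the helper name write_minimal is hypothetical, and the exact syscalls it produces still depend on the platform and Python version. O_CLOEXEC is not passed explicitly because os.open() already returns non-inheritable descriptors since Python 3.4 (PEP 446).

```python
import os


def write_minimal(path: str, text: str, encoding: str = "utf-8") -> None:
    """Write text to path using only os.open(), os.write() and os.close().

    Hypothetical helper for illustration: no BufferedWriter or TextIOWrapper
    is created, so none of the tell()/seekable() probing discussed in this
    issue takes place.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    try:
        data = text.encode(encoding)
        # os.write() may write fewer bytes than requested; loop until done.
        while data:
            written = os.write(fd, data)
            data = data[written:]
    finally:
        os.close(fd)
```

Usage: write_minimal("x", "HERE") produces the same file content as the example above.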
Can we maybe reduce the number of lseek() calls to a single syscall? For example, the BufferedWriter constructor calls FileIO.tell(): couldn't this method set the seekable attribute depending on whether lseek() succeeds, as FileIO.seekable() already does?
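The proposal can be modelled in pure Python: a raw file object whose tell() records whether lseek() succeeded, so that a later seekable() call answers from that cache instead of issuing another lseek(). This is only a sketch under that assumption; the class name CachedSeekRaw and its lseek_calls counter are hypothetical, and CPython's real FileIO is implemented in C.

```python
import os


class CachedSeekRaw:
    """Hypothetical raw-file sketch: tell() doubles as the seekability probe."""

    def __init__(self, fd: int) -> None:
        self.fd = fd
        self._seekable = None  # None means "not probed yet"
        self.lseek_calls = 0   # demonstration counter, not a real FileIO field

    def _lseek_cur(self) -> int:
        self.lseek_calls += 1
        return os.lseek(self.fd, 0, os.SEEK_CUR)

    def tell(self) -> int:
        try:
            pos = self._lseek_cur()
        except OSError:
            # lseek() failed (for example ESPIPE on a pipe): cache "not seekable".
            if self._seekable is None:
                self._seekable = False
            raise
        if self._seekable is None:
            self._seekable = True
        return pos

    def seekable(self) -> bool:
        # Only probe with lseek() if tell() has not already answered.
        if self._seekable is None:
            try:
                self.tell()
            except OSError:
                pass
        return self._seekable
```

With this design, a constructor that calls tell() once gets seekable() for free: on a regular file, tell() followed by seekable() costs a single lseek().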
I don't like PR 1385. abs_pos is a private attribute used only in _io._Buffered.seek() for readable streams when whence is SEEK_SET or SEEK_CUR. There is no guarantee that it contains a relevant value for non-readable streams. You could call buffer.seek(0, SEEK_CUR) rather than buffer.tell() to avoid a system call for readable streams, but this looks like shamanism too. Alternatively, provide a function similar to the RAW_TELL macro that just checks whether the current position is 0. If it is defined in bufferedio.c near _buffered_raw_tell(), it is more likely to stay consistent with abs_pos, and future changes are less likely to break it.
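The suggested helper can be sketched as a toy pure-Python model: the raw position is cached in abs_pos (mirroring the C field, with -1 meaning unknown), raw_tell() plays the role of the RAW_TELL macro, and at_start() only asks whether the logical position is 0. The class and method names are hypothetical; the real function would live in C in bufferedio.c. The sketch also assumes the descriptor is not opened with O_APPEND, since advancing abs_pos by the number of bytes written is only valid without it, which illustrates why keeping such a cache consistent is delicate.

```python
import os


class BufferedPosSketch:
    """Hypothetical buffered writer that caches the raw position like abs_pos."""

    def __init__(self, fd: int) -> None:
        self.fd = fd
        self.abs_pos = -1        # -1: raw position not known yet
        self.write_buf = bytearray()
        self.lseek_calls = 0     # demonstration counter

    def _raw_tell(self) -> int:
        # Counterpart of _buffered_raw_tell(): ask the OS and cache the answer.
        self.lseek_calls += 1
        self.abs_pos = os.lseek(self.fd, 0, os.SEEK_CUR)
        return self.abs_pos

    def raw_tell(self) -> int:
        # Counterpart of RAW_TELL: reuse the cached value when it is valid.
        return self.abs_pos if self.abs_pos != -1 else self._raw_tell()

    def at_start(self) -> bool:
        # "Is the logical position 0?": nothing buffered and raw position 0.
        return not self.write_buf and self.raw_tell() == 0

    def write(self, data: bytes) -> int:
        self.write_buf += data
        return len(data)

    def flush(self) -> None:
        while self.write_buf:
            n = os.write(self.fd, self.write_buf)
            del self.write_buf[:n]
            if self.abs_pos != -1:
                # Keep the cache consistent (assumes no O_APPEND).
                self.abs_pos += n
```

Repeated at_start() calls cost at most one lseek(), and the cache is updated on flush instead of being invalidated.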
Note: Buffered.seek(0, SEEK_CUR) only has a fast path for readable files, so it cannot be used to optimize open(filename, "w") (BufferedWriter.seek() isn't optimized).
I will try to implement such a function and use it in textio.c.
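For context on why textio.c needs the position at all: as far as I can tell from the pure-Python _pyio.TextIOWrapper, the constructor of a seekable, writable text stream calls buffer.tell() and, if the position is not 0, resets the encoder state so that no second byte-order mark (BOM) is written. That observable effect can be checked with the public API alone; the helper name append_utf16 and the file name are arbitrary.

```python
import os
import tempfile


def append_utf16(path: str, text: str) -> None:
    # Append mode leaves the stream at a non-zero position, so the
    # TextIOWrapper should not emit another BOM for the utf-16 codec.
    with open(path, "a", encoding="utf-16") as fp:
        fp.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "x.txt")
    with open(path, "w", encoding="utf-16") as fp:
        fp.write("first")          # position 0: a single BOM is written
    append_utf16(path, " second")  # position != 0: no extra BOM expected
    with open(path, encoding="utf-16") as fp:
        content = fp.read()

print(repr(content))  # prints 'first second'
```

If a second BOM had been written, the decoded text would contain a stray "\ufeff" between the two parts.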
Microbenchmark on Fedora 26 for PR #1385. The working directory uses ext4; the filesystem operations are likely cached in memory, so syscalls should be very fast.
    $ ./python -m perf timeit --inherit=PYTHONPATH 'open("x.txt", "w").close()' -o open_ref.json -v
    $ ./python -m perf timeit --inherit=PYTHONPATH 'open("x.txt", "w").close()' -o open_patch.json -v
    $ ./python -m perf compare_to open_ref.json open_patch.json
    Mean +- std dev: [open_ref] 18.6 us +- 0.2 us -> [open_patch] 18.2 us +- 0.2 us: 1.02x faster (-2%)
Microbenchmark using a btrfs filesystem mounted on NFS over wifi: not significant!
    $ ./python -m perf timeit --inherit=PYTHONPATH 'open("nfs/x.txt", "w").close()' --append open_patch.json -v
    $ ./python -m perf timeit --inherit=PYTHONPATH 'open("nfs/x.txt", "w").close()' --append open_patch.json -v
    haypo@selma$ ./python -m perf compare_to open_ref.json open_patch.json -v
    Mean +- std dev: [open_ref] 17.8 ms +- 1.0 ms -> [open_patch] 17.8 ms +- 1.0 ms: 1.00x faster (-0%)
    Not significant!
Note: open().close() is 1000x slower over NFS! According to strace, on NFS, open() and close() are slow, but the syscalls in between are as fast as on a local filesystem. Well, it's hard to see a significant speedup, even on NFS. So I am abandoning my change.
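The commands above rely on the third-party perf module (since renamed pyperf). A rough stdlib-only approximation of the same microbenchmark with timeit is sketched below; it does none of perf's calibration, worker processes or significance testing, so its numbers are indicative only. The function name, file name and repeat counts are arbitrary choices.

```python
import os
import statistics
import tempfile
import timeit


def bench_open_close(path: str, number: int = 1000, repeat: int = 5) -> tuple[float, float]:
    """Return (mean, stdev) in microseconds of one open(path, "w").close()."""
    timer = timeit.Timer(lambda: open(path, "w").close())
    # Each entry is the total time in seconds of `number` consecutive calls.
    runs = timer.repeat(repeat=repeat, number=number)
    per_call_us = [total / number * 1e6 for total in runs]
    return statistics.mean(per_call_us), statistics.stdev(per_call_us)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mean, stdev = bench_open_close(os.path.join(tmp, "x.txt"))
        print(f"Mean +- std dev: {mean:.1f} us +- {stdev:.1f} us")
```

Running it once on an unpatched build and once on a patched build gives a comparison comparable in spirit to the open_ref/open_patch runs, without perf's statistics.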
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.