Title: Prevent extraneous fstat during open()
Type: resource usage Stage: resolved
Components: IO Versions: Python 3.5
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, bkabrda, josh.r, pitrou, python-dev, serhiy.storchaka, stutzbach
Priority: normal Keywords: patch

Created on 2014-06-06 11:48 by bkabrda, last changed 2014-06-30 00:13 by pitrou. This issue is now closed.

File name Uploaded Description Edit
python3-remove-extraneous-fstat-on-file-open.patch bkabrda, 2014-06-06 11:48 review
python3-remove-extraneous-fstat-on-file-open-v2.patch bkabrda, 2014-06-09 08:12 review
python3-remove-extraneous-fstat-on-file-open-v3.patch bkabrda, 2014-06-10 08:38 review
python3-remove-extraneous-fstat-on-file-open-v4.patch bkabrda, 2014-06-20 07:47 review
Messages (9)
msg219874 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2014-06-06 11:48
with Python 3.3/3.4, I noticed that there are lots of syscalls on open() - I noticed 2x fstat, 2x ioctl and 2x lseek. This is not noticable when working with small amounts of files on local filesystem, but if working with files via NSF or if working with huge amounts of files, lots of syscalls cost a lot of time. Therefore I'd like to create patches that would reduce the number of syscalls on open().
I've already managed to create first one (attached), that gets rid of one of the fstat calls (all the information are obtained from one fstat call).
I hope this makes sense and that the patch is acceptable. If not, I'll be happy to work on it to make it better. (This is my first real patch for C part of Python, so I hope I did everything right...)
msg220084 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2014-06-09 08:12
Thanks a lot for the code review! I'm attaching a revised version of the patch. Fixes I made:

- added check whether PyLong_AsLong returned an error
- removed "ADD_INTERNED(_blksize)" and "PyObject *_PyIO_str__blksize;" - I noticed that these are only necessary when exported by _iomodule.h, which isn't needed for _blksize ATM
- moved blksize to a place of fileio structure where it won't create unnecessary padding

I hope attaching the version 2 of the patch here is ok, if I should have attached it in the code review tool somehow, please let me know.
msg220152 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2014-06-10 08:38
Again, thanks for the review. It's true that HAVE_FSTAT can be defined without stat structure containing st_blksize. I added an ifdef HAVE_STRUCT_STAT_ST_BLKSIZE for that. Attaching third version of the patch, hopefully everything will be ok now.
msg220706 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2014-06-16 09:40
So, as pointed out by haypo, blksize_t is actually signed long on Linux. This means that my patch (as well as the current code) is not right.
Both with and without my patch, io_open function uses "int" to store blksize_t and it also passes it to one of PyBuffered{Random,Reader,Writer}_Type. These three however use Py_ssize_t, which is inconsistent with io_open, and it's not correct, too.
I'm unsure how to proceed here - should I fix buffer size types throughout the _io module to long and submit one big patch? It doesn't feel right to put two not-very-related changes into one patch (I'm afraid that changing buffer sizes to long everywhere will result in a rather large patch). Or should I first submit a patch that fixes the long usage and then rebase the patch attached to this issue on top of it?

Thanks a lot.
msg220752 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-06-16 20:16
I think it's ok to keep the block size an int for now. It would be surprising for a block size to be more than 2 GB, IMO.
msg220810 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2014-06-17 08:08
Thanks, Antoine. So, is there anything else that should be done about the patch so that it gets accepted?
msg221071 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2014-06-20 07:47
I'm attaching fourth version of the patch. Changes:
- fileio's _blksize member is now of type T_UINT
- extended test_fileio.AutoFileTests.testAttributes to also test _blksize value
msg221923 - (view) Author: Roundup Robot (python-dev) Date: 2014-06-30 00:12
New changeset 3b5279b5bfd1 by Antoine Pitrou in branch 'default':
Issue #21679: Prevent extraneous fstat() calls during open().  Patch by Bohuslav Kabrda.
msg221925 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-06-30 00:13
Thank you very much. I've committed the patch to the default branch (I've just moved the _blksize test to a separate method).
Date User Action Args
2014-06-30 00:13:29pitrousetstatus: open -> closed
versions: - Python 3.4
messages: + msg221925

resolution: fixed
stage: resolved
2014-06-30 00:12:18python-devsetnosy: + python-dev
messages: + msg221923
2014-06-20 07:47:30bkabrdasetfiles: + python3-remove-extraneous-fstat-on-file-open-v4.patch

messages: + msg221071
2014-06-17 08:08:38bkabrdasetmessages: + msg220810
2014-06-16 21:44:59josh.rsetnosy: + josh.r
2014-06-16 20:16:12pitrousetmessages: + msg220752
2014-06-16 09:40:29bkabrdasetmessages: + msg220706
2014-06-10 08:38:19bkabrdasetfiles: + python3-remove-extraneous-fstat-on-file-open-v3.patch

messages: + msg220152
2014-06-09 08:12:08bkabrdasetfiles: + python3-remove-extraneous-fstat-on-file-open-v2.patch

messages: + msg220084
2014-06-06 17:46:05serhiy.storchakasetnosy: + pitrou, benjamin.peterson, stutzbach, serhiy.storchaka
2014-06-06 11:48:05bkabrdacreate