Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent extraneous fstat during open() #65878

Closed
bkabrda mannequin opened this issue Jun 6, 2014 · 10 comments
Closed

Prevent extraneous fstat during open() #65878

bkabrda mannequin opened this issue Jun 6, 2014 · 10 comments
Labels
performance Performance or resource usage topic-IO

Comments

@bkabrda
Copy link
Mannequin

bkabrda mannequin commented Jun 6, 2014

BPO 21679
Nosy @pitrou, @benjaminp, @vadmium, @serhiy-storchaka, @MojoVampire
Files
  • python3-remove-extraneous-fstat-on-file-open.patch
  • python3-remove-extraneous-fstat-on-file-open-v2.patch
  • python3-remove-extraneous-fstat-on-file-open-v3.patch
  • python3-remove-extraneous-fstat-on-file-open-v4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-06-30.00:13:29.653>
    created_at = <Date 2014-06-06.11:48:05.445>
    labels = ['expert-IO', 'performance']
    title = 'Prevent extraneous fstat during open()'
    updated_at = <Date 2015-11-24.08:37:58.383>
    user = 'https://bugs.python.org/bkabrda'

    bugs.python.org fields:

    activity = <Date 2015-11-24.08:37:58.383>
    actor = 'martin.panter'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-06-30.00:13:29.653>
    closer = 'pitrou'
    components = ['IO']
    creation = <Date 2014-06-06.11:48:05.445>
    creator = 'bkabrda'
    dependencies = []
    files = ['35494', '35540', '35548', '35702']
    hgrepos = []
    issue_num = 21679
    keywords = ['patch']
    message_count = 10.0
    messages = ['219874', '220084', '220152', '220706', '220752', '220810', '221071', '221923', '221925', '255253']
    nosy_count = 8.0
    nosy_names = ['pitrou', 'benjamin.peterson', 'stutzbach', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'bkabrda', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue21679'
    versions = ['Python 3.5']

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Jun 6, 2014

    Hi,
    with Python 3.3/3.4, I noticed that there are lots of syscalls on open() - I noticed 2x fstat, 2x ioctl and 2x lseek. This is not noticable when working with small amounts of files on local filesystem, but if working with files via NSF or if working with huge amounts of files, lots of syscalls cost a lot of time. Therefore I'd like to create patches that would reduce the number of syscalls on open().
    I've already managed to create first one (attached), that gets rid of one of the fstat calls (all the information are obtained from one fstat call).
    I hope this makes sense and that the patch is acceptable. If not, I'll be happy to work on it to make it better. (This is my first real patch for C part of Python, so I hope I did everything right...)

    @bkabrda bkabrda mannequin added topic-IO performance Performance or resource usage labels Jun 6, 2014
    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Jun 9, 2014

    Thanks a lot for the code review! I'm attaching a revised version of the patch. Fixes I made:

    • added check whether PyLong_AsLong returned an error
    • removed "ADD_INTERNED(_blksize)" and "PyObject *_PyIO_str__blksize;" - I noticed that these are only necessary when exported by _iomodule.h, which isn't needed for _blksize ATM
    • moved blksize to a place of fileio structure where it won't create unnecessary padding

    I hope attaching the version 2 of the patch here is ok, if I should have attached it in the code review tool somehow, please let me know.

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Jun 10, 2014

    Again, thanks for the review. It's true that HAVE_FSTAT can be defined without stat structure containing st_blksize. I added an ifdef HAVE_STRUCT_STAT_ST_BLKSIZE for that. Attaching third version of the patch, hopefully everything will be ok now.

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Jun 16, 2014

    So, as pointed out by haypo, blksize_t is actually signed long on Linux. This means that my patch (as well as the current code) is not right.
    Both with and without my patch, io_open function uses "int" to store blksize_t and it also passes it to one of PyBuffered{Random,Reader,Writer}_Type. These three however use Py_ssize_t, which is inconsistent with io_open, and it's not correct, too.
    I'm unsure how to proceed here - should I fix buffer size types throughout the _io module to long and submit one big patch? It doesn't feel right to put two not-very-related changes into one patch (I'm afraid that changing buffer sizes to long everywhere will result in a rather large patch). Or should I first submit a patch that fixes the long usage and then rebase the patch attached to this issue on top of it?

    Thanks a lot.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 16, 2014

    I think it's ok to keep the block size an int for now. It would be surprising for a block size to be more than 2 GB, IMO.

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Jun 17, 2014

    Thanks, Antoine. So, is there anything else that should be done about the patch so that it gets accepted?

    @bkabrda
    Copy link
    Mannequin Author

    bkabrda mannequin commented Jun 20, 2014

    I'm attaching fourth version of the patch. Changes:

    • fileio's _blksize member is now of type T_UINT
    • extended test_fileio.AutoFileTests.testAttributes to also test _blksize value

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 30, 2014

    New changeset 3b5279b5bfd1 by Antoine Pitrou in branch 'default':
    Issue bpo-21679: Prevent extraneous fstat() calls during open(). Patch by Bohuslav Kabrda.
    http://hg.python.org/cpython/rev/3b5279b5bfd1

    @pitrou
    Copy link
    Member

    pitrou commented Jun 30, 2014

    Thank you very much. I've committed the patch to the default branch (I've just moved the _blksize test to a separate method).

    @pitrou pitrou closed this as completed Jun 30, 2014
    @vadmium
    Copy link
    Member

    vadmium commented Nov 24, 2015

    The change made here no longer tolerates fstat() returning an error, which seems to be the cause of bpo-25717.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage topic-IO
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants