3.0 file.read dreadfully slow #48783

Closed
terryjreedy opened this issue Dec 4, 2008 · 10 comments
Labels
extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@terryjreedy
Member

BPO 4533
Nosy @terryjreedy, @gpshead, @pitrou, @tiran
Files
  • fileio_buffer.patch
  • fileio_buffer2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-07-02.16:06:30.311>
    created_at = <Date 2008-12-04.18:30:19.453>
    labels = ['extension-modules', 'interpreter-core', 'library', 'performance']
    title = '3.0 file.read dreadfully slow'
    updated_at = <Date 2009-07-02.16:06:30.309>
    user = 'https://github.com/terryjreedy'

    bugs.python.org fields:

    activity = <Date 2009-07-02.16:06:30.309>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-07-02.16:06:30.311>
    closer = 'pitrou'
    components = ['Extension Modules', 'Interpreter Core', 'Library (Lib)']
    creation = <Date 2008-12-04.18:30:19.453>
    creator = 'terry.reedy'
    dependencies = []
    files = ['12227', '12228']
    hgrepos = []
    issue_num = 4533
    keywords = ['patch']
    message_count = 10.0
    messages = ['76915', '76920', '76934', '76936', '76940', '76944', '76971', '76981', '78098', '90024']
    nosy_count = 4.0
    nosy_names = ['terry.reedy', 'gregory.p.smith', 'pitrou', 'christian.heimes']
    pr_nums = []
    priority = 'critical'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue4533'
    versions = ['Python 2.6', 'Python 2.7']

    @terryjreedy
    Member Author

    A comp.lang.python poster reported that 3.0 file.read is orders of
    magnitude slower than with 2.5 (but confused the issue with
    buffer = 0). Jerry Hill reported:

    "Here's a quick comparison between 2.5 and
    3.0 on a relatively small 17 meg file:

    C:\>c:\Python30\python -m timeit -n 1
    "open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()"
    1 loops, best of 3: 36.8 sec per loop

    C:\>c:\Python25\python -m timeit -n 1
    "open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()"
    1 loops, best of 3: 33 msec per loop

    That's 3 orders of magnitude slower on python3.0!"

    I verified this informally on WinXP by opening and then reading
    Doc/Pythonxy.chm (about 4 megs) -- an eye blink versus 3 seconds,
    repeated.  Even the open seemed slower but I did not time it.
    >>> f=open('Doc/Python30.chm','rb')
    >>> d=f.read()
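The informal check above can be made repeatable with a small script. This is only a sketch: the helper name `time_full_read` and the idea of generating a temporary test file are assumptions for illustration, not something taken from the original report.

```python
import os
import tempfile
import timeit


def time_full_read(path, repeat=3):
    """Return the best wall-clock time, in seconds, of reading the
    whole file in binary mode (the same idea as `timeit -n 1`)."""
    def read_all():
        with open(path, "rb") as f:
            return f.read()
    # number=1 reads the file once per measurement; keep the fastest run.
    return min(timeit.repeat(read_all, number=1, repeat=repeat))


def make_test_file(size):
    """Create a temporary file of `size` zero bytes and return its path.
    (Hypothetical stand-in for the .csv and .chm files in the report.)"""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * size)
    return tmp.name
```

Running the same script under two interpreters, for example `path = make_test_file(4 * 1024 * 1024)` followed by `print(time_full_read(path))` and `os.unlink(path)`, gives numbers that can be compared directly.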

    @terryjreedy terryjreedy added stdlib Python modules in the Lib dir interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Dec 4, 2008
    @tiran
    Member

    tiran commented Dec 4, 2008

    This definitely needs some testing!

    @tiran tiran added extension-modules C modules in the Modules dir release-blocker labels Dec 4, 2008
    @tiran
    Member

    tiran commented Dec 4, 2008

    The small buffer size in Modules/_fileio.c is one reason for the slowness.

    $ dd if=/dev/zero of=zeros bs=1MB count=50
    $ cat testread.py
    open("zeros", "rb").read()
    $ ./python -m cProfile testread.py
             40 function calls (39 primitive calls) in 4.246 CPU seconds

    Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.016    0.016    4.246    4.246 <string>:1(<module>)
            1    0.000    0.000    0.000    0.000 io.py:277(__new__)
            2    0.000    0.000    0.000    0.000 io.py:355(flush)
            2    0.000    0.000    0.000    0.000 io.py:364(close)
            2    0.000    0.000    0.000    0.000 io.py:376(__del__)
            1    0.000    0.000    0.000    0.000 io.py:413(_checkReadable)
            1    0.000    0.000    0.000    0.000 io.py:614(__init__)
            2    0.000    0.000    0.000    0.000 io.py:618(close)
            1    0.000    0.000    0.000    0.000 io.py:708(__init__)
            1    0.000    0.000    0.000    0.000 io.py:733(flush)
            1    0.000    0.000    0.000    0.000 io.py:736(close)
            1    0.000    0.000    0.000    0.000 io.py:755(closed)
            1    0.000    0.000    0.000    0.000 io.py:82(open)
            1    0.000    0.000    0.000    0.000 io.py:896(__init__)
            2    0.000    0.000    0.000    0.000 io.py:905(_reset_read_buf)
            1    0.021    0.021    4.230    4.230 io.py:909(read)
            1    0.000    0.000    4.209    4.209 io.py:920(_read_unlocked)
            1    0.000    0.000    0.000    0.000 {built-in method allocate_lock}
          2/1    0.000    0.000    4.246    4.246 {built-in method exec}
            1    0.000    0.000    0.000    0.000 {built-in method fstat}
            2    0.000    0.000    0.000    0.000 {built-in method isinstance}
            3    0.000    0.000    0.000    0.000 {built-in method len}
            1    0.000    0.000    0.000    0.000 {method '__enter__' of '_thread.lock' objects}
            1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
            1    0.000    0.000    0.000    0.000 {method 'fileno' of '_FileIO' objects}
            1    0.000    0.000    0.000    0.000 {method 'isatty' of '_FileIO' objects}
            1    0.825    0.825    0.825    0.825 {method 'join' of 'bytes' objects}
            2    3.384    1.692    3.384    1.692 {method 'read' of '_FileIO' objects}
            1    0.000    0.000    0.000    0.000 {method 'readable' of '_FileIO' objects}

    $ vi Modules/_fileio.c
    -#define DEFAULT_BUFFER_SIZE (8*1024)
    +#define DEFAULT_BUFFER_SIZE (80*1024)
    $ ./python -m cProfile testread.py
             40 function calls (39 primitive calls) in 1.273 CPU seconds

    Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.019    0.019    1.273    1.273 <string>:1(<module>)
            1    0.000    0.000    0.000    0.000 io.py:277(__new__)
            2    0.000    0.000    0.000    0.000 io.py:355(flush)
            2    0.000    0.000    0.000    0.000 io.py:364(close)
            2    0.000    0.000    0.000    0.000 io.py:376(__del__)
            1    0.000    0.000    0.000    0.000 io.py:413(_checkReadable)
            1    0.000    0.000    0.000    0.000 io.py:614(__init__)
            2    0.000    0.000    0.000    0.000 io.py:618(close)
            1    0.000    0.000    0.000    0.000 io.py:708(__init__)
            1    0.000    0.000    0.000    0.000 io.py:733(flush)
            1    0.000    0.000    0.000    0.000 io.py:736(close)
            1    0.000    0.000    0.000    0.000 io.py:755(closed)
            1    0.000    0.000    0.000    0.000 io.py:82(open)
            1    0.000    0.000    0.000    0.000 io.py:896(__init__)
            2    0.000    0.000    0.000    0.000 io.py:905(_reset_read_buf)
            1    0.016    0.016    1.254    1.254 io.py:909(read)
            1    0.000    0.000    1.238    1.238 io.py:920(_read_unlocked)
            1    0.000    0.000    0.000    0.000 {built-in method allocate_lock}
          2/1    0.000    0.000    1.273    1.273 {built-in method exec}
            1    0.000    0.000    0.000    0.000 {built-in method fstat}
            2    0.000    0.000    0.000    0.000 {built-in method isinstance}
            3    0.000    0.000    0.000    0.000 {built-in method len}
            1    0.000    0.000    0.000    0.000 {method '__enter__' of '_thread.lock' objects}
            1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
            1    0.000    0.000    0.000    0.000 {method 'fileno' of '_FileIO' objects}
            1    0.000    0.000    0.000    0.000 {method 'isatty' of '_FileIO' objects}
            1    1.156    1.156    1.156    1.156 {method 'join' of 'bytes' objects}
            2    0.081    0.041    0.081    0.041 {method 'read' of '_FileIO' objects}
            1    0.000    0.000    0.000    0.000 {method 'readable' of '_FileIO' objects}
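In the two profiles above, the time spent in `read` of `_FileIO` objects falls from 3.384 to 0.081 seconds once the buffer size goes from 8 KiB to 80 KiB. The following toy model, which is an illustration and not the code in Modules/_fileio.c, only counts how many low-level read calls a fixed chunk size needs for the same amount of data. `CountingReader` and `read_all_in_chunks` are hypothetical names.

```python
import io


class CountingReader:
    """In-memory stream that counts how many read() calls it receives."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)
        self.calls = 0

    def read(self, size):
        self.calls += 1
        return self._stream.read(size)


def read_all_in_chunks(reader, chunk_size):
    """Read until EOF using a fixed chunk size and join the pieces."""
    chunks = []
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # empty bytes object signals end of file
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

For 1 MiB of data this loop issues 129 reads with 8 KiB chunks and 14 reads with 80 KiB chunks (the last call in each case is the one that detects end of file). The model says nothing about per-call cost in the real implementation; it only shows how the call count scales with the chunk size.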

    @tiran
    Member

    tiran commented Dec 4, 2008

    The fileio_buffer.patch implements the same progressive buffer as Python
    2.x's Objects/fileobject.c.
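A progressive buffer grows in larger and larger steps instead of a fixed 8 KiB at a time. The sketch below is a Python illustration, not the patch itself. The review comments in this thread confirm only the two growth expressions (`newsize + newsize` and `newsize + BIGCHUNK`); the constant values and the exact thresholds used here are assumptions.

```python
SMALLCHUNK = 8 * 1024     # assumed initial step size
BIGCHUNK = 512 * 1024     # assumed point where doubling stops


def new_buffersize(currentsize):
    """Return the next buffer size: small steps first, then doubling,
    then linear growth by BIGCHUNK (thresholds are assumptions)."""
    if currentsize > SMALLCHUNK:
        if currentsize <= BIGCHUNK:
            return currentsize + currentsize
        return currentsize + BIGCHUNK
    return currentsize + SMALLCHUNK


def growth_steps(target, grow=new_buffersize):
    """Count how many resizes are needed before the buffer holds `target` bytes."""
    size = 0
    steps = 0
    while size < target:
        size = grow(size)
        steps += 1
    return steps


def fixed_growth(currentsize):
    """Baseline for comparison: always grow by SMALLCHUNK."""
    return currentsize + SMALLCHUNK
```

Under these assumed constants, a 50 MB file like the `zeros` test file needs 102 resizes with the progressive scheme and 6104 with fixed 8 KiB steps.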

    @gpshead
    Member

    gpshead commented Dec 4, 2008

    patch looks good to me.

    nitpick comments: use += instead of = and + in:

    newsize = newsize + newsize
     and
    newsize = newsize + BIGCHUNK.

    As for the XXX about overflow, so long as BUFSIZ is not defined to be an
    insanely large number (it should never be) this will be fine. Add a
    preprocessor test for that:

    #if (BUFSIZ >= 2**30)
    #error "unreasonable BUFSIZ defined"
    #endif

    @tiran
    Member

    tiran commented Dec 4, 2008

    The preprocessor doesn't handle power. 2 << 24 (64MB) sounds sufficient
    for me.

    @gpshead
    Member

    gpshead commented Dec 5, 2008

    fileio_buffer2.patch looks good other than minor touchups:

    Turn the XXX comment into:

    /* NOTE: overflow impossible due to limits on BUFSIZ */

    Also, 2 << 24 is 32MB yet your error message test says >= 64MB. I think
    you meant 1 << 26.

    fix those and commit. :)
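The shift values discussed above are easy to check. This snippet is only a worked example of the arithmetic; `in_mib` is a hypothetical helper name.

```python
MIB = 1024 * 1024


def in_mib(n):
    """Convert a byte count to whole mebibytes."""
    return n // MIB


print(in_mib(2 << 24))   # 32, because 2 << 24 == 2**25
print(in_mib(1 << 26))   # 64, the limit the error message describes
print(in_mib(2 ** 30))   # 1024, the limit first suggested in the review
```

Python has a `**` operator, so the last line works here; in the C preprocessor the limit has to be written as a shift or a literal.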

    @tiran
    Member

    tiran commented Dec 5, 2008

    The updated patch has been committed to 3.0 and 3.1. I'm going to
    backport the patch to 2.x later.

    @loewis loewis mannequin added deferred-blocker and removed release-blocker labels Dec 10, 2008
    @loewis loewis mannequin added release-blocker and removed deferred-blocker labels Dec 20, 2008
    @pitrou
    Member

    pitrou commented Dec 20, 2008

    Since it is solved for 3.x and only needs to be backported to 2.x (where
    the "io" module isn't the default), downgrading to critical.

    @pitrou
    Member

    pitrou commented Jul 2, 2009

    This has been fixed as part of the big IO update in trunk. I assume
    nobody really cares about making a separate patch for 2.6, please
    re-open if you are interested!

    @pitrou pitrou closed this as completed Jul 2, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022