classification
Title: 3.0 file.read dreadfully slow
Type: performance Stage: commit review
Components: Extension Modules, Interpreter Core, Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, gregory.p.smith, pitrou, terry.reedy
Priority: critical Keywords: patch

Created on 2008-12-04 18:30 by terry.reedy, last changed 2009-07-02 16:06 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
fileio_buffer.patch christian.heimes, 2008-12-04 22:35
fileio_buffer2.patch christian.heimes, 2008-12-04 23:37
Messages (10)
msg76915 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-12-04 18:30
C.l.p poster reported that 3.0 file.read is orders of magnitude slower
than with 2.5 (but confused issue with buffer = 0).  Jerry Hill reported

"Here's a quick comparison between 2.5 and
3.0 on a relatively small 17 meg file:

C:\>c:\Python30\python -m timeit -n 1
"open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()"
1 loops, best of 3: 36.8 sec per loop

C:\>c:\Python25\python -m timeit -n 1
"open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()"
1 loops, best of 3: 33 msec per loop

That's 3 orders of magnitude slower on python3.0!"

I verified this informally on WinXP by opening and then reading
Doc/Pythonxy.chm (about 4 megs) -- an eye blink versus 3 seconds,
repeated.  Even the open seemed slower but I did not time it.
>>> f=open('Doc/Python30.chm','rb')
>>> d=f.read()
msg76920 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 20:21
This needs definitely some testing!
msg76934 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 21:56
The small buffer size in Modules/_fileio.c is one reason for the slowness.

$ dd if=/dev/zero of=zeros bs=1MB count=50
$ cat testread.py
open("zeros", "rb").read()
$ ./python -m cProfile testread.py
         40 function calls (39 primitive calls) in 4.246 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.016    0.016    4.246    4.246 <string>:1(<module>)     
        1    0.000    0.000    0.000    0.000 io.py:277(__new__)       
        2    0.000    0.000    0.000    0.000 io.py:355(flush)         
        2    0.000    0.000    0.000    0.000 io.py:364(close)         
        2    0.000    0.000    0.000    0.000 io.py:376(__del__)       
        1    0.000    0.000    0.000    0.000 io.py:413(_checkReadable)
        1    0.000    0.000    0.000    0.000 io.py:614(__init__)      
        2    0.000    0.000    0.000    0.000 io.py:618(close)         
        1    0.000    0.000    0.000    0.000 io.py:708(__init__)      
        1    0.000    0.000    0.000    0.000 io.py:733(flush)         
        1    0.000    0.000    0.000    0.000 io.py:736(close)         
        1    0.000    0.000    0.000    0.000 io.py:755(closed)        
        1    0.000    0.000    0.000    0.000 io.py:82(open)           
        1    0.000    0.000    0.000    0.000 io.py:896(__init__)      
        2    0.000    0.000    0.000    0.000 io.py:905(_reset_read_buf)
        1    0.021    0.021    4.230    4.230 io.py:909(read)           
        1    0.000    0.000    4.209    4.209 io.py:920(_read_unlocked) 
        1    0.000    0.000    0.000    0.000 {built-in method
allocate_lock}
      2/1    0.000    0.000    4.246    4.246 {built-in method exec}   
     
        1    0.000    0.000    0.000    0.000 {built-in method fstat}  
     
        2    0.000    0.000    0.000    0.000 {built-in method
isinstance}   
        3    0.000    0.000    0.000    0.000 {built-in method len}    
     
        1    0.000    0.000    0.000    0.000 {method '__enter__' of
'_thread.lock' objects}
        1    0.000    0.000    0.000    0.000 {method 'append' of 'list'
objects}           
        1    0.000    0.000    0.000    0.000 {method 'disable' of
'_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'fileno' of
'_FileIO' objects}          
        1    0.000    0.000    0.000    0.000 {method 'isatty' of
'_FileIO' objects}          
        1    0.825    0.825    0.825    0.825 {method 'join' of 'bytes'
objects}              
        2    3.384    1.692    3.384    1.692 {method 'read' of
'_FileIO' objects}            
        1    0.000    0.000    0.000    0.000 {method 'readable' of
'_FileIO' objects} 

$ vi Modules/_fileio.c
-#define DEFAULT_BUFFER_SIZE (8*1024)
+#define DEFAULT_BUFFER_SIZE (80*1024)
$ ./python -m cProfile testread.py                 
         40 function calls (39 primitive calls) in 1.273 CPU seconds   
            

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.019    0.019    1.273    1.273 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 io.py:277(__new__)
        2    0.000    0.000    0.000    0.000 io.py:355(flush)
        2    0.000    0.000    0.000    0.000 io.py:364(close)
        2    0.000    0.000    0.000    0.000 io.py:376(__del__)
        1    0.000    0.000    0.000    0.000 io.py:413(_checkReadable)
        1    0.000    0.000    0.000    0.000 io.py:614(__init__)
        2    0.000    0.000    0.000    0.000 io.py:618(close)
        1    0.000    0.000    0.000    0.000 io.py:708(__init__)
        1    0.000    0.000    0.000    0.000 io.py:733(flush)
        1    0.000    0.000    0.000    0.000 io.py:736(close)
        1    0.000    0.000    0.000    0.000 io.py:755(closed)
        1    0.000    0.000    0.000    0.000 io.py:82(open)
        1    0.000    0.000    0.000    0.000 io.py:896(__init__)
        2    0.000    0.000    0.000    0.000 io.py:905(_reset_read_buf)
        1    0.016    0.016    1.254    1.254 io.py:909(read)
        1    0.000    0.000    1.238    1.238 io.py:920(_read_unlocked)
        1    0.000    0.000    0.000    0.000 {built-in method
allocate_lock}
      2/1    0.000    0.000    1.273    1.273 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {built-in method fstat}
        2    0.000    0.000    0.000    0.000 {built-in method isinstance}
        3    0.000    0.000    0.000    0.000 {built-in method len}
        1    0.000    0.000    0.000    0.000 {method '__enter__' of
'_thread.lock' objects}
        1    0.000    0.000    0.000    0.000 {method 'append' of 'list'
objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of
'_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'fileno' of
'_FileIO' objects}
        1    0.000    0.000    0.000    0.000 {method 'isatty' of
'_FileIO' objects}
        1    1.156    1.156    1.156    1.156 {method 'join' of 'bytes'
objects}
        2    0.081    0.041    0.081    0.041 {method 'read' of
'_FileIO' objects}
        1    0.000    0.000    0.000    0.000 {method 'readable' of
'_FileIO' objects}
msg76936 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 22:35
The fileio_buffer.patch implements the same progressive buffer as Python
2.x' Object/fileobject.c.
msg76940 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-12-04 22:50
patch looks good to me.

nitpick comments: use += instead of = and + in:

newsize = newsize + newsize
 and
newsize = newsize + BIGCHUNK.

As for the XXX about overflow, so long as BUFSIZ is not defined to be an
insanely large number (it should never be) this will be fine.  add a
preprocessor test for that in.

#if (BUFSIZ >= 2**30)
#error "unreasonable BUFSIZ defined"
#endif
msg76944 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 23:37
The preprocessor doesn't handle power. 2 << 24 (64MB) sounds sufficient
for me.
msg76971 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-12-05 07:43
fileio_buffer2.patch looks good other than minor touchups:

Turn the XXX comment into:

/* NOTE: overflow impossible due to limits on BUFSIZ *

Also, 2 << 24 is 32MB yet your error message test says >= 64MB.  I think
you meant 1 << 26.

fix those and commit. :)
msg76981 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-05 08:33
The updated patch has been committed to 3.0 and 3.1. I'm going to
backport the patch to 2.x later.
msg78098 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-20 12:48
Since it is solved for 3.x and only needs to be bacported to 2.x (where
the "io" module isn't the default), downgrading to critical.
msg90024 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-07-02 16:06
This has been fixed as part of the big IO update in trunk. I assume
nobody really cares about making a separate patch for 2.6, please
re-open if you are interested!
History
Date User Action Args
2009-07-02 16:06:30pitrousetstatus: open -> closed
resolution: accepted -> fixed
messages: + msg90024
2008-12-20 12:48:44pitrousetpriority: release blocker -> critical
nosy: + pitrou
messages: + msg78098
2008-12-20 02:40:30loewissetpriority: deferred blocker -> release blocker
2008-12-10 08:22:55loewissetpriority: release blocker -> deferred blocker
2008-12-05 08:33:25christian.heimessetresolution: accepted
stage: patch review -> commit review
messages: + msg76981
versions: + Python 2.6, Python 2.7, - Python 3.0, Python 3.1
2008-12-05 07:43:31gregory.p.smithsetmessages: + msg76971
2008-12-04 23:37:50christian.heimessetfiles: + fileio_buffer2.patch
messages: + msg76944
2008-12-04 22:50:34gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg76940
2008-12-04 22:35:12christian.heimessetfiles: + fileio_buffer.patch
keywords: + patch
messages: + msg76936
stage: test needed -> patch review
2008-12-04 21:56:20christian.heimessetmessages: + msg76934
2008-12-04 20:21:30christian.heimessetversions: + Python 3.1
nosy: + christian.heimes
messages: + msg76920
priority: release blocker
components: + Extension Modules
stage: test needed
2008-12-04 18:30:19terry.reedycreate