Author april
Date 2005-09-04.17:53:43
See bug 849046 for history.  This patch passes both the regression
test and the standard test.  Hopefully the extra information below
won't be too difficult to read.  I can attach this info to the bug,
if need be.

Fixed:
  - Add self.min_readsize to __init__.
    Follows the principle that lines in a file are likely to be of
    similar length, so readline() doesn't start over at a
    minimum-length read on every call.
  - Rewrite the assignment of readsize and size at the beginning of
    the function.  Eliminates almost all calls to min().
  - Change bufs to a string instead of a list.  There is no point in
    using a list when all you do with it is "".join(bufs).  Uses
    string concatenation instead.
  - Remove extra assignments to bufs (in return())
  - Change readline() to be much more readable (loop reordering, more
    comments).
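
A minimal sketch of the min_readsize idea from the list above, written
for Python 3 against an in-memory stream rather than the gzip module.
The class name LineReader, the starting size of 100, the doubling step
and the cap of 512 are assumptions for illustration and are not taken
from the patch itself.

```python
import io


class LineReader:
    """Hypothetical stand-in for a buffered reader; not the gzip module's code."""

    def __init__(self, raw, min_readsize=100):
        self.raw = raw                    # any object with read(n) -> bytes
        self.pending = b""                # bytes pushed back by _unread()
        self.min_readsize = min_readsize  # first chunk size, adapted per call

    def read(self, n):
        # Serve pushed-back bytes first, then top up from the raw stream.
        data, self.pending = self.pending[:n], self.pending[n:]
        if len(data) < n:
            data += self.raw.read(n - len(data))
        return data

    def _unread(self, data):
        # Push unused bytes back so the next read() sees them first.
        self.pending = data + self.pending

    def readline(self):
        readsize = self.min_readsize
        line = b""                        # plain concatenation, no list + join
        while True:
            chunk = self.read(readsize)
            i = chunk.find(b"\n")
            if i >= 0 or not chunk:       # found a newline, or hit EOF
                line += chunk[:i + 1]
                self._unread(chunk[i + 1:])
                break
            line += chunk
            readsize *= 2                 # line is longer than expected
        # Remember a larger chunk size so the next call starts from it
        # (the cap of 512 is an assumed value).
        if readsize > self.min_readsize:
            self.min_readsize = min(readsize, 512)
        return line


if __name__ == "__main__":
    reader = LineReader(io.BytesIO(b"a" * 300 + b"\nshort\nlast"))
    print(len(reader.readline()), reader.min_readsize)
    print(reader.readline(), reader.readline(), reader.readline())
```

After one long line, later calls start with a larger first read, so
files with consistently long lines need fewer read() calls per line.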

Recommendations:
  - Delete the _unread() function.  It is used _only_ by readline(),
    and moving its functionality into readline() itself saves the
    function call overhead.  _unread() is only 3 lines long.  Testing
    shows that removing it speeds readline() up by about 3%.  Are
    there backwards-compatibility concerns?
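
One way to measure the call overhead mentioned above is a small timeit
comparison.  This is only a sketch: the Pushback class and its
extrabuf and extrasize attributes are hypothetical stand-ins for a
reader with a pushback buffer, not the gzip module's code, and the
numbers it prints will vary by machine and Python version.

```python
import timeit


class Pushback:
    """Hypothetical miniature of a reader with a pushback buffer."""

    def __init__(self):
        self.extrabuf = ""
        self.extrasize = 0

    def _unread(self, buf):
        # Small helper: put unused data back at the front of the buffer.
        self.extrabuf = buf + self.extrabuf
        self.extrasize += len(buf)

    def take_line_with_helper(self, chunk, i):
        # Variant 1: push the remainder back through the helper.
        self._unread(chunk[i + 1:])
        return chunk[:i + 1]

    def take_line_inlined(self, chunk, i):
        # Variant 2: same work with the helper's body written in place.
        rest = chunk[i + 1:]
        self.extrabuf = rest + self.extrabuf
        self.extrasize += len(rest)
        return chunk[:i + 1]


def time_variant(method_name, number=100_000):
    reader = Pushback()
    method = getattr(reader, method_name)

    def run():
        reader.extrabuf = ""  # keep the buffer from growing between runs
        method("first line\nrest of chunk", 10)

    return timeit.timeit(run, number=number)


if __name__ == "__main__":
    for name in ("take_line_with_helper", "take_line_inlined"):
        print(name, time_variant(name))
```

Both variants leave the object in the same state; only the extra
method call differs.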

Testing results:
test_append (__main__.TestGzip) ... ok
test_many_append (__main__.TestGzip) ... ok
test_mode (__main__.TestGzip) ... ok
test_read (__main__.TestGzip) ... ok
test_readline (__main__.TestGzip) ... ok
test_readlines (__main__.TestGzip) ... ok
test_seek_read (__main__.TestGzip) ... ok
test_seek_write (__main__.TestGzip) ... ok
test_write (__main__.TestGzip) ... ok

----------------------------------------------------------------------
Ran 9 tests in 0.331s

Regression tests:
python regrtest.py -g test_gzip.py
test_gzip
1 test OK.

---

Profiling Results (performed on a common compressed log
file - 200748 lines).
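
The test_gzip_speed.py script is not included in this message.  A
harness of roughly this shape could produce comparable profiles; it is
a Python 3 sketch that assumes cProfile and subprocess in place of the
original profile module and popen call, and it generates its own small
sample file (make_sample_log and profile_call are hypothetical helpers).

```python
import cProfile
import gzip
import os
import pstats
import shutil
import subprocess
import tempfile


def make_sample_log(path, lines=2000):
    """Write a small gzip-compressed stand-in for the log file."""
    with gzip.open(path, "wb") as f:
        for n in range(lines):
            f.write(b"host - - [04/Sep/2005] GET /page/%d 200\n" % n)


def gunzip_gzip_open(path):
    """Count lines by calling readline() on a gzip file until EOF."""
    count = 0
    with gzip.open(path, "rb") as f:
        while f.readline():
            count += 1
    return count


def gunzip_popen(path):
    """Count lines read from an external `gunzip -c` process."""
    count = 0
    with subprocess.Popen(["gunzip", "-c", path], stdout=subprocess.PIPE) as proc:
        for _ in proc.stdout:
            count += 1
    return count


def profile_call(func, path):
    """Run func(path) under the profiler and print stats by standard name."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, path)
    pstats.Stats(profiler).sort_stats("stdname").print_stats()
    return result


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    sample = os.path.join(workdir, "sample.log.gz")
    try:
        make_sample_log(sample)
        profile_call(gunzip_gzip_open, sample)
        if shutil.which("gunzip"):  # skip where gunzip is unavailable
            profile_call(gunzip_popen, sample)
    finally:
        shutil.rmtree(workdir)
```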

With patch...

         1213961 function calls in 12.188 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(close)
     1159    0.020    0.000    0.020    0.000 :0(crc32)
     1158    0.100    0.000    0.100    0.000 :0(decompress)
        1    0.000    0.000    0.000    0.000 :0(decompressobj)
   200774    0.812    0.000    0.812    0.000 :0(find)
   403865    0.902    0.000    0.902    0.000 :0(len)
     1183    0.000    0.000    0.000    0.000 :0(min)
        2    0.000    0.000    0.000    0.000 :0(ord)
     1173    0.000    0.000    0.000    0.000 :0(read)
       12    0.000    0.000    0.000    0.000 :0(seek)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
       18    0.000    0.000    0.000    0.000 :0(tell)
        2    0.000    0.000    0.000    0.000 :0(unpack)
        1    0.000    0.000   12.188   12.188 <string>:1(?)
        1    0.000    0.000    0.000    0.000 gzip_new.py:156(_init_read)
        1    0.000    0.000    0.000    0.000 gzip_new.py:160(_read_gzip_header)
        3    0.000    0.000    0.000    0.000 gzip_new.py:18(U32)
   200774    2.453    0.000    2.593    0.000 gzip_new.py:207(read)
   200749    2.894    0.000    3.796    0.000 gzip_new.py:239(_unread)
     1166    0.010    0.000    0.140    0.000 gzip_new.py:244(_read)
        1    0.000    0.000    0.000    0.000 gzip_new.py:27(LOWU32)
     1158    0.010    0.000    0.030    0.000 gzip_new.py:294(_add_read_data)
        1    0.000    0.000    0.000    0.000 gzip_new.py:300(_read_eof)
        1    0.000    0.000    0.000    0.000 gzip_new.py:314(close)
        1    0.000    0.000    0.000    0.000 gzip_new.py:327(__del__)
   200749    3.916    0.000   11.117    0.000 gzip_new.py:384(readline)
        2    0.000    0.000    0.000    0.000 gzip_new.py:39(read32)
        1    0.000    0.000    0.000    0.000 gzip_new.py:42(open)
        1    0.000    0.000    0.000    0.000 gzip_new.py:60(__init__)
        1    0.000    0.000   12.188   12.188 profile:0(gunzip_gzip_new_open())
        0    0.000             0.000          profile:0(profiler)
        1    1.071    1.071   12.188   12.188 test_gzip_speed.py:14(gunzip_gzip_new_open)

Without patch...

         2073328 function calls in 18.597 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   243820    0.735    0.000    0.735    0.000 :0(append)
        1    0.000    0.000    0.000    0.000 :0(close)
     1159    0.040    0.000    0.040    0.000 :0(crc32)
     1158    0.100    0.000    0.100    0.000 :0(decompress)
        1    0.000    0.000    0.000    0.000 :0(decompressobj)
   243820    0.960    0.000    0.960    0.000 :0(find)
   200749    0.801    0.000    0.801    0.000 :0(join)
   489958    1.330    0.000    1.330    0.000 :0(len)
   243820    0.791    0.000    0.791    0.000 :0(min)
        2    0.000    0.000    0.000    0.000 :0(ord)
     1173    0.030    0.000    0.030    0.000 :0(read)
        6    0.000    0.000    0.000    0.000 :0(seek)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        6    0.000    0.000    0.000    0.000 :0(tell)
        2    0.000    0.000    0.000    0.000 :0(unpack)
        1    0.000    0.000   18.597   18.597 <string>:1(?)
        1    0.000    0.000    0.000    0.000 gzip.py:154(_init_read)
        1    0.000    0.000    0.000    0.000 gzip.py:158(_read_gzip_header)
        3    0.000    0.000    0.000    0.000 gzip.py:18(U32)
   243820    2.711    0.000    2.921    0.000 gzip.py:205(read)
   200749    3.083    0.000    4.143    0.000 gzip.py:237(_unread)
     1160    0.010    0.000    0.210    0.000 gzip.py:242(_read)
        1    0.000    0.000    0.000    0.000 gzip.py:27(LOWU32)
     1158    0.030    0.000    0.070    0.000 gzip.py:292(_add_read_data)
        1    0.000    0.000    0.000    0.000 gzip.py:298(_read_eof)
        1    0.000    0.000    0.000    0.000 gzip.py:312(close)
        1    0.000    0.000    0.000    0.000 gzip.py:325(__del__)
   200749    6.934    0.000   17.555    0.000 gzip.py:379(readline)
        2    0.000    0.000    0.000    0.000 gzip.py:39(read32)
        1    0.000    0.000    0.000    0.000 gzip.py:42(open)
        1    0.000    0.000    0.000    0.000 gzip.py:59(__init__)
        1    0.000    0.000   18.597   18.597 profile:0(gunzip_gzip_open())
        0    0.000             0.000          profile:0(profiler)
        1    1.042    1.042   18.597   18.597 test_gzip_speed.py:7(gunzip_gzip_open)

Using popen + gunzip -c...

         200754 function calls in 4.338 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(popen)
   200749    3.578    0.000    3.578    0.000 :0(readline)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.240    0.240    4.338    4.338 <string>:1(?)
        1    0.000    0.000    4.338    4.338 profile:0(gunzip_popen())
        0    0.000             0.000          profile:0(profiler)
        1    0.520    0.520    4.098    4.098 test_gzip_speed.py:21(gunzip_popen)
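
For a quick comparison, the totals from the three profile summaries
can be turned into ratios.  The snippet below only restates the
figures already listed in this message; the variable names are
illustrative, and profiler overhead means the ratios are approximate.

```python
# Totals copied from the three profile summaries in this message.
runs = {
    "without patch": {"calls": 2073328, "seconds": 18.597},
    "with patch": {"calls": 1213961, "seconds": 12.188},
    "popen + gunzip -c": {"calls": 200754, "seconds": 4.338},
}

baseline = runs["without patch"]
patched = runs["with patch"]
popen = runs["popen + gunzip -c"]

# Fraction of CPU time and of function calls removed by the patch.
time_saved = 1 - patched["seconds"] / baseline["seconds"]
calls_saved = 1 - patched["calls"] / baseline["calls"]

# How many times slower the patched module still is than popen + gunzip -c.
gap_to_popen = patched["seconds"] / popen["seconds"]

print(f"CPU time reduced by {time_saved:.1%}")
print(f"Function calls reduced by {calls_saved:.1%}")
print(f"Patched gzip vs popen: {gap_to_popen:.2f}x")
```

By these totals the patch removes roughly a third of the profiled CPU
time, while piping through gunzip -c remains noticeably faster.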

History
Date                 User   Action  Args
2007-08-23 15:43:49  admin  link    issue1281707 messages
2007-08-23 15:43:49  admin  create