Fixed:
- Add self.min_readsize to __init__. This follows the principle that lines in a file are likely to be similar in length, so readline() no longer starts over from a minimum-length read on every call.
- Rewrite the assignment of readsize and size at the beginning of the function. This eliminates almost all calls to min().
- Change bufs to a string instead of a list. There is no point in using a list when all that is done with it is "".join(bufs); string addition is used instead.
- Remove extra assignments to bufs (in return).
- Change readline() to be much more readable (loop reordering, more comments).

Recommendations:
- Delete the _unread() function. It is used _only_ by readline(), and moving its functionality into readline() itself saves the function call overhead. _unread() is only 3 lines long. Testing shows that removing it speeds readline() up by about 3%. Are there backwards compatibility concerns?

Testing results:

test_append (__main__.TestGzip) ... ok
test_many_append (__main__.TestGzip) ... ok
test_mode (__main__.TestGzip) ... ok
test_read (__main__.TestGzip) ... ok
test_readline (__main__.TestGzip) ... ok
test_readlines (__main__.TestGzip) ... ok
test_seek_read (__main__.TestGzip) ... ok
test_seek_write (__main__.TestGzip) ... ok
test_write (__main__.TestGzip) ... ok

----------------------------------------------------------------------
Ran 9 tests in 0.331s

Regression tests:

python regrtest.py -g test_gzip.py
test_gzip
1 test OK.

Profiling Results (performed on a common compressed log file - 200748 lines).

With patch...

         1213961 function calls in 12.188 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(close)
     1159    0.020    0.000    0.020    0.000 :0(crc32)
     1158    0.100    0.000    0.100    0.000 :0(decompress)
        1    0.000    0.000    0.000    0.000 :0(decompressobj)
   200774    0.812    0.000    0.812    0.000 :0(find)
   403865    0.902    0.000    0.902    0.000 :0(len)
     1183    0.000    0.000    0.000    0.000 :0(min)
        2    0.000    0.000    0.000    0.000 :0(ord)
     1173    0.000    0.000    0.000    0.000 :0(read)
       12    0.000    0.000    0.000    0.000 :0(seek)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
       18    0.000    0.000    0.000    0.000 :0(tell)
        2    0.000    0.000    0.000    0.000 :0(unpack)
        1    0.000    0.000   12.188   12.188 :1(?)
        1    0.000    0.000    0.000    0.000 gzip_new.py:156(_init_read)
        1    0.000    0.000    0.000    0.000 gzip_new.py:160(_read_gzip_header)
        3    0.000    0.000    0.000    0.000 gzip_new.py:18(U32)
   200774    2.453    0.000    2.593    0.000 gzip_new.py:207(read)
   200749    2.894    0.000    3.796    0.000 gzip_new.py:239(_unread)
     1166    0.010    0.000    0.140    0.000 gzip_new.py:244(_read)
        1    0.000    0.000    0.000    0.000 gzip_new.py:27(LOWU32)
     1158    0.010    0.000    0.030    0.000 gzip_new.py:294(_add_read_data)
        1    0.000    0.000    0.000    0.000 gzip_new.py:300(_read_eof)
        1    0.000    0.000    0.000    0.000 gzip_new.py:314(close)
        1    0.000    0.000    0.000    0.000 gzip_new.py:327(__del__)
   200749    3.916    0.000   11.117    0.000 gzip_new.py:384(readline)
        2    0.000    0.000    0.000    0.000 gzip_new.py:39(read32)
        1    0.000    0.000    0.000    0.000 gzip_new.py:42(open)
        1    0.000    0.000    0.000    0.000 gzip_new.py:60(__init__)
        1    0.000    0.000   12.188   12.188 profile:0(gunzip_gzip_new_open())
        0    0.000             0.000          profile:0(profiler)
        1    1.071    1.071   12.188   12.188 test_gzip_speed.py:14(gunzip_gzip_new_open)

Without patch...

         2073328 function calls in 18.597 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   243820    0.735    0.000    0.735    0.000 :0(append)
        1    0.000    0.000    0.000    0.000 :0(close)
     1159    0.040    0.000    0.040    0.000 :0(crc32)
     1158    0.100    0.000    0.100    0.000 :0(decompress)
        1    0.000    0.000    0.000    0.000 :0(decompressobj)
   243820    0.960    0.000    0.960    0.000 :0(find)
   200749    0.801    0.000    0.801    0.000 :0(join)
   489958    1.330    0.000    1.330    0.000 :0(len)
   243820    0.791    0.000    0.791    0.000 :0(min)
        2    0.000    0.000    0.000    0.000 :0(ord)
     1173    0.030    0.000    0.030    0.000 :0(read)
        6    0.000    0.000    0.000    0.000 :0(seek)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        6    0.000    0.000    0.000    0.000 :0(tell)
        2    0.000    0.000    0.000    0.000 :0(unpack)
        1    0.000    0.000   18.597   18.597 :1(?)
        1    0.000    0.000    0.000    0.000 gzip.py:154(_init_read)
        1    0.000    0.000    0.000    0.000 gzip.py:158(_read_gzip_header)
        3    0.000    0.000    0.000    0.000 gzip.py:18(U32)
   243820    2.711    0.000    2.921    0.000 gzip.py:205(read)
   200749    3.083    0.000    4.143    0.000 gzip.py:237(_unread)
     1160    0.010    0.000    0.210    0.000 gzip.py:242(_read)
        1    0.000    0.000    0.000    0.000 gzip.py:27(LOWU32)
     1158    0.030    0.000    0.070    0.000 gzip.py:292(_add_read_data)
        1    0.000    0.000    0.000    0.000 gzip.py:298(_read_eof)
        1    0.000    0.000    0.000    0.000 gzip.py:312(close)
        1    0.000    0.000    0.000    0.000 gzip.py:325(__del__)
   200749    6.934    0.000   17.555    0.000 gzip.py:379(readline)
        2    0.000    0.000    0.000    0.000 gzip.py:39(read32)
        1    0.000    0.000    0.000    0.000 gzip.py:42(open)
        1    0.000    0.000    0.000    0.000 gzip.py:59(__init__)
        1    0.000    0.000   18.597   18.597 profile:0(gunzip_gzip_open())
        0    0.000             0.000          profile:0(profiler)
        1    1.042    1.042   18.597   18.597 test_gzip_speed.py:7(gunzip_gzip_open)

Using popen + gunzip -c...

         200754 function calls in 4.338 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(popen)
   200749    3.578    0.000    3.578    0.000 :0(readline)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.240    0.240    4.338    4.338 :1(?)
        1    0.000    0.000    4.338    4.338 profile:0(gunzip_popen())
        0    0.000             0.000          profile:0(profiler)
        1    0.520    0.520    4.098    4.098 test_gzip_speed.py:21(gunzip_popen)
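
For a quick comparison of the three runs, the totals above can be reduced to a few ratios. The following is only a sketch: the numbers are copied from the profiler output in this report, while the dictionary layout and the helper names speedup() and percent_saved() are made up for illustration and are not part of gzip.py or the patch.

```python
# Totals copied from the three profiler runs in this report.
# "readline" is the cumulative time of readline() in each run
# (for the popen run it is the built-in file readline).
RUNS = {
    "without patch": {"calls": 2073328, "cpu": 18.597, "readline": 17.555},
    "with patch": {"calls": 1213961, "cpu": 12.188, "readline": 11.117},
    "popen + gunzip -c": {"calls": 200754, "cpu": 4.338, "readline": 3.578},
}


def speedup(before, after):
    """Return how many times smaller 'after' is than 'before'."""
    return before / after


def percent_saved(before, after):
    """Return the reduction from 'before' to 'after' as a percentage."""
    return 100.0 * (before - after) / before


old = RUNS["without patch"]
new = RUNS["with patch"]
pipe = RUNS["popen + gunzip -c"]

# Patched gzip module versus the unpatched one.
for key, label in (("cpu", "total CPU time"),
                   ("readline", "readline() cumulative time"),
                   ("calls", "function calls")):
    print("%-27s %.2fx lower, %.1f%% saved"
          % (label, speedup(old[key], new[key]),
             percent_saved(old[key], new[key])))

# Remaining gap between the patched module and the popen approach.
print("with patch vs popen:        %.2fx the CPU time"
      % speedup(new["cpu"], pipe["cpu"]))
```

These ratios should be read as rough indicators only. The profiler adds overhead to every profiled function call, so the two gzip runs, with over a million calls each, are probably penalised more than the popen run.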