Issue41265
Created on 2020-07-10 08:45 by malin, last changed 2020-08-05 10:06 by malin. This issue is now closed.
Messages (4) | |||
---|---|---|---|
msg373454 - (view) | Author: Ma Lin (malin) * | Date: 2020-07-10 08:45 | |
lzma/bz2 modules are using the same buffer growth algorithm: [1][2] newsize = size + (size >> 3) + 6; lzma/bz2 modules' default output buffer is 8192 bytes [3][4], so the growth step is below. For many cases, maybe the buffer is resized too many times. Is it possible to design a new growth algorithm that grows faster when the size is not very large. 1: 8,196 bytes 2: 9,226 bytes 3: 10,385 bytes 4: 11,689 bytes 5: 13,156 bytes 6: 14,806 bytes 7: 16,662 bytes 8: 18,750 bytes 9: 21,099 bytes 10: 23,742 bytes 11: 26,715 bytes 12: 30,060 bytes 13: 33,823 bytes 14: 38,056 bytes 15: 42,819 bytes 16: 48,177 bytes 17: 54,205 bytes 18: 60,986 bytes 19: 68,615 bytes 20: 77,197 bytes 21: 86,852 bytes 22: 97,714 bytes 23: 109,934 bytes 24: 123,681 bytes 25: 139,147 bytes 26: 156,546 bytes 27: 176,120 bytes 28: 198,141 bytes 29: 222,914 bytes 30: 250,784 bytes 31: 282,138 bytes 32: 317,411 bytes 33: 357,093 bytes 34: 401,735 bytes 35: 451,957 bytes 36: 508,457 bytes 37: 572,020 bytes 38: 643,528 bytes 39: 723,975 bytes 40: 814,477 bytes 41: 916,292 bytes 42: 1,030,834 bytes 43: 1,159,694 bytes 44: 1,304,661 bytes 45: 1,467,749 bytes 46: 1,651,223 bytes 47: 1,857,631 bytes 48: 2,089,840 bytes 49: 2,351,076 bytes 50: 2,644,966 bytes 51: 2,975,592 bytes 52: 3,347,547 bytes 53: 3,765,996 bytes 54: 4,236,751 bytes 55: 4,766,350 bytes 56: 5,362,149 bytes 57: 6,032,423 bytes 58: 6,786,481 bytes 59: 7,634,797 bytes 60: 8,589,152 bytes 61: 9,662,802 bytes 62: 10,870,658 bytes 63: 12,229,496 bytes 64: 13,758,189 bytes 65: 15,477,968 bytes 66: 17,412,720 bytes 67: 19,589,316 bytes 68: 22,037,986 bytes 69: 24,792,740 bytes 70: 27,891,838 bytes 71: 31,378,323 bytes 72: 35,300,619 bytes 73: 39,713,202 bytes 74: 44,677,358 bytes 75: 50,262,033 bytes 76: 56,544,793 bytes 77: 63,612,898 bytes 78: 71,564,516 bytes 79: 80,510,086 bytes 80: 90,573,852 bytes 81: 101,895,589 bytes 82: 114,632,543 bytes 83: 128,961,616 bytes 84: 145,081,824 bytes 85: 163,217,058 bytes 86: 183,619,196 bytes 87: 206,571,601 bytes 88: 232,393,057 bytes 89: 261,442,195 bytes 90: 294,122,475 bytes 91: 330,887,790 bytes 92: 372,248,769 bytes 93: 418,779,871 bytes 94: 471,127,360 bytes 95: 530,018,286 bytes 96: 596,270,577 bytes 97: 670,804,405 bytes 98: 754,654,961 bytes 99: 848,986,837 bytes 100: 955,110,197 bytes [1] lzma buffer growth algorithm: https://github.com/python/cpython/blob/v3.9.0b4/Modules/_lzmamodule.c#L133 [2] bz2 buffer growth algorithm: https://github.com/python/cpython/blob/v3.9.0b4/Modules/_bz2module.c#L121 [3] lzma default buffer size: https://github.com/python/cpython/blob/v3.9.0b4/Modules/_lzmamodule.c#L124 [4] bz2 default buffer size: https://github.com/python/cpython/blob/v3.9.0b4/Modules/_bz2module.c#L109 |
|||
msg373509 - (view) | Author: Ma Lin (malin) * | Date: 2020-07-11 03:09 | |
Maybe the zlib module can also use the same algorithm. zlib module's initial buffer size is 16KB [1], each time the size doubles [2]. [1] zlib module's initial buffer size: https://github.com/python/cpython/blob/v3.9.0b4/Modules/zlibmodule.c#L32 [2] zlib module buffer growth algorithm: https://github.com/python/cpython/blob/v3.9.0b4/Modules/zlibmodule.c#L174 |
|||
msg374116 - (view) | Author: Ma Lin (malin) * | Date: 2020-07-23 00:18 | |
I'm working on a patch. lzma decompressing speed increases: baseline: 0.275722 sec patched: 0.140405 sec (Uncompressed data size 52.57 MB) The new algorithm looks like this: #define INITIAL_BUFFER_SIZE (16*1024) static inline Py_ssize_t get_newsize(Py_ssize_t size) { const Py_ssize_t MB = 1024*1024; const Py_ssize_t GB = 1024*1024*1024; if (size <= 1*MB) { return size << 2; // x4 } else if (size <= 128*MB) { return size << 1; // x2 } else if (size <= 1*GB) { return size + (size >> 1); // x1.5 } else if (size <= 2*GB) { return size + (size >> 2); // x1.25 } else { return size + (size >> 3); // x1.125 } } |
|||
msg374868 - (view) | Author: Ma Lin (malin) * | Date: 2020-08-05 10:06 | |
A more thorough solution was used, see issue41486. So I close this issue. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2020-08-05 10:06:39 | malin | set | status: open -> closed messages: + msg374868 stage: resolved |
2020-07-23 00:18:26 | malin | set | messages: + msg374116 |
2020-07-11 03:09:00 | malin | set | nosy:
+ rhettinger, methane messages: + msg373509 |
2020-07-10 08:45:33 | malin | create |