This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Add _GenerateCRCTable() to zipfile.py to pre-compute CRC Table
Type: performance Stage: resolved
Components: Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: jkloth, serhiy.storchaka, shubhar
Priority: normal Keywords:

Created on 2017-05-25 20:25 by shubhar, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg294513 - (view) Author: Shubha Ramani (shubhar) * Date: 2017-05-25 20:25
It is wasteful to generate the CRC Table every time, via _crctable = list(map(_gen_crc, range(256))). Better to have a pre-computed table.

I will submit the patch which incorporates this feature along with micro-benchmark results. Related issue:

http://bugs.python.org/issue30467

http://bugs.python.org/issue30468
msg294562 - (view) Author: Shubha Ramani (shubhar) * Date: 2017-05-26 18:14
It looks like _GenerateCRCTable() is already implemented here:

https://svn.python.org/projects/python/trunk/Lib/zipfile.py

The question is, why isn't it implemented here ?

https://github.com/python/cpython/blob/master/Lib/zipfile.py
msg294567 - (view) Author: Jeremy Kloth (jkloth) * Date: 2017-05-26 19:36
See bpo-10030 (and pr #550).  It removed that function in favor of the current approach.

Also, that current code does not generate *every* time, but just once.  Note that the computed value is stored in a global cache.
msg294570 - (view) Author: Shubha Ramani (shubhar) * Date: 2017-05-26 20:48
This is not needed after all jkloth explained. Please feel free to close.
msg294578 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-27 05:02
After some investigations this idea no longer looks good to me.

The table of 256 32-bit integers will take at least 44 lines. This is too large. It is easier to check the algorithm than all these numbers. The error even in one digit can break all, but for testing we should check every element of the table. In any case we would need a generating code, either for generating the sources, or for testing.

Current generating code is 25% faster than the code used in 2.7, and (surprisingly!) Python 3.7 is 50% faster than Python 2.7 in the computation of the same code. On my computer the generating takes 1 ms (the startup time of the interpreter is 50 ms), and the table is generated only if ZIP file decryption is used, and only once.
History
Date User Action Args
2022-04-11 14:58:46adminsetgithub: 74661
2017-05-27 05:02:59serhiy.storchakasetresolution: not a bug -> rejected
messages: + msg294578
2017-05-27 00:58:01martin.pantersetstatus: open -> closed
resolution: not a bug
stage: resolved
2017-05-26 20:48:29shubharsetmessages: + msg294570
2017-05-26 19:36:25jklothsetnosy: + jkloth
messages: + msg294567
2017-05-26 18:14:06shubharsetmessages: + msg294562
2017-05-25 20:25:14shubharcreate