classification
Title: zipfile._ZipDecryptor generates wasteful crc32 table on import
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: alanmcintyre, dholth, larry, loewis, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-07-20 14:37 by dholth, last changed 2014-01-07 17:37 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
zipfile-no-crc32.patch dholth, 2013-07-20 14:37 review
zdlazy.patch dholth, 2013-07-20 15:01 review
Messages (11)
msg193408 - (view) Author: Daniel Holth (dholth) Date: 2013-07-20 14:37
http://hg.python.org/cpython/file/e7305517260b/Lib/zipfile.py#l460

I noticed this table taking up time on import. I'd guess that it pre-dates zlib.crc32 which is imported at the top of the file. I also suspect that most of the time this table isn't even used at all.
msg193409 - (view) Author: Daniel Holth (dholth) Date: 2013-07-20 15:01
Someone who has a better understanding of zipfile may be able to get zlib.crc32(bytes[ch], running_crc) to work correctly. This patch that passes the zipfile tests generates the crctable only when _ZipDecrypter() is instantiated.
msg193411 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-07-20 16:13
How much time take it?
msg193416 - (view) Author: Daniel Holth (dholth) Date: 2013-07-20 17:36
It takes 706 microseconds.

On my machine 

%timeit import sys; del sys.modules['zipfile']; import zipfile "import
zipfile"

takes 2.51 ms without the patch and 1.7 ms with the patch.

On Sat, Jul 20, 2013, at 12:13 PM, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
> How much time take it?
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue18515>
> _______________________________________
msg193422 - (view) Author: Daniel Holth (dholth) Date: 2013-07-20 22:14
I tried it on a raspberry pi. zipfile takes 36 ms to import and 10 ms if it does not generate the crctable.
msg194473 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-05 12:48
The objection to zipfile-no-crc32.patch is that binascii.crc32() and _crc32() have different signatures. binascii.crc32() accepts a byte object while _crc32() accepts a single integer. With packing this value into a bytes object _crc32() will be much slower.

As for zdlazy.patch, there is a proposed patch in issue10030 which speedups pure Python decryption. I'm afraid that with zdlazy.patch this path will lose a part of it's speedup.
msg197824 - (view) Author: Daniel Holth (dholth) Date: 2013-09-15 19:55
I am withdrawing zipfile-no-crc32.patch. It did not work correctly. 

zdlazy.patch should go in. It avoids creating the rarely-needed crc32 table until the first time it is needed, saving some memory and the majority of time needed to import the module.

zdlazy.patch should not affect its C replacement at all, except that the rarely-needed CRC32 table that we are not generating becomes a never-needed table.
msg197836 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-15 21:15
I meant my pure Python patch in issue10030. Binding crctable to local variable is one of microoptimizations. Not the largest one however. So in general I not objects. Your patch LGTM. Only one nitpick -- instead "not _ZipDecrypter.crctable" use "_ZipDecrypter.crctable is None".
msg207238 - (view) Author: Daniel Holth (dholth) Date: 2014-01-03 19:42
Fixed in http://hg.python.org/cpython/rev/536a2cf5f1d2
msg207442 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-01-06 15:22
Since this isn't a bugfix, it was inappropriate to check this in after feature-freeze for 3.4.  However it looks harmless enough, so I'm not asking you to revert it at this time.

I guess it's easier to get forgiveness than permission, huh.
msg207445 - (view) Author: Daniel Holth (dholth) Date: 2014-01-06 15:55
Thanks. I guess I know who to ask now.

It was just painful seeing so much import-time computation, pretty much
guaranteed to happen every time Python starts up, being wasted on a
feature that is rarely used. On the Raspberry Pi the majority of the
import time is spent building this table and the speed difference is
noticeable.

On Mon, Jan 6, 2014, at 10:22 AM, Larry Hastings wrote:
> 
> Larry Hastings added the comment:
> 
> Since this isn't a bugfix, it was inappropriate to check this in after
> feature-freeze for 3.4.  However it looks harmless enough, so I'm not
> asking you to revert it at this time.
> 
> I guess it's easier to get forgiveness than permission, huh.
> 
> ----------
> nosy: +larry
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue18515>
> _______________________________________
History
Date User Action Args
2014-01-07 17:37:39serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2014-01-06 15:55:30dholthsetmessages: + msg207445
2014-01-06 15:22:43larrysetnosy: + larry
messages: + msg207442
2014-01-03 19:42:00dholthsetmessages: + msg207238
2013-09-15 21:15:06serhiy.storchakasetmessages: + msg197836
2013-09-15 19:55:47dholthsetmessages: + msg197824
2013-08-05 12:48:39serhiy.storchakasetmessages: + msg194473
2013-07-20 22:14:57dholthsetmessages: + msg193422
2013-07-20 17:36:06dholthsetmessages: + msg193416
2013-07-20 16:13:21serhiy.storchakasetmessages: + msg193411
2013-07-20 16:09:17serhiy.storchakasetnosy: + loewis, alanmcintyre, serhiy.storchaka

type: performance
components: + Library (Lib)
stage: patch review
2013-07-20 15:01:36dholthsetfiles: + zdlazy.patch

messages: + msg193409
2013-07-20 14:37:24dholthcreate