Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile._ZipDecryptor generates wasteful crc32 table on import #62715

Closed
dholth mannequin opened this issue Jul 20, 2013 · 11 comments
Closed

zipfile._ZipDecryptor generates wasteful crc32 table on import #62715

dholth mannequin opened this issue Jul 20, 2013 · 11 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@dholth
Copy link
Mannequin

dholth mannequin commented Jul 20, 2013

BPO 18515
Nosy @loewis, @larryhastings, @dholth, @serhiy-storchaka
Files
  • zipfile-no-crc32.patch
  • zdlazy.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-01-07.17:37:39.886>
    created_at = <Date 2013-07-20.14:37:24.463>
    labels = ['library', 'performance']
    title = 'zipfile._ZipDecryptor generates wasteful crc32 table on import'
    updated_at = <Date 2014-01-07.17:37:39.886>
    user = 'https://github.com/dholth'

    bugs.python.org fields:

    activity = <Date 2014-01-07.17:37:39.886>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-01-07.17:37:39.886>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-07-20.14:37:24.463>
    creator = 'dholth'
    dependencies = []
    files = ['30990', '30991']
    hgrepos = []
    issue_num = 18515
    keywords = ['patch']
    message_count = 11.0
    messages = ['193408', '193409', '193411', '193416', '193422', '194473', '197824', '197836', '207238', '207442', '207445']
    nosy_count = 5.0
    nosy_names = ['loewis', 'alanmcintyre', 'larry', 'dholth', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue18515'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Jul 20, 2013

    http://hg.python.org/cpython/file/e7305517260b/Lib/zipfile.py#l460

    I noticed this table taking up time on import. I'd guess that it pre-dates zlib.crc32 which is imported at the top of the file. I also suspect that most of the time this table isn't even used at all.

    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Jul 20, 2013

    Someone who has a better understanding of zipfile may be able to get zlib.crc32(bytes[ch], running_crc) to work correctly. This patch that passes the zipfile tests generates the crctable only when _ZipDecrypter() is instantiated.

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir performance Performance or resource usage labels Jul 20, 2013
    @serhiy-storchaka
    Copy link
    Member

    How much time take it?

    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Jul 20, 2013

    It takes 706 microseconds.

    On my machine

    %timeit import sys; del sys.modules['zipfile']; import zipfile "import
    zipfile"

    takes 2.51 ms without the patch and 1.7 ms with the patch.

    On Sat, Jul 20, 2013, at 12:13 PM, Serhiy Storchaka wrote:

    Serhiy Storchaka added the comment:

    How much time take it?

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue18515\>


    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Jul 20, 2013

    I tried it on a raspberry pi. zipfile takes 36 ms to import and 10 ms if it does not generate the crctable.

    @serhiy-storchaka
    Copy link
    Member

    The objection to zipfile-no-crc32.patch is that binascii.crc32() and _crc32() have different signatures. binascii.crc32() accepts a byte object while _crc32() accepts a single integer. With packing this value into a bytes object _crc32() will be much slower.

    As for zdlazy.patch, there is a proposed patch in bpo-10030 which speedups pure Python decryption. I'm afraid that with zdlazy.patch this path will lose a part of it's speedup.

    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Sep 15, 2013

    I am withdrawing zipfile-no-crc32.patch. It did not work correctly.

    zdlazy.patch should go in. It avoids creating the rarely-needed crc32 table until the first time it is needed, saving some memory and the majority of time needed to import the module.

    zdlazy.patch should not affect its C replacement at all, except that the rarely-needed CRC32 table that we are not generating becomes a never-needed table.

    @serhiy-storchaka
    Copy link
    Member

    I meant my pure Python patch in bpo-10030. Binding crctable to local variable is one of microoptimizations. Not the largest one however. So in general I not objects. Your patch LGTM. Only one nitpick -- instead "not _ZipDecrypter.crctable" use "_ZipDecrypter.crctable is None".

    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Jan 3, 2014

    @larryhastings
    Copy link
    Contributor

    Since this isn't a bugfix, it was inappropriate to check this in after feature-freeze for 3.4. However it looks harmless enough, so I'm not asking you to revert it at this time.

    I guess it's easier to get forgiveness than permission, huh.

    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Jan 6, 2014

    Thanks. I guess I know who to ask now.

    It was just painful seeing so much import-time computation, pretty much
    guaranteed to happen every time Python starts up, being wasted on a
    feature that is rarely used. On the Raspberry Pi the majority of the
    import time is spent building this table and the speed difference is
    noticeable.

    On Mon, Jan 6, 2014, at 10:22 AM, Larry Hastings wrote:

    Larry Hastings added the comment:

    Since this isn't a bugfix, it was inappropriate to check this in after
    feature-freeze for 3.4. However it looks harmless enough, so I'm not
    asking you to revert it at this time.

    I guess it's easier to get forgiveness than permission, huh.

    ----------
    nosy: +larry


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue18515\>


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants