Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadratic complexity in b32encode #62012

Closed
serhiy-storchaka opened this issue Apr 21, 2013 · 10 comments
Closed

Quadratic complexity in b32encode #62012

serhiy-storchaka opened this issue Apr 21, 2013 · 10 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@serhiy-storchaka
Copy link
Member

BPO 17812
Nosy @pitrou, @serhiy-storchaka
Files
  • base32_fix_2.patch: Fix quadratic complexity
  • base32_optimize_2.patch: Optimize b32encode and b32decode
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-05-19.08:52:26.224>
    created_at = <Date 2013-04-21.21:34:19.423>
    labels = ['library', 'performance']
    title = 'Quadratic complexity in b32encode'
    updated_at = <Date 2013-05-19.08:52:26.223>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2013-05-19.08:52:26.223>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-05-19.08:52:26.224>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-04-21.21:34:19.423>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['30308', '30309']
    hgrepos = []
    issue_num = 17812
    keywords = ['patch']
    message_count = 10.0
    messages = ['187527', '187529', '189538', '189539', '189544', '189551', '189554', '189555', '189573', '189574']
    nosy_count = 3.0
    nosy_names = ['pitrou', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue17812'
    versions = ['Python 3.3', 'Python 3.4']

    @serhiy-storchaka
    Copy link
    Member Author

    b32encode accumulates encoded data in a bytes object and this operation has quadratic complexity.

    Here is a patch, which fixes this issue by accumulating in a list.

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir performance Performance or resource usage labels Apr 21, 2013
    @serhiy-storchaka
    Copy link
    Member Author

    And here are other patch, which not only fixes an issue with quadratic complexity, but optimize b32encode and b32decode about 2.5 times.

    Microbenchmarks:

    ./python -m timeit -r 1 -n 10 -s "from base64 import b32encode as encode; data = open('python', 'rb').read(1000001)" "encode(data)"
    ./python -m timeit -r 1 -n 1 -s "from base64 import b32encode as encode, b32decode as decode; data = encode(open('python', 'rb').read(1000001))" "decode(data)"

    Results:
    First patch Second patch
    b32encode 1.25 sec 486 msec
    b32decode 2.08 sec 835 msec

    @serhiy-storchaka
    Copy link
    Member Author

    Note that without patches both tests run many minutes.

    base32_fix.patch causes little (5%) performance degradation for small data (< KB), this is perhaps the cause for the use of string concatenation. However base32_optimize.patch speeds up about 3x for such data.

    1 similar comment
    @serhiy-storchaka
    Copy link
    Member Author

    Note that without patches both tests run many minutes.

    base32_fix.patch causes little (5%) performance degradation for small data (< KB), this is perhaps the cause for the use of string concatenation. However base32_optimize.patch speeds up about 3x for such data.

    @pitrou
    Copy link
    Member

    pitrou commented May 18, 2013

    Using a bytearray, rather than accumulating into a list, may reduce memory consumption.

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you Antoine. Here are updated patches.

    @serhiy-storchaka
    Copy link
    Member Author

    I think 3.3 needs a fix. The quadratic complexity is a bug.

    @pitrou
    Copy link
    Member

    pitrou commented May 18, 2013

    Then I would recommend committing the simple patch in 3.3 and the more optimized one in 3.4. Both look fine to me, althgugh I haven't really examined the padding stuff.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 19, 2013

    New changeset 4b5467d997f1 by Serhiy Storchaka in branch '3.3':
    Issue bpo-17812: Fixed quadratic complexity of base64.b32encode().
    http://hg.python.org/cpython/rev/4b5467d997f1

    New changeset 1b5ef05d6ced by Serhiy Storchaka in branch 'default':
    Issue bpo-17812: Fixed quadratic complexity of base64.b32encode().
    http://hg.python.org/cpython/rev/1b5ef05d6ced

    @serhiy-storchaka
    Copy link
    Member Author

    That is why I proposed two patches.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 19, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants