json.dumps(ensure_ascii=False) is ~10x slower than json.dumps() #67395

Closed
methane opened this issue Jan 9, 2015 · 8 comments
Labels
performance (Performance or resource usage), stdlib (Python modules in the Lib dir)

Comments

@methane
Member

methane commented Jan 9, 2015

BPO 23206
Nosy @rhettinger, @pitrou, @ezio-melotti, @methane, @serhiy-storchaka
Files
  • json-fast-unicode-encode.patch: Add non ascii version of speedup function.
  • test_encode_basestring.py: Lib/test/test_json/test_encode_basestring.py
  • json-fast-unicode-encode.patch
  • json-fast-unicode-encode.patch
  • json-fast-unicode-encode.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-01-11.15:44:17.799>
    created_at = <Date 2015-01-09.13:17:25.193>
    labels = ['library', 'performance']
    title = 'json.dumps(ensure_ascii=False) is ~10x slower than json.dumps()'
    updated_at = <Date 2015-01-11.15:44:17.798>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2015-01-11.15:44:17.798>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-01-11.15:44:17.799>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2015-01-09.13:17:25.193>
    creator = 'methane'
    dependencies = []
    files = ['37653', '37654', '37656', '37669', '37670']
    hgrepos = []
    issue_num = 23206
    keywords = ['patch']
    message_count = 8.0
    messages = ['233752', '233753', '233762', '233787', '233825', '233826', '233858', '233859']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'pitrou', 'ezio.melotti', 'Arfrever', 'methane', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue23206'
    versions = ['Python 3.5']

    @methane
    Member Author

    methane commented Jan 9, 2015

    I prefer ensure_ascii=False because it's efficient,
    but I noticed that it is much slower.

    On Python 3.4.2:
    In [3]: %timeit json.dumps([{'hello': 'world'}]*100)
    10000 loops, best of 3: 74.8 µs per loop

    In [4]: %timeit json.dumps([{'hello': 'world'}]*100, ensure_ascii=False)
    1000 loops, best of 3: 259 µs per loop

    On Python HEAD with attached patch:
    In [2]: %timeit json.dumps([{'hello': 'world'}]*100)
    10000 loops, best of 3: 80.8 µs per loop

    In [3]: %timeit json.dumps([{'hello': 'world'}]*100, ensure_ascii=False)
    10000 loops, best of 3: 80.4 µs per loop
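
To reproduce this comparison outside IPython, a minimal sketch with the standard timeit module could look like the one below. The helper name `bench` and the loop counts are assumptions chosen for illustration and are not part of the report; absolute numbers depend on the machine and the Python version.

```python
import json
import timeit

# Same payload as in the report: 100 small ASCII-only dicts.
data = [{'hello': 'world'}] * 100

def bench(**kwargs):
    """Return the best time per json.dumps() call, in microseconds."""
    # Three runs of 1000 calls each; keep the fastest run, as %timeit does.
    runs = timeit.repeat(lambda: json.dumps(data, **kwargs),
                         number=1000, repeat=3)
    return min(runs) / 1000 * 1e6

default_us = bench()
no_ascii_us = bench(ensure_ascii=False)

print(f"ensure_ascii=True : {default_us:.1f} us per call")
print(f"ensure_ascii=False: {no_ascii_us:.1f} us per call")
```

Because the payload is pure ASCII, both calls produce the same string, so any timing gap comes from the encoding path alone.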

    @methane added the stdlib (Python modules in the Lib dir) label Jan 9, 2015
    @methane
    Member Author

    methane commented Jan 9, 2015

    I've copied test_encode_basestring_ascii.py and modified it for this patch.

    @serhiy-storchaka added the performance (Performance or resource usage) label Jan 9, 2015
    @methane
    Member Author

    methane commented Jan 9, 2015

    Patch updated.
    The C version now escapes strings the same way as the Python version.
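
One way to check that both versions escape identically is to compare them on a few inputs. The sketch below assumes Python 3.5 or later, where `json.encoder` exposes `py_encode_basestring` and, when the `_json` extension is available, `c_encode_basestring`; the sample strings are made up for illustration.

```python
import json
import json.encoder as encoder

# Illustrative inputs: plain ASCII, non-ASCII, quote, tab, backslash, control char.
samples = ['hello', 'caf\u00e9', 'say "hi"', 'tab\there', 'back\\slash', '\x01']

# The C speedup is None (or missing) when the _json extension is unavailable.
c_impl = getattr(encoder, 'c_encode_basestring', None)

results = {}
for s in samples:
    expected = encoder.py_encode_basestring(s)
    if c_impl is not None:
        # The C and pure-Python encoders should agree character for character.
        assert c_impl(s) == expected
    # json.dumps() should match whichever implementation it selected.
    assert json.dumps(s, ensure_ascii=False) == expected
    results[s] = expected
    print(repr(s), '->', expected)
```

Non-ASCII characters pass through unchanged, while quotes, backslashes and control characters are escaped.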

    @pitrou
    Member

    pitrou commented Jan 9, 2015

    Thank you for the patch! I posted a review.

    @methane
    Member Author

    methane commented Jan 10, 2015

    I've updated the patch to use PyUnicode_MAX_CHAR_VALUE().

    @methane
    Member Author

    methane commented Jan 10, 2015

    test_encode_basestring_ascii.py has duplicated test cases.

    @pitrou
    Member

    pitrou commented Jan 11, 2015

    I get the following compile error:

    In file included from ./Include/Python.h:48:0,
    from /home/antoine/cpython/default/Modules/_json.c:1:
    /home/antoine/cpython/default/Modules/_json.c: In function ‘escape_unicode’:
    /home/antoine/cpython/default/Modules/_json.c:301:24: error: ‘PyUnicode_4BYTE_DATA’ undeclared (first use in this function)
    assert(kind == PyUnicode_4BYTE_DATA);
    ^

    Fixing it is trivial, I can do it when committing.

    @pitrou
    Member

    pitrou commented Jan 11, 2015

    The patch was committed in b312b256931e. Thank you!

    @pitrou closed this as completed Jan 11, 2015
    @ezio-melotti transferred this issue from another repository Apr 10, 2022