Classification
Title: json.dumps(ensure_ascii=False) is ~10x slower than json.dumps()
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.5
Process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, ezio.melotti, inada.naoki, pitrou, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-01-09 13:17 by inada.naoki, last changed 2015-01-11 15:44 by pitrou. This issue is now closed.

Files
File name                       Uploaded                       Description
json-fast-unicode-encode.patch  inada.naoki, 2015-01-09 13:17  Add non-ASCII version of speedup function.
test_encode_basestring.py       inada.naoki, 2015-01-09 13:34  Lib/test/test_json/test_encode_basestring.py
json-fast-unicode-encode.patch  inada.naoki, 2015-01-09 16:01
json-fast-unicode-encode.patch  inada.naoki, 2015-01-10 21:03
json-fast-unicode-encode.patch  inada.naoki, 2015-01-10 21:07
Messages (8)
msg233752 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2015-01-09 13:17
I prefer ensure_ascii=False because it's efficient.
But I noticed it is much slower.
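(For context: ensure_ascii=False skips the \uXXXX escaping of non-ASCII characters, so the output is considerably more compact for non-ASCII text. A minimal illustration, using an arbitrary non-ASCII string:)

```python
import json

s = 'こんにちは'  # any non-ASCII text works here

escaped = json.dumps(s)                  # ASCII-only: '"\u3053\u3093..."'
raw = json.dumps(s, ensure_ascii=False)  # keeps the characters as-is: '"こんにちは"'

# The escaped form is several times longer than the raw form.
print(len(escaped), len(raw))
```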

On Python 3.4.2:
In [3]: %timeit json.dumps([{'hello': 'world'}]*100)
10000 loops, best of 3: 74.8 µs per loop

In [4]: %timeit json.dumps([{'hello': 'world'}]*100, ensure_ascii=False)
1000 loops, best of 3: 259 µs per loop

On Python HEAD with attached patch:
In [2]: %timeit json.dumps([{'hello': 'world'}]*100)
10000 loops, best of 3: 80.8 µs per loop

In [3]: %timeit json.dumps([{'hello': 'world'}]*100, ensure_ascii=False)
10000 loops, best of 3: 80.4 µs per loop
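(The numbers above can be reproduced with a small timeit script; this is a sketch, and absolute timings will of course differ by machine and Python version:)

```python
import json
import timeit

# The payload from the report: 100 copies of a small ASCII-only dict.
data = [{'hello': 'world'}] * 100

t_ascii = timeit.timeit(lambda: json.dumps(data), number=1000)
t_raw = timeit.timeit(lambda: json.dumps(data, ensure_ascii=False),
                      number=1000)

print(f'ensure_ascii=True:  {t_ascii * 1000:.1f} ms / 1000 calls')
print(f'ensure_ascii=False: {t_raw * 1000:.1f} ms / 1000 calls')
```

Since the payload is pure ASCII, both calls produce byte-for-byte identical output, which makes the timing gap in the unpatched interpreter purely an implementation artifact.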
msg233753 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2015-01-09 13:34
I've copied test_encode_basestring_ascii.py and modified it for this patch.
msg233762 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2015-01-09 16:01
Patch updated.
The C version now escapes the same way as the Python version.
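(For reference, the pure-Python escaping that the C version mirrors looks roughly like the following, adapted from CPython's Lib/json/encoder.py — a sketch of the behavior, not the patch itself:)

```python
import re

# Characters that must be escaped even when ensure_ascii=False:
# backslash, double quote, and control characters.
ESCAPE = re.compile(r'[\x00-\x1f\\"\b\f\n\r\t]')
ESCAPE_DCT = {'\\': '\\\\', '"': '\\"', '\b': '\\b', '\f': '\\f',
              '\n': '\\n', '\r': '\\r', '\t': '\\t'}
for i in range(0x20):
    ESCAPE_DCT.setdefault(chr(i), '\\u{0:04x}'.format(i))

def py_encode_basestring(s):
    """Return a JSON representation of a Python string, leaving
    non-ASCII characters unescaped."""
    def replace(match):
        return ESCAPE_DCT[match.group(0)]
    return '"' + ESCAPE.sub(replace, s) + '"'
```

For example, `py_encode_basestring('naïve\n')` keeps the 'ï' intact while still escaping the newline.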
msg233787 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-01-09 23:40
Thank you for the patch! I posted a review.
msg233825 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2015-01-10 21:03
I've updated the patch to use PyUnicode_MAX_CHAR_VALUE().
msg233826 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2015-01-10 21:07
test_encode_basestring_ascii.py has duplicated test cases.
msg233858 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-01-11 15:34
I get the following compile error:

In file included from ./Include/Python.h:48:0,
                 from /home/antoine/cpython/default/Modules/_json.c:1:
/home/antoine/cpython/default/Modules/_json.c: In function ‘escape_unicode’:
/home/antoine/cpython/default/Modules/_json.c:301:24: error: ‘PyUnicode_4BYTE_DATA’ undeclared (first use in this function)
         assert(kind == PyUnicode_4BYTE_DATA);
                        ^

Fixing it is trivial, I can do it when committing.
msg233859 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-01-11 15:44
The patch was committed in b312b256931e. Thank you!
History
Date                 User              Action  Args
2015-01-11 15:44:17  pitrou            set     status: open -> closed; resolution: fixed; messages: + msg233859; stage: patch review -> resolved
2015-01-11 15:34:05  pitrou            set     messages: + msg233858
2015-01-11 02:05:52  Arfrever          set     nosy: + Arfrever
2015-01-10 21:07:31  inada.naoki       set     files: + json-fast-unicode-encode.patch; messages: + msg233826
2015-01-10 21:03:30  inada.naoki       set     files: + json-fast-unicode-encode.patch; messages: + msg233825
2015-01-09 23:40:18  pitrou            set     messages: + msg233787
2015-01-09 16:01:32  inada.naoki       set     files: + json-fast-unicode-encode.patch; messages: + msg233762
2015-01-09 15:04:18  serhiy.storchaka  set     nosy: + rhettinger, pitrou, ezio.melotti, serhiy.storchaka; type: performance; stage: patch review
2015-01-09 13:34:21  inada.naoki       set     files: + test_encode_basestring.py; messages: + msg233753
2015-01-09 13:17:25  inada.naoki       create