Message 347832 - Python tracker

Author	mpaolini
Recipients	christian.heimes, mpaolini, pablogsal
Date	2019-07-13.14:42:01
Message-id	<1563028922.54.0.821649437171.issue37587@roundup.psfhosted.org>

Content

I analysed the performance of json.loads in one of our production workloads.

py-spy tells me the majority of the time is spent in the C json module (see events.svg).

Digging deeper, Linux perf tells me the hottest loop (where 20%+ of the time is spent) is in _json.scanstring_unicode:

189:   movzx  eax,BYTE PTR [rbp+rbx*1+0x0]
       mov    DWORD PTR [rsp+0x44],eax
       cmp    eax,0x22
       je     22f
       cmp    eax,0x5c
       je     22f
       test   r13d,r13d
       je     180
       cmp    eax,0x1f

which corresponds to this code in Modules/_json.c:


        /* Find the end of the string or the next escape */
        Py_UCS4 c = 0;
        for (next = end; next < len; next++) {
            c = PyUnicode_READ(kind, buf, next);
            if (c == '"' || c == '\\') {
                break;
            }
            else if (strict && c <= 0x1f) {
                raise_errmsg("Invalid control character at", pystr, next);
                goto bail;
            }
        }

Two optimisations can be done:

1. Remove the mov entirely. The store is not needed inside the loop; the variable is only read later, after the loop has exited.
2. Swap the operands of the strict check (in the second if). strict defaults to 1, so that test almost always passes, while the likelihood of finding an invalid character is much lower; testing c <= 0x1f first lets the condition short-circuit sooner.
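
The two changes above could look roughly like the following standalone sketch. It is not the actual patch: scan_plain_run, its byte-buffer signature, and the bad_pos out-parameter are hypothetical stand-ins for the PyUnicode_READ loop, restricted to 1-byte characters, and the error is reported through the return value instead of raise_errmsg. Whether the compiler really drops the stack store depends on the build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scan loop in scanstring_unicode.
 * Returns the index of the first '"' or '\\' at or after `start`,
 * or `len` if neither occurs. If `strict` is nonzero and a control
 * character (<= 0x1f) is met first, returns -1 and stores its index
 * in *bad_pos. */
ptrdiff_t
scan_plain_run(const unsigned char *buf, size_t len, size_t start,
               int strict, size_t *bad_pos)
{
    size_t next;

    assert(buf != NULL && bad_pos != NULL);

    for (next = start; next < len; next++) {
        /* Optimisation 1: a loop-local copy. Nothing outside the loop
         * reads `d`, so it can live in a register for the whole scan. */
        unsigned char d = buf[next];

        if (d == '"' || d == '\\') {
            break;
        }
        /* Optimisation 2: test the rare condition first, so `strict`
         * is only consulted when a control character is present. */
        if (d <= 0x1f && strict) {
            *bad_pos = next;
            return -1;
        }
    }
    return (ptrdiff_t)next;
}
```

A caller that needs the terminating character after the loop can re-read buf[next] once, outside the hot path, when the returned index is smaller than len.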

Running the pyperformance json_loads benchmark I get:

+------------+----------------------+-----------------------------+
| Benchmark  | vanilla-pyperf-pgo38 | patched-pyperf-pgo38        |
+============+======================+=============================+
| json_loads | 54.9 us              | 53.9 us: 1.02x faster (-2%) |
+------------+----------------------+-----------------------------+


A microbenchmark on a 1 MB JSON string gives better results:

python -m pyperf timeit -s "import json; x = json.dumps({'k': '1' * 2 ** 20})" "json.loads(x)"

+-----------+------------+-----------------------------+
| Benchmark | vanilla-1m | patched-1m                  |
+===========+============+=============================+
| timeit    | 2.62 ms    | 2.39 ms: 1.10x faster (-9%) |
+-----------+------------+-----------------------------+
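
For readers checking the "faster" columns by hand, the figures in both tables are consistent with dividing the baseline time by the patched time, and with expressing the change relative to the baseline. The helper names below are hypothetical, and this is only a sketch of that arithmetic, not a description of how pyperf computes its output.

```c
#include <assert.h>

/* Baseline time divided by patched time: values above 1.0 mean the
 * patched build is faster. */
double
speedup_ratio(double baseline, double patched)
{
    assert(baseline > 0.0 && patched > 0.0);
    return baseline / patched;
}

/* Signed change of the patched time relative to the baseline, in
 * percent: negative values mean the patched build is faster. */
double
percent_change(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

With the timings reported here, speedup_ratio(2.62, 2.39) is a little under 1.10 and percent_change(2.62, 2.39) is a little above -9, matching the rounded "1.10x faster (-9%)" entry.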

History
Date	User	Action	Args
2019-07-13 14:42:02	mpaolini	set	recipients: + mpaolini, christian.heimes, pablogsal
2019-07-13 14:42:02	mpaolini	set	messageid: <1563028922.54.0.821649437171.issue37587@roundup.psfhosted.org>
2019-07-13 14:42:02	mpaolini	link	issue37587 messages
2019-07-13 14:42:01	mpaolini	create