msg346115 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-20 11:40 |
_PyUnicode_FromASCII(s, len) is faster than PyUnicode_FromString(s) because PyUnicode_FromString() uses temporary _PyUnicodeWriter to support UTF-8.
But _PyUnicode_FromASCII("hello", strlen("hello"))` is not as easy as `PyUnicode_FromString("hello")`.
_PyUnicode_FROM_ASCII() is simple macro which wraps _PyUnicode_FromASCII which calls strlen() automatically:
```
/* Convenient wrapper for _PyUnicode_FromASCII */
#define _PyUnicode_FROM_ASCII(s) _PyUnicode_FromASCII((s), strlen(s))
```
I believe recent compilers optimize away calls of strlen().
|
msg346116 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-06-20 11:58 |
> #define _PyUnicode_FROM_ASCII(s) _PyUnicode_FromASCII((s), strlen(s))
LGTM.
|
msg346117 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-20 12:27 |
I confirmed at least one measurable speedup:
./python -m pyperf timeit -s 'd={}' -- 'repr(d)'
FromString: Mean +- std dev: 157 ns +- 3 ns
FROM_ASCII: Mean +- std dev: 132 ns +- 3 ns
>>> (157-132)/157
0.1592356687898089
|
msg346118 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-20 12:39 |
Should we make these APIs public?
* rename `_PyUnicode_FromASCII` to `PyUnicode_FromASCIIAndSize`.
* add `PyUnicode_FromASCII` as static inline function (without ABI).
* add `#define _PyUnicode_FromASCII PyUnicode_FromASCIIAndSize` for source code compatibility.
|
msg346121 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-06-20 14:17 |
Most of changes are in not performance sensitive code. I do not think there is a benefit of using new macro there.
If PyUnicode_FromString() is slow I prefer to optimize it instead of adding yet one esoteric private function for internal needs.
In case of dict.__repr__() we can get even more gain by using _Py_IDENTIFIER or more general API proposed by Victor.
|
msg346127 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-20 16:03 |
> Most of changes are in not performance sensitive code. I do not think there is a benefit of using new macro there.
Because I used just sed.
> If PyUnicode_FromString() is slow I prefer to optimize it instead of adding yet one esoteric private function for internal needs.
OK. There are some code like `PyUnicode_FromString(name)`. Optimizing PyUnicode_FromString will be more effective. I created PR 14273.
> In case of dict.__repr__() we can get even more gain by using _Py_IDENTIFIER or more general API proposed by Victor.
Of course, I used it just for micro benchmarking. Optimizing it is not a goal. In case of PR 14273:
$ ./python -m pyperf timeit -s 'd={}' -- 'repr(d)'
.....................
Mean +- std dev: 138 ns +- 2 ns
|
msg346132 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-20 16:32 |
Oh, wait. Why we used _PyUnicodeWriter here?
Decoding UTF-8 must not require it. 2-pass is enough.
|
msg346134 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-06-20 16:33 |
> _PyUnicode_FromASCII(s, len) is faster than PyUnicode_FromString(s) because PyUnicode_FromString() uses temporary _PyUnicodeWriter to support UTF-8.
I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just wrapper to implement efficient memory management.
> Oh, wait. Why we used _PyUnicodeWriter here?
To optimize decoding errors: the error handler can use replacement string longer than 1 character. Overallocation is used in this case.
|
msg346202 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-21 10:30 |
> I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just wrapper to implement efficient memory management.
I misunderstood _PyUnicodeWriter. I thought it caused one more allocation, but it doesn't.
But _PyUnicodeWriter is still slow, because gcc and clang are not smart enough to optimize _PyUnicodeWriter_Init() & _PyUnicodeWriter_Prepare().
See this example:
```
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#define S(s) (s),strlen(s)
int
main(int argc, char *argv[])
{
Py_Initialize();
for (int i=0; i<100000000; i++) {
//PyObject *s = PyUnicode_FromString("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
PyObject *s = _PyUnicode_FromASCII(S("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
Py_DECREF(s);
}
return 0;
}
```
PyUnicode_FromString() takes about 4 sec on my machine. _PyUnicode_FromASCII() is about 2 sec.
By skipping _PyUnicodeWriter for ASCII string (GH-14283), PyUnicode_FromString() takes about 3 sec.
```
$ time ./x # PyUnicode_FromString
real 0m4.085s
user 0m4.081s
sys 0m0.004s
$ time ./y # PyUnicode_FromString (skip _PyUnicode_Writer, GH-14283)
real 0m2.988s
user 0m2.988s
sys 0m0.000s
$ time ./z # _PyUnicode_FromASCII
$ time ./z
real 0m1.975s
user 0m1.975s
sys 0m0.000s
```
|
msg346203 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-21 10:37 |
Another micro benchmark:
```
$ ./python-master -m pyperf timeit -o m1.json 'b=b"foobar"' -- 'b.decode()'
.....................
Mean +- std dev: 93.1 ns +- 2.4 ns
$ ./python -m pyperf timeit -o m2.json 'b=b"foobar"' -- 'b.decode()'
.....................
Mean +- std dev: 83.1 ns +- 2.6 ns
$ ./python -m pyperf compare_to m1.json m2.json
Mean +- std dev: [m1] 93.1 ns +- 2.4 ns -> [m2] 83.1 ns +- 2.6 ns: 1.12x faster (-11%)
```
|
msg346237 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-21 17:10 |
PR 14291 seems simpler than PR 14283. But PR 14283 is faster because _PyUnicodeWriter is a learge struct.
master: 3.7sec
PR 14283: 2.9sec
PR 14291: 3.45sec
compiler: gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
without LTO, without PGO
|
msg346247 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2019-06-21 21:03 |
I'm confused by the issue title: PyUnicode_GetString() doesn't exist, it's PyUnicode_FromString() :-) I changed the title.
|
msg346351 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-24 03:30 |
New changeset 770847a7db33b3d4c451b42372b6942687aa6121 by Inada Naoki in branch 'master':
bpo-37348: optimize decoding ASCII string (GH-14283)
https://github.com/python/cpython/commit/770847a7db33b3d4c451b42372b6942687aa6121
|
msg346354 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-06-24 06:13 |
Could you please measure the performance for long strings (1000, 10000 and 100000 characters): a long ASCII string and a long ASCII string ending with a non-ASCII character?
|
msg346356 - (view) |
Author: Inada Naoki (methane) * |
Date: 2019-06-24 07:08 |
This optimization is only for short strings. There is no significant difference for long and non-ASCII strings.
```
# 1000 ASCII
$ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"*1000' -- 'b.decode()'
python-master: ..................... 196 ns +- 1 ns
python: ..................... 185 ns +- 1 ns
Mean +- std dev: [python-master] 196 ns +- 1 ns -> [python] 185 ns +- 1 ns: 1.06x faster (-6%)
# 10000 ASCII
$ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"*10000' -- 'b.decode()'
python-master: ..................... 671 ns +- 4 ns
python: ..................... 662 ns +- 1 ns
Mean +- std dev: [python-master] 671 ns +- 4 ns -> [python] 662 ns +- 1 ns: 1.01x faster (-1%)
# 100000 ASCII
$ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"*100000' -- 'b.decode()'
python-master: ..................... 5.86 us +- 0.03 us
python: ..................... 5.85 us +- 0.02 us
Mean +- std dev: [python-master] 5.86 us +- 0.03 us -> [python] 5.85 us +- 0.02 us: 1.00x faster (-0%)
# 1 non-ASCII
$ ./python -m pyperf timeit --compare-to=./python-master -s 'b="あ".encode()' -- 'b.decode()'
python-master: ..................... 138 ns +- 2 ns
python: ..................... 136 ns +- 2 ns
Mean +- std dev: [python-master] 138 ns +- 2 ns -> [python] 136 ns +- 2 ns: 1.02x faster (-2%)
# 1000 ASCII + 1 non-ASCII
$ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"x"*1000 + "あ".encode()' -- 'b.decode()'
python-master: ..................... 361 ns +- 9 ns
python: ..................... 360 ns +- 5 ns
Mean +- std dev: [python-master] 361 ns +- 9 ns -> [python] 360 ns +- 5 ns: 1.00x faster (-0%)
Not significant!
# 10000 ASCII + 1 non-ASCII
$ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"x"*10000 + "あ".encode()' -- 'b.decode()'
python-master: ..................... 2.83 us +- 0.02 us
python: ..................... 2.83 us +- 0.03 us
Mean +- std dev: [python-master] 2.83 us +- 0.02 us -> [python] 2.83 us +- 0.03 us: 1.00x slower (+0%)
Not significant!
```
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:16 | admin | set | github: 81529 |
2019-06-24 07:08:11 | methane | set | messages:
+ msg346356 |
2019-06-24 06:13:28 | serhiy.storchaka | set | messages:
+ msg346354 |
2019-06-24 03:30:43 | methane | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2019-06-24 03:30:30 | methane | set | messages:
+ msg346351 |
2019-06-21 21:03:51 | vstinner | set | messages:
+ msg346247 title: Optimize PyUnicode_GetString for short ASCII strings -> Optimize PyUnicode_FromString for short ASCII strings |
2019-06-21 17:10:02 | methane | set | messages:
+ msg346237 |
2019-06-21 16:53:37 | methane | set | pull_requests:
+ pull_request14114 |
2019-06-21 10:37:24 | methane | set | messages:
+ msg346203 |
2019-06-21 10:30:07 | methane | set | messages:
+ msg346202 |
2019-06-21 10:28:36 | methane | set | pull_requests:
+ pull_request14105 |
2019-06-20 16:33:54 | vstinner | set | messages:
+ msg346134 |
2019-06-20 16:32:09 | methane | set | messages:
+ msg346132 |
2019-06-20 16:03:34 | methane | set | messages:
+ msg346127 title: add _PyUnicode_FROM_ASCII macro -> Optimize PyUnicode_GetString for short ASCII strings |
2019-06-20 16:00:33 | methane | set | pull_requests:
+ pull_request14097 |
2019-06-20 14:17:35 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg346121
|
2019-06-20 12:39:17 | methane | set | messages:
+ msg346118 |
2019-06-20 12:27:06 | methane | set | messages:
+ msg346117 |
2019-06-20 11:58:17 | vstinner | set | nosy:
+ vstinner messages:
+ msg346116
|
2019-06-20 11:41:20 | methane | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request14091 |
2019-06-20 11:40:57 | methane | create | |