Issue27078
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2016-05-21 19:18 by ztane, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
fstrtup.patch | Demur Rumed, 2016-07-13 01:10 | review | ||
fstrtup2.patch | Demur Rumed, 2016-07-13 13:14 | BINARY_ADD | review | |
fstring_build_string.patch | serhiy.storchaka, 2016-07-13 20:11 | review | ||
fstring_build_string_2.patch | serhiy.storchaka, 2016-07-15 19:41 | review |
Messages (32) | |||
---|---|---|---|
msg266016 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-05-21 19:18 | |
I benchmarked some f'' strings against .format, and surprisingly f'' was slower than .format in about all the simple cases I could think of. I guess this is because f'' strings implicitly use `''.join([])`. The byte code for f'foo is {foo}' currently is 1 0 LOAD_CONST 1 ('') 3 LOAD_ATTR 0 (join) 6 LOAD_CONST 2 ('foo is ') 9 LOAD_GLOBAL 1 (foo) 12 FORMAT_VALUE 0 15 BUILD_LIST 2 18 CALL_FUNCTION 1 (1 positional, 0 keyword pair) It was mentioned here https://bugs.python.org/issue24965 but since I came up with the idea when inspecting this, I'd propose again here that a new opcode be added for f'' strings - BUILD_STRING n, with which f'foo is {foo}' could be compiled to 0 LOAD_CONST 2 ('foo is ') 3 LOAD_GLOBAL 1 (foo) 6 FORMAT_VALUE 0 9 BUILD_STRING 2 instead. Internally this wouldn't even need to call `PyUnicode_Join`, but instead `seplen == 0` case could be refactored into a separate function. Not only with this is it possible to know the length of the string, but also the number of string components are already known, so it'd be possible to build a tuple fast from the values on stack. And that way people doing micro benchmarks would find out that the new Python 3.6 feature would not only look nice but also be a way to write better-performing code. |
|||
msg266021 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2016-05-21 20:15 | |
I considered doing this, and have some of it implemented. I discussed it with Larry Hastings and Mark Shannon at PyCon Ireland, and we decided to just use ''.format() and the new FORMAT_VALUE opcode, since that was the simplest way to fix the previous implementation's faults. That doesn't mean I've given up on improving it, of course. The speed increase would indeed come from avoiding the LOAD_ATTR lookup, not building list, and not calling a function. I'm curious about your benchmarks. Can you share them? I've found f-strings to be typically faster than .format(). But I can't say my benchmarks are particularly awesome. |
|||
msg266031 - (view) | Author: Martijn Pieters (mjpieters) * | Date: 2016-05-21 22:16 | |
The catalyst for this question was a Stack Overflow question I answered: https://stackoverflow.com/questions/37365311/why-are-python-3-6-literal-formatted-strings-so-slow Compared the `str.format()` the BUILD_LIST is the bottleneck here; dropping the LOAD_ATTR and CALL_FUNCTION opcodes are nice bonuses. |
|||
msg270083 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-10 12:52 | |
I am not an expert on writing new opcodes to CPython (having never done it, don't know where to change the disassembler and such, how to make compiler generate them properly and such), but I'd be glad to help with testing, timing and writing the possible joiner algorithm for it, to help it make into Python 3.6. |
|||
msg270095 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2016-07-10 16:08 | |
For comparison: $ ./python -m timeit -s "x = 2" -- "f'X is {x}'" 1000000 loops, best of 3: 1.04 usec per loop $ ./python -m timeit -s "x = 2; j = ''.join" -- "j(['X is ', f'{x}'])" 1000000 loops, best of 3: 0.93 usec per loop $ ./python -m timeit -s "x = 2" -- "'X is {}'.format(x)" 1000000 loops, best of 3: 0.808 usec per loop $ ./python -m timeit -s "x = 2; f = 'X is {}'.format" -- "f(x)" 1000000 loops, best of 3: 0.748 usec per loop $ ./python -m timeit -s "x = 2" -- "'X is %s' % (x,)" 1000000 loops, best of 3: 0.467 usec per loop |
|||
msg270135 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-10 21:35 | |
And the expected performance for optimal `f'X is {x}'` code would be *faster* than `"'X is %s' % (x,)"` which still needs to interpret the string at runtime, and build a proper tuple object on stack. |
|||
msg270136 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2016-07-10 22:18 | |
> And the expected performance for optimal `f'X is {x}'` code would > be *faster* than `"'X is %s' % (x,)"` which still needs to > interpret the string at runtime, and build a proper tuple object > on stack. That's not necessarily true. The f-string version still needs to invoke the .format() method on the object, instead of only working for a handful of hard-coded types. I'm not saying there aren't optimization opportunities, but it may be that %-formatting is always faster. |
|||
msg270167 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-11 09:25 | |
Ah so it seems. Somehow I thought __format__ was slotted, but that is not the case and it needs to be looked up, and what is worse, of course a tuple needs to be built as well. Oh well, at least it should be quite minimal to make it be faster than `f(x)` there, which necessarily has one extra bound method call and interpretation of format string as the overhead, so there's minimally at least 30 % performance boost to achieve. |
|||
msg270280 - (view) | Author: Philip Dubé (Demur Rumed) * | Date: 2016-07-13 01:03 | |
The simplest perf fix is to first use BUILD_TUPLE instead of BUILD_LIST timeit 'x=1;(x,x,x)' 0.36 usec per loop timeit 'x=1;[x,x,x]' 0.425 usec per loop Introducing a new opcode BUILD_STRING to inline PyTuple_New + PyUnicode_Join to replace BUILD_TUPLE + CALL_FUNCTION should benchmark against BUILD_TUPLE version, not BUILD_LIST + CALL_FUNCTION |
|||
msg270281 - (view) | Author: Philip Dubé (Demur Rumed) * | Date: 2016-07-13 01:10 | |
Benchmarked f'X is {x}' with BUILD_TUPLE change: Before: 6.62 usec per loop After: 6.33 usec per loop f'X is {x} {x+2} {x+3}' Before: 15.1 usec per loop After: 14.7 usec per loop Attached patch |
|||
msg270308 - (view) | Author: Philip Dubé (Demur Rumed) * | Date: 2016-07-13 13:14 | |
fstrtup2.patch is a bit more of an involved optimization for when we are joining 2 strings. Instead it emits BINARY_ADD. This may be considered too 'niche' since it only triggers when the substitution occurs at the start or end of a string & there is only one substitution However, it reduces the "X is {x}" testcase on my machine to 4.9 usec Interesting thing, to consider ceiling of what we can do, Prefixing ./python -m timeit -s "x=1" '%s'%x 1.08 usec '%s'%(x,) 2.04 usec str(x) 2.9 usec f'{x}' 3.65 usec So we should not be aiming to compete with %. It may be worthy to convert the code to generate "x is {}".format calls tho |
|||
msg270312 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-13 14:44 | |
Yet the test cases just prove what is so expensive there: name lookups (global name `str`; looking up `join` on a string instance); building a tuple (for function arguments) is expensive as well. Of course `__format__` will be costly as well as it is not a slot-method, needs to build a new string etc. However for strings, 'foo'.format() already returns the instance itself, so if you were formatting other strings into strings there are cheap shortcuts available to even overtake a = 'Hello' b = 'World' '%s %s' % (a, b) for fast string templates, namely, make FORMAT_VALUE without args return the original if `PyUnicode_CheckExact` and no arguments, don't need to build a tuple to join it. |
|||
msg270319 - (view) | Author: Philip Dubé (Demur Rumed) * | Date: 2016-07-13 15:47 | |
I'm not understanding your message. We don't call FORMAT_VALUE on constant strings in f"x is {x}" & FORMAT_VALUE doesn't take an argument. Are you saying in a hypothetical FORMAT_VALUE where BUILD_STRING takes a set of objects & applies formatting to them, thus allowing us to remove FORMAT_VALUE as an opcode? That seems like I'm imposing my own internal thoughts on what you're saying, but when I don't understand what someone's saying I'm prone to raise my own imaginations. Also note that f'{x}' compiles to 'LOAD X, FORMAT_VALUE' so there is no join lookup in the last benchmarks I posted Nitpick about fstrtup2: it assumes compiler_joined_str's len is at least 2. Either an assert should be added or the last if-else should be `else if (len == 1)` instead of a plain `else` |
|||
msg270333 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2016-07-13 18:09 | |
Yet one case for comparison (with msg270095): $ ./python -m timeit -s "x = 2" -- 'f"X is {x!s}"' 1000000 loops, best of 3: 0.625 usec per loop Seems f'{x!s}' is the fastest way to stringify a value. Thus there is an opportunity to speed up default formatting by special casing PyObject_Format() for most popular types (str and int). |
|||
msg270339 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2016-07-13 20:11 | |
Proposed patch adds the BUILD_STRING opcode and speeds up PyObject_Format() in common cases. It makes f-strings the fastest method for simple formatting. $ ./python -m timeit -s "x = 2" -- 'f"X is {x}"' 1000000 loops, best of 3: 0.347 usec per loop |
|||
msg270344 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-13 20:40 | |
It seems Eric has done some special casing for strings already in FORMAT_VALUE. Here are the results from my computer after applying Demur's patch for concatenating *strings*. python3.6 -m timeit -s "x = 'a'" -- '"X is %s" % x' 1000000 loops, best of 3: 0.187 usec per loop python3.6 -m timeit -s "x = 'a'" -- 'f"X is {x}"' 10000000 loops, best of 3: 0.0972 usec per loop But then as more components are added, it starts to slow down rapidly: python3.6 -m timeit -s "x = 'a'" -- 'f"X is {x}a"' 1000000 loops, best of 3: 0.191 usec per loop python3.6 -m timeit -s "x = 'a'" -- '"X is %sa" % x' 1000000 loops, best of 3: 0.216 usec per loop Of course there is also the matter of string conversion vs "look up __format__": python3.6 -m timeit -s "x = 1" -- 'f"X is {x}"' 1000000 loops, best of 3: 0.349 usec per loop python3.6 -m timeit -s "x = 1" -- 'f"X is {x!s}"' 10000000 loops, best of 3: 0.168 usec per loop For FORMAT_VALUE opcode already has a special case for the latter. However I'd too say that switch/case for the some fastest builtin types in `PyObject_Format` as Eric has intended to do with Unicode in PyObject_Format (str, int, float), as those are commonly used to build strings such as text templates, text-like protocols like emails, HTTP protocol headers and such. For others the speed-up wouldn't really matter either way: either PyObject_Format would fall back to object.__format__ (for example functions) - no one really cares about efficiency when doing such debug dumps - if you cared, you'd not do them at all; or they'd have complex representations (say, lists, dictionaries) - and their use again would mostly be that of debug dumps; or they'd have `__format__` written in Python and that'd be dwarfed by anything so far. |
|||
msg270347 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-13 22:04 | |
Thanks Serhiy, I was writing my comment for a long time, and only now noticed that you'd already posted the patch. Indeed, it seems that not only is this the fastest method, it might also be the fastest string concatenation method in the history of Python. I did some comparison with 8-bit strings in Python 2, doing the equivalent of f'http://{domain}/{lang}/{path}' with domain = 'some_really_long_example.com' lang = 'en' path = 'some/really/long/path/' and the results look quite favourable: 0.151 µs with your patch; 0.250ish for the second fastest method in Python 3.6 (''.join(tuple)) And the fastest method in Python 2 is a tie between concatenating with + or ''.join with bound method reference; both of them take 0.203 µs on Python 2.7 with 8-bit strings and about 0.245 - 0.255 µs if everything is Unicode. |
|||
msg270469 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-15 07:03 | |
Serhiy suggested this in Rietveld: > For additional optimization we can pack all constant strings, parsed formats and > flags in one constant object and use the single LOAD_CONST. But this requires > much larger changes (perhaps including changing the marshal format), and the > benefit may be small. Maybe we'll get to it eventually, if this approach proves > efficient enough. I was thinking about this and got an idea on how to do this too, without changes to marshal. Essentially, let TOS be a tuple of (flags, str1, str2, str3, str4, str5, str6, str7, str8, str9...) flags would be n bytes for n-part format string; each byte would tell whether: - the next component is a constant string (bit 0 = 0) from the tuple - the next component is an interpolated value (bit 0 = 1) - and whether it has !s, !r, !a or default conversions (bits 1-2) - and whether it has extra argument to format() or not (bit 3) (argument is the next string from the tuple) thus that tuple for a, b = 'Hello', 'World!' f'{a!s} {b:10}!' would be (b'\x03\x00\x05\x00', ' ', '10', '!') and the opcodes would be LOAD_FAST (b) LOAD_FAST (a) LOAD_CONST (0) (the tuple) BUILD_FORMAT_STRING 3 |
|||
msg270505 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2016-07-15 19:41 | |
Nice idea, Antti. But I tried to implement it, and surprisingly found that this approach is slower than FORMAT_VALUE + BUILD_STRING. At least for this particular example. Perhaps because we can't use a stack and need to allocate a new tuple containing literal strings and formatted values for PyUnicode_Join(). Not mentioning that the code is much more complex. Here is updated previous patch with fixed leak. |
|||
msg271743 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-07-31 14:33 | |
I would very much like to see this in 3.6. Who could review it? |
|||
msg271745 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2016-07-31 17:22 | |
I intend to review this. As this is not a new feature, it doesn't need to be completed by beta 1. I'm focusing my energies on new features, then I'll look at this. |
|||
msg271747 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2016-07-31 17:59 | |
Since this introduces a new opcode, this is a new feature. Seems opcodes never were added at beta stage before 3.5b1. |
|||
msg273776 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-08-27 12:23 | |
So does this (new opcode) count as a new feature? It would be great to give f'' strings a flying start, saying that not only they're cool, they're also faster than anything that you've used before. Here some more mini-benchmarks with serhiy's patch2 applied, the times are pretty stable: % python3.6 -mtimeit -s 'x = 42' '"%s-" % x' 10000000 loops, best of 3: 0.184 usec per loop % python3.6 -mtimeit -s 'x = 42' 'f"{x}-"' 10000000 loops, best of 3: 0.142 usec per loop and % python3.6 -mtimeit -s 'x = "42"' 'f"{x}{x}"' 10000000 loops, best of 3: 0.0709 usec per loop % python3.6 -mtimeit -s 'x = "42"' '"%s%s" % (x,x)' 1000000 loops, best of 3: 0.213 usec per loop python3.6 -mtimeit -s 'x = "42"' '"".join((x, x))' 10000000 loops, best of 3: 0.155 usec per loop This is only really achievable with some kind of bytecode support. |
|||
msg273778 - (view) | Author: Philip Dubé (Demur Rumed) * | Date: 2016-08-27 13:04 | |
I don't think lack of precedence is a reason to say new opcodes are new features. More that generally new opcodes are created for new features |
|||
msg273816 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2016-08-28 08:27 | |
I put this in the "new feature" category in the sense that after beta we're trying to stabilize the code base. Introducing a new opcode is a higher risk change that needs to go in the release cycle as early as possible. |
|||
msg274322 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2016-09-03 19:48 | |
Left a review comment. I'd like to see this in before 3.6 beta 1. |
|||
msg274326 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-09-03 20:40 | |
I tried to post a comment on rietveld, but 500 infernal error... PyUnicode_New(0, 0) will return unicode_empty. If unicode_empty is NULL, then it will also initialize it. It would be cleanest if unicode_empty was statically created. NULL cannot be used as the separator argument to `PyUnicode_Join(PyObject *separator, PyObject *seq)` to mean an empty string, as it already means ' ' (space!). |
|||
msg274327 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2016-09-03 20:58 | |
Of course. Never mind. LGTM. |
|||
msg274345 - (view) | Author: Antti Haapala (ztane) * | Date: 2016-09-04 06:58 | |
Though it is clean to do like this: let _PyUnicode_JoinArray have `NULL` mean empty string, as it is more logical anyway; then PyUnicode_Join itself just needs to: if (separator == NULL) { separator = PyUnicode_FromOrdinal(' '); /* check omitted */ res = _PyUnicode_JoinArray(separator, items, seqlen); Py_DECREF(separator); } else { res = _PyUnicode_JoinArray(separator, items, seqlen); } The NULL argument I guess isn't that common to pass to PyUnicode_Join (Python code especially would always have to pass in ' ' instead), so this shouldn't really affect performance for any case at all. |
|||
msg274605 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2016-09-06 19:08 | |
New changeset 28e280915508 by Serhiy Storchaka in branch 'default': Issue #27078: Added BUILD_STRING opcode. Optimized f-strings evaluation. https://hg.python.org/cpython/rev/28e280915508 |
|||
msg274608 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2016-09-06 19:23 | |
Thank you for your review Eric. I considered passing NULL, but as Antti said, it is already used for space separated concatenation. PyUnicode_New(0, 0) is the obvious way to get an empty string, and I think it is fast enough. If this affects performance we can add additional microoptimization later. |
|||
msg274610 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2016-09-06 19:31 | |
Now that I've looked at PyUnicode_New, I agree with using PyUnicode_New(0, 0). I can't imagine we could measure the difference with optimizing it in the opcode itself before calling PyUnicode_New. Thanks for adding this, Serhiy. I think it's great stuff, and works well with FORMAT_VALUE. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:31 | admin | set | github: 71265 |
2016-09-06 19:31:16 | eric.smith | set | assignee: eric.smith -> serhiy.storchaka messages: + msg274610 |
2016-09-06 19:23:54 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg274608 stage: resolved |
2016-09-06 19:08:16 | python-dev | set | nosy:
+ python-dev messages: + msg274605 |
2016-09-04 06:58:22 | ztane | set | messages: + msg274345 |
2016-09-03 20:58:25 | eric.smith | set | messages: + msg274327 |
2016-09-03 20:40:37 | ztane | set | messages: + msg274326 |
2016-09-03 19:48:13 | eric.smith | set | messages: + msg274322 |
2016-08-28 08:27:10 | rhettinger | set | messages: + msg273816 |
2016-08-27 13:04:54 | Demur Rumed | set | messages: + msg273778 |
2016-08-27 12:23:50 | ztane | set | messages: + msg273776 |
2016-07-31 17:59:17 | serhiy.storchaka | set | messages: + msg271747 |
2016-07-31 17:22:05 | eric.smith | set | messages: + msg271745 |
2016-07-31 14:33:09 | ztane | set | messages: + msg271743 |
2016-07-15 19:41:07 | serhiy.storchaka | set | files:
+ fstring_build_string_2.patch messages: + msg270505 |
2016-07-15 07:03:00 | ztane | set | messages: + msg270469 |
2016-07-13 22:04:03 | ztane | set | messages: + msg270347 |
2016-07-13 21:56:47 | Demur Rumed | set | nosy:
+ rhettinger |
2016-07-13 20:40:31 | ztane | set | messages: + msg270344 |
2016-07-13 20:11:37 | serhiy.storchaka | set | files:
+ fstring_build_string.patch messages: + msg270339 |
2016-07-13 18:09:17 | serhiy.storchaka | set | messages: + msg270333 |
2016-07-13 17:02:38 | Demur Rumed | set | type: enhancement -> performance |
2016-07-13 15:47:56 | Demur Rumed | set | messages: + msg270319 |
2016-07-13 14:44:09 | ztane | set | messages: + msg270312 |
2016-07-13 13:14:44 | Demur Rumed | set | files:
+ fstrtup2.patch messages: + msg270308 |
2016-07-13 01:10:15 | Demur Rumed | set | files:
+ fstrtup.patch keywords: + patch messages: + msg270281 |
2016-07-13 01:03:24 | Demur Rumed | set | messages: + msg270280 |
2016-07-11 09:25:06 | ztane | set | messages: + msg270167 |
2016-07-10 22:18:43 | eric.smith | set | messages: + msg270136 |
2016-07-10 21:35:56 | ztane | set | messages: + msg270135 |
2016-07-10 16:10:10 | serhiy.storchaka | set | nosy:
+ Demur Rumed |
2016-07-10 16:08:43 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg270095 |
2016-07-10 12:52:17 | ztane | set | messages: + msg270083 |
2016-05-21 22:16:07 | mjpieters | set | messages: + msg266031 |
2016-05-21 22:12:17 | mjpieters | set | nosy:
+ mjpieters |
2016-05-21 20:15:28 | eric.smith | set | messages: + msg266021 |
2016-05-21 19:44:44 | eric.smith | set | assignee: eric.smith |
2016-05-21 19:22:45 | ned.deily | set | nosy:
+ eric.smith |
2016-05-21 19:18:36 | ztane | create |