
classification
Title: Folding ''.join() into f-strings
process
Status: open
Nosy List: BTaskaya, eric.smith, pablogsal, rhettinger, serhiy.storchaka
Priority: normal

Created on 2021-05-20 19:03 by BTaskaya, last changed 2022-04-11 14:59 by admin.

Messages (7)
msg394049 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2021-05-20 19:03
Since strings are immutable, we could fold some operations on them. Serhiy has done some work in issue 28307 on folding the `str % args` combination, which I think we can extend even further. One simple example that I encounter daily is this pattern:

'/'.join([something, something_else, user_arg])

where people join N elements together with a separator, and N is known in advance. A quick search over some locally cloned PyPI projects (a couple thousand) showed 5000+ occurrences where this optimization applies.
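
For illustration, a minimal sketch of the rewrite being discussed. The variable values are placeholders, and the f-string is written by hand here rather than produced by the compiler:

```python
# Placeholder values; any three exact str objects behave the same way.
something, something_else, user_arg = "usr", "local", "bin"

# The pattern as it appears in source code: a separator joined over a
# list display whose length is visible in the source.
joined = "/".join([something, something_else, user_arg])

# The hand-written f-string that the folded form would be equivalent to.
folded = f"{something}/{something_else}/{user_arg}"

assert joined == folded
```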

The preliminary benchmarks indicate that the speedup is about 40%. There are, however, a few issues that might concern others:
   - Type checking: f-strings cast automatically, but .join() requires each element to be an instance of str (or of a str subclass). The current implementation introduces a new conversion called 'c', which only performs the type check instead of converting the value.
   - Preventing a call to a runtime function: I believe that this work is not that different from issue 28307, which prevents the call to str.__mod__, though I understand the concern.
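
The type-checking difference can be seen in a small sketch. The helper `check_str` is hypothetical and only approximates what the 'c' conversion is described as doing; it is not the code from the linked commit:

```python
def check_str(value):
    """Hypothetical stand-in for the proposed 'c' conversion.

    It does not convert anything; it only rejects non-str values so the
    folded form fails in the same situations as str.join().
    """
    if not isinstance(value, str):
        raise TypeError(f"expected str instance, {type(value).__name__} found")
    return value


def join_version(a, b):
    # str.join() raises TypeError if any element is not a str.
    return "/".join([a, b])


def plain_fstring(a, b):
    # A plain f-string formats anything, so it silently accepts an int.
    return f"{a}/{b}"


def checked_fstring(a, b):
    # Emulates the folded form with the type check applied to each element.
    return f"{check_str(a)}/{check_str(b)}"
```

With these definitions, `join_version("a", 1)` and `checked_fstring("a", 1)` both raise `TypeError`, while `plain_fstring("a", 1)` does not.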

Here is the implementation if anybody wants to take a look at it: https://github.com/isidentical/cpython/commit/d7ea8f6e38578ba06d28deb4b4a8df676887ec26

I believe that the implementation can be optimized further (e.g., if a contiguous block of elements are all strings, then we can load all of them in one go!). In some cases the f-strings proved to be faster than the join by 1.7-1.8x.
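
A micro-benchmark along these lines can be sketched with `timeit`. This is not the benchmark used for the numbers above; the values, statement strings and repeat counts are assumptions, and the results will vary by machine and Python version:

```python
import timeit

# Placeholder operands for both statements.
namespace = {"base": "home", "name": "data", "user_arg": "report.txt"}

join_stmt = "'/'.join([base, name, user_arg])"
fstring_stmt = "f'{base}/{name}/{user_arg}'"


def measure(stmt, number=200_000, repeat=5):
    """Return the best total time in seconds over `repeat` runs of `stmt`."""
    return min(timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat))


if __name__ == "__main__":
    t_join = measure(join_stmt)
    t_fstr = measure(fstring_stmt)
    print(f"join:     {t_join:.4f}s")
    print(f"f-string: {t_fstr:.4f}s")
    print(f"ratio:    {t_join / t_fstr:.2f}x")
```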
msg394092 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-05-21 01:10
To be pedantic, f-strings don't "cast" to a string, they call format(obj) on the argument. Typically this is the same as str(obj), but it need not be. I guess if all of the arguments are already exact strings then this distinction doesn't matter, but I'd have to give it some more thought.
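
A small sketch of that distinction, using a toy class (hypothetical, for illustration only) whose `__format__` deliberately differs from its `__str__`:

```python
class Temperature:
    """Toy class whose __format__ intentionally differs from __str__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f"{self.degrees} degrees"

    def __format__(self, spec):
        # An f-string field with no format spec calls this with spec == "".
        return f"{self.degrees}C"


t = Temperature(21)
via_str = str(t)        # uses __str__
via_format = format(t)  # uses __format__ with an empty spec
via_fstring = f"{t}"    # same result as format(t), not str(t)

assert via_fstring == via_format
assert via_fstring != via_str
```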

I've never seen the pattern of joining a fixed size list, but I guess it exists in the wild. I'm skeptical that this optimization is worth doing. We should check on a real-world benchmark instead of a micro benchmark.
msg394152 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2021-05-21 20:41
> We should check on a real-world benchmark instead of a micro benchmark.

It is a pretty-targeted optimization, so I don't expect it to speed up the macro benchmarks. Though that shouldn't be a blocker for small, targeted optimizations.
msg394153 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-05-21 20:45
> It is a pretty-targeted optimization, so I don't expect it to speed up the macro benchmarks. Though that shouldn't be a blocker for small, targeted optimizations.

In the past we've rejected optimizations that make the code more complex and don't result in any noticeable real-world speedups. I don't think that policy has changed.
msg394154 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-05-21 20:54
I'm also dubious that this would be of value in real code.  Looking at the implementation, it seems to throw way too much code at too small of a problem.  I suspect it is more likely to cause maintenance problems than to produce noticeable benefits for users.

Historically, we've avoided folding higher level operations.  Serhiy's optimization of str.__mod__ went beyond those limits and shouldn't set a precedent that we will regret later.
msg394222 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-05-23 20:02
I have only seen this idiom in the code of Argument Clinic (written by Larry). I am sure that it is used in third-party code, but I do not know how common it is.

As for the proposed code, checking that the value is a string is not enough.

1. It can be a str subclass with an overridden __format__. In this case the result will be different.
2. If it is a str subclass without an overridden __format__, the optimization adds an overhead for looking up and calling the __format__ method. It can reverse the benefit in this case.

To fix this you can skip formatting for this "converter". But there is a corner case for ''.join([value]). The result should always be a str. It may be complicated. You may need two new converter codes.

3. It is worth merging consecutive string constants and skipping empty string constants. '/'.join([base, 'data', user_arg]) should produce [base, '/data/', user_arg] instead of [base, '/', 'data', '/', user_arg], and ''.join([base, 'data', user_arg]) should produce [base, 'data', user_arg] instead of [base, '', 'data', '', user_arg].
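
The merging described in point 3 can be sketched in pure Python. This is only a model of the idea, not compiler code: the `Name` marker and the `fold_join` function are hypothetical, with plain str items standing for string constants and `Name` items standing for runtime expressions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Hypothetical marker for a value that is only known at run time."""
    id: str


def fold_join(sep, elements):
    """Interleave `sep` between `elements`, merging adjacent constants.

    Plain str items are treated as constants; empty constants are dropped
    and neighbouring constants are concatenated into one.
    """
    parts = []
    for index, element in enumerate(elements):
        pieces = [element] if index == 0 else [sep, element]
        for piece in pieces:
            if isinstance(piece, str):
                if not piece:
                    continue  # skip empty constants
                if parts and isinstance(parts[-1], str):
                    parts[-1] += piece  # merge with the preceding constant
                    continue
            parts.append(piece)
    return parts


base, user_arg = Name("base"), Name("user_arg")
assert fold_join("/", [base, "data", user_arg]) == [base, "/data/", user_arg]
assert fold_join("", [base, "data", user_arg]) == [base, "data", user_arg]
```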
msg394224 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2021-05-23 20:16
> 1. It can be a str subclass with an overridden __format__. In this case the result will be different.

Thanks! This should be possible to handle, though, so it is not a blocker for the suggestion (if I am not missing something).
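
For reference, the case in point 1 can be reproduced with a toy str subclass (the class name is made up for this sketch). It also shows the ''.join([value]) corner case, where the result is an exact str even for a subclass instance:

```python
class Shouty(str):
    """Toy str subclass with an overridden __format__."""

    def __format__(self, spec):
        return self.upper()


value = Shouty("data")

joined = "/".join(["base", value])  # join ignores __format__
folded = f"base/{value}"            # the f-string calls __format__
single = "".join([value])           # result is an exact str

assert joined != folded
assert type(single) is str
```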

> 2. If it is a str subclass without an overridden __format__, the optimization adds an overhead for looking up and calling the __format__ method. It can reverse the benefit in this case.

I'll double check, though the compromise seems to make sense considering that 99% of the time this use case involves plain strings.

> 3. It is worth merging consecutive string constants and skipping empty string constants. '/'.join([base, 'data', user_arg]) should produce [base, '/data/', user_arg] instead of [base, '/', 'data', '/', user_arg], and ''.join([base, 'data', user_arg]) should produce [base, 'data', user_arg] instead of [base, '', 'data', '', user_arg].

> I believe that the implementation can be optimized further (e.g., if a contiguous block of elements are all strings, then we can load all of them in one go!). In some cases the f-strings proved to be faster than the join by 1.7-1.8x.


I am aware of these possible cases, as I stated in the last paragraph, though I didn't want to start with them in the PoC. From what I observe, the speedup goes even further with these.
History
Date                 User              Action  Args
2022-04-11 14:59:45  admin             set     github: 88360
2021-05-23 20:16:20  BTaskaya          set     messages: + msg394224
2021-05-23 20:02:09  serhiy.storchaka  set     messages: + msg394222
2021-05-21 20:54:52  rhettinger        set     nosy: + rhettinger; messages: + msg394154
2021-05-21 20:45:21  eric.smith        set     messages: + msg394153
2021-05-21 20:41:57  BTaskaya          set     messages: + msg394152
2021-05-21 01:10:12  eric.smith        set     messages: + msg394092
2021-05-20 19:03:34  BTaskaya          create