Issue44276
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-06-01 05:08 by Kshitiz17, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
match_test.py | Kshitiz17, 2021-06-02 07:43 | |||
if_match.py | brandtbucher, 2021-06-02 16:51 |
Messages (14) | |||
---|---|---|---|
msg394839 - (view) | Author: Kshitiz Arya (Kshitiz17) * | Date: 2021-06-01 05:08 | |
Replace if-elif-else with match-case for pattern matching, which is generally faster and more intuitive. |
|||
msg394840 - (view) | Author: Karthikeyan Singaravelan (xtreak) * | Date: 2021-06-01 05:16 | |
This is done on a case by case basis and in places necessary in future code. Modifying existing code can pollute git history, make backports hard, might introduce subtle bugs etc. This is similar to proposal to use walrus operator, f-strings, etc. throughout the codebase. |
|||
msg394843 - (view) | Author: Kshitiz Arya (Kshitiz17) * | Date: 2021-06-01 07:15 | |
Pardon my ignorance here but can I start working on this issue? I am a new contributor therefore I am unsure about how to proceed from here |
|||
msg394847 - (view) | Author: Kshitiz Arya (Kshitiz17) * | Date: 2021-06-01 10:22 | |
This is a relatively simple example of how this will improve readability of the code. (This example is form Lib/json/encoder.py) Original ------------------------------------------------------------------------- if isinstance(value, str): yield _encoder(value) elif value is None: yield 'null' elif value is True: yield 'true' elif value is False: yield 'false' elif isinstance(value, int): yield _intstr(value) elif isinstance(value, float): yield _floatstr(value) else: if isinstance(value, (list, tuple)): chunks = _iterencode_list(value, _current_indent_level) elif isinstance(value, dict): chunks = _iterencode_dict(value, _current_indent_level) else: chunks = _iterencode(value, _current_indent_level) yield from chunks -------------------------------------------------------------------------- Suggested -------------------------------------------------------------------------- match value: case str(): yield _encoder(value) case None: yield 'null' case True: yield 'true' case False: yield 'false' case int(): # see comment for int/float in _make_iterencode yield _intstr(value) case float(): # see comment for int/float in _make_iterencode yield _floatstr(value) case _: match value: case list()|tuple(): chunks = _iterencode_list(value, _current_indent_level) case dict(): chunks = _iterencode_dict(value, _current_indent_level) case _: chunks = _iterencode(value, _current_indent_level) yield from chunks ---------------------------------------------------------------------------- |
|||
msg394850 - (view) | Author: Steven D'Aprano (steven.daprano) * | Date: 2021-06-01 10:40 | |
match-case has not even reached a stable version of Python yet, it is only available in Python 3.10 which is still in beta. Are we sure that it is faster in all cases and how do you know it is more intuitive when the vast majority of Python users have likely not even heard of match-case yet? Personally, I am looking forward to using match-case, but I can see myself having to write lots of comments explaining what the code is doing for many years to come. |
|||
msg394855 - (view) | Author: Kshitiz Arya (Kshitiz17) * | Date: 2021-06-01 12:21 | |
I guess we will have to wait until Python 3.10 is released and we have more conclusive data about speed and readability of match-case before we can go ahead with this issue. |
|||
msg394879 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-06-02 00:06 | |
If the json.encoder code does get updated, it doesn't need two levels of matching. It can be flattened by eliminating the *chunks* variable. match value: case str(): yield _encoder(value) case None: yield 'null' case True: yield 'true' case False: yield 'false' case int(): yield _intstr(value) case float(): yield _floatstr(value) case list() | tuple(): yield from _iterencode_list(value, _current_indent_level) case dict(): yield from _iterencode_dict(value, _current_indent_level) case _: yield from _iterencode(value, _current_indent_level) |
|||
msg394881 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-06-02 01:11 | |
In difflib, there's an example where it would be easy to run performance tests. match tag: case 'replace': g = self._fancy_replace(a, alo, ahi, b, blo, bhi) case 'delete': g = self._dump('-', a, alo, ahi) case 'insert': g = self._dump('+', b, blo, bhi) case 'equal': g = self._dump(' ', a, alo, ahi) case _: raise ValueError('unknown tag %r' % (tag,)) |
|||
msg394893 - (view) | Author: Kshitiz Arya (Kshitiz17) * | Date: 2021-06-02 07:30 | |
I have timed the execution of if-else and match-case on Differ().compare from difflib module and here is what I got When both strings are same ********************************************************************************************** if-else: 2.504900476196781e-06, match-case: 2.587399649200961e-06, comparisions : 10 if-else: 2.222519979113713e-06, match-case: 1.7874199693324045e-06, comparisions : 100 if-else: 1.954343999386765e-06, match-case: 1.8695319959078916e-06, comparisions : 1000 if-else: 3.2847548005520366e-06, match-case: 1.928162499825703e-06, comparisions : 10000 if-else: 1.2241538699890953e-06, match-case: 7.870903900038684e-07, comparisions : 100000 if-else: 7.950048359998618e-07, match-case: 7.883418589990469e-07, comparisions : 1000000 if-else: 7.941918295000505e-07, match-case: 7.882559125006082e-07, comparisions : 10000000 if-else: 7.928842861700104e-07, match-case: 7.855620772999828e-07, comparisions : 100000000 ********************************************************************************************** When strings have some difference ********************************************************************************************** if-else: 2.7084999601356687e-06, match-case: 2.6756002625916154e-06, comparisions : 10 if-else: 2.207159996032715e-06, match-case: 1.8606500088935719e-06, comparisions : 100 if-else: 2.139014999556821e-06, match-case: 1.928671001223847e-06, comparisions : 1000 if-else: 2.682303600158775e-06, match-case: 2.626289399631787e-06, comparisions : 10000 if-else: 1.1338948200136655e-06, match-case: 7.989683500636602e-07, comparisions : 100000 if-else: 7.862168830033624e-07, match-case: 7.83532044995809e-07, comparisions : 1000000 if-else: 7.918311449997419e-07, match-case: 7.843428884996683e-07, comparisions : 10000000 if-else: 7.843063791000168e-07, match-case: 7.842913352399773e-07, comparisions : 100000000 ********************************************************************************************** |
|||
msg394894 - (view) | Author: Steven D'Aprano (steven.daprano) * | Date: 2021-06-02 07:41 | |
How did you do the timing? |
|||
msg394895 - (view) | Author: Kshitiz Arya (Kshitiz17) * | Date: 2021-06-02 07:43 | |
I have used timeit module. I have also attached the test file with this message |
|||
msg394927 - (view) | Author: Brandt Bucher (brandtbucher) * | Date: 2021-06-02 16:51 | |
Hm, that benchmark seems really noisy. Looking at your code, it appears that you aren't actually iterating over the results. I've attached a version of your benchmark rewritten to use pyperf. Here are the results on a system with a fresh PGO/LTO build of CPython main, run with CPU isolation: (py) bucher@is-bucher-p1:~/src/patmaperformance$ sudo $(which pyperf) system tune > /dev/null (py) bucher@is-bucher-p1:~/src/patmaperformance$ sudo $(which python) if_match.py --rigorous ......................................... if: Mean +- std dev: 2.16 ms +- 0.13 ms ......................................... match: Mean +- std dev: 2.13 ms +- 0.07 ms It appears that there is no significant difference between the two (which is what I expect, given the current implementation). With that said, our work in issue 44283 will likely have a significant impact for cases like this. |
|||
msg395120 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2021-06-04 20:54 | |
To me, Raymond's flattening is a plausible replacement, especially ater a future speedup. However, I would re-order the patterns to None, False, True, str(), int(), ..., _. These independent conditions, other than _ (or else in the if chain), can be ordered for best reading and comprehension speed, and I think the above is an improvement. Ordering for execution speed would instead start with the most frequently true conditions. An advantage of match over 'if' is that patterns are apparently easier to re-order or otherwise automatically optimize. I agree that replacement should only be done when there is significant improvement in reading and/or execution speed. To avoid backport issues, they should best be preceded by a check for bug reports and especially PRs that will or do affect the same area of code. The proposal for a general, +- global replacement has been and will be rejected. I think that this issue should therefore be closed, fairly soon, as such. |
|||
msg395153 - (view) | Author: Kshitiz Arya (Kshitiz17) * | Date: 2021-06-05 05:07 | |
As Brandt shows us, match-case in its current implementation is not significantly different from an if-else ladder in term of performance, though I still maintain that match-case is much more readable than an if-else ladder. I also agree with Karthikeyan and Terry that global replacement is not a good idea, at least not without a significant performance improvement. Therefore I will close this issue. Currently, some people at issue 44283 are working on improving the performance of match-case so fingers are crossed. If there is some significant gain in performance then I will open new issues on case to case basis. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:46 | admin | set | github: 88442 |
2021-06-05 05:07:28 | Kshitiz17 | set | status: open -> closed resolution: postponed messages: + msg395153 stage: resolved |
2021-06-04 20:54:54 | terry.reedy | set | nosy:
+ terry.reedy messages: + msg395120 |
2021-06-02 16:51:47 | brandtbucher | set | files:
+ if_match.py messages: + msg394927 |
2021-06-02 07:43:23 | Kshitiz17 | set | files:
+ match_test.py messages: + msg394895 |
2021-06-02 07:41:27 | steven.daprano | set | messages: + msg394894 |
2021-06-02 07:30:41 | Kshitiz17 | set | messages: + msg394893 |
2021-06-02 01:11:06 | rhettinger | set | messages: + msg394881 |
2021-06-02 00:06:36 | rhettinger | set | nosy:
+ rhettinger messages: + msg394879 |
2021-06-01 12:21:44 | Kshitiz17 | set | messages: + msg394855 |
2021-06-01 10:40:32 | steven.daprano | set | nosy:
+ steven.daprano messages: + msg394850 |
2021-06-01 10:22:01 | Kshitiz17 | set | messages: + msg394847 |
2021-06-01 07:15:25 | Kshitiz17 | set | messages: + msg394843 |
2021-06-01 05:24:26 | brandtbucher | set | nosy:
+ brandtbucher |
2021-06-01 05:16:24 | xtreak | set | nosy:
+ xtreak messages: + msg394840 |
2021-06-01 05:08:47 | Kshitiz17 | create |