Issue 37101: Filterer.filter can be rewritten using built-ins just as efficient much more readable

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/81282

classification

Title:	Filterer.filter can be rewritten using built-ins just as efficient much more readable
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.8

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	dwvisser, vinay.sajip, zach.ware
Priority:	normal	Keywords:	patch

Created on 2019-05-30 15:49 by dwvisser, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
test_filter_perf.py	dwvisser, 2019-05-31 13:26	The script used for performance comparisons
pr13683.diff	dwvisser, 2019-05-31 18:03	GitHub Pull Request 13683, commit hash de38631

Pull Requests
URL	Status	Linked	Edit
PR 13683	closed	dwvisser, 2019-05-30 16:43

Messages (9)
msg343983 - (view)	Author: Dale Visser (dwvisser) *	Date: 2019-05-30 15:49
Alternative version of Filterer.filter(...) would look like this, which takes advantage of efficient Python 3.x built-ins, and is immediately understandable: def _filter_callable(filter): return filter.filter if hasattr(filter, 'filter') else filter def filter(self, record): filters = map(_filter_callable, self.filters) return all(f(record) for f in filters) I will add a tested pull request on GitHub.
msg343985 - (view)	Author: Zachary Ware (zach.ware) *	Date: 2019-05-30 16:01
This could simplify even further with the following: def filter(self, record): return all(getattr(f, 'filter', f)(record) for f in self.filters)
msg343986 - (view)	Author: Dale Visser (dwvisser) *	Date: 2019-05-30 17:22
Great! My PR now uses getattr as @zach.ware has suggested.
msg344045 - (view)	Author: Vinay Sajip (vinay.sajip) *	Date: 2019-05-31 09:20
I grant that your version is more compact, but I don't find it more readable in general (perhaps it is for very experienced Python developers/developers who like a functional style, but I persoanlly don't think it improves readability). Since the rationale proposed for making this change is "efficiency", I would like to see some real-world performance numbers to back this up. Can you please provide some quantitative data on the amount of speedup achievable in real-world scenarios?
msg344075 - (view)	Author: Dale Visser (dwvisser) *	Date: 2019-05-31 13:26
It is with great surprise and sadness that I report that I found that performance measurements don't support merging this change. I'm attaching my performance test script. I ran it with/without the logging filters for 1000000 iterations, e.g., ./python all 1000000 The results on my system were as follows, where the numbers may be interpreted as milliseconds per 1000 log messages: No Filters 3 Filters master 17.9 19.8 CBO-37101 19.6 23.1 I imagine I could put a guard 'if' statement that would restore the 'no filters' performance to match 'master', but my use of generators apparently causes the use of significantly more cycles than the existing code.
msg344076 - (view)	Author: Dale Visser (dwvisser) *	Date: 2019-05-31 13:45
Correction on that example of running the test script: ./python test_filter_perf.py all 1000000 I simply prepended this with the Linux `time` command for measuring.
msg344098 - (view)	Author: Dale Visser (dwvisser) *	Date: 2019-05-31 18:03
Adding the patch file associated with GitHub pull request #13683, as well as the performance test I just reported.
msg344130 - (view)	Author: Dale Visser (dwvisser) *	Date: 2019-05-31 21:42
FWIW, when I tried this instead, I got 21 to 22 msec per 1000 log messages (still a 5-10% performance hit): def filter(self, record): rv = True if self.filters: rv = all(getattr(f, 'filter', f)(record) for f in self.filters) return rv I don't see that getattr(...) should be less efficient than hasattr(...), so the generator expression is the likely culprit.
msg344171 - (view)	Author: Dale Visser (dwvisser) *	Date: 2019-06-01 13:08
I've learned a lot about the performance trade-offs of generator expressions. The only way of shortening this code (readability is subjective) that I've found not to negatively impact performance is this: def filter(self, record): rv = True for f in self.filters: if not getattr(f, 'filter', f)(record): rv = False break return rv

History
Date	User	Action	Args
2022-04-11 14:59:16	admin	set	github: 81282
2019-06-01 13:08:51	dwvisser	set	messages: + msg344171
2019-05-31 21:42:22	dwvisser	set	messages: + msg344130
2019-05-31 18:03:01	dwvisser	set	files: + pr13683.diff messages: + msg344098
2019-05-31 13:45:40	dwvisser	set	messages: + msg344076
2019-05-31 13:26:16	dwvisser	set	status: open -> closed files: + test_filter_perf.py messages: + msg344075 resolution: rejected stage: patch review -> resolved
2019-05-31 09:20:07	vinay.sajip	set	messages: + msg344045
2019-05-30 17:22:11	dwvisser	set	messages: + msg343986
2019-05-30 16:43:49	dwvisser	set	keywords: + patch stage: patch review pull_requests: + pull_request13570
2019-05-30 16:01:21	zach.ware	set	nosy: + zach.ware messages: + msg343985 versions: + Python 3.8, - Python 3.9
2019-05-30 15:51:03	SilentGhost	set	nosy: + vinay.sajip
2019-05-30 15:49:09	dwvisser	create