This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Filterer.filter can be rewritten using built-ins just as efficient much more readable
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: dwvisser, vinay.sajip, zach.ware
Priority: normal Keywords: patch

Created on 2019-05-30 15:49 by dwvisser, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_filter_perf.py dwvisser, 2019-05-31 13:26 The script used for performance comparisons
pr13683.diff dwvisser, 2019-05-31 18:03 GitHub Pull Request 13683, commit hash de38631
Pull Requests
URL Status Linked Edit
PR 13683 closed dwvisser, 2019-05-30 16:43
Messages (9)
msg343983 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-30 15:49
Alternative version of Filterer.filter(...) would look like this, which takes advantage of efficient Python 3.x built-ins, and is immediately understandable:

def _filter_callable(filter):
    return filter.filter if hasattr(filter, 'filter') else filter

def filter(self, record):
    filters = map(_filter_callable, self.filters)
    return all(f(record) for f in filters)

I will add a tested pull request on GitHub.
msg343985 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-05-30 16:01
This could simplify even further with the following:

def filter(self, record):
    return all(getattr(f, 'filter', f)(record) for f in self.filters)
msg343986 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-30 17:22
Great! My PR now uses getattr as @zach.ware has suggested.
msg344045 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2019-05-31 09:20
I grant that your version is more compact, but I don't find it more readable in general (perhaps it is for very experienced Python developers/developers who like a functional style, but I persoanlly don't think it improves readability).

Since the rationale proposed for making this change is "efficiency", I would like to see some real-world performance numbers to back this up. Can you please provide some quantitative data on the amount of speedup achievable in real-world scenarios?
msg344075 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 13:26
It is with great surprise and sadness that I report that I found that performance measurements don't support merging this change. I'm attaching my performance test script. I ran it with/without the logging filters for 1000000 iterations, e.g., ./python all 1000000

The results on my system were as follows, where the numbers may be interpreted as milliseconds per 1000 log messages:		
		
          No Filters    3 Filters
master    17.9          19.8
CBO-37101 19.6          23.1

I imagine I could put a guard 'if' statement that would restore the 'no filters' performance to match 'master', but my use of generators apparently causes the use of significantly more cycles than the existing code.
msg344076 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 13:45
Correction on that example of running the test script:

./python test_filter_perf.py all 1000000

I simply prepended this with the Linux `time` command for measuring.
msg344098 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 18:03
Adding the patch file associated with GitHub pull request #13683, as well as the performance test I just reported.
msg344130 - (view) Author: Dale Visser (dwvisser) * Date: 2019-05-31 21:42
FWIW, when I tried this instead, I got 21 to 22 msec per 1000 log messages (still a 5-10% performance hit):

def filter(self, record):
    rv = True
    if self.filters:
        rv = all(getattr(f, 'filter', f)(record) for f in self.filters)
    return rv

I don't see that getattr(...) should be less efficient than hasattr(...), so the generator expression is the likely culprit.
msg344171 - (view) Author: Dale Visser (dwvisser) * Date: 2019-06-01 13:08
I've learned a lot about the performance trade-offs of generator expressions. The only way of shortening this code (readability is subjective) that I've found not to negatively impact performance is this:

    def filter(self, record):
        rv = True
        for f in self.filters:
            if not getattr(f, 'filter', f)(record):
                rv = False
                break
        return rv
History
Date User Action Args
2022-04-11 14:59:16adminsetgithub: 81282
2019-06-01 13:08:51dwvissersetmessages: + msg344171
2019-05-31 21:42:22dwvissersetmessages: + msg344130
2019-05-31 18:03:01dwvissersetfiles: + pr13683.diff

messages: + msg344098
2019-05-31 13:45:40dwvissersetmessages: + msg344076
2019-05-31 13:26:16dwvissersetstatus: open -> closed
files: + test_filter_perf.py
messages: + msg344075

resolution: rejected
stage: patch review -> resolved
2019-05-31 09:20:07vinay.sajipsetmessages: + msg344045
2019-05-30 17:22:11dwvissersetmessages: + msg343986
2019-05-30 16:43:49dwvissersetkeywords: + patch
stage: patch review
pull_requests: + pull_request13570
2019-05-30 16:01:21zach.waresetnosy: + zach.ware

messages: + msg343985
versions: + Python 3.8, - Python 3.9
2019-05-30 15:51:03SilentGhostsetnosy: + vinay.sajip
2019-05-30 15:49:09dwvissercreate