This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Setting logger.level directly has no effect due to caching in 3.7+
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: doughellmann, vinay.sajip, vstinner, zaneb
Priority: normal Keywords: patch

Created on 2019-08-14 19:46 by zaneb, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 15286 closed zaneb, 2019-08-14 19:55
Messages (9)
msg349741 - (view) Author: Zane Bitter (zaneb) * Date: 2019-08-14 19:46
This is a related issue to bug 34269, in the sense that it is also due to the caching added by the fix for bug 30962.

In this case, the bug is triggered by setting the public 'level' attribute of a logging.Logger object, instead of calling the setLevel() method. Although this was probably never a good idea, prior to Python3.7 it worked as expected. Now it renders the level out of sync with the cache, leading to inconsistent results that are hard to debug.

An example in the wild: https://review.opendev.org/676450
msg351400 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2019-09-09 10:02
That code in the wild that sets the level attribute directly is wrong and should be changed, right? The documentation has always been clear that setLevel() should be used. If we now take steps to support setting the level via attribute, isn't that encouraging bypassing the documented APIs? I'm not sure such misuse is widespread.

Using a property might seem a good solution, but there is a performance impact, so I am not keen on this change. I don't know what the real-world performance impact would be, but this simple script to look at the base access timing:

import timeit

class Logger(object):
    def __init__(self):
        self._levelprop = self.level = 0

    @property
    def levelprop(self):
        return self.level


def main():
    logger = Logger()
    print(timeit.timeit('logger.level', globals=locals()))
    print(timeit.timeit('logger.levelprop', globals=locals()))

if __name__ == '__main__':
    main()

gives, for example,

0.12630631799993353
0.4208384449998448
msg351472 - (view) Author: Zane Bitter (zaneb) * Date: 2019-09-09 14:11
> That code in the wild that sets the level attribute directly is wrong and should be changed, right?

It definitely should be changed (and in our case we did change it once we found it). Whether it's "wrong" is more of a philosophical question ;). But the problem with that is that it's very difficult to find - the object's state appears to be set OK but the behaviour doesn't match because of the hidden cache with different values.

> I'm not sure such misuse is widespread.

There's no way to be sure. But if you have a public (i.e. non-underscore-prfixed) attribute then some percentage of people are going to set it, particularly if it seems to work. I went back and looked at how this was introduced in the example I linked, and the patch was reviewed by the guy who literally wrote the book on the Python standard library. That suggests to me that this could happen to anyone. The code doesn't look wrong, so mistakes are going to happen.

> I don't know what the real-world performance impact would be

Me neither, but the good news is that the values are now cached :) So it doesn't impact the performance of logging a message at all. It will have some impact on the performance of rebuilding the cache, but I'd expect it to be negligible in the context of the other work required to rebuild the cache (which includes obtaining a lock).
msg351480 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2019-09-09 14:33
> There's no way to be sure. But if you have a public (i.e. non-underscore-prfixed) attribute then some percentage of people are going to set it, particularly if it seems to work.

Well, reading it isn't problematic, so there's no leading underscore. (The setXXX() code in logging predates Python 2 and properties).

> The code doesn't look wrong, so mistakes are going to happen.

That's true for lots of code. Sure, mistakes will happen, and that's the price paid when one doesn't follow documentation. It feels wrong to do this as a band-aid to help out people who didn't do the right thing.

> but the good news is that the values are now cached

That's not relevant to the performance numbers I posted above, is it? In that code snippet, that's a single constant value (0) that's being returned either via an attribute, or via a property.
msg351532 - (view) Author: Zane Bitter (zaneb) * Date: 2019-09-09 16:51
> It feels wrong to do this as a band-aid to help out people who didn't do the right thing.

I mean... are we here to punish people for making mistakes by sending them to schroedinbug purgatory, or are we here to help them upgrade to the latest Python without feeling like it's a minefield?

I genuinely don't know what Python's stance on this is. Certainly the Linux kernel and Windows both have a philosophy of not breaking working userspace code no matter how wrong it might be, so it's not unheard-of to choose the latter.

I suggested on the PR that we could deprecate the setter so that people who are Doing It Wrong will get a warning that they can much more easily track down and fix, and we won't encourage any wrong new code. Would that change your opinion?

> > but the good news is that the values are now cached

> That's not relevant to the performance numbers I posted above, is it?

Not directly - your performance numbers show that accessing the attribute is ~300ns slower when it's a property. But the thing that gets called all the time that we want to be fast is isEnabledFor(), and the results of that are cached so it only calls getEffectiveLevel() (which is the thing accessing the attribute) once.

Note that populating the cache is already considerably more expensive than the previous path (it requires handling an exception and acquiring a lock), but it's still worth it because the cost is typically amortised across many log calls. The addition of an extra property lookup in this path is not going to be noticeable.
msg351597 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2019-09-10 09:03
> are we here to punish people

No, that's not what I'm saying. But it's not a priority to support situations where people don't follow published APIs and still expect things to work the way they would like. I would prefer people to use setLevel(), as it helps to have consistent usage.

> Certainly the Linux kernel and Windows both have a philosophy of not breaking working userspace code

It's Python's philosophy, too. The philosophy is that of not breaking userspace code *which follows published APIs*. If you use unpublished APIs or rely on implementation details, you're generally on your own. That's the situation here, in reference to code that's out in the wild setting the attribute directly rather than via setLevel().
msg352893 - (view) Author: Zane Bitter (zaneb) * Date: 2019-09-20 21:17
In turns out that setting the level directly is used in the standard library, so we definitely have a bug here that needs fixing in 3.7 & 3.8.

Obviously the fix could be just to stop doing that in the standard library, but I'd argue that this is even stronger evidence that it can happen to anyone, and that we should fix it for Python users in general.

> But it's not a priority to support situations where people don't follow published APIs and still expect things to work the way they would like.

That's fair, of course. This isn't and shouldn't be anyone's number 1 priority, but that doesn't mean we can't do it.

> I would prefer people to use setLevel(), as it helps to have consistent usage.

Me too. I added a new patch to the PR that deprecates the setter so that anyone doing it wrong will get a warning. (In fact that's how I discovered it was being set directly in the standard library.)

> The philosophy is that of not breaking userspace code *which follows published APIs*. If you use unpublished APIs or rely on implementation details, you're generally on your own.

I was referring to stuff like https://devblogs.microsoft.com/oldnewthing/20031015-00/?p=42163 (I won't link to a famous equivalent in Linux because it involves Linus swearing at people), where the users are clearly wrong but they have a policy of not breaking them anyway.
msg352959 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2019-09-22 03:30
> Obviously the fix could be just to stop doing that in the standard library

Well, that's the right fix, and thanks for spotting it. I've applied those changes (42b6c5d and 19c42fb) to the master, 3.8 and 3.7 branches.
msg371678 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-16 15:54
Vinay:
> That code in the wild that sets the level attribute directly is wrong and should be changed, right? The documentation has always been clear that setLevel() should be used. If we now take steps to support setting the level via attribute, isn't that encouraging bypassing the documented APIs? I'm not sure such misuse is widespread.

I understood that Zane Bitter opened this issue *because* people misuses the logging API.


Vinay:
> (...) Sure, mistakes will happen, and that's the price paid when one doesn't follow documentation. It feels wrong to do this as a band-aid to help out people who didn't do the right thing.

setLevel() enforces consistency: it calls _checkLevel() to convert string to integer. Also, Logger.setLevel() also invalidates caches.

Having a way to update the level without invalidating caches sounds like a bug to me.

--

Zane proposed to deprecate setting the level attribute in PR 15286. I dislike this idea. It's kind of weird to be able to *read* a public attribute but not to *set* it. Either the attribute should be removed by adding a getLevel() method (and deprecate the attribute and then remove it), or it should be possible to get and set it.

I'm in favor of not deprecating "logger.level = level". Just wrap it into a property silently.

--

I rewrote the microbenchmark using pyperf:
---
import pyperf

class Logger(object):
    def __init__(self):
        self._levelprop = self.level = 0

    @property
    def levelprop(self):
        return self.level


logger = Logger()
ns = {'logger': logger}
runner = pyperf.Runner()
runner.timeit('read attribute', 'logger.level', globals=ns)
runner.timeit('read property', 'logger.levelprop', globals=ns)
---

Result:

* read attribute: Mean +- std dev: 38.9 ns +- 0.8 ns
* read property: Mean +- std dev: 104 ns +- 3 ns

Reading a property is 2.7x slower than reading an attribute. Well, that's not surprising to have an overhead. But the absolute numbers remain reasonable. We're talking about 100 ns, not 100 ms.

IMHO the overhead is reasonable in 3rd party code. Inside the logging module, only the private _level attribute should be used and so there is no overhead.
History
Date User Action Args
2022-04-11 14:59:19adminsetgithub: 82038
2020-06-16 15:54:37vstinnersetnosy: + vstinner
messages: + msg371678
2020-06-15 14:43:00doughellmannsetnosy: + doughellmann
2019-09-22 03:30:37vinay.sajipsetmessages: + msg352959
2019-09-20 21:17:02zanebsetmessages: + msg352893
2019-09-10 09:03:27vinay.sajipsetmessages: + msg351597
2019-09-09 16:51:39zanebsetmessages: + msg351532
2019-09-09 14:33:10vinay.sajipsetmessages: + msg351480
2019-09-09 14:11:30zanebsetmessages: + msg351472
2019-09-09 10:02:05vinay.sajipsetmessages: + msg351400
2019-08-14 19:55:48zanebsetkeywords: + patch
stage: patch review
pull_requests: + pull_request15008
2019-08-14 19:52:49SilentGhostsetnosy: + vinay.sajip
2019-08-14 19:46:10zanebcreate