classification
Title: remove the 'loop' parameter from __init__ in all classes in asyncio.locks
Type: Stage: patch review
Components: asyncio Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: aeros, asvetlov, uriyyo, yselivanov
Priority: normal Keywords: patch

Created on 2020-11-17 19:04 by yselivanov, last changed 2020-11-20 13:02 by uriyyo.

Pull Requests
URL Status Linked Edit
PR 23420 open uriyyo, 2020-11-20 13:02
Messages (16)
msg381275 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-17 19:04
asyncio.Lock and other primitives should no longer accept the `loop` parameter. They should also stop storing the current loop in the `self._loop` attribute. Instead, they should use `get_running_loop()` whenever they need to access the loop.
msg381276 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-17 19:05
(or alternatively they can cache the running `loop`, but the first loop lookup should be performed with `asyncio.get_running_loop()`)
msg381277 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-17 19:05
Kyle, lmk if you want to work on this.
msg381278 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-11-17 19:20
This was on my to-do list but I very much appreciate if somebody champions this issue.
I should finish sslproto PR first.
msg381304 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-11-17 23:20
Sure, I would be interested in helping with this. Although if a newer contributor takes it up before I'm able to, I wouldn't be opposed to that either (my main priority at the moment is helping with PEP 594 since it's a concrete goal of my internship w/ the PSF, but I should have some time to work on this as well).

As far as I can tell though, there's currently a similar PR open: https://github.com/python/cpython/pull/18195 . This attempts to deprecate the loop argument and creating primitives such as asyncio.Lock outside of a running event loop with the following approach:

```
def __init__(self, *, loop=None):
        self._waiters = None
        self._locked = False
        if loop is None:
            self._loop = events._get_running_loop()
            if self._loop is None:
                warnings.warn("The creation of asyncio objects without a running "
                              "event loop is deprecated as of Python 3.9.",
                              DeprecationWarning, stacklevel=2)
                self._loop = events.get_event_loop()
        else:
                warnings.warn("The loop argument is deprecated since Python 3.8, "
                          "and scheduled for removal in Python 3.10.",
                          DeprecationWarning, stacklevel=2) 
```

So, do we want to add more strictness to that with always using `get_running_loop()` to access the event loop each time instead of accessing self._loop, and effectively ignore the user-supplied one? Presumably, this would start with a warning for passing a *loop* arg and then be removed entirely as a parameter ~two versions later.

> (or alternatively they can cache the running `loop`, but the first loop lookup should be performed with `asyncio.get_running_loop()`)

AFAIK, at the C-API extension level, get_running_loop() already caches the running loop in `cached_running_holder`. (https://github.com/python/cpython/blob/9c98e8cc3ebf56d01183c67adbc000ed19b8e0f4/Modules/_asynciomodule.c#L232). So from a performance perspective, wouldn't it effectively be the same if we repeatedly use `get_running_loop()` to access the same event loop? I think it also adds a nice integrity check to be certain that the primitive wasn't initially created within a running event loop, and then later accessed outside of one.

The only concern that I can see with this approach is that users could potentially create a primitive in one running event loop and then access it in a separate loop running in a different thread (without using something like self._loop, the primitive would no longer be associated with a specific event loop and could potentially used within *any* running event loop). I'm not entirely certain if that is a real problem though, and if anything, it seems like it could prove to be useful in some multi-loop environment. I just want to make sure that it's intended.
msg381305 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-17 23:40
Oh my.  FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its __init__ shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first `await lock.acquire()` is called.

I'm writing a piece of code right now that would need to jump through the hoops to simply create a new `asyncio.Lock()` in a place where there's no asyncio loop yet.
msg381306 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-11-18 00:08
> Oh my.  FWIW I think that we need to implement this differently. I don't think it matters where, say, an asyncio.Lock was instantiated. It can be created anywhere. So IMO its __init__ shouldn't try to capture the current loop -- there's no need for that. The loop can be and should be captured when the first `await lock.acquire()` is called.

That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to `await lock.acquire()` on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.

> I'm writing a piece of code right now that would need to jump through the hoops to simply create a new `asyncio.Lock()` in a place where there's no asyncio loop yet.

From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the __init__ for some class prior to using the event loop, which would fail with usage of `get_running_loop()` in the __init__ for the primitives. So, if it's not an issue to wait until accessing the event loop until it's actually needed (e.g. in the lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call `get_running_loop()` going forward to ensure we're not imposing arbitrary inconveniences on users.
msg381310 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-18 01:43
> That's good to know and I think more convenient to work with, so +1 from me. I guess my remaining question though is whether it's okay to `await lock.acquire()` on a single lock instance from multiple different running event loops (presumably each in their own separate threads) or if there's a need to restrict it to only one event loop.

No, it's not OK to use one lock across multiple loops at the same time. But most asyncio code out there doesn't have protections against that, and we never advertised that having multiple loops run in parallel is a good idea.  So while we could build protections against that, I'm not sure its needed. 

Andrew, thoughts?


> From what I've seen of asyncio user code, it seems reasonably common to create async primitives (lock, semaphore, queue, etc.) in the __init__ for some class prior to using the event loop, which would fail with usage of `get_running_loop()` in the __init__ for the primitives. So, if it's not an issue to wait until accessing the event loop until it's actually needed (e.g. in the lock.acquire() or queue.get/put()), I think we should definitely try to be conscious about when we call `get_running_loop()` going forward to ensure we're not imposing arbitrary inconveniences on users.

Yep. This sums up how I think of this now.
msg381329 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-11-18 12:15
My initial thought was protecting the Lock (and other primitives) creation when a loop is not running.

Yuri insists that Lock can be created without a loop. Technically it is possible, sure.
But the lock is tightly coupled with a loop instance. In other words, the loop belongs to the loop.
The lock cannot be used after the loop dies (stopped and closed).

Thus, the global scoped lock object still looks suspicious in my eyes.
The lock's lifecycle should be closer than the lifecycle of the loop, isn't it? I know, asyncio.Lock() can safely live after the loop closing but should we encourage this style? There are many other asyncio objects like HTTP clients and DB drivers that should be closed before the loop finishing for graceful closing TCP transports etc.

Another thing to consider is: whether to cache a loop inside a lock;  whether to add a check when the lock is used by two loops?

I think for the last question the answer is "yes". I recall many questions and bug reports on StackOverflow and aiohttp bug tracker when people use the multithreaded model for some reason and tries to get access to a shared object from different threads that executes each own loop.

The check becomes extremely important for synchronization primitives like asyncio.Lock class; threading.Lock is supposed to be shared between threads and users can apply the same pattern for asyncio.Lock by oversight.
Also, supporting global asyncio.Lock instances increase the chance that the lock is used by different loops.
msg381370 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-18 19:20
Yeah, I follow the reasoning.

My use case:

   class Something:
      def __init__(self):
          self._lock = asyncio.Lock()

      async def do_something():
          async with self._lock:
              ...

And `Something` won't be created in a coroutine. So now I have to jump through the hoops to implement lazy lock instantiation.


> But the lock is tightly coupled with a loop instance. In other words, the loop belongs to the loop.
>The lock cannot be used after the loop dies (stopped and closed).

I agree. Maybe the solution should be this then:


    class asyncio.Lock:

       def _get_loop(self):
           loop = asyncio.get_running_loop()
           if self._loop is None:
                self._loop = loop
           if loop is not self._loop: raise
           if not loop.is_running(): raise

       async def acquire(self):
           loop = self._get_loop()
           ...

This is what would guarantee all protections you want and would also allow to instantiate `asyncio.Lock` in class constructors, simplifying code.
msg381372 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-11-18 19:34
Despite the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups), 
I can live with the proposal.

It doesn't make a system worse at least and backward compatible.
We can return to the idea of raising a warning from the constructor later, on collecting more feedback.

P.S.
There is a subtle non-deterministic behavior in the proposal: if the lock is accessed from a concurrent thread, the exception about wrong usage will be raised later at an arbitrary code point.
This is a well-known problem of the lazy initialization pattern and maybe we should do nothing with it. The programming is a compromise always.
msg381382 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-18 23:43
> Despite the fact that asyncio.get_running_loop() never returns None but raises RuntimeError (and maybe other tiny cleanups), 

It never raises when called from inside a coroutine. And I propose to do that. So it will never fail.
msg381385 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-11-19 00:18
Regarding the example _get_loop():

```
       def _get_loop(self):
           loop = asyncio.get_running_loop()
           if self._loop is None:
                self._loop = loop
           if loop is not self._loop: raise
           if not loop.is_running(): raise
```

Would this be added to all asyncio primitives to be called anytime a reference to the loop is needed within a coroutine?

Also, regarding the last line "if not loop.is_running(): raise" I'm not 100% certain that I understand the purpose. Wouldn't it already raise a RuntimeError from `asyncio.get_running_loop()` if the event loop wasn't running?

The only thing I can think of where it would have an effect is if somehow the event loop was running at the start of `_get_loop()` and then the event loop was stopped (e.g. a loop in an alternative thread was stopped by the main thread while the alternative thread was in the middle of executing `_get_loop()`). But to me, that seems to be enough of an edge case to simplify it to the following:


```
       def _get_loop(self):
           loop = asyncio.get_running_loop()
           if self._loop is None:
                self._loop = loop
           if loop is not self._loop: raise
```

(Unless you intended for the first line `loop = asyncio.get_running_loop()` to instead be using the private `asyncio._get_running_loop()`, which returns None and doesn't raise. In that case, the original would be good to me.)

Other than that, I think the approach seems solid.
msg381389 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-11-19 06:58
Perhaps Kyle is right, I had a misunderstanding with `get_running_loop()` vs `_get_running_loop()`.

The last version seems good except for the rare chance of race condition.

The safe code can look like:

global_lock = threading.Lock()  like GIL

       def _get_loop(self):
           loop = asyncio.get_running_loop()
           if self._loop is None:
                # the lock is required because
                # the thread switch can happen
                # between `self._loop is None` check
                # and `self._loop = loop` assignment
                with global_lock:
                    if self._loop is not None:
                        self._loop = loop
           if loop is not self._loop: raise

The alternative is using the fast C atomic `compare_and_swap` function
which is executed under the hold GIL.
We need the pure-Python fallback anyway.

Multithreading is hard...
msg381451 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-11-19 18:17
> The safe code can look like:
>
> global_lock = threading.Lock()  like GIL

SGTM. We can use this strategy for all synchronization primitives and for objects like asyncio.Queue.
msg381456 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-11-19 21:17
Perfect!

We have a consensus now and waiting for a champion who propose a Pull Request.
History
Date User Action Args
2020-11-20 13:02:31uriyyosetkeywords: + patch
nosy: + uriyyo

pull_requests: + pull_request22313
stage: patch review
2020-11-19 21:17:38asvetlovsetmessages: + msg381456
2020-11-19 18:17:52yselivanovsetmessages: + msg381451
2020-11-19 06:58:16asvetlovsetmessages: + msg381389
2020-11-19 00:18:23aerossetmessages: + msg381385
2020-11-18 23:43:30yselivanovsetmessages: + msg381382
2020-11-18 19:34:21asvetlovsetmessages: + msg381372
2020-11-18 19:20:35yselivanovsetmessages: + msg381370
2020-11-18 12:15:19asvetlovsetmessages: + msg381329
2020-11-18 01:43:42yselivanovsetmessages: + msg381310
2020-11-18 00:08:23aerossetmessages: + msg381306
2020-11-17 23:40:08yselivanovsetmessages: + msg381305
2020-11-17 23:20:57aerossetmessages: + msg381304
2020-11-17 19:20:30asvetlovsetmessages: + msg381278
2020-11-17 19:05:35yselivanovsetnosy: + aeros
messages: + msg381277
2020-11-17 19:05:18yselivanovsetmessages: + msg381276
2020-11-17 19:04:05yselivanovcreate