Author spenczar
Recipients asvetlov, josh.r, kj, spenczar, yselivanov
Date 2021-02-05.19:52:14
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1612554734.9.0.110196707322.issue43119@roundup.psfhosted.org>
In-reply-to
Content
Josh,

> Making literally every await equivalent to:
> 
> await asyncio.sleep(0)
> 
> followed by the actual await (which is effectively what you're proposing when you expect all await to be preemptible) means adding non-trivial overhead to all async operations (asyncio is based on system calls of the select/poll/epoll/kpoll variety, which add meaningful overhead when we're talking about an operation that is otherwise equivalent to an extremely cheap simple collections.deque.append call).

A few things:

First, I don't think I proposed that. I was simply saying that my expectations on behavior were incorrect, which points towards documentation.

Second, I don't think making a point "preemptible" is the same as actually executing a cooperative-style yield to the scheduler. I just expected that it would always be in the cards - that it would always be a potential point where I'd get scheduled away.

Third, I don't think that await asyncio.sleep(0) triggers a syscall, but I certainly could be mistaken. It looks to me like it is special-cased in asyncio, from my reading of the source. Again - could be wrong.

Fourth, I think that the idea of non-cooperative preempting scheduling is not nearly as bizarre as you make it sound. There's certainly plenty of prior art on preemptive schedulers out there. Go uses a sort of partial preemption at function call sites *because* it's a particularly efficient way to do things.

But anyway - I didn't really want to discuss this. As I said above, it's obviously a way way way bigger design discussion than my specific issue.


> It also breaks many reasonable uses of asyncio.wait and asyncio.as_completed, where the caller can reasonably expect to be able to await the known-complete tasks without being preempted (if you know the coroutine is actually done, it could be quite surprising/problematic when you await it and get preempted, potentially requiring synchronization that wouldn't be necessary otherwise).

I think this cuts both ways. Without reading the source code of asyncio.Queue, I don't see how it's possible to know whether its put method yields. Because of this, I tend to assume synchronization is necessary everywhere. The way I know for sure that a function call can complete without yielding is supposed to be that it isn't an `async` function, right? That's why asyncio.Queue.put_nowait exists and isn't asynchronous.

> In real life, if whatever you're feeding the queue with is infinite and requires no awaiting to produce each value, you should probably just avoid the queue and have the consumer consume the iterable directly.

The stuff I'm feeding the queue doesn't require awaiting, but I *wish* it did. It's just a case of not having the libraries for asynchronicity yet on the source side. I was hoping that the queue would let me pace my work in a way that would let me do more concurrent work.

> Or just apply a maximum size to the queue; since the source of data to put is infinite and not-awaitable, there's no benefit to an unbounded queue, you may as well use a bound roughly fitted to the number of consumers, because any further items are just wasting memory well ahead of when it's needed.

The problem isn't really that put doesn't yield for unbounded queues - it's that put doesn't yield *unless the queue is full*. That means that, if I use a very high maximum size for the queue, I'll still spend a big chunk of time filling up the queue, and only then will consumers start doing work.

I could pick a small queue bound, but then I'm more likely to waste time doing nothing if consumers are slower than the producer - I'll sit there with a full-but-tiny queue. Work-units in the queue can take wildly different amounts of time, so consumers will often be briefly slow, so the producer races ahead - until it hits its tiny limit. But then new work units arrive, and so the consumers are fast again - and they're quickly starved for work because the producer didn't build a good backlog.

So, the problem still remains, if work takes an uncertain amount of time which would seem to be the common reason for using a queue in the first place.

> Point is, regular queue puts only block (and potentially release the GIL early) when they're full or, as a necessary consequence of threading being less predictable than asyncio, when there is contention on the lock protecting the queue internals (which is usually resolved quickly); why would asyncio queues go out of their way to block when they don't need to?

I think you have it backwards. asyncio.Queue.put *always* blocks other coroutines' execution for unbounded queues. Why do they always block? If I wanted that, I wouldn't use anything in asyncio.Queue. I'd just use a collections.deque.
History
Date User Action Args
2021-02-05 19:52:14spenczarsetrecipients: + spenczar, asvetlov, yselivanov, josh.r, kj
2021-02-05 19:52:14spenczarsetmessageid: <1612554734.9.0.110196707322.issue43119@roundup.psfhosted.org>
2021-02-05 19:52:14spenczarlinkissue43119 messages
2021-02-05 19:52:14spenczarcreate