NP.  I have another PR in the pipeline:

Both optimizations together make your benchmark run 30% faster on 3.7.  If you compile asyncio.gather() with Cython, you will get another 5-15%.  If you use uvloop, another 10-20% on top of that.
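If you want to check numbers like these locally, here is a rough sketch of a gather() micro-benchmark.  It is not the benchmark from this thread: `noop`, `run_gather`, and `bench` are made-up names, the sizes are arbitrary, and uvloop is treated as optional (assumed to be installed separately).

```python
import asyncio
import time


async def noop(i):
    # Trivial coroutine with no I/O, so timing is dominated by scheduling overhead.
    return i


async def run_gather(n):
    # Gather n trivial coroutines; results come back in submission order.
    return await asyncio.gather(*(noop(i) for i in range(n)))


def bench(n=1000, rounds=20, loop_factory=asyncio.new_event_loop):
    # Hypothetical helper: total wall-clock seconds for `rounds` gather() calls
    # on a fresh event loop created by `loop_factory`.
    loop = loop_factory()
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            results = loop.run_until_complete(run_gather(n))
        elapsed = time.perf_counter() - start
    finally:
        loop.close()
    assert results == list(range(n))
    return elapsed


if __name__ == "__main__":
    print("asyncio:", bench())
    try:
        import uvloop  # optional third-party dependency
    except ImportError:
        uvloop = None
    if uvloop is not None:
        print("uvloop: ", bench(loop_factory=uvloop.new_event_loop))
```

This only measures callback scheduling for coroutines that finish immediately, so it says nothing about network IO.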

If it's still slower than asynq, then the difference must be in how asynq schedules its callbacks; its scheduling might be better suited to some specific use cases than asyncio's.

FWIW I don't expect asynq to be any faster than asyncio (or than uvloop) for network IO.  And there's definitely no problem with async/await performance -- we're optimizing asyncio here, not the interpreter.
