_send_bytes() now looks a little complicated.

There is no need for a separate branch for n == 0. Computing header + buf when buf is b'' is fast (no slower than an additional n > 0 check), so this micro-optimization is not needed.

The chunks list is not needed either; we can just call self._send() directly. This gets rid of the small overhead of creating and iterating over a list.
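
Roughly what I have in mind is below. It is only a sketch: SketchConnection, the recording _send(), the "!i" header format and the 16384 threshold are assumptions made for illustration, not taken from the patch.

```python
import struct


class SketchConnection:
    """Hypothetical stand-in for a connection; records what _send() receives."""

    # Assumed size above which header and body are written separately.
    LARGE_THRESHOLD = 16384

    def __init__(self):
        self.sent = []

    def _send(self, data):
        # Stand-in for the real write: just record each call.
        self.sent.append(bytes(data))

    def _send_bytes(self, buf):
        n = len(buf)
        header = struct.pack("!i", n)  # assumed 4-byte length prefix
        if n > self.LARGE_THRESHOLD:
            # Large payload: two writes, so the body is not copied.
            self._send(header)
            self._send(buf)
        else:
            # Small payload, including n == 0: one concatenation, one write.
            # No special case for empty buf and no intermediate chunks list.
            self._send(header + buf)
```

With this shape an empty buffer simply goes through the small-payload branch and results in a single _send() call carrying only the header.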
