> Right now, the pattern is tokenize -> parse -> AST -> genbytecode ->
> peephole optimization (which disassembles the bytecode, analyzes it
> and rewrites it) -> final bytecode.   The correct pattern is tokenize
> -> parse -> AST -> optimize -> genbytecode -> peephole optimization
> with minimal disassembly, analysis, and opcode rewrites -> final bytecode.

Actually, optimizing on the AST is not ideal either. Ideally you would convert it into a specialized IR, preferably in SSA form and with explicit control flow.
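To make the "AST -> optimize" stage concrete, here is a minimal sketch of constant folding done as an AST pass with the stdlib ast module. It is only an illustration and not the patch itself: ConstantFolder and fold are hypothetical names, it handles only +, - and * on int/float literals, and it assumes Python 3.9+ for ast.unparse.

```python
import ast
import operator

# Hypothetical, deliberately tiny set of operators to fold.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _is_number(node):
    # Only plain int/float literals; bool, str, etc. are left alone.
    return isinstance(node, ast.Constant) and type(node.value) in (int, float)


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = _FOLDABLE.get(type(node.op))
        if op is not None and _is_number(node.left) and _is_number(node.right):
            try:
                value = op(node.left.value, node.right.value)
            except Exception:
                return node  # keep the original expression if evaluation fails
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold(source):
    """Parse source, fold constant arithmetic, and return the new source text."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(fold("x = 2 * 3 + y"))  # prints: x = 6 + y
```

A tree walk like this has no notion of control flow or of which definitions reach a given use, which is the limitation an SSA-style IR with explicit control flow would address.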

Re size saving: I ran make test with and without my patch and measured the total size of all generated pyc files:
without patch: 16_619_340
with patch: 16_467_867
So the saving is about 150 KB (151_473 bytes), or roughly 0.9% of the total size, not just a few bytes.
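For anyone who wants to repeat the measurement, something along these lines should do. It is a rough sketch, not the exact script used: total_pyc_size is a hypothetical helper, and it assumes all generated pyc files live under a single source tree root. The arithmetic at the end uses the two totals quoted above.

```python
import os


def total_pyc_size(root):
    """Return the combined size in bytes of all .pyc files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


# Totals reported above (bytes).
without_patch = 16_619_340
with_patch = 16_467_867

saved = without_patch - with_patch
print(saved, f"{saved / without_patch:.2%}")  # prints: 151473 0.91%
```

Run total_pyc_size on the checkout after make test, once for each build, and compare the two totals.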
