Message 385033 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	Mark.Shannon
Recipients	Mark.Shannon
Date	2021-01-13.15:03:23
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1610550204.0.0.923150494092.issue42926@roundup.psfhosted.org>
In-reply-to

Content
Currently the compiler operates in three main passes: Code-gen Optimize Assemble The problem is that these passes use the same basic-block based CFG, leading to unnecessary coupling and inefficiencies. A basic block CFG is awkward and error-prone for the code-gen, but not very efficient for the optimizer and assembler. A better design would be for the code-gen to create a single linear sequence of instructions. The optimizer would take this and produce a list of extended-blocks for the assembler to consume. code-gen -> (list of instructions) -> optimizer optimizer -> (list of extended blocks) -> assembler (Extended blocks have a single entry and multiple exits, unlike basic blocks which have a single entry and single exit) This would: 1. Reduce memory use considerably (the size of instruction and block data structures would be about 60% of current) 2. Be faster (Less CFG management). 3. Produce better code (extended blocks are a better unit for optimization that basic blocks). 4. Be easier to maintain: a) Code-gen wouldn't have to worry about creating a correct CFG. b) The optimizer wouldn't need to handle empty blocks and track which basic blocks form an extended block. Apart from the changes to the compiler, it would help if we made all branch instructions absolute (or have a backward dual) to accommodate free reordering of blocks in the optimizer.

Currently the compiler operates in three main passes:

Code-gen
Optimize
Assemble

The problem is that these passes use the same basic-block based CFG, leading to unnecessary coupling and inefficiencies.
A basic block CFG is awkward and error-prone for the code-gen, but not very efficient for the optimizer and assembler.

A better design would be for the code-gen to create a single linear sequence of instructions. The optimizer would take this and produce a list of extended-blocks for the assembler to consume.

code-gen -> (list of instructions) -> optimizer
optimizer -> (list of extended blocks) -> assembler

(Extended blocks have a single entry and multiple exits, unlike basic blocks which have a single entry and single exit)

This would:
1. Reduce memory use considerably (the size of instruction and block data structures would be about 60% of current)
2. Be faster (Less CFG management).
3. Produce better code (extended blocks are a better unit for optimization that basic blocks).
4. Be easier to maintain:
  a) Code-gen wouldn't have to worry about creating a correct CFG.
  b) The optimizer wouldn't need to handle empty blocks and track which basic blocks form an extended block.


Apart from the changes to the compiler, it would help if we made all branch instructions absolute (or have a backward dual) to accommodate free reordering of blocks in the optimizer.

History
Date	User	Action	Args
2021-01-13 15:03:24	Mark.Shannon	set	recipients: + Mark.Shannon
2021-01-13 15:03:23	Mark.Shannon	set	messageid: <1610550204.0.0.923150494092.issue42926@roundup.psfhosted.org>
2021-01-13 15:03:23	Mark.Shannon	link	issue42926 messages
2021-01-13 15:03:23	Mark.Shannon	create