Message 349020 - Python tracker


Author	Greg Price
Recipients	Greg Price, ezio.melotti, vstinner
Date	2019-08-05.03:55:34
Message-id	<1564977335.6.0.0909548880632.issue37760@roundup.psfhosted.org>

Content

I spent some time yesterday on #18236, and I have a patch for it.

Most of that work happens in the script Tools/unicode/makeunicodedata.py, and along the way I made several changes there that made it somewhat nicer to work on and that I think will help other people reading the script too.  I'd like to try to merge those improvements first.

The main changes are:

 * As the script has grown over the years, it's gained many copies and reimplementations of logic to parse the standard format of the Unicode character database.  I factored those out into a single place, which makes the parsing code shorter and the interesting parts stand out more easily.

 * The main per-character record type in the script's data structures is a length-18 tuple.  Using the magic of dataclasses, I converted this so that e.g. the code says `record.numeric_value` instead of `record[8]`.
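As a rough illustration of the two changes above (not the actual patch), here is a minimal sketch.  The names `parse_ucd_lines` and `CharRecord` are hypothetical, the record keeps only four of the 18 fields, and the sample lines are a hand-written excerpt in the style of UnicodeData.txt with a comment line added to exercise the parser:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


def parse_ucd_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the semicolon-separated fields of each data line.

    Hypothetical helper: one shared place for the common UCD file
    conventions ('#' starts a comment, ';' separates fields).
    """
    for line in lines:
        # Drop any trailing comment, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        yield [field.strip() for field in line.split(";")]


@dataclass
class CharRecord:
    # Hypothetical subset of the per-character fields; the real
    # record described above has 18 of them.
    codepoint: str
    name: str
    general_category: str
    numeric_value: str


# Hand-written sample input, assumed for illustration only.
sample = [
    "# excerpt in UnicodeData.txt style",
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<compat> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
]

records = [
    # fields[8] is the numeric value, matching the `record[8]` example above.
    CharRecord(
        codepoint=fields[0],
        name=fields[1],
        general_category=fields[2],
        numeric_value=fields[8],
    )
    for fields in parse_ucd_lines(sample)
]

for record in records:
    print(record.codepoint, record.name, record.numeric_value)
```

With a named field, a reader no longer needs to remember that index 8 means the numeric value, and a mistyped attribute name fails loudly instead of silently reading the wrong column.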

There's no radical restructuring or rewrite here; this script has served us well.  I've kept these changes focused on places where the ratio of value (future ease of development) to cost (a reviewer's effort as well as mine) is high.

I'll send PRs of my changes shortly.

History
Date	User	Action	Args
2019-08-05 03:55:35	Greg Price	set	recipients: + Greg Price, vstinner, ezio.melotti
2019-08-05 03:55:35	Greg Price	set	messageid: <1564977335.6.0.0909548880632.issue37760@roundup.psfhosted.org>
2019-08-05 03:55:35	Greg Price	link	issue37760 messages
2019-08-05 03:55:34	Greg Price	create