This one is so tiny that I'm not really sure we want to merge it…
=== Problem ===
`Objects/unicodetype_db.h` starts as follows:
```c
/* a list of unique character type descriptors */
const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = {
{0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 32},
{0, 0, 0, 0, 0, 48},
…
```
The first record (`{0, 0, 0, 0, 0, 0}`) is duplicated.
This is harmless, since the first occurrence is never used, but if we want to remove it, this is the ticket for it.
=== Detailed description ===
`Objects/unicodetype_db.h` is generated by `Tools/unicode/makeunicodedata.py` (I removed irrelevant lines):
```py
def makeunicodetype(unicode, trace):
    dummy = (0, 0, 0, 0, 0, 0)
    table = [dummy] # (1)
    cache = {0: dummy} # (2)
    for char in unicode.chars:
        # Things…
        item = (upper, lower, title, decimal, digit, flags)
        i = cache.get(item) # (3)
        if i is None:
            cache[item] = i = len(table)
            table.append(item)
        index[char] = i
- (1) - list which contains unique character properties (as `(upper, lower, title, decimal, digit, flags)` tuples)
- (2) - mapping from character properties to index in `table` - improperly initialized as a mapping from index to character properties
- (3) - we check if the current tuple is in `cache`
=== Result ===
The first time we reach a character whose properties are `(0, 0, 0, 0, 0, 0)` (code point 0, `NULL`), we check whether that tuple is in `cache`. It is not: the only entry maps index `0` to `(0, 0, 0, 0, 0, 0)`, which is the wrong way around. So the tuple is appended to `table` again and added to `cache`.
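The effect can be reproduced outside the generator with a minimal sketch. The `build_table` helper and the sample `items` list are hypothetical stand-ins for the real loop over `unicode.chars`; only the `table`/`cache` logic is taken from the excerpt in the detailed description.

```python
def build_table(items):
    """Hypothetical helper mimicking the table/cache logic of makeunicodetype."""
    dummy = (0, 0, 0, 0, 0, 0)
    table = [dummy]
    cache = {0: dummy}  # same initialization as (2): index -> record
    index = []
    for item in items:
        # The lookup key is a tuple, so the integer key 0 can never match it.
        i = cache.get(item)
        if i is None:
            cache[item] = i = len(table)
            table.append(item)
        index.append(i)
    return table, index


# Made-up sample records; the first one plays the role of code point 0.
items = [(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 32), (0, 0, 0, 0, 0, 0)]
table, index = build_table(items)
print(table)  # [(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 32)]
print(index)  # [1, 2, 1]
```

In this sketch the all-zero record ends up in `table` twice and index `0` is never handed out, which matches the duplicated first line of the generated header.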
=== Fix ===
On line `(2)` we should have `cache = {dummy: 0}`. After making this change we have to re-run `makeunicodedata.py`, and because every record index then shifts down by one, this simple change modifies a lot of lines.
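As a sanity check on the proposed initialization, here is a minimal sketch of the loop with `cache = {dummy: 0}`. The helper name `build_table_fixed` and the sample records are assumptions made for illustration, not code from `makeunicodedata.py`.

```python
def build_table_fixed(items):
    """Hypothetical helper: same loop, with the cache keyed by record."""
    dummy = (0, 0, 0, 0, 0, 0)
    table = [dummy]
    cache = {dummy: 0}  # proposed fix: record -> index
    index = []
    for item in items:
        i = cache.get(item)  # the all-zero tuple now hits the pre-seeded entry
        if i is None:
            cache[item] = i = len(table)
            table.append(item)
        index.append(i)
    return table, index


# Made-up sample records; the first one plays the role of code point 0.
items = [(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 32), (0, 0, 0, 0, 0, 0)]
table, index = build_table_fixed(items)
assert len(table) == len(set(table))  # every record appears exactly once
print(table)  # [(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 32)]
print(index)  # [0, 1, 0]
```

Each record appears once here, and index `0` is actually used by the all-zero record.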
I will submit a PR on GitHub in just a sec…