Title: Add block info to unicodedata
Type: enhancement Stage: needs patch
Components: Unicode Versions: Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Denis Jacquerye, ezio.melotti, flying sheep, vstinner
Priority: normal Keywords:

Created on 2014-10-11 17:58 by flying sheep, last changed 2022-04-11 14:58 by admin.

Messages (2)
msg229101 - Author: (flying sheep) * Date: 2014-10-11 17:58
See also #6331.

The repo contains pretty much the functionality I'd like to see: a way to access information about all blocks, and a way to get the name of the block a character is in.

I propose to include something very similar to those two APIs in unicodedata:

unicodedata.Block: class with start, end, and name properties.

Its __contains__ should work for single-character strings (testing whether that character is in the block) and for ints (testing whether the code point is in the block).

Maybe make it iterable over its characters?

unicodedata.blocks: OrderedDict mapping str (block name) → Block object, ordered by Block.start.

Then blocks.keys() would yield the names in order, and blocks.values() the Block objects in order.

unicodedata.block_of(chr, name_only=False): returns the Block object for which “chr in block” is True, or just its name if name_only is True.


Alternative: make the Block class a plain namedtuple without a __contains__ method.
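
To make the proposal concrete, here is a minimal pure-Python sketch of the three names described above. None of this exists in unicodedata. The frozen dataclass, the three-entry sample table, and returning None for a code point outside every listed block are all assumptions made for illustration; a real implementation would presumably be generated from the Unicode data files.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical Block type: an inclusive code point range with a name."""
    start: int
    end: int
    name: str

    def __contains__(self, item):
        # Accept a single-character string or an int code point.
        if isinstance(item, str):
            if len(item) != 1:
                raise TypeError("expected a single character or an int")
            item = ord(item)
        return self.start <= item <= self.end

    def __iter__(self):
        # Iterate over the characters of the block, in code point order.
        return (chr(cp) for cp in range(self.start, self.end + 1))


# Tiny sample table (assumption: the real one would cover every block).
_SAMPLE = [
    Block(0x0370, 0x03FF, "Greek and Coptic"),
    Block(0x0000, 0x007F, "Basic Latin"),
    Block(0x0080, 0x00FF, "Latin-1 Supplement"),
]

# Block name -> Block, ordered by Block.start.
blocks = OrderedDict((b.name, b) for b in sorted(_SAMPLE, key=lambda b: b.start))


def block_of(char, name_only=False):
    """Return the Block containing char (or its name), or None if no block matches."""
    for block in blocks.values():
        if char in block:
            return block.name if name_only else block
    return None  # assumption: the proposal does not say what to return here
```

With this sketch, block_of("a", name_only=True) gives "Basic Latin" and 0xE9 in blocks["Latin-1 Supplement"] is True. The linear scan is only for brevity; since the blocks are sorted and do not overlap, a bisect over the start values would also work. The namedtuple alternative would drop __contains__ and __iter__ and leave the range test to block_of.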


Together with #18234, fixing this bug will complete UnicodeData support in Python, I guess.
msg230498 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-11-02 16:01
I needed this in the past and had to implement it myself, so adding it to unicodedata might be OK.  A script to generate the list of blocks from the official Unicode files should probably be created and used whenever we update the version of the Unicode database.
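
A minimal sketch of what such a generator script might start from, assuming the input is the Unicode Character Database file Blocks.txt, whose data lines have the form "0000..007F; Basic Latin" with "#" comments. The function name parse_blocks and the embedded excerpt are hypothetical; a real script would read the file shipped with the database version being imported and emit whatever table format unicodedata ends up using.

```python
import re

# Short excerpt in the Blocks.txt format, embedded so the sketch is self-contained.
SAMPLE_BLOCKS_TXT = """\
# Start Code..End Code; Block Name

0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
"""

# One data line: hex start, "..", hex end, ";", block name.
_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6});\s*(.+?)\s*$")


def parse_blocks(text):
    """Return a list of (start, end, name) tuples sorted by start code point."""
    result = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError("unrecognized line: %r" % raw)
        start, end, name = match.groups()
        result.append((int(start, 16), int(end, 16), name))
    result.sort()
    return result


if __name__ == "__main__":
    for start, end, name in parse_blocks(SAMPLE_BLOCKS_TXT):
        print("%04X..%04X %s" % (start, end, name))
```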
