
Title: Add block info to unicodedata
Type: enhancement Stage: needs patch
Components: Unicode Versions: Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Denis Jacquerye, ezio.melotti, flying sheep, vstinner
Priority: normal Keywords:

Created on 2014-10-11 17:58 by flying sheep, last changed 2022-04-11 14:58 by admin.

Messages (2)
msg229101 - Author: (flying sheep) * Date: 2014-10-11 17:58
See also #6331.

The repo contains pretty much the functionality I’d like to see: a way to access information about all blocks, and a way to get the name of the block a character is in.

I propose to include something very similar to those two APIs in unicodedata:

unicodedata.Block: a class with start, end, and name properties.

Its __contains__ should accept single-character strings (testing whether that character is in the block) and ints (testing whether that code point is in the block).

Maybe make it iterable over its characters?

unicodedata.blocks: an OrderedDict mapping str (block name) → Block object, ordered by Block.start.

Then blocks.keys() would yield the names in order, and blocks.values() the Block objects in order.

unicodedata.block_of(chr, name_only=False): returns the Block object for which “chr in block” is True, or only its name if name_only is True.
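
To make the proposal concrete, here is a minimal pure-Python sketch of the three pieces. None of this exists in unicodedata; the names follow the proposal, and the three sample blocks are hand-copied stand-ins for the full block list. The argument is called char rather than chr to avoid shadowing the builtin, and returning None for an uncovered code point is an assumption, since the proposal does not say what should happen in that case.

```python
from collections import OrderedDict


class Block:
    """Hypothetical sketch of the proposed unicodedata.Block."""

    def __init__(self, start, end, name):
        self.start = start  # first code point, inclusive
        self.end = end      # last code point, inclusive
        self.name = name

    def __contains__(self, item):
        # Accept a single-character string or an int code point.
        if isinstance(item, str):
            if len(item) != 1:
                raise TypeError("expected a single-character string")
            item = ord(item)
        return self.start <= item <= self.end

    def __iter__(self):
        # Iterate over the characters of the block, as floated in the proposal.
        return (chr(cp) for cp in range(self.start, self.end + 1))

    def __repr__(self):
        return "Block(0x%04X, 0x%04X, %r)" % (self.start, self.end, self.name)


# Sample data only; a real implementation would cover every block.
_SAMPLE = [
    Block(0x0370, 0x03FF, "Greek and Coptic"),
    Block(0x0000, 0x007F, "Basic Latin"),
    Block(0x0080, 0x00FF, "Latin-1 Supplement"),
]

# Name -> Block mapping, ordered by Block.start.
blocks = OrderedDict((b.name, b) for b in sorted(_SAMPLE, key=lambda b: b.start))


def block_of(char, name_only=False):
    """Return the Block containing char (or its name), or None if not found."""
    for block in blocks.values():
        if char in block:
            return block.name if name_only else block
    return None
```

With this sketch, block_of("a", name_only=True) gives the block name as a string, and 0x3B1 in blocks["Greek and Coptic"] tests an int code point. The linear scan is only for brevity; a real implementation would more likely use a binary search or the database's own lookup tables.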


Alternative: make the Block class a plain namedtuple without a __contains__ method.
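
A rough sketch of that namedtuple variant, again with hypothetical names and three hand-copied sample blocks rather than the full list. The lookup uses bisect on the sorted start values, which is one possible choice and not something the proposal specifies.

```python
from bisect import bisect_right
from collections import namedtuple

# Plain record type: no custom __contains__, just three fields.
Block = namedtuple("Block", ["start", "end", "name"])

# Sample data only, sorted by start.
_BLOCKS = [
    Block(0x0000, 0x007F, "Basic Latin"),
    Block(0x0080, 0x00FF, "Latin-1 Supplement"),
    Block(0x0370, 0x03FF, "Greek and Coptic"),
]
_STARTS = [b.start for b in _BLOCKS]


def block_of(char):
    """Return the Block whose range contains char, or None."""
    cp = ord(char)
    # Index of the last block whose start is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= _BLOCKS[i].end:
        return _BLOCKS[i]
    return None
```

One consequence of this design: on a namedtuple, "x in block" tests membership among the three field values rather than the code point range, so range checks have to go through block_of or an explicit start <= cp <= end comparison.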


Together with #18234, fixing this issue would complete UnicodeData support in Python, I guess.
msg230498 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-11-02 16:01
I needed this in the past and had to implement it myself, so adding it to unicodedata might be OK.  A script to generate the list of blocks from the official Unicode files should probably be created and used whenever we update the version of the Unicode database.
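
A generator script along those lines might start from something like the sketch below. It assumes the input is the text of Blocks.txt from the Unicode Character Database, where each data line has the form "0000..007F; Basic Latin" and "#" starts a comment. The function name parse_blocks and the sample text are illustrative only, and how the result would be fed into CPython's build is left open.

```python
import re

# One data line: "<start hex>..<end hex>; <block name>"
_LINE = re.compile(r"^([0-9A-F]{4,6})\.\.([0-9A-F]{4,6});\s*(.+?)\s*$")


def parse_blocks(text):
    """Parse Blocks.txt-style text into a list of (start, end, name) tuples."""
    result = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError("unrecognized line: %r" % raw)
        start, end, name = match.groups()
        result.append((int(start, 16), int(end, 16), name))
    return result


# Tiny hand-written sample in the same format as the real file.
SAMPLE = """\
# Start Code..End Code; Block Name
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
"""
```

Calling parse_blocks(SAMPLE) yields one tuple per block in file order, which could then be written out as a static table whenever the Unicode database version is updated.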
Date                 User             Action  Args
2022-04-11 14:58:09  admin            set     github: 66802
2015-10-17 07:47:47  Denis Jacquerye  set     nosy: + Denis Jacquerye
2014-11-02 16:01:20  ezio.melotti     set     nosy: + ezio.melotti, vstinner; messages: + msg230498; components: + Unicode; stage: needs patch
2014-10-11 17:58:45  flying sheep     create