Author PanderMusubi
Recipients PanderMusubi, ezio.melotti
Date 2012-12-14.17:33:12
Message-id <1355506392.97.0.314398568458.issue16684@psf.upfronthosting.co.za>
In-reply-to
Content
The unicodedata module
  http://docs.python.org/3/library/unicodedata.html
offers lookup of three property values for Unicode characters: general category, bidirectional class and East Asian width.
  unicodedata.category(unichr)
  unicodedata.bidirectional(unichr)
  unicodedata.east_asian_width(chr)

Each function returns only the abbreviated name of the property value. However, for certain applications it is important to be able to get from the abbreviated name to the long name and vice versa.
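
For illustration, the current behaviour looks roughly like this. The character 'A' is chosen only as an example; the comments show the abbreviated values returned for it.

```python
import unicodedata

# Each call returns the abbreviated property value name as a string.
gc = unicodedata.category("A")          # 'Lu'  (long name: Uppercase_Letter)
bc = unicodedata.bidirectional("A")     # 'L'   (long name: Left_To_Right)
ea = unicodedata.east_asian_width("A")  # 'Na'  (long name: Narrow)

print(gc, bc, ea)
```

Nothing in the module maps 'Lu' to Uppercase_Letter or back, which is the gap this issue is about.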

The data needed to do this can be found at
  http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt
under sections
  # General_Category (gc)
  # Bidi_Class (bc)
  # East_Asian_Width (ea)
Use only the second (abbreviated name) and third (long name) fields of each line, and ignore any other fields and comments.
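
A minimal sketch of such a parser follows, assuming the semicolon-separated layout with '#' comments used in that file. The function name parse_aliases and the SAMPLE text are hypothetical; SAMPLE is a handful of lines written in the file's format, not the full data.

```python
from collections import defaultdict

# Illustrative lines in the PropertyValueAliases.txt layout (not the full file).
SAMPLE = """\
# General_Category (gc)

gc ; L                                ; Letter                           # Ll | Lm | Lo | Lt | Lu
gc ; Lu                               ; Uppercase_Letter
gc ; Nd                               ; Decimal_Number                   ; digit

# Bidi_Class (bc)

bc ; AL                               ; Arabic_Letter
bc ; L                                ; Left_To_Right

# East_Asian_Width (ea)

ea ; Na                               ; Narrow
ea ; W                                ; Wide
"""

# Property abbreviations (first field) that this issue is about.
WANTED = {"gc", "bc", "ea"}


def parse_aliases(text, wanted=WANTED):
    """Return {property: {abbreviated_name: long_name}} for the wanted properties."""
    short_to_long = defaultdict(dict)
    for raw in text.splitlines():
        # Drop trailing comments, then skip blank and comment-only lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 3 or fields[0] not in wanted:
            continue
        # Field 1 is the property, field 2 the abbreviated name, field 3 the
        # long name; any further fields (extra aliases) are ignored.
        prop, short, long_name = fields[:3]
        short_to_long[prop][short] = long_name
    return dict(short_to_long)


ALIASES = parse_aliases(SAMPLE)
print(ALIASES["gc"]["Lu"])
```

In practice the text would come from the file itself rather than from an embedded sample.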

For general category, also support translation back and forth for the one-letter abbreviations. Each of these names a group of two-letter general category abbreviations that share the same initial letter (for example, L groups Lu, Ll, Lt, Lm and Lo).
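
As a rough sketch of the grouping, the one-letter group can be derived from the first letter of the two-letter abbreviation. The names GROUP_LONG_NAMES and category_group are hypothetical; the long names are the ones listed for the one-letter values in PropertyValueAliases.txt.

```python
import unicodedata

# One-letter general category groups and their long names.
GROUP_LONG_NAMES = {
    "C": "Other",
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
}


def category_group(ch):
    """Return the one-letter group of the character's general category."""
    return unicodedata.category(ch)[0]


print(category_group("A"), GROUP_LONG_NAMES[category_group("A")])
```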

Please extend this module with a way of translating back and forth between the abbreviated name and the long name of property values defined in Unicode for general category, bidirectional class and East Asian width. This functionality should be independent of the existing functions that retrieve the abbreviated names for Unicode characters, and should be accessible via separate functions or dictionaries in which developers can perform lookups themselves.
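
One possible shape for such a lookup, shown only as a sketch: a pair of dictionaries, one the inverse of the other. None of these names exist in unicodedata; GENERAL_CATEGORY, long_name and short_name are hypothetical, and the table holds just a few hand-entered general category values.

```python
import unicodedata

# Hypothetical lookup table: abbreviated name -> long name (small excerpt).
GENERAL_CATEGORY = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Nd": "Decimal_Number",
    "Zs": "Space_Separator",
}

# Inverse table: long name -> abbreviated name.
GENERAL_CATEGORY_BY_LONG = {long: short for short, long in GENERAL_CATEGORY.items()}


def long_name(short):
    """Translate an abbreviated general category name to its long name."""
    return GENERAL_CATEGORY[short]


def short_name(long):
    """Translate a long general category name to its abbreviated name."""
    return GENERAL_CATEGORY_BY_LONG[long]


# The translation is independent of the per-character lookup, but composes with it.
print(long_name(unicodedata.category("a")))
print(short_name("Decimal_Number"))
```

The same pattern would apply to bidirectional class and East Asian width, each with its own table.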

Implementing the functionality requested in this issue would allow Python developers to get from an abbreviated property value to a meaningful property value name and vice versa, without having to retrieve this information from the Unicode Consortium or ship a copy of it with their code, which risks becoming outdated.
History
Date User Action Args
2012-12-14 17:33:13  PanderMusubi  set  recipients: + PanderMusubi, ezio.melotti
2012-12-14 17:33:12  PanderMusubi  set  messageid: <1355506392.97.0.314398568458.issue16684@psf.upfronthosting.co.za>
2012-12-14 17:33:12  PanderMusubi  link  issue16684 messages
2012-12-14 17:33:12  PanderMusubi  create