Issue 19819: reversing a Unicode ligature doesn't work


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64018

classification

Title:	reversing a Unicode ligature doesn't work
Type:	behavior	Stage:	resolved
Components:	Unicode	Versions:	Python 3.4

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	christian.heimes, ezio.melotti, larry, vstinner
Priority:	low	Keywords:

Created on 2013-11-27 23:51 by larry, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg204628 - (view)	Author: Larry Hastings (larry) *	Date: 2013-11-27 23:51
Read this today: http://mortoray.com/2013/11/27/the-string-type-is-broken/
In it the author talks about how the 'ffl' ligature breaks some string processing. He claimed that Python 3 doesn't uppercase it correctly--well, it does. However I discovered that it doesn't reverse it properly.
    x = b'ba\xef\xac\x84e'.decode('utf-8') # "baffle", where "ffl" is a ligature
    print(x) # prints "baffle", with the ligature
    print(x.upper()) # prints "BAFFLE", no ligature, which is fine
    print("".join(reversed(x))) # prints "efflab"
Shouldn't that last line print "elffab"? If this gets marked as "wontfix" I wouldn't complain. Just wondering what the Right Thing is to do here.
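A minimal sketch of what reversed() actually iterates over in the report above, assuming only the standard unicodedata module. The variable names (names, upper, backward) are illustrative and not part of the original report.

```python
import unicodedata

# The string from the report: "b", "a", U+FB04 (the ffl ligature), "e".
x = b"ba\xef\xac\x84e".decode("utf-8")

# reversed() walks code points, and the ligature is a single code point.
names = [unicodedata.name(ch) for ch in x]
for ch, name in zip(x, names):
    print(f"U+{ord(ch):04X} {name}")

# upper() expands the ligature into three separate letters.
upper = x.upper()

# Reversal keeps the ligature as one unit, so the letters inside it
# are not reordered.
backward = "".join(reversed(x))

print(len(x), len(upper))
print(backward)
```

The string holds four code points, so reversing it can only move the ligature as a whole.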
msg204629 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2013-11-28 00:07
There is no ligature for "lff", just "ffl". Ligatures are treated as one char. I guess Python would have to grow a str.reverse() method to handle ligatures and combining chars correctly.
At work I ran into the issue with ligatures and combining chars multiple times in medieval and early modern age scripts. Eventually I started to normalize all incoming data to NFKC. That solves most of the issues.
    >>> import unicodedata
    >>> s = b'ba\xef\xac\x84e'.decode('utf-8')
    >>> print("".join(reversed(s)))
    eﬄab
    >>> print("".join(reversed(unicodedata.normalize("NFKC", s))))
    elffab
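The combining-character problem mentioned above can be sketched as follows. The helper reverse_keeping_marks is hypothetical, not an existing str method, and it only attaches combining marks to the preceding base character. It is not a full grapheme-cluster segmentation as described in Unicode Standard Annex #29.

```python
import unicodedata


def reverse_keeping_marks(text: str) -> str:
    """Reverse text while keeping combining marks after their base character.

    Hypothetical helper for illustration only; it ignores many cases that
    real grapheme segmentation handles (for example, joiner sequences).
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


# "cafe" followed by U+0301 COMBINING ACUTE ACCENT on the final "e".
s = "cafe\u0301"

naive = "".join(reversed(s))       # the accent ends up first, with no base
careful = reverse_keeping_marks(s)  # the accent stays on the "e"

print(ascii(naive))
print(ascii(careful))
```

Normalizing to NFC or NFKC first would compose this particular pair into one code point, but not every base-plus-mark sequence has a precomposed form.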
msg204630 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2013-11-28 00:16
A proper str.reverse function must deal with many more special cases. For example there are special rules for the Old German long s (ſ) and the round s (s). A round s may only occur at the end of a syllable. Hebrew has a special variant of several characters if the character is placed at the end of a word (HEBREW LETTER PE / HEBREW LETTER FINAL PE). A simple reversed(s) can never deal with all the complicated rules.
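A small sketch of the Hebrew case, assuming a three-letter sequence chosen purely for illustration. It shows that code point reversal moves a final-form letter to the front, where the final form does not belong. Nothing here attempts to substitute the non-final form.

```python
import unicodedata

# KAF, SAMEKH, FINAL PE: the last letter uses its word-final form.
word = "\u05db\u05e1\u05e3"

# Code point reversal leaves each letter's form unchanged.
backward = "".join(reversed(word))

for ch in backward:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

After reversal the FINAL PE is the first code point, so a context-aware reversal would also have to swap it for HEBREW LETTER PE (U+05E4).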
msg204631 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-11-28 00:22
Python implements the Unicode standards. Unless Python failed to implement the standard correctly, the author should complain to the Unicode Consortium directly! http://www.unicode.org/contacts.html
Example of data for the "ﬄ" character, U+FB04:
    FB04;LATIN SMALL LIGATURE FFL;Ll;0;L;<compat> 0066 0066 006C;;;;N;;;;;
http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt
(I'm unable to decode these raw data :-))
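The raw record quoted above can be decoded by splitting on semicolons. This is a sketch based on the documented UnicodeData.txt field order (code point, name, general category, combining class, bidi class, decomposition, and so on). The variable names are illustrative, and the cross-check uses whatever Unicode version the running Python ships with.

```python
import unicodedata

record = "FB04;LATIN SMALL LIGATURE FFL;Ll;0;L;<compat> 0066 0066 006C;;;;N;;;;;"
fields = record.split(";")

char = chr(int(fields[0], 16))  # field 0: code point in hex
name = fields[1]                # field 1: character name
category = fields[2]            # field 2: general category (Ll = lowercase letter)
bidi_class = fields[4]          # field 4: bidirectional class (L = left-to-right)

# Field 5: an optional tag such as <compat>, then the code points
# that the character decomposes into.
tag, *mapping = fields[5].split()
decomposed = "".join(chr(int(cp, 16)) for cp in mapping)

print(name, category, bidi_class, tag, decomposed)

# The same data is available through the unicodedata module.
print(unicodedata.decomposition(char))
```

The <compat> tag marks a compatibility decomposition, which is why NFKC (but not NFC) replaces the ligature with "f", "f", "l".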
msg204632 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-11-28 00:33
I don't understand the purpose of using reversed(). Don't use it to display text backward. Handling bidirectional text requires more complex tools. See for example the pango library: https://developer.gnome.org/pango/stable/pango-Bidirectional-Text.html
I don't see anything wrong with Python here, it just implements the Unicode standards, so I'm closing the issue as invalid.

History
Date	User	Action	Args
2022-04-11 14:57:54	admin	set	github: 64018
2013-11-28 14:26:49	ezio.melotti	set	nosy: + ezio.melotti components: + Unicode stage: needs patch -> resolved
2013-11-28 00:33:12	vstinner	set	status: open -> closed resolution: not a bug messages: + msg204632
2013-11-28 00:22:18	vstinner	set	nosy: + vstinner messages: + msg204631
2013-11-28 00:16:31	christian.heimes	set	messages: + msg204630
2013-11-28 00:07:37	christian.heimes	set	nosy: + christian.heimes messages: + msg204629
2013-11-27 23:51:14	larry	create