Author Guillaume Sanchez
Recipients Guillaume Sanchez
Date 2017-06-20.19:15:21
Message-id <1497986122.28.0.540580196076.issue30717@psf.upfronthosting.co.za>
Content
"a⃑".center(5, ".")
produces
'..a⃑.' instead of '..a⃑..'

The reason is that "a⃑" is made of two code points (two UCS4 units): 'a' followed by a combining "arrow above" mark. str.center() measures the length of the string and pads both sides with `fillchar` until that length reaches `width`. However, this length is surely meant to be the number of user-perceived characters, not the number of code points.
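A minimal reproduction of the behaviour described above. It uses U+20D7 (a combining arrow above) as a stand-in for the combining mark in the report, since any combining mark shows the same effect; the exact code point in the original string is an assumption here.

```python
import unicodedata

# 'a' followed by U+20D7, a combining arrow above: one visible character.
s = "a\u20d7"

# Python strings count code points, so len() reports 2, not 1.
print(len(s))  # 2

# Show each code point with its general category ("Mn" = nonspacing mark).
for ch in s:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

# center() pads based on len(), so only 3 fill characters are added
# (2 on the left, 1 on the right) instead of 2 on each side.
print(s.center(5, ".") == "..a\u20d7.")  # True
```

Note that str.center() takes its arguments positionally; passing `width=` or `fillchar=` as keywords raises a TypeError.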

The correct way to count user-perceived characters is the grapheme cluster boundary algorithm from UAX #29 (Unicode Text Segmentation).
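The sketch below shows what a grapheme-aware center could look like. It is not UAX #29: as a deliberate simplification, it only attaches code points whose general category is a mark (Mn, Mc, Me) to the preceding cluster, and ignores the other rules (CR LF, Hangul syllables, emoji ZWJ sequences, regional-indicator flags). The names `approx_grapheme_count` and `grapheme_center` are hypothetical, not part of any existing API.

```python
import unicodedata


def approx_grapheme_count(s: str) -> int:
    """Approximate the number of user-perceived characters in s.

    Simplified stand-in for UAX #29: a mark code point (category Mn, Mc
    or Me) extends the previous cluster; any other code point starts a
    new one. CR LF, Hangul, emoji ZWJ sequences and flags are NOT handled.
    """
    count = 0
    for i, ch in enumerate(s):
        # The first code point always opens a cluster; afterwards only
        # non-mark code points do.
        if i == 0 or not unicodedata.category(ch).startswith("M"):
            count += 1
    return count


def grapheme_center(s: str, width: int, fillchar: str = " ") -> str:
    """Hypothetical grapheme-aware variant of str.center()."""
    if len(fillchar) != 1:
        raise TypeError("The fill character must be exactly one character long")
    marg = width - approx_grapheme_count(s)
    if marg <= 0:
        return s
    # Same left/right split as CPython's str.center(), so plain ASCII
    # strings are padded identically.
    left = marg // 2 + (marg & width & 1)
    return fillchar * left + s + fillchar * (marg - left)


# Two fill characters on each side, as expected in the report.
print(grapheme_center("a\u20d7", 5, ".") == "..a\u20d7..")  # True
```

Reusing str.center()'s split rule keeps the result identical to str.center() whenever every code point is its own cluster, so only strings containing combining marks are padded differently.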

As it turns out, I have already implemented this myself, and could submit a PR if asked, with a little help writing the C <-> Python glue.

Thanks for your time.
History
Date                 User               Action  Args
2017-06-20 19:15:22  Guillaume Sanchez  set     recipients: + Guillaume Sanchez
2017-06-20 19:15:22  Guillaume Sanchez  set     messageid: <1497986122.28.0.540580196076.issue30717@psf.upfronthosting.co.za>
2017-06-20 19:15:22  Guillaume Sanchez  link    issue30717 messages
2017-06-20 19:15:21  Guillaume Sanchez  create