Message 122296 - Python tracker

Author	belopolsky
Recipients	belopolsky, eric.smith, pitrou
Date	2010-11-24.19:06:15
Message-id	<AANLkTi=_vgBhVBnt3+4DVt7xOPEruyA=GssoZcyL_Cjx@mail.gmail.com>
In-reply-to	<1290612823.89.0.808626412832.issue10521@psf.upfronthosting.co.za>

Content
On Wed, Nov 24, 2010 at 10:33 AM, Antoine Pitrou <report@bugs.python.org> wrote:
..
> The question is, what should it do with such an input?

I think the rule for such functions should be that if
input.encode('utf-8') is the same on wide and narrow builds, then the
output.encode('utf-8') should be the same.

> Pretend it's a single char (but other chars in the source string won't get the same treatment)?

Yes, *and* surrogate pairs in the source string should count for one
char as well.
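
To make the rule concrete, here is a rough sketch of what "count a surrogate
pair as one char" could look like. It emulates narrow-build storage on a
current Python by working on lists of UTF-16 code units. The helper names
(utf16_units, units_to_str, code_point_len, center_units) are made up for
illustration and are not part of any proposed patch; the input is assumed to
be well-formed (no lone surrogates).

```python
def utf16_units(s):
    """Return s as a list of UTF-16 code units, roughly what a narrow build stored."""
    data = s.encode('utf-16-le')
    return [int.from_bytes(data[i:i + 2], 'little') for i in range(0, len(data), 2)]


def units_to_str(units):
    """Inverse of utf16_units()."""
    return b''.join(u.to_bytes(2, 'little') for u in units).decode('utf-16-le')


def code_point_len(units):
    """Count code points: a trailing (low) surrogate never starts a new one."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


def center_units(units, width, fill='-'):
    """Center a code-unit sequence, measuring width in code points.

    The fill may itself be a non-BMP character (two code units); it still
    counts as a single char.
    """
    margin = max(width - code_point_len(units), 0)
    # Split of an odd margin is an assumption, chosen to agree with
    # str.center on the cases tried below.
    left = margin // 2 + (margin & width & 1)
    pad = utf16_units(fill)
    return pad * left + list(units) + pad * (margin - left)
```

With this, centering 'a\U00010000b' (four code units, three code points) to
width 7 should give the same text as str.center does on a build that stores
whole code points, which is the utf-8 equivalence stated above.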

> Treat it as a two-char string (but then center() and friends should logically be
> extended to accept strings of arbitrary lengths)?

No.  For better or worse, on wide builds these methods effectively
operate on code points.  They don't interpret multi-code-point
graphemes or take grapheme width into account:

--------------------
123
--------------------

Application code has to ascertain that it is dealing with fixed-width
characters in the target font before using these methods for
text alignment.
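
As an illustration of the kind of check application code might do, here is a
minimal sketch. The function name looks_fixed_width is hypothetical, and the
test is only a heuristic based on Unicode character data: it rejects
combining marks and East Asian wide/fullwidth characters, but it cannot know
how a particular font actually renders anything.

```python
import unicodedata


def looks_fixed_width(s):
    """Heuristic: True if every code point plausibly occupies one column.

    Rejects combining/enclosing marks (categories M*) and East Asian
    wide or fullwidth characters. This is an assumption-laden check,
    not a guarantee about the target font.
    """
    for ch in s:
        if unicodedata.category(ch).startswith('M'):
            return False
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            return False
    return True


# 'e' followed by COMBINING ACUTE ACCENT: one grapheme, two code points.
accented = 'e\u0301'
padded = accented.center(6, '-')  # pads as if the text were 2 columns wide
```

Here accented has len() 2, so center() adds only four fill characters even
though the text normally displays in a single column.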

History
Date	User	Action	Args
2010-11-24 19:06:17	belopolsky	set	recipients: + belopolsky, pitrou, eric.smith
2010-11-24 19:06:15	belopolsky	link	issue10521 messages
2010-11-24 19:06:15	belopolsky	create