Message 185963 - Python tracker

Author	Rosuav
Recipients	Rosuav
Date	2013-04-03.21:54:31
Message-id	<1365026071.28.0.865581650366.issue17629@psf.upfronthosting.co.za>
In-reply-to

Content
As of PEP 393, a string's width is recorded in its header: effectively, a marker that says whether the highest code point in the string is >0xFFFF, >0xFF, or <=0xFF. This is occasionally useful to know; for instance, when testing string performance, it's handy to be able to quickly throw together something that identifies the width spread without scanning the contents of every string used.
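To illustrate what the width means in practice, here is a rough sketch that infers the per-character storage cost indirectly from sys.getsizeof(). It assumes CPython 3.3 or later, where the PEP 393 representation is in use; the helper name bytes_per_char and the sample characters are mine, purely for illustration, and other implementations may report different sizes.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate the per-character storage cost of a string made only of ch.

    Hypothetical helper: compares the sizes of two strings that differ by
    one character. Relies on CPython's PEP 393 layout (an assumption).
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

# One sample character from each of the three width classes.
samples = {
    "<=0xFF": "a",
    ">0xFF": "\u0100",
    ">0xFFFF": "\U00010000",
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch))
```

On CPython I would expect this to report 1, 2 and 4 bytes respectively, but it is a measurement trick rather than a way to read the header field directly, which is the facility being requested here.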

A similar facility is provided by Pike, which has a similar flexible string representation: http://pike.lysator.liu.se/generated/manual/modref/ex/7.2_3A_3A/String/width.html accessible to a script as String.width().

Since this is not something frequently needed, it would make sense to hide it away in the sys or inspect modules, or possibly in the string module or as a method on the string itself.

Currently, the best way to do this is something like:

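def str_width(s):
    """Return the bytes per character (1, 2 or 4) needed to store s."""
    width = 1
    for n in map(ord, s):
        if n > 0xFFFF:
            return 4  # astral character: widest form, so stop early
        if n > 0xFF:
            width = 2
    return width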

which necessitates a scan of the entire string, unless it has an astral character.
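For the "width spread" use case mentioned earlier, a sketch along the following lines might do in the meantime. The names str_width_max and width_spread are hypothetical, and max() with a default requires Python 3.4 or later. Note that this variant always scans the whole string, so it is only a stand-in until the header value is exposed.

```python
from collections import Counter

def str_width_max(s):
    """Return 1, 2 or 4 based on the highest code point in s.

    Hypothetical helper; max() scans every character, with no early exit.
    default=0 makes the empty string count as width 1.
    """
    top = max(map(ord, s), default=0)
    if top > 0xFFFF:
        return 4
    if top > 0xFF:
        return 2
    return 1

def width_spread(strings):
    """Count how many of the given strings fall into each width class."""
    return Counter(str_width_max(s) for s in strings)

# Sample strings chosen for illustration only.
spread = width_spread(["abc", "caf\xe9", "", "\u0100", "\U0001F600"])
print(sorted(spread.items()))
```

With a real accessor, width_spread() would become an O(1) lookup per string instead of a full pass over every character.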

History
Date	User	Action	Args
2013-04-03 21:54:31	Rosuav	set	recipients: + Rosuav
2013-04-03 21:54:31	Rosuav	set	messageid: <1365026071.28.0.865581650366.issue17629@psf.upfronthosting.co.za>
2013-04-03 21:54:31	Rosuav	link	issue17629 messages
2013-04-03 21:54:31	Rosuav	create