Message 220211 - Python tracker


Author	Jim.Jewett
Recipients	Jim.Jewett, Rosuav, docs@python, gvanrossum, ncoghlan, pitrou, python-dev, serhiy.storchaka, vstinner
Date	2014-06-10.23:20:46
Message-id	<1402442446.68.0.0725472083721.issue21667@psf.upfronthosting.co.za>

Content

And even my rewrite showed path dependency; a slight further improvement is to re-order encoding ahead of bytes.  I also added a paragraph that I hope answers the speed issue.

Proposal:

A string is a sequence of Unicode code points.  Strings can include any sequence of code points, including some that are semantically meaningless or explicitly undefined.
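
To illustrate the "any sequence of code points" claim, here is a minimal sketch. The particular code points (a lone surrogate and the noncharacter U+FFFF) are my own choice of examples, not part of the proposed wording.

```python
import unicodedata

# A lone surrogate has no meaning on its own, yet it is a valid str element.
lone_surrogate = "\ud800"
assert len(lone_surrogate) == 1

# U+FFFF is a noncharacter; unicodedata reports it as unassigned ("Cn").
noncharacter = "\uffff"
noncharacter_category = unicodedata.category(noncharacter)

# Both can sit in a string next to ordinary text.
mixed = "a" + lone_surrogate + noncharacter + "b"
mixed_length = len(mixed)

# Such strings are not always encodable: UTF-8 rejects lone surrogates.
try:
    lone_surrogate.encode("utf-8")
except UnicodeEncodeError:
    surrogate_encodes = False
else:
    surrogate_encodes = True
```

So the string type itself does no validation here; the restrictions only show up at the encoding boundary.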

Python doesn't have a :c:type:`char` type; a single code point is represented as a string of length ``1``.  The built-in function :func:`chr` translates an integer in the range ``0`` to ``0x10FFFF`` (code points U+0000 through U+10FFFF) to the corresponding length ``1`` string object, and :func:`ord` does the reverse.
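
A short sketch of the :func:`chr` / :func:`ord` round trip described above. The variable names and the sample code point are only illustrative.

```python
# chr() maps an integer code point to a length-1 string; ord() reverses it.
snowman = chr(0x2603)
snowman_length = len(snowman)
snowman_code_point = ord(snowman)

# The top of the range is still a single-element string,
# even though it lies outside the Basic Multilingual Plane.
top = chr(0x10FFFF)
top_length = len(top)

# One past the end of the range is rejected.
try:
    chr(0x110000)
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False
```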

:meth:`str.encode` provides a concrete representation (in the given text encoding) as a :class:`bytes` object suitable for transport and communication with non-Python utilities.  :meth:`bytes.decode` decodes such byte sequences into text strings.
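
For the encode/decode paragraph, something like the following might make the point that the concrete byte representation depends on the chosen encoding while the string itself does not. The sample text and the two encodings are my own picks.

```python
text = "caf\u00e9"  # four code points: c, a, f, e-acute

# The same string yields different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")       # e-acute takes two bytes here
utf16_bytes = text.encode("utf-16-le")  # two bytes per code point here

# Decoding with the matching encoding recovers the original string.
from_utf8 = utf8_bytes.decode("utf-8")
from_utf16 = utf16_bytes.decode("utf-16-le")
```

The lengths differ (4 code points, 5 bytes, 8 bytes), which is the reason to keep "string" and "bytes" conceptually separate in the wording.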

.. impl-detail::  There are no methods exposing the internal representation of code points within a string.  While the C-API places some additional constraints on CPython, other implementations are free to use any representation that treats code points (as opposed to either code units or some normalized form of characters) as the unit of measure.
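
As a rough check of what "code points as the unit of measure" means in practice, here is a sketch contrasting code points with UTF-16 code units and with normalized characters. The sample strings are assumptions of mine, chosen only to make the three counts differ.

```python
import unicodedata

# One code point outside the BMP: length 1 as a str,
# but two 16-bit code units once encoded as UTF-16.
astral = "\U0001F600"
astral_length = len(astral)
astral_utf16_units = len(astral.encode("utf-16-le")) // 2

# "e" followed by a combining acute accent: one perceived character,
# but two code points, and len() counts code points.
decomposed = "e\u0301"
decomposed_length = len(decomposed)

# Normalization is an explicit, separate step; str never applies it implicitly.
composed = unicodedata.normalize("NFC", decomposed)
composed_length = len(composed)
strings_compare_equal = decomposed == composed
```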
