Author Jim.Jewett
Recipients Jim.Jewett, Rosuav, docs@python, gvanrossum, ncoghlan, pitrou, python-dev, serhiy.storchaka, vstinner
Date 2014-06-10.23:20:46
Message-id <1402442446.68.0.0725472083721.issue21667@psf.upfronthosting.co.za>
Content
And even my rewrite showed path dependency; a slight further improvement is to re-order encoding ahead of bytes.  I also added a paragraph that I hope answers the speed issue.

Proposal:

A string is a sequence of Unicode code points.  Strings can include any sequence of code points, including some which are semantically meaningless, or explicitly undefined.
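As a rough illustration of that point, the sketch below builds a string containing a lone surrogate and a noncharacter. The particular code points (``U+D800`` and ``U+FFFF``) and the variable names are chosen only for the example and are not part of the proposal.

```python
import unicodedata

# A lone surrogate is meaningless on its own, and U+FFFF is a
# permanently reserved noncharacter, yet both are legal in a str.
lone_surrogate = "\ud800"
noncharacter = "\uffff"
s = "a" + lone_surrogate + noncharacter

length = len(s)  # each code point counts as one item
categories = [unicodedata.category(ch) for ch in s]

# Such a string exists happily in memory, but a strict codec may
# refuse to encode it.
try:
    s.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

The string type accepts these code points; whether a given encoding can represent them is a separate question.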

Python doesn't have a :c:type:`char` type; a single code point is represented as a string of length ``1``.  The built-in function :func:`chr` translates an integer in the range ``0`` through ``0x10FFFF`` (that is, code points ``U+0000`` through ``U+10FFFF``) to the corresponding length ``1`` string object, and :func:`ord` does the reverse.
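A minimal sketch of the round trip between integers and length-1 strings, using only the built-ins named above. The sample code points are arbitrary.

```python
# chr() maps an integer code point to a length-1 string;
# ord() maps a length-1 string back to its integer code point.
snowman = chr(0x2603)
round_trip = ord(snowman)

# The top of the range is still a single-item string.
highest = chr(0x10FFFF)
highest_len = len(highest)

# Anything past U+10FFFF is rejected.
try:
    chr(0x110000)
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False
```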

:meth:`str.encode` provides a concrete representation (in the given text encoding) as a :class:`bytes` object suitable for transport and communication with non-Python utilities.  :meth:`bytes.decode` decodes such byte sequences into text strings.
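For instance, the same text can be given more than one concrete byte representation, and each decodes back to an equal string. The sample text and the two encodings are picked only for illustration.

```python
text = "caf\u00e9"  # four code points

# The same text has a different concrete representation
# under each encoding.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

# Decoding with the matching encoding recovers the original text.
from_utf8 = as_utf8.decode("utf-8")
from_utf16 = as_utf16.decode("utf-16-le")
```

The byte lengths differ (5 bytes for UTF-8, 8 for UTF-16-LE here), while the decoded strings compare equal to the original.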

.. impl-detail::  There are no methods exposing the internal representation of code points within a string.  While the C API imposes some additional constraints on CPython, other implementations are free to use any representation that treats code points (as opposed to either code units or some normalized form of characters) as the unit of measure.
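To make the distinction between code points, code units, and normalized characters concrete, here is a small sketch. It counts UTF-16 code units by encoding, which is only an illustrative stand-in, since the internal representation is not exposed.

```python
import unicodedata

# One code point outside the Basic Multilingual Plane.
astral = "\U0001F600"
code_points = len(astral)
# The same text needs two 16-bit code units in UTF-16.
utf16_units = len(astral.encode("utf-16-le")) // 2

# "e" followed by a combining acute accent: one visible
# character, but two code points until it is normalized.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
decomposed_len = len(decomposed)
composed_len = len(composed)
```

``len`` reports code points in every case: it neither splits the astral character into surrogates nor merges the combining sequence.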