classification
Title: Remove redundant note about surrogates in string escape doc
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: berker.peksag, docs@python, ezio.melotti, python-dev, r.david.murray, steven.daprano, terry.reedy
Priority: normal Keywords:

Created on 2013-07-27 16:12 by steven.daprano, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg193787 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2013-07-27 16:12
The documentation for string escapes suggests that \uxxxx escapes can be used to generate characters in the supplementary planes by using surrogate pairs:

"Individual code units which form parts of a surrogate pair can be encoded using this escape sequence."

http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

E.g. in Python 3.2:

py> '\uD80C\uDC80' == '\U00013080'
True

but that is no longer the case in Python 3.3. I suggest the documentation should just remove that note.
msg193790 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-07-27 20:03
3.3.2:
>>> '\uD80C\uDC80' == '\U00013080'
False

The statement that surrogate code units can be encoded this way is still true. Indeed, it is now the only way to get such code units into a string. The suggestion that a pair will make an astral char is now false. The sentence could be changed to 

"Individual surrogate code units can be encoded using this escape sequence."

On the other hand, the same is true of *any* BMP char, including all the *other* non-graphic chars that can only be entered this way. So I think the sentence, if not deleted, should be replaced by what seems to me a more useful (complete) statement.

"Any Basic Multilingual Plane (BMP) codepoint can be encoded using this escape sequence."
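
The distinction drawn above can be checked directly. This is a minimal sketch, assuming Python 3.3 or later (where strings are no longer stored as UTF-16 code units); the variable names are only illustrative.

```python
# Assumes Python 3.3+; on a narrow build of 3.2 the results would differ.
pair = '\uD80C\uDC80'    # two separate surrogate code points
astral = '\U00013080'    # one supplementary-plane code point

# The pair stays as two code points; it is not merged into one character.
print(len(pair), len(astral))        # 2 1
print(pair == astral)                # False

# Each \uXXXX escape yields exactly the code point it names.
print([hex(ord(c)) for c in pair])   # ['0xd80c', '0xdc80']
```

The same escape works for any other BMP code point, for example '\u0007' for the BEL control character.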
msg193860 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-29 12:27
Python 3.2.3 (default, Jun 15 2013, 14:13:52) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '\uD80C\uDC80'
'\ud80c\udc80'
>>> '\uD80C\uDC80' == '\U00013080'
False
msg193870 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2013-07-29 15:03
On 29/07/13 22:27, R. David Murray wrote:

> >>> '\uD80C\uDC80' == '\U00013080'
> False

Are you running a wide build? In a narrow build, it returns True.
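
One way to tell the two build types apart is sys.maxunicode. A small sketch, assuming the documented values: 0xFFFF on a narrow build (possible only up to Python 3.2) and 0x10FFFF on a wide build and on every 3.3+ interpreter.

```python
import sys

# 0xFFFF means a narrow (UTF-16) build; 0x10FFFF means wide, or Python 3.3+.
is_narrow = sys.maxunicode == 0xFFFF
print(hex(sys.maxunicode), 'narrow' if is_narrow else 'wide or 3.3+')
```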
msg193881 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-29 16:58
Probably.  I think the default build on Gentoo is wide.

That seems to make the existing text even more incorrect :)
msg194671 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-08-08 13:34
I think it's OK to remove the sentence.

Converting a surrogate pair to a non-BMP char is something that works only while decoding a UTF-16 byte sequence.  Surrogates are invalid in UTF-8/32, and while dealing with Unicode strings, surrogates have no special meaning and are no different from any other codepoint, whether they are lone or paired.
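
A rough illustration of that point, assuming Python 3.4 or later (where the 'surrogatepass' error handler is accepted by the UTF-16 codecs and the UTF-32 encoder rejects surrogates):

```python
pair = '\uD80C\uDC80'
astral = '\U00013080'

# 'surrogatepass' lets the two surrogates through the UTF-16 encoder;
# the UTF-16 decoder then reads them as one pair and yields U+13080.
utf16 = pair.encode('utf-16-le', 'surrogatepass')
combined = utf16.decode('utf-16-le')
print(combined == astral)    # True

# UTF-8 and UTF-32 refuse to encode surrogates at all by default.
rejected = []
for codec in ('utf-8', 'utf-32'):
    try:
        pair.encode(codec)
    except UnicodeEncodeError:
        rejected.append(codec)
print(rejected)              # ['utf-8', 'utf-32']
```

Inside a str object nothing of the sort happens: the pair remains two code points until it passes through a UTF-16 decode.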
msg264080 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-24 00:13
New changeset 79e7808c3941 by Berker Peksag in branch '3.5':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/79e7808c3941

New changeset ee815d3535f5 by Berker Peksag in branch 'default':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/ee815d3535f5
msg264081 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-04-24 00:14
I removed the sentence in 3.5 and default branches.
History
Date | User | Action | Args
2022-04-11 14:57:48 | admin | set | github: 62772
2016-04-24 00:14:43 | berker.peksag | set | status: open -> closed; versions: + Python 3.5, Python 3.6, - Python 3.3, Python 3.4; nosy: + berker.peksag; messages: + msg264081; resolution: fixed; stage: needs patch -> resolved
2016-04-24 00:13:50 | python-dev | set | nosy: + python-dev; messages: + msg264080
2013-08-08 13:34:14 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg194671
2013-07-29 16:58:04 | r.david.murray | set | messages: + msg193881
2013-07-29 15:03:29 | steven.daprano | set | messages: + msg193870
2013-07-29 12:27:06 | r.david.murray | set | nosy: + r.david.murray; messages: + msg193860
2013-07-27 20:03:56 | terry.reedy | set | nosy: + terry.reedy; messages: + msg193790
2013-07-27 19:05:45 | terry.reedy | set | stage: needs patch; type: behavior; versions: + Python 3.4
2013-07-27 16:12:13 | steven.daprano | create |