Message 121488 - Python tracker

Author	terry.reedy
Recipients	akuchling, belopolsky, eric.araujo, georg.brandl, terry.reedy
Date	2010-11-18.19:41:53
Message-id	<1290109315.27.0.672337666392.issue4153@psf.upfronthosting.co.za>

Content

Thanks for persisting with this. Looking at the patch:

@@ -65,7 +63,7 @@
 goal was to have Unicode contain the alphabets for every single human language.
 It turns out that even 16 bits isn't enough to meet that goal, and the modern
 Unicode specification uses a wider range of codes, 0-1,114,111 (0x10ffff in
-base-16).
+base 16).

I visually parse 0-1,114,111 as 0-1, 114, 111. So I think either the commas should be removed or extra spaces are needed: 0-1114111 or 0 - 1,114,111. In your recent (and excellent) chr/ord doc patch, you used (or stayed with) 'hexadecimal' rather than 'base 16'. Do we have a standard? I *think* I prefer the former.

-character with value 0x12ca (4810 decimal).  The Unicode standard contains a lot
+character with value 0x12ca (4,810 decimal).  The Unicode standard contains a lot

I prefer without the added comma.
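
For reference, a quick interpreter check of the two figures in these hunks, shown with and without digit grouping. This is only a sketch for comparing the spellings; the variable names are illustrative and not part of the patch.

```python
# Upper bound of the Unicode code space and the example code point
# quoted in the two hunks above.
top = 0x10FFFF
example = 0x12CA

plain_top = str(top)              # digits only, no separators
grouped_top = format(top, ",")    # thousands separators added
plain_example = str(example)
grouped_example = format(example, ",")

print("0-" + plain_top, "vs", "0-" + grouped_top)
print(plain_example, "vs", grouped_example)
print(hex(top), hex(example))     # the hexadecimal spellings
```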

     >>> b'\x80abc'.decode("utf-8", "replace")
-    '\ufffdabc'
+    'ï¿½abc'

Three replacements (i with diaeresis, upside-down ?, 1/2) for one bad char looks wrong. With IDLE I get '�abc' (? in hexagon, codepoint 65533). Perhaps something just went wrong on the way from your patch file to my browser window.
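
One plausible explanation, offered as an assumption since nothing here says which encodings were involved: the three characters are the UTF-8 bytes of U+FFFD read back as Latin-1. A minimal sketch that reproduces the effect:

```python
# The doc example: one undecodable byte becomes one replacement character.
good = b'\x80abc'.decode("utf-8", "replace")
print(ascii(good), ord(good[0]))

# Assumption: the patch text was stored as UTF-8 ...
raw = good.encode("utf-8")
print(raw)

# ... and then displayed as if it were Latin-1, so each of the three
# bytes of U+FFFD shows up as a separate character.
garbled = raw.decode("latin-1")
print(ascii(garbled))
```

If that is what happened, the patch itself contains the right character and only the display path is at fault.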

@@ -281,10 +279,10 @@
 built-in :func:`ord` function that takes a one-character Unicode string and
 returns the code point value::

You fixed the chr/ord doc, so the references to it in this doc need fixing too.

-point.  The ``\U`` escape sequence is similar, but expects 8 hex digits, not 4::
+point.  The ``\U`` escape sequence is similar, but expects eight base 16
+digits, not four::

I really think of them as hex or hexadecimal digits, just as 0-9 are decimal, not base 10 digits.


 
     >>> s = "a\xac\u1234\u20ac\U00008000"
               ^^^^ two-digit hex escape
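
To make the three escape widths in that literal concrete, here is a small sketch that lists each character's code point. The string is the one from the doc; the variable `code_points` is illustrative only.

```python
# \x takes two hex digits, \u takes four, \U takes eight.
s = "a\xac\u1234\u20ac\U00008000"

# One entry per character: its code point in hexadecimal.
code_points = [hex(ord(c)) for c in s]
print(len(s), code_points)

# The same character can be written with either \u or \U,
# as long as the digit count matches the escape.
assert "\u20ac" == "\U000020ac"
```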
