Message 214025 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	pitrou
Recipients	benjamin.peterson, docs@python, eric.araujo, ezio.melotti, gwideman, lemburg, pitrou, tshepang, vstinner
Date	2014-03-18.21:38:21
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1395178701.85.0.72133892695.issue20906@psf.upfronthosting.co.za>
In-reply-to

Content

> Agreed. How about "In documentation such as the current article..."

It's better, but how about simply "In this article"?

> I concur with reducing unnecessary abstraction. Not sure what you mean 
> by "true form". Do you mean show the glyph which the code point
> represents? Or the sequence of bytes? Or display the code point value 
> in decimal? 

I mean the glyph.
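To make the three candidate representations concrete, here is a minimal sketch that prints, for a single character, its glyph, its code point in the usual U+XXXX notation, its Unicode name, and its UTF-8 bytes. The helper name `describe` is hypothetical and not part of the HOWTO.

```python
import unicodedata


def describe(ch: str) -> str:
    """Hypothetical helper: glyph, code point, Unicode name and UTF-8 bytes of one character."""
    code_point = f"U+{ord(ch):04X}"  # conventional notation for a code point
    return f"{ch} {code_point} {unicodedata.name(ch)} {ch.encode('utf-8')!r}"


print(describe("\u20ac"))  # the glyph is shown first, then the numeric forms
```

The glyph is what a reader recognizes; the other three columns are the abstract number and one possible byte serialization of it.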

> In the older schemes, "encoding" referred to the one mapping: chars <-->
> numbers in particular binary format. In Unicode, "encoding" refers only to 
> the mapping: code point numbers <--> binary format. It does not refer to
> the chars <--> code point mapping. (At least, I think that's the case.
> Regardless, the two mappings need to be rigorously distinguished.)

This is true, but in this HOWTO's context the term "code system" is a confusing distraction, IMHO. For all intents and purposes, iso-8859-1 and friends *are* encodings (and this is how Python actually names them).
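The two mappings discussed above can be shown side by side in a short sketch: `ord()` gives the character-to-code-point mapping, while `str.encode()` applies a particular code-point-to-bytes mapping. Python simply registers iso-8859-1 as one more codec name alongside utf-8. The sample string is an arbitrary choice for illustration.

```python
import codecs

text = "caf\u00e9"  # "café"

# Mapping 1: characters <--> code points (fixed by Unicode itself).
code_points = [ord(ch) for ch in text]

# Mapping 2: code points <--> bytes (this is what an encoding defines).
utf8 = text.encode("utf-8")         # é becomes two bytes
latin1 = text.encode("iso-8859-1")  # é becomes one byte

print(code_points, utf8, latin1)

# Python looks up iso-8859-1 like any other encoding and reports its codec name.
print(codecs.lookup("iso-8859-1").name)
```

The code points are identical in both cases; only the byte serialization differs, which is why it is reasonable to call iso-8859-1 an encoding in this HOWTO.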

> On review, there are many points in the article that muddy this up.  For
> example, "Unicode started out using 16-bit characters instead of 8-bit
> characters". Saying "so-and-so-bit characters" about Unicode, in the
> current article, is either wrong, or very confusing.

So it should say "16-bit code points" instead, right?
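"16-bit code points" describes Unicode's original design; today code points range up to U+10FFFF, so some need more than 16 bits. The sketch below uses an arbitrary character outside the Basic Multilingual Plane to show the difference between a code point and the 16-bit code units of UTF-16.

```python
import sys

ch = "\U0001F600"  # GRINNING FACE, outside the original 16-bit range

print(hex(ord(ch)))                 # the code point itself needs more than 16 bits
print(len(ch))                      # Python's str counts code points, so this is 1
print(len(ch.encode("utf-16-le")))  # UTF-16 stores it as two 16-bit units (4 bytes)
print(hex(sys.maxunicode))          # highest code point Python supports
```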

> The subject of one-character-to-one-code mapping is important
> (normalization etc), though perhaps beyond the current article. But I
> think the article should avoid suggesting that many-to-one or one-to-many 
> scenarios are common.

Agreed.
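For readers curious about the normalization case mentioned in the quote, here is a small sketch using the standard `unicodedata` module: the same visible character "é" can be one code point or two, and normalization converts between the two forms. The choice of "é" is only an example.

```python
import unicodedata

composed = "\u00e9"    # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(composed == decomposed)           # different code point sequences
print(len(composed), len(decomposed))   # one code point versus two

# NFC composes where possible; NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```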

History
Date	User	Action	Args
2014-03-18 21:38:21	pitrou	set	recipients: + pitrou, lemburg, vstinner, benjamin.peterson, ezio.melotti, eric.araujo, docs@python, tshepang, gwideman
2014-03-18 21:38:21	pitrou	set	messageid: <1395178701.85.0.72133892695.issue20906@psf.upfronthosting.co.za>
2014-03-18 21:38:21	pitrou	link	issue20906 messages
2014-03-18 21:38:21	pitrou	create