Author Rhamphoryncus
Recipients Rhamphoryncus, amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date 2009-10-05.18:48:52
SpamBayes Score 1.10704e-06
Marked as misclassified No
Message-id <>
In-reply-to <>
On Mon, Oct 5, 2009 at 12:10, Marc-Andre Lemburg <> wrote:
> All this is just nitpicking, really. UCS2 is a character set,
> UTF-16 an encoding.

UCS is a character set, for most purposes synonymous with the Unicode
character set.  UCS-2 and UTF-16 are both encodings of that character
set.  However, UCS-2 can only represent the BMP, while UTF-16 can
represent the full range.

> If we were to implement Unicode using UTF-16 as storage format,
> we would not be able to store single lone surrogates, since these
> are not allowed in UTF-16. Ditto for unassigned ordinals, invalid
> code points, etc.

No.  Internal usage may become temporarily ill-formed, but this is a
compromise, and acceptable so long as we never export them to other

Not that I wouldn't *prefer* a system that wouldn't store lone
surrogates, but.. pragmatics prevail.

> Note that I wrote the PEP and worked on the implementation at a time
> when Unicode 2.x was still in use wide-spread use (mostly on Windows)
> and 3.0 was just being release:

I think you hit the nail on the head there.  10 years ago, unicode
meant something different than it does today.  That's reflected in PEP
100 and in the code.  Now it's time to move on, switch to the modern
terminology, modern usage, and modern specs.

> But all that is off-topic for this ticket, so please let's just
> stop such discussions.

It needs to be discussed somewhere.  It's a distraction from fixing
the bug, but at least it's more private here.  Would you prefer email?
Date User Action Args
2009-10-05 18:48:55Rhamphoryncussetrecipients: + Rhamphoryncus, lemburg, amaury.forgeotdarc, vstinner, ezio.melotti, bupjae
2009-10-05 18:48:53Rhamphoryncuslinkissue5127 messages
2009-10-05 18:48:52Rhamphoryncuscreate