Author lemburg
Recipients dangra, ezio.melotti, lemburg, sjmachin
Date 2010-04-01.14:12:59
Message-id <4BB4A9EA.7010005@egenix.com>
In-reply-to <1270129658.49.0.127634285615.issue8271@psf.upfronthosting.co.za>
Content
John Machin wrote:
> 
> John Machin <sjmachin@users.sourceforge.net> added the comment:
> 
> @lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. 

I know.

> The standard now says 21 bits is it. 

It says that the current Unicode codespace only uses 21 bits. In the
early days, 16 bits were considered enough, so it wouldn't surprise me
if they extended that range again at some point in the future. After
all, leaving 11 bits unused in UCS-4 is a huge waste of space.
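
The bit counts above can be checked with a few lines of Python. This is only an illustrative sketch: the names MAX_CODE_POINT, bits_needed, unused_in_32 and encoded_max are made up for the example, and "UCS-4" is assumed here to mean a 32-bit code unit.

```python
# Highest code point in the current Unicode codespace (the RFC 3629 limit).
MAX_CODE_POINT = 0x10FFFF

# Number of bits needed to represent that value.
bits_needed = MAX_CODE_POINT.bit_length()

# Bits left over in a 32-bit code unit (assumption: UCS-4 stored in 32 bits).
unused_in_32 = 32 - bits_needed

# The UTF-8 encoding of the highest code point starts with lead byte 0xF4,
# which is why lead bytes 0xF5-0xFF cannot occur under the 21-bit limit.
encoded_max = chr(MAX_CODE_POINT).encode("utf-8")

print(bits_needed, unused_in_32, encoded_max.hex())
```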

If you have a reference showing that the Unicode Consortium has decided
to stay with that limit forever, please quote it.

> F5-FF are declared to be invalid. I don't understand what you mean by "supporting those possibilities". The code is correctly issuing an error message. The goal of supporting the new resyncing and FFFD-emitting rules might be better met however by throwing away the code in the default clause and instead merely setting the entries for F5-FF in the utf8_code_length array to zero.

Fair enough. Let's do that.

The reference in the table should then be updated to RFC 3629.
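
To illustrate the agreed change, here is a rough Python model of a lead-byte length table in which F5-FF are zero. It is a sketch only, not the C table from the interpreter source: build_utf8_code_length and is_valid_lead_byte are hypothetical names, and treating C0 and C1 as invalid is an assumption taken from RFC 3629 rather than from this discussion.

```python
def build_utf8_code_length():
    """Return a 256-entry list mapping a lead byte to its sequence length.

    A value of 0 marks a byte that cannot start a UTF-8 sequence.
    """
    table = [0] * 256
    for b in range(0x00, 0x80):   # ASCII: one-byte sequences
        table[b] = 1
    # 0x80-0xBF are continuation bytes and stay 0.
    # 0xC0 and 0xC1 stay 0 (assumption: RFC 3629 says they never appear).
    for b in range(0xC2, 0xE0):   # two-byte sequences
        table[b] = 2
    for b in range(0xE0, 0xF0):   # three-byte sequences
        table[b] = 3
    for b in range(0xF0, 0xF5):   # four-byte sequences, up to U+10FFFF
        table[b] = 4
    # 0xF5-0xFF stay 0: invalid under RFC 3629, so no special-case
    # branch is needed to reject them.
    return table


utf8_code_length = build_utf8_code_length()


def is_valid_lead_byte(byte):
    """True if the byte may start a UTF-8 sequence in this model."""
    return utf8_code_length[byte] != 0
```

With the entries zeroed, a decoder built on such a table would reject F5-FF through the same path as any other invalid start byte, so resyncing and replacement-character handling would need to exist in only one place.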

History
Date                 User     Action  Args
2010-04-01 14:13:01  lemburg  set     recipients: + lemburg, sjmachin, ezio.melotti, dangra
2010-04-01 14:12:59  lemburg  link    issue8271 messages
2010-04-01 14:12:59  lemburg  create