Author lemburg
Recipients Rhamphoryncus, amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date 2009-10-05.11:41:15
SpamBayes Score 2.60071e-11
Marked as misclassified No
Message-id <4AC9DB5A.1030204@egenix.com>
In-reply-to <1254740974.88.0.268460461251.issue5127@psf.upfronthosting.co.za>
Content
Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
>> we should make sure that it's not possible to load an extension
>> compiled with 3.1 in 3.2 to prevent segfaults and buffer overruns.
> 
> This is the case with this patch: today all these functions
> (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase) are actually #defines to
> _PyUnicodeUCS2_* or _PyUnicodeUCS4_*.
> The patch removes the #defines: 3.1 modules that call
> _PyUnicodeUCS4_IsAlpha wouldn't load into a 3.2 interpreter.

True, but we can do better. For narrow builds, the API currently
exposes the UCS2 APIs. We'd need to expose the UCS4 APIs *in addition*
to those APIs and have the UCS2 APIs redirect to the UCS4 ones.

For wide builds, we don't need to change anything.

>> The change affects the Unicode type database which is implemented
>> in unicodectype.c, not the Unicode database, which already uses UCS4.
> 
> Are you referring to the _PyUnicode_TypeRecord structure?
> The first three fields only contains values up to 65535, so they could
> use "unsigned short" even for UCS4 builds.

I haven't checked, but it's certainly possible to have a code point
use a non-BMP lower/upper/title case mapping, so this should be
made possible as well, if we're going to make changes to the type
database.
History
Date User Action Args
2009-10-05 11:41:17lemburgsetrecipients: + lemburg, amaury.forgeotdarc, Rhamphoryncus, vstinner, ezio.melotti, bupjae
2009-10-05 11:41:15lemburglinkissue5127 messages
2009-10-05 11:41:15lemburgcreate