Message 93594 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lemburg
Recipients	Rhamphoryncus, amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date	2009-10-05.11:41:15
SpamBayes Score	2.6007085e-11
Marked as misclassified	No
Message-id	<4AC9DB5A.1030204@egenix.com>
In-reply-to	<1254740974.88.0.268460461251.issue5127@psf.upfronthosting.co.za>

Content
Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment: > >> we should make sure that it's not possible to load an extension >> compiled with 3.1 in 3.2 to prevent segfaults and buffer overruns. > > This is the case with this patch: today all these functions > (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase) are actually #defines to > _PyUnicodeUCS2_* or _PyUnicodeUCS4_. > The patch removes the #defines: 3.1 modules that call > _PyUnicodeUCS4_IsAlpha wouldn't load into a 3.2 interpreter. True, but we can do better. For narrow builds, the API currently exposes the UCS2 APIs. We'd need to expose the UCS4 APIs in addition* to those APIs and have the UCS2 APIs redirect to the UCS4 ones. For wide builds, we don't need to change anything. >> The change affects the Unicode type database which is implemented >> in unicodectype.c, not the Unicode database, which already uses UCS4. > > Are you referring to the _PyUnicode_TypeRecord structure? > The first three fields only contains values up to 65535, so they could > use "unsigned short" even for UCS4 builds. I haven't checked, but it's certainly possible to have a code point use a non-BMP lower/upper/title case mapping, so this should be made possible as well, if we're going to make changes to the type database.

Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
>> we should make sure that it's not possible to load an extension
>> compiled with 3.1 in 3.2 to prevent segfaults and buffer overruns.
> 
> This is the case with this patch: today all these functions
> (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase) are actually #defines to
> _PyUnicodeUCS2_* or _PyUnicodeUCS4_*.
> The patch removes the #defines: 3.1 modules that call
> _PyUnicodeUCS4_IsAlpha wouldn't load into a 3.2 interpreter.

True, but we can do better. For narrow builds, the API currently
exposes the UCS2 APIs. We'd need to expose the UCS4 APIs *in addition*
to those APIs and have the UCS2 APIs redirect to the UCS4 ones.

For wide builds, we don't need to change anything.

>> The change affects the Unicode type database which is implemented
>> in unicodectype.c, not the Unicode database, which already uses UCS4.
> 
> Are you referring to the _PyUnicode_TypeRecord structure?
> The first three fields only contains values up to 65535, so they could
> use "unsigned short" even for UCS4 builds.

I haven't checked, but it's certainly possible to have a code point
use a non-BMP lower/upper/title case mapping, so this should be
made possible as well, if we're going to make changes to the type
database.

History
Date	User	Action	Args
2009-10-05 11:41:17	lemburg	set	recipients: + lemburg, amaury.forgeotdarc, Rhamphoryncus, vstinner, ezio.melotti, bupjae
2009-10-05 11:41:15	lemburg	link	issue5127 messages
2009-10-05 11:41:15	lemburg	create