Message 93604 - Python tracker



Author	lemburg
Recipients	Rhamphoryncus, amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date	2009-10-05.14:16:24
Message-id	<4AC9FFB7.1050204@egenix.com>
In-reply-to	<1254745724.36.0.0195575889541.issue5127@psf.upfronthosting.co.za>

Content

Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
>> We'd need to expose the UCS4 APIs *in addition*
>> to those APIs and have the UCS2 APIs redirect to the UCS4 ones.
> 
> Why have two names for the same function? it's Python 3, after all.

It's not the same function... the UCS2 version would take a
Py_UNICODE parameter, the UCS4 version a Py_UCS4 parameter.
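
To make the distinction concrete, here is a minimal sketch of two entry points that differ only in their parameter type, with the narrow one redirecting to the wide one. It assumes a narrow (UCS2) build; the typedefs and the sketch_* function names are hypothetical stand-ins, not the actual CPython API, and the property check is a toy covering only ASCII digits and the mathematical digits U+1D7CE..U+1D7FF.

```c
#include <stdint.h>

/* Stand-ins for the real types, assuming a narrow (UCS2) build. */
typedef uint16_t narrow_unicode_t;  /* plays the role of Py_UNICODE */
typedef uint32_t ucs4_t;            /* plays the role of Py_UCS4 */

/* Hypothetical UCS4 entry point: toy check for decimal digits.
   Covers ASCII '0'..'9' and the non-BMP mathematical digits only. */
int sketch_ucs4_isdecimal(ucs4_t ch)
{
    if (ch >= 0x30 && ch <= 0x39)
        return 1;
    if (ch >= 0x1D7CE && ch <= 0x1D7FF)
        return 1;
    return 0;
}

/* Hypothetical UCS2 entry point: same name pattern, narrower parameter.
   Widening the argument is always safe, so it can simply redirect. */
int sketch_ucs2_isdecimal(narrow_unicode_t ch)
{
    return sketch_ucs4_isdecimal((ucs4_t)ch);
}
```

Existing callers that pass a 16-bit value keep working through the narrow function, while new code can pass a full code point such as 0x1D7CE to the wide one.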

I don't understand the comment about Python 3.x. FWIW, we're no
longer in the "backwards incompatible changes are allowed" mode
for 3.x.

> Or is this "no recompile" feature so important (as long as changes are
> clearly shown to the user)? It does not work on Windows, FWIW.

There are generally two options for API changes within a
major release branch:

 1. the changes are API backwards compatible and only the Python API
    version is changed

 2. the changes are not API backwards compatible; in such a case,
    Python has to reject imports of old modules (as it always
    does on Windows), so the Python API version has to be changed
    *and* the import mechanism must reject the import

The second option was used when transitioning from 2.4 to 2.5 due
to the Py_ssize_t changes.

We could do the same for 2.7/3.2, but if it's just needed for this
one change, then I'd rather stick to implementing the first option.
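
The two options can be sketched as a single version check with a strictness switch. This is only an illustration under assumptions: SKETCH_API_VERSION, the import_result enum and sketch_check_api_version() are hypothetical names, the version number is made up, and treating a mismatch under option 1 as a non-fatal warning is an assumption about how the import side would react.

```c
/* Hypothetical API version the interpreter was built with. */
#define SKETCH_API_VERSION 2

typedef enum {
    IMPORT_OK,      /* versions match */
    IMPORT_WARN,    /* option 1: mismatch is tolerated, module still loads */
    IMPORT_REJECT   /* option 2: mismatch makes the import fail */
} import_result;

/* Compare the version an extension module was compiled against with the
   interpreter's version. 'strict' selects option 2 (reject old modules);
   otherwise a mismatch is only flagged (option 1). */
import_result sketch_check_api_version(int module_api_version, int strict)
{
    if (module_api_version == SKETCH_API_VERSION)
        return IMPORT_OK;
    return strict ? IMPORT_REJECT : IMPORT_WARN;
}
```

With option 1 an old module compiled against version 1 still loads; with option 2 the same module is refused and has to be recompiled.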

>> I haven't checked, but it's certainly possible to have a code point
>> use a non-BMP lower/upper/title case mapping, so this should be
>> made possible as well, if we're going to make changes to the type
>> database.
> 
> OK, here is a new patch.  Even if this does not happen with unicodedata
> up to 5.1, the table has only 175 entries so memory usage is not
> dramatically increased.
> Py_UNICODE is no more used at all in unicodectype.c.

Sorry, but this doesn't work: the functions have to return Py_UNICODE
and raise an exception if the return value doesn't fit.

Otherwise, you'd get completely wrong values in code downcasting
the return value to Py_UNICODE on narrow builds.

Another good reason to use two sets of APIs. The new set could
indeed return Py_UCS4 values.
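
To illustrate the downcasting problem, here is a small sketch, again assuming a narrow build where the Py_UNICODE stand-in is 16 bits wide. The typedefs and both functions are hypothetical; real code would set a Python exception where the sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real types, assuming a narrow (UCS2) build. */
typedef uint16_t narrow_unicode_t;  /* plays the role of Py_UNICODE */
typedef uint32_t ucs4_t;            /* plays the role of Py_UCS4 */

/* What existing callers effectively do when they downcast the result:
   the high bits are silently dropped. */
narrow_unicode_t sketch_narrow_unchecked(ucs4_t value)
{
    return (narrow_unicode_t)value;
}

/* Checked variant: report an error instead of returning a wrong value.
   Returns 0 on success and stores the result in *out, -1 if the value
   does not fit (real code would raise an exception here). */
int sketch_narrow_checked(ucs4_t value, narrow_unicode_t *out)
{
    assert(out != NULL);
    if (value > 0xFFFF)
        return -1;
    *out = (narrow_unicode_t)value;
    return 0;
}
```

For example, the unchecked cast turns U+10400 (a Deseret capital letter) into 0x0400, which is an unrelated Cyrillic letter, whereas the checked variant reports the failure and leaves the output untouched.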

History
Date	User	Action	Args
2009-10-05 14:16:27	lemburg	set	recipients: + lemburg, amaury.forgeotdarc, Rhamphoryncus, vstinner, ezio.melotti, bupjae
2009-10-05 14:16:25	lemburg	link	issue5127 messages
2009-10-05 14:16:24	lemburg	create