Author ezio.melotti
Recipients ezio.melotti, lemburg, loewis, vstinner
Date 2011-09-28.15:47:33
Message-id <1317224855.38.0.863349241023.issue13054@psf.upfronthosting.co.za>
Content
Now that PEP 393 is in and the distinction between narrow and wide builds no longer exists, the value of sys.maxunicode should always be 0x10FFFF.

sys.maxunicode currently uses PyUnicode_GetMax (Objects/unicodeobject.c:196) and still returns 0x10FFFF if Py_UNICODE_WIDE is defined and 0xFFFF if it's not. That should now mean that Py_UNICODE_WIDE is defined on Linux, where wchar_t is 4 bytes, but not on Windows, where it's 2 bytes. (Isn't this backward incompatible? If so, it probably deserves another issue.)
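
To make the current behaviour concrete, here is a rough Python model of the branch described above. It is only an illustration, not the actual C code: the names get_max_model and wide_from_wchar_size are made up, and the link between wchar_t size and Py_UNICODE_WIDE is my assumption from the paragraph above.

```python
# Illustrative model only; the real logic lives in C (PyUnicode_GetMax).
WIDE_MAX = 0x10FFFF    # value reported when Py_UNICODE_WIDE is defined
NARROW_MAX = 0xFFFF    # value reported otherwise


def get_max_model(py_unicode_wide: bool) -> int:
    """Hypothetical stand-in for the branch PyUnicode_GetMax takes."""
    return WIDE_MAX if py_unicode_wide else NARROW_MAX


def wide_from_wchar_size(wchar_t_bytes: int) -> bool:
    """Assumption: Py_UNICODE_WIDE is defined when wchar_t is 4 bytes."""
    return wchar_t_bytes >= 4


# Platform sizes as stated in the message: Linux 4 bytes, Windows 2 bytes.
for platform, size in (("Linux", 4), ("Windows", 2)):
    print(platform, hex(get_max_model(wide_from_wchar_size(size))))
```

Under these assumptions the same interpreter version would report different values on the two platforms, which is exactly what the proposal avoids.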

IIUC the difference between narrow and wide is gone for Python users, but it's still there for C users who use the old API, so changing PyUnicode_GetMax will most likely break their code.

I therefore suggest setting sys.maxunicode to 0x10FFFF and leaving PyUnicode_GetMax as is.

C users who switch to the new API should stop using PyUnicode_GetMax, and it should be listed along with the other deprecated functions in PEP 393.
If sys.maxunicode becomes a constant, it will no longer be useful for determining whether the build is narrow or wide. That won't actually matter anymore, but it was the main use of sys.maxunicode. It might still be useful to know the value of the highest code point, so I think sys.maxunicode can stay around without being deprecated (its documentation should be fixed, though).
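
For reference, a small sketch of the two uses mentioned here: the old narrow/wide check and the "highest code point" use that would remain. The helper names is_wide_build and is_valid_code_point are made up for the example, and the sketch assumes an interpreter where sys.maxunicode is already the constant 0x10FFFF.

```python
import sys


def is_wide_build() -> bool:
    """The traditional check; always True once sys.maxunicode is fixed at 0x10FFFF."""
    return sys.maxunicode > 0xFFFF


def is_valid_code_point(n: int) -> bool:
    """The remaining use: range-check an integer against the highest code point."""
    return 0 <= n <= sys.maxunicode


# chr() accepts the highest code point and rejects anything above it.
highest = chr(sys.maxunicode)
try:
    chr(sys.maxunicode + 1)
    beyond_ok = True
except ValueError:
    beyond_ok = False
```

With a constant value the first helper stops telling you anything, while the second keeps working unchanged.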
History
Date User Action Args
2011-09-28 15:47:35 ezio.melotti set recipients: + ezio.melotti, lemburg, loewis, vstinner
2011-09-28 15:47:35 ezio.melotti set messageid: <1317224855.38.0.863349241023.issue13054@psf.upfronthosting.co.za>
2011-09-28 15:47:34 ezio.melotti link issue13054 messages
2011-09-28 15:47:33 ezio.melotti create