Message 144568 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	ezio.melotti
Recipients	ezio.melotti, lemburg, loewis, vstinner
Date	2011-09-28.15:47:33
SpamBayes Score	5.1561308e-08
Marked as misclassified	No
Message-id	<1317224855.38.0.863349241023.issue13054@psf.upfronthosting.co.za>
In-reply-to

Content
Now that PEP 393 is in and the distinction between narrow and wide doesn't exist anymore, the value of sys.maxunicode should always be 0x10FFFF. sys.maxunicode currently uses PyUnicode_GetMax (Objects/unicodeobject.c:196) and still returns either 0x10FFFF if Py_UNICODE_WIDE is defined or 0xFFFF if it's not (and that should now mean that it's defined on Linux where wchar_t is 4 bytes, but not on Windows where it's 2 bytes (isn't this backward incompatible? if so it probably deserves another issue)). IIUC the difference between narrow and wide is gone for Python users, but it's still there for C users that use the old API, so changing PyUnicode_GetMax will most likely break their code. I therefore suggest to set sys.maxunicode to 0x10FFFF and to leave PyUnicode_GetMax as is. C users that switch to the new API should stop using PyUnicode_GetMax and it should be added along with the other deprecated functions in PEP 393. If sys.maxunicode becomes a constant, it won't be useful to determine if the build is narrow or wide anymore (that won't actually matter anymore, but this was the main use of sys.maxunicode), but it might still be useful to know the value of the highest codepoint. Therefore I think that sys.maxunicode can still stay around without being deprecated (its documentation should be fixed though).

Now that PEP 393 is in and the distinction between narrow and wide doesn't exist anymore, the value of sys.maxunicode should always be 0x10FFFF.

sys.maxunicode currently uses PyUnicode_GetMax (Objects/unicodeobject.c:196) and still returns either 0x10FFFF if  Py_UNICODE_WIDE is defined or 0xFFFF if it's not (and that should now mean that it's defined on Linux where wchar_t is 4 bytes, but not on Windows where it's 2 bytes (isn't this backward incompatible? if so it probably deserves another issue)).

IIUC the difference between narrow and wide is gone for Python users, but it's still there for C users that use the old API, so changing PyUnicode_GetMax will most likely break their code.

I therefore suggest to set sys.maxunicode to 0x10FFFF and to leave PyUnicode_GetMax as is.

C users that switch to the new API should stop using PyUnicode_GetMax and it should be added along with the other deprecated functions in PEP 393.
If sys.maxunicode becomes a constant, it won't be useful to determine if the build is narrow or wide anymore (that won't actually matter anymore, but this was the main use of sys.maxunicode), but it might still be useful to know the value of the highest codepoint.  Therefore I think that sys.maxunicode can still stay around without being deprecated (its documentation should be fixed though).

History
Date	User	Action	Args
2011-09-28 15:47:35	ezio.melotti	set	recipients: + ezio.melotti, lemburg, loewis, vstinner
2011-09-28 15:47:35	ezio.melotti	set	messageid: <1317224855.38.0.863349241023.issue13054@psf.upfronthosting.co.za>
2011-09-28 15:47:34	ezio.melotti	link	issue13054 messages
2011-09-28 15:47:33	ezio.melotti	create