classification
Title: sys.maxunicode value after PEP-393
Type: behavior Stage: resolved
Components: Interpreter Core, Unicode Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: ezio.melotti, haypo, lemburg, loewis, pitrou, python-dev
Priority: high Keywords: patch

Created on 2011-09-28 15:47 by ezio.melotti, last changed 2011-10-04 17:47 by ezio.melotti. This issue is now closed.

Files
File name Uploaded Description
issue13054.diff ezio.melotti, 2011-09-28 16:36
issue13054-2.diff ezio.melotti, 2011-09-28 23:22 Fix sys.maxunicode checks in the stdlib
Messages (10)
msg144568 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-28 15:47
Now that PEP 393 is in and the distinction between narrow and wide doesn't exist anymore, the value of sys.maxunicode should always be 0x10FFFF.

sys.maxunicode currently uses PyUnicode_GetMax (Objects/unicodeobject.c:196) and still returns 0x10FFFF if Py_UNICODE_WIDE is defined and 0xFFFF if it's not. That should now mean that it's defined on Linux, where wchar_t is 4 bytes, but not on Windows, where it's 2 bytes. (Isn't this backward incompatible? If so, it probably deserves another issue.)

IIUC the difference between narrow and wide is gone for Python users, but it's still there for C users that use the old API, so changing PyUnicode_GetMax will most likely break their code.

I therefore suggest setting sys.maxunicode to 0x10FFFF and leaving PyUnicode_GetMax as is.

C users that switch to the new API should stop using PyUnicode_GetMax, and it should be added to the list of deprecated functions in PEP 393.
If sys.maxunicode becomes a constant, it will no longer be useful for determining whether the build is narrow or wide (that won't actually matter anymore, but it was the main use of sys.maxunicode). It might still be useful to know the value of the highest code point, however, so I think that sys.maxunicode can stay around without being deprecated (its documentation should be fixed though).
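
To illustrate the proposed behavior, here is a minimal sketch, assuming Python 3.3 or later (where PEP 393 is in effect). It only shows that sys.maxunicode is the same constant regardless of the platform's wchar_t size and that the highest code point is a single character; the variable name `highest` is made up for this example.

```python
import sys

# With PEP 393 in place, this value no longer depends on the build.
print(hex(sys.maxunicode))

# The highest code point is one character, not a surrogate pair.
highest = chr(sys.maxunicode)
print(len(highest), ord(highest) == sys.maxunicode)
```

Code that only needs the upper bound of the code point range can keep reading sys.maxunicode; code that used it to detect a narrow build no longer learns anything from it.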
msg144570 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-28 16:36
Attached an initial patch that sets the value of sys.maxunicode to 0x10FFFF, adds a test, and documents the change in both the sys.rst doc and the 3.3 whatsnew.

The patch doesn't include any deprecation.  If we decide to deprecate something, the PEP and possibly the code should be updated.
msg144574 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-09-28 17:52
That all sounds fine to me.

As the PEP specifies, all deprecation will only be on paper for now, not in the code. Adding PyUnicode_GetMax to the list sounds fine to me as well.
msg144578 - (view) Author: Roundup Robot (python-dev) Date: 2011-09-28 21:18
New changeset 606652491366 by Ezio Melotti in branch 'default':
#13054: sys.maxunicode is now always 0x10FFFF.
http://hg.python.org/cpython/rev/606652491366
msg144579 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-28 23:22
Attached a second patch that fixes checks like:
if sys.maxunicode == 65535:
    ...

There are a couple of places (e.g. test_bigmem) where I'm not sure what the best fix is, so I added a couple of XXX markers in the patch.  If you have any suggestions, please comment either here or on the review page.
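
As a rough sketch of why such checks need fixing (this is not the content of issue13054-2.diff, and `is_narrow_build` is a hypothetical helper written for this example), assuming Python 3.3 or later: the old-style comparison is now always false, so any branch guarded by it is dead code.

```python
import sys

def is_narrow_build():
    # Hypothetical helper showing the old-style check.
    # With sys.maxunicode fixed at 0x10FFFF this is always False.
    return sys.maxunicode == 0xFFFF

s = "\U00010000"  # first code point outside the BMP

# Old code typically expected a surrogate pair (length 2) on narrow builds.
expected_len = 2 if is_narrow_build() else 1
print(is_narrow_build(), len(s) == expected_len)
```

Where a check like this guards a narrow-build workaround, the workaround branch can usually be removed; where it was used to estimate memory use, a different approach is needed.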
msg144580 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-28 23:28
I added PyUnicode_GetMax to the list of deprecated functions in PEP 393 in http://hg.python.org/peps/rev/9a154edf18e6.
(I'm also adding Antoine to the nosy because he might know something about test_bigmem.)
msg144611 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-09-29 18:13
As said on IRC, unicodesize and character_size should be 1 because the test is something like 'x'*1. Or you can just remove this constant, it's not very useful to have a constant equal to 1 :-)
msg144618 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-09-29 19:37
I think there's no point in deprecating a function (or data) with a perfectly valid definition.
As for test_bigmem, it's better to keep character_size hardwired to 1 (if all tests really use only ASCII chars, which I'm not sure they do).
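
The caveat about ASCII can be checked with a small sketch, assuming CPython 3.3 or later (sys.getsizeof reports an implementation detail and other interpreters may behave differently). `bytes_per_char` is a hypothetical helper, not something taken from test_bigmem.

```python
import sys

def bytes_per_char(ch, n=1000):
    # Hypothetical helper: estimate per-character storage from the size
    # difference between a string of n + 1 copies and a single copy.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n

# Under PEP 393 the storage width depends on the widest character present.
for ch in ("x", "\u20ac", "\U00010000"):
    print(hex(ord(ch)), bytes_per_char(ch))
```

A character size of 1 is therefore only a valid estimate for strings whose characters all fit in one byte; tests that use wider characters would need a larger factor.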
msg144897 - (view) Author: Roundup Robot (python-dev) Date: 2011-10-04 16:06
New changeset f39b26ca7f3d by Ezio Melotti in branch 'default':
#13054: fix usage of sys.maxunicode after PEP-393.
http://hg.python.org/cpython/rev/f39b26ca7f3d
msg144914 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-04 17:47
The buildbot seems happy, so I'm closing this.
Antoine already took care of test_bigmem, and Victor opened #13100 for sre_compile.
History
Date User Action Args
2011-10-04 17:47:26 ezio.melotti set status: open -> closed; resolution: fixed; messages: + msg144914; stage: patch review -> resolved
2011-10-04 16:06:24 python-dev set messages: + msg144897
2011-09-29 19:37:55 pitrou set messages: + msg144618
2011-09-29 18:13:04 haypo set messages: + msg144611
2011-09-28 23:28:18 ezio.melotti set nosy: + pitrou; messages: + msg144580; stage: test needed -> patch review
2011-09-28 23:22:45 ezio.melotti set files: + issue13054-2.diff; messages: + msg144579
2011-09-28 21:18:33 python-dev set nosy: + python-dev; messages: + msg144578
2011-09-28 17:52:51 loewis set messages: + msg144574
2011-09-28 16:36:36 ezio.melotti set files: + issue13054.diff; keywords: + patch; messages: + msg144570
2011-09-28 15:47:34 ezio.melotti create