Title: In some UCS4 builds, sizeof(Py_UNICODE) could end up being more than 4.
Type: behavior
Stage: patch review
Components: Unicode
Versions: Python 3.3
Status: closed
Resolution: fixed
Dependencies: 3098
Superseder:
Assigned To:
Nosy List: BreamoreBoy, effbot, ezio.melotti, lemburg, loewis, mark.dickinson, pitrou, schuppenies, vstinner
Priority: normal
Keywords: patch

Created on 2008-06-17 09:39 by schuppenies, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg68310 - Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-17 09:38
This issue is a branch from issue3098.

Below is a summary of the discussion:

Antoine Pitrou wrote:
> It seems that in some UCS4 builds, sizeof(Py_UNICODE) could end
> up being more than 4 if the native int type is itself larger than 32
> bits; although the latter is probably quite rare (64-bit platforms are
> usually either LP64 or LLP64).

Marc-Andre Lemburg wrote:
> AFAIK, only Crays have this problem, but apart from that: I'd consider
> it a bug if sizeof(Py_UCS4) != 4.

Antoine Pitrou wrote:
> Perhaps a #error can be added to that effect?
> Something like (untested):
> #if SIZEOF_INT == 4
> typedef unsigned int Py_UCS4;
> #elif SIZEOF_LONG == 4
> typedef unsigned long Py_UCS4;
> #else
> #error Could not find a 4-byte integer type for Py_UCS4, aborting
> #endif

Marc-Andre Lemburg wrote:
> Sounds good!
> Python should really try to use uint32_t as a fallback solution for
> UCS4 where available (and uint16_t for UCS2).
> We'd have to add an AC_TYPE_INT32_T and AC_TYPE_INT16_T check to
> configure:
> and could then use
> typedef uint32_t Py_UCS4;
> and
> typedef uint16_t Py_UCS2;
> Note that the code for supporting UCS2/UCS4 is not really all that
> clean. It was a quick sprint between Martin and Fredrik and appears
> to be only half-done... e.g. there currently is no Py_UCS2.
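
The uint32_t/uint16_t idea quoted above could look roughly like the following. This is only a sketch, not the patch for this issue: the `demo_` names are hypothetical (chosen so they do not clash with CPython's own types), and it assumes a compiler that ships <stdint.h>. The negative-array-size typedef is a C89-compatible stand-in for the proposed #error.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical names for illustration; not CPython's real typedefs. */
typedef uint32_t demo_ucs4;
typedef uint16_t demo_ucs2;

/* Compile-time check: the array size becomes -1 (a compile error)
   if the type does not have the expected width in bits. */
typedef char demo_ucs4_is_32_bits[sizeof(demo_ucs4) * CHAR_BIT == 32 ? 1 : -1];
typedef char demo_ucs2_is_16_bits[sizeof(demo_ucs2) * CHAR_BIT == 16 ? 1 : -1];

/* Width of each type in bits, for inspection at run time. */
static unsigned demo_ucs4_bits(void)
{
    return (unsigned)(sizeof(demo_ucs4) * CHAR_BIT);
}

static unsigned demo_ucs2_bits(void)
{
    return (unsigned)(sizeof(demo_ucs2) * CHAR_BIT);
}

/* The same check at run time, e.g. for use in a test suite. */
static void demo_check_widths(void)
{
    assert(demo_ucs4_bits() == 32);
    assert(demo_ucs2_bits() == 16);
}
```

The check compares bit widths rather than sizeof(...) == 4, so it would not misfire on a platform where CHAR_BIT is not 8.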
msg87088 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 00:01
I like the idea of using uint16_t and uint32_t. Unicode 5.1 contains
approximately 1 million code points (and 100,000 characters), so 21 bits
are already enough to use the full Unicode 5.1 standard (released in
April 2008). Using more than 32 bits for a Unicode character is wasteful.
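
The 21-bit figure can be checked with a few lines of C. This is a minimal sketch; the `demo_` names are hypothetical, and 0x10FFFF is the highest code point in the Unicode codespace.

```c
#include <assert.h>
#include <stdint.h>

/* Highest code point in the Unicode codespace. */
#define DEMO_MAX_CODE_POINT 0x10FFFFu

/* Number of bits needed to represent `value` (0 for value == 0). */
static unsigned demo_bits_needed(uint32_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Returns cp unchanged, asserting that it lies inside the codespace. */
static uint32_t demo_checked_code_point(uint32_t cp)
{
    assert(cp <= DEMO_MAX_CODE_POINT);
    return cp;
}
```

demo_bits_needed(DEMO_MAX_CODE_POINT) evaluates to 21, so a 32-bit type leaves 11 bits unused per character.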
msg87104 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-05-04 08:32
> We'd have to add an AC_TYPE_INT32_T and AC_TYPE_INT16_T check to
> configure:

AC_TYPE_INT32_T should already be there.  See also the code in
pyport.h that #defines HAVE_INT32_T and PY_INT32_T, and the
corresponding bits of PC/pyconfig.h.

It was recently pointed out that there are some issues with these
definitions when using a C++ compiler instead of a C compiler, since
then INT32_MAX is undefined.  (See the footnote to 7.18.2, para.1 of
msg110674 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-18 19:27
@Mark Dickinson you've shown some interest, could you run with this?
msg111868 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 22:59
This issue has no patch.
msg144616 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-29 19:17
PEP 393 has been accepted: strings are now stored as Py_UCS1*, Py_UCS2* or Py_UCS4*. The Py_UNICODE type still exists but is deprecated and only used in the legacy API. Py_UNICODE is now always the wchar_t type; it can no longer be unsigned int. I hope that no platform chose to use a wchar_t larger than 32 bits. Let's close this issue.
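
For anyone who wants to confirm the wchar_t assumption on a given platform, a small check along these lines should do. It is only a sketch with hypothetical `demo_` names and is not part of CPython.

```c
#include <assert.h>
#include <limits.h>
#include <wchar.h>

/* Storage width of wchar_t in bits on the current platform. */
static unsigned demo_wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Returns 1 if wchar_t is no wider than 32 bits, 0 otherwise. */
static int demo_wchar_fits_32(void)
{
    return demo_wchar_bits() <= 32;
}

/* Aborts (when assertions are enabled) on a platform with a wider wchar_t. */
static void demo_assert_wchar_fits_32(void)
{
    assert(demo_wchar_fits_32());
}
```

On common platforms wchar_t is 16 bits (Windows) or 32 bits (Linux, macOS), so the check is expected to pass there.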
Date                 User            Action  Args
2022-04-11 14:56:35  admin           set     github: 47380
2011-09-29 19:17:32  vstinner        set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg144616
                                             versions: + Python 3.3, - Python 2.6, Python 3.0
2010-07-28 22:59:51  vstinner        set     messages: + msg111868
2010-07-18 19:27:21  BreamoreBoy     set     nosy: + BreamoreBoy
                                             messages: + msg110674
2009-05-04 08:32:38  mark.dickinson  set     nosy: + mark.dickinson
                                             messages: + msg87104
2009-05-04 00:01:47  vstinner        set     messages: + msg87088
2009-04-27 01:10:42  ajaksu2         set     nosy: + vstinner, ezio.melotti
                                             versions: + Python 2.6, Python 3.0
                                             priority: normal
                                             dependencies: + sys.sizeof test fails with wide unicode
                                             keywords: + patch
                                             stage: patch review
2008-06-17 09:59:15  pitrou          set     nosy: + lemburg, loewis, effbot, pitrou
2008-06-17 09:39:08  schuppenies     create