In some UCS4 builds, sizeof(Py_UNICODE) could end up being more than 4. #47380

schuppenies · 2008-06-17T09:39:08Z

BPO	3130
Nosy	@malemburg, @loewis, @mdickinson, @pitrou, @vstinner, @ezio-melotti
Dependencies	bpo-3098: sys.sizeof test fails with wide unicode

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2011-09-29.19:17:32.116>
created_at = <Date 2008-06-17.09:39:08.272>
labels = ['type-bug', 'expert-unicode']
title = 'In some UCS4 builds, sizeof(Py_UNICODE) could end up being more than 4.'
updated_at = <Date 2011-09-29.19:17:32.113>
user = 'https://bugs.python.org/schuppenies'

bugs.python.org fields:

activity = <Date 2011-09-29.19:17:32.113>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2011-09-29.19:17:32.116>
closer = 'vstinner'
components = ['Unicode']
creation = <Date 2008-06-17.09:39:08.272>
creator = 'schuppenies'
dependencies = ['3098']
files = []
hgrepos = []
issue_num = 3130
keywords = ['patch']
message_count = 6.0
messages = ['68310', '87088', '87104', '110674', '111868', '144616']
nosy_count = 9.0
nosy_names = ['lemburg', 'loewis', 'effbot', 'mark.dickinson', 'pitrou', 'vstinner', 'schuppenies', 'ezio.melotti', 'BreamoreBoy']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue3130'
versions = ['Python 3.3']

schuppenies · 2008-06-17T09:38:57Z

This issue is a branch from bpo-3098.

Below is a summary of the discussion:

Antoine Pitrou wrote:

It seems that in some UCS4 builds, sizeof(Py_UNICODE) could end
up being more than 4 if the native int type is itself larger than 32
bits; although the latter is probably quite rare (64-bit platforms are
usually either LP64 or LLP64).

Marc-Andre Lemburg wrote:

AFAIK, only Crays have this problem, but apart from that: I'd consider
it a bug if sizeof(Py_UCS4) != 4.

Antoine Pitrou wrote:

Perhaps a #error can be added to that effect?
Something like (untested):

#if SIZEOF_INT == 4
typedef unsigned int Py_UCS4;
#elif SIZEOF_LONG == 4
typedef unsigned long Py_UCS4;
#else
#error Could not find a 4-byte integer type for Py_UCS4, aborting
#endif

Marc-Andre Lemburg wrote:

Sounds good!

Python should really try to use uint32_t as fallback solution for
UCS4 where available (and uint16_t for UCS2).

We'd have to add an AC_TYPE_INT32_T and AC_TYPE_INT16_T check to
configure:

http://www.gnu.org/software/autoconf/manual/html_node/Particular-Types.html#Particular-Types

and could then use

typedef uint32_t Py_UCS4;

and

typedef uint16_t Py_UCS2;

Note that the code for supporting UCS2/UCS4 is not really all that
clean. It was a quick sprint between Martin and Fredrik and appears
to be only half-done... e.g. there currently is no Py_UCS2.
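
The fixed-width typedef idea above can be sketched as follows. This is only an illustration, assuming a C11 compiler and a <stdint.h> that provides uint32_t and uint16_t; the names demo_ucs4, demo_ucs2, demo_ucs4_bytes and demo_ucs2_bytes are hypothetical and are not CPython's definitions. The static assertions play the same role as the #error in Antoine's snippet: the build fails if the widths are not what is expected.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint16_t, uint32_t */

/* Hypothetical stand-ins for Py_UCS4 and Py_UCS2; not CPython's definitions. */
typedef uint32_t demo_ucs4;
typedef uint16_t demo_ucs2;

/* Fail the build if either type does not have the expected width. */
static_assert(sizeof(demo_ucs4) * CHAR_BIT == 32, "demo_ucs4 must be 32 bits wide");
static_assert(sizeof(demo_ucs2) * CHAR_BIT == 16, "demo_ucs2 must be 16 bits wide");

/* Bytes needed to store n code units of each width. */
static size_t demo_ucs4_bytes(size_t n) { return n * sizeof(demo_ucs4); }
static size_t demo_ucs2_bytes(size_t n) { return n * sizeof(demo_ucs2); }
```

With typedefs like these, the size of a buffer depends only on the number of code units and not on the width of the platform's native int.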

vstinner · 2009-05-04T00:01:47Z

I like the idea of using uint16_t and uint32_t. Unicode 5.1 defines
approximately 1.1 million code points (and about 100,000 assigned
characters), so 21 bits are already enough to represent the full
Unicode 5.1 standard (released in April 2008). Using more than 32 bits
for a Unicode character wastes memory.
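
The 21-bit figure can be checked with a small sketch. It assumes only that the highest Unicode code point is U+10FFFF; the macro DEMO_MAX_CODE_POINT and the helper demo_bits_needed are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Highest code point in the Unicode code space (U+10FFFF). */
#define DEMO_MAX_CODE_POINT 0x10FFFFu

/* Number of bits needed to represent value; returns 0 for value == 0. */
static unsigned demo_bits_needed(uint32_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}
```

Calling demo_bits_needed(DEMO_MAX_CODE_POINT) yields 21, so a 32-bit unit leaves 11 spare bits per character.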

mdickinson · 2009-05-04T08:32:38Z

> We'd have to add an AC_TYPE_INT32_T and AC_TYPE_INT16_T check to
> configure:

AC_TYPE_INT32_T should already be there. See also the code in
pyport.h that #defines HAVE_INT32_T and PY_INT32_T, and the
corresponding bits of PC/pyconfig.h.

It was recently pointed out that there are some issues with these
definitions when using a C++ compiler instead of a C compiler, since
then INT32_MAX is undefined. (See the footnote to 7.18.2, para.1 of
C99.)
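
One possible way around the C++ issue, sketched here as an assumption rather than as what pyport.h does: the C99 footnote says C++ implementations should define the limit macros only when __STDC_LIMIT_MACROS is defined before <stdint.h> is included, so a header can define it first. DEMO_HAVE_INT32_LIMITS and demo_have_int32_limits are hypothetical names.

```c
#include <assert.h>

/* Request the C99 limit macros when compiled as C++ (harmless in C). */
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS 1
#endif
#include <stdint.h>

/* Record whether INT32_MAX is visible after including <stdint.h>. */
#ifdef INT32_MAX
#define DEMO_HAVE_INT32_LIMITS 1
#else
#define DEMO_HAVE_INT32_LIMITS 0
#endif

/* Returns 1 if the 32-bit limit macros are available, 0 otherwise. */
static int demo_have_int32_limits(void)
{
    return DEMO_HAVE_INT32_LIMITS;
}
```

This only helps if no earlier header has already included <stdint.h> without the macro, which is why such a definition would normally sit at the very top of a translation unit.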

BreamoreBoy · 2010-07-18T19:27:22Z

@mdickinson, you've shown some interest; could you run with this?

vstinner · 2010-07-28T22:59:52Z

This issue has no patch.

vstinner · 2011-09-29T19:17:32Z

PEP 393 has been accepted: strings are now stored as Py_UCS1*, Py_UCS2* or Py_UCS4*. The Py_UNICODE type still exists but is deprecated, and is only used in the legacy API. Py_UNICODE is now always the wchar_t type; it cannot be unsigned int anymore. I hope that no platform chose to use a wchar_t larger than 32 bits. Let's close this issue.
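
The one-, two- or four-byte storage described above can be illustrated with a short sketch. Under PEP 393 the unit width depends on the largest code point in the string; the function demo_unit_size below is a hypothetical helper written for this illustration, not part of the CPython API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes per character for a string whose largest code point is
 * max_code_point: 1 up to U+00FF, 2 up to U+FFFF, 4 otherwise.
 */
static size_t demo_unit_size(uint32_t max_code_point)
{
    if (max_code_point < 0x100u) {
        return 1;
    }
    if (max_code_point < 0x10000u) {
        return 2;
    }
    return 4;
}
```

For example, a pure ASCII string would use one byte per character, while a string containing U+1F600 would use four.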

schuppenies mannequin added the topic-unicode and type-bug labels Jun 17, 2008

vstinner closed this as completed Sep 29, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022
