
In some UCS4 builds, sizeof(Py_UNICODE) could end up being more than 4. #47380

Closed
schuppenies mannequin opened this issue Jun 17, 2008 · 6 comments
Labels
topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@schuppenies
Mannequin

schuppenies mannequin commented Jun 17, 2008

BPO 3130
Nosy @malemburg, @loewis, @mdickinson, @pitrou, @vstinner, @ezio-melotti
Dependencies
  • bpo-3098: sys.sizeof test fails with wide unicode
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2011-09-29.19:17:32.116>
    created_at = <Date 2008-06-17.09:39:08.272>
    labels = ['type-bug', 'expert-unicode']
    title = 'In some UCS4 builds, sizeof(Py_UNICODE) could end up being more than 4.'
    updated_at = <Date 2011-09-29.19:17:32.113>
    user = 'https://bugs.python.org/schuppenies'

    bugs.python.org fields:

    activity = <Date 2011-09-29.19:17:32.113>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-09-29.19:17:32.116>
    closer = 'vstinner'
    components = ['Unicode']
    creation = <Date 2008-06-17.09:39:08.272>
    creator = 'schuppenies'
    dependencies = ['3098']
    files = []
    hgrepos = []
    issue_num = 3130
    keywords = ['patch']
    message_count = 6.0
    messages = ['68310', '87088', '87104', '110674', '111868', '144616']
    nosy_count = 9.0
    nosy_names = ['lemburg', 'loewis', 'effbot', 'mark.dickinson', 'pitrou', 'vstinner', 'schuppenies', 'ezio.melotti', 'BreamoreBoy']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue3130'
    versions = ['Python 3.3']


    This issue is a branch from bpo-3098.

    Below is a summary of the discussion:

    Antoine Pitrou wrote:

    It seems that in some UCS4 builds, sizeof(Py_UNICODE) could end
    up being more than 4 if the native int type is itself larger than 32
    bits; although the latter is probably quite rare (64-bit platforms are
    usually either LP64 or LLP64).

    Marc-Andre Lemburg wrote:

    AFAIK, only Crays have this problem, but apart from that: I'd consider
    it a bug if sizeof(Py_UCS4) != 4.

    Antoine Pitrou wrote:

    Perhaps a #error can be added to that effect?
    Something like (untested):

    #if SIZEOF_INT == 4
    typedef unsigned int Py_UCS4;
    #elif SIZEOF_LONG == 4
    typedef unsigned long Py_UCS4;
    #else
    #error Could not find a 4-byte integer type for Py_UCS4, aborting
    #endif
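The proposal above relies on `SIZEOF_INT` and `SIZEOF_LONG` from pyconfig.h. As a self-contained sketch, assuming no pyconfig.h is available, the same selection can be made with the limit macros from `<limits.h>`, which (unlike `sizeof`) the preprocessor can evaluate. `Py_UCS4_sketch` is a hypothetical name, and testing for a maximum of 2**32 - 1 is equivalent to "4 bytes" only on platforms with 8-bit chars and no padding bits.

```c
#include <limits.h>

/* Pick the first standard unsigned type whose maximum value is 2**32 - 1.
   *_MAX comparisons work inside #if, whereas sizeof does not. */
#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int Py_UCS4_sketch;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef unsigned long Py_UCS4_sketch;
#else
#error "Could not find a 4-byte integer type for Py_UCS4, aborting"
#endif
```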

    Marc-Andre Lemburg wrote:

    Sounds good !

    Python should really try to use uint32_t as fallback solution for
    UCS4 where available (and uint16_t for UCS2).

    We'd have to add an AC_TYPE_INT32_T and AC_TYPE_INT16_T check to
    configure:

    http://www.gnu.org/software/autoconf/manual/html_node/Particular-Types.html#Particular-Types

    and could then use

    typedef uint32_t Py_UCS4;

    and

    typedef uint16_t Py_UCS2;
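A minimal sketch of the `uint32_t`/`uint16_t` variant, assuming a C99 `<stdint.h>` that provides the optional exact-width types (the `*_sketch` names are hypothetical). The negative-array-size trick is one common C99-compatible way to turn a failed size check into a compile error; C11 code could use `_Static_assert` instead.

```c
#include <stdint.h>

/* Exact-width types: when provided, they have exactly 32 / 16 bits and
   no padding bits, so no configure-time size probing is needed. */
typedef uint32_t Py_UCS4_sketch;
typedef uint16_t Py_UCS2_sketch;

/* Compile-time checks (assuming CHAR_BIT == 8): a negative array size is
   an error, so the build fails if a type does not have the expected size. */
typedef char ucs4_is_4_bytes[(sizeof(Py_UCS4_sketch) == 4) ? 1 : -1];
typedef char ucs2_is_2_bytes[(sizeof(Py_UCS2_sketch) == 2) ? 1 : -1];
```

If a platform lacks `uint32_t` altogether, the typedef itself fails to compile, which is the same "fail loudly" outcome as the `#error` proposal.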

    Note that the code for supporting UCS2/UCS4 is not really all that
    clean. It was a quick sprint between Martin and Fredrik and appears
    to be only half-done... e.g. there currently is no Py_UCS2.

    @schuppenies schuppenies mannequin added topic-unicode type-bug An unexpected behavior, bug, or error labels Jun 17, 2008
    @vstinner
    Member

    vstinner commented May 4, 2009

    I like the idea of using uint16_t and uint32_t. Unicode 5.1 (released
    in April 2008) defines a codespace of about 1.1 million code points
    (U+0000 to U+10FFFF) and assigns roughly 100,000 characters, so 21
    bits are already enough to cover the full Unicode 5.1 standard. Using
    more than 32 bits for a Unicode character wastes memory.

    @mdickinson
    Member

    Marc-Andre Lemburg wrote:

    We'd have to add an AC_TYPE_INT32_T and AC_TYPE_INT16_T check to
    configure:

    AC_TYPE_INT32_T should already be there. See also the code in
    pyport.h that #defines HAVE_INT32_T and PY_INT32_T, and the
    corresponding bits of PC/pyconfig.h.

    It was recently pointed out that there are some issues with these
    definitions when compiling with a C++ compiler instead of a C
    compiler, since INT32_MAX may then be undefined. (See the footnote to
    7.18.2, para. 1 of C99.)
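For reference, that C99 footnote says C++ implementations should define limit macros such as `INT32_MAX` only when `__STDC_LIMIT_MACROS` is defined before `<stdint.h>` is included. A minimal sketch of the usual workaround, assuming the header has not already been included earlier in the translation unit (otherwise the define comes too late). Defining the macro is harmless when compiling as C; `int32_max_value` is a hypothetical helper.

```c
/* Must appear before the first inclusion of <stdint.h>; C++ compilers
   following the C99 footnote only expose INT32_MAX etc. when it is set. */
#define __STDC_LIMIT_MACROS
#include <stdint.h>

#ifndef INT32_MAX
#error "INT32_MAX is not defined even with __STDC_LIMIT_MACROS"
#endif

/* Hypothetical helper, only to show that the macro is usable. */
static int32_t int32_max_value(void)
{
    return INT32_MAX;
}
```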

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jul 18, 2010

    @mark Dickinson, you've shown some interest; could you run with this?

    @vstinner
    Member

    This issue has no patch.

    @vstinner
    Member

    PEP 393 has been accepted: strings are now stored as Py_UCS1*, Py_UCS2* or Py_UCS4*. The Py_UNICODE type still exists but is deprecated and only used in the legacy API. Py_UNICODE is now always the wchar_t type; it can no longer be unsigned int. I hope that no platform chose a wchar_t larger than 32 bits. Let's close this issue.
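To illustrate the PEP 393 storage model, here is a simplified C sketch (not CPython's actual implementation; `storage_kind` is a hypothetical name) that picks the narrowest character width from the largest code point in a string. CPython additionally treats pure-ASCII strings as a separate case, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the bytes per character (1, 2 or 4) needed to store the code
   points cps[0..n-1], based on the largest one. */
static int storage_kind(const uint32_t *cps, size_t n)
{
    uint32_t maxchar = 0;
    for (size_t i = 0; i < n; i++)
        if (cps[i] > maxchar)
            maxchar = cps[i];
    if (maxchar <= 0xFF)
        return 1;   /* fits in Py_UCS1 */
    if (maxchar <= 0xFFFF)
        return 2;   /* fits in Py_UCS2 */
    return 4;       /* needs Py_UCS4 */
}
```

Because the widest unit is a fixed 32-bit Py_UCS4, the size of wchar_t no longer affects how strings are stored, only the legacy Py_UNICODE API.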

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022