Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135

Closed
vstinner opened this issue Jun 25, 2018 · 14 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes topic-unicode

Comments

@vstinner
Copy link
Member

BPO 33954
Nosy @mdickinson, @vstinner, @ericvsmith, @ezio-melotti, @serhiy-storchaka, @tirkarthi
PRs
  • bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() #10623
  • [3.7] bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) #10717
  • [3.7] bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) #10718
  • [3.6] bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718) #10720
  • bpo-33954: Fix compiler warning in _PyUnicode_FastFill() #10737
  • [3.7] bpo-33954: Rewrite FILL() macro of unicodeobject.c #10738
  • [3.6] bpo-33954: Rewrite FILL() macro of unicodeobject.c #10740
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-11-26.16:06:00.710>
    created_at = <Date 2018-06-25.08:23:46.008>
    labels = ['3.7', '3.8', 'expert-unicode']
    title = "float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator"
    updated_at = <Date 2018-11-27.13:32:20.395>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-11-27.13:32:20.395>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-11-26.16:06:00.710>
    closer = 'vstinner'
    components = ['Unicode']
    creation = <Date 2018-06-25.08:23:46.008>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33954
    keywords = ['patch']
    message_count = 14.0
    messages = ['320403', '320635', '320669', '330157', '330211', '330429', '330431', '330444', '330445', '330494', '330510', '330511', '330517', '330518']
    nosy_count = 6.0
    nosy_names = ['mark.dickinson', 'vstinner', 'eric.smith', 'ezio.melotti', 'serhiy.storchaka', 'xtreak']
    pr_nums = ['10623', '10717', '10718', '10720', '10737', '10738', '10740']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue33954'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @vstinner
    Copy link
    Member Author

    Example:

    vstinner@apu$ ./python
    Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
    [GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.str(2.5)
    '2.5'
    >>> '{:n}'.format(2.5)
    '2.5'
    >>> locale.setlocale(locale.LC_ALL, '')
    'fr_FR.UTF-8'
    >>> locale.str(2.5)
    '2,5'
    >>> '{:n}'.format(2.5)
    python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
    Aborted (core dumped)

    Another example:

    vstinner@apu$ ./python
    Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
    >>> import locale; locale.setlocale(locale.LC_ALL, '')
    'fr_FR.UTF-8'
    >>> (2.5).__format__('n')
    python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
    Aborted (core dumped)

    Result of my system Python 3.6 of Fedora 28:

    vstinner@apu$ python3
    Python 3.6.5 (default, Mar 29 2018, 18:20:46) 
    [GCC 8.0.1 20180317 (Red Hat 8.0.1-0.19)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.str(2.5)
    '2.5'
    >>> '{:n}'.format(2.5)
    '2.5'
    >>> locale.setlocale(locale.LC_ALL, '')
    'fr_FR.UTF-8'
    >>> locale.str(2.5)
    '2,5'
    >>> '{:n}'.format(2.5)
    '°,5'
    >>> '{:n}'.format(3.5)
    '°,5'
    >>> '{:n}'.format(33.5)
    '°\x18,5'
    >>> '{:n}'.format(333.5)
    '°\x186,5'

    @vstinner vstinner added 3.8 only security fixes topic-unicode 3.7 (EOL) end of life labels Jun 25, 2018
    @vstinner
    Copy link
    Member Author

    Aha, the problem occurs when the thousands separator code point is greater than 255.

    On my Fedora 28 (glibc 2.27), it's U+202f:

    vstinner@apu$ ./python
    Python 3.8.0a0 (heads/master-dirty:492572715a, Jun 28 2018, 00:18:54) 
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, '')
    'fr_FR.UTF-8'
    >>> locale.localeconv()['thousands_sep']
    '\u202f'

    The bug is in _PyUnicode_InsertThousandsGrouping(): if thousands_sep kind is different than unicode kind, "data = _PyUnicode_AsKind(unicode, thousands_sep_kind);" is used, but later this memory is released. So the function writes into a temporary buffer which is then released. It doesn't work...

    It seems like I introduced the regression 6 years ago in bpo-13706:

    commit 90f50d4
    Author: Victor Stinner <victor.stinner@haypocalc.com>
    Date: Fri Feb 24 01:44:47 2012 +0100

    Issue bpo-13706: Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF)
    

    @vstinner
    Copy link
    Member Author

    The bug is in _PyUnicode_InsertThousandsGrouping()

    This function should take a _PyUnicodeWriter as input, not a PyUnicodeObject.

    @pppery pppery mannequin changed the title float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator Aug 5, 2018
    @vstinner
    Copy link
    Member Author

    Minimum reproducer, on Fedora 29:

    import locale
    locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
    print(ascii('{:n}'.format(1.5)))

    Current result:

    'H,5'

    Output with PR 10623:

    '1,5'

    @vstinner
    Copy link
    Member Author

    My private test suite for locales:
    https://github.com/vstinner/misc/blob/master/python/test_all_locales.py

    Since each platform uses a different locale database, it's hard to write reliable portable and future-proof tests :-(

    My tests only work on specific Windows, macOS, FreeBSD and Linux versions.

    @vstinner
    Copy link
    Member Author

    New changeset 59423e3 by Victor Stinner in branch 'master':
    bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)
    59423e3

    @vstinner
    Copy link
    Member Author

    New changeset 6f5fa1b by Victor Stinner in branch '3.7':
    bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718)
    6f5fa1b

    @vstinner
    Copy link
    Member Author

    New changeset fc4a44b by Victor Stinner in branch '3.6':
    bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718) (GH-10720)
    fc4a44b

    @vstinner
    Copy link
    Member Author

    Python 3.6, 3.7 and master have been fixed.

    @serhiy-storchaka
    Copy link
    Member

    Objects/unicodeobject.c: In function ‘_PyUnicode_FastFill’:
    Objects/unicodeobject.c:10126:24: warning: passing argument 2 of ‘unicode_fill’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
    unicode_fill(kind, data, fill_char, start, length);
    ^~~~
    Objects/unicodeobject.c:224:1: note: expected ‘void *’ but argument is of type ‘const void *’
    unicode_fill(enum PyUnicode_Kind kind, void *data, Py_UCS4 value,
    ^~~~~~~~~~~~
    In file included from /usr/include/wchar.h:850:0,
    from ./Include/unicodeobject.h:97,
    from ./Include/Python.h:87,
    from ./Modules/getpath.c:3:

    @vstinner
    Copy link
    Member Author

    New changeset 163403a by Victor Stinner in branch 'master':
    bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)
    163403a

    @vstinner
    Copy link
    Member Author

    New changeset 7f9fb0f by Victor Stinner in branch '3.7':
    bpo-33954: Rewrite FILL() macro of unicodeobject.c (GH-10738)
    7f9fb0f

    @vstinner
    Copy link
    Member Author

    New changeset 54fa83e by Victor Stinner in branch '3.6':
    bpo-33954: Rewrite FILL() macro of unicodeobject.c (GH-10740)
    54fa83e

    @vstinner
    Copy link
    Member Author

    Serhiy Storchaka:
    """
    Objects/unicodeobject.c: In function ‘_PyUnicode_FastFill’:
    Objects/unicodeobject.c:10126:24: warning: passing argument 2 of ‘unicode_fill’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
    """

    Yeah, I was aware of the warning. I had a local branch but I started to make many unrelated changes and so I lost track of the issue :-)

    The warning should now be fixed. Thanks for the report.

    Note: it was a real bug which exists since at least Python 3.6 ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes topic-unicode
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants