float.format('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135

vstinner · 2018-06-25T08:23:46Z

BPO	33954
Nosy	@mdickinson, @vstinner, @ericvsmith, @ezio-melotti, @serhiy-storchaka, @tirkarthi
PRs	bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() #10623 [3.7] bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) #10717 [3.7] bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) #10718 [3.6] bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718) #10720 bpo-33954: Fix compiler warning in _PyUnicode_FastFill() #10737 [3.7] bpo-33954: Rewrite FILL() macro of unicodeobject.c #10738 [3.6] bpo-33954: Rewrite FILL() macro of unicodeobject.c #10740

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2018-11-26.16:06:00.710>
created_at = <Date 2018-06-25.08:23:46.008>
labels = ['3.7', '3.8', 'expert-unicode']
title = "float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator"
updated_at = <Date 2018-11-27.13:32:20.395>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2018-11-27.13:32:20.395>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2018-11-26.16:06:00.710>
closer = 'vstinner'
components = ['Unicode']
creation = <Date 2018-06-25.08:23:46.008>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 33954
keywords = ['patch']
message_count = 14.0
messages = ['320403', '320635', '320669', '330157', '330211', '330429', '330431', '330444', '330445', '330494', '330510', '330511', '330517', '330518']
nosy_count = 6.0
nosy_names = ['mark.dickinson', 'vstinner', 'eric.smith', 'ezio.melotti', 'serhiy.storchaka', 'xtreak']
pr_nums = ['10623', '10717', '10718', '10720', '10737', '10738', '10740']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue33954'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

vstinner · 2018-06-25T08:23:46Z

Example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)

Another example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
>>> import locale; locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> (2.5).__format__('n')
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)

Result of my system Python 3.6 of Fedora 28:

vstinner@apu$ python3
Python 3.6.5 (default, Mar 29 2018, 18:20:46) 
[GCC 8.0.1 20180317 (Red Hat 8.0.1-0.19)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
'°,5'
>>> '{:n}'.format(3.5)
'°,5'
>>> '{:n}'.format(33.5)
'°\x18,5'
>>> '{:n}'.format(333.5)
'°\x186,5'

vstinner · 2018-06-27T22:35:49Z

Aha, the problem occurs when the thousands separator code point is greater than 255.

On my Fedora 28 (glibc 2.27), it's U+202f:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:492572715a, Jun 28 2018, 00:18:54) 
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.localeconv()['thousands_sep']
'\u202f'

The bug is in _PyUnicode_InsertThousandsGrouping(): if thousands_sep kind is different than unicode kind, "data = _PyUnicode_AsKind(unicode, thousands_sep_kind);" is used, but later this memory is released. So the function writes into a temporary buffer which is then released. It doesn't work...

It seems like I introduced the regression 6 years ago in bpo-13706:

commit 90f50d4
Author: Victor Stinner <victor.stinner@haypocalc.com>
Date: Fri Feb 24 01:44:47 2012 +0100

Issue bpo-13706: Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF)

vstinner · 2018-06-28T14:45:48Z

The bug is in _PyUnicode_InsertThousandsGrouping()

This function should take a _PyUnicodeWriter as input, not a PyUnicodeObject.

vstinner · 2018-11-20T22:00:43Z

Minimum reproducer, on Fedora 29:

import locale
locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print(ascii('{:n}'.format(1.5)))

Current result:

'H,5'

Output with PR 10623:

'1,5'

vstinner · 2018-11-21T17:19:08Z

My private test suite for locales:
https://github.com/vstinner/misc/blob/master/python/test_all_locales.py

Since each platform uses a different locale database, it's hard to write reliable portable and future-proof tests :-(

My tests only work on specific Windows, macOS, FreeBSD and Linux versions.

vstinner · 2018-11-26T12:40:04Z

New changeset 59423e3 by Victor Stinner in branch 'master':
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)
59423e3

vstinner · 2018-11-26T13:17:07Z

New changeset 6f5fa1b by Victor Stinner in branch '3.7':
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718)
6f5fa1b

vstinner · 2018-11-26T16:03:39Z

New changeset fc4a44b by Victor Stinner in branch '3.6':
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718) (GH-10720)
fc4a44b

vstinner · 2018-11-26T16:06:01Z

Python 3.6, 3.7 and master have been fixed.

serhiy-storchaka · 2018-11-27T07:10:08Z

Objects/unicodeobject.c: In function ‘_PyUnicode_FastFill’:
Objects/unicodeobject.c:10126:24: warning: passing argument 2 of ‘unicode_fill’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
unicode_fill(kind, data, fill_char, start, length);
^~~~
Objects/unicodeobject.c:224:1: note: expected ‘void *’ but argument is of type ‘const void *’
unicode_fill(enum PyUnicode_Kind kind, void *data, Py_UCS4 value,
^~~~~~~~~~~~
In file included from /usr/include/wchar.h:850:0,
from ./Include/unicodeobject.h:97,
from ./Include/Python.h:87,
from ./Modules/getpath.c:3:

vstinner · 2018-11-27T11:41:20Z

New changeset 163403a by Victor Stinner in branch 'master':
bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)
163403a

vstinner · 2018-11-27T11:42:08Z

New changeset 7f9fb0f by Victor Stinner in branch '3.7':
bpo-33954: Rewrite FILL() macro of unicodeobject.c (GH-10738)
7f9fb0f

vstinner · 2018-11-27T13:31:01Z

New changeset 54fa83e by Victor Stinner in branch '3.6':
bpo-33954: Rewrite FILL() macro of unicodeobject.c (GH-10740)
54fa83e

vstinner · 2018-11-27T13:32:20Z

Serhiy Storchaka:
"""
Objects/unicodeobject.c: In function ‘_PyUnicode_FastFill’:
Objects/unicodeobject.c:10126:24: warning: passing argument 2 of ‘unicode_fill’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
"""

Yeah, I was aware of the warning. I had a local branch but I started to make many unrelated changes and so I lost track of the issue :-)

The warning should now be fixed. Thanks for the report.

Note: it was a real bug which exists since at least Python 3.6 ;-)

vstinner added 3.8 only security fixes topic-unicode 3.7 (EOL) end of life labels Jun 25, 2018

pppery mannequin changed the title ~~float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error~~ float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator Aug 5, 2018

vstinner closed this as completed Nov 26, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

float.format('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135

float.format('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135

vstinner commented Jun 25, 2018

vstinner commented Jun 25, 2018

vstinner commented Jun 27, 2018

vstinner commented Jun 28, 2018

vstinner commented Nov 20, 2018

vstinner commented Nov 21, 2018

vstinner commented Nov 26, 2018

vstinner commented Nov 26, 2018

vstinner commented Nov 26, 2018

vstinner commented Nov 26, 2018

serhiy-storchaka commented Nov 27, 2018

vstinner commented Nov 27, 2018

vstinner commented Nov 27, 2018

vstinner commented Nov 27, 2018

vstinner commented Nov 27, 2018

float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135

float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135

Comments

vstinner commented Jun 25, 2018

vstinner commented Jun 25, 2018

vstinner commented Jun 27, 2018

vstinner commented Jun 28, 2018

vstinner commented Nov 20, 2018

vstinner commented Nov 21, 2018

vstinner commented Nov 26, 2018

vstinner commented Nov 26, 2018

vstinner commented Nov 26, 2018

vstinner commented Nov 26, 2018

serhiy-storchaka commented Nov 27, 2018

vstinner commented Nov 27, 2018

vstinner commented Nov 27, 2018

vstinner commented Nov 27, 2018

vstinner commented Nov 27, 2018

float.format('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135

float.format('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator #78135