classification
Title: float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator
Type:
Stage:
Components: Unicode
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.smith, ezio.melotti, mark.dickinson, vstinner, xtreak
Priority: normal
Keywords:

Created on 2018-06-25 08:23 by vstinner, last changed 2018-08-05 17:54 by ppperry.

Messages (3)
msg320403 - Author: STINNER Victor (vstinner) (Python committer) Date: 2018-06-25 08:23
Example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)

Another example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
>>> import locale; locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> (2.5).__format__('n')
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)


Result with my system Python 3.6 on Fedora 28 (no assertion failure, but the output is garbage):

vstinner@apu$ python3
Python 3.6.5 (default, Mar 29 2018, 18:20:46) 
[GCC 8.0.1 20180317 (Red Hat 8.0.1-0.19)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
'°,5'
>>> '{:n}'.format(3.5)
'°,5'
>>> '{:n}'.format(33.5)
'°\x18,5'
>>> '{:n}'.format(333.5)
'°\x186,5'
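
Until this is fixed, one possible workaround is to let the locale module do the grouping in Python code instead of going through the 'n' format type. The sketch below is an assumption based on the sessions above, where locale.str() returns a correct result in the same locale; format_float_localized is a hypothetical helper name, and "%.12g" is the format locale.str() uses internally.

```python
import locale


def format_float_localized(value: float) -> str:
    """Format a float for the current locale without using the 'n' format type.

    Hypothetical helper: locale.format_string() formats with "%" first and
    then inserts the decimal point and thousands separator in Python code.
    """
    return locale.format_string("%.12g", value, grouping=True)


if __name__ == "__main__":
    # The output depends on the active locale, so no expected value is shown.
    locale.setlocale(locale.LC_ALL, "")
    print(format_float_localized(1234567.5))
```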

msg320635 - Author: STINNER Victor (vstinner) (Python committer) Date: 2018-06-27 22:35
Aha, the problem occurs when the thousands separator code point is greater than 255.

On my Fedora 28 (glibc 2.27), it's U+202F (NARROW NO-BREAK SPACE):

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:492572715a, Jun 28 2018, 00:18:54) 
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.localeconv()['thousands_sep']
'\u202f'
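
The 255 threshold lines up with the storage widths of CPython's flexible string representation (PEP 393): 1 byte per character up to U+00FF, 2 bytes up to U+FFFF, 4 bytes beyond that. Here is a minimal sketch for checking which width a given separator needs; unicode_kind is a hypothetical helper written for illustration, not a CPython API.

```python
def unicode_kind(text: str) -> int:
    """Return the bytes per character that PEP 393 storage needs for text."""
    maxchar = max(map(ord, text), default=0)
    if maxchar < 0x100:
        return 1  # Latin-1 range (includes pure ASCII)
    if maxchar < 0x10000:
        return 2  # Basic Multilingual Plane
    return 4      # everything else


print(unicode_kind("2,5"))     # 1
print(unicode_kind("\u202f"))  # 2
```

A formatted number such as "2,5" is a 1-byte string, while the U+202F separator needs 2 bytes, which is the kind mismatch described below.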

The bug is in _PyUnicode_InsertThousandsGrouping(): if the kind of thousands_sep differs from the kind of unicode, "data = _PyUnicode_AsKind(unicode, thousands_sep_kind);" creates a temporary buffer, and the function writes the grouped digits into that buffer. The buffer is released afterwards, so the result is lost and the original string is left inconsistent.

It seems like I introduced the regression 6 years ago in bpo-13706:

commit 90f50d4df9e21093f006427fd7ed11a0d704f792
Author: Victor Stinner <victor.stinner@haypocalc.com>
Date:   Fri Feb 24 01:44:47 2012 +0100

    Issue #13706: Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF)

msg320669 - Author: STINNER Victor (vstinner) (Python committer) Date: 2018-06-28 14:45
> The bug is in _PyUnicode_InsertThousandsGrouping()

This function should take a _PyUnicodeWriter as input, not a PyUnicodeObject.
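
To illustrate the idea in Python rather than C: if grouping appends to an output accumulator that tracks the widest character seen so far, no temporary copy of the target string is needed. This is only a toy model. The Writer class is a stand-in, not the real _PyUnicodeWriter API, and the fixed group size of 3 is a simplification of the grouping list that localeconv() returns.

```python
class Writer:
    """Toy output accumulator (NOT the real _PyUnicodeWriter API)."""

    def __init__(self):
        self.parts = []
        self.maxchar = 0  # widest code point written so far

    def write(self, text):
        if text:
            self.maxchar = max(self.maxchar, max(map(ord, text)))
            self.parts.append(text)

    def finish(self):
        return "".join(self.parts)


def write_grouped(writer, digits, sep, group=3):
    """Append digits to writer, inserting sep between groups of `group` digits."""
    head = len(digits) % group or group  # size of the leftmost group
    writer.write(digits[:head])
    for i in range(head, len(digits), group):
        writer.write(sep)
        writer.write(digits[i:i + group])


w = Writer()
write_grouped(w, "1234567", "\u202f")
w.write(",5")
print(ascii(w.finish()))  # '1\u202f234\u202f567,5'
```

Because the accumulator learns about the separator's width as it is written, the final string can be built with the right kind from the start.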

History
Date                 User            Action  Args
2018-08-05 17:54:55  ppperry         set     title: float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error -> float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator
2018-08-05 12:45:57  xtreak          set     nosy: + xtreak
2018-06-28 14:45:48  vstinner        set     messages: + msg320669
2018-06-28 13:57:20  mark.dickinson  set     nosy: + mark.dickinson, eric.smith
2018-06-27 22:35:49  vstinner        set     messages: + msg320635
2018-06-25 08:32:07  vstinner        set     versions: + Python 3.6, Python 3.7
2018-06-25 08:23:46  vstinner        create