classification
Title: float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator
Type: Stage: resolved
Components: Unicode Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, ezio.melotti, mark.dickinson, serhiy.storchaka, vstinner, xtreak
Priority: normal Keywords: patch

Created on 2018-06-25 08:23 by vstinner, last changed 2018-11-27 13:32 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 10623 merged vstinner, 2018-11-20 21:35
PR 10717 closed miss-islington, 2018-11-26 12:40
PR 10718 merged vstinner, 2018-11-26 12:45
PR 10720 merged vstinner, 2018-11-26 14:16
PR 10737 merged vstinner, 2018-11-27 11:12
PR 10738 merged vstinner, 2018-11-27 11:20
PR 10740 merged vstinner, 2018-11-27 11:23
Messages (14)
msg320403 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-25 08:23
Example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)

Another example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03) 
>>> import locale; locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> (2.5).__format__('n')
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)


Result of my system Python 3.6 of Fedora 28:

vstinner@apu$ python3
Python 3.6.5 (default, Mar 29 2018, 18:20:46) 
[GCC 8.0.1 20180317 (Red Hat 8.0.1-0.19)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
'°,5'
>>> '{:n}'.format(3.5)
'°,5'
>>> '{:n}'.format(33.5)
'°\x18,5'
>>> '{:n}'.format(333.5)
'°\x186,5'
msg320635 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-27 22:35
Aha, the problem occurs when the thousands separator code point is greater than 255.

On my Fedora 28 (glibc 2.27), it's U+202f:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:492572715a, Jun 28 2018, 00:18:54) 
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.localeconv()['thousands_sep']
'\u202f'

The bug is in _PyUnicode_InsertThousandsGrouping(): if thousands_sep kind is different than unicode kind, "data = _PyUnicode_AsKind(unicode, thousands_sep_kind);" is used, but later this memory is released. So the function writes into a temporary buffer which is then released. It doesn't work...

It seems like I introduced the regression 6 years ago in bpo-13706:

commit 90f50d4df9e21093f006427fd7ed11a0d704f792
Author: Victor Stinner <victor.stinner@haypocalc.com>
Date:   Fri Feb 24 01:44:47 2012 +0100

    Issue #13706: Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF)
msg320669 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-28 14:45
> The bug is in _PyUnicode_InsertThousandsGrouping()

This function should take a _PyUnicodeWriter as input, not a PyUnicodeObject.
msg330157 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-20 22:00
Minimum reproducer, on Fedora 29:

import locale
locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print(ascii('{:n}'.format(1.5)))

Current result:

'H,5'

Output with PR 10623:

'1,5'
msg330211 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-21 17:19
My private test suite for locales:
https://github.com/vstinner/misc/blob/master/python/test_all_locales.py

Since each platform uses a different locale database, it's hard to write reliable portable and future-proof tests :-(

My tests only work on specific Windows, macOS, FreeBSD and Linux versions.
msg330429 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-26 12:40
New changeset 59423e3ddd736387cef8f7632c71954c1859bed0 by Victor Stinner in branch 'master':
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)
https://github.com/python/cpython/commit/59423e3ddd736387cef8f7632c71954c1859bed0
msg330431 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-26 13:17
New changeset 6f5fa1b4be735159e964906ab608dc467476e47c by Victor Stinner in branch '3.7':
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718)
https://github.com/python/cpython/commit/6f5fa1b4be735159e964906ab608dc467476e47c
msg330444 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-26 16:03
New changeset fc4a44b0c3a69390eca4680d89c2ae5fe967f882 by Victor Stinner in branch '3.6':
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718) (GH-10720)
https://github.com/python/cpython/commit/fc4a44b0c3a69390eca4680d89c2ae5fe967f882
msg330445 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-26 16:06
Python 3.6, 3.7 and master have been fixed.
msg330494 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-27 07:10
Objects/unicodeobject.c: In function ‘_PyUnicode_FastFill’:
Objects/unicodeobject.c:10126:24: warning: passing argument 2 of ‘unicode_fill’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
     unicode_fill(kind, data, fill_char, start, length);
                        ^~~~
Objects/unicodeobject.c:224:1: note: expected ‘void *’ but argument is of type ‘const void *’
 unicode_fill(enum PyUnicode_Kind kind, void *data, Py_UCS4 value,
 ^~~~~~~~~~~~
In file included from /usr/include/wchar.h:850:0,
                 from ./Include/unicodeobject.h:97,
                 from ./Include/Python.h:87,
                 from ./Modules/getpath.c:3:
msg330510 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 11:41
New changeset 163403a63e9272fcd14707e344122c2e3c5e0244 by Victor Stinner in branch 'master':
bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)
https://github.com/python/cpython/commit/163403a63e9272fcd14707e344122c2e3c5e0244
msg330511 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 11:42
New changeset 7f9fb0f34555641722be5468b7fbd53dd3c0f1e2 by Victor Stinner in branch '3.7':
bpo-33954: Rewrite FILL() macro of unicodeobject.c (GH-10738)
https://github.com/python/cpython/commit/7f9fb0f34555641722be5468b7fbd53dd3c0f1e2
msg330517 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 13:31
New changeset 54fa83e0a3f3b077763cb50705d7a7dbe4a40a4a by Victor Stinner in branch '3.6':
bpo-33954: Rewrite FILL() macro of unicodeobject.c (GH-10740)
https://github.com/python/cpython/commit/54fa83e0a3f3b077763cb50705d7a7dbe4a40a4a
msg330518 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 13:32
Serhiy Storchaka:
"""
Objects/unicodeobject.c: In function ‘_PyUnicode_FastFill’:
Objects/unicodeobject.c:10126:24: warning: passing argument 2 of ‘unicode_fill’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
"""

Yeah, I was aware of the warning. I had a local branch but I started to make many unrelated changes and so I lost track of the issue :-)

The warning should now be fixed. Thanks for the report.

Note: it was a real bug which exists since at least Python 3.6 ;-)
History
Date User Action Args
2019-01-04 23:31:03vstinnerlinkissue35432 superseder
2018-11-27 13:32:20vstinnersetmessages: + msg330518
2018-11-27 13:31:00vstinnersetmessages: + msg330517
2018-11-27 11:42:08vstinnersetmessages: + msg330511
2018-11-27 11:41:20vstinnersetmessages: + msg330510
2018-11-27 11:23:06vstinnersetpull_requests: + pull_request9987
2018-11-27 11:20:29vstinnersetpull_requests: + pull_request9985
2018-11-27 11:12:10vstinnersetpull_requests: + pull_request9984
2018-11-27 07:10:08serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg330494
2018-11-26 16:06:00vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg330445

stage: patch review -> resolved
2018-11-26 16:03:39vstinnersetmessages: + msg330444
2018-11-26 14:16:34vstinnersetpull_requests: + pull_request9968
2018-11-26 13:17:06vstinnersetmessages: + msg330431
2018-11-26 12:45:29vstinnersetpull_requests: + pull_request9967
2018-11-26 12:40:18miss-islingtonsetpull_requests: + pull_request9966
2018-11-26 12:40:03vstinnersetmessages: + msg330429
2018-11-21 17:19:07vstinnersetmessages: + msg330211
2018-11-20 22:00:43vstinnersetmessages: + msg330157
2018-11-20 21:35:49vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request9871
2018-08-05 17:54:55ppperrysettitle: float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error -> float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ascii thousands separator
2018-08-05 12:45:57xtreaksetnosy: + xtreak
2018-06-28 14:45:48vstinnersetmessages: + msg320669
2018-06-28 13:57:20mark.dickinsonsetnosy: + mark.dickinson, eric.smith
2018-06-27 22:35:49vstinnersetmessages: + msg320635
2018-06-25 08:32:07vstinnersetversions: + Python 3.6, Python 3.7
2018-06-25 08:23:46vstinnercreate