float.__format__('n') fails with _PyUnicode_CheckConsistency assertion error for locales with non-ASCII thousands separator #78135
Example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03)
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)

Another example:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03)
>>> import locale; locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> (2.5).__format__('n')
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar < 128' failed.
Aborted (core dumped)

Result with the system Python 3.6 of Fedora 28:

vstinner@apu$ python3
Python 3.6.5 (default, Mar 29 2018, 18:20:46)
[GCC 8.0.1 20180317 (Red Hat 8.0.1-0.19)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.str(2.5)
'2.5'
>>> '{:n}'.format(2.5)
'2.5'
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.str(2.5)
'2,5'
>>> '{:n}'.format(2.5)
'°,5'
>>> '{:n}'.format(3.5)
'°,5'
>>> '{:n}'.format(33.5)
'°\x18,5'
>>> '{:n}'.format(333.5)
'°\x186,5'
Aha, the problem occurs when the thousands separator code point is greater than 255. On my Fedora 28 (glibc 2.27), it's U+202F:

vstinner@apu$ ./python
Python 3.8.0a0 (heads/master-dirty:492572715a, Jun 28 2018, 00:18:54)
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
>>> locale.localeconv()['thousands_sep']
'\u202f'

The bug is in _PyUnicode_InsertThousandsGrouping(): if the thousands_sep kind differs from the unicode kind, "data = _PyUnicode_AsKind(unicode, thousands_sep_kind);" is used, but this memory is released later. So the function writes into a temporary buffer which is then freed, and the result is lost. It seems that I introduced the regression 6 years ago in bpo-13706: commit 90f50d4
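For reference, a minimal pure-Python sketch of the transformation _PyUnicode_InsertThousandsGrouping() performs (assuming a fixed group size of 3; the real C helper handles locale-driven grouping rules and the three PyUnicode storage kinds, which is where the freed temporary buffer bit):

```python
def insert_thousands_grouping(digits: str, sep: str, group: int = 3) -> str:
    """Insert `sep` every `group` digits, counting from the right.

    Pure-Python illustration only: the C helper works in place on
    PyUnicode buffers of varying kinds (1/2/4 bytes per character).
    """
    if group <= 0 or len(digits) <= group:
        return digits
    parts = []
    i = len(digits)
    while i > 0:
        # Take up to `group` digits from the right end.
        parts.append(digits[max(0, i - group):i])
        i -= group
    return sep.join(reversed(parts))

# U+202F NARROW NO-BREAK SPACE, the fr_FR separator in glibc 2.27:
print(ascii(insert_thousands_grouping("1234567", "\u202f")))
# prints '1\u202f234\u202f567'
```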
This function should take a _PyUnicodeWriter as input, not a PyUnicodeObject.
Minimum reproducer, on Fedora 29:

import locale
locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print(ascii('{:n}'.format(1.5)))

Current result: 'H,5'
Output with PR 10623: '1,5'
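Since the reproducer only works where fr_FR.UTF-8 is installed, a guarded variant (format_in_locale is a hypothetical helper name, not part of the report) might look like:

```python
import locale

def format_in_locale(value, loc="fr_FR.UTF-8", spec="{:n}"):
    """Return the locale-aware formatting of `value`, or None when
    the requested locale is not installed on this system."""
    try:
        locale.setlocale(locale.LC_ALL, loc)
    except locale.Error:
        return None
    try:
        return spec.format(value)
    finally:
        # Restore a known locale so later code is unaffected.
        locale.setlocale(locale.LC_ALL, "C")

# On a fixed Python with fr_FR.UTF-8 installed this prints '1,5';
# on the buggy builds it crashed or printed garbage like 'H,5'.
print(ascii(format_in_locale(1.5)))
```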
My private test suite for locales: since each platform uses a different locale database, it's hard to write reliable, portable, and future-proof tests :-( My tests only work on specific Windows, macOS, FreeBSD and Linux versions.
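One hedged pattern for such fragile locale tests (the helper and assertions here are illustrative, not the actual private suite) is to probe the locale first and skip when it is unavailable:

```python
import locale
import unittest

def have_locale(name):
    """Probe whether a locale can actually be activated on this host."""
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)

class FormatNTest(unittest.TestCase):
    @unittest.skipUnless(have_locale("fr_FR.UTF-8"),
                         "fr_FR.UTF-8 not installed")
    def test_french_grouping(self):
        saved = locale.setlocale(locale.LC_ALL)
        self.addCleanup(locale.setlocale, locale.LC_ALL, saved)
        locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
        text = "{:n}".format(1234.5)
        # The separator varies by platform (U+202F on glibc 2.27,
        # U+00A0 or a plain space elsewhere), so strip all of them
        # and only check the digits and the decimal comma.
        for sep in ("\u202f", "\xa0", " "):
            text = text.replace(sep, "")
        self.assertEqual(text, "1234,5")
```

Stripping all plausible separators before comparing keeps the test from depending on one specific locale database version.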
Python 3.6, 3.7 and master have been fixed.
Objects/unicodeobject.c: In function ‘_PyUnicode_FastFill’:
Serhiy Storchaka: Yeah, I was aware of the warning. I had a local branch, but I started to make many unrelated changes and so lost track of the issue :-) The warning should now be fixed. Thanks for the report. Note: it was a real bug which had existed since at least Python 3.6 ;-)