Issue13441
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2011-11-20 23:58 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files
File name | Uploaded
---|---
strxfrm.c | vstinner, 2011-11-21 02:09
localeconv_wchar.c | vstinner, 2011-12-08 01:23
Messages (24)
msg148017 | Author: STINNER Victor (vstinner) | Date: 2011-11-20 23:58
I added a test in _PyUnicode_CheckConsistency() (in debug mode) to ensure that all characters of a string are in the range U+0000-U+10FFFF. Locale tests are now failing on Solaris:
-----------------------------------
[ 28/361] test__locale
Assertion failed: maxchar <= 0x10FFFF, file Objects/unicodeobject.c, line 408
Fatal Python error: Aborted

Current thread 0x00000001:
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 134 in test_float_parsing
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 385 in _executeTestPart
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 440 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 492 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/runner.py", line 168 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1368 in _run_suite
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1402 in run_unittest
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 139 in test_main
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 1203 in runtest_inner
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 906 in runtest
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 709 in main
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/__main__.py", line 13 in <module>
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 73 in _run_code
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 134
-----------------------------------

The problem is that strxfrm() and wcsxfrm() return strange results for the string "a" in an English locale (e.g. en_US.UTF-8). strxfrm(buffer, "a\0", 100) returns 21 (bytes) but only 2 bytes are written ("\x01\x00"). The next bytes are unchanged. wcsxfrm(buffer, L"a\0", 100) returns 7 (characters) and writes all 7 characters, but they are in the range U+1010101..U+1010163, whereas the maximum character of Unicode 6.0 is U+10FFFF (U+101xxxx vs U+10xxxx).

Output of the attached program, strxfrm.c, on OpenSolaris:
-----------------------------------
strxfrm: len=21
0x01 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
wcsxfrm: len=7
U+1010163 U+1010101 U+1010103 U+1010101 U+1010103 U+1010101 U+1010101
-----------------------------------

I don't know if it's normal that wcsxfrm() writes characters in the range U+1010101..U+1010163. Is Python supposed to support characters outside the U+0000-U+10FFFF range? chr(0x10FFFF+1) raises a ValueError.
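The failing assertion can be illustrated with a small standalone analogue of the debug check: find the largest code point in a buffer and compare it with U+10FFFF. This is a minimal sketch, not CPython's _PyUnicode_CheckConsistency(); the names max_code_point() and all_in_unicode_range() are hypothetical, and the buffer is assumed to hold UCS-4 code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest code point defined by Unicode. */
#define MAX_UNICODE 0x10FFFFu

/* Hypothetical helper: return the largest code point in buf[0..len),
   or 0 if len is 0. */
static uint32_t max_code_point(const uint32_t *buf, size_t len)
{
    uint32_t maxchar = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > maxchar)
            maxchar = buf[i];
    }
    return maxchar;
}

/* Hypothetical helper: nonzero if every code point lies in U+0000..U+10FFFF,
   the same condition as the failing assertion "maxchar <= 0x10FFFF". */
static int all_in_unicode_range(const uint32_t *buf, size_t len)
{
    return max_code_point(buf, len) <= MAX_UNICODE;
}
```

Applied to the seven wcsxfrm() values reported above, max_code_point() returns 0x1010163, so all_in_unicode_range() returns 0, which is the situation that triggers the assertion.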
msg148019 | Author: Roundup Robot (python-dev) | Date: 2011-11-21 00:11
New changeset 31baf1363ba1 by Victor Stinner in branch 'default':
Issue #13441: Disable temporary strxfrm() tests on Solaris
http://hg.python.org/cpython/rev/31baf1363ba1
msg148026 | Author: STINNER Victor (vstinner) | Date: 2011-11-21 02:05
> Is Python supposed to support characters outside U+0000-U+10FFFF range?

If not, PyUnicode_FromUnicode(), PyUnicode_FromWideChar() and PyUnicode_FromKindAndData() should be patched to raise an error if a character above U+10FFFF is encountered.
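The patch proposed here (rejecting characters above U+10FFFF when a string is built from wchar_t data) can be sketched as a validating conversion that stops at the first bad character and reports it. This is a hypothetical helper, not the real PyUnicode_FromWideChar(); it assumes a 32-bit wchar_t that holds code points directly (as on Solaris and Linux), and its error text simply copies the ValueError message that appears later in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (sketch): convert len wide characters to UCS-4 code
   points in out.  Assumes a 32-bit wchar_t; UTF-16 surrogates are not
   handled.  Returns 0 on success.  On the first character above U+10FFFF,
   returns -1, stores its index in *bad_index and formats an error message
   into errmsg (bad_index and errmsg may be NULL). */
static int wide_to_ucs4_checked(const wchar_t *w, size_t len, uint32_t *out,
                                size_t *bad_index, char *errmsg,
                                size_t errmsg_size)
{
    assert((w != NULL && out != NULL) || len == 0);
    for (size_t i = 0; i < len; i++) {
        /* A negative (signed) wchar_t converts to a large value and is
           rejected as well. */
        uint32_t ch = (uint32_t)w[i];
        if (ch > 0x10FFFFu) {
            if (bad_index != NULL)
                *bad_index = i;
            if (errmsg != NULL && errmsg_size > 0)
                snprintf(errmsg, errmsg_size,
                         "character U+%lx is not in range [U+0000; U+10ffff]",
                         (unsigned long)ch);
            return -1;
        }
        out[i] = ch;
    }
    return 0;
}
```

Failing early with the offending code point in the message gives a Python-level ValueError instead of a debug-build abort later on.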
msg148027 | Author: STINNER Victor (vstinner) | Date: 2011-11-21 02:09
> strxfrm(buffer, "a\0", 100) returns 21 (bytes) but only 2 bytes
> are written ("\x01\x00"). The next bytes are unchanged.

Oops, that was a bug in my program. I attached the fixed version. The corrected program writes:
----
strxfrm: len=21
0x01 0x01 0x63 0x01 0x01 0x01 0x01 0x01 0x03 0x01 0x01 0x01 0x01 0x01 0x03 0x01 0x01 0x01 0x01 0x01 0x01
wcsxfrm: len=7
U+1010163 U+1010101 U+1010103 U+1010101 U+1010103 U+1010101 U+1010101
----
msg148028 | Author: Roundup Robot (python-dev) | Date: 2011-11-21 02:17
New changeset 78123afb3ea4 by Victor Stinner in branch 'default':
Issue #13441: Disable temporary localeconv() tests on Solaris
http://hg.python.org/cpython/rev/78123afb3ea4
msg148034 | Author: Ezio Melotti (ezio.melotti) | Date: 2011-11-21 12:14
> Is Python supposed to support characters outside U+0000-U+10FFFF range?

No, they should be rejected. Allowing them in some specific places might cause them to leak somewhere else and cause problems, so I'd rather stick with that range and reject all the chars >U+10FFFF everywhere.
msg148038 | Author: Roundup Robot (python-dev) | Date: 2011-11-21 13:32
New changeset a19dad38d4e8 by Victor Stinner in branch 'default':
Issue #13441: _PyUnicode_CheckConsistency() dumps the string if the maximum
http://hg.python.org/cpython/rev/a19dad38d4e8
msg148039 | Author: STINNER Victor (vstinner) | Date: 2011-11-21 13:32
> No, they should be rejected. Allowing them in some specific
> places might cause them to leak somewhere else and cause problems,
> so I'd rather stick with that range and reject all the chars
> >U+10FFFF everywhere.

That's why I added a (debug) check to reject them. I don't think that our UTF-8 encoder supports such characters, for example. All functions assume that the maximum character is U+10FFFF.

If they should be rejected, one solution is to modify strxfrm() to return a list of integers (code points) instead of a string.
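The alternative suggested here, returning the strxfrm() key as a list of integers rather than a string, could look like this on the C side: call wcsxfrm() once to size the key, again to fill it, then copy each wchar_t into an integer array without any range check. This is only a sketch; xfrm_code_points() and compare_keys() are hypothetical names, not part of CPython's locale module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (sketch): run wcsxfrm() on src and return the key as
   a malloc'd integer array instead of a string; *count receives its length.
   Values are copied unchanged, so keys above U+10FFFF survive.  Returns NULL
   if allocation fails.  The caller frees the result. */
static uint32_t *xfrm_code_points(const wchar_t *src, size_t *count)
{
    assert(src != NULL && count != NULL);

    /* With n == 0 the destination may be NULL; the result is the length of
       the transformed string, not counting the terminating L'\0'. */
    size_t len = wcsxfrm(NULL, src, 0);
    wchar_t *buf = malloc((len + 1) * sizeof *buf);
    uint32_t *codes = malloc((len > 0 ? len : 1) * sizeof *codes);
    if (buf == NULL || codes == NULL) {
        free(buf);
        free(codes);
        return NULL;
    }
    wcsxfrm(buf, src, len + 1);
    for (size_t i = 0; i < len; i++)
        codes[i] = (uint32_t)buf[i];
    free(buf);
    *count = len;
    return codes;
}

/* Lexicographic comparison of two keys.  Its sign matches wcscoll() on the
   original strings as long as no wchar_t value in the keys is negative. */
static int compare_keys(const uint32_t *a, size_t na,
                        const uint32_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return (na > nb) - (na < nb);
}
```

Because the keys are compared as integers, values such as U+1010163 never have to pass through a str object.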
msg148046 | Author: Roundup Robot (python-dev) | Date: 2011-11-21 14:41
New changeset d1b3b1d00811 by Victor Stinner in branch 'default':
Another temporary hack to debug the issue #13441
http://hg.python.org/cpython/rev/d1b3b1d00811
msg148048 | Author: STINNER Victor (vstinner) | Date: 2011-11-21 14:44
I dumped some values to try to debug this issue. Last failure, in test__locale.test_lc_numeric_basic() on localeconv():
----------------------------
[ 25/361] test_float
Decode localeconv() decimal_point: {0x2c} (len=1)
Decode localeconv() thousands_sep: {0x2e} (len=1)
Decode localeconv() int_curr_symbol: {} (len=0)
Decode localeconv() currency_symbol: {} (len=0)
Decode localeconv() mon_decimal_point: {} (len=0)
Decode localeconv() mon_thousands_sep: {} (len=0)
Decode localeconv() positive_sign: {} (len=0)
Decode localeconv() negative_sign: {} (len=0)
...
[100/361] test__locale
Decode localeconv() decimal_point: {0x2c} (len=1)
Decode localeconv() thousands_sep: {0xa0} (len=1)
Invalid Unicode string! {U+30000020} (len=1)
Fatal Python error: Aborted
----------------------------
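The debug lines above print each decoded string as a brace-enclosed list of values followed by its length, e.g. "{U+30000020} (len=1)". A hypothetical helper producing the wchar_t form of that layout (lowercase hex, at least four digits, as in "U+005f") might look like this; it is a sketch, not the code from the debugging changesets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (sketch): format len wide characters as
   "{U+0068 U+0075} (len=2)".  Returns the length of the formatted text,
   or -1 if out_size is too small (out may then hold a truncated prefix). */
static int format_wide(const wchar_t *w, size_t len, char *out, size_t out_size)
{
    size_t pos;
    int n;

    assert(out != NULL && (w != NULL || len == 0));
    n = snprintf(out, out_size, "{");
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    pos = (size_t)n;
    for (size_t i = 0; i < len; i++) {
        /* Go through uint32_t so a signed wchar_t prints as 32-bit hex. */
        n = snprintf(out + pos, out_size - pos, "%sU+%04lx",
                     i > 0 ? " " : "", (unsigned long)(uint32_t)w[i]);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
    }
    n = snprintf(out + pos, out_size - pos, "} (len=%zu)", len);
    if (n < 0 || (size_t)n >= out_size - pos)
        return -1;
    return (int)(pos + (size_t)n);
}
```

Formatting the wide string L"hu_HU" this way yields the "{U+0068 U+0075 U+005f U+0048 U+0055} (len=5)" line seen in the more complete log of this issue.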
msg148051 | Author: Roundup Robot (python-dev) | Date: 2011-11-21 15:01
New changeset acda16de630c by Victor Stinner in branch 'default':
Remove temporary hacks for the issue #13441
http://hg.python.org/cpython/rev/acda16de630c
msg148054 | Author: STINNER Victor (vstinner) | Date: 2011-11-21 15:10
Here is a more complete output. localeconv() fails in the hu_HU locale for the "thousands_sep" field: localeconv() returns b'\xa0', which is decoded as the wchar_t* string {U+30000020} (len=1). This is an invalid character:
-----------------------------------
[ 54/361/3] test__locale
Decode wchar_t {U+0043} (len=1)
SET LOCALE "es_UY"
SET LOCALE "fr_FR"
SET LOCALE "fi_FI"
SET LOCALE "es_CO"
SET LOCALE "pt_PT"
SET LOCALE "it_IT"
SET LOCALE "et_EE"
SET LOCALE "es_PY"
SET LOCALE "no_NO"
SET LOCALE "nl_NL"
SET LOCALE "lv_LV"
SET LOCALE "el_GR"
SET LOCALE "be_BY"
SET LOCALE "fr_BE"
SET LOCALE "ro_RO"
SET LOCALE "ru_UA"
SET LOCALE "ru_RU"
SET LOCALE "es_VE"
SET LOCALE "ca_ES"
SET LOCALE "se_NO"
SET LOCALE "es_EC"
SET LOCALE "id_ID"
SET LOCALE "ka_GE"
SET LOCALE "es_CL"
SET LOCALE "hu_HU"
SET LOCALE -> hu_HU
Decode wchar_t {U+0068 U+0075 U+005f U+0048 U+0055} (len=5)
SET LOCALE "hu_HU"
SET LOCALE -> hu_HU
Decode wchar_t {U+0068 U+0075 U+005f U+0048 U+0055} (len=5)
Decode wchar_t {U+002c} (len=1)
Decode localeconv() decimal_point: {0x2c} (len=1)
Decode wchar_t {U+002c} (len=1)
Decode localeconv() thousands_sep: {0xa0} (len=1)
Decode wchar_t {U+30000020} (len=1)
Invalid Unicode string! {U+30000020} (len=1)
Fatal Python error: Aborted

Current thread 0x00000001:
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 105 in test_lc_numeric_basic
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 385 in _executeTestPart
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 440 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 492 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/runner.py", line 168 in run
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1368 in _run_suite
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1402 in run_unittest
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 141 in test_main
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 1203 in runtest_inner
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 906 in runtest
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/regrtest.py", line 709 in main
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/__main__.py", line 13 in <module>
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 73 in _run_code
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 134
-----------------------------------
msg148061 | Author: Roundup Robot (python-dev) | Date: 2011-11-21 17:04
New changeset d6d15fcf5eb6 by Victor Stinner in branch 'default':
Issue #13441: Reenable strxfrm() tests on Solaris
http://hg.python.org/cpython/rev/d6d15fcf5eb6
msg148106 | Author: Roundup Robot (python-dev) | Date: 2011-11-22 02:25
New changeset 6f9af4e3c1db by Victor Stinner in branch 'default':
Issue #13441: Disable temporary the check on the maximum character until
http://hg.python.org/cpython/rev/6f9af4e3c1db
msg149014 | Author: STINNER Victor (vstinner) | Date: 2011-12-08 01:23
localeconv_wchar.c: test program to dump the thousands separator for a locale specified on the command line.

I wrote this program to try to reproduce the hu_HU issue, but I cannot reproduce it on OpenIndiana. I only have UTF-8 locales on my OpenIndiana VM, whereas the issue seems to be specific to an ISO-8859-?? encoding (b'\xA0' is not decodable from UTF-8).
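A rough equivalent of what localeconv_wchar.c is described as doing (select a locale, read a localeconv() separator, decode it with mbstowcs()) is sketched below as a function that takes the locale name as a parameter instead of reading it from the command line. It is not the attached file; decode_lconv_field() and enum lconv_field are hypothetical names.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical selector for two fields of struct lconv (sketch). */
enum lconv_field { FIELD_DECIMAL_POINT, FIELD_THOUSANDS_SEP };

/* Hypothetical helper: switch LC_ALL to locale_name, fetch one localeconv()
   field and decode its bytes with mbstowcs() into out (room for out_len
   wide characters; NUL-terminated only if fewer than out_len were stored).
   Returns the number of wide characters, or (size_t)-1 if the locale is
   unavailable or the bytes are invalid in its encoding.
   Note: setlocale() changes process-wide state. */
static size_t decode_lconv_field(const char *locale_name,
                                 enum lconv_field field,
                                 wchar_t *out, size_t out_len)
{
    const struct lconv *lc;
    const char *bytes;

    assert(locale_name != NULL && out != NULL);
    if (setlocale(LC_ALL, locale_name) == NULL)
        return (size_t)-1;
    lc = localeconv();
    bytes = (field == FIELD_DECIMAL_POINT) ? lc->decimal_point
                                           : lc->thousands_sep;
    return mbstowcs(out, bytes, out_len);
}
```

In the "C" locale the C standard fixes decimal_point to "." and thousands_sep to "", so the two calls return 1 and 0. For the Solaris hu_HU locale, this issue reports that the byte b'\xa0' came back from mbstowcs() as U+30000020.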
msg149022 | Author: STINNER Victor (vstinner) | Date: 2011-12-08 12:16
See also issue #7442.
msg149033 | Author: Stefan Krah (skrah) | Date: 2011-12-08 13:31
localeconv_wchar.c runs fine on Ubuntu with hu_HU and fi_FI.

I tried on OpenSolaris, but I only have UTF-8 locales. The package with ISO locales seems to be SUNWlang-cs-extra, but Oracle took down http://pkg.opensolaris.org/release/ .
msg149058 | Author: Roundup Robot (python-dev) | Date: 2011-12-08 22:41
New changeset 93bab8400ca5 by Victor Stinner in branch 'default':
Issue #13441: Log the locale when localeconv() fails
http://hg.python.org/cpython/rev/93bab8400ca5
msg149059 | Author: STINNER Victor (vstinner) | Date: 2011-12-08 22:42
Changeset 489ea02ed351 changed PyUnicode_FromWideChar() and PyUnicode_FromUnicode() to raise a ValueError if a character is not in the range [U+0000; U+10ffff]. test__locale errors:

======================================================================
ERROR: test_float_parsing (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 134, in test_float_parsing
    if localeconv()['decimal_point'] != '.':
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]

======================================================================
ERROR: test_lc_numeric_basic (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 105, in test_lc_numeric_basic
    li_radixchar = localeconv()[lc]
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]

======================================================================
ERROR: test_lc_numeric_localeconv (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 91, in test_lc_numeric_localeconv
    self.numeric_tester('localeconv', localeconv()[lc], lc, loc)
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]

======================================================================
ERROR: test_lc_numeric_nl_langinfo (test.test__locale._LocaleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test__locale.py", line 79, in test_lc_numeric_nl_langinfo
    self.numeric_tester('nl_langinfo', nl_langinfo(li), lc, loc)
ValueError: character U+30000020 is not in range [U+0000; U+10ffff]

----------------------------------------------------------------------

If the issue is specific to the hu_HU locale, a possible workaround is to skip this locale on Solaris. I changed the test to display the locale on failure.
msg149064 | Author: Roundup Robot (python-dev) | Date: 2011-12-09 00:18
New changeset 87c6be1e393a by Victor Stinner in branch 'default':
Issue #13441: Don't test the hu_HU locale on Solaris to workaround a mbstowcs()
http://hg.python.org/cpython/rev/87c6be1e393a
msg149081 | Author: Roundup Robot (python-dev) | Date: 2011-12-09 09:28
New changeset 2a2d0872d993 by Victor Stinner in branch 'default':
Issue #13441: Skip some locales (e.g. cs_CZ and hu_HU) on Solaris to workaround
http://hg.python.org/cpython/rev/2a2d0872d993
msg149085 | Author: Roundup Robot (python-dev) | Date: 2011-12-09 10:29
New changeset 7ffe3d304487 by Victor Stinner in branch 'default':
Issue #13441: Enable the workaround for Solaris locale bug
http://hg.python.org/cpython/rev/7ffe3d304487
msg149086 | Author: STINNER Victor (vstinner) | Date: 2011-12-09 10:34
I collected the list of locales triggering the mbstowcs() bug, thanks to my previous commit:

* hu_HU (ISO8859-2): character U+30000020
* de_AT (ISO8859-1): character U+30000076
* cs_CZ (ISO8859-2): character U+30000020
* sk_SK (ISO8859-2): character U+30000020
* pl_PL (ISO8859-2): character U+30000020
* fr_CA (ISO8859-1): character U+30000020

Hum, maybe the bug occurs on all locales... I suppose that all "xx_XX" locales use an encoding other than UTF-8, and that the bug is specific to non-UTF-8 encodings. I don't understand why locale.strxfrm('à') doesn't crash anymore.
msg149091 | Author: STINNER Victor (vstinner) | Date: 2011-12-09 13:21
The Solaris buildbot is green, so let's close the issue. I didn't report the bug upstream. Feel free to report it to Oracle!
History
Date | User | Action | Args
---|---|---|---
2022-04-11 14:57:24 | admin | set | github: 57650
2012-10-17 14:35:41 | jcea | set | nosy: + jcea; superseder: test_local.TestEnUSCollection failures on Solaris 10
2011-12-09 13:21:17 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg149091
2011-12-09 10:34:18 | vstinner | set | messages: + msg149086
2011-12-09 10:29:16 | python-dev | set | messages: + msg149085
2011-12-09 09:28:41 | python-dev | set | messages: + msg149081
2011-12-09 00:18:20 | python-dev | set | messages: + msg149064
2011-12-08 22:42:10 | vstinner | set | messages: + msg149059
2011-12-08 22:41:00 | python-dev | set | messages: + msg149058
2011-12-08 13:31:31 | skrah | set | messages: + msg149033
2011-12-08 12:16:17 | vstinner | set | messages: + msg149022
2011-12-08 10:20:00 | skrah | set | nosy: + skrah
2011-12-08 01:23:28 | vstinner | set | files: + localeconv_wchar.c; messages: + msg149014
2011-11-22 02:25:52 | python-dev | set | messages: + msg148106
2011-11-21 17:04:19 | python-dev | set | messages: + msg148061
2011-11-21 15:12:05 | pitrou | set | nosy: - pitrou
2011-11-21 15:10:15 | vstinner | set | messages: + msg148054
2011-11-21 15:01:15 | python-dev | set | messages: + msg148051
2011-11-21 14:44:12 | vstinner | set | messages: + msg148048
2011-11-21 14:41:04 | python-dev | set | messages: + msg148046
2011-11-21 13:32:32 | vstinner | set | messages: + msg148039
2011-11-21 13:32:04 | python-dev | set | messages: + msg148038
2011-11-21 12:14:39 | ezio.melotti | set | messages: + msg148034
2011-11-21 02:17:06 | python-dev | set | messages: + msg148028
2011-11-21 02:09:27 | vstinner | set | files: - strxfrm.c
2011-11-21 02:09:16 | vstinner | set | files: + strxfrm.c; messages: + msg148027
2011-11-21 02:05:09 | vstinner | set | messages: + msg148026
2011-11-21 00:11:53 | python-dev | set | nosy: + python-dev; messages: + msg148019
2011-11-20 23:58:15 | vstinner | create |