Message 244181 - Python tracker

Author	jbeck
Recipients	jbeck
Date	2015-05-27.15:43:31
Message-id	<1432741411.37.0.656292350471.issue24299@psf.upfronthosting.co.za>
In-reply-to

Content

The upgrade from 2.7.9 to 2.7.10 caused test__locale to fail; it had
previously passed.  The difference is that the thousands separator for
the fr_FR locale in known_numerics was changed from '' (i.e., unknown)
to ' ' (i.e., space).  On Solaris, however, the fr_FR locale returns
'\xa0' (i.e., no-break space in ISO8859-1) for LC_NUMERIC's thousands
separator.  I asked our Globalization experts, who replied:

---
The short answer is that CLDR defines the group separator as no-break
space (U+00A0): http://st.unicode.org/cldr-apps/v#/fr/Symbols/
so the Solaris locale fr_FR (=fr_FR.ISO8859-1) is correct.

The long answer is that the situation is confusing: fr_FR.ISO8859-1
defines thousands_sep as no-break space, but fr_FR.UTF-8 defines
thousands_sep as space (U+0020). There is no technical limit, but the
combination of POSIX [1] and the C language [2] limits thousands_sep
to a single-byte character. The no-break space is a single-byte
character in ISO8859-1, but multibyte in UTF-8.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_04

[2] http://en.cppreference.com/w/c/locale/lconv
    &&
    http://en.cppreference.com/w/c/language/character_constant
---
The attached patch fixes the test on Solaris.  It is not clear if this
is the Right Answer for all platforms, but I offer the attached patch
in case it helps anyone else.
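
For anyone who wants to check their own platform, here is a rough
sketch (written for Python 3, not part of the attached patch) that
illustrates the single-byte vs. multibyte point above and reports what
LC_NUMERIC gives for thousands_sep.  The helper names encoded_length
and thousands_sep_for are made up for this example, and the locale
names are assumptions: they may be spelled differently or be missing
entirely on a given system, in which case the helper returns None.

```python
import locale

NO_BREAK_SPACE = "\u00a0"


def encoded_length(char, encoding):
    """Return how many bytes `char` occupies when encoded in `encoding`."""
    return len(char.encode(encoding))


def thousands_sep_for(locale_name):
    """Return LC_NUMERIC's thousands_sep for `locale_name`.

    Returns None if the locale is not available on this system.
    The previous locale setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
        return locale.localeconv()["thousands_sep"]
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    # No-break space: one byte in ISO8859-1, two bytes in UTF-8.
    print(encoded_length(NO_BREAK_SPACE, "iso8859-1"))
    print(encoded_length(NO_BREAK_SPACE, "utf-8"))

    # Locale names are assumptions; results are platform-dependent.
    for name in ("fr_FR.ISO8859-1", "fr_FR.UTF-8"):
        print(name, repr(thousands_sep_for(name)))
```

The separator printed for each locale depends on the platform's locale
data, so no particular value should be expected from the last loop.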

History
Date	User	Action	Args
2015-05-27 15:43:31	jbeck	set	recipients: + jbeck
2015-05-27 15:43:31	jbeck	set	messageid: <1432741411.37.0.656292350471.issue24299@psf.upfronthosting.co.za>
2015-05-27 15:43:31	jbeck	link	issue24299 messages
2015-05-27 15:43:31	jbeck	create