
Author jbeck
Recipients jbeck
Date 2015-05-27.15:43:31
Message-id <1432741411.37.0.656292350471.issue24299@psf.upfronthosting.co.za>
Content
The upgrade from 2.7.9 to 2.7.10 caused test__locale, which had
previously passed, to fail.  The difference is that the
thousands-separator for the fr_FR locale in known_numerics was
changed from '' (i.e., unknown) to ' ' (i.e., space).  But on Solaris,
the fr_FR locale returns '\xa0' (i.e., no-break space in ISO8859-1)
for LC_NUMERIC's thousands-separator.  I inquired with our
Globalization experts, who replied:

---
The short answer is that CLDR defines the group separator as no-break
space (U+00A0): http://st.unicode.org/cldr-apps/v#/fr/Symbols/
so the Solaris locale fr_FR (=fr_FR.ISO8859-1) is correct.

The long answer is that the situation is confusing: fr_FR.ISO8859-1
defines thousands_sep as no-break space, but fr_FR.UTF-8 defines
it as space (U+0020). There is no technical limit, but the
combination of POSIX [1] and the C language [2] limits thousands_sep
to a single-byte character. The no-break space is a single-byte
character in ISO8859-1, but multibyte in UTF-8.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_04

[2] http://en.cppreference.com/w/c/locale/lconv
    and
    http://en.cppreference.com/w/c/language/character_constant
---
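
The single-byte versus multibyte point in the reply can be checked
directly.  The following is a minimal sketch, written for Python 3
rather than the 2.7 series discussed here; the names NBSP and
fits_in_single_byte are hypothetical and introduced only for
illustration.

```python
# U+00A0 NO-BREAK SPACE, the CLDR group separator for French.
NBSP = "\u00a0"

# The same character encoded in the two charsets discussed above.
latin1_bytes = NBSP.encode("iso8859-1")   # one byte: 0xA0
utf8_bytes = NBSP.encode("utf-8")         # two bytes: 0xC2 0xA0


def fits_in_single_byte(sep, encoding):
    """Return True if sep encodes to exactly one byte in encoding."""
    return len(sep.encode(encoding)) == 1


if __name__ == "__main__":
    print(repr(latin1_bytes), fits_in_single_byte(NBSP, "iso8859-1"))
    print(repr(utf8_bytes), fits_in_single_byte(NBSP, "utf-8"))
```

Under these assumptions, a UTF-8 locale that must keep thousands_sep
to one byte cannot use U+00A0, which is consistent with fr_FR.UTF-8
falling back to a plain space.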
The attached patch fixes the test on Solaris.  It is not clear if this
is the Right Answer for all platforms, but I offer it in case it
helps anyone else.
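
For readers without the patch at hand, one possible approach is a
comparison that tolerates the separators mentioned above.  This is
only a sketch in Python 3 and is not the contents of the attached
patch; ACCEPTED_FR_SEPS, fr_thousands_sep_ok and check_locale are
hypothetical names, and the set of accepted values is an assumption
drawn from this message.

```python
import locale

# Separators treated as acceptable for fr_FR in this sketch (assumption):
# '' (unknown), ' ' (space) and '\xa0' (no-break space).
ACCEPTED_FR_SEPS = ("", " ", "\xa0")


def fr_thousands_sep_ok(sep):
    """Return True if sep is one of the separators accepted above."""
    return sep in ACCEPTED_FR_SEPS


def check_locale(name="fr_FR"):
    """Check thousands_sep under the named locale.

    Returns None if the locale is not installed on this system.
    """
    old = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
    except locale.Error:
        return None
    try:
        return fr_thousands_sep_ok(locale.localeconv()["thousands_sep"])
    finally:
        # Always restore the previous LC_NUMERIC setting.
        locale.setlocale(locale.LC_NUMERIC, old)
```

The result of check_locale depends on the platform's locale data, so
it may differ between Solaris and other systems.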