Author ned.deily
Recipients Dmitry.Jemerov, loewis, ned.deily, r.david.murray, ronaldoussoren, vstinner
Date 2014-03-31.00:20:47
Message-id <1396225254.16.0.70609958762.issue18378@psf.upfronthosting.co.za>
In-reply-to
Content
I've looked at this a bit, primarily on OS X 10.9 Mavericks, although I expect mostly similar behavior on somewhat older releases of OS X.  On 10.9, the locale environment variables are set by whatever program is used to launch a shell.  I looked at the behavior of the built-in Terminal.app, the third-party iTerm2.app, the MacPorts distribution of xterm, and the built-in sshd.  By default, the latter two do not set any locale environment variables.  Both Terminal.app and iTerm2.app set either LANG or LC_CTYPE based on the user's settings for "Region" and "Preferred Language" in the "System Preferences" -> "Language & Region" control panel.  Three examples:

1. "Region" = "United States", "Preferred Language" = "English":
    -> LANG=en_US.UTF-8

2. "Region" = "Germany", "Preferred Language" = "German":
    -> LANG=de_DE.UTF-8

3. "Region" = "Germany", "Preferred Language" = "English":
    -> LC_CTYPE=UTF-8
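
For reference, here is a rough Python sketch of how these variables combine when a program asks which locale applies to LC_CTYPE. It only illustrates the POSIX precedence rule (LC_ALL first, then the category-specific variable, then LANG); it is not what Terminal.app or the locale module does internally, and the helper name effective_ctype is made up for this example.

```python
import os

def effective_ctype(env=None):
    """Return the locale name that would apply to LC_CTYPE, or None.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values are treated the same as unset ones.
    """
    if env is None:
        env = os.environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return None

# The three cases above, expressed as environments:
print(effective_ctype({"LANG": "en_US.UTF-8"}))   # en_US.UTF-8
print(effective_ctype({"LANG": "de_DE.UTF-8"}))   # de_DE.UTF-8
print(effective_ctype({"LC_CTYPE": "UTF-8"}))     # UTF-8
```

In the third case the only name a program ever sees is the bare "UTF-8", with no language or territory part.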

So it is almost certainly the last case that is under discussion here.  Whether or not that is a bug is not as clear as it might seem at first.  BSD implementations of locale differ from the GNU/Linux version.  Both FreeBSD and OS X define a "UTF-8" locale that has only one locale category defined in it: LC_CTYPE.  It appears to be a fallback locale used when there is no locale matching the region / language combination; in this case, there are no "en_DE*" locales.

$ ls /usr/share/locale/UTF*
LC_CTYPE

Compare with the en_US* locales:

$ ls /usr/share/locale/en_US*
/usr/share/locale/en_US:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.ISO8859-1:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.ISO8859-15:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.US-ASCII:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.UTF-8:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME
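
The same comparison can be scripted. The following is a minimal sketch, assuming the one-directory-per-locale layout shown in the listings above; the function name locale_categories is hypothetical, and on systems that lay out /usr/share/locale differently it may simply find nothing.

```python
import fnmatch
import os

def locale_categories(root, pattern):
    """Map each locale directory under root whose name matches pattern
    to the sorted list of LC_* entries it contains."""
    result = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # fnmatchcase keeps the match case-sensitive on every platform.
        if fnmatch.fnmatchcase(name, pattern) and os.path.isdir(path):
            result[name] = sorted(
                entry for entry in os.listdir(path) if entry.startswith("LC_")
            )
    return result

if __name__ == "__main__":
    root = "/usr/share/locale"  # path taken from the listings above
    if os.path.isdir(root):
        for pattern in ("UTF*", "en_US*"):
            for name, categories in locale_categories(root, pattern).items():
                print(name, categories)
```

On a system laid out like the one above, a locale that lists only LC_CTYPE stands out immediately against the six-category en_US* entries.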

Now as I read the current POSIX standard, there is nothing wrong with this.  AFAICT, the standard places no restriction on the format of locale names; in particular, it does not mandate that they conform to RFC 1766 or its successors.  Further, the standard provides for implementation-specific locales (other than the mandatory "POSIX" aka "C" locale), and some platforms provide tools to create custom locales, e.g. mklocale(1) on FreeBSD and OS X, localedef(1) on GNU/Linux.  So I wonder if the locale module should really be imposing its own restrictions on locale names as it does currently.
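
To make that question concrete, here is a sketch of a parser that never rejects a name. It is a hypothetical helper for discussion, not the locale module's current code and not a proposed patch. It assumes the common "language[_territory][.codeset][@modifier]" convention and passes anything that does not fit it, such as the bare "UTF-8", through unchanged in the first slot.

```python
def split_locale_name(name):
    """Split a locale name into (language, codeset, modifier).

    Hypothetical helper for illustration only. No name is rejected:
    a name without "." (for example "UTF-8" or "C") is returned whole
    as the first element, with codeset and modifier set to None.
    """
    base, _, modifier = name.partition("@")
    language, dot, codeset = base.partition(".")
    return (language or None, codeset if dot else None, modifier or None)

print(split_locale_name("en_US.UTF-8"))  # ('en_US', 'UTF-8', None)
print(split_locale_name("UTF-8"))        # ('UTF-8', None, None)
```

Whether an opaque name like this should be mapped to an encoding, or simply handed back to the caller, is a separate design decision that the sketch leaves open.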

From IEEE Std 1003.1, 2013 Edition:
"The capability to specify additional locales to those provided by an implementation is optional, denoted by the _POSIX2_LOCALEDEF symbol. If the option is not supported, only implementation-supplied locales are available. Such locales shall be documented using the format specified in this section. [...] The locale definition file shall contain one or more locale category source definitions, and shall not contain more than one definition for the same locale category. [...]  In the event that some of the information for a locale category, as specified in this volume of POSIX.1-2008, is missing from the locale source definition, the behavior of that category, if it is referenced, is unspecified."

There is a further complication for OS X.  Apple provides a richer native API for locales, CFLocale (and its Cocoa equivalent, NSLocale).  So some nuances may get lost in the imperfect mapping between CFLocale and the conventional LC_* environment variables and between them and Python.  We could look at trying to support the native APIs as well.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07
https://developer.apple.com/library/mac/documentation/CoreFoundation/Conceptual/CFLocales/CFLocales.html
https://developer.apple.com/library/mac/documentation/CoreFoundation/Reference/CFLocaleRef/Reference/reference.html