Message 233690 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ned.deily
Recipients	ezio.melotti, lemburg, ned.deily, pnugues, r.david.murray, vstinner
Date	2015-01-08.22:26:40
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1420756001.27.0.339681117434.issue23195@psf.upfronthosting.co.za>
In-reply-to

Content

The initial difference appears to be a long-standing BSD (including OS X) versus GNU/Linux platform difference.  See, for example:
http://www.postgresql.org/message-id/18C8A481-33A6-4483-8C24-B8CE70DB7F27@eggerapps.at
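To see the platform difference on a given machine, one option is to sort the same word list under each locale's collation rules and compare the output from a BSD/OS X machine with the output from a GNU/Linux one. Below is a minimal sketch using only the standard `locale` module. The helper name `sort_with_locale` and the sample words are illustrative assumptions, and which locales are available depends on the system. The resulting orders are left for the reader to compare and are not reproduced here.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort `words` using the C library's collation rules for `locale_name`.

    Returns None if the locale is not installed. LC_COLLATE is process-wide,
    so the previous setting is restored before returning.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm maps each string to a key whose ordinary comparison
        # matches strcoll(), i.e. the locale's collation order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    # Hypothetical sample words with and without accents.
    words = ["cote", "côte", "coté", "côté", "cotes", "Zèbre", "zebre"]
    for name in ("C", "en_US.UTF-8", "fr_FR.UTF-8"):
        print(f"{name:12} {sort_with_locale(words, name)}")
```

Restoring the previous LC_COLLATE value matters because `setlocale()` changes state for the whole process, not just for this call.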

The reason there is no difference between the en and fr UTF-8 locales becomes obvious when you look under the covers at the system locale definitions.  This is on FreeBSD 10; OS X 10.10 is the same:

$ cd /usr/share/locale/fr_FR.UTF-8/
$ ls -l
total 8
lrwxr-xr-x  1 root  wheel   28 Jan 16  2014 LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE
lrwxr-xr-x  1 root  wheel   17 Jan 16  2014 LC_CTYPE -> ../UTF-8/LC_CTYPE
lrwxr-xr-x  1 root  wheel   30 Jan 16  2014 LC_MESSAGES -> ../fr_FR.ISO8859-1/LC_MESSAGES
-r--r--r--  1 root  wheel   36 Jan 16  2014 LC_MONETARY
lrwxr-xr-x  1 root  wheel   29 Jan 16  2014 LC_NUMERIC -> ../fr_FR.ISO8859-1/LC_NUMERIC
-r--r--r--  1 root  wheel  364 Jan 16  2014 LC_TIME

For some reason, LC_COLLATE for this UTF-8 locale is a link to the US-ASCII collation table; the same is true for en_US.UTF-8 and de_DE.UTF-8, the only other ones I checked.
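Rather than checking locales one at a time with `ls -l`, a short script could scan every `*.UTF-8` directory and report where its LC_COLLATE points. This is a hypothetical sketch: the function name `collation_sources` is illustrative, and it assumes the BSD/OS X layout shown in the listing above, with one directory per locale under /usr/share/locale. GNU/Linux systems typically store compiled locale data differently, so there it may return little or nothing.

```python
import os
from pathlib import Path


def collation_sources(locale_root="/usr/share/locale", suffix=".UTF-8"):
    """Map each locale directory ending in `suffix` to its LC_COLLATE source.

    Values are the raw symlink text (e.g. "../la_LN.US-ASCII/LC_COLLATE"),
    or None when LC_COLLATE is a regular file. Locales without an
    LC_COLLATE entry are skipped; a missing root gives an empty dict.
    """
    root = Path(locale_root)
    result = {}
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        if not entry.name.endswith(suffix):
            continue
        collate = entry / "LC_COLLATE"
        if collate.is_symlink():
            # os.readlink returns the link text without resolving it.
            result[entry.name] = os.readlink(collate)
        elif collate.exists():
            result[entry.name] = None
    return result


if __name__ == "__main__":
    for name, target in collation_sources().items():
        print(f"{name:20} -> {target or '(own LC_COLLATE file)'}")
```

Reporting the raw link text, rather than the fully resolved path, keeps the output directly comparable with the `ls -l` listing.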

The PostgreSQL discussion and some earlier Python issues suggest using ICU to implement Unicode operations such as collation correctly across all platforms, but that has never been implemented in Python.  Nosing Marc-Andre.

History
Date	User	Action	Args
2015-01-08 22:26:41	ned.deily	set	recipients: + ned.deily, lemburg, vstinner, ezio.melotti, r.david.murray, pnugues
2015-01-08 22:26:41	ned.deily	set	messageid: <1420756001.27.0.339681117434.issue23195@psf.upfronthosting.co.za>
2015-01-08 22:26:41	ned.deily	link	issue23195 messages
2015-01-08 22:26:40	ned.deily	create