classification
Title: Fixed support for Indian locales
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: lemburg, loewis, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-12-19 19:42 by serhiy.storchaka, last changed 2013-12-26 19:24 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
locale_devanagari_3.patch serhiy.storchaka, 2013-12-20 17:07
Messages (10)
msg206636 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-12-19 19:42
The locales alias table contains invalid entries for devanagari modifiers (see issue5815):

    'ks_in@devanagari':                     'ks_IN@devanagari.UTF-8',
    'sd':                                   'sd_IN@devanagari.UTF-8',

Here is a patch which fixes aliases for these locales.
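For readers who want to see which alias targets have this shape, here is a minimal sketch that scans an alias table for values where the @-modifier comes before the encoding. The helper name `misplaced_modifier_aliases` is hypothetical, and `locale.locale_alias` is an implementation detail of the `locale` module, so its contents depend on the Python version in use.

```python
import locale
import re

# Matches targets such as 'sd_IN@devanagari.UTF-8', where the @-modifier
# appears before the encoding instead of after it.
_MODIFIER_BEFORE_ENCODING = re.compile(r'^[^@.]+@[^@.]+\.[^@.]+$')


def misplaced_modifier_aliases(alias_table=None):
    """Return the alias entries whose target puts the modifier first.

    Hypothetical helper, not part of the locale module. When no table
    is given, the module's own locale_alias dict is scanned.
    """
    if alias_table is None:
        alias_table = locale.locale_alias
    return {alias: target
            for alias, target in alias_table.items()
            if _MODIFIER_BEFORE_ENCODING.match(target)}


# The two entries quoted above, plus one well-formed entry for contrast.
sample = {
    'ks_in@devanagari': 'ks_IN@devanagari.UTF-8',
    'sd': 'sd_IN@devanagari.UTF-8',
    'de_de@euro': 'de_DE.ISO8859-15',
}
print(misplaced_modifier_aliases(sample))
```

Calling `misplaced_modifier_aliases()` with no argument scans the table shipped with the running interpreter; on a patched version it may well return nothing.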
msg206680 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2013-12-20 13:26
On 20.12.2013 12:19, Serhiy Storchaka wrote:
> 
> Added file: http://bugs.python.org/file33231/locale_devanagari_2.patch

See my message on issue20034:

There is some recent activity in glibc related to these. Here's a
patch that adds the sd_IN@devanagari locale to glibc:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/localedata/locales/sd_IN@devanagari.diff?cvsroot=glibc&r1=NONE&r2=1.1

So they will start working once platforms adopt the new
glibc versions.

The @-modifier is applied to the locale, not the encoding, because
the locale uses a different script, as opposed to limiting itself
to part of an encoding. This looks reasonable, even though I'm
not sure it conforms to standards.

Since all this is still very much in flux, perhaps we ought
to wait a bit more and let the dust settle ?!
msg206684 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-12-20 14:55
Ubuntu 12.04 supports Kashmiri and Sindhi locales (requires the language-pack-ks-base and language-pack-sd-base packages).

$ locale -a
...
ks_IN
ks_IN@devanagari
ks_IN.utf8
ks_IN.utf8@devanagari
...
sd_IN
sd_IN@devanagari
sd_IN.utf8
sd_IN.utf8@devanagari
...

Current Python doesn't support all of these locales:

$ LC_ALL=ks_IN ./python -c 'import locale; print(locale.getlocale())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 556, in getlocale
    return _parse_localename(localename)
  File "/home/serhiy/py/cpython/Lib/locale.py", line 465, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: ks_IN
$ LC_ALL=ks_IN@devanagari ./python -c 'import locale; print(locale.getlocale())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 556, in getlocale
    return _parse_localename(localename)
  File "/home/serhiy/py/cpython/Lib/locale.py", line 465, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: ks_IN@devanagari
$ LC_ALL=ks_IN.utf8 ./python -c 'import locale; print(locale.getlocale())'
('ks_IN', 'utf8')
$ LC_ALL=ks_IN.utf8@devanagari ./python -c 'import locale; print(locale.getlocale())'
('ks_IN', 'UTF-8')
$ LC_ALL=sd_IN ./python -c 'import locale; print(locale.getlocale())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 556, in getlocale
    return _parse_localename(localename)
  File "/home/serhiy/py/cpython/Lib/locale.py", line 465, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: sd_IN
$ LC_ALL=sd_IN@devanagari ./python -c 'import locale; print(locale.getlocale())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 556, in getlocale
    return _parse_localename(localename)
  File "/home/serhiy/py/cpython/Lib/locale.py", line 465, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: sd_IN@devanagari
$ LC_ALL=sd_IN.utf8 ./python -c 'import locale; print(locale.getlocale())'
('sd_IN', 'utf8')
$ LC_ALL=sd_IN.utf8@devanagari ./python -c 'import locale; print(locale.getlocale())'
('sd_IN', 'utf8')

After applying the patch Python supports all ks_IN and sd_IN locales.
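The same check can be run from inside one interpreter instead of one shell command per locale. The following is only a sketch: `probe_locales` is a hypothetical helper, and the result for each name depends on which locales are installed and on the Python version.

```python
import locale


def probe_locales(names):
    """Try each locale name and record what locale.getlocale() reports.

    Hypothetical helper. For each name the result is either the tuple
    returned by getlocale() or the exception that was raised, either by
    setlocale() (locale not installed) or by getlocale() (name not
    understood by Python's alias table).
    """
    results = {}
    saved = locale.setlocale(locale.LC_ALL)  # remember the current setting
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_ALL, name)
                results[name] = locale.getlocale()
            except (locale.Error, ValueError) as exc:
                results[name] = exc
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # always restore
    return results


names = [
    'ks_IN', 'ks_IN@devanagari', 'ks_IN.utf8', 'ks_IN.utf8@devanagari',
    'sd_IN', 'sd_IN@devanagari', 'sd_IN.utf8', 'sd_IN.utf8@devanagari',
]
for name, outcome in probe_locales(names).items():
    print(name, '->', repr(outcome))
```

A `locale.Error` means the system does not have the locale, while a `ValueError` corresponds to the "unknown locale" failures shown in the tracebacks above.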
msg206685 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2013-12-20 15:06
On 20.12.2013 15:55, Serhiy Storchaka wrote:
> 
> After applying the patch Python supports all ks_IN and sd_IN locales.

Well, yes, but only because you are removing the @-modifiers. I don't
think that's correct, since e.g. the string formatting used for
numbers is different with the modifier.

If you keep the modifiers, but move them to the end of the locale
string you should get the correct behavior, e.g.

-    'sd':                                   'sd_IN@devanagari.UTF-8',
+    'sd':                                   'sd_IN.UTF-8@devanagari',

(modulo perhaps the spelling of "UTF-8")
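Whether the modifier changes number formatting on a particular system can be checked by comparing `locale.localeconv()` with and without it. This is a sketch under assumptions: `numeric_conventions` is a hypothetical helper, the two locale names must be installed for the comparison to say anything, and the helper returns None when `setlocale()` rejects a name.

```python
import locale


def numeric_conventions(name):
    """Return the number-formatting fields of localeconv() for a locale.

    Hypothetical helper. Returns None if the locale cannot be set.
    """
    saved = locale.setlocale(locale.LC_ALL)  # remember the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        conv = locale.localeconv()
        return {key: conv[key]
                for key in ('decimal_point', 'thousands_sep', 'grouping')}
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # always restore


for name in ('sd_IN.UTF-8', 'sd_IN.UTF-8@devanagari'):
    print(name, '->', numeric_conventions(name))
```

If both names are available and the printed dictionaries differ, dropping the modifier would change formatting on that system; if they are identical, this particular check shows no difference there.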
msg206687 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-12-20 15:24
> Well, yes, but only because you are removing the @-modifiers. I don't
> think that's correct, since e.g. the string formatting used for
> numbers is different with the modifier.

All the @-modifiers except euro are applied to the locale, not the encoding. 
And Python removes all the @-modifiers, e.g. latin and cyrillic which specify 
the script.

> If you keep the modifiers, but move them to the end of the locale
> string you should get the correct behavior, e.g.
> 
> -    'sd':                                   'sd_IN@devanagari.UTF-8',
> +    'sd':                                   'sd_IN.UTF-8@devanagari',
> 
> (modulo perhaps the spelling of "UTF-8")

Recent the locale.alias file changes these entities:

sd:						sd_IN.UTF-8
sd_IN.utf8:					sd_IN.UTF-8
sd@devanagari:					sd_IN@devanagari.UTF-8
sd_IN@devanagari:				sd_IN@devanagari.UTF-8
sd_IN@devanagari.utf8:				sd_IN@devanagari.UTF-8
msg206689 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2013-12-20 16:03
On 20.12.2013 16:24, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
>> Well, yes, but only because you are removing the @-modifiers. I don't
>> think that's correct, since e.g. the string formatting used for
>> numbers is different with the modifier.
> 
> All the @-modifiers except euro are applied to the locale, not the encoding. 
> And Python removes all the @-modifiers, e.g. latin and cyrillic which specify 
> the script.

That's not quite correct. The modifiers are used to determine the
correct mapping, so you'll often find them on the left side, but
not necessarily on the right side.

There are several cases where the modifiers are kept around,
since they have implications on the way number or dates are
formatted.

For the Indian "devanagari" locales we have to keep them,
because the locale formatting of number and dates depends
on them.

>> If you keep the modifiers, but move them to the end of the locale
>> string you should get the correct behavior, e.g.
>>
>> -    'sd':                                   'sd_IN@devanagari.UTF-8',
>> +    'sd':                                   'sd_IN.UTF-8@devanagari',
>>
>> (modulo perhaps the spelling of "UTF-8")
> 
> Recent the locale.alias file changes these entities:
> 
> sd:						sd_IN.UTF-8
> sd_IN.utf8:					sd_IN.UTF-8
> sd@devanagari:					sd_IN@devanagari.UTF-8
> sd_IN@devanagari:				sd_IN@devanagari.UTF-8
> sd_IN@devanagari.utf8:				sd_IN@devanagari.UTF-8

I'm not sure I can parse this comment :-)

Looking at issue20034 I think we are saying that the new updated
locale.alias file contains these entries:

sd:						sd_IN.UTF-8
sd_IN.utf8:					sd_IN.UTF-8
sd@devanagari:					sd_IN@devanagari.UTF-8
sd_IN@devanagari:				sd_IN@devanagari.UTF-8
sd_IN@devanagari.utf8:				sd_IN@devanagari.UTF-8

So my example is wrong with the new locale.alias file. Instead,
sd will map directly to sd_IN.UTF-8.

Still, I think the makelocalealias.py script should correct
the non-standard locale names from sd_IN@devanagari.UTF-8
to sd_IN.UTF-8@devanagari in order to match the output
of "locale -a".
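The correction described here amounts to moving the @-modifier behind the encoding. A minimal sketch of that rewrite follows; `move_modifier_to_end` is a hypothetical name and this is not the code from the patch or from the script itself.

```python
def move_modifier_to_end(name):
    """Rewrite 'lang@modifier.ENCODING' as 'lang.ENCODING@modifier'.

    Hypothetical helper. Names without a modifier, or with the modifier
    already at the end, are returned unchanged.
    """
    head, sep, tail = name.partition('@')
    if not sep or '.' not in tail:
        # No modifier at all, or nothing follows the modifier.
        return name
    modifier, _, encoding = tail.partition('.')
    return '%s.%s@%s' % (head, encoding, modifier)


for name in ('sd_IN@devanagari.UTF-8', 'sd_IN.UTF-8@devanagari', 'sd_IN.UTF-8'):
    print(name, '->', move_modifier_to_end(name))
```

Applied to the right-hand sides of the alias entries listed above, this would turn sd_IN@devanagari.UTF-8 into sd_IN.UTF-8@devanagari and leave sd_IN.UTF-8 alone.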
msg206695 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-12-20 17:07
Updated patch to tip. The makelocalealias.py script now corrects
the non-standard locale names.
msg206953 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-12-26 18:59
Could you please make a decision about the last patch, Marc-Andre?
msg206954 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2013-12-26 19:16
On 26.12.2013 19:59, Serhiy Storchaka wrote:
> 
> Could you please make a decision about the last patch, Marc-Andre?

Looks good. Thanks, Serhiy.
msg206955 - Author: Roundup Robot (python-dev) Date: 2013-12-26 19:22
New changeset aad582f717da by Serhiy Storchaka in branch '2.7':
Issue #20027: Fixed locale aliases for devanagari locales.
http://hg.python.org/cpython/rev/aad582f717da

New changeset 7615c009e925 by Serhiy Storchaka in branch '3.3':
Issue #20027: Fixed locale aliases for devanagari locales.
http://hg.python.org/cpython/rev/7615c009e925

New changeset fff3f28733b4 by Serhiy Storchaka in branch 'default':
Issue #20027: Fixed locale aliases for devanagari locales.
http://hg.python.org/cpython/rev/fff3f28733b4
History
Date User Action Args
2013-12-26 19:24:04 serhiy.storchaka set status: open -> closed; assignee: serhiy.storchaka; resolution: fixed; stage: patch review -> resolved
2013-12-26 19:22:16 python-dev set nosy: + python-dev; messages: + msg206955
2013-12-26 19:16:21 lemburg set messages: + msg206954
2013-12-26 18:59:29 serhiy.storchaka set messages: + msg206953
2013-12-21 19:22:08 serhiy.storchaka link issue20046 dependencies
2013-12-20 17:08:17 serhiy.storchaka set files: - locale_devanagari_2.patch
2013-12-20 17:07:58 serhiy.storchaka set files: + locale_devanagari_3.patch; messages: + msg206695
2013-12-20 16:03:47 lemburg set messages: + msg206689
2013-12-20 15:24:10 serhiy.storchaka set messages: + msg206687
2013-12-20 15:06:02 lemburg set messages: + msg206685
2013-12-20 14:55:29 serhiy.storchaka set messages: + msg206684
2013-12-20 13:26:32 lemburg set messages: + msg206680
2013-12-20 11:22:50 serhiy.storchaka set files: - locale_devanagari.patch
2013-12-20 11:19:10 serhiy.storchaka set files: + locale_devanagari_2.patch
2013-12-19 19:49:41 serhiy.storchaka set files: + locale_devanagari.patch
2013-12-19 19:49:09 serhiy.storchaka set files: - locale_aliases.patch
2013-12-19 19:42:08 serhiy.storchaka create