Author lemburg
Recipients Arfrever, lemburg, loewis, serhiy.storchaka
Date 2017-03-07.18:29:01
On 07.03.2017 18:23, Serhiy Storchaka wrote:
> Serhiy Storchaka added the comment:
>> 'cy_GB.ISO8859-1' to 'cy_GB.ISO8859-14'
> Looks like just fixing an error. The default West European ISO8859-1 is changed to the Celtic ISO8859-14 (cy_GB.ISO8859-14). This looks like a better option for Welsh.
>> 'tg_TJ.KOI8-C' to 'tg_TJ.KOI8-T'
> KOI8-C is not supported by Python, but KOI8-T is. I don't know what KOI8-C means; there are several rarely used, incompatible encodings with this name.
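Both points in the quote are easy to check against Python's own codec registry. A minimal sketch (the helpers `encodable` and `codec_known` are made up for illustration) confirming that the Welsh letters ŵ and ŷ exist in ISO8859-14 but not in ISO8859-1, and that Python knows KOI8-T but not KOI8-C:

```python
import codecs

# Welsh uses w and y with a circumflex (U+0175, U+0177). ISO 8859-14
# ("Celtic") contains them, ISO 8859-1 does not.
welsh = '\u0175\u0177'


def encodable(text, encoding):
    """Return True if *text* can be encoded with *encoding*."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def codec_known(name):
    """Return True if Python has a codec registered under *name*."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


print(encodable(welsh, 'iso8859-14'))  # True
print(encodable(welsh, 'iso8859-1'))   # False
print(codec_known('koi8-t'))           # True (codec added in Python 3.5)
print(codec_known('koi8-c'))           # False
```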

While all this may make sense, I'm missing some more reasoning
behind the differences between and glibc.

This change also looks strange:

-    'ka_ge':                                'ka_GE.GEORGIAN-ACADEMY',
+    'ka_ge':                                'ka_GE.GEORGIAN_PS',
     'ka_ge.georgianacademy':                'ka_GE.GEORGIAN-ACADEMY',
     'ka_ge.georgianps':                     'ka_GE.GEORGIAN-PS',
     'ka_ge.georgianrs':                     'ka_GE.GEORGIAN-ACADEMY',

Why is GEORGIAN_PS written with an underscore, whereas the other
mappings use dashes?
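Inconsistencies like this one can be found mechanically. A small sketch, run over a hypothetical excerpt holding only the four entries quoted above (with the proposed 'ka_ge' value), that groups codeset names differing only in '-' versus '_' and reports the ones spelled more than one way:

```python
# Hypothetical excerpt: the four 'ka_ge' entries from the diff above,
# using the newly proposed value for 'ka_ge'.
proposed = {
    'ka_ge': 'ka_GE.GEORGIAN_PS',
    'ka_ge.georgianacademy': 'ka_GE.GEORGIAN-ACADEMY',
    'ka_ge.georgianps': 'ka_GE.GEORGIAN-PS',
    'ka_ge.georgianrs': 'ka_GE.GEORGIAN-ACADEMY',
}


def codeset(value):
    """Return the CODESET part of 'lang_TERRITORY.CODESET[@modifier]'."""
    _, _, rest = value.partition('.')
    return rest.partition('@')[0]


def inconsistent_spellings(table):
    """Group codesets that differ only in '-' vs '_' (and case) and return
    the groups that appear in more than one spelling."""
    groups = {}
    for value in table.values():
        cs = codeset(value)
        key = cs.replace('-', '').replace('_', '').lower()
        groups.setdefault(key, set()).add(cs)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


print(inconsistent_spellings(proposed))
# {'georgianps': ['GEORGIAN-PS', 'GEORGIAN_PS']}
```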

Or this one:

-    'fi_fi':                                'fi_FI.ISO8859-15',
+    'fi_fi':                                'fi_FI.ISO8859-1',

Why would a locale switch away from an encoding that has
the Euro sign to one without it?
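The Euro sign sits at 0xA4 in ISO8859-15 and is missing from ISO8859-1, which a quick encode attempt shows (`supports_euro` is a hypothetical helper):

```python
def supports_euro(encoding):
    """Return True if *encoding* can represent U+20AC EURO SIGN."""
    try:
        '\u20ac'.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ('ISO8859-15', 'ISO8859-1', 'UTF-8'):
    print(enc, supports_euro(enc))
# ISO8859-15 True
# ISO8859-1 False
# UTF-8 True
```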

Or why is this latin variant removed:

-    'nan_tw@latin':                         'nan_TW.UTF-8@latin',
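Whether dropping this entry is harmless depends on the lookup code being able to rebuild it from a plain 'nan_tw' entry. The following is a simplified sketch of such a modifier fallback, assuming a hypothetical table that keeps only 'nan_tw' and a lookup that re-appends the modifier; it is an illustration, not the actual locale module code:

```python
# Hypothetical, simplified alias table: only the base entry is kept.
ALIASES = {
    'nan_tw': 'nan_TW.UTF-8',
}


def lookup_with_modifier(name, table=ALIASES):
    """Resolve 'lang_territory[@modifier]': try the full lowercased name
    first, then fall back to the entry without the modifier and re-append
    the modifier. Returns None if nothing matches."""
    code = name.lower()
    if code in table:
        return table[code]
    base, sep, modifier = code.partition('@')
    if sep and base in table:
        resolved = table[base]
        if '@' not in resolved:
            return resolved + '@' + modifier
    return None


print(lookup_with_modifier('nan_TW@latin'))  # nan_TW.UTF-8@latin
```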

Why should Russians switch back to ISO?

-    'ru_ru':                                'ru_RU.UTF-8',
+    'ru_ru':                                'ru_RU.ISO8859-5',

Or from ISO to KOI?

-    'russian':                              'ru_RU.ISO8859-5',
+    'russian':                              'ru_RU.KOI8-R',
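Which default is picked matters because the two encodings assign different bytes to the same Cyrillic letters, so text written under one default and read back under the other comes out garbled. A short check:

```python
text = '\u0420\u0443\u0441\u0441\u043a\u0438\u0439'  # 'Russian' in Russian

koi8 = text.encode('koi8-r')
iso = text.encode('iso8859-5')

# Same text, different bytes.
print(koi8 == iso)                        # False
# KOI8-R bytes decoded as ISO8859-5 do not round-trip.
print(koi8.decode('iso8859-5') == text)   # False
print(koi8.decode('koi8-r') == text)      # True
```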

The more I look at these changes, the more I believe we
should not simply take everything we find in the files
for granted. They obviously both have bugs.

>> I also don't understand why some "xx.utf-8" locale mappings were removed. I don't think we should remove those, unless they are no longer needed because some other logic implies these mappings.
> The aliases table is a table of exceptions. Removed entries are no longer exceptional.

It's not a table of exceptions; it's a table mapping commonly
used locale settings to ones which the C library understands :-)

But regardless, I checked the code and it is already
smart enough to convert spellings the C library does not accept,
such as "utf8", to "UTF-8", so these entries can indeed be
removed, but only if the locale itself (without the encoding
suffix) is listed as well.
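A minimal sketch of the fallback described above, using a hypothetical two-entry table and a hypothetical codeset-spelling map (the real tables are far larger and the real code handles more cases, such as modifiers): the full name is tried first, then the bare locale with the requested codeset swapped in. The last call shows why a ".utf8" entry can only go if its base locale is listed.

```python
# Hypothetical, simplified alias table.
ALIASES = {
    'de_de': 'de_DE.ISO8859-1',
    'ru_ru': 'ru_RU.UTF-8',
}

# Hypothetical map from codeset spellings (lowercased, '-' and '_'
# removed) to the spelling the C library expects.
CODESETS = {
    'utf8': 'UTF-8',
    'iso88591': 'ISO8859-1',
    'iso885915': 'ISO8859-15',
}


def normalize_sketch(name, table=ALIASES):
    """Resolve 'lang_territory[.codeset]' to a libc-style locale name.

    Try the full lowercased name first; otherwise look up the bare
    lang_territory entry and swap in the requested codeset. Unknown
    codeset spellings pass through as given (lowercased). Returns None
    if neither lookup succeeds."""
    code = name.lower()
    if code in table:
        return table[code]
    lang, _, enc = code.partition('.')
    if not enc or lang not in table:
        return None  # nothing to fall back on
    key = enc.replace('-', '').replace('_', '')
    codeset = CODESETS.get(key, enc)
    return table[lang].partition('.')[0] + '.' + codeset


print(normalize_sketch('de_DE.utf8'))   # de_DE.UTF-8
print(normalize_sketch('ru_RU.UTF-8'))  # ru_RU.UTF-8
print(normalize_sketch('bhb_IN.utf8'))  # None: no 'bhb_in' entry
```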

In some cases, it's probably better to drop the ".utf8"
to have more generic mappings, e.g.

+    'bhb_in.utf8':                          'bhb_IN.UTF-8',


     'de_li.utf8':                           'de_LI.UTF-8',

though I'd expect that mapping to be:

     'de_li':                           'de_LI.ISO8859-1',

as for all other "de" entries.
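Spotting such outliers can be automated. A sketch over a hypothetical excerpt of the "de" entries, which proposes a generic entry for every ".utf8"-only key, taking the codeset most common among the language's other generic entries (the helper name is made up):

```python
from collections import Counter

# Hypothetical excerpt of "de" entries in the alias table.
ALIASES = {
    'de_at': 'de_AT.ISO8859-1',
    'de_be': 'de_BE.ISO8859-1',
    'de_ch': 'de_CH.ISO8859-1',
    'de_de': 'de_DE.ISO8859-1',
    'de_lu': 'de_LU.ISO8859-1',
    'de_li.utf8': 'de_LI.UTF-8',
}


def suggest_generic_entries(table, language):
    """For each 'xx_yy.utf8' key whose bare 'xx_yy' key is missing, propose
    a generic entry using the codeset most common among the language's
    existing generic entries (keys without a '.')."""
    prefix = language + '_'
    generic = [v for k, v in table.items()
               if k.startswith(prefix) and '.' not in k]
    if not generic:
        return {}
    counts = Counter(v.partition('.')[2] for v in generic)
    codeset = counts.most_common(1)[0][0]
    suggestions = {}
    for key, value in table.items():
        base, _, enc = key.partition('.')
        if base.startswith(prefix) and enc == 'utf8' and base not in table:
            suggestions[base] = value.partition('.')[0] + '.' + codeset
    return suggestions


print(suggest_generic_entries(ALIASES, 'de'))
# {'de_li': 'de_LI.ISO8859-1'}
```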
Date                 User     Action  Args
2017-03-07 18:29:01  lemburg  set     recipients: + lemburg, loewis, Arfrever, serhiy.storchaka
2017-03-07 18:29:01  lemburg  link    issue20087 messages
2017-03-07 18:29:01  lemburg  create