classification
Title: lzh_tw is missing in locale.py
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Mismatch between glibc and X11 locale.alias (issue20087)
Assigned To:
Nosy List: benjamin.peterson, cypressyew, inada.naoki, lemburg, serhiy.storchaka
Priority: normal
Keywords:

Created on 2018-02-06 08:17 by cypressyew, last changed 2018-02-15 10:03 by serhiy.storchaka. This issue is now closed.

Messages (8)
msg311717 - (view) Author: Po-Hsu Lin (cypressyew) Date: 2018-02-06 08:17
The lzh_tw locale (Literary Chinese) is not available in Lib/locale.py

This causes an error like the following:

Traceback (most recent call last):
  File "/usr/share/apport/apport-gtk", line 598, in <module>
    app.run_argv()
  File "/usr/lib/python3/dist-packages/apport/ui.py", line 694, in run_argv
    return self.run_crashes()
  File "/usr/lib/python3/dist-packages/apport/ui.py", line 245, in run_crashes
    logind_session[1] > self.report.get_timestamp():
  File "/usr/lib/python3/dist-packages/apport/report.py", line 1684, in get_timestamp
    orig_ctime = locale.getlocale(locale.LC_TIME)
  File "/usr/lib/python3.6/locale.py", line 581, in getlocale
    return _parse_localename(localename)
  File "/usr/lib/python3.6/locale.py", line 490, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: lzh_TW

This can easily be reproduced on Ubuntu 17.10 with English selected as the default language but the time zone set to Taipei. That combination sets the locale to:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=lzh_TW
LC_TIME=lzh_TW
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=lzh_TW
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=lzh_TW
LC_NAME=lzh_TW
LC_ADDRESS=lzh_TW
LC_TELEPHONE=lzh_TW
LC_MEASUREMENT=lzh_TW
LC_IDENTIFICATION=lzh_TW
LC_ALL=

When a Python script then calls into locale.py (for example locale.getlocale()), it fails with the error message above.
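Until the locale name can be parsed, a caller can guard the lookup so that an unparseable name does not abort the program. The following is only a minimal sketch of such a guard: `safe_getlocale` is a hypothetical helper, not part of the locale module or of apport, and falling back to `(None, None)` is an assumption about what the caller can tolerate.

```python
import locale


def safe_getlocale(category=locale.LC_TIME):
    """Return locale.getlocale(category), or (None, None) if parsing fails.

    Hypothetical helper: locale.getlocale() raises ValueError when the
    current locale name (such as lzh_TW in this report) cannot be parsed.
    """
    try:
        return locale.getlocale(category)
    except ValueError:
        # Assumption: the caller can cope with an unknown language/encoding.
        return (None, None)


language, encoding = safe_getlocale(locale.LC_TIME)
```

This only hides the symptom in the calling code; it does not change how locale.py treats the name.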
msg311719 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-02-06 09:10
Maybe this is related to https://bugs.launchpad.net/ubuntu/+source/language-pack-zh-hant/+bug/1699540
msg311720 - (view) Author: Po-Hsu Lin (cypressyew) Date: 2018-02-06 09:24
Yes, this is related to the language setting in Ubuntu, as the locale should be set to zh_TW instead of lzh_TW when the time zone is set to Taipei.

But even so, I think this bug is still valid, as the lzh_TW does not exist in the lib at all.
msg311721 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-02-06 09:39
> But even so, I think this bug is still valid, as the lzh_TW does not exist in the lib at all.

Python doesn't have a locale database, although it does have some aliases.
Python uses libc's locales.

This exception is raised because `_parse_localename` doesn't support
a locale name without an encoding.

In the case of zh_TW, an alias is registered:

    'zh_tw':                                'zh_TW.big5',

But I don't think adding `lzh_tw` to the alias table is a good idea.
There is no "one right alias table".  In the case of zh_tw, you may
want zh_TW.UTF-8 rather than zh_TW.big5, don't you?

So I think supporting locale names without an encoding is the right way.
Maybe we should return None for the encoding in such a situation.
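The difference between the reported behaviour and the proposal can be modelled roughly as follows. This is a simplified sketch, not the code in Lib/locale.py: `ALIASES`, `parse_strict`, and `parse_lenient` are hypothetical names, and the one-entry table stands in for the real alias table.

```python
# Hypothetical one-entry stand-in for the alias table in Lib/locale.py.
ALIASES = {"zh_tw": "zh_TW.big5"}


def _normalize(name):
    """Map a locale name to its aliased form, if an alias is registered."""
    return ALIASES.get(name.lower(), name)


def parse_strict(name):
    """Model of the reported behaviour: reject a name with no encoding."""
    code = _normalize(name)
    if "." in code:
        language, encoding = code.split(".")[:2]
        return language, encoding
    raise ValueError("unknown locale: %s" % name)


def parse_lenient(name):
    """Model of the proposal: report None when the encoding is missing."""
    code = _normalize(name)
    if "." in code:
        language, encoding = code.split(".")[:2]
        return language, encoding
    return code, None
```

In this model, `parse_strict("zh_TW")` succeeds only because the alias supplies an encoding, `parse_strict("lzh_TW")` raises ValueError, and `parse_lenient("lzh_TW")` returns the language with None as the encoding.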
msg311778 - (view) Author: Po-Hsu Lin (cypressyew) Date: 2018-02-07 09:19
Yes, I think you are right.
Returning None sounds like a good approach to me, as we might have zh_TW translated but not lzh_TW.
msg312195 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2018-02-15 09:27
lzh_tw was added in this commit:
https://github.com/bminor/glibc/commit/5057e7ce826bb0be3f476408b2ae364042f2a9bb#diff-3d056472e12e5dc464fa44144719b82f

I don't know why Python should have such a large locale alias table.

I added Serhiy to the nosy list because he is the author of issue20079.

Serhiy, what do you think about making UTF-8 the default charset and
dropping all aliases like "xx_YY" -> "xx_YY.UTF-8"?
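As an illustration of what a UTF-8 default could look like, here is a small sketch. `parse_utf8_default` is a hypothetical function, not an existing or planned API, and it deliberately ignores locale modifiers such as "@euro".

```python
def parse_utf8_default(name, default_encoding="UTF-8"):
    """Split a locale name, assuming UTF-8 when no encoding is given.

    Hypothetical sketch: with a default like this, aliases of the form
    "xx_YY" -> "xx_YY.UTF-8" would no longer be needed in the table.
    """
    language, separator, encoding = name.partition(".")
    if not separator:
        # No "." in the name, so no encoding was specified.
        encoding = default_encoding
    return language, encoding
```

With this sketch, `parse_utf8_default("lzh_TW")` yields the UTF-8 default, while an explicit encoding such as in `"zh_TW.big5"` is kept as given.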
msg312196 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-15 09:56
I'm not sure that a wrong guess is better than an exception.

It looks to me that there is something wrong with the way we use the alias table. It is glibc-centric, but some entries contradict glibc, because the X11 aliases take precedence. There are known issues on OS X. I think another way of determining the locale encoding should be used.
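The message does not say which other way is meant. One possibility, shown here purely as an assumption, is to ask the C library for the encoding of the current locale instead of deriving it from the locale name. `current_locale_encoding` is a hypothetical wrapper around the existing `locale.getpreferredencoding()` function.

```python
import locale


def current_locale_encoding():
    """Return the encoding reported for the current locale.

    Hypothetical wrapper: passing False asks getpreferredencoding() not to
    call setlocale() itself, so the process locale is left unchanged.
    """
    return locale.getpreferredencoding(False)


encoding = current_locale_encoding()
```

This avoids the alias table for the encoding lookup, but it reports a single encoding for the process rather than one per locale category.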
msg312197 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-15 10:03
See also issue20087. It added the lzh_tw locale, but this change was later reverted. Thus I am closing this issue as a duplicate.
History
Date User Action Args
2018-02-15 10:03:40  serhiy.storchaka  set  status: open -> closed; superseder: Mismatch between glibc and X11 locale.alias; messages: + msg312197; resolution: duplicate; stage: resolved
2018-02-15 09:56:35  serhiy.storchaka  set  nosy: + lemburg, benjamin.peterson; messages: + msg312196
2018-02-15 09:27:37  inada.naoki  set  nosy: + serhiy.storchaka; messages: + msg312195
2018-02-07 09:19:32  cypressyew  set  messages: + msg311778
2018-02-06 09:39:42  inada.naoki  set  messages: + msg311721
2018-02-06 09:24:00  cypressyew  set  messages: + msg311720
2018-02-06 09:10:20  inada.naoki  set  nosy: + inada.naoki; messages: + msg311719
2018-02-06 08:17:53  cypressyew  create