This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Python does not support the GEORGIAN-PS charset
Type: crash Stage: resolved
Components: Unicode Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Caolán.McNamara, ezio.melotti, jwilk, lemburg, loewis, serhiy.storchaka, taleinat, vstinner
Priority: normal Keywords:

Created on 2013-10-31 10:52 by Caolán.McNamara, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description
georgian_ps.py vstinner, 2013-10-31 11:24
Messages (7)
msg201800 - Author: Caolán McNamara (Caolán.McNamara) Date: 2013-10-31 10:52
LANG=ka_GE.georgianps /usr/bin/python3
Fatal Python error: Py_Initialize: Unable to get the locale encoding
LookupError: unknown encoding: GEORGIAN-PS
Aborted (core dumped)

But with python-2.7.5 there is no crash:
LANG=ka_GE.georgianps /usr/bin/python2
Python 2.7.5 (default, Oct  8 2013, 12:19:40) 
[GCC 4.8.1 20130603 (Red Hat 4.8.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

(Fedora 19)
msg201801 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-31 10:56
This bug was initially reported in LibreOffice:
https://bugs.freedesktop.org/show_bug.cgi?id=68850
msg201802 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-31 11:24
I found three Georgian encodings:

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/charmaps/GEORGIAN-PS;h=64615ff4344d74ea0c70cfd7a6c6c8019afb884e;hb=HEAD

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/charmaps/GEORGIAN-ACADEMY;h=9dc1bc9e782e9fe6092a00daf1a75274fd6dd738;hb=HEAD

http://tools.ietf.org/html/draft-giasher-geostd8-00

The first one ("GEORGIAN-PS") is probably the most accurate because it is the one included in the GNU libc.

Could you please try to copy attached georgian_ps.py file into /usr/lib64/python3.3/encodings/ (or /usr/lib/python3.3/encodings/ for 32-bit Linux)?

Then try to print Georgian letters using:

   print(bytes(range(0xc0, 0xe6)).decode("GEORGIAN-PS"))

Please also give me your locale encoding:

   import locale; print(locale.getpreferredencoding())

@Caolán: Do you know the GEORGIAN-ACADEMY encoding? It doesn't appear to be used by any glibc locale.

On my Fedora 18, I have three Georgian locales:

* ka_GE.georgianps: locale encoding GEORGIAN-PS
* ka_GE: locale encoding GEORGIAN-PS
* ka_GE.utf8: locale encoding UTF-8

You can work around this issue by switching your locale from ka_GE.georgianps to ka_GE.utf8.
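
For experimenting without touching the stdlib encodings/ directory, a codec can also be registered at runtime. The following is a minimal sketch, not the attached georgian_ps.py: the codec name "georgian-ps-sketch" and all helper names are hypothetical, and the table only assumes that bytes 0xC0 to 0xE5 map to the Georgian letters U+10D0 to U+10F5, with every other byte left as a Latin-1 placeholder. The glibc charmap linked above remains the authoritative table. Because the registration happens in user code, it cannot help with the startup failure itself.

```python
import codecs

# Hypothetical codec name, chosen so it cannot be mistaken for a real
# GEORGIAN-PS implementation.
CODEC_NAME = "georgian_ps_sketch"


def _build_decoding_table():
    # Placeholder: start from Latin-1 so that every byte decodes to something.
    table = [chr(i) for i in range(256)]
    # Assumption: bytes 0xC0..0xE5 are the Georgian letters U+10D0..U+10F5.
    # Check the glibc GEORGIAN-PS charmap before relying on this.
    for offset, byte in enumerate(range(0xC0, 0xE6)):
        table[byte] = chr(0x10D0 + offset)
    return "".join(table)


DECODING_TABLE = _build_decoding_table()
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    # Lookup names arrive lowercased; hyphens may or may not already be
    # converted to underscores depending on the Python version.
    if name.replace("-", "_") == CODEC_NAME:
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)

# Same test as above, against the sketch codec.
print(bytes(range(0xC0, 0xE6)).decode("georgian-ps-sketch"))
```

A charmap codec is enough here because the encoding is single-byte; the codecs generated in the stdlib encodings/ package follow the same charmap_decode / charmap_encode pattern.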
msg404214 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-10-18 19:46
With recent versions of Python (e.g. 3.9) this no longer causes a crash. Python apparently falls back to UTF-8, at least on my system:

$ LANG=ka_GE.georgianps python3.9
Python 3.9.7 (default, Sep  9 2021, 23:20:13) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; print(locale.getpreferredencoding())
UTF-8

I'm marking this as fixed. If someone still has issues with this encoding, please open a new issue with up-to-date information.
msg404250 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-10-18 23:46
Python uses UTF-8 if the locale is not supported:

$ LANG=xxx python3.9 -c "import sys; print(sys.flags.utf8_mode)"
1

On Fedora 34, the locale is still supported, and Python 3.11 still fails:

vstinner@apu$ LANG=ka_GE.georgianps locale
LANG=ka_GE.georgianps
LC_CTYPE="ka_GE.georgianps"
LC_NUMERIC="ka_GE.georgianps"
LC_TIME="ka_GE.georgianps"
LC_COLLATE="ka_GE.georgianps"
LC_MONETARY="ka_GE.georgianps"
LC_MESSAGES="ka_GE.georgianps"
LC_PAPER="ka_GE.georgianps"
LC_NAME="ka_GE.georgianps"
LC_ADDRESS="ka_GE.georgianps"
LC_TELEPHONE="ka_GE.georgianps"
LC_MEASUREMENT="ka_GE.georgianps"
LC_IDENTIFICATION="ka_GE.georgianps"
LC_ALL=

vstinner@apu$ LANG=ka_GE.georgianps python3.11 -c "import sys; print(sys.flags.utf8_mode)"
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = './python'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  stdlib dir = '/home/vstinner/python/main/Lib'
  sys._base_executable = '/home/vstinner/python/main/python'
  sys.base_prefix = '/usr/local'
  sys.base_exec_prefix = '/usr/local'
  sys.platlibdir = 'lib'
  sys.executable = '/home/vstinner/python/main/python'
  sys.prefix = '/usr/local'
  sys.exec_prefix = '/usr/local'
  sys.path = [
    '/usr/local/lib/python311.zip',
    '/home/vstinner/python/main/Lib',
    '/home/vstinner/python/main/build/lib.linux-x86_64-3.11-pydebug',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
LookupError: unknown encoding: GEORGIAN-PS

Current thread 0x00007ff89b81d2c0 (most recent call first):
  <no Python frame>
msg404275 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-19 08:44
Possible solutions (they can be combined):

1. Add support for the GEORGIAN-PS charset and all other encodings used in libc (issue22679). The problem is that it is difficult to get the official information about these encodings.

2. Fall back to UTF-8 or ASCII+surrogateescape when the locale encoding is unsupported. But then typos can go unnoticed.
msg404290 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-10-19 11:20
On 19.10.2021 10:44, Serhiy Storchaka wrote:
> 
> Possible solutions (they can be combined):
> 
> 1. Add support for the GEORGIAN-PS charset and all other encodings used in libc (issue22679). The problem is that it is difficult to get the official information about these encodings.

As with all encodings we add: there has to be a real need to support
them natively in Python (as opposed to installing codecs via PyPI)
and we need a definite source for the encoding, e.g. a standards
document from an official body.

IMO, we should not really add more encodings to the stdlib, but instead
point people to e.g. the iconv package:

https://pypi.org/project/python-iconv/

Perhaps we ought to make it easier for such packages to provide
additional codecs even during the startup phase, e.g. via a special
env var which points Python to a list of codec packages to load
prior to initializing the I/O encoding... not sure whether this is
possible, though.

> 2. Fall back to UTF-8 or ASCII+surrogateescape when the locale encoding is unsupported. But then typos can go unnoticed.

I think this would be a more general solution to such cases, provided
the startup logic issues a visible warning about the fallback.
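
The fallback-with-warning idea can be illustrated at the Python level. This is only a sketch under assumptions: resolve_encoding is a hypothetical helper, and the real change would have to live in the interpreter's startup code, which runs before any Python code like this can execute.

```python
import codecs
import locale
import warnings


def resolve_encoding(name, fallback="utf-8"):
    """Return a codec name Python knows, warning visibly on fallback.

    Hypothetical helper: it mimics the proposed behaviour in user code
    and is not how the interpreter initializes its encodings.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        warnings.warn(
            f"unknown locale encoding {name!r}; falling back to {fallback!r}",
            RuntimeWarning,
            stacklevel=2,
        )
        return codecs.lookup(fallback).name


# Resolve whatever the current locale reports.
print(resolve_encoding(locale.getpreferredencoding(False)))
```

Emitting the warning addresses the concern that a mistyped locale would otherwise be silently replaced by UTF-8.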
History
Date User Action Args
2022-04-11 14:57:52 admin set github: 63658
2021-12-11 19:13:45 iritkatriel set versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.3, Python 3.4
2021-10-19 11:20:36 lemburg set messages: + msg404290
2021-10-19 08:44:49 serhiy.storchaka set messages: + msg404275
2021-10-18 23:46:36 vstinner set status: closed -> open
    resolution: fixed ->
    messages: + msg404250
2021-10-18 19:46:45 taleinat set status: open -> closed
    nosy: + taleinat
    messages: + msg404214
    resolution: fixed
    stage: resolved
2014-10-28 14:29:49 jwilk set nosy: + jwilk
2014-10-20 16:50:51 serhiy.storchaka link issue22679 dependencies
2013-10-31 11:37:25 serhiy.storchaka set nosy: + lemburg, loewis, serhiy.storchaka
2013-10-31 11:25:06 vstinner set title: Fatal Python error: Py_Initialize: Unable to get the locale encoding: GEORGIAN-PS -> Python does not support the GEORGIAN-PS charset
    versions: + Python 3.4
2013-10-31 11:24:45 vstinner set files: + georgian_ps.py
    messages: + msg201802
2013-10-31 10:56:24 vstinner set nosy: + vstinner
    messages: + msg201801
2013-10-31 10:52:59 Caolán.McNamara create