classification
Title: Add support of KOI8-T encoding
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: amaury.forgeotdarc, jwilk, ned.deily, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-10-20 17:49 by serhiy.storchaka, last changed 2015-05-12 22:13 by ned.deily. This issue is now closed.

Files
File name Uploaded Description
encoding_koi8_t.patch serhiy.storchaka, 2014-10-20 17:56
Messages (10)
msg229739 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-20 17:49
KOI8-T is a Tajik encoding partially compatible with KOI8-R. It is the default encoding of the Tajik locale tg_TJ in glibc (but in the X11 locale.alias file it is KOI8-C, see issue20087).

The proposed patch adds support for this encoding. I have not found an official mapping for KOI8-T and have used a table from Apple's implementation of libiconv [1]. It matches the table in Wikipedia [2] and the one in GNU iconv.

[1] http://www.opensource.apple.com/source/libiconv/libiconv-4/libiconv/tests/KOI8-T.TXT
[2] https://ru.wikipedia.org/wiki/КОИ-8 (Russian)
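
For illustration, a minimal sketch of what the new codec allows, assuming a Python build that already includes this patch (3.5 or later). The sample strings and variable names are illustrative choices, not taken from the patch or its tests.

```python
# Sketch only: requires a Python that ships the koi8_t codec (3.5+).
tajik = "\u0422\u043e\u04b7\u0438\u043a\u04e3"    # "Тоҷикӣ", contains Tajik-specific letters
russian = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет", plain Russian letters

# KOI8-T is a single-byte encoding: one byte per character, lossless round trip.
encoded = tajik.encode("koi8_t")
round_trip_ok = encoded.decode("koi8_t") == tajik
one_byte_per_char = len(encoded) == len(tajik)

# Partial compatibility with KOI8-R: basic Russian letters share the same bytes...
same_russian_bytes = russian.encode("koi8_t") == russian.encode("koi8_r")

# ...but the Tajik-specific letters have no KOI8-R representation.
try:
    tajik.encode("koi8_r")
    koi8_r_handles_tajik = True
except UnicodeEncodeError:
    koi8_r_handles_tajik = False
```

On a build with the codec, the round trip should succeed and the Russian text should encode to identical bytes under both codecs, while KOI8-R should reject the Tajik letters.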
msg229740 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-20 18:45
Ah, actually Apple uses (a fork of) GNU libiconv. So I should correct the links.
msg242964 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-12 11:20
Ping.
msg242978 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2015-05-12 15:22
Looks good to me.
msg243006 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-12 20:24
New changeset 78de5d040492 by Serhiy Storchaka in branch 'default':
Issue #22681: Added support for the koi8_t encoding.
https://hg.python.org/cpython/rev/78de5d040492
msg243016 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-05-12 21:21
Lots of "LookupError: unknown encoding: koi8_t" test failures (on OS X 10.10) after this commit, for example, in test_codecs:

======================================================================
ERROR: test_basics (test.test_codecs.BasicUnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/py/dev/3x/source/Lib/test/test_codecs.py", line 1869, in test_basics
    name = codecs.lookup(encoding).name
LookupError: unknown encoding: koi8_t

======================================================================
ERROR: test_decoder_state (test.test_codecs.BasicUnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/py/dev/3x/source/Lib/test/test_codecs.py", line 2024, in test_decoder_state
    self.check_state_handling_decode(encoding, u, u.encode(encoding))
LookupError: unknown encoding: koi8_t

======================================================================
ERROR: test_seek (test.test_codecs.BasicUnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/py/dev/3x/source/Lib/test/test_codecs.py", line 1992, in test_seek
    reader = codecs.getreader(encoding)(io.BytesIO(s.encode(encoding)))
  File "/py/dev/3x/blds/uxd/../../source/Lib/codecs.py", line 998, in getreader
    return lookup(encoding).streamreader
LookupError: unknown encoding: koi8_t

----------------------------------------------------------------------
Ran 211 tests in 5.970s

FAILED (errors=5, skipped=17)
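
The errors above all come from codecs.lookup() failing to resolve the name. A minimal sketch for checking this directly, assuming the codec is provided as a module inside the standard encodings package (the helper name codec_available is hypothetical):

```python
import codecs
import importlib.util


def codec_available(name):
    """Return True if codecs.lookup() can resolve the given encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Assumption: the codec ships as the module encodings.koi8_t.
# If that file is missing from the source tree, the lookup fails as in the log above.
module_present = importlib.util.find_spec("encodings.koi8_t") is not None
```

Calling codec_available("koi8_t") on an affected build should return False until the missing module is added.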
msg243017 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-05-12 21:22
Also the 10.6 (Snow Leopard) buildbot:

http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/3125/steps/test/logs/stdio
msg243020 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-12 21:35
New changeset def3bab79c8f by Serhiy Storchaka in branch 'default':
Added forgotten new files for issues #22681 and #22682.
https://hg.python.org/cpython/rev/def3bab79c8f
msg243021 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-12 21:36
Thanks, Ned. I just forgot to add the new encoding files.
msg243026 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-05-12 22:13
All better, thanks!
History
Date User Action Args
2015-05-12 22:13:32 ned.deily set messages: + msg243026
2015-05-12 21:36:53 serhiy.storchaka set messages: + msg243021
2015-05-12 21:35:20 python-dev set messages: + msg243020
2015-05-12 21:22:47 ned.deily set messages: + msg243017
2015-05-12 21:21:15 ned.deily set nosy: + ned.deily
messages: + msg243016
2015-05-12 20:28:32 serhiy.storchaka set status: open -> closed
assignee: serhiy.storchaka
resolution: fixed
stage: patch review -> resolved
2015-05-12 20:24:40 python-dev set nosy: + python-dev
messages: + msg243006
2015-05-12 15:22:45 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
messages: + msg242978
2015-05-12 11:20:33 serhiy.storchaka set messages: + msg242964
2014-10-28 14:30:20 jwilk set nosy: + jwilk
2014-10-20 18:45:13 serhiy.storchaka set messages: + msg229740
2014-10-20 17:58:59 serhiy.storchaka link issue22679 dependencies
2014-10-20 17:56:21 serhiy.storchaka set files: + encoding_koi8_t.patch
keywords: + patch
2014-10-20 17:49:30 serhiy.storchaka create