classification
Title: Add KOI8-RU as a known encoding
Type:
Stage:
Components: Unicode
Versions: Python 2.7
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: dwayne, haypo, lemburg
Priority: normal
Keywords:

Created on 2009-02-11 07:03 by dwayne, last changed 2009-03-27 09:50 by lemburg. This issue is now closed.

Files
File name    Uploaded
koi8_ru.py   haypo, 2009-02-12 12:09
koi8-ru      haypo, 2009-02-12 12:10
Messages (8)
msg81630 - (view) Author: Dwayne Bailey (dwayne) Date: 2009-02-11 07:03
>>> u = unicode("bob", "KOI8-RU")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: KOI8-RU

This could be broadened to check that we support every encoding that
iconv supports.
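A minimal sketch of how such a check could be scripted, assuming Python 3 (the report above uses the Python 2 `unicode()` call). The helper name `is_known_encoding` is hypothetical; `codecs.lookup` is the standard library call that raises the `LookupError` shown in the traceback.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Probe a few of the names discussed in this issue.
for candidate in ("KOI8-R", "KOI8-U", "KOI8-RU"):
    print(candidate, is_known_encoding(candidate))
```

Feeding the function the list printed by `iconv --list` would give a rough coverage report, although iconv's naming does not always match Python's.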
msg81751 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2009-02-12 12:09
I found this file http://ra.dkuug.dk/i18n/charmaps/KOI8-RU. I 
converted it to a format compatible with gencodec.py. Here is the 
resulting file: copy it into <your python library>/encodings/.
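For readers who cannot copy a file into the library directory, a charmap codec can also be registered at runtime. The following is only a sketch of that mechanism, assuming Python 3. It does NOT contain the KOI8-RU mapping: it reuses the decoding table of the existing `koi8_u` codec as a stand-in, and the codec name `demo_koi8` is hypothetical. With the attached koi8_ru.py, the table would come from that module instead.

```python
import codecs
from encodings import koi8_u

# Hypothetical codec name, used only for this illustration.
DEMO_NAME = "demo_koi8"

# Stand-in table: a 256-character string mapping each byte to a code point.
# A real KOI8-RU codec would take this from the gencodec.py output instead.
decoding_table = koi8_u.decoding_table
encoding_table = codecs.charmap_build(decoding_table)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, encoding_table)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, decoding_table)


def _search(name):
    # The name arrives lower-cased; depending on the Python version,
    # hyphens may or may not already be converted to underscores.
    if name.replace("-", "_") == DEMO_NAME:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=DEMO_NAME)
    return None


codecs.register(_search)

print("привет".encode("demo-koi8"))
```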
msg81753 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2009-02-12 12:10
Attached the file used as gencodec.py input: koi8-ru.

dwayne: Does the result look correct?
msg81754 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2009-02-12 12:14
My version of iconv (2.6.1) doesn't support KOI8-RU, only:
 - CSKOI8R
 - KOI-7
 - KOI-8
 - KOI8-R: supported by python trunk
 - KOI8-T
 - KOI8-U: supported by python trunk
 - KOI8
 - KOI8R
 - KOI8U

Note: python trunk supports neither KOI8R nor KOI8U (which are just
aliases for KOI8-R and KOI8-U).
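As an illustration of what adding such aliases involves, here is a sketch that assumes Python 3 and relies on a CPython implementation detail: the `encodings.aliases.aliases` dictionary is consulted at lookup time, so entries added before the first lookup of a name take effect. Whether these aliases already exist in a given Python version is not assumed, hence `setdefault`.

```python
import codecs
from encodings import aliases

# Map the hyphen-less spellings onto the existing codec modules.
# setdefault leaves any alias that is already defined untouched.
aliases.aliases.setdefault("koi8r", "koi8_r")
aliases.aliases.setdefault("koi8u", "koi8_u")

# Both spellings should now resolve to the same codec.
print(codecs.lookup("KOI8R").name)
print(codecs.lookup("KOI8U").name)
```

Failed lookups are cached by the `encodings` package, so the aliases have to be added before the unaliased name is first requested in the process.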
msg81756 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-02-12 12:25
Could you please clarify the official status of this encoding. According
to this page:

http://www.terena.org/activities/multiling/koi8-ru/index.html

it is currently only a proposed draft which hasn't been updated since 1997.
msg81913 - (view) Author: Dwayne Bailey (dwayne) Date: 2009-02-13 12:05
@haypo: The encoding works and doesn't throw an error. My guess is that
the aliases should be updated to cover the variant names of -R and -U.

I also found that glibc points to this reference,
http://cad.ntu-kpi.kiev.ua/multiling/koi8-ru/, which seems to have
disappeared.  I couldn't find a way to validate that the glibc code
points were the same as the ones you have.

My iconv --version is 2.9

Apart from that, I can't vouch for its correctness.

@lemburg: I can't comment on the status of the standard.  I would assume
that like most 8 bit encodings that these are falling away and being
replaced by Unicode.

I'm interested in these issues because our Python tools are used to
recover translations from installed .mo files on Linux.  I look for
encoding issues on a semi-regular basis and fix any that cause
problems. This is the first encoding I've found that is missing in Python.

For us it's useful in that we present a path for people to move from an
old encoding into Unicode if needed.
msg84250 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2009-03-27 01:44
> @lemburg: I can't comment on the status of the standard.
> I would assume that like most 8 bit encodings that these 
> are falling away and being replaced by Unicode.

Can I close this issue? Or do we have enough KOI8-RU users to include 
this charset in Python?

I think that iconv is enough for people who need to convert their old 
files to UTF-8 (or anything else).
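For comparison, the same kind of conversion can be sketched in Python for any encoding that Python does support. This assumes Python 3; the function name `transcode` and its parameters are hypothetical. It streams the file in chunks so large files do not need to fit in memory.

```python
def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Re-encode a text file, similar in spirit to `iconv -f SRC -t DST`."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # Read decoded text in chunks until an empty string signals EOF.
        for chunk in iter(lambda: src.read(64 * 1024), ""):
            dst.write(chunk)
```

A call such as `transcode("old.txt", "new.txt", "koi8-r")` would convert a KOI8-R file to UTF-8; a KOI8-RU file would additionally need the codec attached to this issue to be installed.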
msg84256 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-03-27 09:50
Victor, I found this reference which has some background information
regarding koi8-ru and other Cyrillic encodings:
http://segfault.kiev.ua/cyrillic-encodings/

"This charset wasn't supported by Ukrainian Internet community due to
political reasons; KOI8-U was invented as opposition to KOI8-RU."

Provided that resource is correct, it also appears that its inventor,
Yuri Demchenko, has now switched to KOI8-U as well:
http://staff.science.uva.nl/~demch/

So I guess, we can close this request and leave the codec attached to
the ticket for interested parties to download and install if they need it.
History
Date                 User     Action  Args
2009-03-27 09:50:21  lemburg  set     status: open -> closed
                                      messages: + msg84256
2009-03-27 01:44:44  haypo    set     messages: + msg84250
2009-02-13 12:05:44  dwayne   set     messages: + msg81913
2009-02-12 12:25:49  lemburg  set     nosy: + lemburg
                                      messages: + msg81756
                                      versions: + Python 2.7, - Python 2.6
2009-02-12 12:14:24  haypo    set     messages: + msg81754
2009-02-12 12:10:30  haypo    set     files: + koi8-ru
                                      messages: + msg81753
2009-02-12 12:09:25  haypo    set     files: + koi8_ru.py
                                      nosy: + haypo
                                      messages: + msg81751
2009-02-11 07:03:37  dwayne   create