classification
Title: Thai encoding alias for 'cp874'
Type: enhancement Stage:
Components: Unicode Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: era, kamthorn, lemburg, loewis
Priority: normal Keywords:

Created on 2003-12-05 03:16 by kamthorn, last changed 2017-08-29 08:48 by era. This issue is now closed.

Files
File name Uploaded Description
python-cvs-thai-encoding-alias-2.diff kamthorn, 2003-12-05 16:15 patch to add Thai encoding aliases
Messages (7)
msg54076 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 03:16
I suggest adding 'tis_620', 'ibm874', 'iso_8859_11',
'iso8859_11' and 'windows-874' as aliases for 'cp874' in
encodings/aliases.py.
msg54077 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 05:00
user_id=143334

Sorry, that should be 'windows_874', not 'windows-874'.
msg54078 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 08:46
user_id=143334

This patch adds Thai encoding aliases to aliases.py.
msg54079 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2003-12-05 09:57
user_id=38388

Thanks for the suggestion. 

Before we can add the aliases, however, we
need a reference which clearly says that these
codec names all refer to the same encoding as cp874, especially
since you seem to have a typo in tis_620: the only reference
I could find mentioned tis_602.
msg54080 - Author: Kamthorn Krairaksa (kamthorn) Date: 2003-12-05 16:11
user_id=143334

Please refer to the page
http://linux.thai.net/thep/mlit/countries.html

There are only two Thai character encoding standards:
'tis-620' and 'iso-8859-11'. The former is maintained by the Thai
Industrial Standards Institute (http://www.tisi.go.th/). You
can see details at
http://www.inet.co.th/cyberclub/trin/thairef/tis620-iso10646.html

The latter is maintained by ISO
(http://anubis.dkuug.dk/JTC1/SC2/open/02n3333.pdf).

Both of them refer to code page 874.

There are also some non-standard Thai character encodings that
refer to code page 874: 'windows-874', 'ibm874',
'x-mac-thai' and 'tactis' (x-mac-thai and tactis are new to this list).

The name of the Thai character encoding is tis-620, not tis-602
as you mentioned.

Summary:
- 'tis620', 'tis_620', 'ibm874', 'iso_8859_11',
'iso8859_11', 'windows-874', 'x-mac-thai' and 'tactis' should
be aliases for 'cp874'.

Additionally, I found 'tis260' aliased to 'tactis' in
aliases.py. I am sure 'tis260' is a typo, and the 'tactis' codec is missing,
so I suggest removing it.

(Please see my updated patch.)
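A quick way to check the dangling 'tis260' alias on a current CPython, offered as a sketch: whether the name is still listed in encodings.aliases.aliases depends on the Python version, but assuming no 'tactis' codec module ships with Python, looking the name up should fail either way.

```python
import codecs
import encodings.aliases

# May print 'tactis' or None, depending on the Python version.
print("tis260 ->", encodings.aliases.aliases.get("tis260"))

try:
    codecs.lookup("tis260")
    resolved = True
except LookupError:
    # Assumption: there is no encodings.tactis module, so the alias cannot resolve.
    resolved = False
print("resolves:", resolved)
```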
msg54081 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2004-01-18 09:38
user_id=21627

The code sets should not be aliased. In CP 874, \x80 is EURO SIGN.
In TIS 620, it is (apparently) unassigned (same for all
other characters in the range \x80..\xa0). IOW, CP 874 is a
superset of TIS 620.

Closing the request as rejected.
msg300976 - Author: (era) Date: 2017-08-29 08:48
Closing the entire enhancement request just because one detail is off seems insane.

Anyway, until the day in the distant future when Python can support encoding names in common circulation, http://stackoverflow.com/a/1064191/874188 offers a crude workaround.


import encodings

# Add 'windows_874' as an alias for the cp874 codec unless Python already has it.
if 'windows_874' not in encodings.aliases.aliases:
    encodings.aliases.aliases['windows_874'] = 'cp874'

This is tricky in a number of ways; in practice, this snippet needs to be at the very start of your source file. Also, the underscore is correct even for email encoding names like =?windows-874?Q?hello=3F?=, which use a dash (the dash gets remapped to an underscore internally when the encoding alias is looked up).
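A minimal sketch of the same workaround, wrapped in a hypothetical register_alias helper and exercised on the email example above. It relies on the dash-to-underscore normalization just described. As far as I can tell, CPython's encodings search function also caches failed lookups, which would explain why the alias has to be registered before anything looks the name up; treat that as an assumption and verify it on your Python version.

```python
import codecs
import email.header
import encodings.aliases

def register_alias(alias, codec):
    """Hypothetical helper: add `alias` -> `codec` unless Python already knows the alias."""
    if alias not in encodings.aliases.aliases:
        encodings.aliases.aliases[alias] = codec

# Must run before the first lookup of the name (see the caching caveat above).
register_alias("windows_874", "cp874")

# The dashed spelling normalizes to 'windows_874', so it resolves as well.
print(codecs.lookup("windows-874").name)

# Decode an RFC 2047 encoded word that names its charset with a dash.
parts = email.header.decode_header("=?windows-874?Q?hello=3F?=")
text = "".join(raw.decode(charset or "ascii") for raw, charset in parts)
print(text)
```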
History
Date User Action Args
2017-08-29 08:48:02 era set nosy: + era
messages: + msg300976
2003-12-05 03:16:34 kamthorn create