classification
Title: cp874 encoding almost empty
Type: behavior Stage: resolved
Components: Unicode Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: era, ezio.melotti, lemburg, vstinner
Priority: normal Keywords:

Created on 2014-11-24 10:39 by era, last changed 2014-11-24 14:39 by r.david.murray. This issue is now closed.

Messages (4)
msg231596 - Author: (era) Date: 2014-11-24 10:39
I created a simple script to map character codes in the 8-bit range to Unicode for simple lookup:

https://github.com/tripleee/8bit

In the generated output, on Python 2.6.6 (and corroborated on Python 2.7.6), almost all character codes come up as "undefined" in CP874.

According to http://en.wikipedia.org/wiki/ISO/IEC_8859-11, CP874 should be a superset of ISO-8859-11, with a few character codes *added* in the ISO control range.
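One way to check the superset claim against what Python actually ships is to decode every byte value with both codecs and list where they disagree. This is only a minimal sketch: the helper names `decode_byte` and `differing_bytes` are made up for illustration, and it assumes the standard-library codec names `cp874` and `iso8859_11`.

```python
def decode_byte(value, encoding):
    """Decode one byte value; return None when the codec leaves it undefined."""
    try:
        # bytes(bytearray([...])) builds a one-byte string on Python 2 and 3.
        return bytes(bytearray([value])).decode(encoding)
    except UnicodeDecodeError:
        return None


def differing_bytes(first, second):
    """List the byte values (0-255) that the two codecs decode differently."""
    return [
        value for value in range(256)
        if decode_byte(value, first) != decode_byte(value, second)
    ]


diff = differing_bytes("cp874", "iso8859_11")
print("bytes that differ: " + ", ".join("0x%02X" % v for v in diff))
```

Any byte value printed here is one where the two codecs disagree, either because they map it to different characters or because only one of them defines it.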
msg231598 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2014-11-24 11:02
I'm not sure I understand the bug report. What's the problem? :-)

The codec is a charmap codec generated from the file MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT (http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT)

This mapping does have quite a few undefined characters.
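To see which byte values the codec treats as undefined, one can try decoding each of the 256 values and collect the ones that raise. This is a rough sketch rather than anything from the codec source; `undefined_bytes` is a hypothetical helper name.

```python
def undefined_bytes(encoding):
    """Return the byte values (0-255) that the codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            # A charmap codec raises UnicodeDecodeError for unmapped bytes.
            bytes(bytearray([value])).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


missing = undefined_bytes("cp874")
print("%d undefined: %s" % (len(missing), " ".join("0x%02X" % v for v in missing)))
```

The listed values can then be compared by hand against the undefined entries in CP874.TXT.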
msg231599 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2014-11-24 11:09
BTW: The table on the wiki page shows the same undefined chars.
msg231600 - Author: (era) Date: 2014-11-24 11:47
My apologies -- I already attempted to close this as a mistake on my part, but apparently, that failed too.  )-:  Sorry.
History
Date User Action Args
2014-11-24 14:39:36 r.david.murray set stage: resolved
2014-11-24 11:47:09 era set status: open -> closed; resolution: not a bug; messages: + msg231600
2014-11-24 11:09:08 lemburg set messages: + msg231599
2014-11-24 11:02:43 lemburg set messages: + msg231598
2014-11-24 10:43:24 vstinner set nosy: + lemburg
2014-11-24 10:39:37 era create