Author loewis
Date 2004-08-05.13:02:16
user_id=21627

Code page 874 differs from ISO 8859-11 in the definition of
bytes \x80..\x9f.

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP874.TXT

lists these mappings in that range:

0x80	0x20AC	#EURO SIGN
0x85	0x2026	#HORIZONTAL ELLIPSIS
0x91	0x2018	#LEFT SINGLE QUOTATION MARK
0x92	0x2019	#RIGHT SINGLE QUOTATION MARK
0x93	0x201C	#LEFT DOUBLE QUOTATION MARK
0x94	0x201D	#RIGHT DOUBLE QUOTATION MARK
0x95	0x2022	#BULLET
0x96	0x2013	#EN DASH
0x97	0x2014	#EM DASH
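To see the difference concretely, here is a minimal sketch using the `cp874` and `iso8859_11` codecs that ship with current Python 3 (assumed here; the dictionary below simply copies the table above). It decodes each listed byte with both codecs. Python's `iso8859_11` codec maps \x80..\x9f to C1 control characters, so the two codecs disagree on every byte in the table.

```python
# Byte -> code point pairs copied from the CP874.TXT excerpt above.
CP874_EXTRAS = {
    0x80: 0x20AC,  # EURO SIGN
    0x85: 0x2026,  # HORIZONTAL ELLIPSIS
    0x91: 0x2018,  # LEFT SINGLE QUOTATION MARK
    0x92: 0x2019,  # RIGHT SINGLE QUOTATION MARK
    0x93: 0x201C,  # LEFT DOUBLE QUOTATION MARK
    0x94: 0x201D,  # RIGHT DOUBLE QUOTATION MARK
    0x95: 0x2022,  # BULLET
    0x96: 0x2013,  # EN DASH
    0x97: 0x2014,  # EM DASH
}


def compare_high_bytes():
    """Return (byte, cp874 result, iso8859_11 result) for each listed byte."""
    rows = []
    for byte in sorted(CP874_EXTRAS):
        raw = bytes([byte])
        rows.append((byte, raw.decode("cp874"), raw.decode("iso8859_11")))
    return rows


if __name__ == "__main__":
    for byte, win, iso in compare_high_bytes():
        # cp874 yields a printable character; iso8859_11 yields a C1 control.
        print(f"0x{byte:02X}: cp874 U+{ord(win):04X}  iso8859_11 U+{ord(iso):04X}")
```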

I assume the Thai version of Windows is likely to label text as
"windows-874". Debian offers a th_TH locale using TIS-620 and a
th_TH.UTF-8 locale, but no ISO-8859-11 one.

If ISO 8859-11 is understood as published by ISO (i.e. with no
control characters at all), then CP 874 is a strict extension of it,
adding the C0 control characters plus the characters above.
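The strict-extension claim can be checked mechanically. A sketch under two assumptions: Python 3's `cp874` and `iso8859_11` codecs stand in for the two standards, and "as published by ISO" is modelled by looking only at the ISO 8859 graphic ranges 0x20-0x7E and 0xA0-0xFF (which ignores the C1 controls Python's `iso8859_11` codec adds). The helper names are hypothetical.

```python
def decode_or_none(byte, codec):
    """Decode one byte, returning None where the codec leaves it undefined."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None


# Assumption: ISO 8859 parts assign characters only to 0x20-0x7E and 0xA0-0xFF.
ISO_GRAPHIC_BYTES = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))


def cp874_extends_iso8859_11():
    """True if every byte ISO 8859-11 defines in its graphic ranges
    decodes to the same character under cp874."""
    for byte in sorted(ISO_GRAPHIC_BYTES):
        iso = decode_or_none(byte, "iso8859_11")
        if iso is not None and decode_or_none(byte, "cp874") != iso:
            return False
    return True


def cp874_only_bytes():
    """Bytes cp874 defines that ISO 8859-11's graphic ranges leave unassigned."""
    return [
        b for b in range(256)
        if decode_or_none(b, "cp874") is not None
        and (b not in ISO_GRAPHIC_BYTES or decode_or_none(b, "iso8859_11") is None)
    ]
```

If the claim holds, `cp874_extends_iso8859_11()` succeeds and `cp874_only_bytes()` contains the C0 controls plus the bytes from the CP874.TXT table.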

Google reports these hit counts for each label:
tis-620      16,200
windows-874   7,290
iso-8859-11   5,880
History
Date                 User   Action  Args
2007-08-23 16:08:16  admin  link    issue1001895 messages
2007-08-23 16:08:16  admin  create