Message 279149 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ztane
Recipients	ezio.melotti, jwilk, lemburg, matorban, progfou, serhiy.storchaka, vstinner, ztane
Date	2016-10-21.20:03:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1477080203.39.0.062821452022.issue21081@psf.upfronthosting.co.za>
In-reply-to

Content

I found the full document on SlideShare: http://www.slideshare.net/sacobat/tcvn-5712-1993-cng-ngh-thng-tin-b-m-chun-8bit-k-t-vit-dng-trong-trao-i-thng-tin 

As far as I can tell, VN1, VN2 and VN3 are "subsets" of each other only in the sense that VN1 has the widest character repertoire. However, VN1 also partially overlaps the C0 and C1 control-character ranges of the ISO code pages: there are 139 additional characters!

VN2 then lets the C0 and C1 ranges keep their ISO 8859 meanings by sacrificing some capital vowels (Ezio perhaps remembers that Italy is Ý in Vietnamese; sorry, I can't write it in upper case in VN2). VN3 sacrifices even more characters to leave some positions free, possibly for application-specific uses (the standard is very vague about that).
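
To check the VN1/VN2 difference against an actual 256-entry mapping table (for instance one dumped from iconv), a small sketch like the one below could list which control-range bytes a table repurposes for graphic characters. The function name is hypothetical, and using "\ufffe" to mark unmapped bytes is an assumption borrowed from the convention of Python's generated charmap codecs; nothing here comes from the standard itself. If the description above is right, a VN1 table should yield a non-empty list and a VN2 table an empty one.

```python
import unicodedata

# Byte values reserved for control functions in ISO 8859 code pages:
# C0 is 0x00-0x1F (DEL 0x7F is grouped with it here), C1 is 0x80-0x9F.
CONTROL_BYTES = sorted(set(range(0x00, 0x20)) | {0x7F} | set(range(0x80, 0xA0)))

UNDEFINED = "\ufffe"  # assumed marker for unmapped positions (charmap-codec convention)


def graphic_bytes_in_control_ranges(decoding_table):
    """Return the C0/DEL/C1 byte values that a 256-entry decoding table maps to
    non-control characters, i.e. positions that lose their ISO 8859 meaning."""
    if len(decoding_table) != 256:
        raise ValueError("expected exactly 256 entries")
    hits = []
    for byte in CONTROL_BYTES:
        ch = decoding_table[byte]
        # Skip unmapped positions; anything outside category Cc is a repurposed byte.
        if ch != UNDEFINED and unicodedata.category(ch) != "Cc":
            hits.append(byte)
    return hits


# Toy example: a Latin-1 identity table, then the same table with 0x01
# repurposed for a capital letter (hypothetical, not the real VN1 layout).
latin1 = "".join(chr(i) for i in range(256))
print(graphic_bytes_in_control_ranges(latin1))   # → []
patched = latin1[:0x01] + "\u00dd" + latin1[0x02:]
print(graphic_bytes_in_control_ranges(patched))  # → [1]
```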

The text of the standard is copy-pasteable at http://luatvn.net/tieu-chuan-viet-nam/tieu-chuan-viet-nam-tcvn5712_1993.2.171673.html - however, it lacks some of the tables.

The standard also gives both the UCS-2 mappings and the Unicode names of the characters, but only as images, so it would be preferable to derive the mapping from, say, iconv output.
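
One way to derive the mapping from iconv is to decode every byte value separately and then feed the resulting 256-character table into the same `codecs.charmap_build` / `charmap_encode` / `charmap_decode` calls that Python's generated single-byte codecs in the `encodings` package use. This is a rough sketch under several assumptions: that an `iconv` binary is on the PATH, that it knows the charset as `TCVN5712-1` (glibc lists that name; check `iconv -l`), and that each byte decodes to at most one code point. Which of VN1/VN2/VN3 that iconv charset actually corresponds to would still need checking against the tables in the standard. The helper names are hypothetical.

```python
import codecs
import subprocess

UNDEFINED = "\ufffe"  # charmap codecs treat U+FFFE as "no mapping"


def iconv_decoding_table(charset="TCVN5712-1"):
    """Build a 256-entry decoding table by asking the system iconv to decode
    each byte on its own. The charset name is an assumption; see `iconv -l`."""
    chars = []
    for byte in range(256):
        proc = subprocess.run(
            ["iconv", "-f", charset, "-t", "UTF-8"],
            input=bytes([byte]),
            capture_output=True,
        )
        text = proc.stdout.decode("utf-8", errors="replace") if proc.returncode == 0 else ""
        # Anything other than exactly one code point is treated as unmapped.
        chars.append(text if len(text) == 1 else UNDEFINED)
    return "".join(chars)


def charmap_functions(decoding_table):
    """Return (encode, decode) functions built the same way as Python's
    generated single-byte codecs."""
    if len(decoding_table) != 256:
        raise ValueError("expected exactly 256 entries")
    encoding_table = codecs.charmap_build(decoding_table)

    def encode(text, errors="strict"):
        return codecs.charmap_encode(text, errors, encoding_table)[0]

    def decode(data, errors="strict"):
        return codecs.charmap_decode(data, errors, decoding_table)[0]

    return encode, decode
```

On a machine with iconv, `charmap_functions(iconv_decoding_table())` would give a quick codec whose output can be compared against the tables in the standard before writing a proper codec module.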

History
Date	User	Action	Args
2016-10-21 20:03:23	ztane	set	recipients: + ztane, lemburg, vstinner, jwilk, ezio.melotti, progfou, serhiy.storchaka, matorban
2016-10-21 20:03:23	ztane	set	messageid: <1477080203.39.0.062821452022.issue21081@psf.upfronthosting.co.za>
2016-10-21 20:03:23	ztane	link	issue21081 messages
2016-10-21 20:03:22	ztane	create