Title: The encoding map from Unicode to CP932 is different from that of Windows'
Type: behavior Stage:
Components: Unicode Versions: Python 3.2, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: hyeshik.chang Nosy List: cedrem, ganaware, hyeshik.chang, lemburg, vstinner
Priority: normal Keywords:

Created on 2010-02-22 13:12 by ganaware, last changed 2011-11-16 08:54 by cedrem.

File name Uploaded Description
differenes.txt ganaware, 2010-02-22 13:11 Unicode-to-CP932 differences between python and windows
Python-2.7a3-cp932-patch.txt ganaware, 2010-02-22 13:18 Fix Unicode-to-CP932 encoding map
Python-2.7a3-cp932-patch2.txt ganaware, 2010-02-22 13:21 Fix Unicode-to-CP932 encoding map, and add Java and glibc compatible conversion.
cp932_roundtrip.tar.bz2 ganaware, 2011-08-25 01:25 HTML version of differences, and programs to reproduce the problem. (2011-08-25)
Messages (4)
msg99731 - Author: Nayuta Taga (ganaware) Date: 2010-02-22 13:11
Python's encoding map from Unicode to CP932 differs from the one Windows uses.

In differences.txt, the first column is Unicode, and the second is CP932.
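
A minimal Python 3 sketch of how such a two-column table could be checked against Python's own cp932 codec. The row format is an assumption (two whitespace-separated hex fields, optionally prefixed with "U+" or "0x", with "#" starting a comment line), and the helper names parse_hex, python_cp932 and find_mismatches are hypothetical, not part of the attached files or patches.

```python
def parse_hex(field):
    """Parse a hex field, tolerating an optional "U+" or "0x" prefix."""
    field = field.upper()
    for prefix in ("U+", "0X"):
        if field.startswith(prefix):
            field = field[len(prefix):]
    return int(field, 16)


def python_cp932(code_point):
    """Return Python's CP932 encoding of a code point as an int, or None."""
    try:
        raw = chr(code_point).encode("cp932")
    except UnicodeEncodeError:
        return None  # Python's codec has no mapping for this code point
    return int.from_bytes(raw, "big")


def find_mismatches(rows):
    """Return (code_point, table_value, python_value) for disagreeing rows."""
    mismatches = []
    for row in rows:
        fields = row.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines, short rows and comment lines
        code_point = parse_hex(fields[0])
        expected = parse_hex(fields[1])
        actual = python_cp932(code_point)
        if actual != expected:
            mismatches.append((code_point, expected, actual))
    return mismatches
```

Passing the lines of the attached table to find_mismatches would list every row where the table and the local Python build disagree. If the real file uses a different layout, only the parsing step needs to change.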
msg106781 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-30 21:01
Hye-Shik, could you please comment on this?

The Windows version appears to replace private use code points with CJK compatibility ideographs, i.e. it uses standard Unicode code points rather than private escape code points (for round-trip safety).
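
One way to test this reading is to look at which kind of code point a given CP932 byte sequence decodes to. The sketch below is Python 3 and only illustrative: classify and classify_decoded are hypothetical helper names, and the range U+F900 to U+FAFF is the CJK Compatibility Ideographs block of the Unicode Standard.

```python
import unicodedata


def classify(code_point):
    """Label a code point as private use, CJK compatibility ideograph, or other."""
    if unicodedata.category(chr(code_point)) == "Co":
        return "private use"
    if 0xF900 <= code_point <= 0xFAFF:
        return "CJK compatibility ideograph"
    return "other"


def classify_decoded(raw, codec="cp932"):
    """Decode bytes with the given codec and classify each resulting character."""
    return [classify(ord(ch)) for ch in raw.decode(codec)]
```

Running classify_decoded over the byte sequences listed in the differences table would show whether Python's codec yields private use code points where Windows yields compatibility ideographs.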
msg142951 - Author: Nayuta Taga (ganaware) Date: 2011-08-25 01:25
I have updated the tables for the latest Python releases (2.7.2, 3.2.1).
The patches for 2.7a3 can be applied to 2.7.2 and 3.2.1 successfully.

The latest Python releases still have the problem.
Their encoding maps from Unicode to CP932 differ from those of Windows.
msg147757 - Author: cedre.m (cedrem) Date: 2011-11-16 08:54
Date User Action Args
2011-11-16 08:54:47 cedrem set messages: + msg147757
2011-11-16 08:48:51 cedrem set nosy: + cedrem
2011-08-25 01:25:59 ganaware set files: + cp932_roundtrip.tar.bz2; messages: + msg142951
2011-08-25 01:17:39 ganaware set files: - cp932_roundtrip.tar.bz2
2010-05-30 21:01:58 lemburg set assignee: hyeshik.chang; messages: + msg106781; nosy: + hyeshik.chang
2010-05-23 17:53:36 pitrou set nosy: + lemburg, vstinner
2010-05-22 01:43:41 vstinner set versions: + Python 3.2
2010-02-22 13:28:52 ganaware set files: + cp932_roundtrip.tar.bz2
2010-02-22 13:22:01 ganaware set files: + Python-2.7a3-cp932-patch2.txt
2010-02-22 13:18:09 ganaware set files: + Python-2.7a3-cp932-patch.txt
2010-02-22 13:12:01 ganaware create