classification
Title: cp720 encoding map
Type: enhancement
Stage: patch review
Components: Unicode
Versions: Python 3.1, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: abu_mohammed, amaury.forgeotdarc, bialix, lemburg, loewis
Priority: normal
Keywords: needs review, patch

Created on 2006-12-16 14:24 by bialix, last changed 2009-11-16 10:19 by bialix. This issue is now closed.

Files
File name Uploaded Description
cp720.diff bialix, 2006-12-16 14:24 cp720 support
CP720.TXT bialix, 2007-01-08 13:33 source of map
genwincodec-trunk.patch amaury.forgeotdarc, 2009-07-12 22:31
genwincodec-py3k.patch amaury.forgeotdarc, 2009-07-12 22:40
Messages (14)
msg51546 - Author: Alexander Belchenko (bialix) Date: 2006-12-16 14:24
I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 code page for the DOS console. cp720 is the OEM Arabic code page.

The Python standard library does not have an encoding map for this encoding, so I created one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation.
msg51547 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-08 07:45
Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode.
msg51548 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2007-01-08 10:26
Please provide a reference defining the encoding.

The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx
but that doesn't provide the mapping table.

Thanks.
msg51549 - Author: Alexander Belchenko (bialix) Date: 2007-01-08 12:47
When I started working on cp720, I searched Google for cp720 and found this presentation with the actual character map:
http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt

Then I searched for a CP720.txt file and found this page:
http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html

I downloaded the archive from that page and used CP720.txt to generate cp720.py.
msg51550 - Author: Alexander Belchenko (bialix) Date: 2007-01-08 13:33
File Added: CP720.TXT
msg51551 - Author: Alexander Belchenko (bialix) Date: 2007-01-08 13:47
Here is the map on the Microsoft site:
http://www.microsoft.com/globaldev/reference/oem/720.mspx
msg90440 - Author: Abdulmonem (abu_mohammed) Date: 2009-07-12 08:15
As a user I experienced this bug. With Python 3.1, the interpreter
terminates with a fatal error:
"Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp720"

I think this can be replicated by changing the active code page in the
cmd session before invoking the interpreter. The following command will
do so:
> chcp 720

Without the patch, Python crashes after this command.
I think testing the patch after this command is sufficient.
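
For anyone checking whether a given interpreter is affected, a minimal sketch of the same lookup that fails during startup might look like the following. The helper name `encoding_is_known` is hypothetical and not part of the patch; it only wraps the standard `codecs.lookup` call, which raises `LookupError` for an unknown encoding.

```python
import codecs


def encoding_is_known(name):
    """Return True if this interpreter has a codec registered under `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Same exception type as in the startup failure quoted above.
        return False
    return True


if __name__ == "__main__":
    # On an interpreter without the patch this is expected to print False.
    print("cp720 available:", encoding_is_known("cp720"))
```

Unlike the `chcp 720` test, this check can be run from any console, since it does not depend on the active code page.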
msg90459 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-07-12 22:31
Instead of using another source of third-party files, I suggest using the Windows
functions to generate the mapping.
The attached patch contains a script, genwincodec.py, which uses MultiByteToWideChar
and generates a codec file.

I use it like this:
.\PCBuild\python Tools\unicode\genwincodec.py 720 > Lib\encodings\cp720.py

The generated file is also in the patch.
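
The general idea of deriving a table by decoding each of the 256 byte values can be sketched as below. This is not the attached genwincodec.py: the names `build_decoding_table` and `stand_in_decoder` are hypothetical, and the stand-in uses Python's own cp720 codec in place of the Windows MultiByteToWideChar call so that the sketch runs on any platform where that codec already exists.

```python
import codecs


def build_decoding_table(decode_byte):
    """Decode every byte value once and return a 256-character table.

    `decode_byte` takes a one-byte bytes object and returns a one-character
    string, or raises UnicodeDecodeError if the byte has no mapping.
    """
    chars = []
    for value in range(256):
        try:
            chars.append(decode_byte(bytes([value])))
        except UnicodeDecodeError:
            # U+FFFE marks an undefined entry in charmap decoding tables.
            chars.append("\ufffe")
    return "".join(chars)


def stand_in_decoder(data):
    # Assumption: stands in for a per-byte call to the Windows API.
    return data.decode("cp720")


decoding_table = build_decoding_table(stand_in_decoder)
# Build the reverse mapping the same way the generated codec modules do.
encoding_table = codecs.charmap_build(decoding_table)
```

The resulting tables can be passed to `codecs.charmap_decode` and `codecs.charmap_encode`, which is how the charmap codec modules in Lib/encodings use them.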
msg90468 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-07-13 05:09
Amaury: your approach sounds fine to me, please apply.
msg90469 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-07-13 05:29
Reconsidering, I'd like to ask for two changes:
- please record the command(s) used to generate tables on Windows
somewhere, in either Tools/unicode/Makefile, or a separate batch file.
- please arrange for the doc string of the generated file to identify
the source of the data base; in particular, make sure that the Windows
version on which this was run is identified. It might be that future
Windows versions change the mappings.
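
One possible way to meet the second request is to have the generator write a doc string that records the system description and the command used. The sketch below is only an illustration: `make_header` and its parameters are hypothetical names, and `platform.platform()` is used here as an assumed source for the system description rather than whatever the real script queries.

```python
import platform


def make_header(codepage, command, system_description=None):
    """Return a module doc string naming the data source for a generated codec."""
    if system_description is None:
        # Assumption: the running system's description is what should be recorded.
        system_description = platform.platform()
    lines = [
        '"""Python Character Mapping Codec cp%d generated on:' % codepage,
        system_description + " with the command:",
        "  " + command,
        '"""',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_header(720, "python Tools/unicode/genwincodec.py 720"))
```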
msg90501 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-07-13 20:56
The codec file now starts with the comment:
"""Python Character Mapping Codec cp720 generated on Windows:
Vista 6.0.6002 SP2 Multiprocessor Free with the command:
  python Tools/unicode/genwincodec.py 720
"""

I also added a file Tools\unicode\genwincodecs.bat that currently only 
generates cp720.py.

Applied in r74000 (trunk) and r74003 & r74004 (py3k)
msg95332 - Author: Alexander Belchenko (bialix) Date: 2009-11-16 09:36
As the author of the original patch, I want to note that your merged
patch does not seem to update the documentation (list of standard encodings).

Please, update the docs as well.
msg95337 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-11-16 10:13
I think it is already updated; see r74006 and
http://docs.python.org/dev/library/codecs.html#standard-encodings
(this is the doc for the future 2.7 version)
msg95338 - Author: Alexander Belchenko (bialix) Date: 2009-11-16 10:19
OK, thanks.
History
Date User Action Args
2010-05-12 04:07:40 loewis link issue8693 superseder
2009-12-30 15:03:38 r.david.murray link issue7600 superseder
2009-11-16 10:19:28 bialix set messages: + msg95338
2009-11-16 10:13:43 amaury.forgeotdarc set messages: + msg95337
2009-11-16 09:36:58 bialix set messages: + msg95332
2009-07-13 20:56:20 amaury.forgeotdarc set status: open -> closed; resolution: accepted -> fixed; messages: + msg90501
2009-07-13 05:29:32 loewis set messages: + msg90469
2009-07-13 05:09:32 loewis set resolution: accepted; messages: + msg90468
2009-07-12 22:41:28 amaury.forgeotdarc set files: + genwincodec-py3k.patch; keywords: + patch
2009-07-12 22:31:42 amaury.forgeotdarc set files: + genwincodec-trunk.patch; nosy: + amaury.forgeotdarc; messages: + msg90459; keywords: + needs review, - patch; stage: test needed -> patch review
2009-07-12 08:15:14 abu_mohammed set nosy: + abu_mohammed; messages: + msg90440
2009-03-30 18:02:23 ajaksu2 set stage: test needed; type: enhancement; components: + Unicode, - None; versions: + Python 3.1, Python 2.7
2006-12-16 14:24:07 bialix create