cp720 encoding map #44344

Closed
bialix mannequin opened this issue Dec 16, 2006 · 14 comments
Labels
topic-unicode, type-feature (A feature request or enhancement)

Comments

@bialix
Mannequin

bialix mannequin commented Dec 16, 2006

BPO 1616979
Nosy @malemburg, @loewis, @amauryfa
Files
  • cp720.diff: cp720 support
  • CP720.TXT: source of map
  • genwincodec-trunk.patch
  • genwincodec-py3k.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-07-13.20:56:20.510>
    created_at = <Date 2006-12-16.14:24:07.000>
    labels = ['type-feature', 'expert-unicode']
    title = 'cp720 encoding map'
    updated_at = <Date 2009-11-16.10:19:28.601>
    user = 'https://bugs.python.org/bialix'

    bugs.python.org fields:

    activity = <Date 2009-11-16.10:19:28.601>
    actor = 'bialix'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-07-13.20:56:20.510>
    closer = 'amaury.forgeotdarc'
    components = ['Unicode']
    creation = <Date 2006-12-16.14:24:07.000>
    creator = 'bialix'
    dependencies = []
    files = ['7657', '7658', '14489', '14490']
    hgrepos = []
    issue_num = 1616979
    keywords = ['patch', 'needs review']
    message_count = 14.0
    messages = ['51546', '51547', '51548', '51549', '51550', '51551', '90440', '90459', '90468', '90469', '90501', '95332', '95337', '95338']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'loewis', 'amaury.forgeotdarc', 'bialix', 'abu_mohammed']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1616979'
    versions = ['Python 3.1', 'Python 2.7']

    @bialix
    Mannequin Author

    bialix mannequin commented Dec 16, 2006

    I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 code page for the DOS console. cp720 is the OEM Arabic code page.

    The Python standard library does not have an encoding map for this encoding, so I created one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation.
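A minimal sketch of what the requested codec enables, assuming an interpreter that already ships the cp720 codec that came out of this issue. The sample string is an illustrative choice, not taken from the original report.

```python
import codecs

# Look up the codec by name; this raises LookupError if cp720 is missing.
info = codecs.lookup("cp720")

# Arabic word "marhaba" (hello), written with escapes to stay ASCII-safe.
sample = "\u0645\u0631\u062d\u0628\u0627"

# cp720 is a single-byte code page, so each character becomes one byte.
encoded = sample.encode("cp720")
decoded = encoded.decode("cp720")

print(info.name, len(encoded), decoded == sample)
```

On an interpreter without the codec, the `codecs.lookup` call is where the failure would surface.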

    @loewis
    Mannequin

    loewis mannequin commented Jan 8, 2007

    Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode.

    @malemburg
    Member

    Please provide a reference defining the encoding.

    The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx
    but that doesn't provide the mapping table.

    Thanks.

    @bialix
    Mannequin Author

    bialix mannequin commented Jan 8, 2007

    When I started working on cp720, I searched Google for cp720 and found this presentation, which contains the actual character map:
    http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt

    Then I searched for a CP720.txt file and found this page:
    http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html

    I downloaded the archive from that page and used CP720.txt to generate cp720.py.

    @bialix
    Mannequin Author

    bialix mannequin commented Jan 8, 2007

    File Added: CP720.TXT

    @bialix
    Mannequin Author

    bialix mannequin commented Jan 8, 2007

    Here is the map on the Microsoft site:
    http://www.microsoft.com/globaldev/reference/oem/720.mspx

    @devdanzin devdanzin mannequin added the topic-unicode and type-feature (A feature request or enhancement) labels Mar 30, 2009
    @abumohammed
    Mannequin

    abumohammed mannequin commented Jul 12, 2009

    As a user I experienced this bug. With Python 3.1, the interpreter
    terminates with a fatal error:
    "Py_Initialize: can't initialize sys standard streams
    LookupError: unknown encoding: cp720"

    I think, this can be replicated by changing the active code page in the
    cmd session, before invoking the interpreter. The following command will
    do so:

    chcp 720

    Without the patch, Python crashes after this command.
    I think testing the patch after running this command is sufficient.
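The failure above comes down to a codec lookup that raises `LookupError`. A small sketch of checking for that condition from Python code follows; the helper name `encoding_is_known` is hypothetical and not part of any patch in this issue.

```python
import codecs


def encoding_is_known(name):
    """Return True if the codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Same exception type as in the startup failure quoted above.
        return False
    return True


# Whether cp720 is reported as known depends on the interpreter version.
print("cp720 known:", encoding_is_known("cp720"))
```

A check like this can only run once the interpreter has started, so it does not help with the startup failure itself; it is mainly useful for verifying a patched build.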

    @amauryfa
    Member

    Instead of using another source of third-party files, I suggest using the Windows
    functions to generate the mapping.
    The attached patch contains a script, genwincodec.py, which uses MultiByteToWideChar
    and generates a codec file.

    I use it like this:
    .\PCBuild\python Tools\unicode\genwincodec.py 720 > Lib\encodings\cp720.py

    The generated file is also in the patch.
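The general idea of deriving a table by probing every byte value can be sketched as below. This is not the attached genwincodec.py: to stay runnable on any platform it assumes Python's existing cp720 codec as a stand-in for the `MultiByteToWideChar` call, and the names `build_decoding_table` and `stdlib_decoder` are hypothetical.

```python
import codecs


def build_decoding_table(decode_byte):
    """Probe all 256 byte values and return a 256-character decoding table.

    `decode_byte` takes a one-byte `bytes` object and returns one character,
    or raises UnicodeDecodeError when the byte has no mapping.
    """
    table = []
    for value in range(256):
        try:
            table.append(decode_byte(bytes([value])))
        except UnicodeDecodeError:
            # U+FFFE is the marker charmap codecs use for "undefined".
            table.append("\ufffe")
    return "".join(table)


def stdlib_decoder(encoding):
    # Assumption: stand-in for the Windows conversion function.
    return lambda raw: raw.decode(encoding)


decoding_table = build_decoding_table(stdlib_decoder("cp720"))
# Build the reverse (encoding) map the way generated charmap codecs do.
encoding_table = codecs.charmap_build(decoding_table)

print(len(decoding_table), repr(decoding_table[0x41]))
```

Swapping `stdlib_decoder` for a function backed by the Windows API would give tables that reflect the operating system's own mapping, which is the point of the suggestion above.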

    @loewis
    Mannequin

    loewis mannequin commented Jul 13, 2009

    Amaury: your approach sounds fine to me, please apply.

    @loewis
    Mannequin

    loewis mannequin commented Jul 13, 2009

    Reconsidering, I'd like to ask for two changes:

    • please record the command(s) used to generate tables on Windows
      somewhere, in either Tools/unicode/Makefile, or a separate batch file.
    • please arrange for the doc string of the generated file to identify
      the source of the data base; in particular, make sure that the Windows
      version on which this was run is identified. It might be that future
      Windows versions change the mappings.

    @amauryfa
    Member

    The codec file now starts with the comment:
    """Python Character Mapping Codec cp720 generated on Windows:
    Vista 6.0.6002 SP2 Multiprocessor Free with the command:
    python Tools/unicode/genwincodec.py 720
    """
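A header of this shape could be produced by a small helper such as the one sketched here. The function name `make_codec_docstring` is hypothetical, and the Windows description is passed in as a plain string; on Windows it might be assembled from `platform.win32_ver()`, though how the applied script does it is not shown in this thread.

```python
def make_codec_docstring(codepage, windows_version, command):
    """Return the module docstring text for a generated codec file."""
    return (
        f'"""Python Character Mapping Codec cp{codepage} generated on Windows:\n'
        f"{windows_version} with the command:\n"
        f"  {command}\n"
        '"""'
    )


# Values copied from the comment quoted above.
header = make_codec_docstring(
    720,
    "Vista 6.0.6002 SP2 Multiprocessor Free",
    "python Tools/unicode/genwincodec.py 720",
)
print(header)
```

Recording both the platform and the exact command keeps the generated file traceable if a later Windows version changes the mapping.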

    I also added a file Tools\unicode\genwincodecs.bat that currently only
    generates cp720.py.

    Applied in r74000 (trunk) and r74003 & r74004 (py3k)

    @bialix
    Mannequin Author

    bialix mannequin commented Nov 16, 2009

    As the author of the original patch, I want to note that your merged
    patch does not seem to update the documentation (the list of standard encodings).

    Please, update the docs as well.

    @amauryfa
    Member

    I think it is already updated; see r74006 and
    http://docs.python.org/dev/library/codecs.html#standard-encodings
    (this is the doc for the future 2.7 version)

    @bialix
    Mannequin Author

    bialix mannequin commented Nov 16, 2009

    OK, thanks.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022