IA5 Encoding should be in the default encodings #47899

Closed
pascalbach mannequin opened this issue Aug 22, 2008 · 8 comments
Labels
topic-unicode type-feature A feature request or enhancement

Comments

@pascalbach
Mannequin

pascalbach mannequin commented Aug 22, 2008

BPO 3649
Nosy @malemburg, @loewis, @amauryfa
Files
  • ia5.py: File which implements the Python .encode/.decode methods
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-08-25.15:11:00.890>
    created_at = <Date 2008-08-22.16:26:46.167>
    labels = ['type-feature', 'expert-unicode']
    title = 'IA5 Encoding should be in the default encodings'
    updated_at = <Date 2008-08-25.15:31:47.296>
    user = 'https://bugs.python.org/pascalbach'

    bugs.python.org fields:

    activity = <Date 2008-08-25.15:31:47.296>
    actor = 'pascal.bach'
    assignee = 'none'
    closed = True
    closed_date = <Date 2008-08-25.15:11:00.890>
    closer = 'lemburg'
    components = ['Unicode']
    creation = <Date 2008-08-22.16:26:46.167>
    creator = 'pascal.bach'
    dependencies = []
    files = ['11214']
    hgrepos = []
    issue_num = 3649
    keywords = []
    message_count = 8.0
    messages = ['71755', '71771', '71776', '71803', '71845', '71887', '71934', '71939']
    nosy_count = 4.0
    nosy_names = ['lemburg', 'loewis', 'amaury.forgeotdarc', 'pascal.bach']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue3649'
    versions = ['Python 3.1', 'Python 2.7']

    @pascalbach
    Mannequin Author

    pascalbach mannequin commented Aug 22, 2008

    This encoding is used in the GSM standard; it is a 7-bit encoding similar
    to ASCII.
    The encoding definition is found in:
    Short Message Service Centre EMI - UCP Interface 4.6 Specification (p. 79)
    as well as in:
    [3GPP 23.038] 3GPP TS 23.038 Alphabets and language-specific information.

    I think this encoding would be useful for other GSM-specific use cases.

    @pascalbach pascalbach mannequin added topic-unicode type-feature A feature request or enhancement labels Aug 22, 2008
    @amauryfa
    Member

    The provided file does not work for "EXTENSION" characters:

    >>> import ia5
    >>> u"[a]".encode("ia5")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "ia5.py", line 18, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    TypeError: character mapping must be in range(256)

    I doubt this can be achieved with just a charmap. You will have to roll
    your own incremental stateful decoder.
    Are you willing to do it?
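A minimal sketch of why a plain charmap is not enough, assuming the codec is the GSM 03.38 default alphabet: characters from the extension table, such as "[" and "]", are sent as the escape byte 0x1B followed by a second byte, so one character becomes two bytes and the decoder has to look ahead. Only a subset of the tables is included here, and the codec name "gsm0338" used in the error messages is hypothetical.

```python
# Sketch only: assumes the GSM 03.38 default alphabet, with just a SUBSET of its
# basic and extension tables; "gsm0338" is a hypothetical codec name.
ESC = 0x1B  # escape byte that introduces an extension-table character

# Basic table (one byte per character). These ASCII characters sit at the same
# code points in GSM 03.38; some others do not (e.g. "@" is 0x00, not 0x40).
BASIC = {c: ord(c) for c in (" !\"#%&'()*+,-./0123456789:;<=>?"
                             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                             "abcdefghijklmnopqrstuvwxyz")}
BASIC.update({"@": 0x00, "\u00a3": 0x01, "$": 0x02, "\n": 0x0A, "\r": 0x0D})

# Extension table (two bytes per character: ESC, then this value).
EXTENSION = {"^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F, "[": 0x3C,
             "~": 0x3D, "]": 0x3E, "|": 0x40, "\u20ac": 0x65}

BASIC_REV = {v: k for k, v in BASIC.items()}
EXTENSION_REV = {v: k for k, v in EXTENSION.items()}


def gsm_encode(text: str) -> bytes:
    out = bytearray()
    for i, ch in enumerate(text):
        if ch in BASIC:
            out.append(BASIC[ch])
        elif ch in EXTENSION:
            out += bytes((ESC, EXTENSION[ch]))  # one character, two bytes
        else:
            raise UnicodeEncodeError("gsm0338", text, i, i + 1, "not in table")
    return bytes(out)


def gsm_decode(data: bytes) -> str:
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            # Look ahead: the meaning of ESC depends on the NEXT byte, which a
            # 256-entry charmap decoding table cannot express.
            if i + 1 < len(data) and data[i + 1] in EXTENSION_REV:
                out.append(EXTENSION_REV[data[i + 1]])
                i += 2
                continue
            raise UnicodeDecodeError("gsm0338", bytes(data), i,
                                     min(i + 2, len(data)), "bad escape")
        if b not in BASIC_REV:
            raise UnicodeDecodeError("gsm0338", bytes(data), i, i + 1,
                                     "not in table")
        out.append(BASIC_REV[b])
        i += 1
    return "".join(out)
```

For example, gsm_encode("[a]") gives b"\x1b<a\x1b>": three characters, five bytes.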

    @pascalbach
    Mannequin Author

    pascalbach mannequin commented Aug 22, 2008

    Well, I see the problem.

    I'm willing to do this to improve Python, but I don't know exactly how
    to do it.

    I looked at how utf-8 and utf-7 are implemented, but I didn't fully
    understand them. Are they based on C code?

    Is there an example of how this needs to be done? It would be nice if you
    could point me to where to start.

    @amauryfa
    Member

    You could start with utf_8.py, and of course replace the calls to
    codecs.utf_8_encode and codecs.utf_8_decode.
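Following that suggestion, the decoding side can reuse the structure of utf_8.py, whose IncrementalDecoder subclasses codecs.BufferedIncrementalDecoder and supplies a _buffer_decode function returning (text, bytes_consumed). The sketch below assumes the same GSM 03.38 escape scheme with a tiny subset of the tables; the helper name gsm_decode_prefix and the codec name "gsm0338" are hypothetical, and only "strict" error handling is sketched.

```python
import codecs

ESC = 0x1B  # GSM 03.38 escape byte (assumed scheme)

# Hypothetical SUBSET of the decoding tables, for illustration only.
BASIC_REV = {b: chr(b) for b in
             b" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}
EXTENSION_REV = {0x28: "{", 0x29: "}", 0x3C: "[", 0x3E: "]"}


def gsm_decode_prefix(data, errors="strict", final=False):
    """Decode as much of `data` as possible and return (text, consumed).

    Same contract as codecs.utf_8_decode: bytes that cannot be decoded yet
    (here: a lone trailing ESC) are left unconsumed unless `final` is true.
    Only strict error handling is sketched; `errors` is accepted but ignored.
    """
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b == ESC:
            if i + 1 == n and not final:
                break  # wait for the byte that completes the escape sequence
            if i + 1 < n and data[i + 1] in EXTENSION_REV:
                out.append(EXTENSION_REV[data[i + 1]])
                i += 2
                continue
            raise UnicodeDecodeError("gsm0338", bytes(data), i, min(i + 2, n),
                                     "invalid escape sequence")
        if b not in BASIC_REV:
            raise UnicodeDecodeError("gsm0338", bytes(data), i, i + 1,
                                     "byte not in table")
        out.append(BASIC_REV[b])
        i += 1
    return "".join(out), i


class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # BufferedIncrementalDecoder prepends the leftover bytes from the previous
    # call, then keeps whatever _buffer_decode did not consume.
    def _buffer_decode(self, input, errors, final):
        return gsm_decode_prefix(input, errors, final)
```

Split input such as b"a\x1b" followed by b"<b" decodes to "a" and then "[b", because the lone ESC is carried over in the decoder's buffer between calls.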

    @pascalbach
    Mannequin Author

    pascalbach mannequin commented Aug 24, 2008

    I have looked at utf_8.py and I think I know how to implement the
    incremental encoder and decoder. But I don't understand the codecs.register()
    function. Do I have to provide the stateless, stateful and StreamWriter
    parts at the same time?
    If I implement IncrementalEncoder and IncrementalDecoder, can I just pass
    those two to codecs.register()?

    Thank you for your help.
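In the current Python 3 codecs module, codecs.register() takes a search function that returns a codecs.CodecInfo, and only the stateless encode and decode functions are required arguments of CodecInfo; streamreader, streamwriter, incrementalencoder and incrementaldecoder are optional keyword arguments that default to None. A minimal sketch under the same assumptions as above (GSM 03.38 escape scheme, tiny hypothetical table subset, hypothetical name "gsm0338"), supplying the stateless pair plus an incremental encoder only:

```python
import codecs

ESC = 0x1B  # GSM 03.38 escape byte (assumed scheme)

# Hypothetical SUBSET of the tables, for illustration only.
ENCODE = {c: ord(c) for c in " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}
ENCODE_EXT = {"[": 0x3C, "]": 0x3E}
DECODE = {v: k for k, v in ENCODE.items()}
DECODE_EXT = {v: k for k, v in ENCODE_EXT.items()}


def gsm_encode(text, errors="strict"):
    # Stateless encoder contract: return (bytes, characters consumed).
    # Only strict error handling is sketched; `errors` is ignored.
    out = bytearray()
    for i, ch in enumerate(text):
        if ch in ENCODE:
            out.append(ENCODE[ch])
        elif ch in ENCODE_EXT:
            out += bytes((ESC, ENCODE_EXT[ch]))
        else:
            raise UnicodeEncodeError("gsm0338", text, i, i + 1, "not in table")
    return bytes(out), len(text)


def gsm_decode(data, errors="strict"):
    # Stateless decoder contract: return (str, bytes consumed).
    data = bytes(data)  # data may arrive as a memoryview
    out, i = [], 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and data[i + 1] in DECODE_EXT:
            out.append(DECODE_EXT[data[i + 1]])
            i += 2
        elif data[i] in DECODE:
            out.append(DECODE[data[i]])
            i += 1
        else:
            raise UnicodeDecodeError("gsm0338", data, i, i + 1, "invalid byte")
    return "".join(out), len(data)


class IncrementalEncoder(codecs.IncrementalEncoder):
    # Encoding keeps no state between calls, so each chunk is encoded directly.
    def encode(self, input, final=False):
        return gsm_encode(input, self.errors)[0]


# Stream reader/writer and incremental decoder are simply left out (None).
info = codecs.CodecInfo(gsm_encode, gsm_decode,
                        incrementalencoder=IncrementalEncoder,
                        name="gsm0338")
```

The omitted pieces only matter to APIs that need them: for instance, codecs.getincrementaldecoder() raises LookupError when a codec supplies no incremental decoder.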

    @loewis
    Mannequin

    loewis mannequin commented Aug 24, 2008

    I don't think this codec should be named IA-5. IA-5 is specified in
    ITU-T Rec. T.50 (International Alphabet No. 5), recently renamed to
    "International Reference Alphabet", and it does *not* specify that the
    characters 0..31 are printable. Instead, IA5 is identical to ISO 646
    (i.e. allowing for national variants), and the International Reference
    Version of IA5 (e.g. as used in ASN.1 IA5String) is identical to US-ASCII.

    If GSM uses a modified version of this, it should receive a separate
    name. If you were looking at section 2 (Structure of EMI messages), what
    makes you think that this specification calls the encoding "IA5"? In my
    copy, it says:

    # Alphanumeric characters are encoded as two numeric IA5 characters,
    # the higher 3 bits (0..7) first, the lower 4 bits (0..F) thereafter,
    # according to the following table.

    So it *uses* IA5 to hex-encode the encoding. To achieve that, one would
    have to write

    text.encode("emi-section-2").encode("hex")

    [Notice that the "hex" codec already uses IA-5]

    In any case, I don't think this is general enough to deserve inclusion
    into the standard library. The codec system is designed to be flexible
    enough to support additional codecs outside the core.
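In Python 3 terms, the EMI chain Loewis describes (map text to 7-bit code points, then write each code point as two hex characters, higher 3 bits first and lower 4 bits second) might look like the sketch below. It assumes GSM 03.38 code points with only a tiny hypothetical subset of the tables, and assumes uppercase hex digits to match the "0..F" in the quoted table; to_septets and emi_hex are hypothetical names.

```python
# Sketch only: assumes GSM 03.38 code points (tiny SUBSET of the tables) and
# uppercase hex digits; to_septets and emi_hex are hypothetical names.
ESC = 0x1B
BASIC = {c: ord(c) for c in
         " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}
BASIC["@"] = 0x00  # differs from ASCII, where "@" is 0x40
EXTENSION = {"[": 0x3C, "]": 0x3E}


def to_septets(text):
    """Step 1: the 7-bit encoding itself (values 0x00..0x7F)."""
    out = []
    for ch in text:
        if ch in BASIC:
            out.append(BASIC[ch])
        elif ch in EXTENSION:
            out.extend((ESC, EXTENSION[ch]))
        else:
            raise ValueError(f"{ch!r} is not in this table subset")
    return out


def emi_hex(text):
    """Step 2: each 7-bit value becomes two hex characters, the higher
    3 bits (0..7) first and the lower 4 bits (0..F) second."""
    return "".join(f"{v >> 4:X}{v & 0xF:X}" for v in to_septets(text))
```

Because every value is below 0x80, this yields the same string as bytes(to_septets(text)).hex().upper(); for example, emi_hex("[1]") is "1B3C311B3E".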

    @malemburg
    Member

    I think what you're after is the encoding used in SMS messages:

    http://en.wikipedia.org/wiki/Short_message_service

    Here's an old discussion about this codec:

    http://mail.python.org/pipermail/python-list/2002-October/167267.html
    http://mail.python.org/pipermail/python-list/2002-October/167271.html

    Note that nowadays, SMSCs and interface software such as Kannel
    typically accept UTF-16 data just fine, so the need for such a codec in
    Python is minimal.

    I agree with Martin, that the stdlib is not the right place for such a
    codec. It's easy to write your own codec package and have your
    application register this package at startup time using codecs.register().
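Registering such a codec from the application's own package takes one codecs.register() call with a search function. A minimal sketch, assuming the hypothetical name "gsm0338" and, again, only a tiny subset of the GSM 03.38 tables: the codecs machinery passes the search function a normalized, lower-cased encoding name and expects a codecs.CodecInfo back, or None for names it does not handle.

```python
import codecs

ESC = 0x1B  # GSM 03.38 escape byte (assumed scheme)

# Hypothetical SUBSET of the tables, for illustration only.
ENCODE = {c: ord(c) for c in " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}
ENCODE_EXT = {"[": 0x3C, "]": 0x3E}
DECODE = {v: k for k, v in ENCODE.items()}
DECODE_EXT = {v: k for k, v in ENCODE_EXT.items()}


def gsm_encode(text, errors="strict"):
    # Only strict error handling is sketched; `errors` is ignored.
    out = bytearray()
    for i, ch in enumerate(text):
        if ch in ENCODE:
            out.append(ENCODE[ch])
        elif ch in ENCODE_EXT:
            out += bytes((ESC, ENCODE_EXT[ch]))
        else:
            raise UnicodeEncodeError("gsm0338", text, i, i + 1, "not in table")
    return bytes(out), len(text)


def gsm_decode(data, errors="strict"):
    data = bytes(data)  # data may arrive as a memoryview
    out, i = [], 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and data[i + 1] in DECODE_EXT:
            out.append(DECODE_EXT[data[i + 1]])
            i += 2
        elif data[i] in DECODE:
            out.append(DECODE[data[i]])
            i += 1
        else:
            raise UnicodeDecodeError("gsm0338", data, i, i + 1, "invalid byte")
    return "".join(out), len(data)


def search(name):
    # Called with the normalized (lower-case) encoding name.
    if name == "gsm0338":
        return codecs.CodecInfo(gsm_encode, gsm_decode, name="gsm0338")
    return None  # let other search functions handle other names


# Typically done once, e.g. when the application's codec package is imported.
codecs.register(search)
```

After registration, ordinary calls such as "[ab]".encode("gsm0338") and bytes.decode("gsm0338") find the codec without any change to the standard library.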

    @pascalbach
    Mannequin Author

    pascalbach mannequin commented Aug 25, 2008

    I already use the codec in my ucplib, so this is not a problem for me.
    I just thought that it might be useful for somebody else, but maybe it
    is too use-case specific.
    If this codec is not of general interest, I think this report can be closed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022