Issue 3649: IA5 Encoding should be in the default encodings

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/47899

classification

Title:	IA5 Encoding should be in the default encodings
Type:	enhancement	Stage:
Components:	Unicode	Versions:	Python 3.1, Python 2.7

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	amaury.forgeotdarc, lemburg, loewis, pascal.bach
Priority:	normal	Keywords:

Created on 2008-08-22 16:26 by pascal.bach, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
ia5.py	pascal.bach, 2008-08-22 16:26	File which implements the Python .encode/.decode methods

Messages (8)
msg71755 - (view)	Author: Pascal Bach (pascal.bach)	Date: 2008-08-22 16:26
This encoding is used in the GSM standard; it is a 7-bit encoding similar to ASCII. The encoding definition is found in: Short Message Service Centre EMI - UCP Interface 4.6 Specification (p. 79) as well as in: [3GPP 23.038] 3GPP TS 23.038 Alphabets and language-specific information. I think this encoding would also be useful for other GSM-specific use cases.
msg71771 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-08-22 19:20
The provided file does not work for "EXTENSION" characters:

>>> import ia5
>>> u"[a]".encode("ia5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ia5.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
TypeError: character mapping must be in range(256)

I doubt this can be achieved with just a charmap. You will have to roll your own incremental stateful decoder. Are you willing to do it?
msg71776 - (view)	Author: Pascal Bach (pascal.bach)	Date: 2008-08-22 20:49
Well, I have seen the problem. I'm willing to do this to improve Python, but I don't know exactly how to do it. I looked at how utf-8 and utf-7 are done, but I didn't fully understand them. Are they based on C code? Is there an example of how this needs to be done? It would be nice if you could give me some help on where to start.
msg71803 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-08-23 09:06
You could start with utf_8.py, and of course replace the calls to codecs.utf_8_encode and codecs.utf_8_decode.
- your "ia5_encode" follows this interface: http://docs.python.org/dev/library/codecs.html#codecs.Codec.encode
- your "ia5_decode" has the signature: def ia5_decode(input, errors='strict', final=False) and returns a tuple (output object, length consumed).
See http://docs.python.org/dev/library/codecs.html#codecs.IncrementalDecoder.decode for an explanation of the final parameter; in particular, if the input is a single 0x1B,
- it will return ('', 0) if final is False
- and raise UnicodeDecodeError("unexpected end of data") if final is True
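
The decoder contract described above could be sketched roughly as follows, in Python 3 syntax. This is not the code from ia5.py: the BASIC and EXTENSION tables are a small illustrative subset chosen here as an assumption (the authoritative tables are in 3GPP TS 23.038), and only strict error handling is implemented, so the errors argument is accepted but ignored.

```python
ESCAPE = 0x1B  # byte that introduces a two-byte extension sequence

# Illustrative subset only (assumption); see 3GPP TS 23.038 for the real tables.
BASIC = {ord(c): c for c in
         "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "}
EXTENSION = {0x3C: "[", 0x3E: "]"}


def ia5_decode(input, errors="strict", final=False):
    """Return (text, bytes consumed); only strict error handling is sketched."""
    data = bytes(input)
    out = []
    pos = 0
    while pos < len(data):
        byte = data[pos]
        if byte == ESCAPE:
            if pos + 1 >= len(data):
                if final:
                    raise UnicodeDecodeError(
                        "ia5", data, pos, len(data), "unexpected end of data")
                break  # leave the lone escape unconsumed for the next call
            table, key, width = EXTENSION, data[pos + 1], 2
        else:
            table, key, width = BASIC, byte, 1
        if key not in table:
            raise UnicodeDecodeError(
                "ia5", data, pos, pos + width, "byte not in this sketch's tables")
        out.append(table[key])
        pos += width
    return "".join(out), pos
```

With this sketch, a lone escape byte is left unconsumed when final is False, so a caller that buffers leftover input can retry once the next byte arrives.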
msg71845 - (view)	Author: Pascal Bach (pascal.bach)	Date: 2008-08-24 17:38
I have looked at utf_8.py and I think I know how to implement the incremental de/encoder. But I don't understand the codecs.register() function. Do I have to provide stateless, stateful and streamwriter at the same time? If I implement IncrementalEncoder and IncrementalDecoder can I just give those two to codecs.register()? Thank you for your help.
msg71887 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2008-08-24 21:52
I don't think this codec should be named IA-5. IA-5 is specified in ITU-T Rec. T.50 (International Alphabet No. 5), recently renamed to "International Reference Alphabet", and it does not specify that the characters 0..31 are printable. Instead, IA5 is identical to ISO 646 (i.e. allowing for national variants), and the International Reference Version of IA5 (e.g. as used in ASN.1 IA5String) is identical to US-ASCII. If GSM uses a modified version of this, it should receive a separate name.

If you were looking at section 2 (Structure of EMI messages), what makes you think that this specification calls the encoding "IA5"? In my copy, it says:

# Alphanumeric characters are encoded as two numeric IA5 characters,
# the higher 3 bits (0..7) first, the lower 4 bits (0..F) thereafter,
# according to the following table.

So it uses IA5 to hex-encode the encoding. To achieve that, one would have to write

text.encode("emi-section-2").encode("hex")

[Notice that the "hex" codec already uses IA-5]

In any case, I don't think this is general enough to deserve inclusion in the standard library. The codec system is designed to be flexible enough to support additional codecs outside the core.
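
The two-step encoding described in this message (character set first, then hex digits) might look like the sketch below in Python 3, where "hex" is a bytes-to-bytes codec and cannot be chained through str.encode. The helper name emi_hex is hypothetical, "ascii" stands in for the "emi-section-2" codec (which does not exist in the standard library), and the uppercase digits are an assumption based on the quoted "0..F".

```python
import binascii


def emi_hex(text, charset="ascii"):
    """Encode text with charset, then write each byte as two hex digits."""
    raw = text.encode(charset)  # step 1: characters -> 7-bit code values
    # step 2: each byte -> two hex characters (uppercase is an assumption)
    return binascii.hexlify(raw).upper().decode("ascii")


print(emi_hex("Hi"))  # 'H' is 0x48 and 'i' is 0x69, so this prints 4869
```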
msg71934 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-08-25 15:10
I think what you're after is the encoding used in SMS messages: http://en.wikipedia.org/wiki/Short_message_service

Here's an old discussion about this codec:
http://mail.python.org/pipermail/python-list/2002-October/167267.html
http://mail.python.org/pipermail/python-list/2002-October/167271.html

Note that nowadays, SMSCs and interface software such as Kannel typically accept UTF-16 data just fine, so the need for such a codec in Python is minimal.

I agree with Martin that the stdlib is not the right place for such a codec. It's easy to write your own codec package and have your application register this package at startup time using codecs.register().
msg71939 - (view)	Author: Pascal Bach (pascal.bach)	Date: 2008-08-25 15:31
I already use the codec in my ucplib, and this is not a problem. I just thought that it might be useful for somebody else. But maybe it is too use-case specific. If this codec is not of general interest, I think this report can be closed.

History
Date	User	Action	Args
2022-04-11 14:56:38	admin	set	github: 47899
2008-08-25 15:31:47	pascal.bach	set	messages: + msg71939
2008-08-25 15:11:00	lemburg	set	status: open -> closed nosy: + lemburg resolution: rejected messages: + msg71934
2008-08-24 21:52:17	loewis	set	nosy: + loewis messages: + msg71887
2008-08-24 19:05:14	pitrou	set	priority: normal versions: + Python 3.1, Python 2.7, - Python 2.5
2008-08-24 17:38:11	pascal.bach	set	messages: + msg71845
2008-08-23 09:06:23	amaury.forgeotdarc	set	messages: + msg71803
2008-08-22 20:49:30	pascal.bach	set	messages: + msg71776
2008-08-22 19:20:25	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg71771
2008-08-22 16:26:46	pascal.bach	create