Message 93059 - Python tracker


Author	lemburg
Recipients	eric.smith, ezio.melotti, ggenellina, lemburg, loewis, mark.dickinson, pitrou
Date	2009-09-24.08:40:13
Message-id	<4ABB30A0.4050803@egenix.com>
In-reply-to	<4ABA5541.6090207@v.loewis.de>

Content

Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
>> The codec currently doesn't look at the base at all - and shouldn't
>> need to:
>>
>> It simply converts input characters that have a decimal digit value
>> associated with them, to the usual ASCII digits in preparation
>> for parsing them using the standard number parsing tools we have in
>> Python.
> 
> Right. And as such, it shouldn't stop with digit 9, but continue into
> digits a, b, c, and so on, as appropriate.

I don't think that's needed. The codec already passes those
through as-is.

>> This is to support number representations using non-ASCII code
>> points for digits (e.g. Japanese or Sanskrit numbers)
> 
> Notice that it also supports bases other than 10:
> 
> 80
> 
> So calling it "decimal" is a misnomer.

Not really: _PyUnicode_ToDecimalDigit() is used for the
conversion and that API explicitly only returns integer
values for code points that map to the digits 0-9 - at
least that's how it was originally written (see the code
in Python 1.6 which makes this explicit).

If it returns values outside that range, that's a bug
and needs to be fixed, since it would cause the codec
to fail. It is designed to work only on digits, not on
arbitrary decimal numbers.
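
To illustrate the digit normalization described above, here is a
minimal Python 3 sketch. It is not the codec's implementation: it
uses unicodedata.decimal() as a stand-in for the C-level
_PyUnicode_ToDecimalDigit(), and the helper name to_ascii_digits
is made up for this example.

```python
import unicodedata

def to_ascii_digits(text):
    """Map characters with a decimal digit value to ASCII 0-9.

    Hypothetical helper; everything else is passed through as-is.
    """
    out = []
    for ch in text:
        # Returns the digit value 0-9, or None if ch has none.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)

# DEVANAGARI DIGIT ONE, TWO, THREE
print(to_ascii_digits("\u0967\u0968\u0969"))
# ARABIC-INDIC DIGIT ONE followed by the ASCII letter "f"
print(to_ascii_digits("\u0661f"))
```

The letter "f" has no decimal digit value, so it is left untouched,
and the result can still be handed to int() with an explicit base.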

>> Also note that we already have a hex codec in Python 2.x
>> which converts between the hex representations of a string
>> and its regular form. This was removed in 3.x for some reason
>> I don't understand (probably just an oversight).
> 
> The hex codec doesn't have to do anything with number conversions;
> nor does it have to do with character encodings. To introduce it was
> a mistake in Python 2.x which has been fixed in 3.x (by removing
> it and other similar "codecs", such as rot13).

That's your particular view of things. It's not mine and never
was the basis of the codec design.

Codecs in Python are open to work on arbitrary types and
it's well possible to have codecs that return the same type
as their input.
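
As a small sketch of a codec whose output has the same type as its
input: in recent Python 3 releases the rot_13 codec is reachable
through codecs.encode() and codecs.decode() and maps str to str.
This reflects later releases, not the state of 3.x at the time of
this message.

```python
import codecs

# rot_13 is a text transform: str in, str out.
encoded = codecs.encode("Hello", "rot_13")
decoded = codecs.decode(encoded, "rot_13")

print(type(encoded).__name__, encoded)
print(decoded)
```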

The hex codec in Python 2.x is a very useful and handy
codec and it's used a lot.

It should be added back again - after all, even by your
restrictive view of codecs in Python only serving as a way to
do character encodings, it is a valid character encoding -
that of Latin-1 code points to a two-byte HEX representation
and vice-versa.

Just like rot-13 and most of the others that were apparently
removed (uu, base64, quoted-printable, zip, bz2).
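
The Latin-1 to hex mapping mentioned above can be sketched in
Python 3 as follows. This uses the binascii module rather than the
Python 2 hex codec itself, so it shows the mapping, not the codec
API.

```python
import binascii

text = "caf\u00e9"                # four Latin-1 code points
raw = text.encode("latin-1")      # one byte per code point
hexed = binascii.hexlify(raw)     # two hex digits per byte

# The reverse direction restores the original text.
restored = binascii.unhexlify(hexed).decode("latin-1")

print(hexed)
print(restored == text)
```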

BTW: I noticed that idna and punycode were not removed...
even though they fall into the same category as the hex
codec.

I guess we should have a discussion about this on python-dev.

History
Date	User	Action	Args
2009-09-24 08:40:15	lemburg	set	recipients: + lemburg, loewis, mark.dickinson, ggenellina, pitrou, eric.smith, ezio.melotti
2009-09-24 08:40:14	lemburg	link	issue6632 messages
2009-09-24 08:40:13	lemburg	create