Message 93004 - Python tracker



Author	lemburg
Recipients	eric.smith, ezio.melotti, ggenellina, lemburg, loewis, mark.dickinson, pitrou
Date	2009-09-22.16:53:06
Message-id	<4AB90123.7050500@egenix.com>
In-reply-to	<4AB7DD06.2040800@v.loewis.de>

Content

Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
>> int()/float() use the decimal codec for numbers - this only supports
>> base-10 numbers. For hex numbers, we'd need a new hex codec (only
>> the encoder part, actually), otherwise, int('a') would start to return
>> 10.
> 
> That's not true. PyUnicode_EncodeDecimal could happily accept hexdigits,
> and int(u'a') would still be rejected. In fact, PyUnicode_EncodeDecimal
> *already* accepts arbitrary Latin-1 characters, whether they represent
> digits or not. I suppose this is to support non-decimal bases, so it
> would only be consequential to widen this to all characters that
> reasonably have the Hex_Digit property (although I'm unsure which ones
> are excluded at the moment).

The codec currently doesn't look at the base at all, and it shouldn't
need to.

It simply converts input characters that have a decimal digit value
associated with them to the usual ASCII digits, in preparation
for parsing them with the standard number parsing tools we have in
Python.

The conversion exists to support number representations that use
non-ASCII code points for digits (e.g. Japanese or Sanskrit numbers):

http://sp.cis.iwate-u.ac.jp/sp/lessonj/doc/numbers.html
http://veda.wikidot.com/sanskrit-numbers

All other Latin-1 characters are passed through as-is, so you
can already use the codec to prepare hex values for parsing,
for example.
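
To illustrate the behaviour described above, here is a rough pure-Python
sketch. It is not the C implementation of PyUnicode_EncodeDecimal; the
helper name to_ascii_digits is made up for this example, and raising
ValueError for characters outside Latin-1 that have no decimal digit
value is an assumption about how such input should be handled.

```python
import unicodedata


def to_ascii_digits(text):
    """Hypothetical helper: map decimal digit characters to ASCII digits.

    Characters with a Unicode decimal digit value become '0'..'9'.
    Other Latin-1 characters pass through unchanged. Anything else is
    rejected (an assumption made for this sketch).
    """
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)
        if value is not None:
            # Character has a decimal digit value: emit the ASCII digit.
            out.append(str(value))
        elif ord(ch) < 256:
            # Latin-1 character without a digit value: pass through as-is.
            out.append(ch)
        else:
            raise ValueError("cannot convert character %r" % ch)
    return "".join(out)


# Devanagari digits one, two, three become plain ASCII digits.
devanagari = to_ascii_digits("\u0967\u0968\u0969")

# Hex letters are passed through, so the result can go to a hex parser.
hex_value = int(to_ascii_digits("0x\u0967f"), 16)
```

Note that the helper never looks at a base: to_ascii_digits("a") returns
"a" unchanged, and int("a") still fails because the base-10 parser
rejects it.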

Also note that we already have a hex codec in Python 2.x
which converts between the hex representation of a string
and its regular form. This was removed in 3.x for some reason
I don't understand (probably just an oversight).
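
For reference, the conversion that codec performs (spelled
"abc".encode("hex") in Python 2.x) can be sketched with the binascii
module. The snippet below is only an illustration of the hex
representation round trip on bytes, not the codec itself.

```python
import binascii

raw = b"abc"

# Regular form -> hex representation (two hex digits per byte).
hex_form = binascii.hexlify(raw)

# Hex representation -> regular form.
round_trip = binascii.unhexlify(hex_form)
```

This conversion is unrelated to parsing hex numbers: it maps every byte
of the input to two hex digits rather than interpreting the input as a
number.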

History
Date	User	Action	Args
2009-09-22 16:53:08	lemburg	set	recipients: + lemburg, loewis, mark.dickinson, ggenellina, pitrou, eric.smith, ezio.melotti
2009-09-22 16:53:06	lemburg	link	issue6632 messages
2009-09-22 16:53:06	lemburg	create