Message 123398 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	mark.dickinson
Recipients	belopolsky, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah
Date	2010-12-04.20:11:53
Message-id	<1291493516.72.0.216748407371.issue10557@psf.upfronthosting.co.za>
In-reply-to

Content

> What do you think about adding number parsers that operate directly on
> Py_UNICODE* strings?

I think that might make some sense.  It's not without difficulties, though.  One issue is that we'd still need the char* -> double operations, partly because PyOS_string_to_double is part of the public API, and partly to continue to support creation of a float from a bytes instance.

The other issue is that, for floats, it's difficult to separate the parser from the base conversion: to be useful, we'd probably end up making the whole of dtoa.c Py_UNICODE-aware.  (One of the values returned by the dtoa.c parser is a pointer to the significant digits in the original input string, so the base-conversion calculation itself needs access to portions of the original string.)
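To make that coupling concrete, here is a toy Python model. It is a sketch, not dtoa.c's actual code. The hypothetical `parse_decimal` returns index positions into the original string instead of copying the digits, and the hypothetical `convert` step must read the significant digits back out of that same string. Exact arithmetic with `fractions.Fraction` stands in for dtoa.c's own correctly rounded conversion, and only ASCII digits are accepted. If the input were a Py_UNICODE buffer rather than a char* buffer, `convert` would have to understand that buffer too.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ParseResult:
    negative: bool
    start: int     # index of the first significant (nonzero) digit in the input
    end: int       # index just past the end of the significand in the input
    exponent: int  # power of ten to apply to the digits in input[start:end]


def parse_decimal(s: str) -> ParseResult:
    """Hypothetical parser: scan an ASCII literal such as '-0012.50e3'
    and record where its significant digits live, without copying them."""
    n, i = len(s), 0
    negative = False
    if i < n and s[i] in "+-":
        negative = s[i] == "-"
        i += 1
    start = None
    seen_point = False
    ndigits = frac_digits = 0
    while i < n:
        ch = s[i]
        if ch == "." and not seen_point:
            seen_point = True
        elif "0" <= ch <= "9":
            ndigits += 1
            if seen_point:
                frac_digits += 1
            if start is None and ch != "0":
                start = i  # leading zeros are skipped
        else:
            break
        i += 1
    if ndigits == 0:
        raise ValueError(f"no digits in {s!r}")
    end = i
    if start is None:  # the significand is zero
        start = end
    exp = 0
    if i < n and s[i] in "eE":
        j = i + 1
        exp_negative = False
        if j < n and s[j] in "+-":
            exp_negative = s[j] == "-"
            j += 1
        if j == n or not all("0" <= c <= "9" for c in s[j:]):
            raise ValueError(f"bad exponent in {s!r}")
        exp = -int(s[j:]) if exp_negative else int(s[j:])
        i = n
    if i != n:
        raise ValueError(f"trailing characters in {s!r}")
    return ParseResult(negative, start, end, exp - frac_digits)


def convert(s: str, r: ParseResult) -> float:
    """Hypothetical base conversion: it needs the ORIGINAL string, because
    the parser only reported where the digits are."""
    digits = s[r.start:r.end].replace(".", "")
    if digits:
        magnitude = Fraction(int(digits)) * Fraction(10) ** r.exponent
    else:
        magnitude = Fraction(0)
    value = float(magnitude)  # Fraction -> float is correctly rounded
    return -value if r.negative else value
```

For `"-0012.50e3"` the parser reports `start=3`, `end=8`, `exponent=1`, and `convert` reads `"12.50"` straight out of the input. No separate copy of the digits ever exists.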

Ideally, for float(string), we'd have a zero-copy setup that operated directly on the Unicode input (read-only);  but I think that achieving that right now is going to be messy, and would involve dtoa.c knowing far more about Unicode than I'd be comfortable with.

N.B. If we didn't have to deal with alternative digits, it *really* would be much simpler.

Perhaps a compromise is available: do a preliminary pass over the Unicode string, and make a copy only if non-European digits are found.
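Here is a minimal Python sketch of that compromise. It assumes a downstream parser that only understands ASCII; `float()` applied to bytes stands in for the char* path. `to_ascii_digits` and `parse_float` are hypothetical names, and the only library call beyond builtins is `unicodedata.decimal`. The preliminary pass is `str.isascii()`. ASCII input is handed through unchanged, and a copy is built only when non-ASCII characters appear.

```python
import unicodedata


def to_ascii_digits(s: str) -> str:
    """Hypothetical helper: return s itself when it is pure ASCII (no copy);
    otherwise return a copy with Unicode decimal digits mapped to ASCII."""
    # Preliminary pass: the common all-ASCII case is returned untouched.
    if s.isascii():
        return s
    out = []
    for ch in s:
        d = None if ch.isascii() else unicodedata.decimal(ch, None)
        # Non-ASCII characters that are not decimal digits are left as-is,
        # so the ASCII-only parser below will reject them.
        out.append(ch if d is None else str(d))
    return "".join(out)


def parse_float(s: str) -> float:
    """Hypothetical float(str) front end built on a char*-only parser."""
    ascii_text = to_ascii_digits(s)
    # float() on bytes stands in for a char*-based parser such as
    # PyOS_string_to_double; encode() raises UnicodeEncodeError (a ValueError
    # subclass) if anything non-ASCII survived the digit mapping.
    return float(ascii_text.encode("ascii"))
```

With this design, `to_ascii_digits("3.25")` returns the very same object, so the common case costs one scan and no allocation. `"\u0661\u0662.\u0665"` (Arabic-Indic digits) is copied to `"12.5"` first. The sketch maps only decimal digits and does not handle other non-ASCII input such as non-ASCII whitespace.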

History
Date	User	Action	Args
2010-12-04 20:11:56	mark.dickinson	set	recipients: + mark.dickinson, lemburg, belopolsky, eric.smith, ezio.melotti, skrah
2010-12-04 20:11:56	mark.dickinson	set	messageid: <1291493516.72.0.216748407371.issue10557@psf.upfronthosting.co.za>
2010-12-04 20:11:53	mark.dickinson	link	issue10557 messages
2010-12-04 20:11:53	mark.dickinson	create