Author mark.dickinson
Recipients belopolsky, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah
Date 2010-12-04.20:11:53
Message-id <1291493516.72.0.216748407371.issue10557@psf.upfronthosting.co.za>
Content
> What do you think about adding number parsers that operate directly on
> Py_UNICODE* strings?

I think that might make some sense, though it's not without difficulties.  One issue is that we'd still need the char* -> double operations: partly because PyOS_string_to_double is part of the public API, and partly to keep supporting creation of a float from a bytes instance.
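A quick Python 3 check of the two kinds of input mentioned above: `float()` accepts `bytes` as well as `str`, so a `Py_UNICODE*`-only parser could not simply replace the `char*` path.  This only exercises public behaviour and assumes nothing about how CPython dispatches internally.

```python
# float() parses text given either as str or as bytes; the bytes case is
# one reason a char*-based parser has to stay available.
from_str = float("  2.5e3 ")     # surrounding whitespace is allowed
from_bytes = float(b"  2.5e3 ")  # bytes input is accepted as well

print(from_str, from_bytes)  # 2500.0 2500.0
```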

The other issue is that, for floats, it's difficult to separate the parser from the base conversion; to be useful, we'd probably end up making the whole of dtoa.c Py_UNICODE-aware.  (One of the values returned by the dtoa.c parser is a pointer to the significant digits in the original input string, so the base-conversion calculation itself needs access to portions of that original string.)
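To make that coupling concrete, here is a toy Python sketch (not dtoa.c; `Parsed`, `parse` and `convert` are hypothetical names) in which the parser returns positions in the original input instead of copying the digits out, so the conversion step has to be handed the original string too.  Exactness comes from `fractions.Fraction` here, standing in for dtoa.c's own bignum arithmetic.

```python
from fractions import Fraction
from typing import NamedTuple

DIGITS = "0123456789"


class Parsed(NamedTuple):
    # Hypothetical parse result, loosely modeled on the description above:
    # positions in the ORIGINAL string rather than a copy of the digits.
    start: int      # index of the first mantissa character
    end: int        # index just past the last mantissa character
    dot: int        # index of '.', or -1 if there is none
    exp: int        # explicit exponent (the e/E part), 0 if absent
    negative: bool


def parse(s: str) -> Parsed:
    """Toy parser for [+-]digits[.digits][(e|E)[+-]digits], ASCII only."""
    i, n = 0, len(s)
    negative = False
    if i < n and s[i] in "+-":
        negative = s[i] == "-"
        i += 1
    start, dot, ndigits = i, -1, 0
    while i < n and (s[i] in DIGITS or (s[i] == "." and dot < 0)):
        if s[i] == ".":
            dot = i
        else:
            ndigits += 1
        i += 1
    if ndigits == 0:
        raise ValueError(f"no digits in {s!r}")
    end, exp = i, 0
    if i < n and s[i] in "eE":
        j, sign = i + 1, 1
        if j < n and s[j] in "+-":
            sign = -1 if s[j] == "-" else 1
            j += 1
        k = j
        while k < n and s[k] in DIGITS:
            k += 1
        if k == j:
            raise ValueError(f"missing exponent digits in {s!r}")
        exp = sign * int(s[j:k])
        i = k
    if i != n:
        raise ValueError(f"trailing characters in {s!r}")
    return Parsed(start, end, dot, exp, negative)


def convert(s: str, p: Parsed) -> float:
    """Base conversion: it must read the digits back out of the original s."""
    mantissa = s[p.start:p.end]
    frac_len = p.end - p.dot - 1 if p.dot >= 0 else 0
    digits = mantissa.replace(".", "")
    # Exact rational value; float() of a Fraction divides numerator by
    # denominator, which Python rounds correctly.
    value = Fraction(int(digits)) * Fraction(10) ** (p.exp - frac_len)
    result = float(value)
    return -result if p.negative else result
```

Because `convert` takes `s` as well as `p`, changing the type of the input string affects both halves, which is the point made above about dtoa.c.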

Ideally, for float(string), we'd have a zero-copy setup that operated directly on the (read-only) unicode input; but I think achieving that right now would be messy, and would involve dtoa.c knowing far more about Unicode than I'd be comfortable with.

N.B. If we didn't have to deal with alternative digits, it *really* would be much simpler.

Perhaps a compromise is available: do a preliminary pass over the Unicode string, and only make a copy if non-European digits (that is, digits other than ASCII 0-9) are discovered.
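A minimal Python sketch of that compromise, under the assumption that "non-European digits" means Unicode decimal digits other than ASCII 0-9, as reported by `unicodedata.decimal`.  `alt_digit` and `normalize_digits` are hypothetical helpers, not CPython functions.

```python
import unicodedata
from typing import Optional


def alt_digit(ch: str) -> Optional[int]:
    """Decimal value of ch if it is a non-ASCII decimal digit, else None."""
    if ch.isascii():
        return None
    return unicodedata.decimal(ch, None)


def normalize_digits(s: str) -> str:
    # Preliminary read-only pass: is there any digit outside ASCII 0-9?
    if not any(alt_digit(ch) is not None for ch in s):
        return s  # common case: no copy, the original string is used as-is
    # Slow path: build a copy with each alternative digit mapped to ASCII.
    # Every other character (including non-ASCII whitespace) is left alone.
    out = []
    for ch in s:
        d = alt_digit(ch)
        out.append(ch if d is None else str(d))
    return "".join(out)
```

The all-ASCII case costs one scan and no allocation; only strings that actually contain alternative digits pay for a copy, after which an ASCII-only parser can take over.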