classification
Title: Use the Grisu algorithms to convert floats to strings
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Bart Robinson, Florian.Loitsch, amaury.forgeotdarc, djc, eric.smith, mark.dickinson, michael.foord, rhettinger, vstinner
Priority: low Keywords:

Created on 2011-06-30 08:19 by amaury.forgeotdarc, last changed 2017-02-20 14:00 by vstinner. This issue is now closed.

Messages (9)
msg139466 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-06-30 08:19
Reported by Michael Ford in msg139402:
http://www.serpentine.com/blog/2011/06/29/here-be-dragons-advances-in-problems-you-didnt-even-know-you-had/
describes a new algorithm for float<->str conversions.

It would be interesting to see if it is better/faster than the current dtoa.
msg139469 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011-06-30 10:01
I see the problems as:

1. Given Python's other overhead, we'd need to profile to show an improvement in the speed of this conversion would make a noticeable impact on any import workload.

2. If we want to keep the shortest-float-repr property for all possible doubles, we'd need to use Grisu3 but still keep our existing code for the fallback cases. This is a big increase in the complexity of an already complex piece of code.

I'm not saying don't switch to Grisu2 or use Grisu3 with the fallback to existing code: maybe the speed improvements are worth it, maybe we can say we we can live with 99.5% shortest repr coverage, or maybe the complexity is worth it. I just want to record the issues here.
msg139472 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-06-30 12:10
FWIW, I don't think this is worth further disruption.  We already have an excellent set of float repr routines.  There is no problem to be solved here.
msg139549 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-07-01 07:13
[Amaury]
"It would be interesting to see if it is better/faster than the current dtoa."

I agree that it would be interesting to compare.

[Eric]
"maybe we can say we we can live with 99.5% shortest repr coverage"

Please no!  As you say, we'd still need the fallback code.

For me, a precondition for actually switching (rather than just investigating speed differences) would be that the Grisu3 code has excellent tests.  The dtoa.c code is still pretty hairy in places but is now fairly well tested and inspected;  I'd hesitate to replace that with something less well exercised.
msg139552 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-07-01 07:55
I concur with Mark.  This is a very challenging area and no change would be warranted without keeping the current accuracy guarantees, definitive speed improvement, extensive tests, and consideration of the impact on users of having float reprs change yet again.

This was a nice tracker item for improving our awareness of alternate algorithms, but it isn't a credible feature request without the qualities listed above.  

I'll mark this as closed/rejected.  Feel free to reopen if we get some solid evidence that python would be significantly improved enough to warrant the disruption for users and the investment of developer time.
msg139555 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-07-01 08:30
Agreed. If some volunteer wants to work on it, I suggest to make an extension module first, so that everybody can try and compare with the current routines.
msg144217 - (view) Author: Florian Loitsch (Florian.Loitsch) Date: 2011-09-17 19:28
FYI: the double-conversion library at http://code.google.com/p/double-conversion already contains code for the fallback case. It would not be necessary to keep Python's existing code just for the 0.5%.
The library is now used by Firefox, Chrome, and Webkit, and should be well tested.
It's biggest deficiency (compared to Gay's dtoa.c) is its specialization to IEEE doubles. Floats or long doubles are not supported. If Python needs to support these types I recommend not to switch.
msg144246 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-09-18 17:16
> It's biggest deficiency (compared to Gay's dtoa.c) is its specialization > to IEEE doubles.

We're only using the portion of Gay's code (with some significant modifications at this point) that applies to the IEEE 754 binary64 format, so I don't think this is a concern.
msg288171 - (view) Author: Bart Robinson (Bart Robinson) Date: 2017-02-20 00:25
Six years later, I have accepted Amaury's challenge and created an extension module that uses the double-conversion library to generate repr's for floats: https://pypi.python.org/pypi/frepr

My rudimentary performance testing gives something like 8X speedup compared to the standard float_repr().

Scrutiny and suggestions welcome...
History
Date User Action Args
2017-02-20 14:00:07vstinnersettype: enhancement -> performance
2017-02-20 00:25:29Bart Robinsonsetnosy: + Bart Robinson
messages: + msg288171
2011-09-18 17:16:03mark.dickinsonsetmessages: + msg144246
2011-09-17 19:28:25Florian.Loitschsetnosy: + Florian.Loitsch
messages: + msg144217
2011-07-01 08:30:58amaury.forgeotdarcsetmessages: + msg139555
2011-07-01 07:55:39rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg139552
2011-07-01 07:13:20mark.dickinsonsetmessages: + msg139549
2011-06-30 13:56:28vstinnersetnosy: + vstinner
2011-06-30 12:10:01rhettingersetpriority: normal -> low
assignee: rhettinger
messages: + msg139472

versions: + Python 3.3
2011-06-30 10:01:17eric.smithsetmessages: + msg139469
2011-06-30 09:50:52eric.smithsetnosy: + eric.smith
2011-06-30 08:34:30djcsetnosy: + djc
2011-06-30 08:19:16amaury.forgeotdarccreate