Issue 41310: micro-optimization: increase our float parsing speed by Nx

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/85482

classification

Title:	micro-optimization: increase our float parsing speed by Nx
Type:	enhancement	Stage:	needs patch
Components:	Interpreter Core	Versions:	Python 3.10

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	eric.smith, gregory.p.smith, mark.dickinson, pitrou, rhettinger, serhiy.storchaka, tim.peters
Priority:	low	Keywords:

Created on 2020-07-15 22:21 by gregory.p.smith, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg373730 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2020-07-15 22:21
See https://lemire.me/blog/2020/03/10/fast-float-parsing-in-practice/ for inspiration and a reference (possibly a library to use, but if not the techniques still apply). Primarily usable when creating the float objects from the string data as is common in python code and particularly common in JSON serialized data.
msg373743 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2020-07-16 05:57
CPython is written on C, not C++, so we cannot use std::from_chars. Note also that to parse a number in JSON we need first to scan the PyUnicode object character-by-character using PyUnicode_READ() which is slower than just reading a byte from a memory, then convert a substring to a nul-terminated bytes string, and finally use the conversion function which creates a new PyFloat or PyLong objects. The overhead of scanning a string and memory management may be larger than the parsing time itself.
msg373760 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2020-07-16 18:50
Is there a description of the algorithms or ideas used anywhere? The blog post you link to has no details, and I don't see any useful descriptions in the GitHub README or the source; trawling through the source to figure out what the key ideas are is not ideal. I don't find the tests particularly convincing, either. I think this is too immature to consider for CPython, right now. Maybe it would make sense to revisit some day in the future if/when the library is more mature.
msg373769 - (view)	Author: Tim Peters (tim.peters) *	Date: 2020-07-16 20:38
Gregory, care to take their code and time it against Python? I'm not inclined to: reading the comments in the code, they're trying "fast paths" already described in papers by Clinger and - later - by Gay. When those fast paths don't apply, they fall back on the system `strtod`. But Python's current implementation builds on Gay's fastest code (and never falls back to the system). It's possible they "improved" on Gay by building relatively giant tables of static data.
msg373775 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2020-07-16 22:00
I don't think we'd want to fall back to strtod, but rather to the code we already use (Gay's). So this would increase our maintenance burden. I'm also not convinced that we actually spend a lot of time in this code, but of course profiling would be needed to make sure.
msg378969 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2020-10-19 17:27
FTR, no formal description of the algorithm seems to have been published, but the source code is amply commented: https://github.com/lemire/fast_double_parser/blob/master/include/fast_double_parser.h

History
Date	User	Action	Args
2022-04-11 14:59:33	admin	set	github: 85482
2020-10-19 17:27:20	pitrou	set	priority: normal -> low
2020-10-19 17:27:11	pitrou	set	nosy: + pitrou messages: + msg378969
2020-10-19 17:20:42	pitrou	link	issue42081 superseder
2020-07-16 22:00:57	eric.smith	set	messages: + msg373775
2020-07-16 20:38:02	tim.peters	set	messages: + msg373769
2020-07-16 18:50:06	mark.dickinson	set	messages: + msg373760
2020-07-16 05:57:27	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg373743
2020-07-16 05:16:39	rhettinger	set	nosy: + tim.peters, rhettinger
2020-07-15 23:09:53	eric.smith	set	nosy: + mark.dickinson, eric.smith
2020-07-15 22:21:55	gregory.p.smith	create