
Author: rhettinger
Recipients: mark.dickinson, rhettinger, serhiy.storchaka, tim.peters
Date: 2018-08-10.21:51:16
Message-id: <1533937876.22.0.56676864532.issue34376@psf.upfronthosting.co.za>
Content
Here's a little more performance data that suggests where possible speed optimizations may lie (I was mostly going for accuracy improvements in this patch).

On my 2.6 GHz (3.6 GHz burst) Haswell, the hypot() function for n arguments takes about 11*n + 60 ns per call.
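
For anyone wanting to reproduce the fit on their own machine, a minimal timeit sketch along these lines will do (the argument values and loop count are arbitrary choices of mine, and the lambda plus tuple unpacking inflate the constant term somewhat, so treat the intercept as an upper bound):

    from math import hypot
    from timeit import timeit

    # Rough per-call timings for increasing argument counts; fitting a
    # line to these points gives the fixed and per-argument costs.
    for n in range(1, 9):
        args = tuple(float(i + 1) for i in range(n))
        ns_per_call = timeit(lambda: hypot(*args), number=1_000_000) * 1e3
        print(f"n={n}: {ns_per_call:.1f} ns per call")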

The 60 ns fixed portion goes to function call overhead, manipulating native Python objects scattered all over memory, Inf/NaN handling, and the external calls to _PyArg_ParseStack(), PyObject_Malloc(), PyFloat_AsDouble(), PyObject_Free(), and PyFloat_FromDouble().

The inlined summation routine accesses native C doubles at consecutive memory addresses.  Per Agner Fog's instruction timing tables, DIVSD takes 10-13 cycles (about 3 ns), MULSD takes 5 cycles (about 2 ns), and ADDSD/SUBSD each have a 3-cycle latency (another 1 ns each).  That accounts for most of the 11 ns per-argument variable portion of the running time.
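
For reference, here's a rough pure-Python model of a scaled summation loop of that shape (not the actual C code in the patch), just to show where the one divide, one multiply, and one add per argument come from:

    import math

    def hypot_model(*coords):
        # Illustrative only: scale by the largest magnitude so the squares
        # cannot overflow or underflow, then do one divide, one multiply,
        # and one add per argument before the final sqrt.
        absolutes = [abs(x) for x in coords]
        scale = max(absolutes, default=0.0)
        if scale == 0.0:
            return 0.0
        total = 0.0
        for x in absolutes:
            ratio = x / scale        # the DIVSD per argument
            total += ratio * ratio   # the MULSD and ADDSD per argument
        return scale * math.sqrt(total)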