Author vstinner
Recipients Yury.Selivanov, casevh, josh.r, lemburg, mark.dickinson, pitrou, rhettinger, serhiy.storchaka, skrah, vstinner, yselivanov, zbyrne
Date 2016-02-05.22:37:27
Message-id <1454711847.9.0.736666080255.issue21955@psf.upfronthosting.co.za>
In-reply-to
Content
Serhiy Storchaka: "My patches were just samples. I'm glad that Yury incorporated the main idea and that this helps."

Oh, if even Serhiy prefers Yury's patches, I should read them again :-)

--

I read fastint5.patch one more time and I finally understood the following macros:

+#define NB_SLOT(slot) offsetof(PyNumberMethods, slot)
+#define NB_BINOP(nb_methods, slot) \
+    (*(binaryfunc*)(& ((char*)nb_methods)[NB_SLOT(slot)]))
+#define PY_LONG_CALL_BINOP(slot, left, right) \
+    (NB_BINOP(PyLong_Type.tp_as_number, slot))(left, right)
+#define PY_FLOAT_CALL_BINOP(slot, left, right) \
+    (NB_BINOP(PyFloat_Type.tp_as_number, slot))(left, right)

In short, with these macros a+b calls long_add(a, b). On first reading, I thought that they cast the objects to C long or C double (don't ask me why).
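To convince myself of how the lookup works, here is a minimal standalone sketch of the same offsetof() trick. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in and not part of the patch: the table holds functions on plain C long instead of PyObject*, and I pass the address of a static table where the patch uses PyLong_Type.tp_as_number.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

/* Hypothetical stand-ins for binaryfunc and PyNumberMethods. */
typedef long (*demo_binaryfunc)(long, long);

typedef struct {
    demo_binaryfunc nb_add;
    demo_binaryfunc nb_subtract;
    demo_binaryfunc nb_multiply;
} DemoNumberMethods;

static long demo_long_add(long a, long b) { return a + b; }
static long demo_long_sub(long a, long b) { return a - b; }
static long demo_long_mul(long a, long b) { return a * b; }

static DemoNumberMethods demo_long_as_number = {
    demo_long_add, demo_long_sub, demo_long_mul
};

/* Same shape as NB_SLOT / NB_BINOP / PY_LONG_CALL_BINOP in the patch:
   take the byte offset of the slot, index into the table as a char
   array, and reinterpret that address as a function pointer slot. */
#define DEMO_NB_SLOT(slot) offsetof(DemoNumberMethods, slot)
#define DEMO_NB_BINOP(nb_methods, slot) \
    (*(demo_binaryfunc *)(&((char *)(nb_methods))[DEMO_NB_SLOT(slot)]))
#define DEMO_LONG_CALL_BINOP(slot, left, right) \
    (DEMO_NB_BINOP(&demo_long_as_number, slot))(left, right)

long demo_add_via_slot(long a, long b)
{
    /* The slot is read from the table at run time. */
    assert(DEMO_NB_BINOP(&demo_long_as_number, nb_add) != NULL);
    return DEMO_LONG_CALL_BINOP(nb_add, a, b);
}

long demo_mul_via_slot(long a, long b)
{
    assert(DEMO_NB_BINOP(&demo_long_as_number, nb_multiply) != NULL);
    return DEMO_LONG_CALL_BINOP(nb_multiply, a, b);
}
```

So nothing is cast to a C number: the macro only fetches the function pointer stored in the slot and calls it with the two operands.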


I see a difference between fastint5.patch and fastintfloat_alt.patch: fastint5.patch resolves the address of long_add() at runtime, whereas fastintfloat_alt.patch gets a direct pointer to _PyLong_Add() at compile time. I expected a subtle speedup, but I'm unable to see it in benchmarks (again, both patches have the same speed).

The float path is simpler in fastint5.patch because it uses the same code whether right is a float or a long, but it adds more checks on the slow path. Neither patch seems to have a real impact on the slow path. Is it worth changing the second if to PyFloat_CheckExact() and then checking the type of right in the if body, to avoid the other checks on the slow path?

(C checks look very cheap, so I think that I already answered my own question :-))
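For reference, the nested layout I have in mind looks roughly like the sketch below. It is not taken from either patch: DemoObj, DemoKind and demo_dispatch() are hypothetical, and the kind comparisons stand in for PyLong_CheckExact() and PyFloat_CheckExact(). In this sketch a long left operand with a float right operand simply falls through to the slow path; I am not claiming that either patch behaves that way.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

/* Hypothetical tagged object; the tag stands in for the exact type. */
typedef enum { DEMO_KIND_LONG, DEMO_KIND_FLOAT, DEMO_KIND_OTHER } DemoKind;
typedef struct { DemoKind kind; } DemoObj;

typedef enum { DEMO_PATH_LONG, DEMO_PATH_FLOAT, DEMO_PATH_SLOW } DemoPath;

DemoPath demo_dispatch(const DemoObj *left, const DemoObj *right)
{
    assert(left != NULL && right != NULL);

    /* First if: both operands are exact longs. */
    if (left->kind == DEMO_KIND_LONG && right->kind == DEMO_KIND_LONG) {
        return DEMO_PATH_LONG;
    }
    /* Second if: only test left here, so an operand that is neither
       long nor float pays a single extra comparison. */
    if (left->kind == DEMO_KIND_FLOAT) {
        /* The type of right is checked inside the body. */
        if (right->kind == DEMO_KIND_FLOAT || right->kind == DEMO_KIND_LONG) {
            return DEMO_PATH_FLOAT;
        }
    }
    /* Everything else goes to the generic (slow) path. */
    return DEMO_PATH_SLOW;
}
```

With this layout, an operand pair that ends up on the slow path costs at most two comparisons on left before falling through.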

--

fastint5.patch optimizes a+b, a-b, a*b, a/b and a//b. Why not other operators? Here is the list of operators from my constant folding optimization in fatoptimizer:

* int, float: a+b, a-b, a*b, a/b, +x, -x, ~x, a//b, a%b, a**b
* int only: a<<b, a>>b, a&b, a|b, a^b

If we optimize a//b, I suggest also optimizing a%b to be consistent. For integers, a**b, a<<b and a>>b would make sense too. Coming from the C language, I would prefer a<<b and a>>b over a*2**k or a//2**k, since I expect better performance.

For floats, -x and +x may be common, but less so than a+b, a-b, a*b and a/b.

Well, what I'm trying to say: if we choose the fastintfloat_alt.patch design, we will have to expose a lot of new C functions in headers and duplicate a lot of code.

To support more than 4 operators, we need a macro.

If we use a macro, it's cheap (in terms of code maintenance) to use it for most or even all operators.
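To illustrate why a macro keeps the maintenance cost low, here is a rough sketch using an "X-macro" list. This is my own illustration, not code from any of the patches: all DEMO_ and demo_ names are hypothetical, the functions work on plain C long, and overflow is ignored, which a real fast path on Python ints could not do.

```c
#include <assert.h>

/* One line per operator; adding an operator means adding one line. */
#define DEMO_INT_BINOPS(X) \
    X(ADD, +)              \
    X(SUB, -)              \
    X(MUL, *)              \
    X(AND, &)              \
    X(OR,  |)              \
    X(XOR, ^)

/* Generate one small function per operator. */
#define DEMO_DEFINE_FAST(name, op) \
    static long demo_fast_##name(long a, long b) { return a op b; }
DEMO_INT_BINOPS(DEMO_DEFINE_FAST)

/* Generate the matching enum of operator identifiers. */
#define DEMO_ENUM_ENTRY(name, op) DEMO_OP_##name,
typedef enum { DEMO_INT_BINOPS(DEMO_ENUM_ENTRY) DEMO_OP_COUNT } DemoOp;

/* Generate the dispatch table in the same order as the enum. */
#define DEMO_TABLE_ENTRY(name, op) demo_fast_##name,
static long (*const demo_fast_table[])(long, long) = {
    DEMO_INT_BINOPS(DEMO_TABLE_ENTRY)
};

long demo_fast_call(DemoOp op, long a, long b)
{
    assert((int)op >= 0 && (int)op < (int)DEMO_OP_COUNT);
    return demo_fast_table[op](a, b);
}
```

For example, demo_fast_call(DEMO_OP_XOR, 6, 3) calls the generated demo_fast_XOR(). The point is only that the per-operator code is written once, in the macro, instead of being duplicated by hand for each operator.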

--

> But I don't quite understand why it adds any gain. Is this just due to overhead of calling PyNumber_Add?

Hum, that's a good question.


> Then we should test with other compilers and with the LTO option.

There are projects (I don't recall the number), but I would prefer to handle LTO separately. Python supports platforms and compilers that don't implement LTO.


> fastint5.patch adds an overhead for type checks and increases the size of ceval loop. What is outweigh this overhead?

I stopped trying to guess the speedup just by reading the code or a patch. I only trust benchmarks :-)

Advice: don't trust yourself! Only trust benchmarks.