Message 160224 - Python tracker


Author	sbrunthaler
Recipients	dmalcolm, eric.snow, sbrunthaler, skrah
Date	2012-05-08.20:58:53
Message-id	<CA+j1x0k9nb=qYoi9_uaB7taeFLUVyBPJP587V64vM-WgpiNRfA@mail.gmail.com>
In-reply-to	<1336508322.79.0.759981731607.issue14757@psf.upfronthosting.co.za>

Content

> This looks quite impressive, so sorry for immediately jumping in with
> criticism. -- I've benchmarked the things I worked on, and I can't see
> any speedups but some significant slowdowns. This is on 64-bit Linux
> with a Core 2 Duo, both versions compiled with just `./configure && make`:

Well, no problem -- I don't actually consider it criticism at all.
The build is correct. You can verify that the interpreter is working as
intended by running the test suite: some tests that depend on specific
bytecodes should fail (test_dis and test_importlib, AFAIR).

I don't have a Core 2 Duo available for testing, though.

> Modules/_decimal/tests/bench.py:
> --------------------------------
>
> Not much change for floats and decimal.py, 8-10% slowdown for _decimal!

This result is not unexpected, as I have no inline cached versions of
functions using this module. The derivatives I generate work for Long,
Float and Complex numbers (plus Unicode strings and some others.) If
there is a clear need, of course I can look into that and add these
derivatives (as I said, there are still some 40+ opcodes unused.)

> Memoryview:
> -----------
>
> ./python -m timeit -n 10000000 -s "x = memoryview(bytearray(b'x'*10000))" "x[:100]"
>
> 17% (!) slowdown.

Hm, the 17% slowdown seems strange to me. However, I don't expect to
see any speedups in this case, as there is no repeated execution
within the benchmark code that could leverage type feedback via inline
caching.
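
To illustrate why a one-shot expression gains little, here is a toy model of
inline caching with type feedback. It is only a sketch and not the patch's
implementation: the class `AddSite`, its counters and the helper
`sum_with_site` are hypothetical names, and a real interpreter would rewrite
bytecode instructions in C rather than dispatch through a Python dictionary.

```python
import operator


class AddSite:
    """Toy model of one '+' instruction with a one-entry inline cache."""

    # Hypothetical type-specialized handlers, standing in for the
    # optimized instruction derivatives.
    SPECIALIZED = {
        (int, int): int.__add__,
        (float, float): float.__add__,
    }

    def __init__(self):
        self.cached_types = None    # operand types seen on the last miss
        self.cached_handler = None  # handler chosen for those types
        self.hits = 0
        self.misses = 0

    def execute(self, left, right):
        types = (type(left), type(right))
        if types == self.cached_types:
            # Fast path: feedback from an earlier execution pays off.
            self.hits += 1
            return self.cached_handler(left, right)
        # Slow path: pick a handler, then remember the operand types.
        self.misses += 1
        self.cached_types = types
        self.cached_handler = self.SPECIALIZED.get(types, operator.add)
        return self.cached_handler(left, right)


def sum_with_site(values):
    """Add up values through a single AddSite, as a loop body would."""
    site = AddSite()
    total = 0
    for v in values:
        total = site.execute(total, v)
    return total, site
```

A site that is executed once only ever takes the slow path, while a loop that
runs the same site many times hits the cache on every iteration but the first.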

You should see most speedups when dealing with for-loops (as FOR_ITER
has optimized derivatives), if-statements (COMPARE_OP has optimized
derivatives), and mathematical code. In addition there are some
optimizations for frequently executed function calls, unpacked
sequences, etc. Note: "frequent" here means frequent in the code I
encountered; this probably needs adjusting for different use cases.
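
To check whether a piece of code contains the instructions named above, the
standard `dis` module can list the opcodes of a function. This is a minimal
sketch against a stock CPython; `count_below` and `opnames` are made-up names
for illustration, and the exact instruction set varies between Python
versions.

```python
import dis


def count_below(n, limit):
    """Count the integers in range(n) that are smaller than limit."""
    total = 0
    for i in range(n):   # on CPython this loop is driven by FOR_ITER
        if i < limit:    # on CPython this comparison uses COMPARE_OP
            total += 1
    return total


def opnames(func):
    """Return the set of opcode names that appear in func's bytecode."""
    return {instr.opname for instr in dis.get_instructions(func)}
```

Calling `sorted(opnames(count_below))` gives the opcode names to look through;
code whose hot path lacks these instructions is less likely to benefit.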

> Did I perhaps miss some option to turn on the optimizations?

Does not seem to be the case, but if you could verify this by running the
regression tests, we could easily rule out this scenario. You could
verify speedups, too, on the Computer Language Benchmarks Game benchmarks,
primarily binarytrees, mandelbrot, nbody and spectralnorm, just to see
how much you *should* gain on your machine. Testing methodology could
also make a difference. I use the following:
- Linux 3.0.0-17 (Ubuntu)
- gcc version 4.6.1
- nice -n -20 to minimize scheduler interference
- 30 repetitions per benchmark
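
The repetition count in the list above can be reproduced with the standard
`timeit` module. The following is a sketch, not the script used for the
numbers in this thread: the helper name `run_benchmark`, the choice to report
minimum and median, and the default `number` are assumptions. Process
priority (`nice -n -20`) is set when launching the interpreter and is not
handled here.

```python
import statistics
import timeit


def run_benchmark(stmt, setup="pass", repetitions=30, number=1000):
    """Time stmt `repetitions` times; each run executes it `number` times."""
    samples = timeit.repeat(stmt, setup=setup, repeat=repetitions, number=number)
    return {
        "samples": samples,                     # one wall-clock total per run
        "min": min(samples),                    # least disturbed run
        "median": statistics.median(samples),   # typical run
    }


if __name__ == "__main__":
    # The memoryview slice quoted earlier in this message.
    result = run_benchmark(
        "x[:100]",
        setup="x = memoryview(bytearray(b'x' * 10000))",
    )
    print(f"min {result['min']:.6f}s  median {result['median']:.6f}s "
          f"over {len(result['samples'])} runs")
```

Running the same script under both interpreter builds and comparing the two
minimum values gives a rough relative speedup or slowdown.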

I hope that helps/explains,
regards,
--stefan

History
Date	User	Action	Args
2012-05-08 20:58:54	sbrunthaler	set	recipients: + sbrunthaler, skrah, dmalcolm, eric.snow
2012-05-08 20:58:54	sbrunthaler	link	issue14757 messages
2012-05-08 20:58:53	sbrunthaler	create