Message 259607 - Python tracker


Author	pitrou
Recipients	Yury.Selivanov, casevh, josh.r, lemburg, mark.dickinson, pitrou, rhettinger, serhiy.storchaka, skrah, vstinner, yselivanov, zbyrne
Date	2016-02-05.01:05:59
Message-id	<56B3F572.7070003@free.fr>
In-reply-to	<1454630977.91.0.881600344597.issue21955@psf.upfronthosting.co.za>

Content

Hi Yury,

> I'm not sure how to respond to that. Every performance aspect *is*
> important.

Performance is not a religion (not any more than security or any other
matter).  It is not helpful to brandish results on benchmarks which have
no relevance to real-world applications.

It helps to define what we should achieve and why we want to achieve it.
Once you start asking "why", the prospect of speeding up FP
computations in the eval loop starts becoming dubious.

> numpy isn't shipped with CPython, not everyone uses it.

That's not the point. *People doing FP-heavy computations* should use
Numpy or any of the packages that can make FP-heavy computations faster
(Numba, Cython, Pythran, etc.).

You should use the right tool for the job.  There is no need to
micro-optimize a hammer for driving screws when you could use a
screwdriver instead.  Lists or tuples of Python float objects are an
awful representation for what should be vectorized native data.  They
eat more memory in addition to being massively slower (they will also be
slower to serialize from/to disk, etc.).
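
As a rough illustration of the memory point (a sketch, not a benchmark):
the snippet below compares a list of Python float objects with the same
values packed as native C doubles.  The stdlib `array` module stands in
for a Numpy array here only to keep the example dependency-free; the
names `values` and `packed` are made up for the illustration, and the
byte counts assume CPython's `sys.getsizeof` accounting.

```python
import sys
from array import array

n = 100_000

# A list of distinct Python float objects: the list holds one pointer per
# element, and every element is a separate heap-allocated object.
values = [float(i) + 0.5 for i in range(n)]

# The same numbers stored as contiguous native doubles (type code "d").
packed = array("d", values)

# Size of the list itself plus the size of each float object it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Size of the array object, including its buffer of native doubles.
packed_bytes = sys.getsizeof(packed)

# Size of the payload alone: one native double per element.
raw_bytes = packed.itemsize * n

print(f"list of floats: {list_bytes} bytes")
print(f"packed doubles: {packed_bytes} bytes")
print(f"ratio: {list_bytes / packed_bytes:.1f}x")
```

The exact ratio depends on the platform and interpreter build, so no
figure is claimed here; the point is only that per-object overhead is
paid once per element in the list and not at all in the packed buffer.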

"Not using" Numpy when you would benefit from it is silly.
Numpy is not only massively faster on array-wide tasks, it also makes it
easier to write high-level, readable, reusable code instead of writing
loops and iterating by hand, because it has been designed explicitly
for such use cases (which the Python core was not, despite the existence
of the colorsys module ;-)).  It also gives you access to a large
ecosystem of third-party modules implementing various domain-specific
operations, actively maintained by experts in the field.

Really, the mindset of "people shouldn't need to use Numpy, they can do
FP computations in the interpreter loop" is counter-productive.  I
understand that it's seductive to think that Python core should stand on
its own, but it's also a dangerous fallacy.

You *should* advocate that people use Numpy for FP computations.  It's an
excellent library, and it's currently a major selling point for Python.
Anyone doing FP-heavy computations with Python should learn to use
Numpy, even if they only use it from time to time.  Downplaying its
importance, and pretending core Python is sufficient, is not helpful.

> It also harms Python 3 adoption a little bit, since many benchmarks
> are still slower. Some of them are FP related.

The Python 3 migration is happening already. There is no need to worry
about it... Even the diehard 3.x haters have stopped talking of
releasing a 2.8 ;-)

> In any case, I think that if we can optimize something - we should.

That's not true. Some optimizations add maintenance overhead for no real
benefit. Some may even hinder performance as they add conditional
branches in a critical path (increasing the load on the CPU's branch
predictors and making them potentially less efficient).

Some optimizations are obviously good, like the method call optimization
which caters to real-world use cases (and, by the way, kudos for that...
you are doing much better than all previous attempts ;-)). But some are
solutions waiting for a problem to solve.

History
Date	User	Action	Args
2016-02-05 01:06:01	pitrou	set	recipients: + pitrou, lemburg, rhettinger, mark.dickinson, vstinner, casevh, skrah, Yury.Selivanov, serhiy.storchaka, yselivanov, josh.r, zbyrne
2016-02-05 01:06:01	pitrou	link	issue21955 messages
2016-02-05 01:05:59	pitrou	create