Author vstinner
Recipients gvanrossum, mark.dickinson, methane, ned.deily, skrah, vstinner, yselivanov
Date 2018-01-26.23:11:00
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
Since the root of the discussion is a performance regression, let me take a look, since I also care about not regressing in terms of performance. We (CPython core developers, as a team) spent a lot of time optimizing CPython to make benchmarks like telco faster at each release. The good news is that Python 3.7 *is* faster than Python 3.6 on telco. If I recall correctly, it's not because of recent optimizations in the decimal module, but because of more general changes like the CALL_METHOD optimization!

Python master vs 3.6 (normalized on 3.6):

Graph of telco performance on master from April 2014 to January 2017:

20.2 ms => 14.1 ms, well done!

If you are curious about the reasons why Python became faster, see my documentation:

Or even my talk at PyCon 2017:

Sorry, I moved off topic. Let's get back to measuring the performance impact of this issue...


I rewrote the benchmark using my perf module to get CPU pinning (on my isolated CPUs), automatic calibration of the number of loops, skipping the first "warmup" value, spawning 20 processes, computing the mean and standard deviation, etc. => see attached
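The attached script isn't reproduced here, but the general shape of such a microbenchmark can be sketched with the stdlib timeit module (perf adds process spawning, CPU pinning, loop calibration and statistics on top of the same idea); the measured statement below is illustrative, not the exact one in the attachment:

```python
import timeit
from decimal import Decimal

def measured():
    # Illustrative decimal workload; perf's bench_func() would run this
    # in 20 spawned processes and report mean +- std dev.
    return Decimal("1.1") + Decimal("2.2")

# timeit only reports the total wall time for `number` calls,
# without calibration or statistics.
elapsed = timeit.timeit(measured, number=100_000)
print(f"{elapsed / 100_000 * 1e9:.0f} ns per call")
```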

Results on my laptop with 2 physical cores isolated for best benchmark stability (*):

vstinner@apu$ ./python -m perf compare_to master.json pr5278.json 
Mean +- std dev: [master] 1.86 us +- 0.03 us -> [pr5278] 2.27 us +- 0.04 us: 1.22x slower (+22%)

Note: master is the commit 29a7df78277447cf6b898dfa0b1b42f8da7abc0c, and I rebased PR 5278 on top of this commit.


This is obviously the *worst* case: a *micro* benchmark that uses local contexts and modifies the local context. As I understand it, this microbenchmark basically measures the overhead of contextvars when modifying a context.

The question here is whether the bottleneck of applications using decimal is the code modifying the context or the code computing numbers (a+b, a*b, a/b, etc.).
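To make that distinction concrete, here is a hedged sketch of the two kinds of code (the function names are mine, purely illustrative):

```python
from decimal import Decimal, localcontext

def compute_only(a, b):
    # Pure arithmetic: only *reads* the current context (precision,
    # rounding mode), which is the cheap path.
    return a + b - a * b

def modify_context(a, b):
    # Enters and modifies a local context: this is the path whose
    # overhead the contextvars change increases.
    with localcontext() as ctx:
        ctx.prec = 10
        return a + b - a * b
```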

Except for a few small projects, I rarely use decimal, so I'm unable to judge that.

But just to add my 2 cents: I have never used "with localcontext()"; I don't see the point of this tool in my short applications. I prefer to modify the current context (getcontext()) directly, and to modify it only *once*, at startup: for example, set the rounding mode and the precision, and that's all.
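For what it's worth, that once-at-startup style looks roughly like this (the precision and rounding values are illustrative):

```python
from decimal import Decimal, getcontext, ROUND_HALF_UP

# Configure the current thread's context once, at startup.
ctx = getcontext()
ctx.prec = 9                  # 9 significant digits
ctx.rounding = ROUND_HALF_UP

# All later decimal operations in this thread pick up these
# settings implicitly, with no per-operation context switching.
result = Decimal(1) / Decimal(7)
print(result)  # rounded to 9 significant digits
```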


The Python benchmark suite does have a benchmark dedicated to the decimal module:

I ran this benchmark on PR 5278:

vstinner@apu$ ./python -m perf compare_to telco_master.json telco_pr5278.json 
Benchmark hidden because not significant (1): telco

... not significant. Honestly, I'm not surprised at all:

* the telco benchmark doesn't modify the context in the hot code, only *outside* the measured part
* telco likely spends most of its runtime computing numbers (sum += x; d = d.quantize(...), etc.) and converting Decimal to string. I don't know the decimal module well, but I guess these operations only need to *read* the current context. Getting the context is likely efficient and insignificant compared to the cost of the other operations.
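A hedged sketch of what such a telco-style hot loop might look like (my simplification, not the benchmark's actual code): each iteration quantizes and accumulates, reading but never writing the context:

```python
from decimal import Decimal, ROUND_HALF_EVEN

PENNY = Decimal("0.01")

def rate_calls(prices):
    # Telco-style inner loop: quantize each price to cents and sum.
    # Only context *reads* happen here; any context setup (precision,
    # traps) would be done once, outside the loop.
    total = Decimal(0)
    for price in prices:
        total += price.quantize(PENNY, rounding=ROUND_HALF_EVEN)
    return total
```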

FYI, the timings can be seen in verbose mode:

vstinner@apu$ ./python -m perf compare_to telco_master.json telco_pr5278.json -v
Mean +- std dev: [telco_master] 10.7 ms +- 0.4 ms -> [telco_pr5278] 10.7 ms +- 0.4 ms: 1.00x faster (-0%)
Not significant!
Link: issue32630