classification
Title: Fast decimalisation and conversion to other bases
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skrah Nosy List: cheryl.sabella, docs@python, jneb, mark.dickinson, rhettinger, serhiy.storchaka, skrah, vstinner
Priority: normal Keywords: patch

Created on 2016-02-01 10:28 by jneb, last changed 2019-02-02 14:49 by skrah. This issue is now closed.

Files
File name Uploaded Description Edit
fastStr.py jneb, 2016-02-01 10:28 Fast conversion of integers to string (in any number base)
Pull Requests
URL Status Linked Edit
PR 4808 merged cheryl.sabella, 2017-12-12 01:41
PR 11736 merged miss-islington, 2019-02-02 14:40
PR 11736 merged miss-islington, 2019-02-02 14:40
PR 11736 merged miss-islington, 2019-02-02 14:40
Messages (17)
msg259317 - (view) Author: Jurjen N.E. Bos (jneb) * Date: 2016-02-01 10:28
Inspired by the recently new discovered 49th Mersenne prime number I wrote a module to do high speed long/int to decimal conversion. I can now output the new Mersenne number in 18.5 minutes (instead of several hours) on my machine.
For numbers longer than about 100000 bits, this routine it is faster than str(number) thanks to the Karatsuba multiplication in CPython.
The module supports all number bases 2 through 36, and is written in pure python (both 2 and 3).
There is a simple way to save more time by reusing the conversion object (saving about half the time for later calls).
My suggestion is to incorporate this into some library, since Python still lacks a routine to convert to any number base. Ideally, it could be incorporated in the builtin str function, but this would need more work. 
When converting to C, it is recommended to optimize bases 4 and 32 the same way as oct, hex and bin do (which isn't easily accessible from Python).

Hope you like it. At least, it was a lot of fun to write...

Hope you like it.
msg259324 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-02-01 14:02
Very nice. I'd suggest posting it as a recipe or as a Python package on PyPI for others to use. It's not really something that it would be appropriate to add to the Python core: Python isn't really intended for super-efficient high-precision arithmetic, and there are external libraries for that sort of thing (like gmpy2, for example).
msg259326 - (view) Author: Jurjen N.E. Bos (jneb) * Date: 2016-02-01 14:20
Thanks for the tip. I'll consider making it a recipe.
- Jurjen
msg259402 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-02 16:56
Is this the same algorithm as in issue3451?
msg259408 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-02-02 18:07
Not really the same, since that issue was more about fast division. But it's the same spirit. It looks like I was more enthusiastic back then; now I'm older and grumpier. It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers. Cryptographic-size integers with a few hundred to a few thousand digits should be fine; number-theoretic work requiring hundreds of thousands of digits should use gmpy2 or something similar.
msg259421 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-02-02 19:10
Jurjen, this is very nice! -- Like Mark, I'm not sure if this should be in CPython.


Decimal (Python >= 3.3) has sneaked in a couple of fast bignum
algorithms, so calculating and converting the latest Mersenne
prime takes a couple of seconds:

from decimal import *

c = getcontext()
c.prec = MAX_PREC
c.Emax = MAX_EMAX
c.Emin = MIN_EMIN

x = Decimal(2)**74207281 - 1
s = str(x)
assert s[:12] == "300376418084"
assert s[-12:] == "391086436351"
msg259465 - (view) Author: Jurjen N.E. Bos (jneb) * Date: 2016-02-03 09:42
OMG is decimal that fast?
Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations. It only says that precision "can be as large as needed for a given problem", but I never realized that that included millions of digits.
It computed to my complete surprise 2**2**29 to full precision (161 million decimal digits) in about a minute. That smells suspiciously like FFT style multiplication, which implies that it is way more sophisticated than integer multiplication!
I suggest that the documentation of the decimal module recommends using decimal for multiprecision computations, as long as you use the builtin version.
msg259466 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-03 09:51
"It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers."

Yeah I agree. Maybe we need to explain that somewhere? In the devguide? Even in Python doc?

I know that they are super crazy^W fast algorithm to handle very large numbers (hum, what is large? more than 4096 bits?), but they are usually very complex.

Projects like GMP have ultra fast code with optimizations in assemblers. Having assembler implementation is clearly out of the scope of Python.


"Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations."

Well, nowhere means:
https://docs.python.org/dev/whatsnew/3.3.html#decimal

Usually, we don't document performances of a function, maybe only its complexity.

--

You may move to the numpy community which is problably more keen on such optimization than the Python *core*.
msg259467 - (view) Author: Jurjen N.E. Bos (jneb) * Date: 2016-02-03 10:00
That reference you gave says that the binary version is faster than the Python version, but here the _complexity_ actually changed by a lot.
Only people who know the library by name will notice that it is this fast.
But you are right, for 99% of the people it doesn't make much of a difference.
msg259471 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-02-03 10:41
On Wed, Feb 03, 2016 at 09:51:53AM +0000, STINNER Victor wrote:
> Well, nowhere means:
> https://docs.python.org/dev/whatsnew/3.3.html#decimal

Okay, but hardly anyone reads that, and I can't blame them. :)

For example, if I use something like Lua, I won't read a release announcement
from 2012.

> Usually, we don't document performances of a function, maybe only its complexity.

As Jurjen said, the presence of the FFT (actually fast number theoretic
transform here) in Decimal is something completely unexpected, so if there's
a place for a line or two in the docs that makes some users happy I'd be
in favor of it.
msg300769 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2017-08-23 22:39
Hello,

I came across this issue and was wondering if the FAQ section of the doc might be a good place to mention the presence of FFT, or actually fast number theoretic transform, as Stefan pointed out.

I was also wondering if the short code snippet for the Mersenne prime might be included in the recipes as an example of what decimal is capable of?
msg300770 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-24 00:43
A couple of thoughts:

* It is okay to add a couple lines to the decimal module docs noting a Cpython specific implementation detail that it uses better than expected algorithms which can payoff nicely when used with very large numbers.

* Mersenne prime examples and whatnot belong external to our docs.  I usually put my recipes and examples on the ASPN Python Cookbook.  http://code.activestate.com/recipes/langs/python/
msg300780 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017-08-24 11:13
pypy-5.8.0-beta0 (Python 3.5.3) is using a very nicely written CFFI wrapper for libmpdec, so it also has the fast bignums.
msg300781 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017-08-24 11:20
What needs to be mentioned though is that the context has to be set
for unrounded calculations:

c = getcontext()
c.prec = MAX_PREC
c.Emax = MAX_EMAX
c.Emin = MIN_EMIN


Otherwise some people believe that the bignums are just rounded floating point arithmetic that is expected to be fast (I've seen this misconception somewhere, perhaps on Stackoverflow).
msg316169 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-05-04 12:32
ping
msg334747 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-02-02 14:37
New changeset 00e9c55d27aff3e445ab4c8629cf4d59f46ff945 by Stefan Krah (Cheryl Sabella) in branch 'master':
bpo-26256: Document algorithm speed for the Decimal module. (#4808)
https://github.com/python/cpython/commit/00e9c55d27aff3e445ab4c8629cf4d59f46ff945
msg334748 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-02-02 14:46
New changeset a2f4c4023314f69333d2e8cee68e316619f3d68e by Stefan Krah (Miss Islington (bot)) in branch '3.7':
bpo-26256: Document algorithm speed for the Decimal module. (GH-4808) (#11736)
https://github.com/python/cpython/commit/a2f4c4023314f69333d2e8cee68e316619f3d68e
History
Date User Action Args
2019-02-02 14:49:57skrahsetstatus: open -> closed
assignee: docs@python -> skrah
resolution: fixed
stage: patch review -> resolved
2019-02-02 14:46:11skrahsetmessages: + msg334748
2019-02-02 14:40:45miss-islingtonsetpull_requests: + pull_request11641
2019-02-02 14:40:34miss-islingtonsetpull_requests: + pull_request11640
2019-02-02 14:40:22miss-islingtonsetpull_requests: + pull_request11639
2019-02-02 14:37:43skrahsetmessages: + msg334747
2018-05-04 12:32:07cheryl.sabellasetmessages: + msg316169
2017-12-12 01:43:28cheryl.sabellasetversions: + Python 3.7, - Python 3.4
2017-12-12 01:41:34cheryl.sabellasetkeywords: + patch
stage: patch review
pull_requests: + pull_request4704
2017-08-24 11:20:10skrahsetmessages: + msg300781
2017-08-24 11:13:20skrahsetmessages: + msg300780
2017-08-24 00:43:30rhettingersetnosy: + rhettinger
messages: + msg300770
2017-08-23 22:39:11cheryl.sabellasetnosy: + cheryl.sabella
messages: + msg300769
2016-02-03 10:41:36skrahsetmessages: + msg259471
2016-02-03 10:00:18jnebsetmessages: + msg259467
2016-02-03 09:51:53vstinnersetnosy: + vstinner
messages: + msg259466
2016-02-03 09:42:27jnebsetversions: + Python 3.4, - Python 2.7
nosy: + docs@python

messages: + msg259465

assignee: docs@python
components: + Documentation, - Library (Lib)
2016-02-02 19:10:40skrahsetnosy: + skrah
messages: + msg259421
2016-02-02 18:07:50mark.dickinsonsetmessages: + msg259408
2016-02-02 16:56:39serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg259402
2016-02-01 14:20:50jnebsetmessages: + msg259326
2016-02-01 14:02:22mark.dickinsonsetnosy: + mark.dickinson
messages: + msg259324
2016-02-01 10:28:33jnebcreate