Issue 26256: Fast decimalisation and conversion to other bases

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70444

classification

Title:	Fast decimalisation and conversion to other bases
Type:	enhancement	Stage:	resolved
Components:	Documentation	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	skrah	Nosy List:	cheryl.sabella, docs@python, jneb, mark.dickinson, rhettinger, serhiy.storchaka, skrah, vstinner
Priority:	normal	Keywords:	patch

Created on 2016-02-01 10:28 by jneb, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
fastStr.py	jneb, 2016-02-01 10:28	Fast conversion of integers to string (in any number base)

Pull Requests
URL	Status	Linked	Edit
PR 4808	merged	cheryl.sabella, 2017-12-12 01:41
PR 11736	merged	miss-islington, 2019-02-02 14:40
PR 11736	merged	miss-islington, 2019-02-02 14:40
PR 11736	merged	miss-islington, 2019-02-02 14:40

Messages (17)
msg259317 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016-02-01 10:28
Inspired by the recently new discovered 49th Mersenne prime number I wrote a module to do high speed long/int to decimal conversion. I can now output the new Mersenne number in 18.5 minutes (instead of several hours) on my machine. For numbers longer than about 100000 bits, this routine it is faster than str(number) thanks to the Karatsuba multiplication in CPython. The module supports all number bases 2 through 36, and is written in pure python (both 2 and 3). There is a simple way to save more time by reusing the conversion object (saving about half the time for later calls). My suggestion is to incorporate this into some library, since Python still lacks a routine to convert to any number base. Ideally, it could be incorporated in the builtin str function, but this would need more work. When converting to C, it is recommended to optimize bases 4 and 32 the same way as oct, hex and bin do (which isn't easily accessible from Python). Hope you like it. At least, it was a lot of fun to write... Hope you like it.
msg259324 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-02-01 14:02
Very nice. I'd suggest posting it as a recipe or as a Python package on PyPI for others to use. It's not really something that it would be appropriate to add to the Python core: Python isn't really intended for super-efficient high-precision arithmetic, and there are external libraries for that sort of thing (like gmpy2, for example).
msg259326 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016-02-01 14:20
Thanks for the tip. I'll consider making it a recipe. - Jurjen
msg259402 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-02-02 16:56
Is this the same algorithm as in issue3451?
msg259408 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-02-02 18:07
Not really the same, since that issue was more about fast division. But it's the same spirit. It looks like I was more enthusiastic back then; now I'm older and grumpier. It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers. Cryptographic-size integers with a few hundred to a few thousand digits should be fine; number-theoretic work requiring hundreds of thousands of digits should use gmpy2 or something similar.
msg259421 - (view)	Author: Stefan Krah (skrah) *	Date: 2016-02-02 19:10
Jurjen, this is very nice! -- Like Mark, I'm not sure if this should be in CPython. Decimal (Python >= 3.3) has sneaked in a couple of fast bignum algorithms, so calculating and converting the latest Mersenne prime takes a couple of seconds: from decimal import * c = getcontext() c.prec = MAX_PREC c.Emax = MAX_EMAX c.Emin = MIN_EMIN x = Decimal(2)**74207281 - 1 s = str(x) assert s[:12] == "300376418084" assert s[-12:] == "391086436351"
msg259465 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016-02-03 09:42
OMG is decimal that fast? Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations. It only says that precision "can be as large as needed for a given problem", but I never realized that that included millions of digits. It computed to my complete surprise 2229 to full precision (161 million decimal digits) in about a minute. That smells suspiciously like FFT style multiplication, which implies that it is way more sophisticated than integer multiplication! I suggest that the documentation of the decimal module recommends using decimal for multiprecision computations, as long as you use the builtin version.
msg259466 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-02-03 09:51
"It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers." Yeah I agree. Maybe we need to explain that somewhere? In the devguide? Even in Python doc? I know that they are super crazy^W fast algorithm to handle very large numbers (hum, what is large? more than 4096 bits?), but they are usually very complex. Projects like GMP have ultra fast code with optimizations in assemblers. Having assembler implementation is clearly out of the scope of Python. "Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations." Well, nowhere means: https://docs.python.org/dev/whatsnew/3.3.html#decimal Usually, we don't document performances of a function, maybe only its complexity. -- You may move to the numpy community which is problably more keen on such optimization than the Python core.
msg259467 - (view)	Author: Jurjen N.E. Bos (jneb) *	Date: 2016-02-03 10:00
That reference you gave says that the binary version is faster than the Python version, but here the _complexity_ actually changed by a lot. Only people who know the library by name will notice that it is this fast. But you are right, for 99% of the people it doesn't make much of a difference.
msg259471 - (view)	Author: Stefan Krah (skrah) *	Date: 2016-02-03 10:41
On Wed, Feb 03, 2016 at 09:51:53AM +0000, STINNER Victor wrote: > Well, nowhere means: > https://docs.python.org/dev/whatsnew/3.3.html#decimal Okay, but hardly anyone reads that, and I can't blame them. :) For example, if I use something like Lua, I won't read a release announcement from 2012. > Usually, we don't document performances of a function, maybe only its complexity. As Jurjen said, the presence of the FFT (actually fast number theoretic transform here) in Decimal is something completely unexpected, so if there's a place for a line or two in the docs that makes some users happy I'd be in favor of it.
msg300769 - (view)	Author: Cheryl Sabella (cheryl.sabella) *	Date: 2017-08-23 22:39
Hello, I came across this issue and was wondering if the FAQ section of the doc might be a good place to mention the presence of FFT, or actually fast number theoretic transform, as Stefan pointed out. I was also wondering if the short code snippet for the Mersenne prime might be included in the recipes as an example of what decimal is capable of?
msg300770 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-08-24 00:43
A couple of thoughts: * It is okay to add a couple lines to the decimal module docs noting a Cpython specific implementation detail that it uses better than expected algorithms which can payoff nicely when used with very large numbers. * Mersenne prime examples and whatnot belong external to our docs. I usually put my recipes and examples on the ASPN Python Cookbook. http://code.activestate.com/recipes/langs/python/
msg300780 - (view)	Author: Stefan Krah (skrah) *	Date: 2017-08-24 11:13
pypy-5.8.0-beta0 (Python 3.5.3) is using a very nicely written CFFI wrapper for libmpdec, so it also has the fast bignums.
msg300781 - (view)	Author: Stefan Krah (skrah) *	Date: 2017-08-24 11:20
What needs to be mentioned though is that the context has to be set for unrounded calculations: c = getcontext() c.prec = MAX_PREC c.Emax = MAX_EMAX c.Emin = MIN_EMIN Otherwise some people believe that the bignums are just rounded floating point arithmetic that is expected to be fast (I've seen this misconception somewhere, perhaps on Stackoverflow).
msg316169 - (view)	Author: Cheryl Sabella (cheryl.sabella) *	Date: 2018-05-04 12:32
ping
msg334747 - (view)	Author: Stefan Krah (skrah) *	Date: 2019-02-02 14:37
New changeset 00e9c55d27aff3e445ab4c8629cf4d59f46ff945 by Stefan Krah (Cheryl Sabella) in branch 'master': bpo-26256: Document algorithm speed for the Decimal module. (#4808) https://github.com/python/cpython/commit/00e9c55d27aff3e445ab4c8629cf4d59f46ff945
msg334748 - (view)	Author: Stefan Krah (skrah) *	Date: 2019-02-02 14:46
New changeset a2f4c4023314f69333d2e8cee68e316619f3d68e by Stefan Krah (Miss Islington (bot)) in branch '3.7': bpo-26256: Document algorithm speed for the Decimal module. (GH-4808) (#11736) https://github.com/python/cpython/commit/a2f4c4023314f69333d2e8cee68e316619f3d68e

History
Date	User	Action	Args
2022-04-11 14:58:27	admin	set	github: 70444
2019-02-02 14:49:57	skrah	set	status: open -> closed assignee: docs@python -> skrah resolution: fixed stage: patch review -> resolved
2019-02-02 14:46:11	skrah	set	messages: + msg334748
2019-02-02 14:40:45	miss-islington	set	pull_requests: + pull_request11641
2019-02-02 14:40:34	miss-islington	set	pull_requests: + pull_request11640
2019-02-02 14:40:22	miss-islington	set	pull_requests: + pull_request11639
2019-02-02 14:37:43	skrah	set	messages: + msg334747
2018-05-04 12:32:07	cheryl.sabella	set	messages: + msg316169
2017-12-12 01:43:28	cheryl.sabella	set	versions: + Python 3.7, - Python 3.4
2017-12-12 01:41:34	cheryl.sabella	set	keywords: + patch stage: patch review pull_requests: + pull_request4704
2017-08-24 11:20:10	skrah	set	messages: + msg300781
2017-08-24 11:13:20	skrah	set	messages: + msg300780
2017-08-24 00:43:30	rhettinger	set	nosy: + rhettinger messages: + msg300770
2017-08-23 22:39:11	cheryl.sabella	set	nosy: + cheryl.sabella messages: + msg300769
2016-02-03 10:41:36	skrah	set	messages: + msg259471
2016-02-03 10:00:18	jneb	set	messages: + msg259467
2016-02-03 09:51:53	vstinner	set	nosy: + vstinner messages: + msg259466
2016-02-03 09:42:27	jneb	set	versions: + Python 3.4, - Python 2.7 nosy: + docs@python messages: + msg259465 assignee: docs@python components: + Documentation, - Library (Lib)
2016-02-02 19:10:40	skrah	set	nosy: + skrah messages: + msg259421
2016-02-02 18:07:50	mark.dickinson	set	messages: + msg259408
2016-02-02 16:56:39	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg259402
2016-02-01 14:20:50	jneb	set	messages: + msg259326
2016-02-01 14:02:22	mark.dickinson	set	nosy: + mark.dickinson messages: + msg259324
2016-02-01 10:28:33	jneb	create