msg259317 - (view) |
Author: Jurjen N.E. Bos (jneb) * |
Date: 2016-02-01 10:28 |
Inspired by the recently new discovered 49th Mersenne prime number I wrote a module to do high speed long/int to decimal conversion. I can now output the new Mersenne number in 18.5 minutes (instead of several hours) on my machine.
For numbers longer than about 100000 bits, this routine it is faster than str(number) thanks to the Karatsuba multiplication in CPython.
The module supports all number bases 2 through 36, and is written in pure python (both 2 and 3).
There is a simple way to save more time by reusing the conversion object (saving about half the time for later calls).
My suggestion is to incorporate this into some library, since Python still lacks a routine to convert to any number base. Ideally, it could be incorporated in the builtin str function, but this would need more work.
When converting to C, it is recommended to optimize bases 4 and 32 the same way as oct, hex and bin do (which isn't easily accessible from Python).
Hope you like it. At least, it was a lot of fun to write...
Hope you like it.
|
msg259324 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2016-02-01 14:02 |
Very nice. I'd suggest posting it as a recipe or as a Python package on PyPI for others to use. It's not really something that it would be appropriate to add to the Python core: Python isn't really intended for super-efficient high-precision arithmetic, and there are external libraries for that sort of thing (like gmpy2, for example).
|
msg259326 - (view) |
Author: Jurjen N.E. Bos (jneb) * |
Date: 2016-02-01 14:20 |
Thanks for the tip. I'll consider making it a recipe.
- Jurjen
|
msg259402 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-02-02 16:56 |
Is this the same algorithm as in issue3451?
|
msg259408 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2016-02-02 18:07 |
Not really the same, since that issue was more about fast division. But it's the same spirit. It looks like I was more enthusiastic back then; now I'm older and grumpier. It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers. Cryptographic-size integers with a few hundred to a few thousand digits should be fine; number-theoretic work requiring hundreds of thousands of digits should use gmpy2 or something similar.
|
msg259421 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2016-02-02 19:10 |
Jurjen, this is very nice! -- Like Mark, I'm not sure if this should be in CPython.
Decimal (Python >= 3.3) has sneaked in a couple of fast bignum
algorithms, so calculating and converting the latest Mersenne
prime takes a couple of seconds:
from decimal import *
c = getcontext()
c.prec = MAX_PREC
c.Emax = MAX_EMAX
c.Emin = MIN_EMIN
x = Decimal(2)**74207281 - 1
s = str(x)
assert s[:12] == "300376418084"
assert s[-12:] == "391086436351"
|
msg259465 - (view) |
Author: Jurjen N.E. Bos (jneb) * |
Date: 2016-02-03 09:42 |
OMG is decimal that fast?
Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations. It only says that precision "can be as large as needed for a given problem", but I never realized that that included millions of digits.
It computed to my complete surprise 2**2**29 to full precision (161 million decimal digits) in about a minute. That smells suspiciously like FFT style multiplication, which implies that it is way more sophisticated than integer multiplication!
I suggest that the documentation of the decimal module recommends using decimal for multiprecision computations, as long as you use the builtin version.
|
msg259466 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-02-03 09:51 |
"It's great to have this stuff, but I don't think it belongs in core Python: I'd much rather that the core Python integer implementation remain simple, portable and low-maintenance, and work well for the domain it's intended for: small-to-medium size integers."
Yeah I agree. Maybe we need to explain that somewhere? In the devguide? Even in Python doc?
I know that they are super crazy^W fast algorithm to handle very large numbers (hum, what is large? more than 4096 bits?), but they are usually very complex.
Projects like GMP have ultra fast code with optimizations in assemblers. Having assembler implementation is clearly out of the scope of Python.
"Maybe I should change the issue then to "documentation missing": it nowhere says in the documentation that decimal has optimized multiprecision computations."
Well, nowhere means:
https://docs.python.org/dev/whatsnew/3.3.html#decimal
Usually, we don't document performances of a function, maybe only its complexity.
--
You may move to the numpy community which is problably more keen on such optimization than the Python *core*.
|
msg259467 - (view) |
Author: Jurjen N.E. Bos (jneb) * |
Date: 2016-02-03 10:00 |
That reference you gave says that the binary version is faster than the Python version, but here the _complexity_ actually changed by a lot.
Only people who know the library by name will notice that it is this fast.
But you are right, for 99% of the people it doesn't make much of a difference.
|
msg259471 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2016-02-03 10:41 |
On Wed, Feb 03, 2016 at 09:51:53AM +0000, STINNER Victor wrote:
> Well, nowhere means:
> https://docs.python.org/dev/whatsnew/3.3.html#decimal
Okay, but hardly anyone reads that, and I can't blame them. :)
For example, if I use something like Lua, I won't read a release announcement
from 2012.
> Usually, we don't document performances of a function, maybe only its complexity.
As Jurjen said, the presence of the FFT (actually fast number theoretic
transform here) in Decimal is something completely unexpected, so if there's
a place for a line or two in the docs that makes some users happy I'd be
in favor of it.
|
msg300769 - (view) |
Author: Cheryl Sabella (cheryl.sabella) * |
Date: 2017-08-23 22:39 |
Hello,
I came across this issue and was wondering if the FAQ section of the doc might be a good place to mention the presence of FFT, or actually fast number theoretic transform, as Stefan pointed out.
I was also wondering if the short code snippet for the Mersenne prime might be included in the recipes as an example of what decimal is capable of?
|
msg300770 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2017-08-24 00:43 |
A couple of thoughts:
* It is okay to add a couple lines to the decimal module docs noting a Cpython specific implementation detail that it uses better than expected algorithms which can payoff nicely when used with very large numbers.
* Mersenne prime examples and whatnot belong external to our docs. I usually put my recipes and examples on the ASPN Python Cookbook. http://code.activestate.com/recipes/langs/python/
|
msg300780 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2017-08-24 11:13 |
pypy-5.8.0-beta0 (Python 3.5.3) is using a very nicely written CFFI wrapper for libmpdec, so it also has the fast bignums.
|
msg300781 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2017-08-24 11:20 |
What needs to be mentioned though is that the context has to be set
for unrounded calculations:
c = getcontext()
c.prec = MAX_PREC
c.Emax = MAX_EMAX
c.Emin = MIN_EMIN
Otherwise some people believe that the bignums are just rounded floating point arithmetic that is expected to be fast (I've seen this misconception somewhere, perhaps on Stackoverflow).
|
msg316169 - (view) |
Author: Cheryl Sabella (cheryl.sabella) * |
Date: 2018-05-04 12:32 |
ping
|
msg334747 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-02-02 14:37 |
New changeset 00e9c55d27aff3e445ab4c8629cf4d59f46ff945 by Stefan Krah (Cheryl Sabella) in branch 'master':
bpo-26256: Document algorithm speed for the Decimal module. (#4808)
https://github.com/python/cpython/commit/00e9c55d27aff3e445ab4c8629cf4d59f46ff945
|
msg334748 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2019-02-02 14:46 |
New changeset a2f4c4023314f69333d2e8cee68e316619f3d68e by Stefan Krah (Miss Islington (bot)) in branch '3.7':
bpo-26256: Document algorithm speed for the Decimal module. (GH-4808) (#11736)
https://github.com/python/cpython/commit/a2f4c4023314f69333d2e8cee68e316619f3d68e
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:27 | admin | set | github: 70444 |
2019-02-02 14:49:57 | skrah | set | status: open -> closed assignee: docs@python -> skrah resolution: fixed stage: patch review -> resolved |
2019-02-02 14:46:11 | skrah | set | messages:
+ msg334748 |
2019-02-02 14:40:45 | miss-islington | set | pull_requests:
+ pull_request11641 |
2019-02-02 14:40:34 | miss-islington | set | pull_requests:
+ pull_request11640 |
2019-02-02 14:40:22 | miss-islington | set | pull_requests:
+ pull_request11639 |
2019-02-02 14:37:43 | skrah | set | messages:
+ msg334747 |
2018-05-04 12:32:07 | cheryl.sabella | set | messages:
+ msg316169 |
2017-12-12 01:43:28 | cheryl.sabella | set | versions:
+ Python 3.7, - Python 3.4 |
2017-12-12 01:41:34 | cheryl.sabella | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request4704 |
2017-08-24 11:20:10 | skrah | set | messages:
+ msg300781 |
2017-08-24 11:13:20 | skrah | set | messages:
+ msg300780 |
2017-08-24 00:43:30 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg300770
|
2017-08-23 22:39:11 | cheryl.sabella | set | nosy:
+ cheryl.sabella messages:
+ msg300769
|
2016-02-03 10:41:36 | skrah | set | messages:
+ msg259471 |
2016-02-03 10:00:18 | jneb | set | messages:
+ msg259467 |
2016-02-03 09:51:53 | vstinner | set | nosy:
+ vstinner messages:
+ msg259466
|
2016-02-03 09:42:27 | jneb | set | versions:
+ Python 3.4, - Python 2.7 nosy:
+ docs@python
messages:
+ msg259465
assignee: docs@python components:
+ Documentation, - Library (Lib) |
2016-02-02 19:10:40 | skrah | set | nosy:
+ skrah messages:
+ msg259421
|
2016-02-02 18:07:50 | mark.dickinson | set | messages:
+ msg259408 |
2016-02-02 16:56:39 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg259402
|
2016-02-01 14:20:50 | jneb | set | messages:
+ msg259326 |
2016-02-01 14:02:22 | mark.dickinson | set | nosy:
+ mark.dickinson messages:
+ msg259324
|
2016-02-01 10:28:33 | jneb | create | |