classification
Title: Decimal slowdown in 3.0 due to str/unicode changes
Type: performance
Components: Library (Lib) Versions: Python 3.0
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: facundobatista, marketdickinson, ncoghlan
Priority: Keywords: patch

Created on 2008-03-25 21:36 by marketdickinson, last changed 2008-05-04 14:37 by facundobatista.

Files
File name Uploaded Description Edit Remove
decimal_bytes.patch marketdickinson, 2008-03-26 00:55
deccoeff.c marketdickinson, 2008-05-03 15:35 Decimal coefficient module (alpha)
Messages
msg64521 (view) Author: Mark Dickinson (marketdickinson) Date: 2008-03-25 21:36
The Python 3.0 version of decimal.py currently stores the coefficient of
a Decimal number (or the payload of a NaN) as a string, all of whose
characters are in the range '0' through '9'.

It may be more time-efficient to store the coefficient as a bytes 
instance instead, since bytes -> int conversion is likely to be faster 
than str -> int conversion.  On the other hand, int -> bytes conversion 
has to go through str, so may be significantly slower than int -> str 
conversion...

Bottom line:  this needs testing.

One other option:  it may even be worth considering storing the 
coefficient directly as a Python integer.  I've argued against this 
before, on the basis that it makes direct access to the decimal digits 
of a number harder, and this direct access is useful for rounding as 
well as for some of the more esoteric Decimal operations (logical 
operations, shifts and rotations).  But it may be worth a look.

(I think it's clear what the *right* option is, given unlimited 
developer time and energy:  decimal.py should represent the coefficient 
using a long integer implementation written in C, with wordsize a power 
of 10---probably either 10**4 or 10**9.)

See related discussion at issue 2482 and at

http://mail.python.org/pipermail/python-dev/2008-March/078189.html
msg64531 (view) Author: Mark Dickinson (marketdickinson) Date: 2008-03-26 00:55
Here's a first try at a patch that converts the Decimal coefficient
from type str to type bytes.  It needs some work:  here are some timings 
for a complete run of the test suite (best of 3 in each case) on my
MacBook.

Python 3.0, str:   12.880s
Python 3.0, bytes: 13.294s
Python 2.6:         9.279s
Python 2.5:         9.212s

There's still some room for speedup with the bytes patch, but at the 
moment it doesn't look as though performance alone is a compelling reason 
to make this change.
msg64538 (view) Author: Nick Coghlan (ncoghlan) Date: 2008-03-26 02:30
Hmm, I forgot there wasn't currently a direct path back from int ->
bytes. I think your numbers *do* show that we need to do something about
the performance of the Python 3.0 version of decimal. A 25% increase in
runtime is a pretty big performance hit.
msg66155 (view) Author: Mark Dickinson (marketdickinson) Date: 2008-05-03 15:35
I've made some progress here:  I have a version of decimal.py for
Python 3.0 that uses a Deccoeff instead of a string for the Decimal
coefficient.  It still needs a little work to make things efficient,
but it already passes all the tests in test_decimal.  There are no
API changes.

What's a Deccoeff?  Read on, and/or see attached file.

Deccoeff (for *Dec*imal *coeff*icient) is an extension type for Python
3.0 that combines some of the most useful aspects of both the integer
and string types.

First, you can do arithmetic directly with Deccoeff instances:

>>> from deccoeff import Deccoeff
>>> x = Deccoeff('7')
>>> y = Deccoeff('11')
>>> x + y
Deccoeff('18')
>>> x * y
Deccoeff('77')
>>> y - x
Deccoeff('4')
>>> x < y
True

Only nonnegative integers can be represented as Deccoeff instances,
since that's all that's needed for Decimal: a subtraction that would
produce a negative result instead raises OverflowError:

>>> x-y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    OverflowError: difference is negative and cannot be represented as a Deccoeff

Powering also works:

>>> x ** y
Deccoeff('1977326743')
>>> y ** x
Deccoeff('19487171')

So much for arithmetic:  a Deccoeff instance can also be treated as a sequence:
it can be sliced (though the 'step' argument is not supported):

>>> x = Deccoeff('123456789')
>>> x[2:]
Deccoeff('1234567')
>>> x[:5]
Deccoeff('56789')

and indexed:

>>> x[3]
Deccoeff('6')

The length gives the total number of digits:

>>> len(x)
9

Note that the indices work in the opposite direction to the one you
might expect by looking at the string representation of a Deccoeff:
indexing is set up so that index 0 refers to the units digit, 1 to the
tens digit, 2 to the hundreds digit, etc.

Negative indices also make sense (but not in the usual Python sense of
counting from the end of the object): you should imagine the Deccoeff
as a finite string of digits padded with zeros infinitely on both the
left and right.  Then it makes sense to do, for example:

>>> x[-2:]
Deccoeff('12345678900')
>>> x[-3:4]
Deccoeff('6789000')

This provides a convenient way to do a 'decimal shift left'.  There
are many places in Decimal where it's appropriate to use x[n:] without
knowing in advance whether n is positive or negative.

This is a work in progress...  Comments very welcome.
msg66161 (view) Author: Nick Coghlan (ncoghlan) Date: 2008-05-03 17:56
(I changed the issue title to better reflect where the discussion has
moved to)

I really like the approach of a custom internal type for the mantissa
storage (the module containing would probably best be called _decimal).
Then it should be possible to focus on letting C do what it does best
(high speed arithmetic) and Python do what it does best (provide the
nice API and context management features)
msg66211 (view) Author: Facundo Batista (facundobatista) Date: 2008-05-04 14:37
I like this path, also.

Beyond Deccoeff, this module could hold in the future other functions
that are speed critical.

Great work, Mark!
History
Date User Action Args
2008-05-04 14:37:50facundobatistasetmessages: + msg66211
2008-05-03 17:56:23ncoghlansetmessages: + msg66161
title: Consider using bytes type instead of str to store Decimal coefficients -> Decimal slowdown in 3.0 due to str/unicode changes
2008-05-03 15:35:34marketdickinsonsetfiles: + deccoeff.c
messages: + msg66155
2008-03-26 02:30:01ncoghlansetmessages: + msg64538
2008-03-26 00:55:18marketdickinsonsetfiles: + decimal_bytes.patch
keywords: + patch
messages: + msg64531
2008-03-25 21:36:14marketdickinsoncreate