Message 148669 - Python tracker

Author	skrah
Recipients	brian.curtin, casevh, ced, eric.smith, eric.snow, jjconti, mark.dickinson, rhettinger, skrah, vstinner
Date	2011-11-30.16:05:22
Message-id	<1322669124.22.0.17489025754.issue7652@psf.upfronthosting.co.za>
In-reply-to

Content

Binary versus decimal
---------------------

> There is already gmpy and bigfloat, based on the heavily optimized GMP library,
> for example. Is it a license issue? Can't we reuse GMP/MPFR to offer a Decimal API?

_decimal is a PEP-399 compliant C implementation of decimal.py. The underlying
standard is Cowlishaw/IBM's "General Decimal Arithmetic Specification".

decimal.py is used for standard-conforming financial calculations. There is
no way to implement this in a reasonable manner using a binary floating
point library.
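
As a small illustration of why binary floating point is a poor fit here, the
following sketch uses only the standard decimal module. The amounts are made
up for the example; they do not come from any real application.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly, so the
# comparison fails even though it is true in decimal arithmetic.
binary_sum_matches = (0.1 + 0.2 == 0.3)

# Decimal operates on the decimal digits themselves.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Rounding to cents: the float literal 2.675 is stored as a value slightly
# below 2.675, so round() goes down. Decimal sees an exact tie and applies
# the requested rounding mode.
float_rounded = round(2.675, 2)
decimal_rounded = Decimal("2.675").quantize(Decimal("0.01"),
                                            rounding=ROUND_HALF_EVEN)

print(binary_sum_matches)    # False
print(decimal_sum_matches)   # True
print(float_rounded)         # 2.67
print(decimal_rounded)       # 2.68
```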

_decimal is also heavily optimized. In fact, for small precisions the
module is as fast as gmpy!


Soundness and code size
-----------------------

> _decimal should maybe first be distributed as a third party library until
> it is really well tested and its API is really stable, until we can decide
> to integrate it.

Except for a different directory structure, the cdecimal module is
identical to this patch. cdecimal has been distributed for almost
two years now and has been on PyPI for a year.

There have been many downloads from financial institutions, stock
exchanges and also research institutes. I know for a fact from
a private email correspondence that libmpdec is used in a billing
application of a large national NIC.

cdecimal *appears* to be huge because it has a test suite that
actually provides 100% code coverage. This means that even every
possible malloc failure is simulated, together with an assertion
that the result of the function is (NaN, Malloc_error).
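
The idea of exhaustively simulating allocation failures can be sketched in a
few lines of Python. This is only a toy model of the technique, not the
actual test harness: FailingAllocator, toy_multiply and exercise_all_failures
are hypothetical names invented for this example.

```python
class FailingAllocator:
    """Hypothetical allocator that raises MemoryError on the fail_at-th request."""

    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.calls = 0

    def alloc(self, size):
        self.calls += 1
        if self.calls == self.fail_at:
            raise MemoryError("simulated allocation failure")
        return bytearray(size)


def toy_multiply(a, b, allocator):
    """Toy operation with three allocations; returns (result, status)."""
    try:
        allocator.alloc(len(str(a)))                 # operand a
        allocator.alloc(len(str(b)))                 # operand b
        allocator.alloc(len(str(a)) + len(str(b)))   # result buffer
    except MemoryError:
        return ("NaN", "Malloc_error")
    return (a * b, "ok")


def exercise_all_failures(operation, *args):
    """Fail the 1st, 2nd, 3rd, ... allocation until the call succeeds."""
    fail_at = 1
    while True:
        allocator = FailingAllocator(fail_at)
        outcome = operation(*args, allocator)
        if allocator.calls < fail_at:
            # The failure point was never reached, so the call succeeded.
            return fail_at - 1, outcome
        # Every injected failure must be reported in the agreed form.
        assert outcome == ("NaN", "Malloc_error")
        fail_at += 1


failure_points, final_outcome = exercise_all_failures(toy_multiply, 12, 34)
print(failure_points, final_outcome)   # 3 (408, 'ok')
```

The loop stops on its own once the injected failure point lies beyond the
last allocation, so the test does not need to know in advance how many
allocations an operation performs.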

The test suite now tests against both decimal.py and decNumber.
It has found several small issues in decimal.py, a bug in
netlib's dtoa.c, a bug in GMP and a bug in CompCert.
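
Testing one implementation against another can be sketched as follows. This
is not the cdecimal test suite. It assumes a CPython version (3.5 or later)
where the pure-Python implementation is importable as _pydecimal and where
decimal uses the C implementation when it is available. The helper names
random_operand and compare_implementations are invented for the example.

```python
import random
import decimal       # C implementation when available
import _pydecimal    # pure-Python implementation (assumes CPython 3.5+)


def random_operand(rng, max_digits=40):
    """Build a random decimal literal such as '-1234E-7'."""
    ndigits = rng.randint(1, max_digits)
    digits = "".join(rng.choice("0123456789") for _ in range(ndigits))
    sign = rng.choice(["", "-"])
    exponent = rng.randint(-20, 20)
    return f"{sign}{digits}E{exponent}"


def compare_implementations(cases=1000, seed=0):
    """Run the same operations in both modules and collect disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(cases):
        a, b = random_operand(rng), random_operand(rng)
        prec = rng.randint(1, 50)
        for op in ("add", "subtract", "multiply"):
            results = []
            for mod in (decimal, _pydecimal):
                # Each module gets its own context and its own operands.
                ctx = mod.Context(prec=prec)
                value = getattr(ctx, op)(mod.Decimal(a), mod.Decimal(b))
                results.append(str(value))
            if results[0] != results[1]:
                mismatches.append((op, a, b, prec, results))
    return mismatches


print(len(compare_implementations(cases=200)))
```

If the C implementation is not available, decimal falls back to the Python
code and the comparison passes trivially.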

The latest tests against decNumber have found 18 issues in decNumber
(that I haven't reported yet).

In the past 8 months, regression tests for cdecimal-2.3 have been
running trillions of test cases both with and without Valgrind.


Review
------

The patch could be audited by focusing on basearith.c, cdecimal.c
and mpdecimal.c. cdecimal.c is a long but simple wrapper around
libmpdec. mpdecimal.c contains all functions of the specification.
I contend that for a C programmer mpdecimal.c is not significantly
harder to read than decimal.py.

The tricky algorithms (newtondiv, invroot, sqrt-via-invroot
and ln) have mechanical proofs in ACL2.

An initial audit could certainly disregard convolute.c, crt.c,
difradix2.c, fnt.c, numbertheory.c, transpose.c and umodarith.h.
These are only needed for the number-theoretic transform, which
kicks in at around 22000 digits.
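
For readers unfamiliar with the technique, here is a toy sketch of
multiplication via a number-theoretic transform. It is not the libmpdec
code: it uses a single well-known prime (998244353 with primitive root 3),
base-10 digits and a recursive transform, all chosen for brevity. The names
ntt and multiply_via_ntt are made up for this example.

```python
MOD = 998244353   # prime of the form 119 * 2**23 + 1
ROOT = 3          # primitive root modulo MOD


def ntt(values, invert=False):
    """Recursive radix-2 transform modulo MOD; len(values) is a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = ntt(values[0::2], invert)
    odd = ntt(values[1::2], invert)
    w_n = pow(ROOT, (MOD - 1) // n, MOD)      # primitive n-th root of unity
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)          # its modular inverse
    out = [0] * n
    half = n // 2
    w = 1
    for k in range(half):
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + half] = (even[k] - t) % MOD
        w = w * w_n % MOD
    return out


def multiply_via_ntt(x, y):
    """Multiply two non-negative integers by convolving their decimal digits."""
    a = [int(d) for d in reversed(str(x))]    # least significant digit first
    b = [int(d) for d in reversed(str(y))]
    size = 1
    while size < len(a) + len(b):
        size *= 2
    a += [0] * (size - len(a))
    b += [0] * (size - len(b))
    # Pointwise product in the transformed domain equals convolution.
    pointwise = [p * q % MOD for p, q in zip(ntt(a), ntt(b))]
    inv_size = pow(size, MOD - 2, MOD)
    coeffs = [c * inv_size % MOD for c in ntt(pointwise, invert=True)]
    # Propagate carries to turn convolution coefficients back into digits.
    digits, carry = [], 0
    for c in coeffs:
        carry += c
        digits.append(carry % 10)
        carry //= 10
    return int("".join(str(d) for d in reversed(digits)))


print(multiply_via_ntt(123456789, 987654321) == 123456789 * 987654321)  # True
```

The transform only pays off for very large operands, which is consistent
with it being used only above a digit threshold.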


Context type safety
-------------------

> The patch adds __setattr__ to the Decimal class.

Making the context more strictly typed immediately exposed a bug
in one of decimal.py's doctests:

# This doctest has always passed:
>>> c = Context(ExtendedContext)

# But the operation is meaningless:
>>> c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/decimal.py", line 3708, in __repr__
    % vars(self))
TypeError: %d format: a number is required, not Context
>>>


What is the concern about __setattr__? For *setting* contexts, speed
is not so important (for reading contexts it is).
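
What strict checking looks like from the user's side can be sketched as
below. The sketch assumes a Python version in which the decimal module
validates context attributes (Python 3.3 or later); the helper name
rejected_with is invented for the example.

```python
from decimal import Context, ExtendedContext


def rejected_with(action):
    """Return the name of the exception raised by action, or None."""
    try:
        action()
    except (TypeError, ValueError, AttributeError) as exc:
        return type(exc).__name__
    return None


checks = {
    # A context passed where an integer precision is expected.
    "Context(ExtendedContext)": rejected_with(lambda: Context(ExtendedContext)),
    # A string instead of an integer precision.
    "prec = '28'": rejected_with(lambda: setattr(Context(), "prec", "28")),
    # A misspelled attribute name.
    "precision = 10": rejected_with(lambda: setattr(Context(), "precision", 10)),
}

for description, outcome in checks.items():
    print(description, "->", outcome)
```

With validation at assignment time, mistakes like these surface where they
are made instead of much later in __repr__ or in an arithmetic operation.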
