classification
Title: Merge C version of decimal into py3k.
Type: performance Stage: committed/rejected
Components: Extension Modules, Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skrah Nosy List: Amaury.Forgeot.d'Arc, Arfrever, Jim.Jewett, Ramchandra Apte, amaury.forgeotdarc, asvetlov, benjamin.peterson, casevh, ced, eric.smith, eric.snow, haypo, jjconti, lemburg, mark.dickinson, pitrou, python-dev, rhettinger, skrah
Priority: normal Keywords: patch

Created on 2010-01-07 10:42 by mark.dickinson, last changed 2012-09-02 12:18 by skrah. This issue is now closed.

Files
File name Uploaded Description Edit
49433f35a5f8.diff skrah, 2011-06-18 13:34 review
bba956250186.diff skrah, 2011-11-30 21:25 review
be8a59fcba49.diff skrah, 2011-12-14 21:11 fixed all comments except the ones in ISSUES.txt review
ppro-mulmod.txt skrah, 2011-12-15 14:00 Proof for x87 FPU modular multiplication
40917e4b51aa.diff skrah, 2012-02-23 17:30 review
9b3b1f5c4072.diff skrah, 2012-03-10 17:58 review
api-demo.c skrah, 2012-03-27 18:57
Repositories containing patches
http://hg.python.org/features/cdecimal#py3k-cdecimal
Messages (83)
msg97347 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-07 10:42
I've created this issue to keep track of progress in merging Stefan Krah's work on decimal in c into py3k.

We've created a branch py3k-cdecimal (with merge tracking from py3k) for this work.  When the branch is fully working and tested we'll consult python-dev about next steps.
msg97348 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-01-07 11:40
Is the intention to write Decimal.__format__ in C, too? That would be quite a bit of work, and I'm not sure I could recommend it. But I'm not sure if your plan is to get rid of all Python code or not.

If your plan is to rewrite absolutely everything in C, I could help out by exposing the methods that parse format specifiers and do some of the low level formatting. They're used internally by the int, float, and str formatting code. Let me know.
msg97355 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-07 13:41
Just to clarify, no decision has yet been made on *whether* the cdecimal work should be integrated into py3k;  we'll consult python-dev on this once we've got a working branch and performance information.
msg97372 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-07 20:54
To answer Eric's question:  Decimal.__format__ is already implemented in Stefan's work---it looks like most of the code is in

http://svn.python.org/projects/python/branches/py3k-cdecimal/Modules/cdecimal/io.c

(Stefan, is this right?)
msg97373 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-07 20:58
So the new branch looks great---thanks, Stefan!  I'm only just beginning to look at the code properly, though.

A couple of things:

(1) Could we unify test_decimal and test_cdecimal somehow?  This would avoid them getting out of sync when new tests are added, and would make it clear what the differences between them are.  It looks like there's currently a lot of duplicate code.

(2) At some point we'll need some documentation.  Even if all it says is:  the cdecimal module operates identically to the decimal module, with the following exceptions... (notes on threading differences, exponent limits, correct rounding of pow, etc.)
msg97377 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-01-07 21:35
>mark> (1) Could we unify test_decimal and test_cdecimal somehow?
>mark> This would avoid them getting out of sync when new tests are
>mark> added, and would make it clear what the differences between 
>mark> them are.  It looks like there's currently a lot of duplicate code.

An approach similar to the one taken in test_warnings.py might work: write common test code as a base class, then subclass it to be run against both the C and Py versions of the module.
msg97382 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-07 22:06
Yes, formatting is completely implemented in io.c, together with quite a comprehensive test suite. I like the new Python format strings, so I wanted them in the C library, too.
msg97411 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-08 11:46
Unify test_decimal and test_cdecimal:

Yes, quite possible. The diff is currently 400 lines, but it should be easy to get that down to below 100 without any loss of functionality.

I'll look into that when I'm done with the 64 bit ANSI path.


Documentation:

Anything is welcome, even a patch that just creates a stub so I don't have to figure out where to put it.

The differences are listed at the bottom of:

http://www.bytereef.org/cdecimal.html
msg97412 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-08 11:49
Just an update. Rev.77358 should compile and run stable on the buildbot platforms except Alpha and ia64. I'm working on a default ANSI path for
64-bit.
msg97719 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-13 14:17
As a first step in unifying test_decimal.py and test_cdecimal.py I
would like to patch test_decimal.py in trunk and py3k. This is to
minimize differences between py3k and py3k-cdecimal.

(1) Remove test that Decimal(x) generates a copy.

(2) Add test case to formatting test.

(3) Extend threading test.

(4) Use Emax of 425000000 instead of 999999999 where possible.
    (The 32-bit version of cdecimal has the official limit of
    425000000, even though 999999999 works in almost all cases.)


If I get an OK for the two patches, I can commit them to py3k and
trunk. If you don't want to apply (1), I'll make new patches.
msg98161 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-22 19:57
All outstanding issues mentioned here have been solved in Rev. 77696:

  (1) New ANSI path for unknown 64bit platforms (ia64 and Alpha build
      without problems now).

  (2) Unified tests for decimal and cdecimal.

  (3) Documentation for cdecimal.


Other improvements:

  (4) Added comprehensive test suite for testing the library directly.

  (5) Fixed warnings in Visual Studio.

  (6) Code formatting.
msg120305 - (view) Author: Case Van Horsen (casevh) Date: 2010-11-03 03:28
Has the cdecimal branch kept up with the hash value changes in 3.2?

Is there a still a chance that cdecimal could be merged into 3.2?
msg120674 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-11-07 12:30
On Wed, Nov 3, 2010 at 3:28 AM, Case Van Horsen <report@bugs.python.org> wrote:
> Has the cdecimal branch kept up with the hash value changes in 3.2?

Not sure;  that's a question for Stefan.

> Is there a still a chance that cdecimal could be merged into 3.2?

A chance, yes;  the major need is for someone with time to do a full
review.  (I'm afraid that's not me, right now.)
msg121452 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-11-18 12:08
An update on the progress:

All development currently happens in my private mpdecimal repository. The
next version of mpdecimal (2.0) is finished, stable and will be released
once all tests have completed successfully. Running the whole test suite
can take several weeks, since Valgrind is involved and all Python versions
from 2.5 to 3.2 are tested.


In py3k-cdecimal, r86497 is an exact copy of mpdecimal-2.0. All buildbots
pass the short tests.


Major improvements:

   o Full compatibility with decimal.py 3.2, including the new hash
     functions and float operations.

   o With the new FloatOperation signal, accidental float operations
     can be detected.

   o The underlying library - libmpdec - now has 100% code coverage
     together with Makefile targets for creating coverage reports.
     In particular, every possible allocation failure during an
     operation can be tested in brute force style.

   o The module has 85% code coverage. All lines except failures
     of Python C-API functions are tested.

   o Several minor bug fixes, most of them deal with allocation
     failures under extreme bignum conditions.



Potential reviewers:

I'll be happy to answer questions here or privately. IMO the best way to
get acquainted with the module is to do the regular build and tests first,
then explore Lib/test/mpdecimal, reading LIBTEST.txt and PYTEST.txt.
msg138029 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-06-09 22:34
Just a couple of remarks about the diff I created:

The changes to decimal.py are exploratory (i.e. done quite hastily)
and serve the purpose to fulfill PEP-399.

library/cdecimal.rst is completely out of date.


The rest should be very stable.
msg138583 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-06-18 14:20
The latest patch is based on a relatively stable revision of 3.3. To my
knowledge, _decimal.c and decimal.py are now fully compatible in the
sense of PEP-399.



libmpdec
========

    o New test suite with comprehensive tests against decNumber.

    o Full support for 32-bit compilers (tested with CompCert).


PEP-399
=======

    o cdecimal.c is now called _decimal.c. Instead of importing cdecimal,
      _decimal is transparently imported as decimal (if the C version is
      available).

    o Unified unit tests with 100% code coverage for both decimal.py and
      _decimal.c. For _decimal.c, the tests include all failures of
      Python API functions (requires special patch for testing).

    o deccheck.py now also tests arbitrary input and makes sure that both
      modules raise the same exceptions.

    o Both modules produce the same pickle output for Decimal and Context.

_decimal.c
==========

    o Speed up int/Decimal conversion for integers that fit into a
      single word of a PyLongObject (performance gain is around 15%).

    o real(), imag(), conjugate(), __complex__() support.

    o Fraction and complex comparison support.

    o Decimal constructor now accepts lists as well as tuples.

    o DecimalTuple support.

    o General cleanup and refactoring. The functions for conversions 
      between Decimal and other numeric types are much cleaner now 
      and could be used for a PyDec_* API.
msg148650 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-30 12:40
> Just to clarify, no decision has yet been made on *whether*
> the cdecimal work should be integrated into py3k;
> we'll consult python-dev on this once we've got a working branch
> and performance information.

So, what is the status today?

_decimal looks to be huge. Does Python really need yet another multiprecision library? There is already gmpy and bigfloat, based on the heavily optimized GMP library, for example. Is it a license issue? Can't we reuse GMP/MPFR to offer a Decimal API?

_decimal should maybe first be distributed as a third party library until it is really well tested and its API is really stable, until we can decide to integrate it. The patch adds __setattr__ to the Decimal class.
msg148652 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-11-30 13:03
> Does Python really need yet another multiprecision library?

It's not really another library:  it's a reimplementation of the existing decimal library in C.  The decimal library is *hugely* valuable to the financial world, but its slowness is a major concern.  _decimal would help to address that concern.

> Can't we reuse GMP/MPFR to offer a Decimal API?

Nope: those are for binary floating-point.  Shoehorning decimal semantics on top of a binary floating-point library is a really bad idea.  (Actually, that's a part of why decimal.py is slow---it's using Python's *binary* integers to store *decimal* coefficients, so that even simple addition is now a quadratic operation, thanks to the binary <-> decimal conversions involved.)

> _decimal should maybe first be distributed as a third party library until it is really well tested and its API is > really stable.

My take is that this has already happened.

The only problem from my perspective is getting someone to find time to review such a massive patch.  I've been wondering whether we could get away with some kind of 'statistical' review:  do a large-scale review, and then instead of having someone go through every line of C code, pick a few representative sections at random and review those.  If those code portions make it through the review unscathed, declare the code good and merge it in.
msg148669 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-30 16:05
Binary versus decimal
---------------------

> There is already gmpy and bigfloat, based on the heavily optimized GMP library,
> for example. Is it a license issue? Can't we reuse GMP/MPFR to offer a Decimal API?

_decimal is a PEP-399 compliant C implementation of decimal.py. The underlying
standard is Cowlishaw/IBM's "General Decimal Arithmetic Specification".

decimal.py is used for standard-conforming financial calculations. There is
no way to implement this in a reasonable manner using a binary floating
point library.

Additionally, _decimal is also heavily optimized. In fact, for small
precisions the module has the same speed as gmpy!


Soundness and code size
-----------------------

> _decimal should maybe first be distributed as a third party library until
> it is really well tested and its API is really stable, until we can decide
> to integrate it.

Except for a different directory structure, the cdecimal module is
identical to this patch. cdecimal has been distributed for almost
two years now and has been on pypi for a year.

There have been many downloads from financial institutions, stock
exchanges and also research institutes. I know for a fact from
a private email correspondence that libmpdec is used in a billing
application of a large national NIC.

cdecimal *appears* to be huge because it has a test suite that
actually provides 100% code coverage. Indeed this means that even
every possible malloc failure is simulated together with an assertion
that the result of the function is (NaN, Malloc_error).

The test suite now tests against both decimal.py and decNumber.
It has found several small issues in decimal.py, a bug in
netlib's dtoa.c, a bug in gmp and a bug in CompCert.

The latest tests against decNumber have found 18 issues in decNumber
(that I haven't reported yet).

In the past 8 months, regression tests for cdecimal-2.3 have been
running trillions of test cases both with and without Valgrind.


Review
------
The patch could be audited by focusing on basearith.c, cdecimal.c
and mpdecimal.c. cdecimal.c is a long but simple wrapper around
libmpdec. mpdecimal.c contains all functions of the specification.
I contend that for a C programmer mpdecimal.c is not significantly
harder to read than decimal.py.

The tricky algorithms (newtondiv, invroot, sqrt-via-invroot
and ln) have mechanical proofs in ACL2.


An initial audit could certainly disregard convolute.c, crt.c,
difradix2.c, fnt.c, numbertheory.c, transpose.c and umodarith.h.

These are only needed for the number theoretic transform that kicks
in at around 22000 digits.


Context type safety
-------------------

> The patch adds __setattr__ to the Decimal class.

Making the context more strictly typed has instantly found a bug
in one of decimal.py's docstring tests:

# This doctest has always passed:
>>> c = Context(ExtendedContext)

# But the operation is meaningless:
>>> c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/decimal.py", line 3708, in __repr__
    % vars(self))
TypeError: %d format: a number is required, not Context
>>>


What is the concern about __setattr__? For *setting* contexts, speed
is not so important (for reading contexts it is).
msg148677 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-30 17:17
Mark Dickinson <report@bugs.python.org> wrote:
> The only problem from my perspective is getting someone to find time to review such a massive patch.  I've been wondering whether we could get away with some kind of 'statistical' review:  do a large-scale review, and then instead of having someone go through every line of C code, pick a few representative sections at random and review those.  If those code portions make it through the review unscathed, declare the code good and merge it in.

The regex module is in a somewhat similar situation. If I'm interpreting this

http://mail.python.org/pipermail/python-dev/2011-August/113240.html

dialogue correctly, a complete audit down to the last line isn't
always necessary.
msg148678 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-30 17:27
> If I'm interpreting this
> http://mail.python.org/pipermail/python-dev/2011-August/113240.html
> dialogue correctly, a complete audit down to the last line isn't
> always necessary.

It is also helped by the fact you are a core developer and we trust you to be here to do maintenance :)
I'd add that decimal is a fairly specialized module and the implications of a new engine are not as wide-reaching as a new regex engine. Especially if the pure Python version is still there.

I think it's still probably a good idea to probe python-dev, if that hasn't already happened.
msg148680 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-11-30 19:10
We've been wanting this for a long time.

Strong +1 from me.
msg148682 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-30 20:18
I can help with the review. Is http://bugs.python.org/review/7652/show a good starting point?  I already have some comments.
msg148687 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-30 21:10
Antoine Pitrou <report@bugs.python.org> wrote:
> It is also helped by the fact you are a core developer and we trust you to
> be here to do maintenance :)

Sure. The specification doesn't really change, so the work will hopefully
be limited. :)

> I think it's still probably a good idea to probe python-dev, if that
> hasn't already happened.

Yes, I'm planning to do that.
msg148688 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-30 21:12
Raymond Hettinger <report@bugs.python.org> wrote:
> We've been wanting this for a long time.
> 
> Strong +1 from me.

Thank you, Raymond!
msg148689 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-30 21:13
> (Actually, that's a part of why decimal.py is slow---it's
> using Python's *binary* integers to store *decimal* coefficients,
> so that even simple addition is now a quadratic operation,
> thanks to the binary <-> decimal conversions involved.)

Oh, I forgot this minor detail (base 2 vs base 10).
msg148690 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-30 21:15
Amaury Forgeot d'Arc <report@bugs.python.org> wrote:
> I can help with the review. Is http://bugs.python.org/review/7652/show a
> good starting point? I already have some comments.

Yes, that would be great. Apart from two or three changes that I still
need to push patch set 4 is the latest version.
msg148691 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-30 21:28
Stefan Krah <report@bugs.python.org> wrote:
> Yes, that would be great. Apart from two or three changes that I still
> need to push patch set 4 is the latest version.

Hmm, no. I'll create a slightly newer patch from Oct. 1st.
msg148721 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-12-01 13:38
[Amaury]
> Overall, I think that the "mpd" C library should be better separated from the
> _decimal module (a bit like _ctypes, with the libffi library): its own configure
> & makefile, its own test suite... which are not necessarily related to Python.

Except for its own directory libmpdec has all that (See LIBTEST.txt). Library
tests are in the tests/ directory, python tests in the python/ directory.

Are you suggesting to build a static library and then use that to build
the module? I remember this didn't work on Windows (not to mention AIX and such).
msg149556 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-12-15 14:00
Amaury has asked for more comments (and I agree). However, I'm not sure what 
level of detail would be appropriate. As an example, I've posted the full
proof of the x87 modular multiplication in umodarith.h.


Even with the Coq parts stripped, this would still be a massive comment.


Would you prefer that level of detail or should I just post the core
of the algorithm?
msg149557 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-12-15 14:03
Stefan Krah <report@bugs.python.org> wrote:
> Would you prefer that level of detail or should I just post the core
> of the algorithm?

Argh. s/post/add to comments in umodarith.h/
msg149559 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-15 14:12
> Amaury has asked for more comments (and I agree). However, I'm not sure what 
> level of detail would be appropriate. As an example, I've posted the full
> proof of the x87 modular multiplication in umodarith.h.
> 
> 
> Even with the Coq parts stripped, this would still be a massive comment.

You could ship it as a separate .txt file (like we have e.g.
Objects/dict_notes.txt).
msg149567 - (view) Author: Amaury Forgeot d'Arc (Amaury.Forgeot.d'Arc) * Date: 2011-12-15 17:33
2011/12/15 Stefan Krah <report@bugs.python.org>

>
> Stefan Krah <stefan-usenet@bytereef.org> added the comment:
>
> Amaury has asked for more comments (and I agree). However, I'm not sure
> what
> level of detail would be appropriate. As an example, I've posted the full
> proof of the x87 modular multiplication in umodarith.h.
>
>
> Even with the Coq parts stripped, this would still be a massive comment.
>
>
> Would you prefer that level of detail or should I just post the core
> of the algorithm?
>

For my part, a two-lines description of the purpose of file is enough.
Something like
"Routines for the reverse transmogrification of randomized digits,
used in multiplication of numbers above 2**32 bits"
Or something else that makes sense :-)

At least something that makes it clear that I don't have to read further if
I'm only interested in the definition of Python classes for example.
msg149600 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-12-16 08:51
> For my part, a two-lines description of the purpose of file is enough.

OK, I'll go for small comments in the files themselves and put big
ones in separate files.
msg153447 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-02-15 23:14
I tried my timestamp patch with _decimal: it fails because decimal and _decimal API is not exactly the same.

decimal.Decimal.__truediv__() has an optional context argument, whereas _decimal defines PyNumberMethods.

decimal.Decimal.quantize() second argument is optional and its default value is None, but if I pass None to _decimal.Decimal.quantize(), I get a TypeError because _decimal expects an integer.
msg153506 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-02-16 22:25
STINNER Victor <report@bugs.python.org> wrote:
> decimal.Decimal.__truediv__() has an optional context argument, whereas _decimal defines PyNumberMethods.

Regarding the special methods: decimal.py uses the optional context arguments
for convenience so that these methods can be re-used in other places.

I wouldn't consider this context argument as part of the API.

> decimal.Decimal.quantize() second argument is optional and its default value
> is None, but if I pass None to _decimal.Decimal.quantize(), I get a TypeError
> because _decimal expects an integer.

About this I'm not sure. I think type errors are a courtesy to the user.
Look what is possible now in decimal.py:

Decimal('9')

But here the argument might well be made for accepting None (and only
None apart from rounding modes). - I hope Mark and Raymond will give
their opinions, too.
msg153507 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-02-16 22:28
I walked into the Roundup trap again:

>>> Decimal(9).quantize(1, "?!?!?")
Decimal('9')
msg154074 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-02-23 17:33
Over the last two months I've done a full review of all files except
_decimal.c and mpdecimal.c.

I now have additional ACL2 proofs for a couple of functions in basearith.c
(some are partial proofs), a full proof for the special case (d = 10**19)
of Granlund/Montgomery's "Divide double word by constant" algorithm and a
full proof for the Chinese Remainder Theorem in crt.c.


I didn't find anything important. I found a couple of useless variable
initializations, missing comments etc.

There was one (deliberate) incompatibility in the format function:
If rounding lead to an Overflow, _decimal printed 'Infinity'.

_decimal can't read back numbers that are out of bounds, so this
was done to ensure that a roundtrip is always possible.


But I changed the function so _decimal now uses exactly the same algorithm
for formatting as decimal.py.
msg154988 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-03-05 23:15
How can I help to integrate this module into CPython?

The test suite pass in debug and release mode without any failure on my Linux box (64 bits, running Ubuntu 11.10).
msg154992 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-05 23:47
Victor, yes, the decimal module needs a C implementation.  Without it, the pure Python code is abysmally slow.  Other MP implementations don't fill the need or come close to implementing the decimal arithmetic spec.
msg155031 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2012-03-06 18:23
(1)  I think this module would benefit greatly from a map explaining what each file does, and perhaps from some reorganization. 

As best I can yet tell, there are about ~130 files, over a dozen directories, but the only ones that directly affect the implementation are a subset (~33) of the *.c and *h files in Modules/_decimal/ (and not subdirectories).  

Even files that do affect the implementation, such as mpdecimal.c, also seem to have functions thrown in just for testing small pieces of functionality, such as Newton Division.

There may also be some code that really isn't needed, except possibly for backwards compatibility, and could be #ifdef'ed or at least commented.  For example, the comments above io.c function _mpd_strneq(const char *s, const char *l, const char *u, size_t n) mention working around the Turkish un/dotted-i problem when lowercasing -- but why is a decimal library even worried about casing?

(2)  Is assembly allowed?  If not, please make it clear that vcdiv64.asm is just an optional speedup, and that the code doesn't rely upon it.

(3)  Are there parts of this library that provide functionality NOT in the current decimal library?  If so, this should be at least documented, and perhaps either removed or exposed.
msg155034 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-06 20:07
Jim, thanks for taking a look at this.

Jim Jewett <report@bugs.python.org> wrote:
> (1)  I think this module would benefit greatly from a map explaining
>      what each file does, and perhaps from some reorganization. 

Just MAP.txt in the top level directory? Amaury suggested moving
the library into a subdirectory. I'm not sure about that. The
library would be out of sight, but is that a good thing?

> As best I can yet tell, there are about ~130 files, over a dozen directories,
> but the only ones that directly affect the implementation are a subset (~33)
> of the *.c and *h files in Modules/_decimal/ (and not subdirectories).  

Indeed the top level directory contains _decimal.c and all files
from libmpdec. Almost all files are required.

The three subdirectories contain:

  tests/       ->  library tests
  python/      ->  extended module tests
  literature/  ->  pointers to papers and explanations for trickier algorithms.

> Even files that do affect the implementation, such as mpdecimal.c,
> also seem to have functions thrown in just for testing small pieces
> of functionality, such as Newton Division.

That is correct. They were deliberately added in that place because they rely
on a couple of inline functions and I have a policy of testing the exact code
that the original function relies on.

The alternative is to extract all functions needed, move them to the test
directory and hope that the code doesn't get stale. But if you have a better
idea, I'd be glad to hear it: I don't like the test functions in that place
either.

The reason that Newton Division is tested for ridiculously small precisions
like prec=1 is that it should pass IBM's test suite just like the regular
division function. (Also, small precisions are most likely to expose problems).

> There may also be some code that really isn't needed, except possibly for
> backwards compatibility, and could be #ifdef'ed or at least commented.

I'm not aware of any except for whole files:

  mpsignal.c  ->  signaling wrappers for the mpdecimal.c functions,
                  not needed for _decimal.c but part of libmpdec.

  mptest.h    ->  header for running the tests.

  bench.c     ->  library benchmark.

> Turkish un/dotted-i problem when lowercasing -- but why is a decimal
> library even worried about casing?

"Infinity", "InFinItY", "iNF" are all allowed by the specification.

> (2)  Is assembly allowed?

I was under the assumption that it is allowed:

Python/pymath.c:23:    __asm__ __volatile__ ("fnstcw %0" : "=m" (cw));
Python/pymath.c:28:    __asm__ __volatile__ ("fldcw %0" : : "m" (cw));
Python/ceval.c:43:    asm volatile ("mftbu %0" : "=r" (tbu) );
Python/ceval.c:44:    asm volatile ("mftb  %0" : "=r" (tb)  );
Python/ceval.c:45:    asm volatile ("mftbu %0" : "=r" (tbu2));
Python/ceval.c:59:     __asm__ __volatile__("rdtsc" : "=A" (val))
Python/ceval.c:69:    __asm__ __volatile__("rdtsc" : \

> If not, please make it clear that vcdiv64.asm is just an optional speedup,
> and that the code doesn't rely upon it.

No code relies on asm. Assembly is only used for the double word mul/divmod
primitives in typearith.h and the Pentium PRO modular multiplication in
umodarith.h, and there are ANSI versions for everything.

The library really compiles with any compiler I have tested, including
compilers without uint64_t like CompCert (CompCert does not compile
Python for example, but for other reasons).

> (3) Are there parts of this library that provide functionality NOT
>     in the current decimal library?  If so, this should be at least
>     documented, and perhaps either removed or exposed.

Apart from mpsignal.c (see above), there are probably a couple of things
in the header files like mpd_invroot(). _mpd_qinvroot() from mpdecimal.c
*is* needed because the square root is calculated in terms of the
inverse square root.

Are these (probably) minor instances of additional functionality a
big problem for you? Because for me it would be a hassle to maintain
diverging versions of libmpdec and the Python version of libmpdec.

This is also related to testing: The complete test suite (all tests
against decNumber and decimal.py) under Valgrind takes 8 months to
run.

My question is: Where should I document these things and in what detail?
msg155036 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-06 20:14
STINNER Victor <report@bugs.python.org> wrote:
> How can I help to integrate this module into CPython?

It would be fantastic if you could take a look at _decimal.c, for example
to find some incompatibilities between _decimal.c and decimal.py.

mpdecimal.c could also always profit from another audit.

That's the only file that still needs to go through my second
self-audit round.
msg155045 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-03-07 03:49
The scripts for generating code would preferably go in a Tools/decimal directory.
msg155046 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2012-03-07 04:03
On Tue, Mar 6, 2012 at 3:07 PM, Stefan Krah

> Jim Jewett <report@bugs.python.org> wrote:
>> (1)  I think this module would benefit greatly from a map explaining
>>      what each file does, and perhaps from some reorganization.

> Just MAP.txt in the top level directory?

That should work.  There may be better names, but I can't think of any just now.

> Amaury suggested moving
> the library into a subdirectory. I'm not sure about that. The
> library would be out of sight, but is that a good thing?

cdecimal certainly needs a subdirectory similar to those for _io,
_ctypes, _multiprocessing, and _sqlite.

It *may* make sense to move some of the subdirectories around.  (On
the other hand, it may not; if the tests in Lib/test/ end up
delegating back, that is probably OK.)

I believe it would be helpful to move non-code (project files, etc) to
separate directories.

Whether you need *additional* subdirectories within _cdecimal to
subcategorize the .c and .h files, I'm not sure -- because I didn't
get in deep enough to know what they should be.  If the categorization
let people focus on the core, that would be helpful, but it wasn't
clear to me which files were part of the exported API and which were
implementation details.  Are there are clear distinctions (type
info/python bindings/basic arithmetic/advanced
algorithms/internal-use-only/???)

>> As best I can yet tell, there are about ~130 files, over a dozen directories,
>> but the only ones that directly affect the implementation are a subset (~33)
>> of the *.c and *h files in Modules/_decimal/ (and not subdirectories).

> Indeed the top level directory contains _decimal.c and all files
> from libmpdec. Almost all files are required.

Would it make sense to integrate only cdecimal, and to treat libmpdec
as an external dependency that (usually) gets updated with each Python
feature release, the way that sqlite is?

> The three subdirectories contain:

>  tests/       ->  library tests
>  python/      ->  extended module tests

I would really expect that to still be under tests, and I would expect
a directory called python to contain code written in python, or at
least python bindings.

>  literature/  ->  pointers to papers and explanations for trickier algorithms.

>> Even files that do affect the implementation, such as mpdecimal.c,
>> also seem to have functions thrown in just for testing small pieces
>> of functionality, such as Newton Division.

> That is correct. They were deliberately added in that place because they rely
> on a couple of inline functions and I have a policy of testing the exact code
> that the original function relies on.

How important is it that these functions be inline?

Would it at least be OK to wrap them in stubs for exporting, so that
the test logic could be places with the others tests?  (I worry that
some tests may stop getting run if someone else modifies the build
process and doesn't notice the unusual location.)

> The alternative is to extract all functions needed, move them to the test
> directory and hope that the code doesn't get stale.

I agree that copying is bad.

I'll trust your judgement on the need for inline.  But given:

    ALWAYS_INLINE int
    mpd_word_digits(mpd_uint_t word)

I don't see anything wrong with exporting:

    int
    _testhelp_mpd_word_digits(mpd_uint_t word) {
        return mpd_word_digits(word);
    }

>> Turkish un/dotted-i problem when lowercasing -- but why is a decimal
>> library even worried about casing?

> "Infinity", "InFinItY", "iNF" are all allowed by the specification.

OK; so is io.c part of the library, or part of the python binding?

Given that this is targeted at 3.3 or later, would it make sense to
either use casefolding, or check the kind?  (If it isn't ASCII, it
can't be the word "INF".)

Are there only a certain number of strings that will ever matter, such
as INF, NAN, and INFINITY, so that a case statement would work?
tolower() with an extra check for the turkish undotted lower case i?
What you have may well be the best compromise, but it bothers me to
see text processing tools redone in a numeric type -- particularly
without knowing why they are needed.

>> (2)  Is assembly allowed?

> I was under the assumption that it is allowed:

I'm honestly not sure, but I think that was one of the reasons
stackless was never integrated.

>> If not, please make it clear that vcdiv64.asm is just an optional speedup,
>> and that the code doesn't rely upon it.

> No code relies on asm. Assembly is only used for the double word mul/divmod
> primitives in typearith.h and the Pentium PRO modular multiplication in
> umodarith.h, and there are ANSI versions for everything.

Good enough, though I would rather see that as a comment near the assembly.

>> (3) Are there parts of this library that provide functionality NOT
>>     in the current decimal library?  If so, this should be at least
>>     documented, and perhaps either removed or exposed.

> Apart from mpsignal.c (see above), there are probably a couple of things
> in the header files like mpd_invroot(). _mpd_qinvroot() from mpdecimal.c
> *is* needed because the square root is calculated in terms of the
> inverse square root.

> Are these (probably) minor instances of additional functionality a
> big problem for you? Because for me it would be a hassle to maintain
> diverging versions of libmpdec and the Python version of libmpdec.

I'm not worried about the header files.  I am worried about what is
exposed to python, but just documenting it (docstrings and the module
.rst) may be OK.

But I'm also worried that there may be fair amounts of code that are
effectively dead after the "remove any names not in decimal.py"
importing trick.  If so, I would at least like that in some sort of
#ifdef, so that people don't spend too much time trying to make sense
of it.

That said, if you plan to maintain an external libmpdec regardless of
what happens, then it makes even more sense to integrate (at most) the
bindings, and to treat libmpdec as an external dependency.
msg155047 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-03-07 04:09
Speaking of inline, the "inline" keyword will have to go because it's not C89.
msg155050 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2012-03-07 06:40
But we could check if the compiler supports the inline keyword and use it if available.
msg155059 - (view) Author: Case Van Horsen (casevh) Date: 2012-03-07 08:30
I've found some differences between decimal and cdecimal. 

cdecimal 2.3 does not support the __ceil__ and __floor__ methods that exist in decimal. math.ceil converts a cdecimal.Decimal instance into a float before finding the ceiling. This can generate incorrect results.

>>> import decimal
>>> import math
>>> math.ceil(decimal.Decimal("12345678901234567890.1"))
12345678901234567168

The decimal module in previous versions returns the correct answer 12345678901234567891

cdecimal.Decimal instances do not emulate the various single-underscore methods of a decimal.Decimal instance. In gmpy2, I use _int, _exp, _sign, and _is_special to convert a decimal.Decimal into an exact fraction. I realize the issue is with gmpy2 and I will fix gmpy2, but there may be other code that uses those methods.
msg155070 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 10:28
Jim Jewett <report@bugs.python.org> wrote:
> Whether you need *additional* subdirectories within _cdecimal to
> subcategorize the .c and .h files, I'm not sure -- because I didn't
> get in deep enough to know what they should be.  If the categorization
> let people focus on the core, that would be helpful, but it wasn't
> clear to me which files were part of the exported API and which were
> implementation details.  Are there are clear distinctions (type
> info/python bindings/basic arithmetic/advanced
> algorithms/internal-use-only/???)

OK, as a basis for discussion I've added:

http://hg.python.org/features/cdecimal/file/8b75c2825508/Modules/_decimal/FILEMAP.txt

I didn't mention the main reason why _decimal.c and libmpdec are in a flat
directory: Building the library first and then the module from the library
led to problems on at least Windows and AIX. That's why I started to treat
all libmpdec files as part of the module, list them as dependencies in setup.py
and let distutils figure everything out. Distutils also can figure out
automatically if a Mac OS build happens to be a "universal" build and things
like that.

The build process is very well tested by now and it took quite a while
to figure everything out, so I'd be reluctant to change the flat hierarchy.

> > ??python/ ?? ?? ??-> ??extended module tests
> 
> I would really expect that to still be under tests, and I would expect
> a directory called python to contain code written in python, or at
> least python bindings.

Could you explain? The python/ directory contains deccheck.py, formathelper.py
etc.

> Would it at least be OK to wrap them in stubs for exporting, so that
> the test logic could be places with the others tests?  (I worry that
> some tests may stop getting run if someone else modifies the build
> process and doesn't notice the unusual location.)

tests/runtest.c won't compile then. I'll look into the stub and also
the _testhelp suggestions.

> > "Infinity", "InFinItY", "iNF" are all allowed by the specification.
> 
> OK; so is io.c part of the library, or part of the python binding?

I see a potential source of confusion: io.c is firmly part of the
library. All PEP 3101 formatting is part of libmpdec, because I like
the mini language. io.c only understands ASCII and UTF-8 fill characters.

It is the *library* tests that would fail under the Turkish locale
(if not for _mpd_strneq).

> Good enough, though I would rather see that as a comment near the assembly.

Comments how to enforce an ANSI build (much slower!) are in LIBTEST.txt
and now also in FILEMAP.txt.

> I'm not worried about the header files.  I am worried about what is
> exposed to python, but just documenting it (docstrings and the module
> .rst) may be OK.
> 
> But I'm also worried that there may be fair amounts of code that are
> effectively dead after the "remove any names not in decimal.py"
> importing trick.  If so, I would at least like that in some sort of
> #ifdef, so that people don't spend too much time trying to make sense
> of it.

It's the opposite: names from decimal.py starting with an underscore
that are not in _decimal are removed. If I don't use that trick, I
end up with about 50 additional symbols from decimal.py:

import decimal # the C version
dir(decimal)

... '_ContextManager', '_Infinity', '_Log10Memoize', ...
msg155071 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2012-03-07 10:37
Does the C version have a C API importable as capsule ?
If not, could you add one and a decimal.h to go with it ?

This makes integration in 3rd party modules a lot easier.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

________________________________________________________________________
2012-02-13: Released eGenix pyOpenSSL 0.13        http://egenix.com/go26
2012-02-09: Released mxODBC.Zope.DA 2.0.2         http://egenix.com/go25

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
msg155079 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 12:18
Benjamin Peterson <report@bugs.python.org> wrote:
> The scripts for generating code would preferably go in a Tools/decimal directory.

Hmm, do you mean the gen*.py scripts? The output is generated by decimal.py
and used for testing libmpdec. While the syntax resembles that of the
*.decTest files, only tests/runtest can handle the format.
msg155082 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 12:29
Benjamin Peterson <report@bugs.python.org> wrote:
> Speaking of inline, the "inline" keyword will have to go because it's not C89.

Do you happen to know a free compiler that builds Python but does not
understand "inline"? I'm asking because without testing you can never
really be sure:

For example I added support for compilers without uint64_t, but
all major compilers (gcc, suncc, icc, VS) of course have uint64_t.

Then I finally found CompCert, and discovered that a couple of
things were missing in the headers.
msg155083 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-07 12:43
> Do you happen to know a free compiler that builds Python but does not
> understand "inline"? I'm asking because without testing you can never
> really be sure:

You could use Py_LOCAL_INLINE, but most compilers should inline small
functions automatically, AFAIK.
msg155085 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 13:22
Case Van Horsen <report@bugs.python.org> wrote:
> cdecimal 2.3 does not support the __ceil__ and __floor__

Thanks. I'll look into that.

> cdecimal.Decimal instances do not emulate the various single-underscore methods of a decimal.Decimal instance. In gmpy2, I use _int, _exp, _sign, and _is_special to convert a decimal.Decimal into an exact fraction. I realize the issue is with gmpy2 and I will fix gmpy2, but there may be other code that uses those methods.

There seems to be a real need for getting (sign, coeff, exp). I think
psycopg2 uses a painful way to get an integer coefficient via as_tuple().
What should really be added is either as_triple(), which would return
(sign, coeff, exp) with an integer coefficient or make these attributes
official:

Decimal.sign
Decimal.coeff
Decimal.exp

I have to think about implementing Decimal._int etc. Somehow I feel that
doing so would send a wrong signal: People shouldn't assume that they'll
get away with using private methods and attributes.

Also, of course they'll keep using these private methods until they
are finally deprecated, and *then* they'll have to change things.
msg155086 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 13:49
Antoine Pitrou <report@bugs.python.org> wrote:
> You could use Py_LOCAL_INLINE, but most compilers should inline small
> functions automatically, AFAIK.

At the time I wrote it I benchmarked everything. I'm pretty sure
that gcc did not inline larger functions like mpd_qresize_zero()
and mpd_word_digits() and even some smaller ones that are declared
ALWAYS_INLINE. Also, the static inline functions in the header
files are absolutely crucial for speed.

I recall that Mark initially said that in the Modules hierarchy
not every module would need to compile. Now, _decimal is already
tested with gcc, clang, icc, suncc, Visual Studio, and success
has been reported with xlc[1]. CompCert compiles libmpdec but not
Python.

_ctypes does not compile with icc and suncc. Unlike _ctypes, _decimal
has a fallback in the form of decimal.py. So, perhaps as an alternative
we could leave the inlines and wait for build failure reports?

[1] compilation success, one of the numerous AIX issues on
    bugs.python.org with loading the module was encountered IIRC.
msg155087 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-07 13:52
> _ctypes does not compile with icc and suncc. Unlike _ctypes, _decimal
> has a fallback in the form of decimal.py. So, perhaps as an alternative
> we could leave the inlines and wait for build failure reports?

Sounds good to me.
msg155088 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 13:54
Jim Jewett <report@bugs.python.org> wrote:
> implementation details.  Are there are clear distinctions (type
> info/python bindings/basic arithmetic/advanced
> algorithms/internal-use-only/???)

I failed to mention that libmpdec also has complete documentation. Perhaps
that can answer some questions:

http://www.bytereef.org/mpdecimal/doc/libmpdec/index.html
http://www.bytereef.org/mpdecimal/doc/libmpdec/functions.html#quiet-functions
msg155093 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 15:32
Marc-Andre Lemburg <report@bugs.python.org> wrote:
> Does the C version have a C API importable as capsule ?

Not yet. I'll try to make a list with proposed function names and post it here.

> If not, could you add one and a decimal.h to go with it ?

Yes, sure.
msg155095 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-07 16:17
This issue was raised by Jim on Rietveld:

Currently, the order of arguments in Context.__init__() differs
from repr(Context):

>>> Context()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, flags=[], traps=[DivisionByZero, Overflow, InvalidOperation])
>>> 
>>> Context(28, ROUND_HALF_EVEN, -999999999, 999999999, 1, [], [DivisionByZero, Overflow, InvalidOperation])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/decimal.py", line 3774, in __init__
    flags = dict([(s, int(s in flags)) for s in _signals])
  File "/usr/lib/python3.2/decimal.py", line 3774, in <listcomp>
    flags = dict([(s, int(s in flags)) for s in _signals])
TypeError: argument of type 'int' is not iterable
>>> 


I find this quite ugly. The repr() order is the good one, since
it moves the flag dictionaries to the end. I wanted to change
the order in Context.__init__() to match repr(), but I'm not
sure if such a change is possible.

I don't think Python code would initialize a context without keywords,
but C extensions might.
msg155107 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2012-03-07 19:42
On Wed, Mar 7, 2012 at 5:28 AM, Stefan Krah
<stefan-usenet@bytereef.org> added the comment:

> OK, as a basis for discussion I've added:
> http://hg.python.org/features/cdecimal/file/8b75c2825508/Modules/_decimal/FILEMAP.txt

That is indeed very helpful.  So helpful that now I understand well
enough to have additional gripes.  :D

Starting from that URL, I don't actually find setup.py.

I am assuming (but would prefer to have the file explicitly state)
that _decimal.c and docstrings.h are the only source files, and that
setup.py would be the only build infrastructure needed if you already
had libmpdec.a.

I'm not sure what sort of failures building in the normal order led
to, but that is certainly something worth documenting, and (ideally)
fixing.

I didn't see any mention of the literature subdirectory, which makes
me wonder if there were other files not listed in the map.  (Not yet
curious enough to verify for myself, though.)  (*.txt files?)

I suppose a subdirectory name like "python" makes sense when you look
at the library as a C/C++ project that happens to provide python
bindings; as part of the python core, it is misleading.  It sounds
like it should be named "extended_tests" or some such.  (Note that
this assumes it is strictly for tests; if you are also using it to
provide extra functionality, or to generate some of the source code,
then I agree with Benjamin that it should move to tools, and the
generated code should have clear comments right at the top warning
that it is generated.)

Within the library, does io.[ch] really limit itself to ASCII?  If so,
then I don't know why you're worried about the Turkish i.  If you mean
generic text, then please don't specify ASCII.

Within memory.[ch], how much of this configurability is useful (or
even available) to a python user, as opposed to an extension writer?
Or is it really just for testing?  As in, would it be reasonable to
just have a single header with a half-dozen #defines, but to replace
that header when doing a memory-test build?  Or at least to hide the
alternative function definitions inside an obvious #ifdef, so that
they don't take up memory and attention when they aren't applicable?

Under the Bignum section, it mentions that functions from these files
are ultimately used in _mpd_fntmul, but doesn't mention where that is
(in the main _cdecimal.c)

Also, when it talks about "large arrays", could you clarify that it
isn't talking about arrays of values or even matrixes, it is just
talking about numbers large enough that even representing them takes
at least N bytes?

>> Would it at least be OK to wrap them in stubs for exporting, so that
>> the test logic could be places with the others tests?  (I worry that
>> some tests may stop getting run if someone else modifies the build
>> process and doesn't notice the unusual location.)

> tests/runtest.c won't compile then. I'll look into the stub and also
> the _testhelp suggestions.

OK, let me rephrase.  Is newton division exported to users, or used
internally, or is it just for testing purposes?

_mpd_qtest_newtondiv is documented as a testcase; I would rather see
it move to a test file.  Why can't it?  If it is because of
_mpd_qdiv_inf, _settriple, _mpd_qbarrett_divmod, _mpd_real_size (the
only apparently internal things it uses), then why are these internal?
 Could they be exposed at least to test functions?  If not, could they
at least be wrapped in something that could exposed, such as
_testhelp_mpd_qdiv_inf?

>> > "Infinity", "InFinItY", "iNF" are all allowed by the specification.

>> OK; so is io.c part of the library, or part of the python binding?

> I see a potential source of confusion: io.c is firmly part of the
> library.

[And should therefore be available when used without python?]

> All PEP 3101 formatting is part of libmpdec, because I like
> the mini language. io.c only understands ASCII and UTF-8 fill characters.

> It is the *library* tests that would fail under the Turkish locale
> (if not for _mpd_strneq).

Are those tests relevant to a _cdecimal built in to python itself?  If
not, then the comment is at least misleading.  Would it make sense to
have a directory for files that are used only for the standalone C
library, but not when built as part of python?

>> Good enough, though I would rather see that as a comment near the assembly.

> Comments how to enforce an ANSI build (much slower!) are in LIBTEST.txt
> and now also in FILEMAP.txt.

I would not have made the leap from that to "What to do if the
assembly code needs to be replaced or even just changed."
msg155135 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-08 00:06
Jim Jewett <report@bugs.python.org> wrote:
> > OK, as a basis for discussion I've added:
> > http://hg.python.org/features/cdecimal/file/8b75c2825508/Modules/_decimal/FILEMAP.txt
> 
> Starting from that URL, I don't actually find setup.py.

It's the setup.py from the Python top level directory.

> I'm not sure what sort of failures building in the normal order led
> to, but that is certainly something worth documenting, and (ideally)
> fixing.

I do not have access to AIX. On Windows the failures were locale related
when mixing the dynamic and static runtimes.

> I didn't see any mention of the literature subdirectory, which makes
> me wonder if there were other files not listed in the map.  (Not yet
> curious enough to verify for myself, though.)  (*.txt files?)

FILEMAP.txt was intended to get people started, not to be a polished work.

> I suppose a subdirectory name like "python" makes sense when you look
> at the library as a C/C++ project that happens to provide python
> bindings; as part of the python core, it is misleading.

OK.

> provide extra functionality, or to generate some of the source code,

There is no source code generation. I'm not sure where this idea comes
from. genlocale.py e.g. has this comment:

Usage:  ../../../python genlocale.py | ../tests/runtest -

> Within the library, does io.[ch] really limit itself to ASCII?  If so,
> then I don't know why you're worried about the Turkish i.  If you mean
> generic text, then please don't specify ASCII.

I do mean ASCII. Please run this gdb session:

diff --git a/Modules/_decimal/io.c b/Modules/_decimal/io.c
--- a/Modules/_decimal/io.c
+++ b/Modules/_decimal/io.c
@@ -245,7 +245,7 @@
                if (digits > (size_t)(ctx->prec-ctx->clamp))
                        goto conversion_error;
        }
-       else if (_mpd_strneq(s, "inf", "INF", 3)) {
+       else if (strncasecmp(s, "inf", 3) == 0) {
                s += 3;
                if (*s == '\0' || _mpd_strneq(s, "inity", "INITY", 6)) {
                        /* numeric-value: infinity */

gdb ../../python
b mpd_qset_string
r
>>> locale.setlocale(locale.LC_ALL, 'tr_TR.utf8')
'tr_TR.utf8'
>>> from decimal import *
>>> Decimal("Infinity")

(gdb) 
248             else if (strncasecmp(s, "inf", 3) == 0) {
(gdb) p s
$1 = 0x7ffff7191280 "Infinity"
(gdb) p s[0]
$2 = 73 'I'
(gdb) n
259                     if ((coeff = scan_dpoint_exp(s, &dpoint, &exp, &end)) == NULL)

Clearly 'I' is ASCII and strncasecmp fails, exactly as written in the
comment above _mpd_strneq().

> Within memory.[ch], how much of this configurability is useful (or
> even available) to a python user, as opposed to an extension writer?

Since it is in the libmpdec section, "User" refers to the extension writer.
I can simply drop the "User".

> Under the Bignum section, it mentions that functions from these files
> are ultimately used in _mpd_fntmul, but doesn't mention where that is
> (in the main _cdecimal.c)

OK.

> Also, when it talks about "large arrays", could you clarify that it
> isn't talking about arrays of values or even matrixes, it is just
> talking about numbers large enough that even representing them takes
> at least N bytes?

But it is referring to abstract sequences or arrays of values less
than a certain prime. These values happen to be the coefficient of
an mpd_t, but you could perform the transform on any sequence.

I thought 'Bignum' would already imply an array of machine words
representing a number.

> >> Would it at least be OK to wrap them in stubs for exporting, so that
> >> the test logic could be places with the others tests? ??(I worry that
> >> some tests may stop getting run if someone else modifies the build
> >> process and doesn't notice the unusual location.)
> 
> > tests/runtest.c won't compile then. I'll look into the stub and also
> > the _testhelp suggestions.
> 
> OK, let me rephrase.  Is newton division exported to users, or used
> internally, or is it just for testing purposes?

It's used internally as _mpd_qbarrett_divmod(). When the coefficient has
more than MPD_NEWTONDIV_CUTOFF words, the division functions dispatch
to _mpd_qbarrett_divmod().

> _mpd_qtest_newtondiv is documented as a testcase; I would rather see
> it move to a test file.  Why can't it?

I said in my last mail that I'll look into it.

> >> > "Infinity", "InFinItY", "iNF" are all allowed by the specification.
> 
> >> OK; so is io.c part of the library, or part of the python binding?
> 
> > I see a potential source of confusion: io.c is firmly part of the
> > library.
> 
> [And should therefore be available when used without python?]

I meant: Despite the fact that io.c might /appear/ to be specifically
written for the module because of the presence of PEP 3101 references,
it is part of the standalone (moduleless) library. However, _decimal.c
uses all parts of io.c except for a couple of functions at the bottom
of the file that are useful for debugging.
msg155331 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-10 18:01
Here's a new patch that addresses several remarks:

  o _decimal now has __floor__ and __ceil__methods.

  o libmpdec is now in a separate directory.

  o I removed the big libmpdec test suite. This is why:

     - It reduces the number of files that has been criticized multiple times.

     - Probably I'll be the only one who will be running the test suite.

     - The test suite is still on PyPI.

     - The test suite is so large that it's a magnet for toolchain bugs.
       For example, it found a bug in suncc (acknowledged and fixed by Sun)
       and a bug in CompCert (acknowledged and fixed by Xavier Leroy).

       I'm really worried that *if* someone runs the tests and finds
       an obscure bug, I'll have to spend hours debugging something
       that might ultimately be an external bug.

  o Renamed "python" directory into "tests".

  o Test functions in mpdecimal.c are removed.

  o All files in libmpdec that aren't required for _decimal are removed.

  o Extra functionality in _decimal.c is commented out. This does not
    yet cover the FloatOperation signal, which I would like to add
    to decimal.py as well (I'll ask later on python-dev).
msg155359 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * Date: 2012-03-10 22:57
Please add --with-system-libmpdec (or --with-system-mpdecimal) option in `configure`, similarly to --with-system-expat and --with-system-ffi options.
msg155381 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-11 11:05
Benjamin Peterson <report@bugs.python.org> wrote:
> Speaking of inline, the "inline" keyword will have to go because it's not C89.

Actually the trickier instances of "inline" in the .c files are already
suppressed when LEGACY_COMPILER (i.e. C89) is defined. I've now listed the
machine options here:

http://hg.python.org/features/cdecimal/file/0f032cda94aa/Modules/_decimal/README.txt

As I now remember, that was in fact necessary for CompCert. The "static inline"
instances in header files might not be a problem even for embedded compilers,
see e.g.:

http://embeddedgurus.com/barr-code/2011/03/do-inline-function-bodies-belong-in-c-header-files/

IIRC also the Linux kernel uses "static inline" in header files.
msg155419 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-11 23:18
FWIW, I think we would be better off if this patch were merged in soon.  Waiting until later in the release cycle risks introducing bugs that we won't have time to notice or fix.  An early merge lets more people exercise the code.
msg155644 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-03-13 18:44
Stefan, please feel free to commit the latest patch.
msg155649 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-03-13 19:34
> FWIW, I think we would be better off if this patch were merged in soon.

+1
msg155743 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-14 11:19
Mark Dickinson <report@bugs.python.org> wrote:
> > FWIW, I think we would be better off if this patch were merged in soon.
> 
> +1

OK, I'm counting three +1 for merging soon. Thanks everyone for the
encouragement!

I'll then proceed with the merge this weekend or shortly after,
with one caveat:

My *second* self-review of mpdecimal.c is currently at 17%, but I can
as well continue with that if the whole thing is merged. I'll probably
replace the caching of ln(10) (not thread-safe, protected by the GIL)
with a real thread-safe version.
msg155744 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-14 11:38
Arfrever Frehtes Taifersar Arahesis <report@bugs.python.org> wrote:
> Please add --with-system-libmpdec (or --with-system-mpdecimal) option in
> `configure`, similarly to --with-system-expat and --with-system-ffi options.

I've added that to the list of issues. However, if there is any change in
the behavior of decimal.py that also applies to the library, the libmpdec
shipped with Python will be silently updated.

I can probably release a matching external libmpdec with every major Python
release, but I can't promise to track all minor releases.

This would mean that for Python-3.3 the yet unreleased mpdecimal-2.4,
hopefully with thread-safe mpd_pow(), mpd_ln() and mpd_log10() should
be used.
msg156402 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-20 12:49
We need to decide what to do with the different limits of the 64-bit and
32-bit versions:

                   MAX_EMAX    

default context     10**9-1     
64-bit             10**18-1
32-bit            425000000   


I think it would be annoying to have the values in DefaultContext, 
ExtendedContext and BasicContext depend on the machine.

The best thing might be to use Emax=10**8-1 and Emin=-(10**8-1) throughout.
I don't think many applications depend on having Emax=10**9-1. If they do,
they'll have to use the 64-bit version.
msg156480 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-21 08:03
> The best thing might be to use Emax=10**8-1 and Emin=-(10**8-1) throughout.
> I don't think many applications depend on having Emax=10**9-1. If they do,
> they'll have to use the 64-bit version.

10**6-1 would be another option. The advantage is that it's visually
obvious that the default exponents have changed.
msg156500 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-21 17:27
New changeset 7355550d5357 by Stefan Krah in branch 'default':
Issue #7652: Integrate the decimal floating point libmpdec library to speed
http://hg.python.org/cpython/rev/7355550d5357
msg156671 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-23 18:07
New changeset f6d646e30028 by Stefan Krah in branch 'default':
Issue #7652: Enable linking of _decimal.so against an installed libmpdec.
http://hg.python.org/cpython/rev/f6d646e30028
msg156890 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-03-27 01:09
You may now close the issue.
msg156952 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-03-27 18:57
I have a couple of questions about the proposed capsule C API. In order to
be useful, it should be possible to set temporary contexts and pass them
to functions. For example, PyDec_Add could look like:

   PyDec_Add(PyObject *a, PyObject *b, PyObject *context);


To stay sane, either access macros or access functions to the context would
need to be provided:

  CTX(workcontext)->prec = 100;
  PyDec_SetPrec(workcontext, 100);
  PyDecContext_SetPrec(workcontext, 100);


Probably flags would need to be checked at some point:

  if (CTX(workcontext)->status & MPD_Underflow)
  if (PyDec_GetStatus(workcontext) & MPD_Underflow)
  if (PyDecContext_GetStatus(workcontext) & MPD_Underflow)



I wonder if users would not be better off if libmpdec functions and
access macros were exposed instead. The big advantage is that errors
(even malloc errors) propagate in the form of NaNs, so it is not
necessary to do repeated error checking.

Also, I find things like (status & MPD_Underflow) more readable than
the alternatives above.


I've attached a short (fake) example to demonstrate what I mean.
msg164470 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-01 10:06
I've opened #15237 for the capsule API. I didn't add everyone to the
nosy list, since I suspect it's not of general interest.
msg165329 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-12 19:18
New changeset afdb0e1a9dac by Stefan Krah in branch 'default':
Issue #7652: Clean up _mpd_qinvroot() and mark it LIBMPDEC_ONLY. Use the
http://hg.python.org/cpython/rev/afdb0e1a9dac
msg165330 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-12 19:40
I switched the algorithm in mpd_qsqrt() to the one from decimal.py.
Previously the square root was calculated in terms of 1/invsqrt(x).

Curiously enough this scheme _always_ seems to produce exact results
when expected, but I don't have a proof. I remember I left this in because the specification gives some leeway with respect to exact
results:


"Square-root can also be calculated by using the power operation (with a second operand of 0.5). The result in that case will not be exact in most cases, and may not be correctly rounded."


Anyway, the algorithm from decimal.py gives the desired guarantees
and is also faster.


Since we're almost in beta-2 stage, would someone be able to do a
post commit review of mpd_qsqrt()? It should be a direct translation
from the function in decimal.py.
msg165339 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-07-12 22:17
I compared both implementations, and they are the same.

I noticed that on line 7537, the call to mpd_qshiftl() may "goto malloc_error;".  I think there is a memory leak in this case, "mpd_del(&c)" and 2 others lines are skipped.
msg165341 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-12 22:40
Thanks, Amaury! -- "goto malloc_error" should not leak, because there's always
a jump to the "out" label in line 7563.

I use this idiom a lot ever since I saw it in several places in the Linux
kernel. Of course it's a matter of taste.
msg168034 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-12 12:21
Is there some remaining work on this issue? Or can it be closed?
msg168035 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-12 12:52
I'm almost done with my (second) self-review of mpdecimal.c. The only
functions missing are all Karatsuba functions and mpd_qpowmod().

If there are any takers, I would be very happy. For the Karatsuba functions
you'll probably need Roman Maeder's paper, which I could send to anyone
interested. All I can promise is that it's a beautiful formulation of the
algorithm. :)
msg169692 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-02 12:17
My review is done. The Karatsuba function is basically a small stack machine
and very hard to prove formally as far as I can see. The algorithm is cited
in TAOCP and the subdivision is brute force tested for all combinations of
coefficient lengths of the two input operands that are used in libmpdec
(currently 256 < nwords <= 1024).
History
Date User Action Args
2012-09-02 12:18:25skrahsetstatus: open -> closed
2012-09-02 12:17:23skrahsetassignee: skrah
resolution: fixed
messages: + msg169692
stage: patch review -> committed/rejected
2012-08-12 12:52:28skrahsetmessages: + msg168035
2012-08-12 12:21:45hayposetmessages: + msg168034
2012-07-12 22:40:17skrahsetmessages: + msg165341
2012-07-12 22:17:11amaury.forgeotdarcsetmessages: + msg165339
2012-07-12 19:40:11skrahsetmessages: + msg165330
2012-07-12 19:18:54python-devsetmessages: + msg165329
2012-07-01 10:06:19skrahsetmessages: + msg164470
2012-03-27 18:57:32skrahsetfiles: + api-demo.c

messages: + msg156952
2012-03-27 01:09:53hayposetmessages: + msg156890
2012-03-23 18:07:28python-devsetmessages: + msg156671
2012-03-21 17:27:56python-devsetnosy: + python-dev
messages: + msg156500
2012-03-21 11:48:28asvetlovsetnosy: + asvetlov
2012-03-21 08:03:30skrahsetmessages: + msg156480
2012-03-20 12:49:29skrahsetmessages: + msg156402
2012-03-14 11:38:23skrahsetmessages: + msg155744
2012-03-14 11:19:15skrahsetmessages: + msg155743
2012-03-13 19:34:15mark.dickinsonsetmessages: + msg155649
2012-03-13 18:44:42benjamin.petersonsetmessages: + msg155644
2012-03-11 23:18:52rhettingersetmessages: + msg155419
2012-03-11 11:05:20skrahsetmessages: + msg155381
2012-03-10 22:57:25Arfreversetnosy: + Arfrever
messages: + msg155359
2012-03-10 18:01:11skrahsetmessages: + msg155331
2012-03-10 17:59:29skrahsetfiles: + 9b3b1f5c4072.diff
2012-03-08 00:06:14skrahsetmessages: + msg155135
2012-03-07 19:42:16Jim.Jewettsetmessages: + msg155107
2012-03-07 16:17:44skrahsetmessages: + msg155095
versions: + Python 3.3, - Python 3.2
2012-03-07 15:32:26skrahsetmessages: + msg155093
2012-03-07 13:54:22skrahsetmessages: + msg155088
2012-03-07 13:52:06pitrousetmessages: + msg155087
2012-03-07 13:49:44skrahsetmessages: + msg155086
2012-03-07 13:22:59skrahsetmessages: + msg155085
2012-03-07 12:43:14pitrousetmessages: + msg155083
2012-03-07 12:29:10skrahsetmessages: + msg155082
2012-03-07 12:18:46skrahsetmessages: + msg155079
2012-03-07 10:37:54lemburgsetnosy: + lemburg
messages: + msg155071
2012-03-07 10:28:57skrahsetmessages: + msg155070
2012-03-07 08:30:01casevhsetmessages: + msg155059
2012-03-07 06:40:52Ramchandra Aptesetnosy: + Ramchandra Apte
messages: + msg155050
2012-03-07 04:09:42benjamin.petersonsetmessages: + msg155047
2012-03-07 04:04:03Jim.Jewettsetmessages: + msg155046
2012-03-07 03:49:17benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg155045
2012-03-06 20:14:52skrahsetmessages: + msg155036
2012-03-06 20:07:34skrahsetmessages: + msg155034
2012-03-06 18:36:04brian.curtinsetnosy: - brian.curtin
2012-03-06 18:23:11Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg155031
2012-03-05 23:47:19rhettingersetmessages: + msg154992
2012-03-05 23:15:54hayposetmessages: + msg154988
2012-02-23 17:33:52skrahsetmessages: + msg154074
2012-02-23 17:32:29skrahsetfiles: + 40917e4b51aa.diff
2012-02-16 22:28:10skrahsetmessages: + msg153507
2012-02-16 22:25:53skrahsetmessages: + msg153506
2012-02-15 23:14:23hayposetmessages: + msg153447
2011-12-16 08:51:37skrahsetmessages: + msg149600
2011-12-15 17:33:09Amaury.Forgeot.d'Arcsetnosy: + Amaury.Forgeot.d'Arc
messages: + msg149567
2011-12-15 14:12:24pitrousetmessages: + msg149559
2011-12-15 14:03:28skrahsetmessages: + msg149557
2011-12-15 14:00:51skrahsetfiles: + ppro-mulmod.txt

messages: + msg149556
2011-12-14 21:13:56skrahsetfiles: + be8a59fcba49.diff
2011-12-01 13:38:13skrahsetmessages: + msg148721
2011-11-30 21:28:32skrahsetmessages: + msg148691
2011-11-30 21:26:39skrahsetfiles: + bba956250186.diff
2011-11-30 21:15:44skrahsetmessages: + msg148690
2011-11-30 21:13:43hayposetmessages: + msg148689
2011-11-30 21:12:36skrahsetmessages: + msg148688
2011-11-30 21:10:27skrahsetmessages: + msg148687
2011-11-30 20:18:16amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg148682
2011-11-30 19:10:01rhettingersetmessages: + msg148680
2011-11-30 17:27:11pitrousetnosy: + pitrou
messages: + msg148678
2011-11-30 17:17:36skrahsetmessages: + msg148677
2011-11-30 16:05:23skrahsetmessages: + msg148669
2011-11-30 13:03:29mark.dickinsonsetmessages: + msg148652
2011-11-30 12:40:43hayposetmessages: + msg148650
2011-11-24 23:56:08cedsetnosy: + ced
2011-09-29 19:30:06eric.snowsetnosy: + eric.snow
2011-06-19 22:40:24hayposetnosy: + haypo
2011-06-18 14:20:40skrahsetmessages: + msg138583
2011-06-18 13:36:17skrahsetfiles: + 49433f35a5f8.diff
2011-06-18 13:33:58skrahsetfiles: - 9a10e3232445.diff
2011-06-09 22:56:55skrahsetstage: patch review
2011-06-09 22:34:06skrahsetmessages: + msg138029
2011-06-09 22:23:30skrahsetfiles: + 9a10e3232445.diff
2011-06-09 21:51:39skrahsetfiles: - ad05c2fdb3b2.diff
2011-06-09 21:48:24skrahsetfiles: + ad05c2fdb3b2.diff
2011-06-09 21:45:27skrahsethgrepos: + hgrepo25
2011-06-09 21:22:19skrahsetfiles: - trunk_help_unify.diff
2011-06-09 21:22:15skrahsetfiles: - py3k_help_unify.diff
2010-11-18 12:08:26skrahsetmessages: + msg121452
2010-11-07 12:30:58mark.dickinsonsetmessages: + msg120674
2010-11-03 03:28:06casevhsetnosy: + casevh
messages: + msg120305
2010-01-22 19:57:40skrahsetmessages: + msg98161
2010-01-13 14:17:29skrahsetfiles: + trunk_help_unify.diff
2010-01-13 14:17:02skrahsetfiles: + py3k_help_unify.diff
keywords: + patch
messages: + msg97719
2010-01-08 11:49:58skrahsetmessages: + msg97412
2010-01-08 11:46:14skrahsetmessages: + msg97411
2010-01-07 22:06:06skrahsetmessages: + msg97382
2010-01-07 21:35:45brian.curtinsetnosy: + brian.curtin
messages: + msg97377
2010-01-07 20:58:48mark.dickinsonsetmessages: + msg97373
2010-01-07 20:54:14mark.dickinsonsetmessages: + msg97372
2010-01-07 15:58:21jjcontisetnosy: + jjconti
2010-01-07 13:41:29mark.dickinsonsetmessages: + msg97355
2010-01-07 11:40:47eric.smithsetnosy: + eric.smith
messages: + msg97348
2010-01-07 10:42:50mark.dickinsoncreate