This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Buggy Decimal.__sizeof__
Type: behavior Stage: resolved
Components: Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skrah Nosy List: loewis, mark.dickinson, pitrou, python-dev, skrah
Priority: normal Keywords:

Created on 2012-04-07 10:55 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (10)
msg157726 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-04-07 10:55
I'm not sure __sizeof__ is implemented correctly:

>>> from decimal import Decimal
>>> import sys
>>> d = Decimal(123456789123456798123456789123456798123456789123456798)
>>> d
Decimal('123456789123456798123456789123456798123456789123456798')
>>> sys.getsizeof(d)
24

... looks too small.
msg157729 - Author: Stefan Krah (skrah) (Python committer) Date: 2012-04-07 11:29
It isn't implemented at all. The Python version also always returns 96,
irrespective of the coefficient length. Well, arguably the coefficient
is a separate object in the Python version:

>>> sys.getsizeof(d)
96
>>> sys.getsizeof(d._int)
212

For the C version I'll do the same as in longobject.c.
msg157730 - Author: Stefan Krah (skrah) (Python committer) Date: 2012-04-07 11:32
In full:

>>> d = Decimal(100000000000000000000000000000000000000000000000000000000000000000000)
>>> sys.getsizeof(d)
96
>>> sys.getsizeof(d._int)
212
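To make the "separate object" point concrete: in recent CPython releases the pure-Python implementation is importable as the private module `_pydecimal`, whose `Decimal` keeps the coefficient digits in a `str` held by the private attribute `_int`. The sketch below relies on those internals (an assumption that may not hold in every version) and checks that the object's own size is the same for a 1-digit and a 69-digit value, while the digit string grows and is reachable through `gc.get_referents`. The helper name `inspect_decimal` is hypothetical.

```python
import gc
import sys

# Assumption: CPython ships the pure-Python implementation as the private
# module _pydecimal, which stores the coefficient digits as a str in _int.
import _pydecimal


def inspect_decimal(value):
    """Return (object size, coefficient size, coefficient reachable via gc)."""
    d = _pydecimal.Decimal(value)
    coefficient = d._int  # private attribute: the coefficient's digit string
    # The instance uses __slots__, so its slot values are visible to gc.
    reachable = any(ref is coefficient for ref in gc.get_referents(d))
    return sys.getsizeof(d), sys.getsizeof(coefficient), reachable


small = inspect_decimal(1)
large = inspect_decimal(10**68)
print("small:", small)
print("large:", large)
```

Because the digit string is reachable by reflection, a memory tool can size it separately; only data hidden inside a C struct, as in the C version, has to be folded into the owner's `__sizeof__`.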
msg157798 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2012-04-08 17:30
There are really two options:

a) if an object is a container, and the contained objects are accessible to reflection (preferably through gc.get_referents), then the container shouldn't account for the size of the contained objects.
b) if the contained objects are not accessible (except through sys.getobjects() in a debug build), then the container should report the total sum.

A memory debugger is supposed to find all objects (e.g. through gc.get_objects and gc.get_referents), eliminate duplicate references, and then apply sys.getsizeof to each object. This should then neither leave out any memory nor count any memory twice.
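A minimal sketch of such a memory debugger, assuming it is enough to walk the objects reachable from a set of roots rather than everything returned by gc.get_objects(). The helper name `total_size` is hypothetical; it deduplicates by `id()`, so an object referenced several times is counted once.

```python
import gc
import sys


def total_size(roots):
    """Sum sys.getsizeof over every object reachable from roots,
    counting each object exactly once (deduplicated by id)."""
    seen = set()
    stack = list(roots)
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # already counted via another reference
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        # gc.get_referents only reports what tp_traverse exposes; memory
        # hidden inside a C struct must be counted by the owner's __sizeof__.
        stack.extend(gc.get_referents(obj))
    return total


shared = "x" * 1000
inner = [shared]
container = [shared, shared, inner]
print(total_size([container]))
```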
msg157857 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2012-04-09 16:14
In the C version of decimal, do distinct Decimal objects ever share coefficients?  (This would be an obvious optimization for methods like Decimal.copy_negate;  I don't know whether the C version applies such optimizations.)  If there's potential for shared coefficients, that might make the "not count any memory twice" part tricky.
msg157860 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2012-04-09 16:30
> In the C version of decimal, do distinct Decimal objects ever share
> coefficients?  (This would be an obvious optimization for methods
> like Decimal.copy_negate;  I don't know whether the C version
> applies such optimizations.)  If there's potential for shared
> coefficients, that might make the "not count any memory twice" part
> tricky.

I know of three strategies to deal with such a case:
a) expose the inner objects, preferably through tp_traverse, and
    don't account for them in the container,
b) find a "canonical" owner of the contained objects, and only
    account for them along with the canonical container.
c) compute the number N of shared owners, and divide the object
    size by N. Due to rounding, this may be somewhat incorrect.
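Strategy c) can be sketched as follows. The input format (a mapping from a hypothetical owner name to the objects it contains) and the helper name `attribute_shared` are illustrative assumptions; integer division is used because `sys.getsizeof` reports whole bytes, and that is where the rounding error comes from.

```python
import sys
from collections import Counter


def attribute_shared(owners):
    """Strategy c): split each contained object's size evenly among the
    N distinct owners that reference it, rounding down.

    `owners` is a hypothetical mapping of owner name -> list of contained
    objects; each contained object is counted at most once per owner.
    """
    # Count the distinct owners of every contained object (keyed by id).
    n_owners = Counter()
    for contents in owners.values():
        for obj_id in {id(obj) for obj in contents}:
            n_owners[obj_id] += 1

    shares = {}
    for name, contents in owners.items():
        unique = {id(obj): obj for obj in contents}
        # Integer division: this is where the rounding error creeps in.
        shares[name] = sum(
            sys.getsizeof(obj) // n_owners[obj_id]
            for obj_id, obj in unique.items()
        )
    return shares


blob = bytes(1000)
shares = attribute_shared({"a": [blob], "b": [blob], "c": [blob, blob]})
print(shares, sys.getsizeof(blob))
```

With three owners sharing one object, each gets a third of its size rounded down, so the shares can add up to slightly less than the true size.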
msg157861 - Author: Stefan Krah (skrah) (Python committer) Date: 2012-04-09 16:54
Mark Dickinson <report@bugs.python.org> wrote:
> In the C version of decimal, do distinct Decimal objects ever share coefficients?

The coefficients are members of the mpd_t struct (libmpdec data type),
and they are not exposed as Python objects or shared.

Cache locality is incredibly important: I have a patch that reserves
a static coefficient of four words inside the PyDecObject. This patch
speeds up _decimal by roughly another 30-40% for regularly sized decimals.

If the decimal grows beyond that, libmpdec automatically switches to
a dynamically allocated coefficient.

I think sharing would probably slow things down a bit.
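The static-versus-dynamic coefficient layout described here can be modeled in a few lines. All constants below are assumptions for illustration (a hypothetical base object size, four reserved words, 8-byte words holding 19 decimal digits each), not values read from CPython, and `modeled_sizeof` is a hypothetical helper. The point is that only a dynamically allocated coefficient adds to the reported size.

```python
# Hypothetical parameters for illustration only (not read from CPython):
BASE_SIZE = 100        # size of the object including its static words
STATIC_WORDS = 4       # coefficient words reserved inside the object
WORD_BYTES = 8         # bytes per coefficient word on a 64-bit build
DIGITS_PER_WORD = 19   # decimal digits stored per 64-bit word


def modeled_sizeof(ndigits: int) -> int:
    """Modeled total memory for a decimal with `ndigits` coefficient digits."""
    # Ceiling division without floats; at least one word is always needed.
    words = max(1, (ndigits + DIGITS_PER_WORD - 1) // DIGITS_PER_WORD)
    if words <= STATIC_WORDS:
        # The coefficient fits in the words reserved inside the object,
        # so no extra memory is allocated.
        return BASE_SIZE
    # Otherwise the coefficient moves to a separately allocated array,
    # whose size must be added to the object's own size.
    return BASE_SIZE + words * WORD_BYTES


for n in (1, 76, 77, 1000):
    print(n, modeled_sizeof(n))
```

A real implementation would presumably count the allocated capacity of the dynamic array, which may exceed the minimum number of words computed here.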
msg157864 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2012-04-09 17:14
> and they are not exposed as Python objects or shared.

Okay, thanks.  Sounds like this isn't an issue at the moment then.

+1 for having getsizeof report the total size used.
msg157886 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-04-09 19:33
New changeset 010aa5d955ac by Stefan Krah in branch 'default':
Issue #14520: Add __sizeof__() method to the Decimal object.
http://hg.python.org/cpython/rev/010aa5d955ac
msg157889 - Author: Stefan Krah (skrah) (Python committer) Date: 2012-04-09 19:51
Thanks for the explanations. The new __sizeof__() method should now
report the exact memory usage.
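A quick check of the fixed behavior, assuming a CPython build where the C implementation (`_decimal`) is available; with the pure-Python fallback, `sys.getsizeof` ignores the coefficient. No exact byte counts are given because they depend on platform and version, and small coefficients that fit in statically reserved words may all report the same size.

```python
import sys
from decimal import Decimal

try:
    import _decimal  # noqa: F401  (the C implementation; may be absent)
    HAVE_C_DECIMAL = True
except ImportError:
    HAVE_C_DECIMAL = False

small = Decimal(123)
large = Decimal(10**1000)  # a coefficient of 1001 digits

print("C implementation:", HAVE_C_DECIMAL)
print("small:", sys.getsizeof(small))
print("large:", sys.getsizeof(large))
```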
History
Date                 User            Action  Args
2022-04-11 14:57:28  admin           set     github: 58725
2012-04-09 19:51:51  skrah           set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg157889
                                             stage: resolved
2012-04-09 19:33:35  python-dev      set     nosy: + python-dev
                                             messages: + msg157886
2012-04-09 17:14:18  mark.dickinson  set     messages: + msg157864
2012-04-09 16:54:38  skrah           set     messages: + msg157861
2012-04-09 16:30:24  loewis          set     messages: + msg157860
2012-04-09 16:14:27  mark.dickinson  set     messages: + msg157857
2012-04-09 16:09:11  mark.dickinson  set     nosy: + mark.dickinson
2012-04-08 17:30:56  loewis          set     nosy: + loewis
                                             messages: + msg157798
2012-04-07 11:32:04  skrah           set     messages: + msg157730
2012-04-07 11:29:10  skrah           set     messages: + msg157729
2012-04-07 10:55:38  pitrou          create