classification
Title: Intern certain integral floats for memory savings and performance
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, jcea, kristjan.jonsson, mark.dickinson, michael.foord, pitrou, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-03-21 17:47 by kristjan.jonsson, last changed 2012-04-20 13:41 by kristjan.jonsson. This issue is now closed.

Files
File name Uploaded Description Edit
internfloat.patch kristjan.jonsson, 2012-03-21 17:47 review
internfloat.patch kristjan.jonsson, 2012-03-22 15:49 Second patch, with using only 0.0 and 1.0 review
Messages (20)
msg156502 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-21 17:47
Michael Foord reminded me of this issue recently.  It was discussed on pydev a few years back and met with limited enthusiasm.  I speak from experience in live production with EVE that this simple change saved us a lot of memory.  Integral floating point numbers are surprisingly common, falling out of mathematical operations and when reading tabular data.  0.0 and 1.0 in particular.
msg156504 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-03-21 18:42
Does it not decrease the performance?

Falling out integral floating point numbers in the mathematical floating point calculations seems unlikely. I suggest interning integral floats only in conversions str -> float and int -> float. Exception can be made to zero, which can result in operations with zero, and which simple tested.
msg156509 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-03-21 19:42
This looks like a duplicate of issue 4024.
msg156564 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-22 14:43
Yes, there is a measurable performance decrease in pybench arithmetic tests.

Integers don't fall out of arithmetic that often, true.  But integral floats are incredibly common in tabular data.  In a game such as Eve Online,  configuration data contains a lot of 0.0, 1.0, -1.0 and so on.  This patch saved us many megabytes on the server.  I can check again...

>>> [sys.getrefcount(float(i)) for i in range(-10, 11)]
[777, 4, 38, 9, 215, 691, 627, 185, 98, 603, 73180, 62111, 8326, 6225, 6357, 11737, 2906, 1393, 3142, 1145, 5601]
>>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)])
185340
>>> 
This is on an idle server.  A server with lots of stuff going on will have this:
[16715, 184, 1477, 34, 1505, 27102, 3878, 1344, 6248, 974, 595889, 313062, 124072, 120054, 65585, 138667, 13265, 2499, 15677, 3175, 24821]
>>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)])
1465155

About half of the interned floats are 0.0
On a 64 bit machine with each float taking 24 bytes, this is 35mb net.

An alternative could be to add a function for manual intering of floating point data, which one can use e.g. when reading tabular data.
msg156566 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-22 14:59
> Integers don't fall out of arithmetic that often, true.  But integral
> floats are incredibly common in tabular data.  In a game such as Eve
> Online,  configuration data contains a lot of 0.0, 1.0, -1.0 and so
> on.  This patch saved us many megabytes on the server.

Can't you do your own interning when reading configuration data? You can even do it in pure Python.

If CPython starts interning some integral floats, I think only {-1.0, -0.0, 0.0, 1.0} should apply. The rest is too application-specific.
msg156568 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-22 15:07
The -10..11 range was determined empirically.  As you see from the values,  only -10 shows up as significant... In fact, just storing 0 to 5 would capture the bulk of the savings.
 
Right, I'll do some test with the hardcoded values you mentioned.
Btw, I don't think -0.0 is worth it, that value is a typical arithmetic result and not something you typically get from input data.

Also, while interning in python is certainly possible, perhaps it would make sense to implement such in c.  Perhaps in the math module, or perhaps  even the 'intern' builtin can be extended for this purpose?  It could be documented to intern any immutable object that it sees fit to intern, leaving the details to the implementation.
msg156569 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-22 15:12
> Right, I'll do some test with the hardcoded values you mentioned.
> Btw, I don't think -0.0 is worth it, that value is a typical
> arithmetic result and not something you typically get from input data.

I was suggesting -0.0 in the hope that it might make the patch a bit
simpler, but I was probably wrong: you still have to test the float's
sign to distinguish between the two values.

> Also, while interning in python is certainly possible, perhaps it
> would make sense to implement such in c.

I think it makes sense in the float implementation if it doesn't
significantly decrease performance. I suggest you post the benchmark
numbers (see http://hg.python.org/benchmarks/ ).

Adding a separate builtin/stdlib function would be overkill IMO.
msg156571 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-03-22 15:20
Try moving your changes from PyFloat_FromDouble to PyFloat_FromString. Look at memory and perfomance.
msg156578 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-22 15:49
PyFloat_FromString() is very restrictive.  In our case, for example, the floats in question don't come from text files.

I'm adding a new file with changes suggested by Antoine.  only 0.0 and 1.0 are interned.  It turns out that the "int" test is expensive, and special casing it to check for '0.0' and '1.0' with direct float comparison makes the performance hit go away.  Here are the figures:


D:\pydev\hg\cpython2>pcbuild10\python.exe Tools\pybench\pybench.py -s interned8 -c normal
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.3.0a1+ (default, Mar 22 2012, 15:39:50) [MSC v.1600 32 bit (Intel)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.clock

-------------------------------------------------------------------------------
Benchmark: interned8
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.clock

    Machine Details:
       Platform ID:    Windows-7-6.1.7601-SP1
       Processor:      Intel64 Family 6 Model 26 Stepping 5, GenuineIntel

    Python:
       Implementation: CPython
       Executable:     D:\pydev\hg\cpython2\pcbuild10\python.exe
       Version:        3.3.0a1+
       Compiler:       MSC v.1600 32 bit (Intel)
       Bits:           32bit
       Build:          Mar 22 2012 15:39:50 (#default)
       Unicode:        UCS4


-------------------------------------------------------------------------------
Comparing with: normal
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.clock

    Machine Details:
       Platform ID:    Windows-7-6.1.7601-SP1
       Processor:      Intel64 Family 6 Model 26 Stepping 5, GenuineIntel

    Python:
       Implementation: CPython
       Executable:     D:\pydev\hg\cpython2\pcbuild10\python.exe
       Version:        3.3.0a1+
       Compiler:       MSC v.1600 32 bit (Intel)
       Bits:           32bit
       Build:          Mar 22 2012 13:53:45 (#default)
       Unicode:        UCS4


Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    59ms    59ms   -0.7%    60ms    60ms   -0.3%
           BuiltinMethodLookup:    40ms    40ms   -0.2%    40ms    40ms   -0.2%
                 CompareFloats:    47ms    46ms   +2.3%    47ms    46ms   +2.3%
         CompareFloatsIntegers:   125ms   126ms   -0.8%   129ms   131ms   -2.0%
               CompareIntegers:    76ms    76ms   -0.2%    76ms    76ms   -0.3%
        CompareInternedStrings:    60ms    60ms   -0.0%    61ms    61ms   +0.2%
                  CompareLongs:    44ms    44ms   -0.4%    44ms    44ms   -0.4%
                CompareStrings:    59ms    58ms   +2.1%    59ms    58ms   +1.8%
    ComplexPythonFunctionCalls:    59ms    60ms   -1.7%    60ms    61ms   -2.5%
                 ConcatStrings:    49ms    55ms  -10.7%    49ms    55ms  -10.6%
               CreateInstances:    67ms    66ms   +1.1%    68ms    68ms   -0.1%
            CreateNewInstances:    50ms    50ms   +1.3%    51ms    50ms   +0.4%
       CreateStringsWithConcat:    70ms    71ms   -1.3%    70ms    72ms   -1.7%
                  DictCreation:    45ms    46ms   -1.7%    46ms    46ms   -1.5%
             DictWithFloatKeys:    81ms    84ms   -2.8%    81ms    84ms   -2.7%
           DictWithIntegerKeys:    52ms    55ms   -5.4%    52ms    55ms   -5.1%
            DictWithStringKeys:    47ms    51ms   -6.6%    48ms    51ms   -6.6%
                      ForLoops:    42ms    42ms   +0.2%    42ms    42ms   +0.2%
                    IfThenElse:    63ms    59ms   +5.5%    63ms    59ms   +5.6%
                   ListSlicing:    39ms    39ms   +0.0%    39ms    39ms   -0.4%
                NestedForLoops:    67ms    67ms   -0.2%    68ms    68ms   +0.0%
      NestedListComprehensions:    63ms    62ms   +1.2%    64ms    63ms   +1.2%
          NormalClassAttribute:   102ms   100ms   +1.3%   102ms   101ms   +1.4%
       NormalInstanceAttribute:    62ms    62ms   +0.2%    63ms    63ms   +0.2%
           PythonFunctionCalls:    60ms    60ms   +0.8%    61ms    60ms   +1.3%
             PythonMethodCalls:    74ms    74ms   -0.0%    74ms    74ms   -0.3%
                     Recursion:   103ms   102ms   +1.4%   104ms   102ms   +1.3%
                  SecondImport:   109ms   109ms   -0.4%   110ms   110ms   -0.6%
           SecondPackageImport:   111ms   112ms   -0.6%   113ms   113ms   +0.2%
         SecondSubmoduleImport:   154ms   153ms   +0.7%   155ms   155ms   -0.2%
       SimpleComplexArithmetic:    56ms    54ms   +3.7%    57ms    56ms   +1.9%
        SimpleDictManipulation:    94ms    93ms   +0.3%    94ms    94ms   -0.2%
         SimpleFloatArithmetic:    45ms    44ms   +3.2%    45ms    45ms   +1.7%
      SimpleIntFloatArithmetic:    57ms    56ms   +1.8%    57ms    57ms   +0.1%
       SimpleIntegerArithmetic:    56ms    56ms   +1.2%    57ms    56ms   +1.0%
      SimpleListComprehensions:    53ms    52ms   +1.8%    54ms    53ms   +2.4%
        SimpleListManipulation:    49ms    53ms   -8.0%    50ms    53ms   -6.7%
          SimpleLongArithmetic:    41ms    40ms   +1.6%    41ms    41ms   +0.1%
                    SmallLists:    66ms    68ms   -2.8%    67ms    68ms   -2.5%
                   SmallTuples:    76ms    72ms   +5.5%    76ms    72ms   +5.8%
         SpecialClassAttribute:   144ms   140ms   +2.8%   145ms   142ms   +2.1%
      SpecialInstanceAttribute:    64ms    64ms   +0.7%    65ms    64ms   +1.1%
                StringMappings:   126ms   132ms   -4.2%   127ms   132ms   -4.0%
              StringPredicates:    71ms    70ms   +2.3%    72ms    70ms   +2.5%
                 StringSlicing:    66ms    67ms   -0.6%    67ms    67ms   -1.1%
                     TryExcept:    44ms    43ms   +1.5%    44ms    44ms   +0.7%
                    TryFinally:    49ms    53ms   -7.4%    49ms    54ms   -8.1%
                TryRaiseExcept:    24ms    22ms   +8.6%    25ms    23ms   +8.9%
                  TupleSlicing:    61ms    61ms   -0.7%    61ms    62ms   -0.5%
                   WithFinally:    65ms    65ms   -0.3%    65ms    65ms   -0.0%
               WithRaiseExcept:    62ms    62ms   -0.4%    62ms    63ms   -0.6%
-------------------------------------------------------------------------------
Totals:                          3448ms  3454ms   -0.2%  3474ms  3487ms   -0.4%

(this=interned8, other=normal)


D:\pydev\hg\cpython2>
msg156623 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-22 22:44
Ok. In the current state it doesn't look detrimental, although I'm not convinced it's that useful either.
That said, the implementation needs a bit of work, I don't think it's ok for float_is_positive_zero() to use the Py_LONG_LONG casting thing.
msg156637 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-03-23 07:52
PyBench contains a lot of different tests and executes them not too long. Random fluctuations of a few percent are possible. We need a more stable realistic test working with floats.

$ ./python -m timeit 'x,y,d=1,0,0.01
for i in range(628):
  x,y=x-d*y,y+d*x'

Without patch: 1000 loops, best of 3: 220 usec per loop

With patch: 1000 loops, best of 3: 232 usec per loop

Difference is 5%.
msg156677 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-23 20:11
FWIW, I'm -1 on making the change.  Any memory savings would be microscopic for most Python programs.  And anything that slows down float creation (even a teeny bit) is detrimental to all Python programs.  We're better off not adding special case logic unless there is a clear benefit (there isn't).
msg156825 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-26 14:56
Why do you think it isn't safe, Antoine?
We only support IEEE platforms, and IEEE defines positive 0.0 to be all bits cleared.

I personally don't think that even a small measurable performance degradation in creating new floats is problematic since that is not where typical cpu power is spent.  But this is the second time I´ve made this suggestion (the first was on python-dev) and in both cases I've run up against the opinion that it is a "rare" problem and worries about "performance."  Yet this seemed not to be an issue when ints/longs were united.

So, while I was recently prompted to resubmit this suggestion, I feel no  need to push the matter further and am just letting it rest here, then.  I'm sure someone will think it is a good idea again in the future and can then reopen it.
msg158709 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-19 09:19
> Why do you think it isn't safe, Antoine?

It violates C's strict aliasing rules;  Google for 'C strict aliasing' or 'C type punning' for more information.  This isn't just a theoretical concern:  gcc is known to make optimizations based on the assumption that strict aliasing isn't violated.
msg158712 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-19 10:20
> We only support IEEE platforms.

I don't think that's true, BTW.
msg158752 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-19 21:06
return *(PY_LONG_LONG*)&fval == 0;
There is no aliasing, because there are no pointer variables in existence.
If we did this:

double *pfval = &fval;
PY_LONG_LONG *pl = (PY_LONG_LONG*)pfval
return *pfval == 0

Then we would have aliasing.  Because "pfval" in this example doesn't exist but is merely a temporary, there is no aliasing.

As for IEEE compatibility, I don't think we could have our own floating point formatting library if we didn't make that assumption, but I might be wrong about that.  On the other hand, I don't think there is a supported python architecture that defines positive zero as anything else than bitwise zero.  And should such a platform be found, it is easy enough to disable this code for it.
msg158800 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 06:46
> Because "pfval" in this example doesn't exist but is merely a temporary, > there is no aliasing.

Maybe aliasing wasn't the right word to use, then.  The exact rule being violated is C99 6.5, para. 7 (or C11 6.5 para. 7 if you prefer):

"""
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ...

"""

In any case, I don't think it's relevant whether the values involved are named variables or temporaries.

> I don't think we could have our own floating point formatting library > if we didn't make that assumption

There are configure-time checks that determine whether that floating-point formatting library is used or not, including (amongst some other things) whether the underlying floating-point type is IEEE 754.  Officially, Python shouldn't assume IEEE 754 floating-point anywhere---the core should build fine without it, and there's lots of code in the core to deal with non-IEEE 754 platforms.

> And should such a platform be found, it is easy enough to disable this > code for it.

I think that's the wrong way around:  the code should only be enabled in the first place when we're sure that the platform is IEEE 754.  That shouldn't be hard to build into the patch, since the checks for IEEE 754 are already all over the place for other purposes.
msg158809 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-20 09:14
Interesting.
I declare that this rule does not apply here since the code is a deliberate hack:  We are pretending that a certain address points to integers and checking those integers.  
If you insist on following the standard, you could do 

double cmp = 0;
return mcmcmp(&cmp, &fval, sizeof(fval)) == 0;

but on all real machines this is the same as:

PY_LONG_LONG cmp = 0;
return mcmcmp(&cmp, &fval, sizeof(fval)) == 0;

Which again is the same as 
return *(PY_LONG_LONG*)&fval == 0;
technically speaking, even if the standard doesn't agree.  You could implement this with in-line assembly and so cheerfully sidestep the standard.

When you're hacking, your're hacking and the standard isn't your friend :)

As for IEEE, sure, anyway that thing is oriented is fine, although in this day and age, I find it rather amusing that the logic thinks of IEEE support as the exception, rather than the rule.

Anyway, this proposal has been rejected due to lack of interest or some misguided idea of performance, so the point is moot.
msg158811 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 09:54
> I declare that this rule does not apply here ...

Clearly the gcc developers disagree. :-)


iwasawa:~ mdickinson$ cat test2.c
int is_positive_zero(double f) {
  return *(long long*)&f == 0;
}
iwasawa:~ mdickinson$ gcc -fstrict-aliasing -O3 -Wall -Wextra -Wstrict-aliasing -c test2.c
test2.c: In function ‘is_positive_zero’:
test2.c:2: warning: dereferencing type-punned pointer will break strict-aliasing rules
msg158827 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-20 13:41
Dang.  I yield!
History
Date User Action Args
2012-04-20 13:41:16kristjan.jonssonsetmessages: + msg158827
2012-04-20 09:54:11mark.dickinsonsetmessages: + msg158811
2012-04-20 09:14:54kristjan.jonssonsetmessages: + msg158809
2012-04-20 09:13:32kristjan.jonssonsetmessages: - msg158808
2012-04-20 09:12:42kristjan.jonssonsetmessages: + msg158808
2012-04-20 06:46:42mark.dickinsonsetmessages: + msg158800
2012-04-19 21:06:28kristjan.jonssonsetmessages: + msg158752
2012-04-19 10:20:12mark.dickinsonsetmessages: + msg158712
2012-04-19 09:19:02mark.dickinsonsetmessages: + msg158709
2012-04-19 01:22:38jceasetnosy: + jcea
2012-03-27 01:20:09eric.snowsetnosy: + eric.snow
2012-03-26 14:56:36kristjan.jonssonsetstatus: open -> closed
resolution: rejected
messages: + msg156825
2012-03-23 20:11:46rhettingersetnosy: + rhettinger
messages: + msg156677
2012-03-23 07:52:55serhiy.storchakasetmessages: + msg156637
2012-03-22 22:44:56pitrousetmessages: + msg156623
2012-03-22 15:49:35kristjan.jonssonsetfiles: + internfloat.patch

messages: + msg156578
2012-03-22 15:20:30serhiy.storchakasetmessages: + msg156571
2012-03-22 15:12:56pitrousetmessages: + msg156569
2012-03-22 15:07:44kristjan.jonssonsetmessages: + msg156568
2012-03-22 14:59:46pitrousetnosy: + pitrou
messages: + msg156566
2012-03-22 14:43:44kristjan.jonssonsetmessages: + msg156564
2012-03-22 11:55:09kristjan.jonssonlinkissue4024 superseder
2012-03-21 19:42:29mark.dickinsonsetnosy: + mark.dickinson
messages: + msg156509
2012-03-21 18:42:51serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg156504
2012-03-21 17:47:22kristjan.jonssoncreate