Intern certain integral floats for memory savings and performance #58589

Closed
kristjanvalur mannequin opened this issue Mar 21, 2012 · 20 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@kristjanvalur
Mannequin

kristjanvalur mannequin commented Mar 21, 2012

BPO 14381
Nosy @rhettinger, @jcea, @mdickinson, @pitrou, @kristjanvalur, @voidspace, @ericsnowcurrently, @serhiy-storchaka
Files
  • internfloat.patch
  • internfloat.patch: Second patch, with using only 0.0 and 1.0
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-03-26.14:56:36.998>
    created_at = <Date 2012-03-21.17:47:22.712>
    labels = ['interpreter-core', 'type-feature']
    title = 'Intern certain integral floats for memory savings and performance'
    updated_at = <Date 2012-04-20.13:41:16.698>
    user = 'https://github.com/kristjanvalur'

    bugs.python.org fields:

    activity = <Date 2012-04-20.13:41:16.698>
    actor = 'kristjan.jonsson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-03-26.14:56:36.998>
    closer = 'kristjan.jonsson'
    components = ['Interpreter Core']
    creation = <Date 2012-03-21.17:47:22.712>
    creator = 'kristjan.jonsson'
    dependencies = []
    files = ['24986', '24994']
    hgrepos = []
    issue_num = 14381
    keywords = ['patch']
    message_count = 20.0
    messages = ['156502', '156504', '156509', '156564', '156566', '156568', '156569', '156571', '156578', '156623', '156637', '156677', '156825', '158709', '158712', '158752', '158800', '158809', '158811', '158827']
    nosy_count = 8.0
    nosy_names = ['rhettinger', 'jcea', 'mark.dickinson', 'pitrou', 'kristjan.jonsson', 'michael.foord', 'eric.snow', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14381'
    versions = ['Python 3.3']

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Mar 21, 2012

Michael Foord reminded me of this issue recently. It was discussed on python-dev a few years back and met with limited enthusiasm. I can say from live production experience with EVE that this simple change saved us a lot of memory. Integral floating-point numbers are surprisingly common, falling out of mathematical operations and appearing when reading tabular data; 0.0 and 1.0 in particular.
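In pure-Python terms, the interning being proposed amounts to a small table of shared float objects consulted at creation time. A minimal sketch (the `intern_float` name and the cached range here are illustrative, not from the patch):

```python
# Illustrative sketch only -- the real patch does this in C inside float
# object creation. A fixed table of shared floats for small integral values.
_CACHE = {v: float(v) for v in range(-10, 11)}  # empirically chosen range

def intern_float(x):
    """Return a shared float object for cached integral values."""
    f = float(x)
    if f.is_integer():
        cached = _CACHE.get(int(f))
        if cached is not None:
            # Shared object; note this conflates -0.0 with 0.0, since
            # int(-0.0) == 0 -- a subtlety the thread comes back to.
            return cached
    return f  # everything else gets a fresh float as usual
```

With this in place, `intern_float("1.0") is intern_float(1)` holds, while non-integral values pass through untouched.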

    @kristjanvalur kristjanvalur mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Mar 21, 2012
    @serhiy-storchaka
    Member

Doesn't this decrease performance?

Integral floats falling out of mathematical floating-point calculations seems unlikely. I suggest interning integral floats only in the str -> float and int -> float conversions. An exception can be made for zero, which can result from operations with zero and which is simple to test for.

    @mdickinson
    Member

    This looks like a duplicate of bpo-4024.

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Mar 22, 2012

    Yes, there is a measurable performance decrease in pybench arithmetic tests.

    Integers don't fall out of arithmetic that often, true. But integral floats are incredibly common in tabular data. In a game such as Eve Online, configuration data contains a lot of 0.0, 1.0, -1.0 and so on. This patch saved us many megabytes on the server. I can check again...

    >>> [sys.getrefcount(float(i)) for i in range(-10, 11)]
    [777, 4, 38, 9, 215, 691, 627, 185, 98, 603, 73180, 62111, 8326, 6225, 6357, 11737, 2906, 1393, 3142, 1145, 5601]
    >>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)])
    185340
    >>> 
    This is on an idle server.  A server with lots of stuff going on will have this:
    [16715, 184, 1477, 34, 1505, 27102, 3878, 1344, 6248, 974, 595889, 313062, 124072, 120054, 65585, 138667, 13265, 2499, 15677, 3175, 24821]
    >>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)])
    1465155

About half of the interned floats are 0.0.
On a 64-bit machine, with each float taking 24 bytes, this is about 35 MB net.

An alternative could be to add a function for manual interning of floating-point data, which one could use e.g. when reading tabular data.
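The 35 MB figure checks out against the busy-server refcount survey quoted above. (The list as printed actually sums to slightly more than the separately printed total, which is unsurprising when the two calls run on a live server; the estimate lands near 35 MB either way.)

```python
# Re-derive the memory estimate from the busy-server refcounts above.
refcounts = [16715, 184, 1477, 34, 1505, 27102, 3878, 1344, 6248, 974,
             595889, 313062, 124072, 120054, 65585, 138667, 13265, 2499,
             15677, 3175, 24821]
total_refs = sum(refcounts)   # float objects that interning would collapse
                              # into 21 shared ones
bytes_per_float = 24          # size of a CPython float on a 64-bit build
saved_mb = total_refs * bytes_per_float / 1e6
print(total_refs, round(saved_mb, 1))
```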

    @pitrou
    Member

    pitrou commented Mar 22, 2012

    Integers don't fall out of arithmetic that often, true. But integral
    floats are incredibly common in tabular data. In a game such as Eve
    Online, configuration data contains a lot of 0.0, 1.0, -1.0 and so
    on. This patch saved us many megabytes on the server.

    Can't you do your own interning when reading configuration data? You can even do it in pure Python.
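For instance, a data loader can dedupe floats as it parses, so each distinct value is allocated once per file rather than once per cell. A sketch of that idea (the `load_rows` helper is hypothetical, not from the thread):

```python
# Hypothetical pure-Python loader that interns floats while reading
# comma-separated rows: repeated values like 0.0 and 1.0 end up as one
# shared object instead of one allocation per cell.
def load_rows(lines, cache=None):
    if cache is None:
        cache = {}
    rows = []
    for line in lines:
        row = []
        for field in line.split(","):
            f = float(field)
            # setdefault keeps the first object seen for each value; note
            # that -0.0 would collapse onto 0.0 here, since the two hash
            # and compare equal as dict keys.
            row.append(cache.setdefault(f, f))
        rows.append(row)
    return rows
```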

    If CPython starts interning some integral floats, I think only {-1.0, -0.0, 0.0, 1.0} should apply. The rest is too application-specific.

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Mar 22, 2012

    The -10..11 range was determined empirically. As you see from the values, only -10 shows up as significant... In fact, just storing 0 to 5 would capture the bulk of the savings.

Right, I'll do some tests with the hardcoded values you mentioned.
Btw, I don't think -0.0 is worth it; that value is a typical arithmetic result and not something you typically get from input data.

Also, while interning in Python is certainly possible, perhaps it would make sense to implement it in C. Perhaps in the math module, or perhaps the 'intern' builtin could even be extended for this purpose? It could be documented to intern any immutable object that it sees fit to intern, leaving the details to the implementation.

    @pitrou
    Member

    pitrou commented Mar 22, 2012

Right, I'll do some tests with the hardcoded values you mentioned.
Btw, I don't think -0.0 is worth it; that value is a typical
arithmetic result and not something you typically get from input data.

    I was suggesting -0.0 in the hope that it might make the patch a bit
    simpler, but I was probably wrong: you still have to test the float's
    sign to distinguish between the two values.
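Concretely, the sign test is needed because equality cannot tell the two zeros apart; in Python terms (the helper name is illustrative):

```python
import math

def is_positive_zero(f):
    # 0.0 == -0.0 is True, so an equality check alone cannot pick out
    # +0.0; copysign exposes the sign bit that == ignores.
    return f == 0.0 and math.copysign(1.0, f) > 0.0

print(is_positive_zero(0.0), is_positive_zero(-0.0))  # True False
```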

Also, while interning in Python is certainly possible, perhaps it
would make sense to implement it in C.

    I think it makes sense in the float implementation if it doesn't
    significantly decrease performance. I suggest you post the benchmark
    numbers (see http://hg.python.org/benchmarks/ ).

    Adding a separate builtin/stdlib function would be overkill IMO.

    @serhiy-storchaka
    Member

Try moving your changes from PyFloat_FromDouble to PyFloat_FromString. Look at memory and performance.

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Mar 22, 2012

Interning only in PyFloat_FromString() would be too restrictive. In our case, for example, the floats in question don't come from text files.

I'm adding a new file with the changes suggested by Antoine: only 0.0 and 1.0 are interned. It turns out that the "is it integral" test is expensive, and special-casing it to check for 0.0 and 1.0 with direct float comparison makes the performance hit go away. Here are the figures:

    D:\pydev\hg\cpython2>pcbuild10\python.exe Tools\pybench\pybench.py -s interned8 -c normal
    -------------------------------------------------------------------------------
    PYBENCH 2.1
    -------------------------------------------------------------------------------

    • using CPython 3.3.0a1+ (default, Mar 22 2012, 15:39:50) [MSC v.1600 32 bit (Intel)]
    • disabled garbage collection
    • system check interval set to maximum: 2147483647
    • using timer: time.clock

    -------------------------------------------------------------------------------
    Benchmark: interned8
    -------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.clock
    
    Machine Details:
       Platform ID:    Windows-7-6.1.7601-SP1
       Processor:      Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
    
    Python:
       Implementation: CPython
       Executable:     D:\pydev\hg\cpython2\pcbuild10\python.exe
       Version:        3.3.0a1+
       Compiler:       MSC v.1600 32 bit (Intel)
       Bits:           32bit
       Build:          Mar 22 2012 15:39:50 (#default)
       Unicode:        UCS4
    

    -------------------------------------------------------------------------------
    Comparing with: normal
    -------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.clock
    
    Machine Details:
       Platform ID:    Windows-7-6.1.7601-SP1
       Processor:      Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
    
    Python:
       Implementation: CPython
       Executable:     D:\pydev\hg\cpython2\pcbuild10\python.exe
       Version:        3.3.0a1+
       Compiler:       MSC v.1600 32 bit (Intel)
       Bits:           32bit
       Build:          Mar 22 2012 13:53:45 (#default)
       Unicode:        UCS4
    

Test                          minimum run-time          average run-time
                              this / other / diff       this / other / diff
-------------------------------------------------------------------------------
    BuiltinFunctionCalls: 59ms 59ms -0.7% 60ms 60ms -0.3%
    BuiltinMethodLookup: 40ms 40ms -0.2% 40ms 40ms -0.2%
    CompareFloats: 47ms 46ms +2.3% 47ms 46ms +2.3%
    CompareFloatsIntegers: 125ms 126ms -0.8% 129ms 131ms -2.0%
    CompareIntegers: 76ms 76ms -0.2% 76ms 76ms -0.3%
    CompareInternedStrings: 60ms 60ms -0.0% 61ms 61ms +0.2%
    CompareLongs: 44ms 44ms -0.4% 44ms 44ms -0.4%
    CompareStrings: 59ms 58ms +2.1% 59ms 58ms +1.8%
    ComplexPythonFunctionCalls: 59ms 60ms -1.7% 60ms 61ms -2.5%
    ConcatStrings: 49ms 55ms -10.7% 49ms 55ms -10.6%
    CreateInstances: 67ms 66ms +1.1% 68ms 68ms -0.1%
    CreateNewInstances: 50ms 50ms +1.3% 51ms 50ms +0.4%
    CreateStringsWithConcat: 70ms 71ms -1.3% 70ms 72ms -1.7%
    DictCreation: 45ms 46ms -1.7% 46ms 46ms -1.5%
    DictWithFloatKeys: 81ms 84ms -2.8% 81ms 84ms -2.7%
    DictWithIntegerKeys: 52ms 55ms -5.4% 52ms 55ms -5.1%
    DictWithStringKeys: 47ms 51ms -6.6% 48ms 51ms -6.6%
    ForLoops: 42ms 42ms +0.2% 42ms 42ms +0.2%
    IfThenElse: 63ms 59ms +5.5% 63ms 59ms +5.6%
    ListSlicing: 39ms 39ms +0.0% 39ms 39ms -0.4%
    NestedForLoops: 67ms 67ms -0.2% 68ms 68ms +0.0%
    NestedListComprehensions: 63ms 62ms +1.2% 64ms 63ms +1.2%
    NormalClassAttribute: 102ms 100ms +1.3% 102ms 101ms +1.4%
    NormalInstanceAttribute: 62ms 62ms +0.2% 63ms 63ms +0.2%
    PythonFunctionCalls: 60ms 60ms +0.8% 61ms 60ms +1.3%
    PythonMethodCalls: 74ms 74ms -0.0% 74ms 74ms -0.3%
    Recursion: 103ms 102ms +1.4% 104ms 102ms +1.3%
    SecondImport: 109ms 109ms -0.4% 110ms 110ms -0.6%
    SecondPackageImport: 111ms 112ms -0.6% 113ms 113ms +0.2%
    SecondSubmoduleImport: 154ms 153ms +0.7% 155ms 155ms -0.2%
    SimpleComplexArithmetic: 56ms 54ms +3.7% 57ms 56ms +1.9%
    SimpleDictManipulation: 94ms 93ms +0.3% 94ms 94ms -0.2%
    SimpleFloatArithmetic: 45ms 44ms +3.2% 45ms 45ms +1.7%
    SimpleIntFloatArithmetic: 57ms 56ms +1.8% 57ms 57ms +0.1%
    SimpleIntegerArithmetic: 56ms 56ms +1.2% 57ms 56ms +1.0%
    SimpleListComprehensions: 53ms 52ms +1.8% 54ms 53ms +2.4%
    SimpleListManipulation: 49ms 53ms -8.0% 50ms 53ms -6.7%
    SimpleLongArithmetic: 41ms 40ms +1.6% 41ms 41ms +0.1%
    SmallLists: 66ms 68ms -2.8% 67ms 68ms -2.5%
    SmallTuples: 76ms 72ms +5.5% 76ms 72ms +5.8%
    SpecialClassAttribute: 144ms 140ms +2.8% 145ms 142ms +2.1%
    SpecialInstanceAttribute: 64ms 64ms +0.7% 65ms 64ms +1.1%
    StringMappings: 126ms 132ms -4.2% 127ms 132ms -4.0%
    StringPredicates: 71ms 70ms +2.3% 72ms 70ms +2.5%
    StringSlicing: 66ms 67ms -0.6% 67ms 67ms -1.1%
    TryExcept: 44ms 43ms +1.5% 44ms 44ms +0.7%
    TryFinally: 49ms 53ms -7.4% 49ms 54ms -8.1%
    TryRaiseExcept: 24ms 22ms +8.6% 25ms 23ms +8.9%
    TupleSlicing: 61ms 61ms -0.7% 61ms 62ms -0.5%
    WithFinally: 65ms 65ms -0.3% 65ms 65ms -0.0%
    WithRaiseExcept: 62ms 62ms -0.4% 62ms 63ms -0.6%
    -------------------------------------------------------------------------------
    Totals: 3448ms 3454ms -0.2% 3474ms 3487ms -0.4%

    (this=interned8, other=normal)

    D:\pydev\hg\cpython2>
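In Python terms, the change the benchmark measures is roughly the following (the actual patch does this in C inside PyFloat_FromDouble; the function names here are illustrative):

```python
def should_intern_general(f):
    # First patch: intern any integral value in the cached range.
    # The integrality test costs extra work on every float created.
    return f.is_integer()

def should_intern_fast(f):
    # Second patch: only the two dominant values, via two cheap direct
    # comparisons (note that -0.0 also compares equal to 0.0 here).
    return f == 0.0 or f == 1.0
```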

    @pitrou
    Member

    pitrou commented Mar 22, 2012

    Ok. In the current state it doesn't look detrimental, although I'm not convinced it's that useful either.
    That said, the implementation needs a bit of work, I don't think it's ok for float_is_positive_zero() to use the Py_LONG_LONG casting thing.

    @serhiy-storchaka
    Member

PyBench contains a lot of different tests and doesn't run each of them for very long, so random fluctuations of a few percent are possible. We need a more stable, realistic test that works with floats.

    $ ./python -m timeit 'x,y,d=1,0,0.01
    for i in range(628):
      x,y=x-d*y,y+d*x'

    Without patch: 1000 loops, best of 3: 220 usec per loop

    With patch: 1000 loops, best of 3: 232 usec per loop

The difference is about 5%.

    @rhettinger
    Contributor

    FWIW, I'm -1 on making the change. Any memory savings would be microscopic for most Python programs. And anything that slows down float creation (even a teeny bit) is detrimental to all Python programs. We're better off not adding special case logic unless there is a clear benefit (there isn't).

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Mar 26, 2012

    Why do you think it isn't safe, Antoine?
    We only support IEEE platforms, and IEEE defines positive 0.0 to be all bits cleared.

I personally don't think that even a small measurable performance degradation in creating new floats is problematic, since that is not where typical CPU time is spent. But this is the second time I've made this suggestion (the first was on python-dev), and in both cases I've run up against the opinion that it is a "rare" problem, along with worries about "performance." Yet this seemed not to be an issue when ints and longs were unified.

    So, while I was recently prompted to resubmit this suggestion, I feel no need to push the matter further and am just letting it rest here, then. I'm sure someone will think it is a good idea again in the future and can then reopen it.

    @kristjanvalur kristjanvalur mannequin closed this as completed Mar 26, 2012
    @mdickinson
    Member

    Why do you think it isn't safe, Antoine?

    It violates C's strict aliasing rules; Google for 'C strict aliasing' or 'C type punning' for more information. This isn't just a theoretical concern: gcc is known to make optimizations based on the assumption that strict aliasing isn't violated.
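The bit pattern the disputed cast tries to read can be inspected safely from Python with `struct`, which also shows why the test works on IEEE 754 platforms: +0.0 is the all-zero pattern, and -0.0 differs only in the sign bit.

```python
import struct

def bits(f):
    # Reinterpret the 8 bytes of a C double as an unsigned integer --
    # the well-defined analogue of the *(long long *)&f read at issue.
    return struct.unpack("<Q", struct.pack("<d", f))[0]

print(hex(bits(0.0)))   # 0x0
print(hex(bits(-0.0)))  # 0x8000000000000000 (sign bit only)
```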

    @mdickinson
    Member

    We only support IEEE platforms.

    I don't think that's true, BTW.

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Apr 19, 2012

    return *(PY_LONG_LONG*)&fval == 0;
    There is no aliasing, because there are no pointer variables in existence.
    If we did this:

double *pfval = &fval;
PY_LONG_LONG *pl = (PY_LONG_LONG *)pfval;
return *pl == 0;

    Then we would have aliasing. Because "pfval" in this example doesn't exist but is merely a temporary, there is no aliasing.

As for IEEE compatibility, I don't think we could have our own floating point formatting library if we didn't make that assumption, but I might be wrong about that. On the other hand, I don't think there is a supported Python architecture that defines positive zero as anything other than all bits zero. And should such a platform be found, it is easy enough to disable this code for it.

    @mdickinson
    Member

Because "pfval" in this example doesn't exist but is merely a temporary, there is no aliasing.

    Maybe aliasing wasn't the right word to use, then. The exact rule being violated is C99 6.5, para. 7 (or C11 6.5 para. 7 if you prefer):

    """
    An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ...

    """

    In any case, I don't think it's relevant whether the values involved are named variables or temporaries.

I don't think we could have our own floating point formatting library if we didn't make that assumption

    There are configure-time checks that determine whether that floating-point formatting library is used or not, including (amongst some other things) whether the underlying floating-point type is IEEE 754. Officially, Python shouldn't assume IEEE 754 floating-point anywhere---the core should build fine without it, and there's lots of code in the core to deal with non-IEEE 754 platforms.

And should such a platform be found, it is easy enough to disable this code for it.

    I think that's the wrong way around: the code should only be enabled in the first place when we're sure that the platform is IEEE 754. That shouldn't be hard to build into the patch, since the checks for IEEE 754 are already all over the place for other purposes.
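For reference, the parameters those checks look for are visible at runtime through `sys.float_info`; on an IEEE 754 binary64 platform they take the familiar values:

```python
import sys

# Standard IEEE 754 double-precision (binary64) parameters.
print(sys.float_info.radix)     # 2
print(sys.float_info.mant_dig)  # 53
print(sys.float_info.max_exp)   # 1024
```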

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Apr 20, 2012

    Interesting.
I declare that this rule does not apply here, since the code is a deliberate hack: we are pretending that a certain address points to an integer and examining that integer.
If you insist on following the standard, you could do

double cmp = 0;
return memcmp(&cmp, &fval, sizeof(fval)) == 0;

but on all real machines this is the same as:

PY_LONG_LONG cmp = 0;
return memcmp(&cmp, &fval, sizeof(fval)) == 0;

which again is the same as
return *(PY_LONG_LONG*)&fval == 0;
technically speaking, even if the standard doesn't agree. You could implement this with inline assembly and so cheerfully sidestep the standard.

When you're hacking, you're hacking, and the standard isn't your friend :)

As for IEEE, sure, whichever way that check is oriented is fine, although in this day and age I find it rather amusing that the logic treats IEEE support as the exception rather than the rule.

    Anyway, this proposal has been rejected due to lack of interest or some misguided idea of performance, so the point is moot.

    @mdickinson
    Member

    I declare that this rule does not apply here ...

    Clearly the gcc developers disagree. :-)

    iwasawa:~ mdickinson$ cat test2.c
    int is_positive_zero(double f) {
      return *(long long*)&f == 0;
    }
    iwasawa:~ mdickinson$ gcc -fstrict-aliasing -O3 -Wall -Wextra -Wstrict-aliasing -c test2.c
test2.c: In function ‘is_positive_zero’:
    test2.c:2: warning: dereferencing type-punned pointer will break strict-aliasing rules

    @kristjanvalur
    Mannequin Author

    kristjanvalur mannequin commented Apr 20, 2012

    Dang. I yield!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022