Intern certain integral floats for memory savings and performance #58589
Comments
Michael Foord reminded me of this issue recently. It was discussed on python-dev a few years back and met with limited enthusiasm. I can say from experience running EVE in live production that this simple change saved us a lot of memory. Integral floating-point numbers are surprisingly common, falling out of mathematical operations and when reading tabular data; 0.0 and 1.0 in particular. |
Doesn't it decrease performance? Integral floating-point numbers falling out of mathematical floating-point calculations seems unlikely. I suggest interning integral floats only in the conversions str -> float and int -> float. An exception can be made for zero, which can result from operations involving zero and is simple to test for. |
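For context, stock CPython does not cache floats produced by these conversions today: every str -> float or int -> float conversion builds a fresh object, unlike small ints, which CPython does cache. A quick check (this is CPython-specific implementation behavior, not a language guarantee):

```python
# CPython-specific demonstration: floats from str -> float conversions are
# fresh objects each time, so equal values do not share storage.
a = float("0.0")
b = float("0.0")
print(a == b, a is b)    # equal values, distinct objects in CPython

# Small ints, by contrast, come from CPython's small-int cache.
c = int("5")
d = int("5")
print(c == d, c is d)    # equal values, same cached object in CPython
```

This is exactly the duplication the proposed patch targets.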
This looks like a duplicate of bpo-4024. |
Yes, there is a measurable performance decrease in pybench arithmetic tests. Integers don't fall out of arithmetic that often, true. But integral floats are incredibly common in tabular data. In a game such as EVE Online, configuration data contains a lot of 0.0, 1.0, -1.0 and so on. This patch saved us many megabytes on the server. I can check again...

>>> [sys.getrefcount(float(i)) for i in range(-10, 11)]
[777, 4, 38, 9, 215, 691, 627, 185, 98, 603, 73180, 62111, 8326, 6225, 6357, 11737, 2906, 1393, 3142, 1145, 5601]
>>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)])
185340
This is on an idle server. A server with lots of stuff going on will have this:

>>> [sys.getrefcount(float(i)) for i in range(-10, 11)]
[16715, 184, 1477, 34, 1505, 27102, 3878, 1344, 6248, 974, 595889, 313062, 124072, 120054, 65585, 138667, 13265, 2499, 15677, 3175, 24821]
>>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)])
1465155

About half of the interned floats are 0.0. An alternative could be to add a function for manual interning of floating-point data, which one could use e.g. when reading tabular data. |
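The refcounts above translate directly into a memory estimate: each duplicated float object whose value could be shared is reclaimable storage. A rough sketch of that arithmetic (note the huge refcounts in the thread come from the patched interpreter, where float(i) returns the interned singleton; on stock CPython float(i) is a fresh object and these counts are tiny):

```python
import sys

# Rough upper-bound estimate, not the original EVE measurement: if every
# live reference to an integral float in range(-10, 11) pointed at one
# shared object, each duplicate float object's storage could be reclaimed.
float_size = sys.getsizeof(0.0)   # typically 24 bytes on 64-bit CPython

refcounts = [sys.getrefcount(float(i)) for i in range(-10, 11)]
# getrefcount() overcounts by the temporary reference it holds itself.
live_refs = [rc - 1 for rc in refcounts]

# Upper bound assumes every such reference currently points at its own
# distinct float object, so all but one per value could be freed.
estimated_savings = sum(rc - 1 for rc in live_refs if rc > 1) * float_size
print(f"at most ~{estimated_savings} bytes reclaimable")
```

Plugging in the busy-server total of ~1.46 million references at 24 bytes per object gives the "many megabytes" figure mentioned above.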
Can't you do your own interning when reading configuration data? You could even do it in pure Python. If CPython starts interning some integral floats, I think only {-1.0, -0.0, 0.0, 1.0} should apply. The rest is too application-specific. |
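The pure-Python approach Antoine mentions fits in a few lines. A sketch, using a hypothetical `intern_float` helper (not anything in the stdlib):

```python
# A pure-Python take on application-level interning: reuse one canonical
# object per float value while loading tabular/configuration data.
# `intern_float` is a hypothetical helper name, not a stdlib function.
_canonical = {}

def intern_float(x):
    """Return a shared float object equal to x, caching new values."""
    x = float(x)
    # Caveat: -0.0 == 0.0 and they hash alike, so signed zeros collapse
    # to whichever was seen first; guard on math.copysign if that matters.
    return _canonical.setdefault(x, x)

row = [intern_float(v) for v in ("0.0", "1.0", "0.0", "1.0")]
print(row[0] is row[2], row[1] is row[3])   # duplicates share one object
```

The signed-zero caveat is exactly why -0.0 comes up later in this thread: any equality-based cache silently merges -0.0 with 0.0.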
The -10..11 range was determined empirically. As you see from the values, only -10 shows up as significant... In fact, just storing 0 to 5 would capture the bulk of the savings. Right, I'll do some test with the hardcoded values you mentioned. Also, while interning in python is certainly possible, perhaps it would make sense to implement such in c. Perhaps in the math module, or perhaps even the 'intern' builtin can be extended for this purpose? It could be documented to intern any immutable object that it sees fit to intern, leaving the details to the implementation. |
I was suggesting -0.0 in the hope that it might make the patch a bit […] |
I think it makes sense in the float implementation if it doesn't […]. Adding a separate builtin/stdlib function would be overkill IMO. |
Try moving your changes from PyFloat_FromDouble to PyFloat_FromString. Look at memory and performance. |
PyFloat_FromString() is very restrictive. In our case, for example, the floats in question don't come from text files. I'm adding a new file with changes suggested by Antoine; only 0.0 and 1.0 are interned. It turns out that the "int" test is expensive, and special-casing it to check for 0.0 and 1.0 with direct float comparison makes the performance hit go away. Here are the figures:

D:\pydev\hg\cpython2>pcbuild10\python.exe Tools\pybench\pybench.py -s interned8 -c normal
-------------------------------------------------------------------------------
Test            minimum run-time    average run-time    (this=interned8, other=normal)
-------------------------------------------------------------------------------
(pybench result rows were not preserved in the migrated issue)

D:\pydev\hg\cpython2> |
Ok. In the current state it doesn't look detrimental, although I'm not convinced it's that useful either. |
PyBench contains a lot of different tests and doesn't run them for very long; random fluctuations of a few percent are possible. We need a more stable, realistic test working with floats.

$ ./python -m timeit 'x,y,d=1,0,0.01
for i in range(628):
    x,y=x-d*y,y+d*x'

Without patch: 1000 loops, best of 3: 220 usec per loop
With patch: 1000 loops, best of 3: 232 usec per loop

Difference is 5%. |
FWIW, I'm -1 on making the change. Any memory savings would be microscopic for most Python programs. And anything that slows down float creation (even a teeny bit) is detrimental to all Python programs. We're better off not adding special case logic unless there is a clear benefit (there isn't). |
Why do you think it isn't safe, Antoine? I personally don't think that even a small measurable performance degradation in creating new floats is problematic, since that is not where typical CPU power is spent. But this is the second time I've made this suggestion (the first was on python-dev), and in both cases I've run up against the opinion that it is a "rare" problem and worries about "performance." Yet this seemed not to be an issue when ints and longs were unified. So, while I was recently prompted to resubmit this suggestion, I feel no need to push the matter further and am just letting it rest here. I'm sure someone will think it is a good idea again in the future and can then reopen it. |
It violates C's strict aliasing rules; Google for 'C strict aliasing' or 'C type punning' for more information. This isn't just a theoretical concern: gcc is known to make optimizations based on the assumption that strict aliasing isn't violated. |
I don't think that's true, BTW. |
The code in question is:

return *(PY_LONG_LONG*)&fval == 0;

If it were written out as:

double *pfval = &fval;
PY_LONG_LONG *pl = (PY_LONG_LONG*)pfval;
return *pl == 0;

then we would have aliasing. Because "pfval" in this example doesn't exist but is merely a temporary, there is no aliasing. As for IEEE compatibility, I don't think we could have our own floating-point formatting library if we didn't make that assumption, but I might be wrong about that. On the other hand, I don't think there is a supported Python architecture that defines positive zero as anything other than bitwise zero. And should such a platform be found, it is easy enough to disable this code for it. |
Maybe aliasing wasn't the right word to use, then. The exact rule being violated is C99 6.5, para. 7 (or C11 6.5 para. 7 if you prefer): "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [a list of types compatible with the effective type of the object, ending with] a character type." A double accessed through a PY_LONG_LONG lvalue matches none of these. In any case, I don't think it's relevant whether the values involved are named variables or temporaries.
There are configure-time checks that determine whether that floating-point formatting library is used or not, including (amongst other things) whether the underlying floating-point type is IEEE 754. Officially, Python shouldn't assume IEEE 754 floating-point anywhere: the core should build fine without it, and there's lots of code in the core to deal with non-IEEE 754 platforms.
I think that's the wrong way around: the code should only be enabled in the first place when we're sure that the platform is IEEE 754. That shouldn't be hard to build into the patch, since the checks for IEEE 754 are already all over the place for other purposes. |
Interesting. A standards-compliant version would be:

double cmp = 0;
return memcmp(&cmp, &fval, sizeof(fval)) == 0;

but on all real machines this is the same as:

PY_LONG_LONG cmp = 0;
return memcmp(&cmp, &fval, sizeof(fval)) == 0;

which again is the same as:

return *(PY_LONG_LONG*)&fval == 0;

When you're hacking, you're hacking, and the standard isn't your friend :) As for IEEE, sure, whichever way that logic is oriented is fine, although in this day and age I find it rather amusing that the logic treats IEEE support as the exception rather than the rule. Anyway, this proposal has been rejected due to lack of interest or some misguided idea of performance, so the point is moot. |
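The underlying bit-level claim (positive zero is the all-zeros pattern under IEEE 754; negative zero is not, since its sign bit is set) can be checked from Python itself with `struct`, whose standard 'd' format is defined to pack as IEEE 754 binary64. A sketch of the C check's Python analogue:

```python
import struct

def is_positive_zero(f):
    # Compare the 8-byte IEEE 754 representation against all zeros:
    # the Python analogue of the memcmp()-based C check above.
    return struct.pack("<d", f) == b"\x00" * 8

print(is_positive_zero(0.0))    # True: +0.0 is bitwise zero
print(is_positive_zero(-0.0))   # False: the sign bit is set
print(0.0 == -0.0)              # True: == alone cannot distinguish them
```

This also shows why a plain float comparison cannot separate the two zeros; only a bit-pattern (or copysign) test can.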
Clearly the gcc developers disagree. :-)

iwasawa:~ mdickinson$ cat test2.c
int is_positive_zero(double f) {
return *(long long*)&f == 0;
}
iwasawa:~ mdickinson$ gcc -fstrict-aliasing -O3 -Wall -Wextra -Wstrict-aliasing -c test2.c
test2.c: In function ‘is_positive_zero’:
test2.c:2: warning: dereferencing type-punned pointer will break strict-aliasing rules |
Dang. I yield! |