Issue14381
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012-03-21 17:47 by kristjan.jonsson, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
internfloat.patch | kristjan.jonsson, 2012-03-21 17:47 | review | ||
internfloat.patch | kristjan.jonsson, 2012-03-22 15:49 | Second patch, with using only 0.0 and 1.0 | review |
Messages (20) | |||
---|---|---|---|
msg156502 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-03-21 17:47 | |
Michael Foord reminded me of this issue recently. It was discussed on pydev a few years back and met with limited enthusiasm. I speak from experience in live production with EVE that this simple change saved us a lot of memory. Integral floating point numbers are surprisingly common, falling out of mathematical operations and when reading tabular data. 0.0 and 1.0 in particular. |
|||
msg156504 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-03-21 18:42 | |
Does it not decrease the performance? Falling out integral floating point numbers in the mathematical floating point calculations seems unlikely. I suggest interning integral floats only in conversions str -> float and int -> float. Exception can be made to zero, which can result in operations with zero, and which simple tested. |
|||
msg156509 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2012-03-21 19:42 | |
This looks like a duplicate of issue 4024. |
|||
msg156564 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-03-22 14:43 | |
Yes, there is a measurable performance decrease in pybench arithmetic tests. Integers don't fall out of arithmetic that often, true. But integral floats are incredibly common in tabular data. In a game such as Eve Online, configuration data contains a lot of 0.0, 1.0, -1.0 and so on. This patch saved us many megabytes on the server. I can check again... >>> [sys.getrefcount(float(i)) for i in range(-10, 11)] [777, 4, 38, 9, 215, 691, 627, 185, 98, 603, 73180, 62111, 8326, 6225, 6357, 11737, 2906, 1393, 3142, 1145, 5601] >>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)]) 185340 >>> This is on an idle server. A server with lots of stuff going on will have this: [16715, 184, 1477, 34, 1505, 27102, 3878, 1344, 6248, 974, 595889, 313062, 124072, 120054, 65585, 138667, 13265, 2499, 15677, 3175, 24821] >>> sum([sys.getrefcount(float(i)) for i in range(-10, 11)]) 1465155 About half of the interned floats are 0.0 On a 64 bit machine with each float taking 24 bytes, this is 35mb net. An alternative could be to add a function for manual intering of floating point data, which one can use e.g. when reading tabular data. |
|||
msg156566 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-03-22 14:59 | |
> Integers don't fall out of arithmetic that often, true. But integral > floats are incredibly common in tabular data. In a game such as Eve > Online, configuration data contains a lot of 0.0, 1.0, -1.0 and so > on. This patch saved us many megabytes on the server. Can't you do your own interning when reading configuration data? You can even do it in pure Python. If CPython starts interning some integral floats, I think only {-1.0, -0.0, 0.0, 1.0} should apply. The rest is too application-specific. |
|||
msg156568 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-03-22 15:07 | |
The -10..11 range was determined empirically. As you see from the values, only -10 shows up as significant... In fact, just storing 0 to 5 would capture the bulk of the savings. Right, I'll do some test with the hardcoded values you mentioned. Btw, I don't think -0.0 is worth it, that value is a typical arithmetic result and not something you typically get from input data. Also, while interning in python is certainly possible, perhaps it would make sense to implement such in c. Perhaps in the math module, or perhaps even the 'intern' builtin can be extended for this purpose? It could be documented to intern any immutable object that it sees fit to intern, leaving the details to the implementation. |
|||
msg156569 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-03-22 15:12 | |
> Right, I'll do some test with the hardcoded values you mentioned. > Btw, I don't think -0.0 is worth it, that value is a typical > arithmetic result and not something you typically get from input data. I was suggesting -0.0 in the hope that it might make the patch a bit simpler, but I was probably wrong: you still have to test the float's sign to distinguish between the two values. > Also, while interning in python is certainly possible, perhaps it > would make sense to implement such in c. I think it makes sense in the float implementation if it doesn't significantly decrease performance. I suggest you post the benchmark numbers (see http://hg.python.org/benchmarks/ ). Adding a separate builtin/stdlib function would be overkill IMO. |
|||
msg156571 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-03-22 15:20 | |
Try moving your changes from PyFloat_FromDouble to PyFloat_FromString. Look at memory and perfomance. |
|||
msg156578 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-03-22 15:49 | |
PyFloat_FromString() is very restrictive. In our case, for example, the floats in question don't come from text files. I'm adding a new file with changes suggested by Antoine. only 0.0 and 1.0 are interned. It turns out that the "int" test is expensive, and special casing it to check for '0.0' and '1.0' with direct float comparison makes the performance hit go away. Here are the figures: D:\pydev\hg\cpython2>pcbuild10\python.exe Tools\pybench\pybench.py -s interned8 -c normal ------------------------------------------------------------------------------- PYBENCH 2.1 ------------------------------------------------------------------------------- * using CPython 3.3.0a1+ (default, Mar 22 2012, 15:39:50) [MSC v.1600 32 bit (Intel)] * disabled garbage collection * system check interval set to maximum: 2147483647 * using timer: time.clock ------------------------------------------------------------------------------- Benchmark: interned8 ------------------------------------------------------------------------------- Rounds: 10 Warp: 10 Timer: time.clock Machine Details: Platform ID: Windows-7-6.1.7601-SP1 Processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel Python: Implementation: CPython Executable: D:\pydev\hg\cpython2\pcbuild10\python.exe Version: 3.3.0a1+ Compiler: MSC v.1600 32 bit (Intel) Bits: 32bit Build: Mar 22 2012 15:39:50 (#default) Unicode: UCS4 ------------------------------------------------------------------------------- Comparing with: normal ------------------------------------------------------------------------------- Rounds: 10 Warp: 10 Timer: time.clock Machine Details: Platform ID: Windows-7-6.1.7601-SP1 Processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel Python: Implementation: CPython Executable: D:\pydev\hg\cpython2\pcbuild10\python.exe Version: 3.3.0a1+ Compiler: MSC v.1600 32 bit (Intel) Bits: 32bit Build: Mar 22 2012 13:53:45 (#default) Unicode: UCS4 Test minimum run-time average run-time this other diff this other diff ------------------------------------------------------------------------------- BuiltinFunctionCalls: 59ms 59ms -0.7% 60ms 60ms -0.3% BuiltinMethodLookup: 40ms 40ms -0.2% 40ms 40ms -0.2% CompareFloats: 47ms 46ms +2.3% 47ms 46ms +2.3% CompareFloatsIntegers: 125ms 126ms -0.8% 129ms 131ms -2.0% CompareIntegers: 76ms 76ms -0.2% 76ms 76ms -0.3% CompareInternedStrings: 60ms 60ms -0.0% 61ms 61ms +0.2% CompareLongs: 44ms 44ms -0.4% 44ms 44ms -0.4% CompareStrings: 59ms 58ms +2.1% 59ms 58ms +1.8% ComplexPythonFunctionCalls: 59ms 60ms -1.7% 60ms 61ms -2.5% ConcatStrings: 49ms 55ms -10.7% 49ms 55ms -10.6% CreateInstances: 67ms 66ms +1.1% 68ms 68ms -0.1% CreateNewInstances: 50ms 50ms +1.3% 51ms 50ms +0.4% CreateStringsWithConcat: 70ms 71ms -1.3% 70ms 72ms -1.7% DictCreation: 45ms 46ms -1.7% 46ms 46ms -1.5% DictWithFloatKeys: 81ms 84ms -2.8% 81ms 84ms -2.7% DictWithIntegerKeys: 52ms 55ms -5.4% 52ms 55ms -5.1% DictWithStringKeys: 47ms 51ms -6.6% 48ms 51ms -6.6% ForLoops: 42ms 42ms +0.2% 42ms 42ms +0.2% IfThenElse: 63ms 59ms +5.5% 63ms 59ms +5.6% ListSlicing: 39ms 39ms +0.0% 39ms 39ms -0.4% NestedForLoops: 67ms 67ms -0.2% 68ms 68ms +0.0% NestedListComprehensions: 63ms 62ms +1.2% 64ms 63ms +1.2% NormalClassAttribute: 102ms 100ms +1.3% 102ms 101ms +1.4% NormalInstanceAttribute: 62ms 62ms +0.2% 63ms 63ms +0.2% PythonFunctionCalls: 60ms 60ms +0.8% 61ms 60ms +1.3% PythonMethodCalls: 74ms 74ms -0.0% 74ms 74ms -0.3% Recursion: 103ms 102ms +1.4% 104ms 102ms +1.3% SecondImport: 109ms 109ms -0.4% 110ms 110ms -0.6% SecondPackageImport: 111ms 112ms -0.6% 113ms 113ms +0.2% SecondSubmoduleImport: 154ms 153ms +0.7% 155ms 155ms -0.2% SimpleComplexArithmetic: 56ms 54ms +3.7% 57ms 56ms +1.9% SimpleDictManipulation: 94ms 93ms +0.3% 94ms 94ms -0.2% SimpleFloatArithmetic: 45ms 44ms +3.2% 45ms 45ms +1.7% SimpleIntFloatArithmetic: 57ms 56ms +1.8% 57ms 57ms +0.1% SimpleIntegerArithmetic: 56ms 56ms +1.2% 57ms 56ms +1.0% SimpleListComprehensions: 53ms 52ms +1.8% 54ms 53ms +2.4% SimpleListManipulation: 49ms 53ms -8.0% 50ms 53ms -6.7% SimpleLongArithmetic: 41ms 40ms +1.6% 41ms 41ms +0.1% SmallLists: 66ms 68ms -2.8% 67ms 68ms -2.5% SmallTuples: 76ms 72ms +5.5% 76ms 72ms +5.8% SpecialClassAttribute: 144ms 140ms +2.8% 145ms 142ms +2.1% SpecialInstanceAttribute: 64ms 64ms +0.7% 65ms 64ms +1.1% StringMappings: 126ms 132ms -4.2% 127ms 132ms -4.0% StringPredicates: 71ms 70ms +2.3% 72ms 70ms +2.5% StringSlicing: 66ms 67ms -0.6% 67ms 67ms -1.1% TryExcept: 44ms 43ms +1.5% 44ms 44ms +0.7% TryFinally: 49ms 53ms -7.4% 49ms 54ms -8.1% TryRaiseExcept: 24ms 22ms +8.6% 25ms 23ms +8.9% TupleSlicing: 61ms 61ms -0.7% 61ms 62ms -0.5% WithFinally: 65ms 65ms -0.3% 65ms 65ms -0.0% WithRaiseExcept: 62ms 62ms -0.4% 62ms 63ms -0.6% ------------------------------------------------------------------------------- Totals: 3448ms 3454ms -0.2% 3474ms 3487ms -0.4% (this=interned8, other=normal) D:\pydev\hg\cpython2> |
|||
msg156623 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-03-22 22:44 | |
Ok. In the current state it doesn't look detrimental, although I'm not convinced it's that useful either. That said, the implementation needs a bit of work, I don't think it's ok for float_is_positive_zero() to use the Py_LONG_LONG casting thing. |
|||
msg156637 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-03-23 07:52 | |
PyBench contains a lot of different tests and executes them not too long. Random fluctuations of a few percent are possible. We need a more stable realistic test working with floats. $ ./python -m timeit 'x,y,d=1,0,0.01 for i in range(628): x,y=x-d*y,y+d*x' Without patch: 1000 loops, best of 3: 220 usec per loop With patch: 1000 loops, best of 3: 232 usec per loop Difference is 5%. |
|||
msg156677 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2012-03-23 20:11 | |
FWIW, I'm -1 on making the change. Any memory savings would be microscopic for most Python programs. And anything that slows down float creation (even a teeny bit) is detrimental to all Python programs. We're better off not adding special case logic unless there is a clear benefit (there isn't). |
|||
msg156825 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-03-26 14:56 | |
Why do you think it isn't safe, Antoine? We only support IEEE platforms, and IEEE defines positive 0.0 to be all bits cleared. I personally don't think that even a small measurable performance degradation in creating new floats is problematic since that is not where typical cpu power is spent. But this is the second time I´ve made this suggestion (the first was on python-dev) and in both cases I've run up against the opinion that it is a "rare" problem and worries about "performance." Yet this seemed not to be an issue when ints/longs were united. So, while I was recently prompted to resubmit this suggestion, I feel no need to push the matter further and am just letting it rest here, then. I'm sure someone will think it is a good idea again in the future and can then reopen it. |
|||
msg158709 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2012-04-19 09:19 | |
> Why do you think it isn't safe, Antoine? It violates C's strict aliasing rules; Google for 'C strict aliasing' or 'C type punning' for more information. This isn't just a theoretical concern: gcc is known to make optimizations based on the assumption that strict aliasing isn't violated. |
|||
msg158712 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2012-04-19 10:20 | |
> We only support IEEE platforms. I don't think that's true, BTW. |
|||
msg158752 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-19 21:06 | |
return *(PY_LONG_LONG*)&fval == 0; There is no aliasing, because there are no pointer variables in existence. If we did this: double *pfval = &fval; PY_LONG_LONG *pl = (PY_LONG_LONG*)pfval return *pfval == 0 Then we would have aliasing. Because "pfval" in this example doesn't exist but is merely a temporary, there is no aliasing. As for IEEE compatibility, I don't think we could have our own floating point formatting library if we didn't make that assumption, but I might be wrong about that. On the other hand, I don't think there is a supported python architecture that defines positive zero as anything else than bitwise zero. And should such a platform be found, it is easy enough to disable this code for it. |
|||
msg158800 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2012-04-20 06:46 | |
> Because "pfval" in this example doesn't exist but is merely a temporary, > there is no aliasing. Maybe aliasing wasn't the right word to use, then. The exact rule being violated is C99 6.5, para. 7 (or C11 6.5 para. 7 if you prefer): """ An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... """ In any case, I don't think it's relevant whether the values involved are named variables or temporaries. > I don't think we could have our own floating point formatting library > if we didn't make that assumption There are configure-time checks that determine whether that floating-point formatting library is used or not, including (amongst some other things) whether the underlying floating-point type is IEEE 754. Officially, Python shouldn't assume IEEE 754 floating-point anywhere---the core should build fine without it, and there's lots of code in the core to deal with non-IEEE 754 platforms. > And should such a platform be found, it is easy enough to disable this > code for it. I think that's the wrong way around: the code should only be enabled in the first place when we're sure that the platform is IEEE 754. That shouldn't be hard to build into the patch, since the checks for IEEE 754 are already all over the place for other purposes. |
|||
msg158809 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-20 09:14 | |
Interesting. I declare that this rule does not apply here since the code is a deliberate hack: We are pretending that a certain address points to integers and checking those integers. If you insist on following the standard, you could do double cmp = 0; return mcmcmp(&cmp, &fval, sizeof(fval)) == 0; but on all real machines this is the same as: PY_LONG_LONG cmp = 0; return mcmcmp(&cmp, &fval, sizeof(fval)) == 0; Which again is the same as return *(PY_LONG_LONG*)&fval == 0; technically speaking, even if the standard doesn't agree. You could implement this with in-line assembly and so cheerfully sidestep the standard. When you're hacking, your're hacking and the standard isn't your friend :) As for IEEE, sure, anyway that thing is oriented is fine, although in this day and age, I find it rather amusing that the logic thinks of IEEE support as the exception, rather than the rule. Anyway, this proposal has been rejected due to lack of interest or some misguided idea of performance, so the point is moot. |
|||
msg158811 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2012-04-20 09:54 | |
> I declare that this rule does not apply here ... Clearly the gcc developers disagree. :-) iwasawa:~ mdickinson$ cat test2.c int is_positive_zero(double f) { return *(long long*)&f == 0; } iwasawa:~ mdickinson$ gcc -fstrict-aliasing -O3 -Wall -Wextra -Wstrict-aliasing -c test2.c test2.c: In function ‘is_positive_zero’: test2.c:2: warning: dereferencing type-punned pointer will break strict-aliasing rules |
|||
msg158827 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-20 13:41 | |
Dang. I yield! |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:28 | admin | set | github: 58589 |
2012-04-20 13:41:16 | kristjan.jonsson | set | messages: + msg158827 |
2012-04-20 09:54:11 | mark.dickinson | set | messages: + msg158811 |
2012-04-20 09:14:54 | kristjan.jonsson | set | messages: + msg158809 |
2012-04-20 09:13:32 | kristjan.jonsson | set | messages: - msg158808 |
2012-04-20 09:12:42 | kristjan.jonsson | set | messages: + msg158808 |
2012-04-20 06:46:42 | mark.dickinson | set | messages: + msg158800 |
2012-04-19 21:06:28 | kristjan.jonsson | set | messages: + msg158752 |
2012-04-19 10:20:12 | mark.dickinson | set | messages: + msg158712 |
2012-04-19 09:19:02 | mark.dickinson | set | messages: + msg158709 |
2012-04-19 01:22:38 | jcea | set | nosy:
+ jcea |
2012-03-27 01:20:09 | eric.snow | set | nosy:
+ eric.snow |
2012-03-26 14:56:36 | kristjan.jonsson | set | status: open -> closed resolution: rejected messages: + msg156825 |
2012-03-23 20:11:46 | rhettinger | set | nosy:
+ rhettinger messages: + msg156677 |
2012-03-23 07:52:55 | serhiy.storchaka | set | messages: + msg156637 |
2012-03-22 22:44:56 | pitrou | set | messages: + msg156623 |
2012-03-22 15:49:35 | kristjan.jonsson | set | files:
+ internfloat.patch messages: + msg156578 |
2012-03-22 15:20:30 | serhiy.storchaka | set | messages: + msg156571 |
2012-03-22 15:12:56 | pitrou | set | messages: + msg156569 |
2012-03-22 15:07:44 | kristjan.jonsson | set | messages: + msg156568 |
2012-03-22 14:59:46 | pitrou | set | nosy:
+ pitrou messages: + msg156566 |
2012-03-22 14:43:44 | kristjan.jonsson | set | messages: + msg156564 |
2012-03-22 11:55:09 | kristjan.jonsson | link | issue4024 superseder |
2012-03-21 19:42:29 | mark.dickinson | set | nosy:
+ mark.dickinson messages: + msg156509 |
2012-03-21 18:42:51 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg156504 |
2012-03-21 17:47:22 | kristjan.jonsson | create |