classification
Title: Add mode to disable small integer and interned string caches
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ammar2, gregory.p.smith, jwilk, nascheme, njs, rhettinger, serhiy.storchaka, steven.daprano, terry.reedy
Priority: normal
Keywords:

Created on 2018-10-02 00:40 by steven.daprano, last changed 2018-11-30 20:00 by terry.reedy.

Messages (10)
msg326838 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-10-02 00:40
Split off from #34850 by Guido's request.

To help catch incorrect use of `is` when `==` is intended, perhaps we should add an interpreter mode that disables the caches for small ints and interned strings.

Nathaniel called it "chaos mode" but I don't like the name as there is nothing chaotic about the lack of such caches, and it doesn't come close to chaos testing (e.g. Netflix's Chaos Monkey tool).
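The kind of mistake such a mode would target can be sketched as follows. This is only an illustration: the helper `make_int` is a hypothetical name, and the identity results depend on CPython implementation details rather than on anything the language guarantees.

```python
def make_int(text):
    """Build an int at runtime so the compiler cannot share one constant."""
    return int(text)


small_a, small_b = make_int("7"), make_int("7")
large_a, large_b = make_int("100000"), make_int("100000")

# Equality is defined by the language and holds on every implementation.
print(small_a == small_b)  # True
print(large_a == large_b)  # True

# Identity depends on whether the implementation happens to reuse objects.
# With a small-int cache the first pair is usually one shared object and the
# second pair usually is not, so code using `is` here passes or fails by luck.
print(small_a is small_b)
print(large_a is large_b)
```

With the caches disabled, both identity checks would be expected to give the same answer, which is what would make a misplaced `is` show up in a test run.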
msg326843 - Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-10-02 01:07
Maybe something more akin to UndefinedBehaviorSanitizer, since it's supposed to catch implementation-specific quirks? It wouldn't really be sanitizing though, more just making the bugs more likely to appear.
msg326850 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-10-02 04:31
Adding a runtime option will hurt the performance of normal execution. And it is impossible to disable string interning completely, because some core code depends on it. I also have concerns about disabling the caching of the empty string.

There are also other caches on different levels.
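For context on what interning looks like from Python code, here is a minimal sketch using `sys.intern`. The helper `build` and the sample strings are made up for illustration, and the sketch says nothing about which core code relies on interning.

```python
import sys


def build(parts):
    """Join at runtime so the result is not a compile-time constant."""
    return "".join(parts)


first = build(["chaos", "_", "mode"])
second = build(["chaos", "_", "mode"])

# Equal contents, so equality holds regardless of caching.
print(first == second)  # True

# Whether `first is second` holds is implementation-dependent and should
# not be relied on.

# sys.intern returns the canonical copy from the interned-string table,
# so equal strings map to the same object once both are interned.
print(sys.intern(first) is sys.intern(second))  # True
```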
msg326852 - Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-10-02 04:34
Serhiy, take a look at the linked ticket. The idea is that something like pytest or libregrtest will use this to bring underlying bugs to the surface. It isn't intended to be used in normal execution.
msg326853 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-10-02 04:38
I am not worried about performance when the caches are disabled. The additional check will hurt performance in normal execution.
msg326854 - Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-10-02 04:40
Aah sorry, I misinterpreted what you meant. The original ticket proposes it as a compile time flag as well.
msg326860 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-10-02 06:34
I don't think this option will be of any value.  For it to work, the code would need to have this particular bug, have test cases that triggered those bugs, and a user sophisticated enough to run the tests but unsophisticated enough to make beginner mistakes regarding when to use identity tests versus equality tests (something I teach on day one of beginner Python courses).

Before this goes further, I would like to see some evidence that it would actually catch a real bug in the wild.
msg327073 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2018-10-04 18:20
Wouldn't turning these off hurt performance a lot?  If so, I don't know if people would actually use such a mode.  Then it becomes pretty useless.  Could we combine this idea with the PYTHONDEVMODE flag?  If PYTHONDEVMODE is turned on, we could do a check like Serhiy suggests for inappropriate 'is' comparisons.  That seems more useful to me.
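One way a check for inappropriate 'is' comparisons could look is sketched below. Note the assumptions: this is a static scan over source text, not the runtime check tied to PYTHONDEVMODE discussed here; the function name `find_literal_identity_checks` is hypothetical; and it relies on `ast.Constant`, so it assumes Python 3.8 or later.

```python
import ast


def find_literal_identity_checks(source):
    """Return line numbers where `is` / `is not` is used with a literal."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        # A chained comparison a OP b OP c has operands [a, b, c].
        operands = [node.left] + node.comparators
        for index, op in enumerate(node.ops):
            if not isinstance(op, (ast.Is, ast.IsNot)):
                continue
            pair = (operands[index], operands[index + 1])
            if any(_is_value_literal(operand) for operand in pair):
                flagged.append(node.lineno)
                break
    return sorted(flagged)


def _is_value_literal(node):
    """True for number, str and bytes literals; None, True, False are fine."""
    if not isinstance(node, ast.Constant):
        return False
    value = node.value
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float, complex, str, bytes))


sample = (
    "if count is 0:\n"
    "    pass\n"
    "if name is not 'x':\n"
    "    pass\n"
    "if value is None:\n"
    "    pass\n"
)
print(find_literal_identity_checks(sample))  # [1, 3]
```

A scan like this only sees comparisons against literals; it cannot catch `a is b` where both names happen to hold ints or strings, which is the case the cache-disabling mode is aimed at.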
msg327093 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-04 22:26
The intent is to only enable this during testing / continuous integration.
msg330823 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-11-30 20:00
Steven, thank you for splitting this off for proper discussion.

To me, the base issue is that CPython is both the language reference implementation and, as yet, the main production implementation.  As the latter, it has unintended and unwanted bugs and intentional optimizations added for performance rather than language conformance.  Some of these, like caching, affect boolean results involving 'is' and id().  Problems arise when people confuse reference features with implementation features.

This issue proposes adding a mode that turns off certain optimization features.  There is another proposal to turn off other optimizations (again during code analysis and testing) that affect tracing results and sometimes coverage results based thereon, giving false negatives.  In either case, I see the result as a 'language reference' mode.  As Steven suggested, the result is in a sense less chaotic, not more.  A chaos mode for caching would randomly cache or not.

Multiple comments above contain 'bug'.  Given that the language leaves implementations to cache certain immutables -- or not -- the bug in code meant to be implementation independent is to depend on caching *either way*.  Turning caching off only catches the 'bug' of assuming caching, not the bug of assuming no caching.

From a math viewpoint, n is n for all n, so 'is' *is* the proper comparison for ints.  From this viewpoint, caching should be the default, and not caching most values of n, and so having to use '==' instead of 'is', is the practical time-space tradeoff.

Like Raymond, I currently think that this proposal lacks sufficient justification.
History
Date | User | Action | Args
2018-11-30 20:00:55 | terry.reedy | set | type: enhancement; messages: + msg330823; nosy: + terry.reedy
2018-10-04 22:26:33 | gregory.p.smith | set | messages: + msg327093
2018-10-04 18:20:46 | nascheme | set | nosy: + nascheme; messages: + msg327073
2018-10-02 14:01:12 | jwilk | set | nosy: + jwilk
2018-10-02 06:34:31 | rhettinger | set | nosy: + rhettinger; messages: + msg326860
2018-10-02 04:40:08 | ammar2 | set | messages: + msg326854
2018-10-02 04:38:28 | serhiy.storchaka | set | messages: + msg326853
2018-10-02 04:34:59 | ammar2 | set | messages: + msg326852
2018-10-02 04:31:56 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg326850
2018-10-02 01:07:58 | ammar2 | set | nosy: + ammar2; messages: + msg326843
2018-10-02 00:40:12 | steven.daprano | create |