msg133769 - (view) |
Author: Anthony Long (antlong) |
Date: 2011-04-14 21:12 |
http://docs.python.org/c-api/int.html
"The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)"
This paragraph should be changed to reflect that you can (by construction) mutate anything you want in C, and (as per dmalcolm's suggestion) reworded along these lines:
"The current implementation consolidates integers in the range -5 to 256 (inclusive) into singleton instances. Do not manipulate the internal value of a PyIntObject after creation."
Also, the last line of that paragraph insinuates this functionality (caching of -5 to 256) is undocumented. I searched for a good while for an answer to this, and I didn't find one. Is there anything written on the implementation details explaining why '-5 is -5' works, while '-6 is -6' wouldn't?
If there is nothing written about this, I will put something together. My final question, however, which I have not been able to find an answer to, is: is this even necessary functionality?
I encountered around 100 blog posts and a couple of stackoverflow questions about why this fails, and it seems to be (1) a source of confusion and (2) a point of ridicule. Is it really necessary?
>>> id(1)
4298196440
>>> a = 1
>>> id(a)
4298196440
>>> id(3000)
4320396376
>>> a = 3000
>>> id(a)
4320396160
>>>
|
msg133770 - (view) |
Author: Dave Malcolm (dmalcolm) |
Date: 2011-04-14 21:17 |
From IRC discussion, how about something like:
The current implementation consolidates integers in the range -5 to 256 (inclusive) into singleton PyIntObject instances, whereas other integer values have unique PyIntObject instances created for them.
This means that on CPython::
>>> -5 is -5
True
but::
>>> -6 is -6
False
This behavior is an implementation detail of CPython, and is not required by other implementations of Python.
In particular:
- do not manipulate the internal value of a PyIntObject after creation
- do not use "is" for comparing integer values, use "==" instead.
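A minimal sketch of the two comparisons discussed above, assuming a Python 3 interpreter (where the cache applies to `int` rather than PyIntObject). The helper names `runtime_int` and `compare` are hypothetical. The integers are built from strings at run time so that the compiler cannot merge equal literals into one constant:

```python
def runtime_int(text):
    """Create an int at run time, bypassing compile-time constant merging."""
    return int(text)


def compare(text):
    """Return (equal, identical) for two ints built independently from text."""
    a = runtime_int(text)
    b = runtime_int(text)
    return (a == b, a is b)


for text in ("-5", "256", "-6", "257"):
    equal, identical = compare(text)
    # "equal" is guaranteed by the language; "identical" is an
    # implementation detail and may differ between interpreters.
    print(text, "equal:", equal, "identical:", identical)
```

Only the `==` result is something a program can rely on. On a CPython build with the cache described in this thread, the identity result would be expected to differ between values inside and outside the -5 to 256 range, but nothing guarantees that.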
|
msg133771 - (view) |
Author: Dave Malcolm (dmalcolm) |
Date: 2011-04-14 21:18 |
Perhaps we should also note that this is done for the purposes of optimization (both speed and memory).
|
msg133772 - (view) |
Author: Dave Malcolm (dmalcolm) |
Date: 2011-04-14 21:20 |
Interpreter idea: "<int> is <int>" could trigger a compatibility warning, perhaps, to help people avoid relying on CPython quirks
|
msg133773 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-04-14 21:22 |
I don't think it's in the business of the C API docs to explain Python-visible semantics, or the difference between "==" and "is". I would just rephrase the original paragraph, removing the last sentence joke:
“Since integer objects are very frequently created, certain optimizations can be applied on their allocation. For example, the current implementation keeps an array of integer objects for all integers between -5 and 256: when you create an int in that range, you actually just get back a reference to the existing object.”
|
msg133774 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2011-04-14 21:39 |
We should remove the documentation entries that discuss non-guaranteed implementation details (i.e. which integers are singletons).
Instead, there should probably be a brief tutorial entry on what aspects of object identity people can rely on:
* None, True, and False are singletons. PEP 8 recommends testing for None with "is".
* Most internal equality comparisons (e.g. those in list.count or list.__contains__) assume that identity implies equality, regardless of how __eq__ is defined (i.e. an object is always treated as equal to itself).
* Once created, an object doesn't change its identity. So, you can use "is" to find the exact same object at a later stage in a program.
* Unless documented otherwise (a singleton class telling you that it returns the same object every time), no other assumptions should be made about object identity. In particular, one cannot assume that an object id won't be re-used after the object is reclaimed.
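The points in the list above can be illustrated with a short sketch. It uses a float NaN as the example object because NaN does not compare equal to itself, which makes the identity shortcut in container operations visible. The variable names are chosen here for illustration only:

```python
nan = float("nan")

# __eq__ does not have to report an object as equal to itself.
print(nan == nan)            # False

# Container operations check identity before equality, so the very same
# object is still found in the list.
items = [nan]
print(nan in items)          # True
print(items.count(nan))      # 1

# None is a singleton, so an identity test is the recommended check.
value = None
print(value is None)         # True

# An object keeps its identity for as long as it is alive.
obj = object()
alias = obj
print(alias is obj)          # True
```

The sketch deliberately leaves out id() reuse after an object is reclaimed, since whether and when an id is reused is not deterministic.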
|
msg134258 - (view) |
Author: Anthony Long (antlong) |
Date: 2011-04-22 02:12 |
I'll have a doc patch shortly.
Also, I am working on defining a solid range. Memory is not an issue like it was back in 1991 when this range was originally implemented, so we can go higher and get a bigger performance boost. This will be very important (to some, admittedly) in Python 3, where there is no distinction between PyInts and PyLongs (more processing req'd), which could benefit from further optimization of the range.
I'm going to be doing benchmarking; -256 to 256 seems like a good place to start. If anyone has apps in mind that I should benchmark with, feel free to let me know.
|
msg134261 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2011-04-22 04:34 |
> Memory is not an issue like it was back in 1991 when
> this range was originally implemented, so we can go
> higher and get a bigger performance boost.
Please don't do this. Memory is still important to a lot of people. Also, there is *very* little payoff for numbers outside the -5 to 256 range.
If we're to spend memory, we can do it in other places (like bigger freelists or somesuch).
> We should remove the documentation entries that discuss
> non-guaranteed implementation details
FWIW, I've changed my thinking on this. Without documentation, these details are very difficult to find out about. Instead of removing them, they should be marked as non-guaranteed, implementation-specific details, or they can be moved to a separate section.
|
msg134262 - (view) |
Author: Anthony Long (antlong) |
Date: 2011-04-22 04:39 |
My plan is to document it, as it exists, in the current implementation. That's a start at least, and will provide an entry point for further documentation in the future should it be changed again.
|
msg134401 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2011-04-25 18:45 |
The range of interned ints was once much smaller, but it was expanded upwards to 256 so that the bytes extracted from bytes and bytearray objects, as when indexing or iterating, would *all* be pre-allocated objects. I should presume that their indexers and iterators are cognizant of this and take advantage of it to bypass int() creation and directly index into the array of preallocated ints without a range check.
As for space and some time saving, just on startup,
a nearly fresh IDLE shell shows
>>> sum((sys.getrefcount(i) for i in range(-5, 257)))
4432
There are hundreds of references for each of 0 and 1.
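A small sketch of the observation above, assuming Python 3, where indexing or iterating a bytes object yields ints in range(256). The variable names are illustrative. The identity check and the reference counts are implementation details of CPython and can differ between versions and builds, so the sketch prints them rather than relying on them:

```python
import sys

data = bytes([0, 1, 200, 255])

# Both access paths produce ints in range(256).
by_iteration = list(data)
by_index = [data[i] for i in range(len(data))]

values_match = by_iteration == by_index
# Whether both paths hand back the very same int objects is an
# implementation detail, shown for information only.
same_objects = all(x is y for x, y in zip(by_iteration, by_index))
print("values match:", values_match)
print("same objects (implementation detail):", same_objects)

# sys.getrefcount is CPython-specific, so look it up defensively.
getrefcount = getattr(sys, "getrefcount", None)
if getrefcount is not None:
    total = sum(getrefcount(i) for i in range(-5, 257))
    print("total reference count for -5..256:", total)
```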
|
msg350226 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2019-08-22 18:55 |
The current wording seems to have sufficed for users of the C API. Also, it is an implementation detail subject to change. Discussions on being able to mutate anything in C are covered in the ctypes module.
|
Date | User | Action | Args |
2022-04-11 14:57:16 | admin | set | github: 56055 |
2019-08-22 18:55:16 | rhettinger | set | status: open -> closed; resolution: accepted -> not a bug; messages: + msg350226; stage: resolved |
2011-04-25 18:45:57 | terry.reedy | set | nosy: + terry.reedy; messages: + msg134401 |
2011-04-24 22:39:05 | nedbat | set | nosy: + nedbat |
2011-04-22 04:39:06 | antlong | set | messages: + msg134262 |
2011-04-22 04:34:46 | rhettinger | set | assignee: rhettinger; messages: + msg134261 |
2011-04-22 02:12:14 | antlong | set | resolution: accepted; messages: + msg134258; title: Implementation question for (-5) - 256 caching, and doc update for c-api/int.html -> Remove non-guaranteed implementation details from docs. |
2011-04-15 17:04:46 | eric.araujo | set | nosy: + eric.araujo |
2011-04-15 04:21:21 | ezio.melotti | set | nosy: + ezio.melotti |
2011-04-14 21:39:24 | rhettinger | set | nosy: + rhettinger; messages: + msg133774 |
2011-04-14 21:22:01 | pitrou | set | nosy: + pitrou; messages: + msg133773 |
2011-04-14 21:20:25 | dmalcolm | set | messages: + msg133772 |
2011-04-14 21:18:34 | dmalcolm | set | messages: + msg133771 |
2011-04-14 21:17:05 | dmalcolm | set | nosy: + dmalcolm; messages: + msg133770 |
2011-04-14 21:12:46 | antlong | create | |