This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Remove non-guaranteed implementation details from docs.
Type: behavior Stage: resolved
Components: Library (Lib) Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: antlong, dmalcolm, eric.araujo, ezio.melotti, nedbat, pitrou, rhettinger, terry.reedy
Priority: normal Keywords:

Created on 2011-04-14 21:12 by antlong, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (11)
msg133769 - Author: Anthony Long (antlong) Date: 2011-04-14 21:12
http://docs.python.org/c-api/int.html

"The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)"

This paragraph should be changed to reflect that in C you can (by construction) mutate anything you want. Following dmalcolm's suggestion, it could read:

"The current implementation consolidates integers in the range -5 to 256 (inclusive) into singleton instances.  Do not manipulate the internal value of a PyIntObject after creation."

Also, the last line of that paragraph implies that this behavior (caching of -5 to 256) is undocumented. I searched for a good while and could not find an answer. Is there anything written on the implementation details behind why '-5 is -5' works while '-6 is -6' wouldn't?

If there is nothing written about this, I will put something together. My final question, which I have not been able to find an answer to, is: is this functionality even necessary?

I came across around 100 blog posts and a couple of Stack Overflow questions about why this fails; it seems to be 1) a source of confusion and 2) a point of ridicule. Is it really necessary?

>>> id(1)
4298196440
>>> a = 1
>>> id(a)
4298196440
>>> id(3000)
4320396376
>>> a = 3000
>>> id(a)
4320396160
>>>
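Comparing literals directly, as in '-6 is -6', can be misleading because the compiler may store equal literals from one code object as a single shared constant. A minimal sketch that sidesteps this, assuming CPython (other implementations may answer differently), builds each int at runtime with int(str(n)) and compares the two results with "is"; the helper name shares_object is hypothetical.

```python
import platform


def shares_object(n: int) -> bool:
    """Return True if two ints equal to n, built at runtime, are the same object.

    Hypothetical helper: int(str(n)) creates each value while the program
    runs, so the compiler cannot fold both operands into one shared constant.
    """
    return int(str(n)) is int(str(n))


if __name__ == "__main__":
    print(platform.python_implementation())
    for n in (-6, -5, 0, 1, 256, 257, 3000):
        # Equality always holds; identity depends on the implementation's cache.
        print(n, int(str(n)) == int(str(n)), shares_object(n))
```

On CPython this reports shared identity only for -5 through 256, the range described in the C API docs, while "==" is True for every value, which is why "==" is the comparison to rely on.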
msg133770 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-04-14 21:17
From IRC discussion, how about something like:

The current implementation consolidates integers in the range -5 to 256 (inclusive) into singleton PyIntObject instances, whereas other integer values have unique PyIntObject instances created for them.

This means that on CPython::

   >>> -5 is -5
   True

but::

   >>> -6 is not -6
   False

This behavior is an implementation detail of CPython, and is not required by other implementations of Python.

In particular:
  - do not manipulate the internal value of a PyIntObject after creation
  - do not use "is" to compare integer values; use "==" instead.
msg133771 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-04-14 21:18
Perhaps the text should also note that this is done for optimization purposes (both speed and memory).
msg133772 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2011-04-14 21:20
Interpreter idea: "<int> is <int>" could perhaps trigger a compatibility warning, to help people avoid relying on CPython quirks.
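Later CPython releases (3.8 onward) emit a SyntaxWarning when "is" or "is not" is used with certain literals such as numbers and strings, which is close to the warning proposed here. A minimal sketch that observes it by compiling a source string under warnings.catch_warnings, assuming CPython 3.8 or newer; is_literal_warned is a hypothetical helper name.

```python
import warnings


def is_literal_warned(source: str) -> bool:
    """Compile source and report whether a SyntaxWarning was raised (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let repeated warnings be suppressed
        compile(source, "<example>", "eval")
    return any(issubclass(w.category, SyntaxWarning) for w in caught)


if __name__ == "__main__":
    print(is_literal_warned("x is 256"))   # identity test against an int literal
    print(is_literal_warned("x == 256"))   # equality test against the same literal
    print(is_literal_warned("x is None"))  # None is a documented singleton
```

Only the first call is expected to report a warning; "is None" stays silent because testing identity against None is the recommended idiom.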
msg133773 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-14 21:22
I don't think it's the business of the C API docs to explain Python-visible semantics, or the difference between "==" and "is". I would just rephrase the original paragraph, removing the joke in the last sentence:

“Since integer objects are very frequently created, certain optimizations can be applied on their allocation. For example, the current implementation keeps an array of integer objects for all integers between -5 and 256: when you create an int in that range, you actually just get back a reference to the existing object.”
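The allocation strategy described here, a preallocated table consulted before any new object is created, can be modelled in pure Python. This is a hypothetical sketch of the technique, not CPython's C code: Boxed, SMALL_MIN, SMALL_MAX and make_boxed are invented names, and the bounds simply mirror the -5 to 256 range quoted above.

```python
from dataclasses import dataclass

SMALL_MIN, SMALL_MAX = -5, 256  # hypothetical bounds mirroring the quoted range


@dataclass(frozen=True, eq=False)
class Boxed:
    """Hypothetical stand-in for an immutable int object."""
    value: int


# Built once, up front: one shared object per small value.
_SMALL_TABLE = [Boxed(v) for v in range(SMALL_MIN, SMALL_MAX + 1)]


def make_boxed(value: int) -> Boxed:
    """Return the shared object for small values, a fresh one otherwise."""
    if SMALL_MIN <= value <= SMALL_MAX:
        # Index by offset from the lowest cached value; no allocation needed.
        return _SMALL_TABLE[value - SMALL_MIN]
    return Boxed(value)
```

With this model, make_boxed(1) is make_boxed(1) holds while make_boxed(3000) is make_boxed(3000) does not. The class is frozen because sharing one object among all callers is only safe if nobody can change its value, the same reason the C API text warns against manipulating an int after creation.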
msg133774 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-14 21:39
We should remove the documentation entries that discuss non-guaranteed implementation details (i.e. which integers are singletons).

Instead, there should probably be a brief tutorial entry on what aspects of object identity people can rely on:

* None, True, and False are singletons.  PEP 8 recommends testing for None with "is".

* Most internal equality comparisons (e.g. those in list.count or list.__contains__) assume that identity implies equality, regardless of how __eq__ is defined (i.e. an object is always equal to itself).

* Once created, an object doesn't change its identity.  So, you can use "is" to find the exact same object at a later stage in a program.

* Unless documented otherwise (a singleton class telling you that it returns the same object every time), no other assumptions should be made about object identity.  In particular, one cannot assume that an object id won't be re-used after the object is reclaimed.
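The points in this list can be checked directly with built-in behavior. In this sketch, float('nan') is chosen because it compares unequal to itself under "==", which makes the identity shortcut in list.__contains__ and list.count visible; the function name demo_identity_rules is hypothetical.

```python
def demo_identity_rules() -> None:
    # None (like True and False) is a singleton, so "is None" is reliable.
    assert type(None)() is None

    # NaN is not equal to itself under ==...
    nan = float("nan")
    assert nan != nan
    # ...but containment and counting check identity first, so the very
    # same object is still found.
    assert nan in [nan]
    assert [nan].count(nan) == 1
    # A different NaN object is not found, because == returns False for it.
    assert float("nan") not in [nan]

    # An object's identity does not change when the object is mutated.
    items = []
    before = id(items)
    items.append(1)
    assert id(items) == before


if __name__ == "__main__":
    demo_identity_rules()
```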
msg134258 - Author: Anthony Long (antlong) Date: 2011-04-22 02:12
I'll have a doc patch shortly.

Also, I am working on defining a solid range. Memory is not an issue the way it was back in 1991, when this range was originally implemented, so we can go higher and get a bigger performance boost. This will matter most (to some, admittedly) in Python 3, where there is no distinction between PyInts and PyLongs, so integer handling requires more processing and could benefit from a larger cached range.

I'm going to do some benchmarking; -256 to 256 seems like a good place to start. If anyone has applications in mind that I should benchmark with, please let me know.
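One way to start such a comparison is a micro-benchmark with the standard timeit module that produces results inside and outside the cached range. This is a rough sketch under stated assumptions: time_add is a hypothetical helper, the addition is done on a variable so the compiler cannot fold it into a constant, and micro-benchmark timings vary by machine and build, so they say little about real applications.

```python
import timeit


def time_add(start: int, number: int = 1_000_000) -> float:
    """Time `x + 1` where x == start, so each result equals start + 1 (hypothetical helper)."""
    # Passing x through globals keeps the addition a runtime operation.
    return timeit.timeit("x + 1", globals={"x": start}, number=number)


if __name__ == "__main__":
    inside = time_add(100)    # result 101: inside the -5..256 cache
    outside = time_add(1000)  # result 1001: a new int object each time
    print(f"inside cache: {inside:.3f}s  outside cache: {outside:.3f}s")
```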
msg134261 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-22 04:34
> Memory is not an issue like it was back in 1991 when 
> this range was originally implemented, so we can go 
> higher and get a bigger performance boost.

Please don't do this.  Memory is still important to a lot of people.  Also, there is *very* little payoff for numbers outside the -5 to 256 range.

If we're to spend memory, we can do it in other places (like bigger freelists or somesuch).

> We should remove the documentation entries that discuss 
> non-guaranteed implementation details

FWIW, I've changed my thinking on this.  Without documentation, these details are very difficult to find out about.  Instead of removing them, we should mark them as non-guaranteed, implementation-specific details, or move them to a separate section.
msg134262 - Author: Anthony Long (antlong) Date: 2011-04-22 04:39
My plan is to document it as it exists in the current implementation. That's a start at least, and will provide an entry point for further documentation should it change again in the future.
msg134401 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-25 18:45
The range of interned ints was once much smaller, but it was expanded upwards to 256 so that the bytes extracted from bytes and bytearray objects, as when indexing or iterating, would *all* be pre-allocated objects. I would presume that their indexers and iterators are aware of this and take advantage of it to bypass int() creation and index directly into the array of preallocated ints without a range check.
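The observable side of this is easy to check without reading the C source. A sketch, assuming CPython: every value obtained from a bytes object lies in 0..255, inside the cached range, so indexing and iterating hand back the very same int objects. byte_values_shared is a hypothetical helper name, and the check says nothing about how the C code reaches those objects internally.

```python
def byte_values_shared() -> bool:
    data = bytes(range(256))  # every possible byte value exactly once
    by_index = [data[i] for i in range(len(data))]
    by_iteration = list(data)
    # On CPython each position yields the identical cached int object.
    return all(a is b for a, b in zip(by_index, by_iteration))


if __name__ == "__main__":
    print(byte_values_shared())
```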

As for space (and some time) savings: just after startup, a nearly fresh IDLE shell shows

>>> import sys
>>> sum((sys.getrefcount(i) for i in range(-5, 257)))
4432

There are hundreds of references for each of 0 and 1.
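To see which cached values carry those references, the same measurement can be broken down per value. A sketch, assuming CPython; top_referenced is a hypothetical helper. The counts depend on what the interpreter has loaded, and on CPython 3.12 and later small ints are immortal (PEP 683), so sys.getrefcount reports a large fixed value for them and the ranking is no longer meaningful there.

```python
import sys


def top_referenced(lo: int = -5, hi: int = 256, top: int = 5) -> list[tuple[int, int]]:
    """Return the `top` values in [lo, hi] with the highest reference counts."""
    # range() hands back the cached objects, so getrefcount sees those objects.
    counts = [(value, sys.getrefcount(value)) for value in range(lo, hi + 1)]
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:top]


if __name__ == "__main__":
    for value, refs in top_referenced():
        print(value, refs)
```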
msg350226 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-22 18:55
The current wording seems to have sufficed for users of the C API.  Also, it is an implementation detail subject to change.  Discussions of being able to mutate anything in C are covered in the ctypes module documentation.
History
Date                 User          Action  Args
2022-04-11 14:57:16  admin         set     github: 56055
2019-08-22 18:55:16  rhettinger    set     status: open -> closed
                                           resolution: accepted -> not a bug
                                           messages: + msg350226
                                           stage: resolved
2011-04-25 18:45:57  terry.reedy   set     nosy: + terry.reedy
                                           messages: + msg134401
2011-04-24 22:39:05  nedbat        set     nosy: + nedbat
2011-04-22 04:39:06  antlong       set     messages: + msg134262
2011-04-22 04:34:46  rhettinger    set     assignee: rhettinger
                                           messages: + msg134261
2011-04-22 02:12:14  antlong       set     resolution: accepted
                                           messages: + msg134258
                                           title: Implementation question for (-5) - 256 caching, and doc update for c-api/int.html -> Remove non-guaranteed implementation details from docs.
2011-04-15 17:04:46  eric.araujo   set     nosy: + eric.araujo
2011-04-15 04:21:21  ezio.melotti  set     nosy: + ezio.melotti
2011-04-14 21:39:24  rhettinger    set     nosy: + rhettinger
                                           messages: + msg133774
2011-04-14 21:22:01  pitrou        set     nosy: + pitrou
                                           messages: + msg133773
2011-04-14 21:20:25  dmalcolm      set     messages: + msg133772
2011-04-14 21:18:34  dmalcolm      set     messages: + msg133771
2011-04-14 21:17:05  dmalcolm      set     nosy: + dmalcolm
                                           messages: + msg133770
2011-04-14 21:12:46  antlong       create