classification
Title: implementation details in sys module
Type: behavior
Stage: needs patch
Components: Documentation
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: arigo, brett.cannon, docs@python, fijall, loewis, lukasz.langa, pitrou, terry.reedy
Priority: normal
Keywords:

Created on 2011-01-24 13:43 by fijall, last changed 2013-01-22 12:49 by ezio.melotti.

Messages (14)
msg126925 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2011-01-24 13:43
sys module documentation (as it is online) has some things that in my opinion should be marked as implementation details, but are not. Feel free to counter why not. 

Some of them have a note saying they should be used for specialized purposes only, but IMO that's not the same as saying they are not mandatory for other implementations.

Temporary list:

_clear_type_cache

dllhandle

getrefcount

getdlopenflags (?)

getsizeof - it might be not well defined on other implementations

setdlopenflags

api_version
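
A quick way to see which of these names a given interpreter actually provides is to probe the sys module at runtime. This is only an illustrative sketch: the helper name probe_sys_names is hypothetical, and the results depend on the implementation and the platform (dllhandle, for instance, is documented as Windows-only).

```python
import platform
import sys

# Names from the temporary list above.
CANDIDATES = [
    "_clear_type_cache",
    "dllhandle",
    "getrefcount",
    "getdlopenflags",
    "getsizeof",
    "setdlopenflags",
    "api_version",
]


def probe_sys_names(names=CANDIDATES):
    """Return a dict mapping each name to whether sys provides it."""
    return {name: hasattr(sys, name) for name in names}


if __name__ == "__main__":
    print(platform.python_implementation())
    for name, present in probe_sys_names().items():
        print(f"{name:20} {'yes' if present else 'no'}")
```

Running the same script under CPython and under another implementation gives a rough picture of which entries are portable in practice.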
msg126926 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-24 14:02
Well, getsizeof is not better-defined under CPython than elsewhere. It just gives a hint.
Agreed about the others.
msg126927 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2011-01-24 14:05
I suppose wrt getsizeof it's more of "if you provide us with a reasonable expectations, we can implement this" other than anything else.
msg126928 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-24 14:13
> I suppose wrt getsizeof it's more of "if you provide us with a
> reasonable expectations, we can implement this" other than anything
> else.

The expectation is that it returns the memory footprint of the given
object, and only it (not taking into account sharing, caching,
dependencies or anything else). For example, an instance will not count
its attribute __dict__. But a str object will count its object header
plus the string payload, if the payload is private.

Of course, you are free to tweak these semantics for the PyPy
implementation.
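
The expectation described above can be observed with a small experiment. This is a sketch assuming CPython; the class Point is hypothetical, and the exact numbers vary by version and platform, so none are shown.

```python
import sys


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Shallow size of the instance; its attribute dict is a separate object
# with its own size.
instance_size = sys.getsizeof(p)
dict_size = sys.getsizeof(p.__dict__)

# A str reports its object header plus its payload, so a longer string
# reports a larger size.
short_size = sys.getsizeof("a")
long_size = sys.getsizeof("a" * 1000)

print(instance_size, dict_size, short_size, long_size)
```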
msg127018 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2011-01-25 16:37
> The expectation is that it returns the memory footprint of the given
> object, and only it (not taking into account sharing, caching,
> dependencies or anything else).

It would be nice if this was a well-defined definition, but unfortunately it is not.  For example, string objects may appear different from the user's point of view (e.g. as seen by id() and 'is') but share the implementation's data; they may even share only a part of it (if ropes are enabled).  Conversely, for user-defined objects you would typically think not to count the "shape" information, which is usually shared among several instances -- but then you risk a gross under-estimation in the (rarer) cases where it is not shared.

Another way to look at the "official" definition is to return the size of the object itself and none of its dependencies, because in theory they might be shared; but that would make all strings, lists, tuples, dicts, and so on have a getsizeof() of 8 or 12, which is rather useless.

I hope this clarifies fijal's original comment: "it might be not well defined on other implementations."
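
To make the ambiguity concrete, here is a sketch of what happens on CPython, where the reported size of a container excludes what it refers to. The variable names are illustrative only, and other implementations may answer differently or not at all.

```python
import sys

big = "x" * 10_000
container = [big]

# On CPython the list reports only its own header and pointer array...
print(sys.getsizeof(container))
# ...while the string it refers to is accounted for separately.
print(sys.getsizeof(big))

# Two names for one object: adding up sizes per name would count the
# same memory twice.
alias = big
print(alias is big)
```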
msg127023 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-25 17:18
> > The expectation is that it returns the memory footprint of the given
> > object, and only it (not taking into account sharing, caching,
> > dependencies or anything else).
> 
> It would be nice if this was a well-defined definition, but
> unfortunately it is not.

I didn't claim it was. Actually, if you read the rest of my message, I
did mention that PyPy could tweak the semantics if it made more sense.
So, of course, the more sharing and caching takes place, the less
obvious these semantics are, but even with CPython they are not obvious
anyway. It's not supposed to be an exact measurement for the common
developer, rather a hint that experts can use to tweak their data
structures and algorithms; you need to know details of your VM's
implementation to use that information.
msg127025 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2011-01-25 17:41
I can hardly think of a specification that would help me identify actual sizes, even as a rough estimate. Which experts you had in mind?
msg127026 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-25 17:43
> Which experts you had in mind?

People who know how the Python implementation works.
msg127027 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2011-01-25 17:52
> > Which experts you had in mind?

> People who know how the Python implementation works.

I'm serious. What semantics would make sense to anyone? Even if you know implementation quite well a single number per object does not provide enough information.
msg127028 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2011-01-25 17:59
You could return -1 for everything. =)

In all seriousness, it could simply be proportional. IMO as long as people can tell that a list takes up less space than a dict, the numbers seem fine to me.
msg127029 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-25 18:00
> Even if you know implementation quite well a single number per object
> does not provide enough information.

Enough information for what? It can certainly provide information about
the overhead of that particular object (again, regardless of sharing).
msg127878 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-02-04 09:42
I can propose a specification of getsizeof: if you somehow manage to traverse all objects (without considering an object twice), and sum up the getsizeof results, you should end up with something close to, but smaller than the actual memory consumption. How close is a quality-of-implementation issue (so always returning 0 would be correct-but-useless).

It may be that implementations can also support counting certain hidden memory usage (headers, blocks shared across instances that are not objects themselves). Such functions should have different names and interfaces (e.g. sys.gethiddenblocks(o) may return a list of (address, size) pairs); CPython doesn't provide any such function (although sys.mallocoverhead might be useful).

In any case: I'm not convinced that it is useful to mark functions as CPython-specific in the documentation. This clutters the documentation, and is of interest only for language lawyers. So if implementation details are to be documented, I'd prefer this to happen in a separate document.
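
The proposed specification (visit every object once and sum the getsizeof results) can be sketched as follows. This is a minimal illustration assuming CPython, not a recommended tool: the function name total_size and the SKIP tuple are hypothetical, and gc.get_referents only reports references that an object exposes to the garbage collector.

```python
import gc
import sys
import types

# Assumption: types, modules and functions are treated as shared
# infrastructure and skipped, so the walk stays within the data.
SKIP = (type, types.ModuleType, types.FunctionType)


def total_size(root):
    """Sum sys.getsizeof over objects reachable from root, once each."""
    seen = set()      # ids of objects already counted
    stack = [root]
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen or isinstance(obj, SKIP):
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        stack.extend(gc.get_referents(obj))
    return total
```

Because each object is counted at most once, a string referenced twice from the same list contributes its size a single time.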
msg127900 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2011-02-04 16:22
Martin: I kind of agree with you, although I guess that for practical reasons if you don't have a reasonable sys.getsizeof() implementation then it's better to raise TypeError than return 0 (like CPython, which may raise "TypeError: Type %.100s doesn't define __sizeof__").

I agree that it's not really useful to mark functions as CPython-specific in the documentation, if only because whenever a new implementation like PyPy comes along, then it's going to have a rather different set of functions that it wants to consider implementation details.  I would say that more than half the functions in the sys module marked CPython-specific in the doc are implemented in PyPy just fine, and there is an equal number of functions not marked CPython-specific that have no chance to be implemented in PyPy.
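
Code that wants to run on several implementations can guard against the TypeError case mentioned above. This is a sketch: safe_sizeof and Unsizable are hypothetical names, and the class only stands in for an object the interpreter cannot measure.

```python
import sys


def safe_sizeof(obj):
    """Return sys.getsizeof(obj), or None if no size can be obtained."""
    try:
        return sys.getsizeof(obj)
    except TypeError:
        return None


class Unsizable:
    # Hypothetical stand-in for an object the implementation cannot size.
    def __sizeof__(self):
        raise TypeError("size not available")


print(safe_sizeof([]))
print(safe_sizeof(Unsizable()))
```

sys.getsizeof also accepts an optional default argument that is returned when the object provides no means of retrieving its size, which can serve the same purpose.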
msg136703 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-05-23 22:06
The __sizeof__ special attribute shows up in dir(object) but appears not to be documented other than with

>>> help(object.__sizeof__)
Help on method_descriptor:
__sizeof__(...)
    __sizeof__() -> size of object in memory, in bytes

Should it have an entry in Lib 4.12. Special Attributes?

object.__sizeof__
    A method used by sys.getsizeof.

It should then show up in the index (missing now) and point people to sys.getsizeof.  Looking further, I see that it is mentioned but not indexed in the sys.getsizeof entry.
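
For reference, a sketch of how a class might define __sizeof__ so that sys.getsizeof picks it up. The class Buffer is hypothetical; on CPython, sys.getsizeof may add garbage-collector overhead on top of the value that __sizeof__ returns.

```python
import sys


class Buffer:
    """Hypothetical wrapper that accounts for a payload it owns."""

    def __init__(self, nbytes):
        self._data = bytearray(nbytes)

    def __sizeof__(self):
        # Base size of the instance plus the privately held payload.
        return object.__sizeof__(self) + sys.getsizeof(self._data)


buf = Buffer(4096)
print(buf.__sizeof__(), sys.getsizeof(buf))
```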
History
Date User Action Args
2013-01-22 12:49:37 ezio.melotti set versions: + Python 3.3, Python 3.4, - Python 3.1
2011-05-23 22:06:57 terry.reedy set nosy: + terry.reedy
messages: + msg136703
2011-02-04 16:22:12 arigo set nosy: loewis, brett.cannon, arigo, pitrou, docs@python, lukasz.langa, fijall
messages: + msg127900
2011-02-04 09:42:09 loewis set nosy: + loewis
messages: + msg127878
2011-01-25 18:00:33 pitrou set nosy: brett.cannon, arigo, pitrou, docs@python, lukasz.langa, fijall
messages: + msg127029
2011-01-25 17:59:46 brett.cannon set nosy: brett.cannon, arigo, pitrou, docs@python, lukasz.langa, fijall
messages: + msg127028
2011-01-25 17:52:41 fijall set nosy: brett.cannon, arigo, pitrou, docs@python, lukasz.langa, fijall
messages: + msg127027
2011-01-25 17:43:37 pitrou set nosy: brett.cannon, arigo, pitrou, docs@python, lukasz.langa, fijall
messages: + msg127026
2011-01-25 17:41:54 fijall set nosy: brett.cannon, arigo, pitrou, docs@python, lukasz.langa, fijall
messages: + msg127025
2011-01-25 17:18:16 pitrou set nosy: brett.cannon, arigo, pitrou, docs@python, lukasz.langa, fijall
messages: + msg127023
2011-01-25 16:37:16 arigo set nosy: + arigo
messages: + msg127018
2011-01-24 17:37:19 brett.cannon set nosy: + brett.cannon
2011-01-24 14:13:31 pitrou set nosy: pitrou, docs@python, lukasz.langa, fijall
messages: + msg126928
2011-01-24 14:05:28 fijall set nosy: pitrou, docs@python, lukasz.langa, fijall
messages: + msg126927
2011-01-24 14:02:55 pitrou set nosy: + pitrou
messages: + msg126926
2011-01-24 13:49:10 lukasz.langa set nosy: + lukasz.langa
2011-01-24 13:46:14 pitrou set nosy: docs@python, fijall
stage: needs patch
versions: + Python 3.1, Python 2.7, Python 3.2
2011-01-24 13:43:39 fijall create