This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author josh.r
Recipients josh.r, ncoghlan, pmoody, r.david.murray, rhettinger, sbromberger, serhiy.storchaka
Date 2014-12-23.23:07:21
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1419376042.44.0.12430024255.issue23103@psf.upfronthosting.co.za>
In-reply-to
Content
So, just to be clear, checking the implementation (as of 3.4):

1. All the ipaddress classes leave __slots__ unset. So the overhead is actually higher than 56 bytes per instance; the __dict__ for an IPv4Address (ignoring the actual keys and values in the dict, just looking at the size of the dict itself), on Python 3.4, 64 bit Windows, is 56 + 288 = 344 bytes. Based on the size of the dict, I'm guessing some instance attributes aren't being set eagerly in __init__, so it's not getting the reduced memory usage from shared key dictionaries either.
2. It looks like as of 3.4, _version and _max_prefixlen were both instance attributes; the current code base seems to have made _max_prefixlen a class attribute, but left _version as an instance attribute (anything set on self is an instance attribute; doesn't matter if it's set in __init__ of a base class)
3. Even if we switch to using __slots__, remove support for weak references and user added attributes, and store nothing but the raw address encoded as an int, you're never going to get true "flyweight" objects in CPython. The lowest instance cost you can get for a class defined in Python (on a system with 64 bit pointers) inheriting from object, is 40 bytes, plus 8 bytes per slot (plus the actual cost of the int object, which is another 16 bytes for an IPv4 address on a 64 bit OS). If you want it lighter-weight than that, either you need to inherit from another built-in that it even lighter-weight, or you need to implement it in C. If you built IPv4Address on top of int for instance, you could get the instance cost down to 16 bytes. Downside to inheriting from int is that you'll interact with many other objects as if you were an int, which can cause a lot of unexpected behaviors (as others have noted).

Basically, there is no concept of "flyweight" object in Python. Aside from implementing it in C or inheriting from int, you're going to be using an absolute minimum (on 64 bit build) of 48 bytes for the basic object structure, plus another 16 for the int itself, 64 bytes total, or about 16x the "real" data being stored. Using a WeakValueDictionary would actually require another 8 bytes per instance (a slot for __weakref__), and the overhead of the dictionary would likely be higher than the memory saved (unless you're regularly creating duplicate IP addresses and storing them for the long haul, but I suspect this is a less common use case than processing many different IP addresses once or twice).

I do think it would be a good idea to use __slots__ properly to get the memory down below 100 bytes per instance, but adding automatic "IP address interning" is probably not worth the effort. In the longer term, a C implementation of ipaddress might be worth it, not for improvements in computational performance, but to get the memory usage down for applications that need to make millions of instances.
History
Date User Action Args
2014-12-23 23:07:22josh.rsetrecipients: + josh.r, rhettinger, ncoghlan, pmoody, r.david.murray, serhiy.storchaka, sbromberger
2014-12-23 23:07:22josh.rsetmessageid: <1419376042.44.0.12430024255.issue23103@psf.upfronthosting.co.za>
2014-12-23 23:07:22josh.rlinkissue23103 messages
2014-12-23 23:07:21josh.rcreate