classification
Title: reduce uuid.UUID() memory footprint
Type: resource usage
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Nir Soffer, serhiy.storchaka, taleinat, vstinner, wbolster
Priority: normal
Keywords: patch

Created on 2017-07-20 17:15 by wbolster, last changed 2018-09-10 13:11 by taleinat. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 2785 closed wbolster, 2017-07-20 17:15
PR 9078 merged taleinat, 2018-09-06 07:01
PR 9106 merged taleinat, 2018-09-07 21:11
PR 9133 merged taleinat, 2018-09-10 12:53
Messages (11)
msg298738 - (view) Author: wouter bolsterlee (wbolster) * Date: 2017-07-20 17:15
memory usage for uuid.UUID seems larger than it has to be. it seems that using __slots__ will save around 100 bytes per instance, which is very significant, e.g. when dealing with large sets of uuids (which are often used as "primary keys" into external data stores).

uuid.UUID has a __setattr__ that prevents any extra attributes from being set:

    def __setattr__(self, name, value):
        raise TypeError('UUID objects are immutable')

...so it seems to me not having __dict__ should not cause any problems?


before (RES column):

>>> import uuid
>>> y = {uuid.uuid4() for _ in range(1000000)}

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
23020 wbolster   20   0  315M  253M  7256 S  0.0  1.6  0:04.98 python

with slots:

>>> import uuid
>>> y = {uuid.uuid4() for _ in range(1000000)}

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
21722 wbolster   20   0  206M  145M  7348 S  0.0  0.9  0:05.03 python

i will open a pr on github shortly.
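The per-instance saving can also be checked without watching process RSS. The following is a minimal sketch, not the actual uuid.UUID code: the classes `WithDict` and `WithSlots` and the helper `shallow_size` are hypothetical names, and the exact byte counts depend on the CPython version and platform.

```python
import sys


class WithDict:
    """Hypothetical stand-in for a class that keeps a per-instance __dict__."""

    def __init__(self, value):
        self.int = value


class WithSlots:
    """Hypothetical stand-in for the same class using __slots__."""

    __slots__ = ('int',)

    def __init__(self, value):
        self.int = value


def shallow_size(obj):
    """Size of the instance plus its __dict__, if it has one.

    The stored integer is not counted because both variants share that cost.
    """
    size = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):
        size += sys.getsizeof(obj.__dict__)
    return size


a = WithDict(1 << 127)
b = WithSlots(1 << 127)
print('with __dict__: ', shallow_size(a))
print('with __slots__:', shallow_size(b))
```

The slotted variant has no `__dict__` at all, which is where the saving comes from.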
msg298739 - (view) Author: wouter bolsterlee (wbolster) * Date: 2017-07-20 17:19
as a follow-up note, i also experimented with keeping the actual value as a bytes object instead of an integer, but that does not lead to less memory being used: a 128-bit integer uses less memory than a 16 byte bytes object (presumably because PyBytesObject has a cached hash() field and a trailing null byte).
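The int versus bytes comparison can be reproduced with sys.getsizeof. This is only a sketch: the sizes are CPython implementation details and may differ between versions and builds, and the UUID literal is an arbitrary example value.

```python
import sys
import uuid

# Arbitrary example value; any UUID has both representations.
u = uuid.UUID('12345678-1234-5678-1234-567812345678')

as_int = u.int      # integer of up to 128 bits
as_bytes = u.bytes  # 16-byte bytes object

print('int:  ', sys.getsizeof(as_int))
print('bytes:', sys.getsizeof(as_bytes))
```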
msg298757 - (view) Author: Nir Soffer (Nir Soffer) Date: 2017-07-21 00:56
This saves memory, but using str(uuid.uuid4()) requires even less memory.
If you really want to save memory, you can keep the uuid.uuid4().int.

Can you explain why someone would like to have 1000000 uuid objects, instead of 1000000 strings? What is the advantage of keeping UUID objects around?
msg298785 - (view) Author: wouter bolsterlee (wbolster) * Date: 2017-07-21 08:57
i consider uuids as low level data types, not as fancy containers, similar to how i view datetime objects. given the native support in e.g. postgresql and many other systems, it's common to deal with uuids.

of course you can convert to/from strings or numbers, but that is cumbersome in many cases. for comparison, one would typically not convert unicode text from/into utf-8 encoded byte strings either, even though the latter will save memory in many cases.

from experience: converting can lead to nasty bugs, e.g. because you forgot about a conversion, and then a uuid string does not compare equal to a uuid.UUID instance, leaving you puzzled.
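a small illustration of that pitfall, using an arbitrary example uuid: the string form and the uuid.UUID instance never compare equal, so a forgotten conversion silently turns lookups into misses.

```python
import uuid

u = uuid.UUID('12345678-1234-5678-1234-567812345678')
s = str(u)

# A UUID and its string form are different types and never compare equal.
print(u == s)             # False
print(u == uuid.UUID(s))  # True

# The same mismatch makes set and dict lookups miss.
known = {u}
print(s in known)             # False
print(uuid.UUID(s) in known)  # True
```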
msg303194 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-28 07:05
This change breaks pickle.

You should preserve forward and backward pickle compatibility.

1. Pickle data produced by old Python versions must still load with the new implementation. Implement __setstate__ to satisfy this.

2. Pickle data produced by the new implementation must still load in old Python versions. There are many ways to satisfy this; you should choose the most efficient.
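One possible way to meet both requirements is sketched below. It is not the code that was merged for uuid.UUID: the class `Token` is hypothetical, and the approach (emit a dict-shaped state from __getstate__, accept a dict in __setstate__) is just one of the options mentioned above.

```python
import pickle


class Token:
    """Hypothetical immutable, slotted value type (not the real uuid.UUID)."""

    __slots__ = ('int',)

    def __init__(self, value):
        # Bypass the immutability guard below when initialising.
        object.__setattr__(self, 'int', value)

    def __setattr__(self, name, value):
        raise TypeError('Token objects are immutable')

    def __getstate__(self):
        # Emit the same shape an instance __dict__ would have had, so an
        # older, dict-based version of the class could restore it.
        return {'int': self.int}

    def __setstate__(self, state):
        # Accept the dict state written by either the old or the new version.
        object.__setattr__(self, 'int', state['int'])

    def __eq__(self, other):
        if isinstance(other, Token):
            return self.int == other.int
        return NotImplemented

    def __hash__(self):
        return hash(self.int)


original = Token(1 << 100)
restored = pickle.loads(pickle.dumps(original))
print(restored == original)

# Simulate loading state written by an old, dict-based implementation.
legacy = Token.__new__(Token)
legacy.__setstate__({'int': 5})
print(legacy.int)
```

Without __setstate__, unpickling would try to update an instance __dict__ that a slotted class does not have.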
msg324665 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-06 07:10
See new PR which addresses pickle forward and backward compatibility.
msg324671 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-06 08:25
I am closing the issue because the pickle issue hasn't been addressed: wouter bolsterlee (the author) hasn't replied for a month and a half.

@wouter bolsterlee: if you still want to work on that issue, you should try to address the pickle issue first, then reopen this issue or maybe create a new issue pointing to this one.
msg324672 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-06 08:26
Oops. I missed the fact that Tal created PR 9078. Sorry, I'm reopening the issue ;-)
msg324682 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-06 11:34
New changeset 3e2b29dccc3ca9fbc418bfa312ad655782e250f2 by Tal Einat in branch 'master':
bpo-30977: make uuid.UUID use __slots__ (GH-9078)
https://github.com/python/cpython/commit/3e2b29dccc3ca9fbc418bfa312ad655782e250f2
msg324683 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-06 11:35
Thanks for the suggestion and the original patch, Wouter!
msg324923 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-10 13:11
New changeset 54752533b2ed1c898ffe5ec2e795c6910ee46a39 by Tal Einat in branch 'master':
bpo-30977: rework code changes according to post-merge code review (GH-9106)
https://github.com/python/cpython/commit/54752533b2ed1c898ffe5ec2e795c6910ee46a39
History
Date User Action Args
2018-09-10 13:11:09 taleinat set messages: + msg324923
2018-09-10 12:53:14 taleinat set pull_requests: + pull_request8586
2018-09-07 21:11:41 taleinat set pull_requests: + pull_request8560
2018-09-06 11:35:57 taleinat set status: open -> closed; resolution: fixed; messages: + msg324683
2018-09-06 11:34:32 taleinat set messages: + msg324682
2018-09-06 08:26:25 vstinner set status: closed -> open; resolution: out of date -> (no value); messages: + msg324672
2018-09-06 08:25:48 vstinner set status: open -> closed; resolution: out of date; messages: + msg324671; stage: patch review -> resolved
2018-09-06 07:10:39 taleinat set nosy: + taleinat; messages: + msg324665
2018-09-06 07:01:08 taleinat set keywords: + patch; stage: patch review; pull_requests: + pull_request8536
2018-02-04 08:06:17 serhiy.storchaka set versions: + Python 3.8, - Python 3.7
2017-09-28 13:38:35 vstinner set nosy: + vstinner
2017-09-28 07:05:25 serhiy.storchaka set versions: + Python 3.7; nosy: + serhiy.storchaka; messages: + msg303194; components: + Library (Lib); type: resource usage
2017-07-21 08:57:53 wbolster set messages: + msg298785
2017-07-21 00:56:24 Nir Soffer set nosy: + Nir Soffer; messages: + msg298757
2017-07-20 17:19:58 wbolster set messages: + msg298739
2017-07-20 17:15:59 wbolster set pull_requests: + pull_request2835
2017-07-20 17:15:28 wbolster create