This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: More compact pickle of iterators etc
Type: enhancement Stage: patch review
Components: Extension Modules, Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: rejected
Dependencies: 26015 Superseder:
Assigned To: rhettinger Nosy List: alexandre.vassalotti, pitrou, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-12-02 10:29 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
iterators_pickle_2.patch serhiy.storchaka, 2016-02-08 15:44 review
iterators_pickle_3.patch serhiy.storchaka, 2016-03-06 14:59 review
iterators_pickle_4.patch serhiy.storchaka, 2016-09-07 11:38 review
Messages (8)
msg255699 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-02 10:29
Proposed patch makes a number of classes produce more compact pickle data in common case. This includes iterators of list, tuple, str, bytes, bytearray, enumerate, array, deque, iterator object for classes with __getitem__, some itertools iterators, and non-iterator objects: slice, bytearray, deque. This is achieved by omitting default constructor arguments or state.

Exhausted iterators are pickled as iter(()). This is not new, exhausted bytes, and bytearray iterators, and reversed list iterator are already pickled as iter('') or iter([]) correspondingly. iter(()) is just the simplest way to create an empty iterator and it has the most compact pickle representation.

An example.

Unpatched:
>>> import pickle, pickletools, itertools
>>> len(pickle.dumps(itertools.islice('abcdefgh', 4), 3))
80
>>> len(pickletools.optimize(pickle.dumps(itertools.islice('abcdefgh', 4), 3)))
66
>>> pickletools.dis(pickletools.optimize(pickle.dumps(itertools.islice('abcdefgh', 4), 3)))
    0: \x80 PROTO      3
    2: c    GLOBAL     'itertools islice'
   20: (    MARK
   21: c        GLOBAL     'builtins iter'
   36: X        BINUNICODE 'abcdefgh'
   49: \x85     TUPLE1
   50: R        REDUCE
   51: K        BININT1    0
   53: b        BUILD
   54: K        BININT1    0
   56: K        BININT1    4
   58: K        BININT1    1
   60: t        TUPLE      (MARK at 20)
   61: R    REDUCE
   62: K    BININT1    0
   64: b    BUILD
   65: .    STOP
highest protocol among opcodes = 2

Patched:
>>> len(pickle.dumps(itertools.islice('abcdefgh', 4), 3))
69
>>> len(pickletools.optimize(pickle.dumps(itertools.islice('abcdefgh', 4), 3)))
55
>>> pickletools.dis(pickletools.optimize(pickle.dumps(itertools.islice('abcdefgh', 4), 3)))
    0: \x80 PROTO      3
    2: c    GLOBAL     'itertools islice'
   20: c    GLOBAL     'builtins iter'
   35: X    BINUNICODE 'abcdefgh'
   48: \x85 TUPLE1
   49: R    REDUCE
   50: K    BININT1    4
   52: \x86 TUPLE2
   53: R    REDUCE
   54: .    STOP
highest protocol among opcodes = 2
msg259853 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-08 15:40
Proposed patch extends tests for pickling iterators of mutable sequences. Now tested iterators in different states: initial (no iterated yet), running (in the middle of iteration), empty (just the last item was emitted) and exhausted (tried to iterate past the end).
msg259854 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-08 15:44
Updated patch conforms to new tests and to following rule: exhausted iterators of mutable sequences can be replaced with iter(()), non-exhausted iterators can't.
msg259855 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-08 15:47
Ah, I already proposed tests in separate issue.

Raymond, could you please look at patches? The touch the code maintained by you: itertools and deque.
msg261241 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-03-06 10:04
My first impression is that this isn't worth the extra special case code.  The savings is very small (only a few bytes) and it isn't common to pickle these iterators.
msg261257 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-06 14:57
There is the extra special case code for bytes and bytearray iterators and reversed list iterator. The patch just extends the optimizations to other builtin iterators.

In the example with itertools.islice() the overhead of pickling redundant data is 20%. If there are a lot of iterators, the overhead can be up to 90%.

>>> import pickle, pickletools, itertools
>>> len(pickletools.optimize(pickle.dumps([itertools.islice('abcdefgh', 4) for i in range(1000)], 4)))
Unpatched: 23059
Patched:   12059

Of course this is degenerated case. But if the data contains a lot of initial iterators of the same sequence or of short sequences, or exhausted iterators, the benefit can be not small.

Yet one benefit from the patch is that it speeds up copying and deepcopying initial and exhausted iterators.

$ ./python -m timeit -s "from itertools import islice; from copy import copy; it = islice('abcdefgh', 4)" -- "copy(it)"
Unpatched: 7.37 usec per loop
Patched:   6.4 usec per loop

$ ./python -m timeit -s "from itertools import islice; from copy import deepcopy; it = islice('abcdefgh', 4)" -- "deepcopy(it)"
Unpatched: 41.7 usec per loop
Patched:   32.6 usec per loop

$ ./python -m timeit -s "from copy import copy; it = iter('abcdefgh')" -- "copy(it)"
Unpatched: 10.4 usec per loop
Patched:   9.67 usec per loop

$ ./python -m timeit -s "from copy import deepcopy; it = iter('abcdefgh')" -- "deepcopy(it)"
Unpatched: 21.1 usec per loop
Patched:   18.3 usec per loop

$ ./python -m timeit -s "from copy import copy; it = iter(list('abcdefgh'))" -- "copy(it)"
Unpatched: 10.3 usec per loop
Patched:   9.54 usec per loop

$ ./python -m timeit -s "from copy import deepcopy; it = iter(list('abcdefgh'))" -- "deepcopy(it)"
Unpatched: 39.7 usec per loop
Patched:   36.8 usec per loop
msg261258 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-06 14:59
The patch is synchronized with current sources. Added mandatory braces.
msg275974 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-12 06:30
I would like to pass on this one.  It isn't common to pickle iterators, the savings is very small, and some of the code looks less pleasant.
History
Date User Action Args
2022-04-11 14:58:24adminsetgithub: 69962
2016-09-12 06:30:24rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg275974
2016-09-07 11:38:52serhiy.storchakasetfiles: + iterators_pickle_4.patch
2016-03-06 14:59:09serhiy.storchakasetfiles: + iterators_pickle_3.patch

messages: + msg261258
2016-03-06 14:57:37serhiy.storchakasetmessages: + msg261257
2016-03-06 10:04:37rhettingersetmessages: + msg261241
2016-03-06 07:28:29serhiy.storchakasetassignee: rhettinger
2016-02-08 15:47:10serhiy.storchakasetmessages: + msg259855
2016-02-08 15:45:00serhiy.storchakasetfiles: - iterators_pickle_tests.patch
2016-02-08 15:44:24serhiy.storchakasetfiles: + iterators_pickle_2.patch

messages: + msg259854
2016-02-08 15:40:40serhiy.storchakasetfiles: + iterators_pickle_tests.patch

messages: + msg259853
2016-02-08 15:28:07serhiy.storchakasetfiles: - iterators_pickle.diff
2016-01-05 14:47:59serhiy.storchakasetdependencies: + Add new tests for pickling iterators of mutable sequences
2015-12-02 10:29:15serhiy.storchakacreate