Improve pickling efficiency of itertools.cycle #69062

rhettinger · 2015-08-15T21:23:08Z

BPO	24874
Nosy	@rhettinger, @pitrou, @avassalotti, @serhiy-storchaka
Files
time_cycle.py: Simple timing suite for cycle()
cycle5_brokensetstate.diff: Partial patch -- still needs work on setstate
cycle9.diff: More complete patch that passes all tests
cycle_reduce_2.patch: Simpler, faster and more memory efficient pickling
cycle_reduce_3.patch: Faster unpickled cycle object

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2015-08-16.21:52:00.060>
created_at = <Date 2015-08-15.21:23:07.618>
labels = ['extension-modules', 'performance']
title = 'Improve pickling efficiency of itertools.cycle'
updated_at = <Date 2015-08-16.21:52:00.059>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2015-08-16.21:52:00.059>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2015-08-16.21:52:00.060>
closer = 'rhettinger'
components = ['Extension Modules']
creation = <Date 2015-08-15.21:23:07.618>
creator = 'rhettinger'
dependencies = []
files = ['40185', '40186', '40188', '40189', '40190']
hgrepos = []
issue_num = 24874
keywords = ['patch']
message_count = 8.0
messages = ['248662', '248664', '248669', '248674', '248675', '248677', '248692', '248693']
nosy_count = 5.0
nosy_names = ['rhettinger', 'pitrou', 'alexandre.vassalotti', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'low'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue24874'
versions = ['Python 3.6']

rhettinger · 2015-08-15T21:32:26Z

When a cycle object has fully consumed its input iterable, its __reduce__ method returns a space-inefficient result even though a space-efficient alternative is available.

# Current way of restoring a cycle object with excess info in setstate:
>>> c = cycle(iter('de'))
>>> c.__setstate__((['a', 'b', 'c', 'd', 'e'], 1))
>>> ''.join(next(c) for i in range(20)) # next 20 values
'deabcdeabcdeabcdeabc'

# The same result can be achieved with less information: 
>>> c = cycle(iter('de'))
>>> c.__setstate__((['a', 'b', 'c'], 0))
>>> ''.join(next(c) for i in range(20)) # next 20 values
'deabcdeabcdeabcdeabc'
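The equivalence of the two states can also be checked without touching the C implementation. The following is only a sketch: CycleModel and first_values are hypothetical names, and the class is a pure-Python approximation of how the pre-patch cycle() treats its (saved, firstpass) state, not the actual itertools code.

```python
class CycleModel:
    """Illustrative pure-Python model of the pre-patch cycle() state."""

    def __init__(self, iterable):
        self.it = iter(iterable)
        self.saved = []
        # 0: items from the input are still being recorded in saved
        # 1: the input has already been recorded; nothing more is appended
        self.firstpass = 0

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                item = next(self.it)
            except StopIteration:
                if not self.saved:
                    raise
                # Current iterator exhausted: replay the saved items.
                self.it = iter(self.saved)
                self.firstpass = 1
                continue
            if not self.firstpass:
                self.saved.append(item)
            return item

    def __setstate__(self, state):
        saved, firstpass = state
        self.saved = list(saved)
        self.firstpass = firstpass


def first_values(c, n=20):
    """Join the next n values of the iterator c into one string."""
    return ''.join(next(c) for _ in range(n))


# State carrying the whole list, including the items still pending in the input.
full = CycleModel(iter('de'))
full.__setstate__((['a', 'b', 'c', 'd', 'e'], 1))

# State carrying only the items that have already been consumed.
compact = CycleModel(iter('de'))
compact.__setstate__((['a', 'b', 'c'], 0))

full_output = first_values(full)
compact_output = first_values(compact)
```

In this model both objects produce the same 20 values; the compact state simply lets the remaining input items be appended as they are consumed.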

rhettinger · 2015-08-15T21:54:16Z

Also, looking at the source for itertools.cycle(), it looks like the overall speed could be boosted considerably by looping over the saved list directly rather than allocating a new list iterator every time the cycle loops around.
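A rough pure-Python sketch of that idea, for illustration only: cycle_by_index is a hypothetical name, and the real change would be made in the C source. The point is that once the input is exhausted, the loop walks the saved list with an index instead of creating a fresh list iterator on every pass.

```python
from itertools import islice


def cycle_by_index(iterable):
    """Yield items from iterable, then replay them forever by index."""
    saved = []
    for item in iterable:
        saved.append(item)
        yield item
    if not saved:
        return  # an empty input produces an empty cycle
    index = 0
    while True:
        yield saved[index]
        index += 1
        if index >= len(saved):
            index = 0  # wrap around without allocating a new iterator


first_seven = list(islice(cycle_by_index('abc'), 7))
```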

rhettinger · 2015-08-15T23:28:53Z

Attaching a partial patch:

More than doubles the speed of cycle()
Cuts size of __reduce__ result by about a third (on average)
Still needs work on __setstate__ for a correct restore.

serhiy-storchaka · 2015-08-16T04:57:52Z

The current cycle implementation is simple and clever, but it can be optimized. The part about iterating LGTM (though it looks like the firstpass field could be eliminated entirely). But __reduce__ doesn't look optimal: it makes a copy of a list and makes iterating an unpickled cycle object slow. It would be better to create a new list with rotated content, or even to rotate the original list in place.
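The rotation idea can be sketched in Python as follows. The helper names rotate_saved and rotate_saved_inplace are hypothetical and this is not the code from any of the attached patches; it only shows that a list rotated to start at the current position can be cycled from index 0 with the same result.

```python
from itertools import cycle, islice


def rotate_saved(saved, index):
    """Return a new list whose first element is saved[index]."""
    return saved[index:] + saved[:index]


def rotate_saved_inplace(saved, index):
    """Rotate the same list object so that it starts at the old saved[index]."""
    # Slice assignment keeps the identity of the list, although a
    # temporary list is still built on the right-hand side.
    saved[:] = saved[index:] + saved[:index]


saved = ['a', 'b', 'c', 'd', 'e']
rotated = rotate_saved(saved, 3)

# Cycling the rotated list from the start is the same as resuming
# the original list at position 3.
resumed = list(islice(cycle(rotated), 8))

inplace = ['a', 'b', 'c', 'd', 'e']
rotate_saved_inplace(inplace, 3)
```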

rhettinger · 2015-08-16T05:52:39Z

Added an updated patch that passes all tests.

serhiy-storchaka · 2015-08-16T08:44:10Z

Raymond's original reason in msg248662 is not valid. Pickling a cycle object that has fully consumed its input iterable is already space-inefficient.

>>> import itertools, pickle, pickletools
>>> c = itertools.cycle(iter('abcde'))
>>> [next(c) for i in range(8)]
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c']
>>> pickle.dumps(c)
b'\x80\x03citertools\ncycle\nq\x00cbuiltins\niter\nq\x01]q\x02(X\x01\x00\x00\x00aq\x03X\x01\x00\x00\x00bq\x04X\x01\x00\x00\x00cq\x05X\x01\x00\x00\x00dq\x06X\x01\x00\x00\x00eq\x07e\x85q\x08Rq\tK\x03b\x85q\nRq\x0bh\x02K\x01\x86q\x0cb.'
>>> pickletools.dis(pickle.dumps(c))
    0: \x80 PROTO      3
    2: c    GLOBAL     'itertools cycle'
   19: q    BINPUT     0
   21: c    GLOBAL     'builtins iter'
   36: q    BINPUT     1
   38: ]    EMPTY_LIST
   39: q    BINPUT     2
   41: (    MARK
   42: X        BINUNICODE 'a'
   48: q        BINPUT     3
   50: X        BINUNICODE 'b'
   56: q        BINPUT     4
   58: X        BINUNICODE 'c'
   64: q        BINPUT     5
   66: X        BINUNICODE 'd'
   72: q        BINPUT     6
   74: X        BINUNICODE 'e'
   80: q        BINPUT     7
   82: e        APPENDS    (MARK at 41)
   83: \x85 TUPLE1
   84: q    BINPUT     8
   86: R    REDUCE
   87: q    BINPUT     9
   89: K    BININT1    3
   91: b    BUILD
   92: \x85 TUPLE1
   93: q    BINPUT     10
   95: R    REDUCE
   96: q    BINPUT     11
   98: h    BINGET     2
  100: K    BININT1    1
  102: \x86 TUPLE2
  103: q    BINPUT     12
  105: b    BUILD
  106: .    STOP
highest protocol among opcodes = 2

The internal iterator is pickled not as iter("de") but as an iterator over the list ["a", "b", "c", "d", "e"] with 3 items consumed. This list is also saved as part of the cycle object's state, not as a copy but as a reference.
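The reference (the BINGET opcode in the disassembly) comes from pickle's memo and is not specific to cycle. A small sketch with an ordinary tuple holding the same list twice shows the same effect; the variable names here are only for illustration.

```python
import pickle
import pickletools

shared = [1, 2, 3]

# The same list object appears twice in the pickled value.
data = pickle.dumps((shared, shared), protocol=3)

# Collect the opcode names in the order they appear in the pickle stream.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]

# The list is written once (one EMPTY_LIST) and then fetched from the memo.
list_count = opcode_names.count('EMPTY_LIST')
uses_memo_reference = 'BINGET' in opcode_names

# After loading, both tuple slots point at one list object again.
restored_first, restored_second = pickle.loads(data)
```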

There are two alternative patches. Both keep Raymond's optimization of cycle iteration, but each has its own advantage. cycle_reduce_2.patch makes __reduce__ faster and more memory efficient than Raymond's variant. cycle_reduce_3.patch makes an unpickled cycle object as optimized as the original.

python-dev · 2015-08-16T21:49:42Z

New changeset 17b5c8ba6875 by Raymond Hettinger in branch 'default':
Issue bpo-24874: Speed-up itertools and make it pickles more compact.
https://hg.python.org/cpython/rev/17b5c8ba6875

rhettinger · 2015-08-16T21:52:00Z

Applied the cycle2 patch but kept the signature the same as the original reduce (using a number instead of a boolean).

rhettinger self-assigned this Aug 15, 2015

rhettinger added the extension-modules (C modules in the Modules dir) and performance (Performance or resource usage) labels Aug 15, 2015

rhettinger closed this as completed Aug 16, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
