classification
Title: Reducing tee() memory footprint
Type: enhancement Stage: needs patch
Components: Documentation Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: cami, docs@python, pconnell, python-dev, rhettinger
Priority: low Keywords:

Created on 2012-11-03 15:15 by cami, last changed 2016-04-26 07:11 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
tee.py cami, 2012-11-03 15:15 Sample code for global-queue tee()
Messages (8)
msg174632 - (view) Author: Carsten Milkau (cami) Date: 2012-11-03 15:15
The memory footprint of itertools.tee can be reduced substantially by using a shared buffer for the child iterators (see sample code). If local queues are desired for efficient threading support, they can be combined with a global queue, allowing to constrain the size of local queues.
msg174653 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-03 17:21
Why do you think that your implementation uses less memory? You have some measurements confirming it?
msg174659 - (view) Author: Carsten Milkau (cami) Date: 2012-11-03 18:19
No. The sample code is a demonstration how to do it, it's by no means a full-fledged patch.

The drawback of the current implementation is that if you tee n-fold, and then advance one of the iterators m times, it fills n queues with m references each, for a total of (n-1)*m references. The documentation explicitly mentions this is unfortunate.

I only demonstrate that it is perfectly unnecessary to fill n separate queues, as you can use a single queue and index into it. Instead of storing duplicate references, you can store a counter with each cached item reference. Replacing duplicates by refcounts, it turns (n-1)*m references into 2*m references (half of which being the counters). 

Not in the demo code: you can improve this further by storing items in iterator-local queues when that iterator is the only one that still needs to return it, and in a shared queue with refcount when there are more of them. That way, you eleminate the overhead of storing (item, 1) instead of item for a fix-cost per-iterator.
msg174669 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-03 18:51
> The documentation explicitly mentions this is unfortunate.

Ah, here's the thing.  The code in the documentation is provided only for the explanation.  In fact tee() is implemented in the same way that you suggest (but uses a more efficient deque implementation).  I propose to close this issue as invalid.  Thanks for the suggestion.
msg174674 - (view) Author: Carsten Milkau (cami) Date: 2012-11-03 19:11
Oh great! Then I can use it as-is. How about reassigning the issue to documentation (for clarifying the inefficiency warning)?
msg254552 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-12 20:12
Raymond, do you want to add a clarification to the documentation, or prefer just to close this issue?
msg254694 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-11-15 20:48
Unlike the other "equivalents" in the docs, this one is further away from the actual implementation.  So, I think I should add a clarification that the example code is a "rough logical equivalent" but that actual implementation may have shared queues (Jython, PyPy, and IronPython are free to have their own implementation choices).
msg264222 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-26 07:10
New changeset f09306f9fa6f by Raymond Hettinger in branch 'default':
Issue #16394:  Note the tee() pure python equivalent is only a rough approximation.
https://hg.python.org/cpython/rev/f09306f9fa6f
History
Date User Action Args
2016-04-26 07:11:17rhettingersetstatus: open -> closed
resolution: fixed
2016-04-26 07:10:08python-devsetnosy: + python-dev
messages: + msg264222
2015-12-12 10:09:36serhiy.storchakasetnosy: - serhiy.storchaka
2015-11-15 20:48:48rhettingersetpriority: normal -> low

messages: + msg254694
2015-11-12 20:12:18serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg254552
2013-04-19 20:02:15rhettingersetassignee: docs@python -> rhettinger

nosy: + rhettinger
2013-04-19 18:49:21pconnellsetnosy: + pconnell
2012-11-04 02:53:33Ramchandra Aptesettitle: Improving tee() memory footprint -> Reducing tee() memory footprint
2012-11-03 19:25:54serhiy.storchakasetnosy: - serhiy.storchaka
stage: needs patch
type: performance -> enhancement

versions: - Python 2.6, Python 3.1
2012-11-03 19:11:44camisetassignee: docs@python

components: + Documentation, - Library (Lib)
nosy: + docs@python
2012-11-03 19:11:21camisetmessages: + msg174674
2012-11-03 18:51:25serhiy.storchakasetmessages: + msg174669
2012-11-03 18:19:37camisetmessages: + msg174659
versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3
2012-11-03 17:21:36serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg174653
versions: - Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3
2012-11-03 15:15:19camicreate