classification
Title: 'enumerate' 'start' parameter documentation is confusing
Type: behavior
Stage:
Components: Documentation
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: eric.araujo, phammer, python-dev, r.david.murray, rhettinger, terry.reedy
Priority: low
Keywords:

Created on 2011-04-20 16:08 by phammer, last changed 2011-06-25 13:02 by rhettinger. This issue is now closed.

Messages (8)
msg134162 - Author: Peter Hammer (phammer) Date: 2011-04-20 16:08
"""
A point of confusion using the builtin function 'enumerate' and
enlightenment for those who, like me, have been confused.

Note, this confusion was discussed at length at

  http://bugs.python.org/issue2831

prior to the 'start' parameter being added to 'enumerate'.  The
confusion discussed herein was foreseen in that discussion, and
ultimately discounted.  There remains, IMO, an issue with the
clarity of the documentation that needs to be addressed.  That
is, the closed issue at

  http://bugs.python.org/issue8635

concerning the 'enumerate' docstring does not address the confusion
that prompted this posting.

Consider:

x=['a','b','c','d','e']
y=['f','g','h','i','j']
print 0,y[0]
for i,c in enumerate(y,1):
  print i,c
  if c=='g':
    print x[i], 'y[%i]=g' % (i)
    continue
  print x[i]


This code produces the following unexpected output, using python 2.7,
which is apparently the correct behavior (see commentary below).  This
example is an abstract simplification of a program defect encountered
in practice:

>>> 
0 f
1 f
b
2 g
c y[2]=g
3 h
d
4 i
e
5 j

Traceback (most recent call last):
  File "Untitled", line 9
    print x[i]
IndexError: list index out of range


Help on 'enumerate' yields:

>>> help(enumerate)
Help on class enumerate in module __builtin__:

class enumerate(object)
 |  enumerate(iterable[, start]) -> iterator for index, value of iterable
 |  
 |  Return an enumerate object.  iterable must be another object that supports
 |  iteration.  The enumerate object yields pairs containing a count (from
 |  start, which defaults to zero) and a value yielded by the iterable argument.
 |  enumerate is useful for obtaining an indexed list:
 |      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  next(...)
 |      x.next() -> the next value, or raise StopIteration
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

>>> 

Commentary:

The expected output was:
>>>
0 f
1 g
b y[1]=g
2 h
c
3 i
d
4 j
e
>>>

That is, it was expected that the iterator would yield a value
corresponding to the index, whether the index started at zero or not.
Using the notation of the doc string, with start=1, the expected
behavior was:

 |      (1, seq[1]), (2, seq[2]), (3, seq[3]), ...

while the actual behavior is:

 |      (1, seq[0]), (2, seq[1]), (3, seq[2]), ...

The practical problem in the real world code was to do something
special with the zero index value of x and y, then run through the
remaining values, doing one of two things with x and y, correlated,
depending on the value of y.

I can see now that the doc string does in fact correctly specify the
actual behavior: nowhere does it say the iterator will begin at any
other place than the beginning, so this is not a python bug.  I do
however question the general usefulness of such behavior.  Normally,
indices and values are expected to be correlated.

The correct behavior can be simply implemented without using
'enumerate':

x=['a','b','c','d','e']
y=['f','g','h','i','j']
print 0,y[0]
for i in xrange(1,len(y)):
  c=y[i]
  print i,c
  if c=='g':
    print x[i], 'y[%i]=g' % (i)
    continue
  print x[i]

This produces the expected results.

If one insists on using enumerate to produce the correct behavior in
this example, it can be done as follows:
"""
x=['a','b','c','d','e']
y=['f','g','h','i','j']
seq=enumerate(y)
print '%s %s' % seq.next()
for i,c in seq:
  print i,c
  if c=='g':
    print x[i], 'y[%i]=g' % (i)
    continue
  print x[i]
"""
This version produces the expected results, while achieving clarity
comparable to that which was sought in the original incorrect code.

Looking a little deeper, the python documentation on enumerate states:

enumerate(sequence[, start=0])
Return an enumerate object. sequence must be a sequence, an iterator,
or some other object which supports iteration. The next() method of the
iterator returned by enumerate() returns a tuple containing a count
(from start which defaults to 0) and the corresponding value obtained
from iterating over iterable. enumerate() is useful for obtaining an
indexed series:
  (0, seq[0]), (1, seq[1]), (2, seq[2]),


This strongly implies that the value corresponds to the
index, so perhaps there really is an issue here.  Have at it.  I'm
going back to work, using 'enumerate' as it actually is, now that I
clearly understand it.

One thing is certain: the documentation has to be clarified, for the
confusion foreseen prior to adding the start parameter is very real.
"""
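The two readings described in the report can be compared directly. The sketch below is not part of the original message; it uses Python 3 syntax, reuses the lists `x` and `y` from the example, and treats slicing the input (`y[1:]`) as one possible way to keep each count equal to the item's position. The names `actual`, `aligned`, and `pairs` are illustrative only.

```python
x = ['a', 'b', 'c', 'd', 'e']
y = ['f', 'g', 'h', 'i', 'j']

# Actual behaviour: the count starts at 1, but iteration still starts at y[0].
actual = list(enumerate(y, 1))

# Behaviour the report expected: skip y[0] as well, so that each count
# equals the item's position in y.
aligned = list(enumerate(y[1:], 1))

print(actual[0])   # (1, 'f')
print(aligned[0])  # (1, 'g')

# With aligned counts, x[i] stays in range and lines up with y[i].
pairs = [(i, c, x[i]) for i, c in aligned]
print(pairs)
```

Slicing copies the list and only works for sequences, so this is a convenience for small inputs rather than a general replacement for `enumerate`.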
msg134169 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-20 17:52
If you know what an iterator is, the documentation, it seems to me, is clear.  That is, an iterator cannot be indexed, so the behavior you expected could not be implemented by enumerate.

That doesn't mean the docs shouldn't be improved.  An example with a non-zero start would make things clear.
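To illustrate the point about iterators, here is a small sketch in Python 3 syntax. It is not from the thread; the generator `seasons` and the variables `indexable` and `numbered` are made up for the illustration. It shows that a generator cannot be indexed, and that `enumerate` with a non-zero start only changes the count, not where iteration begins.

```python
def seasons():
    """A generator: it can be iterated over, but it cannot be indexed."""
    for name in ('Spring', 'Summer', 'Fall', 'Winter'):
        yield name

# Indexing a generator raises TypeError, so enumerate() has no way to
# jump ahead to "item number start".
try:
    seasons()[1]
    indexable = True
except TypeError:
    indexable = False

# enumerate() pairs a running count with whatever the iterator yields next.
numbered = list(enumerate(seasons(), 1))

print(indexable)  # False
print(numbered)   # [(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
```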
msg134290 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-23 01:31
Note: 3.x correctly gives the signature as enumerate(iterable, start) rather than enumerate(sequence, start).

I agree that the current entry is a bit awkward. Perhaps the doc would be clearer with a reference to zipping. Removing the unneeded definition of *iterable* (which should be linked to the definition in the glossary, along with *iterator*), my suggestion is:
'''
enumerate(iterable, start=0)
Return an enumerate object, an *iterator* of tuples, that zips together a sequence of counts and *iterable*. Each tuple contains a count and an item from *iterable*, in that order. The counts begin with *start*, which defaults to 0. enumerate() is useful for obtaining an indexed series: enumerate(seq) produces (0, seq[0]), (1, seq[1]), (2, seq[2]), .... For another example, which uses *start*:

>>> for i, season in enumerate(['Spring','Summer','Fall','Winter'], 1):
...     print(i, season)
1 Spring
2 Summer
3 Fall
4 Winter
'''
Note that I changed the example to use a start of 1 instead of 0, to produce a list in traditional form, which is one reason to have the parameter!
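The "zipping" description can be made concrete with a short sketch. This is an illustration in Python 3 syntax, not the actual implementation of the builtin; the name `enumerate_sketch` is hypothetical.

```python
from itertools import count

def enumerate_sketch(iterable, start=0):
    """Pair counts start, start+1, ... with items taken from the beginning of iterable."""
    # count(start) is an endless counter; zip() stops when iterable is exhausted.
    return zip(count(start), iterable)

seasons = ['Spring', 'Summer', 'Fall', 'Winter']
for i, season in enumerate_sketch(seasons, 1):
    print(i, season)
```

Read this way, *start* only chooses the first count; it has no effect on which item of the iterable is paired with it.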
msg134302 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-04-23 14:42
+1 to what David says.

Terry’s patch is a good starting point; I think Raymond will commit something along its lines.
msg134311 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-23 16:54
I've got it from here.  Thanks.
msg136824 - Author: Peter Hammer (phammer) Date: 2011-05-25 03:22
"""
Changing the 'enumerate' doc string text from:

|      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...

to:

|      (start, seq[0]), (start+1, seq[1]), (start+2, seq[2]), ...

would completely disambiguate the doc string at the modest cost of
sixteen additional characters, a small price for pellucid clarity.

The proposed changes to the formal documentation also seem to me to
be prudent, and I hope at this late writing, they have already been
committed.

I conclude with a code fragment for the edification of R. David Murray.
"""


class numerate(object):
  """
  A demonstration of a plausible incorrect interpretation of
  the 'enumerate' function's doc string and documentation.
  """
  def __init__(self,seq,start=0):
    self.seq=seq; self.index=start-1
    try:
      if seq.next: pass #test for iterable
      for i in xrange(start): self.seq.next()
    except:
      if type(seq)==dict: self.seq=seq.keys()
      self.seq=iter(self.seq[start:])

  def next(self):
    self.index+=1
    return self.index,self.seq.next()
        

  def __iter__(self): return self


if __name__ == "__main__":
  #s=['spring','summer','autumn','winter']
  s={'spring':'a','summer':'b','autumn':'c','winter':'d'}
  #s=enumerate(s)#,2)
  s=numerate(s,2)
  for t in s: print t
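For comparison, the same "skip, then count from start" interpretation can be sketched in Python 3 with `itertools.islice`, alongside a check of the docstring wording proposed in this message. This is an illustrative sketch only; the helper name `offset_enumerate` is hypothetical and does not appear in the issue.

```python
from itertools import islice

def offset_enumerate(iterable, start=0):
    """Skip the first `start` items, then number the rest from `start`."""
    # islice() works on any iterable, including iterators that cannot be sliced.
    return enumerate(islice(iterable, start, None), start)

seasons = ['spring', 'summer', 'autumn', 'winter']
start = 2

# What the builtin does, per the proposed docstring wording:
# (start, seq[0]), (start+1, seq[1]), (start+2, seq[2]), ...
builtin_pairs = list(enumerate(seasons, start))
proposed_wording = [(start + k, seasons[k]) for k in range(len(seasons))]

# What the alternative interpretation would produce instead.
skipped_pairs = list(offset_enumerate(seasons, start))

print(builtin_pairs == proposed_wording)  # True
print(skipped_pairs)                      # [(2, 'autumn'), (3, 'winter')]
```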
msg139051 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-25 12:57
New changeset 0ca8ffffd90b by Raymond Hettinger in branch '2.7':
Issue 11889: Clarify docs for enumerate.
http://hg.python.org/cpython/rev/0ca8ffffd90b
msg139054 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-25 13:01
New changeset d0df12b32522 by Raymond Hettinger in branch '3.2':
Issue 11889: Clarify docs for enumerate.
http://hg.python.org/cpython/rev/d0df12b32522

New changeset 9b827e3998f6 by Raymond Hettinger in branch 'default':
Issue 11889: Clarify docs for enumerate.
http://hg.python.org/cpython/rev/9b827e3998f6
History
Date                 User            Action  Args
2011-06-25 13:02:18  rhettinger      set     status: open -> closed; resolution: fixed
2011-06-25 13:01:22  python-dev      set     messages: + msg139054
2011-06-25 12:57:13  python-dev      set     nosy: + python-dev; messages: + msg139051
2011-05-25 04:46:57  rhettinger      set     priority: normal -> low
2011-05-25 03:22:38  phammer         set     messages: + msg136824
2011-04-23 16:54:45  rhettinger      set     messages: + msg134311
2011-04-23 14:42:00  eric.araujo     set     nosy: + eric.araujo; messages: + msg134302
2011-04-23 01:31:46  terry.reedy     set     nosy: + terry.reedy; messages: + msg134290; versions: + Python 3.2, Python 3.3
2011-04-20 19:27:27  rhettinger      set     assignee: rhettinger; components: + Documentation, - None; nosy: + rhettinger
2011-04-20 17:52:26  r.david.murray  set     nosy: + r.david.murray; messages: + msg134169
2011-04-20 16:08:55  phammer         create