Author phammer
Recipients phammer
Date 2011-04-20.16:08:54
Message-id <1303315736.72.0.416681647484.issue11889@psf.upfronthosting.co.za>
Content
"""
A point of confusion in using the built-in function 'enumerate', and
enlightenment for those who, like me, have been confused.

Note, this confusion was discussed at length at

  http://bugs.python.org/issue2831

prior to the 'start' parameter being added to 'enumerate'.  The
confusion discussed herein was foreseen in that discussion, and
ultimately discounted.  There remains, IMO, an issue with the
clarity of the documentation that needs to be addressed.  That
is, the closed issue at

  http://bugs.python.org/issue8635

concerning the 'enumerate' docstring does not address the confusion
that prompted this posting.

Consider:

x=['a','b','c','d','e']
y=['f','g','h','i','j']
print 0,y[0]
for i,c in enumerate(y,1):
  print i,c
  if c=='g':
    print x[i], 'y[%i]=g' % (i)
    continue
  print x[i]


This code produces the following unexpected output under Python 2.7,
which is apparently the correct behavior (see commentary below).  This
example is an abstract simplification of a program defect encountered
in practice:

>>> 
0 f
1 f
b
2 g
c y[2]=g
3 h
d
4 i
e
5 j

Traceback (most recent call last):
  File "Untitled", line 9
    print x[i]
IndexError: list index out of range


Help on 'enumerate' yields:

>>> help(enumerate)
Help on class enumerate in module __builtin__:

class enumerate(object)
 |  enumerate(iterable[, start]) -> iterator for index, value of iterable
 |  
 |  Return an enumerate object.  iterable must be another object that supports
 |  iteration.  The enumerate object yields pairs containing a count (from
 |  start, which defaults to zero) and a value yielded by the iterable argument.
 |  enumerate is useful for obtaining an indexed list:
 |      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  next(...)
 |      x.next() -> the next value, or raise StopIteration
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

>>> 

Commentary:

The expected output was:
>>>
0 f
1 g
b y[1]=g
2 h
c
3 i
d
4 j
e
>>>

That is, it was expected that the iterator would yield a value
corresponding to the index, whether the index started at zero or not.
Using the notation of the doc string, with start=1, the expected
behavior was:

 |      (1, seq[1]), (2, seq[2]), (3, seq[3]), ...

while the actual behavior is:

 |      (1, seq[0]), (2, seq[1]), (3, seq[2]), ...
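The documented behavior can be modelled in a few lines of Python.  This
is a sketch along the lines of the equivalent generator given in the
Python documentation, not CPython's actual C implementation.  The name
enumerate_model is hypothetical, and the code assumes Python 3.  It shows
that 'start' shifts only the counter.  The k-th pair is (start + k,
seq[k]), never (start + k, seq[start + k]).

```python
def enumerate_model(iterable, start=0):
    # Hypothetical model of enumerate's documented semantics.
    # The counter begins at `start`, but iteration always begins at the
    # first element of `iterable`: `start` never skips any elements.
    n = start
    for elem in iterable:
        yield n, elem
        n += 1

y = ['f', 'g', 'h', 'i', 'j']
print(list(enumerate_model(y, 1)))
# [(1, 'f'), (2, 'g'), (3, 'h'), (4, 'i'), (5, 'j')]

# The model agrees with the builtin for the same arguments.
assert list(enumerate_model(y, 1)) == list(enumerate(y, 1))

# In general the k-th pair is (start + k, y[k]), not (start + k, y[start + k]).
pairs = list(enumerate(y, 1))
assert all(pairs[k] == (1 + k, y[k]) for k in range(len(y)))
```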

The practical problem in the real-world code was to do something
special with the index-0 values of x and y, and then run through the
remaining values.  For each of those, the code does one of two things
with the correlated x and y values, depending on the value of y.

I can see now that the docstring does in fact correctly specify the
actual behavior: nowhere does it say that iteration begins anywhere
other than at the first element, so this is not a Python bug.  I do,
however, question the general usefulness of such behavior.  Normally,
indices and values are expected to be correlated.
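To skip the leading element while keeping the count a valid index,
apply the skip to the iterable rather than to the counter.  Here are two
ways to do that with the y list from the example.  They are
illustrations, not anything the enumerate documentation prescribes.
itertools.islice is standard library; the variable names are
hypothetical.  Python 3 is assumed.

```python
from itertools import islice

y = ['f', 'g', 'h', 'i', 'j']

# Option 1: enumerate from 0 (so counts match indices), then lazily drop
# the first pair.  islice(iterable, start, stop) skips `start` items.
tail_pairs = list(islice(enumerate(y), 1, None))

# Option 2: slice the list first and pass a matching start value.
# This copies y[1:], which is fine for small lists.
sliced_pairs = list(enumerate(y[1:], 1))

print(tail_pairs)
# [(1, 'g'), (2, 'h'), (3, 'i'), (4, 'j')]

# Either way, every count is a valid index for its value.
assert all(y[i] == c for i, c in tail_pairs)
```

Option 2 only stays correlated if the start value equals the number of
elements sliced off.  Option 1 avoids having to keep those in sync.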

The intended behavior can be implemented simply without using
'enumerate':

x=['a','b','c','d','e']
y=['f','g','h','i','j']
print 0,y[0]
for i in xrange(1,len(y)):
  c=y[i]
  print i,c
  if c=='g':
    print x[i], 'y[%i]=g' % (i)
    continue
  print x[i]

This produces the expected results.
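Python 3 has no print statement and no xrange(), so this listing will
not run there as written.  A possible Python 3 port follows.  The
function name run, and returning the lines instead of printing them, are
assumptions made so the output can be checked against the expected
listing.

```python
def run(x, y):
    # Hypothetical Python 3 port of the xrange() loop; collects the
    # lines the original would print so they can be compared.
    out = ['%s %s' % (0, y[0])]          # special handling of index 0
    for i in range(1, len(y)):           # i is a valid index into x and y
        c = y[i]
        out.append('%s %s' % (i, c))
        if c == 'g':
            out.append('%s y[%i]=g' % (x[i], i))
            continue
        out.append(x[i])
    return out

x = ['a', 'b', 'c', 'd', 'e']
y = ['f', 'g', 'h', 'i', 'j']
print('\n'.join(run(x, y)))
```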

If one insists on using enumerate to produce the correct behavior in
this example, it can be done as follows:
"""
x=['a','b','c','d','e']
y=['f','g','h','i','j']
seq=enumerate(y)
print '%s %s' % seq.next()
for i,c in seq:
  print i,c
  if c=='g':
    print x[i], 'y[%i]=g' % (i)
    continue
  print x[i]
"""
This version produces the expected results, while achieving clarity
comparable to that which was sought in the original incorrect code.
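Two details make the enumerate version work.  First, seq is a single
iterator shared by the explicit advance and the for loop.  Second, the
advance is spelled differently across versions.  seq.next() is
Python 2 only; the next() builtin exists in Python 2.6+ and Python 3.
The sketch below assumes Python 3, and its variable names are
hypothetical.

```python
x = ['a', 'b', 'c', 'd', 'e']
y = ['f', 'g', 'h', 'i', 'j']

seq = enumerate(y)
# Portable replacement for seq.next(): consumes the pair (0, 'f').
first_index, first_value = next(seq)

# The loop resumes the *same* iterator, so it starts at (1, 'g'):
# the pair consumed above is not produced again.
rest = [(i, c) for i, c in seq]

print(first_index, first_value)   # 0 f
print(rest)                       # [(1, 'g'), (2, 'h'), (3, 'i'), (4, 'j')]

# Counts are real indices, so x[i] stays in range and correlated with y.
assert [x[i] for i, _ in rest] == ['b', 'c', 'd', 'e']
```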

Looking a little deeper, the python documentation on enumerate states:

enumerate(sequence[, start=0])
Return an enumerate object. sequence must be a sequence, an iterator,
or some other object which supports iteration. The next() method of the
iterator returned by enumerate() returns a tuple containing a count
(from start which defaults to 0) and the corresponding value obtained
from iterating over iterable. enumerate() is useful for obtaining an
indexed series:
  (0, seq[0]), (1, seq[1]), (2, seq[2]),


This strongly implies that the value corresponds to the index, so
perhaps there really is an issue here.  Have at it.  I'm going back to
work, using 'enumerate' as it actually is, now that I clearly
understand it.

One thing is certain: the documentation has to be clarified, for the
confusion foreseen prior to adding the start parameter is very real.
"""
History
Date                 User     Action  Args
2011-04-20 16:08:57  phammer  set     recipients: + phammer
2011-04-20 16:08:56  phammer  set     messageid: <1303315736.72.0.416681647484.issue11889@psf.upfronthosting.co.za>
2011-04-20 16:08:55  phammer  link    issue11889 messages
2011-04-20 16:08:54  phammer  create