classification
Title: Incorrect nested list comprehension documentation
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: docs@python, eric.araujo, ezio.melotti, mark.dickinson, mattlong, python-dev, terry.reedy
Priority: normal Keywords: patch

Created on 2011-12-07 21:49 by mattlong, last changed 2011-12-13 13:43 by ezio.melotti. This issue is now closed.

Files
File name Uploaded Description Edit
issue13549.diff ezio.melotti, 2011-12-08 15:04 Doc patch against 2.7
issue13549-2.diff ezio.melotti, 2011-12-08 22:42 New doc patch against 2.7
issue13549-3.diff ezio.melotti, 2011-12-10 17:18 New doc patch against 2.7
issue13549-3-py32.diff ezio.melotti, 2011-12-10 17:29 New doc patch against 3.2
Messages (13)
msg148994 - (view) Author: Matt Long (mattlong) * Date: 2011-12-07 21:49
The description of nesting list comprehensions in section 5.1.5 of the main Python tutorial (http://docs.python.org/tutorial/datastructures.html#nested-list-comprehensions) is misleading at best and arguably incorrect. Where it says "To avoid apprehension when nesting list comprehensions, read from right to left." This is incorrect and conflicts directly with the comment at the bottom of PEP 202 (http://www.python.org/dev/peps/pep-0202/), which says the last index varies fastest.

Take the following example:

matrix = [[1,2],[3,4],[5,6]]
my_list = []
for row in matrix:
  for number in row
    my_list.append(number)

The current documentation would suggest I do the following to achieve the same result with list comprehensions since the for statements should be read from right to left:

matrix = [[1,2],[3,4],[5,6]]
my_list = [number for number in row for row in matrix]

Running this code will result in an error. The correct form is:

matrix = [[1,2],[3,4],[5,6]]
[number for row in matrix for number in row]

Clearly the nesting order only matters when the inner loop depends on the outer loop as in my example above. There is no such dependency in the documentation's example, which is why it is does not result in an Error.
msg149031 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-12-08 13:16
Isn't the documentation that you refer to about *nested* list comprehensions, rather than list comprehensions with multiple 'for' clauses?

E.g.,:

    [number for row in matrix for number in row]

is not a nested list comprehension:  it's merely a list comprehension with two 'for' clauses.  But:

    [[number for number in row] for row in matrix]

*is* a nested list comprehension (a list comprehension for which the initial expression is itself a list comprehension), and there the advice to read from right to left seems to make sense to me.
msg149032 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-12-08 13:17
Mind you, I'm not at all sure about that use of 'apprehension'...
msg149036 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-12-08 14:11
This section could be clearer imho.
First of all there's nothing special about "nested" list comprehensions.  In [expr for elem in seq], expr might be any expression -- including a listcomp.  As with any other expression, the nested listcomp will be executed for each elem in seq, so there's really nothing new to explain here.
The "reading order" is always from left to right, with the expression at the end:
    [expr for x in seq1 for y in seq2 for z in seq3]
     ^^^^ ^^^^^^^^^^^^^ ^^^^^^^^^^^^^ ^^^^^^^^^^^^^
     last     first        second         third...

So I agree that "To avoid apprehension when nesting list comprehensions, read from right to left." is a bit confusing/wrongish (is actually correct for listcomp with only a single for, but that's just a coincidence and it's not related to nested listcomps).

The previous section could also be a little clearer, showing the classic example:
    squares = [x**2 for x in range(10)]
and saying that it's equivalent to
    squares = []
    for x in range(10):
        squares.append(x**2)

and, similarly, that
    combs = [(c1, c2) for c1 in 'abc' for c2 in 'xyz']
is equivalent to
    combs = []
    for c1 in 'abc':
        for c2 in 'xyz':
            combs.append((c1, c2))

Showing the "reading direction" with these two equivalents and saying that nested listcomps follow the same rule (because there's nothing different about them), should be enough to make things clear.

In addition, the example with the matrix shouldn't use print in the "expanded" version, but rather something like:
    res = []
    for i in [0, 1, 2]:
        res.append([row[i] for row in mat])
    print res
msg149039 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-08 16:31
I welcome improvements to this part of the docs.  Nested list comps had me quite confused at first: I had to write and execute them to understand how it worked.  So, the patch looks good to me.  Remarks:

- I’d recommend a few whitespace beautifications, like in ``for x in [1,2,3]`` and ``range(1,6)``.

- You changed “If the expression would evaluate to a tuple, it must be parenthesized” to “If the expression is a tuple (e.g. the ``(x, y)`` +in this example), it must be parenthesized”, I guess because either the concept that an expression evaluates to something is (a) incorrect or (b) not appropriate at this stage of the tutorial.  I think there is an example that makes that line more understandable, but it’s in the section about tuples, not here in the listcomp section; you may or may not want to improve that too.

- +1 for the removal of the half-joking half-recommendation not to use nested list comps (“If you've got the stomach for it, list comprehensions can be nested. They are a powerful tool but -- like all powerful tools -- they need to be used carefully, if at all.”).

- Maybe a link to the itertools module is appropriate (either after the combinations example, or after the link to the built-in zip function).
msg149043 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-12-08 17:10
> - I’d recommend a few whitespace beautifications,
> like in ``for x in [1,2,3]`` and ``range(1,6)``.
Leaving the space out in [1,2,3] makes the expression a bit more readable IMHO*.


>  I think there is an example that makes that line more
> understandable, but it’s in the section about tuples, not here in the 
> listcomp section; you may or may not want to improve that too.
What needs to be parenthesized are tuple literals, and I think people at this point of the tutorial still think that () are required to make a tuple, so they would probably put them anyway.  IOW the example that uses them, the note, and the example that fails are probably enough already.

> - Maybe a link to the itertools module is appropriate (either after 
> the combinations example, or after the link to the built-in zip 
> function).
I was going to add it, but then realized that itertools' functions don't return lists, so they are a bit unrelated (and maybe it's too early to introduce itertools at this point of the tutorial).


* the idea is to separate better different elements, even if the single elements are a bit less readable:
2 * 3          is ok (you are multiplying two numbers)
2*3 + 4*5      is ok (you are adding two products)
2 * 3 + 4 * 5  is less readable (you have to parse the operations)
2 * 3+4 * 5    is misleading (it looks like the product of 3 numbers)
Similarly:
[(1,2,3), (3,4,5), (5,6,7)]  is ok (you have a list with 3 tuples)
[(1, 2, 3), (3, 4, 5), (5, 6, 7)]  is less readable (the space are used to separate both the tuples and the elements in there, and you have to parse the () to see where the tuples start and end).
YMMV though.
msg149056 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-12-08 21:43
Comments:

1. The first sentence is a bit too opinionated for my taste. Consider:

map(f, seq)
[f(x) for x in seq]

The map is *more* concise, by 8 chars, than the list comp and, in *my* opinion, clearer and easier to read without the extra boilerplate and extraneous dummy variable. I would write the first sentence more like:

"List comprehensions provide a concise way to create lists without having to use either a blank list followed by list.append or combinations of map, filter, and lambda expressions."

2. The added examples are nice and should help.

3. "In a list comprehension might contain arbitrary expressions, including other listcomps." is not a sentence. I think this in meant to say: "The initial expression in a list comprehension can be any arbitrary expression, including another list comprehension."

4. I completely agree with removing the scary hobgoblin stuff. In fact, I think the platitudinous next sentence "Nested list comprehensions are a powerful tool but -- like all powerful tools -- they need to be used carefully, if at all." could also go. It does not add anything not better said in what follows. Python is a powerful tool, but -- like all powerful tools -- it needs to by used carefully. Yeah, right ;-).

I believe the later sentence "To avoid apprehension when nesting list comprehensions, read from right to left." means to say "To avoid mis-apprehandion [or misunderstanding] when nesting list comprehension, read from outside in." But even this only indirectly points to what I think is the real danger -- which is to read the nested l.c. as unnested, and hence to get the order of the clauses wrong. In other words, reading
   [[f(x,y) for x in xseq] for y in yseq]
as
   [ f(x,y) for x in xseq  for y in yseq]
I initally made that mistake when reading the nested example. So I think the appropriate warning sentence should be something like:
"The main danger to avoid is reading nested comprehensions as if they were one comprehension with multiple for clauses." This could go after the revised initial sentence in place of the useless platitude.

4. The example is more confusing than need be because of using a symmetric matrix. The literal [0,1,2] is both range(len(mat)) and range(len(mat[0])) but it is only the latter that is relevant. So I strongly urge using an asymmetric matrix (3x4) and suggest using range(4) instead of [0,1,2,3]

So, change "Consider the following example of a 3x3 matrix held as a list containing three lists, one list per row::" to
"Suppose one has a 3x4 matrix implemented as a list of 3 lists of length 4::"

Change mat display accordingly, using 1 to 12.

Change "Now, if you wanted to swap rows and columns, you could use this list comprehension::" to
"The following list comprehension will transpose rows and columns."
>>> [[row[i] for row in mat] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

I think the more verbose version (slightly modified to match the below) should be followed by an even more (completely) verbose version:

trans = []
for i in range(4):
    transrow = []
    for row in mat:
        transrow.append(row[i])
    trans.append(transrow)
print(trans)

5. "In real world," should be "In the real world,".
msg149066 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-12-09 00:54
v.2 looks great to me.
msg149095 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-12-09 13:44
In the first patch I included this example:
+   >>> swapped = []
+   >>> for i in [0, 1, 2]:
+   ...     swapped.append([row[i] for row in mat])
+   ...
+   >>> print swapped
+   [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

Because I applied the following transformation without looking at what expr was (another listcomp in this case):
    res = [expr for elem in seq]
==>
   res = []
   for elem in seq:
       res.append(expr)

If the reader does the same he/she wouldn't have any problem figuring out what is the order of the `for`s, but the second version of the patch only includes a "fully expanded" example:
+   >>> transposed = []
+   >>> for i in range(4):
+   ...     # the following 3 lines implement the nested listcomp
+   ...     transposed_row = []
+   ...     for row in matrix:
+   ...         transposed_row.append(row[i])
+   ...     transposed.append(transposed_row)
+   ...
+   >>> transposed

Here it's easier to confuse the two `for` (not because they are similar, but because there are two of them, whereas in the previous example there's only one).

I added a comment to clarify that the inner loop is the listcomp, but maybe it would be better to show the first example too, followed by the "fully expanded" one, i.e.:

+   [[row[i] for row in mat] for i in range(4)]

+   >>> transposed = []
+   >>> for i in range(4):
+   ...     transposed.append([row[i] for row in mat])
+   ...
+   >>> transposed

+   >>> transposed = []
+   >>> for i in range(4):
+   ...     # the following 3 lines implement the nested listcomp
+   ...     transposed_row = []
+   ...     for row in matrix:
+   ...         transposed_row.append(row[i])
+   ...     transposed.append(transposed_row)
+   ...
+   >>> transposed

The step in the middle shows that in order to get to the "fully expanded" form, it's enough to apply the "usual" listcomp->for+append transformation twice, without worrying about left-to-right vs right-to-left.  Do you think it's worth adding this?
msg149141 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-12-10 03:47
Showing both was the intent of my comment. Since I am about 60:40 on that, I was and am  willing for you, having grabbed and pushed the issue, to drop the half-expanded version if you thought it better. With or without, we have improved this section. But I still think showing the intermediate step adds sufficient extra clarity to be worth the 5 or 6 extra lines. While nested list comps are not scary, they can, as I discovered, be confusing with a rushed reading. Anyone who wants to compress nested append loops should also do it in two steps.
msg149173 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-12-10 17:18
Having both is fine.

I just noticed that the 3.2 docs are different[0], so I tried to merge them:

3.2 says:
"""
List comprehensions provide a concise way to create lists from sequences. Common applications are to make lists where each element is the result of some operations applied to each member of the sequence, or to create a subsequence of those elements that satisfy a certain condition.
"""
I kept this, changing a few minor things.

In 3.2 the examples are divided and each one has a short description.  I reworked them a bit to make them more clear, and added the description inline as a comment.

The section about nested listcomps is the same, so I didn't change it (but it's now a subsection of the list comprehension section).

This is hopefully the final version.

[0]: http://docs.python.org/py3k/tutorial/datastructures.html#list-comprehensions
msg149386 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-13 13:38
New changeset 132158b287d7 by Ezio Melotti in branch '2.7':
#13549: improve tutorial section about listcomps.
http://hg.python.org/cpython/rev/132158b287d7

New changeset ad5c70296c7b by Ezio Melotti in branch '3.2':
#13549: improve tutorial section about listcomps.
http://hg.python.org/cpython/rev/ad5c70296c7b

New changeset 37094a1454ab by Ezio Melotti in branch 'default':
#13549: merge with 3.2.
http://hg.python.org/cpython/rev/37094a1454ab
msg149387 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-12-13 13:43
Fixed.
On a side note, using:
>>> [x, x**2 for x in range(6)]
  File "<stdin>", line 1
    [x, x**2 for x in range(6)]
               ^
SyntaxError: invalid syntax

In the 3.x docs seems to break the hightlight.
With 'File "<stdin>", line 1, in ?' the highlight works fine, so that's what I used.
History
Date User Action Args
2011-12-13 13:43:22ezio.melottisetstatus: open -> closed
type: behavior -> enhancement
messages: + msg149387

resolution: fixed
stage: commit review -> resolved
2011-12-13 13:38:30python-devsetnosy: + python-dev
messages: + msg149386
2011-12-10 17:29:12ezio.melottisetfiles: + issue13549-3-py32.diff
2011-12-10 17:18:37ezio.melottisetfiles: + issue13549-3.diff

messages: + msg149173
2011-12-10 03:47:42terry.reedysetmessages: + msg149141
2011-12-09 13:44:40ezio.melottisetmessages: + msg149095
2011-12-09 00:54:45terry.reedysetmessages: + msg149066
2011-12-08 22:42:46ezio.melottisetfiles: + issue13549-2.diff
2011-12-08 21:43:51terry.reedysetmessages: + msg149056
2011-12-08 17:10:19ezio.melottisetmessages: + msg149043
2011-12-08 16:31:53eric.araujosetmessages: + msg149039
2011-12-08 15:04:11ezio.melottisetfiles: + issue13549.diff

nosy: + terry.reedy, eric.araujo
assignee: docs@python -> ezio.melotti
keywords: + patch
stage: needs patch -> commit review
2011-12-08 14:11:27ezio.melottisetversions: + Python 3.2, Python 3.3, - Python 2.6
nosy: + ezio.melotti

messages: + msg149036

stage: needs patch
2011-12-08 13:17:32mark.dickinsonsetmessages: + msg149032
2011-12-08 13:16:52mark.dickinsonsetnosy: + mark.dickinson
messages: + msg149031
2011-12-07 21:49:53mattlongcreate