classification
Title: interning and list comprehension leads to unexpected behavior
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Abraham.Smith, docs@python, georg.brandl, martin.panter, matheus.v.portela, python-dev, r.david.murray, steven.daprano
Priority: normal Keywords: easy, patch

Created on 2015-02-07 12:55 by Abraham.Smith, last changed 2015-09-07 04:18 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
issue23406_doc_stdtypes.patch matheus.v.portela, 2015-09-02 19:00 review
issue23406_doc_stdtypes_and_faq.patch matheus.v.portela, 2015-09-03 12:22 review
issue23406_doc_stdtypes.v3.patch martin.panter, 2015-09-05 05:07 review
Messages (22)
msg235520 - (view) Author: Abraham Smith (Abraham.Smith) Date: 2015-02-07 12:55
Some students were working on matrix routines for practice.

The following code:
>>> L = [ [0]*3 ]*3
>>> for i in range(3):
...    for j in range(3):
...        if i==j: L[i][j]=1

was expected to return
[[1,0,0],[0,1,0],[0,0,1]]
but it returned
[[1,1,1],[1,1,1],[1,1,1]]
because the list [0]*3 was being interned silently, so all three rows were the same memory!

To see this, I did
>>> map(id, L)
    [139634871681464, 139634871681464, 139634871681464]

On the other hand
>>> M=[ [ 0 for i in range(3) ] for j in range(3) ]
does not intern:
>>> map(id, L)
    [139634871631672, 139634871681608, 139634871681680]

so the above loop works as expected.
This is true in both python 2.7 and 3.4.

This is very confusing to users!

If this intern behavior with [0]*3 is intended, it should be documented more clearly, because this is something that new students of python might encounter right away when playing with the language's list methods.  I didn't see any reference to interning in the discussion of lists in the standard library reference. 

Moreover, I also could not find any reference to the automatic interning of mutable objects, such as lists.  Personally, I cannot see any reason to silently and automatically intern a mutable object; however, if this behavior is really desired, it should be documented.
msg235521 - (view) Author: Abraham Smith (Abraham.Smith) Date: 2015-02-07 12:59
(Obviously, there's a copy/paste mistake in the second case; it should read  map(id, M).)
msg235523 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2015-02-07 13:10
There is no interning going on.  Multiplying lists just copies references.

This is not so surprising if you consider that the case may be simple for nested lists, but what about ``[a] * 3`` with some arbitrary object "a"?  Copying (or even deep copying) that object is usually not wanted, and impossible in general.

This is also documented here (see especially note 2 below the table):
https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange

You're right though that this might be good to mention in the tutorial, as it comes up every now and then.  I'll leave the issue open to discuss that.
msg235524 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2015-02-07 13:20
This is already a FAQ:

https://docs.python.org/2/faq/programming.html#how-do-i-create-a-multidimensional-list

I guess this bites every beginning Python programmer, but it's a natural, and desirable, consequence of Python's object model and the fact that * does not copy the list items.
msg235526 - (view) Author: Abraham Smith (Abraham.Smith) Date: 2015-02-07 13:37
Thanks for the helpful responses and correcting my misunderstanding.

Regarding improved documentation, I see now that the table at https://docs.python.org/2/library/stdtypes.html#id2 indeed says "shallow copies"; however, the footnote seems to bury the lede.   Perhaps the footnote should be expanded to either link to the FAQ entry or provide an abbreviated version of it.

The FAQ entry is actually very good, but I would guess that most readers (like me) skip the FAQs and jump straight to the library reference.    Internet users have been trained for 20 years to believe that FAQs are full of useless, snarky answers to questions at a much shallower level, like "what do I do with a .tar.gz file?".  The fact that Python's FAQ is extremely well-written and helpful is a pleasant surprise, but a surprise none-the-less.
msg249436 - (view) Author: Matheus Vieira Portela (matheus.v.portela) * Date: 2015-08-31 22:36
Does anyone else think the note should be expanded? For me, it seems to be pretty accurate although it may indeed be confusing to beginners.

If anything, I can work on rewriting it to be more explanatory.
msg249444 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-01 01:49
Interesting.  I usually start with a project's FAQ because I find they usually give me an overview of the project and an indication of its quality :)

The footnote looks very explanatory to me already (complete with examples).  The 'confusing to beginners' text could be made a link to the FAQ, though.
msg249445 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2015-09-01 02:14
Which note are you referring to? There are at least two mentioned in 
this thread, the FAQ and a footnote in the docs for stdtypes.

If you're referring to the table of operations just below these:

https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange

https://docs.python.org/3/library/stdtypes.html#common-sequence-operations

where the docs say:

	s * n, n * s	n shallow copies of s concatenated

I think that could be worded better. It is too easy to misread it 
as saying that the items of s are copied (as I just did now, despite 
knowing that they aren't). I would word it:

	repeat s n times and concatenate

which matches the common name of * as the sequence repetition operator, 
and avoids using the word prone to misinterpretation, "copy".

Given how error-prone sequence repetition is, I'd add an example 
directly in the table:

for example, [x]*3 returns [x, x, x] (note that x is not copied).
msg249449 - (view) Author: Matheus Vieira Portela (matheus.v.portela) * Date: 2015-09-01 03:08
I was referring to the table of operations.

So, what if I replace "n shallow copies of s concatenated" by "repeat s n times and concatenate (to create a multidimensional list, refer to [link to FAQ])"?
msg249485 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-01 15:54
I agree that the table entry could be made more precise.  I would suggest replacing the table entry with "equivalent to adding s to itself n times".  This formulation serves to explain *why* the multiply operation works the way it does:

>>> a = [1, []]
>>> b = a * 4
>>> c = a + a + a + a
>>> b
[1, [], 1, [], 1, [], 1, []]
>>> c
[1, [], 1, [], 1, [], 1, []]
>>> a.append(2)
>>> a
[1, [], 2]
>>> b
[1, [], 1, [], 1, [], 1, []]
>>> c
[1, [], 1, [], 1, [], 1, []]
>>> a[1].append(3)
>>> a
[1, [3], 2]
>>> b
[1, [3], 1, [3], 1, [3], 1, [3]]
>>> c
[1, [3], 1, [3], 1, [3], 1, [3]]

I don't think it is appropriate to put an example in the table; IMO that belongs in the footnote where it currently is.  You could hyperlink the table entry to the FAQ entry, though.
msg249561 - (view) Author: Matheus Vieira Portela (matheus.v.portela) * Date: 2015-09-02 19:00
I'm attaching a patch to update the stdtypes.rst documentation according to our discussion. I've replaced the table explanation to "equivalent to adding s to itself n times" and added a link to the FAQ entry.

I'm not sure, however, where to put the FAQ link. Should it be directly in the table? In this patch, I've put it after the last line of note #2.
msg249575 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-02 21:42
Left some review comments. The positioning of the link to the FAQ entry seems sensible to me, just that the markup could be better :)
msg249634 - (view) Author: Matheus Vieira Portela (matheus.v.portela) * Date: 2015-09-03 12:22
Applying review comments. Now, there is an internal link to the FAQ entry on multidimensional lists.
msg249655 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-03 15:14
Looks good to me.
msg249755 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-04 11:48
I was about to commit this, but I think I was wrong about one of my comments. The list that is multiplied is indeed shallow-copied. E.g. “[x] * 1” copies the list, but the reference to x is the same. I propose to commit the second patch, except to revert one of the sentences to the first patch, so it reads:

Note also that the copies are shallow; nested structures are not copied but referenced.
msg249765 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-04 14:23
What is actually happening is that the *contents* of the list are copied, but the list itself is not.  This is a consequence of the definition in terms of +.  So, yes, that is a shallow copy, but not quite in the sense that mylist.copy() is a shallow copy, since the references to the contents of s get appended to the list being constructed by *, not a new list that is a "copy" of s.

You are correct that "s is only referenced" is not really accurate.  But how about "Note that the contents of the *s* object are not copied, they are referenced multiple times".  I think that highlights the source of the confusion: that the *contents* are not copied.
msg249767 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-04 14:24
Sorry, I meant "references to the content are copied" in my first sentence there.  This just goes to show why this is complicated to explain :)
msg249799 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-04 18:13
Or better, "items in the list *s* are not copied..."
msg249894 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-05 05:07
That works for me. Here is a new patch using David’s new wording.
msg250016 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-06 20:24
Looks good to me.
msg250034 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-07 02:18
New changeset 6c222848badd by Martin Panter <vadmium> in branch '2.7':
Issue #23406: Clarify documentation on multiplying a sequence
https://hg.python.org/cpython/rev/6c222848badd

New changeset 57f8c7ad7782 by Martin Panter <vadmium> in branch '3.4':
Issue #23406: Clarify documentation on multiplying a sequence
https://hg.python.org/cpython/rev/57f8c7ad7782

New changeset f624b7fd3b83 by Martin Panter <vadmium> in branch '3.5':
Issue #23406: Merge 3.4 into 3.5
https://hg.python.org/cpython/rev/f624b7fd3b83

New changeset d2b3c7c5ef02 by Martin Panter <vadmium> in branch 'default':
Issue #23406: Merge 3.5 into 3.6
https://hg.python.org/cpython/rev/d2b3c7c5ef02
msg250052 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-07 04:18
New changeset 2640a4630e9e by Martin Panter <vadmium> in branch '3.5':
Issue #23406: Remove specific line number from susp-ignored.csv
https://hg.python.org/cpython/rev/2640a4630e9e

New changeset 48f1e9a47301 by Martin Panter <vadmium> in branch 'default':
Issue #23406: Merge 3.5 into 3.6
https://hg.python.org/cpython/rev/48f1e9a47301
History
Date User Action Args
2015-09-07 04:18:00python-devsetmessages: + msg250052
2015-09-07 02:21:17martin.pantersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2015-09-07 02:18:13python-devsetnosy: + python-dev
messages: + msg250034
2015-09-06 20:24:17r.david.murraysetmessages: + msg250016
2015-09-05 05:07:53martin.pantersetfiles: + issue23406_doc_stdtypes.v3.patch

messages: + msg249894
2015-09-04 18:13:15r.david.murraysetmessages: + msg249799
2015-09-04 14:24:50r.david.murraysetmessages: + msg249767
2015-09-04 14:23:13r.david.murraysetmessages: + msg249765
2015-09-04 11:48:46martin.pantersetmessages: + msg249755
2015-09-03 15:14:34r.david.murraysetmessages: + msg249655
2015-09-03 12:22:24matheus.v.portelasetfiles: + issue23406_doc_stdtypes_and_faq.patch

messages: + msg249634
2015-09-02 21:42:23martin.pantersetversions: + Python 3.6
nosy: + martin.panter

messages: + msg249575

stage: needs patch -> patch review
2015-09-02 19:00:03matheus.v.portelasetfiles: + issue23406_doc_stdtypes.patch
keywords: + patch
messages: + msg249561
2015-09-01 15:54:15r.david.murraysetmessages: + msg249485
2015-09-01 03:08:33matheus.v.portelasetmessages: + msg249449
2015-09-01 02:14:49steven.dapranosetmessages: + msg249445
2015-09-01 01:49:14r.david.murraysetnosy: + r.david.murray
messages: + msg249444
2015-08-31 22:36:06matheus.v.portelasetnosy: + matheus.v.portela
messages: + msg249436
2015-02-14 17:55:16berker.peksagsetkeywords: + easy
stage: needs patch
type: enhancement
versions: + Python 3.5
2015-02-07 13:37:06Abraham.Smithsetresolution: not a bug -> (no value)
messages: + msg235526
2015-02-07 13:20:24steven.dapranosetresolution: not a bug

messages: + msg235524
nosy: + steven.daprano
2015-02-07 13:10:34georg.brandlsetnosy: + georg.brandl
messages: + msg235523
2015-02-07 12:59:11Abraham.Smithsetmessages: + msg235521
2015-02-07 12:55:26Abraham.Smithcreate