classification
Title: Tutorial recommends pickle module without any warning of insecurity
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: ChrisCooper, docs@python, dstufft, ezio.melotti, gregory.p.smith, ncoghlan, pitrou, python-dev, terry.reedy, westley.martinez
Priority: normal Keywords: patch

Created on 2013-08-26 12:37 by dstufft, last changed 2013-12-05 22:52 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
pickle-add-warning_18840.diff westley.martinez, 2013-08-31 00:01 review
issue18840 ChrisCooper, 2013-11-06 00:44 json version with pickle and warning review
tutjson.patch pitrou, 2013-12-05 17:45
Messages (24)
msg196203 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2013-08-26 12:37
The Python tutorial tells, and even recommends, new users that they can use the pickle module to serialize arbitrary objects. However it does not provide any warning about the insecurity of unpickling arbtirary data. The text even goes so far as to mention sending pickled data over a network connection to other machines.

I believe this section should be replaced with using the json module instead of pickle. It is more standard and doesn't present the same security concerns with untrusted data as pickle does. However if it continues to recommend pickle to new users it should at least warn them of the dangers of using pickle.

The section in question is located at http://docs.python.org/3/tutorial/inputoutput.html#the-pickle-module
msg196204 - (view) Author: Fred L. Drake, Jr. (fdrake) (Python committer) Date: 2013-08-26 12:51
Advising the reader to be aware of the security warnings in the API documentation seems sufficient.

JSON isn't intended to support arbitrary data, and that's what this section is discussing.  Another section about data interchange with other applications (regardless of language), may be a reasonable addition, or a good candidate for a separate How-To document that can be referenced.
msg196205 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2013-08-26 12:57
The section to me just seems to be about how to handle more than just strings, it mentions numbers, lists, dictionaries, and class instances. Of those it mentions, only the class instances are not able to handled out of the box by JSON.

However like I said even if it remains pickle this particular area of the documentation should still warn users even though there's already a warning in the API documentation for pickle. As it is if a new user reads this and doesn't click through to the API documentation they've received recommendation from the Python documentation that they can send pickle strings over the network. This is dangerous behavior and the documentation shouldn't be advising new users to do dangerous things by default.
msg196206 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2013-08-26 12:59
Further more the tutorial claims it's the standard way of persisting data which in my experience it's far from that due to the security concerns. I've seen very little actual use of pickle in the wild (and when it was used it was often used by people who didn't understand the security implications).
msg196209 - (view) Author: Fred L. Drake, Jr. (fdrake) (Python committer) Date: 2013-08-26 15:02
When I read "... that can take almost any Python object  ...", I don't think the recommendation is about just a few types.

The Zope and ZODB communities certainly use pickle extensively, we're aware of the security implications, and we send pickled data over the network all the time.
msg196211 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2013-08-26 15:13
A description of the pickle module itself does not equate to the purpose of the section. Given that this is a tutorial and previous section taught how to read and write from files I would suggest that the purpose of the section was to give them the next step to persisting data which could be pickle or it could be JSON (or it could be another format all together). 

I don't see what Zope/ZODB's awareness of the security implications has to do with anything unless you're trying to state that the developers and users of ZODB are newcomers to Python whose knowledge of pickle stems from what they read in the tutorial. However I am glad that those communities are aware of the implications if they are using that module, but the point is the reader of the tutorial is *not* likely to be aware of them and should *not* be using pickle without being aware of them especially if they are sending that data over the network.

I'm really not sure what your problem is here. What is there to lose by annotating this section of the tutorial with a similar warning as exists in the pickle documentation that they should not unpickle data from an untrusted or unauthenticated sources? I can see how someone could state that the problem with switching to JSON is that it's harder to construct a serialization for arbitrary objects. That's not something I think it all that important to teach someone brand new to Python however I can understand if switching this section to using a "safe" serialization format doesn't sit well with people which is why I suggested at least adding a warning.

So what exactly is your problem with at a minimum adding a warning?
msg196214 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-26 15:42
I would be ok with changing that part of the tutorial to use json. Since json is much better known outside of the Python programming circles, and since its output is human-readable, it's probably a better fit for the tutorial. pickle can be mentioned as a more powerful and more dangerous alternative.
msg196217 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-26 15:52
By the way, there is one difference between json and pickle in this context: json will output a text serialization, pickle a binary one. If serializing to a binary file, users must do the (utf-8, most likely) encoding themselves.
msg196577 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-08-30 21:14
For the tutorial, I agree with presenting json rather than pickle for all the reasons given, with pickle mentioned in a paragraph at the end (more powerful, more dangerous, see warning in manual before using).
msg196595 - (view) Author: Westley Martínez (westley.martinez) * Date: 2013-08-30 23:07
How about we simply add the warning from http://docs.python.org/3/library/pickle.html#module-pickle to the beginning (or end) of the section?  The Official Python Tutorial has always assumed a certain programmer's competence.  It's up to them if they want to use the module or not.
msg196598 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-08-30 23:27
> How about we simply add the warning from http://docs.python.org/3/library/pickle.html#module-pickle to the beginning (or end) of the section?

That is one possibility.

> The Official Python Tutorial has always assumed a certain programmer's competence.

Such a person is much more likely to know about and find use for 
language independent json than Python-specific pickle. The tutorial was 
written before json existed, but if it were written today, json would 
seem to me to be the better choice for first exposure to serialization. 
If someone cares enough to write a patch, I think it should be considered.
msg196600 - (view) Author: Westley Martínez (westley.martinez) * Date: 2013-08-30 23:36
I won't question the usefulness of JSON.  I'm not a web programmer and have never used it.  From my interpretation of the tutorial, it seems that the section's purpose is for storing python objects.  If pickle is going to stay in the tutorial, I think a warning is imperative.  Either way, I think a section on JSON would be a welcome addition to the tutorial.

+1 adding a warning
+0 keeping pickle in the tutorial
msg196601 - (view) Author: Westley Martínez (westley.martinez) * Date: 2013-08-31 00:01
Here's a patch that adds the warning, if we so choose to keep pickle in the tutorial.  It's taken verbatim from the pickle module's documentation.
msg196602 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-31 00:12
The very fact that we would add a warning makes pickle unfit for the tutorial, IMHO. The tutorial should stick to simple enough stuff that doesn't need any warnings.
msg202250 - (view) Author: Chris Cooper (ChrisCooper) * Date: 2013-11-06 00:44
Here's a patch that focuses on the json module, with a smaller pickle section including the warning from the pickle docs.
msg205219 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-04 12:40
Since this particular tutorial section was written long before the json module was part of the standard library, I think it makes sense to switch now we have the option.

pickle is definitely a useful tool, but now that JSON is available by default, it's now one to escalate to if JSON doesn't meet your needs, rather than the one to use by default.

Chris's patch looks generally good to me, although I think it could use a second paragraph in the pickle section. Perhaps something like:

"""JSON is much simpler to use safely than pickle, and offers broader interoperability with programs written in other languages. However, pickle has the advantage of supporting a much wider range of Python object types, including executable code, which means it can handle use cases that JSON can't (like sending code to another process for execution)"


This simpler/safer/less powerful API/technique vs more powerful/more dangerous API/technique trade-off is a fairly common one, so I think it's a good thing to have a concrete example of it in the tutorial rather than just dropping the reference to pickle entirely.
msg205234 - (view) Author: Westley Martínez (westley.martinez) * Date: 2013-12-04 19:14
Sounds good to me.
msg205235 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-04 19:16
Correction: you can't pickle executable code, you can pickle references to well-known objects (by name):

>>> def f(): pass
... 
>>> pickle.dumps(f)
b'\x80\x03c__main__\nf\nq\x00.'
>>> pickle.dumps(f.__code__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'code'>: attribute lookup code on builtins failed
msg205314 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-05 17:45
Here is a modified patch. The changes are:
- pickle is deemphasized a lot more (it's moved into a "seealso")
- I replaced the terms "encoding" and "decoding" with "serializing" and "deserializing" (the former may be confusing for people who are already struggling to understand the bytes / unicode gap)
- I added a glossary entry for "text file" and pointed to it
msg205323 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-05 22:17
Antoine's latest draft looks good to me.
msg205328 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-12-05 22:41
I like Antoine's tutjson.patch.  commit it.  Thanks for noticing this Donald!
msg205329 - (view) Author: Roundup Robot (python-dev) Date: 2013-12-05 22:48
New changeset 90cf299dcf9b by Antoine Pitrou in branch '3.3':
Issue #18840: Introduce the json module in the tutorial, and deemphasize the pickle module.
http://hg.python.org/cpython/rev/90cf299dcf9b

New changeset 1009b77f59fd by Antoine Pitrou in branch 'default':
Issue #18840: Introduce the json module in the tutorial, and deemphasize the pickle module.
http://hg.python.org/cpython/rev/1009b77f59fd
msg205330 - (view) Author: Roundup Robot (python-dev) Date: 2013-12-05 22:52
New changeset 481b30bfe496 by Antoine Pitrou in branch '2.7':
Issue #18840: Introduce the json module in the tutorial, and deemphasize the pickle module.
http://hg.python.org/cpython/rev/481b30bfe496
msg205332 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-05 22:52
Ok, I've committed it. Thanks!
History
Date User Action Args
2013-12-05 22:52:45pitrousetstatus: open -> closed
versions: + Python 2.7
messages: + msg205332

resolution: fixed
stage: patch review -> resolved
2013-12-05 22:52:07python-devsetmessages: + msg205330
2013-12-05 22:48:18python-devsetnosy: + python-dev
messages: + msg205329
2013-12-05 22:44:52pitrousetversions: + Python 3.3, Python 3.4
2013-12-05 22:41:55gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg205328
2013-12-05 22:17:34ncoghlansetmessages: + msg205323
2013-12-05 17:46:00pitrousetfiles: + tutjson.patch

messages: + msg205314
2013-12-04 19:17:00pitrousetmessages: + msg205235
2013-12-04 19:14:09westley.martinezsetmessages: + msg205234
2013-12-04 12:40:38ncoghlansetnosy: + ncoghlan
messages: + msg205219
2013-12-04 08:35:53ezio.melottisetnosy: + ezio.melotti

type: enhancement
stage: needs patch -> patch review
2013-11-06 00:44:59ChrisCoopersetfiles: + issue18840
nosy: + ChrisCooper
messages: + msg202250

2013-08-31 00:12:41pitrousetmessages: + msg196602
2013-08-31 00:01:13westley.martinezsetfiles: + pickle-add-warning_18840.diff
keywords: + patch
messages: + msg196601
2013-08-30 23:36:57westley.martinezsetmessages: + msg196600
2013-08-30 23:27:03terry.reedysetmessages: + msg196598
2013-08-30 23:07:27westley.martinezsetnosy: + westley.martinez
messages: + msg196595
2013-08-30 21:14:54terry.reedysetnosy: + terry.reedy

messages: + msg196577
stage: needs patch
2013-08-26 17:04:15fdrakesetnosy: - fdrake
2013-08-26 15:52:11pitrousetmessages: + msg196217
2013-08-26 15:42:33pitrousetnosy: + pitrou
messages: + msg196214
2013-08-26 15:13:49dstufftsetmessages: + msg196211
2013-08-26 15:02:03fdrakesetmessages: + msg196209
2013-08-26 12:59:38dstufftsetmessages: + msg196206
2013-08-26 12:57:25dstufftsetmessages: + msg196205
2013-08-26 12:51:40fdrakesetnosy: + fdrake
messages: + msg196204
2013-08-26 12:37:58dstufftcreate