This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Security review of pickle/marshal docs
Type: Stage:
Components: Extension Modules Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: akuchling Nosy List: akuchling, barry, dcjim, jhylton, nobody, phr, tim.peters
Priority: high Keywords:

Created on 2001-10-16 22:42 by tim.peters, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
pickle-docs.patch akuchling, 2003-02-05 20:40 Patch to pickle and marshal docs.
pickle-docs.patch akuchling, 2003-02-26 19:22 Updated version of patch
Messages (31)
msg6959 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-10-16 22:42
Paul Rubin points out that the security implications 
of using marshal and/or pickle aren't clear from the 
docs.  Assigning to Jeremy as he's more sensitive to 
such issues than I am; maybe Barry would like to get 
paranoid too <wink>.

A specific example:  the pickle docs say that pickle 
doesn't support code objects, and "at least this 
avoids the possibility of smuggling Trojan horses into 
a program".  However,

1) The marshal docs don't mention this vulnerability 
at all.

while

2) The pickle docs don't spell out possible dangers 
due to things pickle does that marshal doesn't (like 
importing modules, and running class constructors).
msg6960 - (view) Author: paul rubin (phr) Date: 2001-10-16 23:06
Logged In: YES 
user_id=72053

Certainly anyone unserializing potentially malicious data
with pickle, marshal, or anything else, should check the
results before doing anything dangerous with them (like
executing code).  However, unpickle can potentially do
damage before it even returns, by creating loading modules
and creating initialized class instances.  So pickle.loads
should never be used on untrusted strings, except possibly
in a rexec wrapper (as proposed by Tim).  Another
possibility (also by Tim) is to override the load_inst
method of the Pickler class, though I don't think you can
do that for cPickle.

A sample exploit for unpickle can be found at
<http://www.nightsong.com/phr/python/pickletest.py>.
Unpickling the test string runs penguin.__init__ contrary
to the doc's saying no initialization unless there's a
__getinitargs__ method in the class def.

The "exploding penguin" class is artificial, but
applications are vulnerable if there's an unsafe
constructor anywhere in any class of the application or in
the python library (example: the NNTP constructor opens an
IP connection to an arbitrary address, so a malicious
imported string can send a message through your firewall
when you import it).
msg6961 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-07 20:08
Logged In: NO 

Irmen de Jong points out that the standard cookie module
uses pickling for serial and smart cookies.  The 2.1.1
cookie module docs explicitly say not to use those
classes because of the security hole--that they're provided
for backward compatibility only (but with what?!  Any
server that uses those classes on open ports needs to be
changed right away).

Irmen's library, http://pyro.sourceforge.net, also uses
unpickle insecurely (he's aware of this now and is figuring
out a fix).

IMO this is really a code bug rather than a documentation
bug, and should be fixed in the code rather than the docs.
Documenting the bug rather than fixing it leaves a
deficiency in the Python library: obvious uses of pickling,
like Pyro and the cookie module, can't be implemented
using cPickle and have to resort to a slower Python
deserializer, or use marshal and have possible compatibility
problems between versions (and not be able to serialize
class instances).

Paul

msg6962 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2001-11-07 22:02
Logged In: YES 
user_id=31392

What's the code bug?  Your last message has a lot of gloom
and doom <wink>, but I'm not sure what specific problem
you'd like to see fixed.  Are you saying that something
needs to be fixed in cPickle and not in pickle?
msg6963 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-07 23:54
Logged In: NO 

IMO it's a code bug that you can't unpickle strings from
untrusted sources.  Pyro and the cookie module are examples
of programs that got bitten by this bug.  Whether it's
really a bug is a matter of opinion--I had a big email
exchange with Guido and Tim about it, and they felt it
was enough to fix the pickle documentation.

Pickle has the same problem as cPickle, but with pickle
you can subclass the pickler and override the method that
unpickles class objects, and work around the (IMO) bug.
The workaround doesn't help cPickle since cPickle can't
be subclassed.  See bug #467384 for some related discussion.

Paul
msg6964 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2001-11-08 17:37
Logged In: YES 
user_id=31392

I don't think of the issue you describe as a bug in the
code.  You're suggesting a new feature for pickle.  As far
as I can tell, the original design requirements for pickle
did not include the ability to securely load a pickle from
an untrusted source.

It may be a legitimate feature request, but it's too late to
make it into Python 2.2.  I suggest we look at the design
issues for Python 2.3 and decide if it's a feature we want
to support.  I imagine a PEP may be necessary to lay out the
issues and the solution.  Do you want to have a hand in that
PEP?

I still don't understand what it means that Pyro and cookie
were bit by a bug.  It sounds like they were using pickle in
ways that pickle was not intended to support.  A careful
analysis of how those two applications use pickle would be
helpful for generating requirements.
msg6965 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-09 09:21
Logged In: NO 

Well, Guido and Tim agree with you that it's not a pickle
bug.  I still feel it is one, because its docs currently
make you think you can securely load untrusted pickles, and
because it's a natural, non-obscure thing to want to do
(see pyro and the cookie module), but whatever.  If it's
not a code bug then I feel it's a significant functionality
shortcoming in the python library.

Pyro uses pickle to serialize data for RPC calls over the
internet.  A malicious client could make a hostile pickle
take over the server.  The cookie module lets web
applications store user session state in browser cookies.
Its SerialCookie and SmartCookie classes let you put
arbitrary Python objects into the user session, and
serializes them when pickle.  Again, a malicious client
can make a hostile pickle, send it in a cookie header to
the http server, and take over the server when the 
application unpickles the cookie.

The current documentation for the pickle module makes it
very clear to me that the doc writer thought it was safe
to unpickle untrusted cookies.  If pickle wasn't designed
for that, then there was a communication failure between
the designer and the doc writer.

Yes, I'm willing to help with a PEP for fixing this
situation.

Paul
msg6966 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2001-11-10 17:37
Logged In: YES 
user_id=12800

I'm going to agree with Paul that this is a problem needing
fixing, however there are really several issues.

1. Cookie module makes it too easy to code exploits.  Cookie
exports a class, also called Cookie, which is aliased to
SmartCookie, so that a naive program will simply pass cookie
data to Cookie.Cookie() and you're screwed.  So, Cookie
module's defaults should be for more security rather than
less, and Cookie.Cookie should be aliased to SimpleCookie
instead.

2. There is no built-in safe mechanism for de-serializing
data from untrusted sources.  You can't use pickle without
overloading a "magic" method.  You can't use cPickle because
you can't do the overloading trick.  You can't use marshal
because it isn't bulletproof against recursive
datastructures.  So how /do/ you do it?

I think it just may be serious enough to deal with in Python
2.2, and I volunteer to address it (so I'll steal this bug
report).  Without looking at the code, or the level of
effort necessary, I would make the following suggestions:

1. Add to the public interface of pickle and cPickle, a flag
that either disables the unpickling of instances altogether,
or disables calling any code with unpickled data, e.g.
constructors.

2. Fix marshal to be bulletproof against recursive
datastructures.
3. Update the docs for both pickle/cPickle and marshal to
explain how to safely write de-serializers of untrusted strings.
msg6967 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-10 19:42
Logged In: NO 

See bug #467384 for discussion about marshal.  Besides the
recursion issue, marshal's format is explicitly undocumented
and subject to change--you can't rely on it to interoperate
between two different Python versions, so it's no good as
an RPC serializer.  The format has kludges (e.g. the
representation of long ints) that make it undesirable to
freeze and document it and force future versions to be
backward compatible.

Adding a pickle.loads flag to prevent instance unpickling
isn't perfect but is probably the best alternative on 
your list.  Perhaps the flag can have a value that allows
unpickling the instances by restoring the instance 
attributes rather than calling the initializer.  That's
not always the right way to unpickle an instance (that's
why the unpickler no longer works that way) but it's good
enough a lot of the time.  

There's another issue with pickle/cPickle which is that they
unpickle quoted strings by evaling them.  This is scary.
While I don't see an immediate exploit, I also haven't
examined the 1000's of lines of code I'd need to examine
to convince myself that there's NOT an exploit.  I think
the unpickler should be changed to never call eval but just
parse the string as it needs to.  

Guido seemed to think pickle might have other possible
exploits.  I don't know what he had in mind but before
declaring it safe for untrusted data I think it needs to
be gone over with a fine toothed comb.

Paul
msg6968 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2001-11-10 22:07
Logged In: YES 
user_id=31392

I don't think we should be doing anything about marshal.  
Maybe we should name in pyclib or something <0.9 wink>.  It 
works fine for .pyc files, but I don't see a reason for it 
to do anymore than is necessary for that purpose.

I think the notion of an unpickler that only handles 
builtin datatypes is the most attractive option you offer, 
but Paul has a good point about eval for strings.  (It 
currently has some hacks that address specific exploits, 
but I doubt they are sufficient.)  I also wonder how hard 
it is to handle builtin types and avoid subclasses of 
builtin types.

If there are any changes to pickle, I think we need to be 
careful about how it is described.  If we claim that an 
unpickler is safe for untrusted pickles, we've made a 
fairly strong claim.  I still think such a design change 
requires a PEP that includes some requirements and use 
cases and a thorough analysis of potential exploits. 
msg6969 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-12 06:33
Logged In: NO 

I like marshal and think it's much cleaner than pickle.
I'd like to see its format cleaned up and documented
so it can be a portable transport format.  Even the
recursion issue isn't so compelling--there's only a danger
on marshalling, not unmarshalling, and the application
has control over what it marshals.  So I think the idea
of documenting marshal (bug #467384) should be kept alive.
However, it shouldn't be changed in 2.2.  

If the Cookie class is still really aliased by default to
SmartCookie, that's a bad bug and should definitely be
fixed ASAP.  The Cookie docs clearly say not to use
SmartCookie or SerialCookie.  In fact I think SmartCookie
and SerialCookie should be disabled altogether.  Any
applications using them on the open internet have a security
but and should be fixed immediately.

Here's a possible temporary approach to fixing SmartCookie
and SerialCookie: append a cryptographic message
authentication code (MAC) to each pickled cookie.  That
means the application has some secret string K as a
configuration parameter (it should be 10 or more random
8-bit characters, or 20 or more random hex digits, etc). 
Then a smart cookie would be a pickle p, followed by
SHA(p+K).  The Cookie module would validate the MAC before
calling unpickle, and raise an exception if the MAC wasn't
valid.

The security of this scheme rests on K being kept secret
from attackers, which is generally not an easy thing to
manage, but it's something to think about.

Paul
msg6970 - (view) Author: Jim Fulton (dcjim) (Python triager) Date: 2001-11-12 18:14
Logged In: YES 
user_id=73023

It sounds like there are some documentation bugs:

- The security ramifications are not discussed, nor are the
remedies.

- The cPickle module isn't documented very well. I submitted
some
   documentation a long time ago that never got incorporated
AFAIK.
   I wish I still had it. :)

- cPickle has a feature for turning off instance support and
for 
   restricting which classes can be unpickled. You can set
the find_global
   attribute on a cPickle.Unpickler. The find_global
attribute can be 
   a function or None.  If it is None, then no instances can
be 
   unpickled. If it is a function, then it should accept a
module and name
   and return the corresponding global. It is responsible
for looking
   up the global and can raise an error to prevent a
global's use.

   See the ZEO storage server implementation for an example
   of using this hook.


  

msg6971 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-11-12 18:52
Logged In: YES 
user_id=31435

Why are people (Paul, Jeremy) concerned about eval'ing 
strings?  cPickle and pickle both check that they're 
properly quoted, and this isn't sh or Perl:  Python has 
no "dynamic" gimmicks buried in string literals.  All 
eval'ing a string literal can do is produce a binary blob 
via interpeting simple escape sequences.  They're like C 
strings this way -- maybe we'll run out of memory, but 
that's it.

I would agree that Python should be refactored internally 
to supply a clean function for changing string literals 
into binary blobs, but that would be for clarity and 
efficiency, not security.
msg6972 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-13 00:38
Logged In: NO 

It's possible that the eval is safe, but validating that
takes a line by line code review of all the paths that
evaling a string can go through.  It's also brittle in
that maybe someone will change the evaluator (say to
support Perl-like interpolation) in a future Python version
and not remember to change the unpickler.  Something like
that has already happened with the Cookie module.  My
guess is that it happened with the pickle module--the
whole point of that quoted string check is that the 
original pickle implementer didn't trust the input. 
The stuff about unpickling class instances was added
later (maybe by another person) without remembering the
security issue. 

Using eval this way is like storing a vat of cyanide in
your child's bedroom.  Sure, maybe if you check the seals
and locks on the vat carefully enough, you can convince
yourself that your child won't be able to get to the
cyanide.  But wouldn't you feel safer just not having the
vat there at all?  That's basic safety-consciousness.
Security consciousness works the same way.  Try to keep
dangerous ingredients and attackers as far away from each
other as possible.

Paul

msg6973 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-13 06:18
Logged In: NO 

The find_global variable sounds encouraging and if it
fixes this problem, that's great.  I'd still feel better
if the eval call goes away.  Is removing it allowed
under the 2.2 feature freeze?  It doesn't affect anything
visible, so no tests would have to change, etc.

Paul
msg6974 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2001-11-13 21:39
Logged In: YES 
user_id=12800

Assigning to Jeremy so he'll remember to forward me Jim's
email.  Jeremy can ping-pong it to Fred if he wants, and
then reassign to me after forwarding the email.
msg6975 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2001-11-16 00:12
Logged In: YES 
user_id=12800

I have rewritten the pickle documentation, and updated the
marshal, cPickle, and copy_reg documentation.  I believe all
the documentation issues raised here have been fixed.  

Implementation issues will be pushed to Python 2.3, and the
plan is to write a PEP to plan future work.
msg6976 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-16 00:43
Logged In: NO 

Barry, can you also do something about the Cookie module,
or at least about its documentation?  If Cookie is aliased
to SmartCookie, I think that needs mention.
 --Paul
msg6977 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-17 22:07
Logged In: NO 

Are the new docs downloadable from somewhere?  thanx  --Paul
msg6978 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-11-18 06:56
Logged In: NO 

Barry - the new docs just went up and they're a big
improvement over the old.  However the sentence 
"The issue here is usually not that a class's constructor
will get called -- it won't by the unpickler -- but that the
class's destructor (i.e. its __del__() method) might get
called when the object is garbage collected." isn't
correct.  

There's a flag in the pickle stream that tells
the unpickler that the pickler found a __getinitargs__
method at pickling time.  If the flag is set in the pickle,
then the unpickler calls the class constructor, whether
there's a __getinitargs__ method in the class or not.

The sample exploit that I posted earlier on,
<http://www.nightsong.com/phr/python/pickletest.py>,
is an example of an artificially concocted pickle that
makes a class constructor get called.
msg6979 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2001-11-18 16:08
Logged In: YES 
user_id=12800

You're right of course.  I meant to fix that and forgot. 
Will do so asap.  Glad you like the rest of it, though! :)
msg6980 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2003-02-05 20:40
Logged In: YES 
user_id=11375

Re-opening.  With the changes to pickling in Python 2.3, 
the security material should be replaced with a warning to 
not unpickle untrusted data.  Patch attached.

msg6981 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2003-02-05 21:29
Logged In: YES 
user_id=12800

Karmically (no, not comically) reassigning to Tim who
started this whole thing, and who happens to be dumping his
pickles these days.
msg6982 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-05 22:57
Logged In: YES 
user_id=31435

Andrew, didn't you go overboard in deleting text here?  The 
__safe_for_unpickling__ check is gone in 2.3, but most of the 
rest of the text still applies.  In particular, Cookie is still a 
problem for people inclined to worry about this, and overriding 
methods like find_global and load_global is still valuable stuff 
(the unpickler still never imports anything that doesn't come 
from an opcode triggering one of these methods).  
msg6983 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2003-02-05 23:03
Logged In: YES 
user_id=12800

I'll just mention that anybody using anything other than
Cookie.SimpleCookie is insane, and even using the Cookie
module at all to parse real-world cookies is mildly nuts
because it's way too strict.  I prefer m&m's in my cookies.
msg6984 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2003-02-06 14:29
Logged In: YES 
user_id=11375

The Cookie classes that use pickle have DeprecationWarnings in 
2.3, and should disappear in 2.4.  If {find,load}_global are 
still going to be supported, though, then they should still be documented (though, given that you shouldn't be unpickling untrusted data, why would you ever bother overriding find_global?)
msg6985 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-06 19:40
Logged In: YES 
user_id=31435

I think there are several reasons to override these methods.  
The one most relevant to this bug report is that, while Python 
has stopped pretending that pickles are secure by default, 
the choke points are still there, and motivated users can still 
expolit them.

For example, search pickle.py for __import__.  The only 
occurrence of __import__ in the Unpickler class is in method 
find_class(), and that's by design.  If a user overrides 
find_class(), the only imports the Unpickler *can* do are 
those the user explicitly performs in their own find_class() 
implementation.  So if that's a notion of "security" a user is 
happy with, they can still have it.  The docs trying to describe 
this are still valid.  It's only the "by magic" safety checks that 
have gone away (and they were buggy anyway, so no loss).
msg6986 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2003-02-26 19:22
Logged In: YES 
user_id=11375

OK; here's a reworked version that retitles the "Pickle security" 
section to "Subclassing Unpicklers", leaving the discussion of subclassing alone except for a few typo fixes.
msg6987 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2003-05-13 13:34
Logged In: YES 
user_id=11375

Can I check this in?
msg6988 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-05-13 18:48
Logged In: YES 
user_id=31435

Yes, please do!  And thank you.
msg6989 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2003-05-14 16:52
Logged In: YES 
user_id=11375

Checked in.
History
Date User Action Args
2022-04-10 16:04:31adminsetgithub: 35339
2001-10-16 22:42:24tim.peterscreate