classification
Title: Loading malicious pickle may cause excessive memory usage
Type: security
Stage: needs patch
Components: Extension Modules
Versions: Python 3.2
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: alexandre.vassalotti, georg.brandl, pitrou
Priority: critical
Keywords:

Created on 2010-09-28 00:07 by alexandre.vassalotti, last changed 2010-11-20 10:41 by georg.brandl. This issue is now closed.

Messages (5)
msg117492 - Author: Alexandre Vassalotti (alexandre.vassalotti) (Python committer) Date: 2010-09-28 00:07
This was raised during the review of issue #9410
(http://codereview.appspot.com/1694050/diff/2001/3001#newcode347), but we forgot to fix it.

The new array-based memo for the Unpickler class incorrectly assumes that memo indices are always contiguous. They are not, so the following pickle causes the Unpickler to use about 3GB of memory to store the memo array.

./python -c "import pickle; pickle.loads(b'\x80\x02]r\xff\xff\xff\x06.')"

To fix this, we can add code to fall back to a dictionary-based memo when the memo keys are not contiguous.
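
A minimal Python sketch of the fallback idea described above. The real memo lives in the C implementation of the Unpickler, so this only illustrates the data structure; the class name `HybridMemo` and its methods are hypothetical and not part of the pickle module.

```python
class HybridMemo:
    """Hypothetical memo: a dense list for contiguous keys, a dict for the rest."""

    def __init__(self):
        self._dense = []    # holds keys 0 .. len(self._dense) - 1
        self._sparse = {}   # holds any key that would leave a gap in the list

    def put(self, key, value):
        if key < 0:
            raise ValueError("negative memo key")
        if key < len(self._dense):
            self._dense[key] = value
        elif key == len(self._dense):
            self._dense.append(value)
            # Move over sparse entries that have now become contiguous.
            while len(self._dense) in self._sparse:
                self._dense.append(self._sparse.pop(len(self._dense)))
        else:
            # Non-contiguous key: store it without growing the list.
            self._sparse[key] = value

    def get(self, key):
        if 0 <= key < len(self._dense):
            return self._dense[key]
        return self._sparse[key]


memo = HybridMemo()
memo.put(0, "first")
memo.put(0x06FFFFFF, "far away")   # the memo index used by the pickle above
```

With this layout, the large index costs one dictionary entry instead of an array sized to the index.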
msg117493 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-09-28 00:12
I don't think there's any point doing this. Pickle is insecure by construction; it shouldn't crash when used legitimately, but trying to make it robust in the face of hand-crafted pickle strings sounds like an uphill battle (*).

(*) e.g. http://nadiana.com/python-pickle-insecure
msg117494 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-09-28 00:27
As an example of malicious pickle causing "excessive" memory usage, you can simply write:

>>> import pickle, pickletools
>>> s = b'\x80\x03cbuiltins\nbytearray\nq\x00J\x00\x00\x00\x7f\x85q\x01Rq\x02.'
>>> _ = pickle.loads(s)

This will allocate an almost 2GB bytearray. You can of course change the size as you like. Here is the disassembly:

>>> pickletools.dis(s)
    0: \x80 PROTO      3
    2: c    GLOBAL     'builtins bytearray'
   22: q    BINPUT     0
   24: J    BININT     2130706432
   29: \x85 TUPLE1
   30: q    BINPUT     1
   32: R    REDUCE
   33: q    BINPUT     2
   35: .    STOP
highest protocol among opcodes = 2


Therefore, I would recommend closing this issue.
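
The operands that drive the allocations in both pickles above can be read without loading anything. This is a sketch using `pickletools.genops`, which only parses the opcode stream; the helper name `operands` and the two opcode sets are choices made for this illustration.

```python
import pickletools

PUSH_INT_OPCODES = {"BININT", "BININT1", "BININT2"}   # push an integer on the stack
MEMO_PUT_OPCODES = {"BINPUT", "LONG_BINPUT"}          # store the stack top in the memo


def operands(data, opcode_names):
    """Return the arguments of the selected opcodes, in stream order."""
    return [arg for op, arg, pos in pickletools.genops(data)
            if op.name in opcode_names]


bytearray_pickle = b'\x80\x03cbuiltins\nbytearray\nq\x00J\x00\x00\x00\x7f\x85q\x01Rq\x02.'
memo_pickle = b'\x80\x02]r\xff\xff\xff\x06.'

sizes = operands(bytearray_pickle, PUSH_INT_OPCODES)   # argument passed to bytearray()
memo_keys = operands(memo_pickle, MEMO_PUT_OPCODES)    # index requested in the memo
```

Parsing like this can help when inspecting a suspicious pickle, but it is not a defense: a check on operand sizes cannot anticipate every way a pickle can request memory.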
msg117498 - Author: Alexandre Vassalotti (alexandre.vassalotti) (Python committer) Date: 2010-09-28 00:45
I was going to say that the method described at http://docs.python.org/dev/py3k/library/pickle.html#restricting-globals could be used to prevent this kind of attack on bytearray. But I came up with this fun thing:

pickle.loads(b'\x80\x03cbuiltins\nlist\ncbuiltins\nrange\nJ\xff\xff\xff\x03\x85R\x85R.')
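
For reference, a sketch of the globals restriction mentioned above, overriding `Unpickler.find_class`. The allow-list is an assumption chosen so that the pickle above passes it; the names `ALLOWED_BUILTINS` and `restricted_loads` are illustrative. The example uses a range of 5 rather than the original size so that it is safe to run.

```python
import builtins
import io
import pickle

# Assumption: an allow-list that looks harmless at first glance.
ALLOWED_BUILTINS = {"list", "range"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve globals that are explicitly allowed.
        if module == "builtins" and name in ALLOWED_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but with the allow-list applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Same opcode sequence as the pickle above, with a small range argument.
small = b'\x80\x03cbuiltins\nlist\ncbuiltins\nrange\nJ\x05\x00\x00\x00\x85R\x85R.'
result = restricted_loads(small)
```

The bytearray pickle from the earlier message is rejected by this unpickler, but `list(range(n))` goes through, and only the integer operand separates the small version from the large one.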

Sigh... you are right about pickle being insecure by design. The only solution is to use HMAC to check the integrity and the authenticity of incoming pickles.
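
A minimal sketch of the HMAC approach, assuming sender and receiver already share a secret key. The function names and the choice of SHA-256 are assumptions made for illustration; `hmac.compare_digest` requires Python 3.3 or later.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 tag computed with key."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(data, key):
    """Verify the tag first; unpickle only if it matches."""
    if len(data) < DIGEST_SIZE:
        raise ValueError("message too short to contain a signature")
    tag, payload = data[:DIGEST_SIZE], data[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)


key = b"shared-secret-for-illustration-only"
blob = dumps_signed({"answer": 42}, key)
restored = loads_signed(blob, key)
```

The check only establishes that the pickle came from a holder of the key; whoever holds the key can still produce a pickle that exhausts memory.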
msg121602 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2010-11-20 10:41
Closing as "won't fix".
History
Date                 User                  Action  Args
2010-11-20 10:41:20  georg.brandl          set     status: open -> closed; nosy: + georg.brandl; messages: + msg121602; resolution: wont fix
2010-09-28 00:45:25  alexandre.vassalotti  set     messages: + msg117498
2010-09-28 00:27:17  pitrou                set     messages: + msg117494
2010-09-28 00:12:54  pitrou                set     messages: + msg117493
2010-09-28 00:08:21  alexandre.vassalotti  set     nosy: + pitrou
2010-09-28 00:07:45  alexandre.vassalotti  create