classification
Title: Improve unpickling errors handling
Type: enhancement Stage: resolved
Components: Extension Modules, Library (Lib), Tests Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: alexandre.vassalotti, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-11-29 10:59 by serhiy.storchaka, last changed 2016-09-08 08:07 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
unpickling_mark_errors.patch serhiy.storchaka, 2015-11-29 12:19 review
unpickling_eof_errors.patch serhiy.storchaka, 2015-12-06 21:36 review
Messages (7)
msg255567 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-29 10:59
This issue is for better detecting and reporting errors in broken pickle data.
msg255569 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-29 11:14
New changeset d47e5b162072 by Serhiy Storchaka in branch '3.4':
Issue #25761: Added more test cases for testing unpickling broken data.
https://hg.python.org/cpython/rev/d47e5b162072

New changeset c7e7d77ef8bf by Serhiy Storchaka in branch '2.7':
Issue #25761: Added more test cases for testing unpickling broken data.
https://hg.python.org/cpython/rev/c7e7d77ef8bf

New changeset 4897438543da by Serhiy Storchaka in branch '3.5':
Issue #25761: Added more test cases for testing unpickling broken data.
https://hg.python.org/cpython/rev/4897438543da

New changeset c852c7d8d681 by Serhiy Storchaka in branch 'default':
Issue #25761: Added more test cases for testing unpickling broken data.
https://hg.python.org/cpython/rev/c852c7d8d681
msg255571 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-29 12:19
Proposed patch allows unpickler to detect errors related to reading a stack behind a mark.

For now Python implementation just pops a sentinel used as a mark. This can cause TypeError, AttributeError or UnpicklingError besides IndexError:

>>> pickle._loads(b'}(NNs.')
Traceback (most recent call last):
  ...
TypeError: 'object' object does not support item assignment
>>> pickle._loads(b'](Na.')
Traceback (most recent call last):
  ...
AttributeError: 'object' object has no attribute 'append'

Or can silently expose the mark object:

>>> pickle._loads(b')(.')
<object object at 0xb71084b0>
>>> pickle._loads(b']](a.')
[<object object at 0xb71084d8>]

C implementation just ignores incorrect mark:

>>> pickle.loads(b'}(NNs.')
{None: None}
>>> pickle.loads(b'](Na.')
[None]
>>> pickle.loads(b')(.')
()
>>> pickle.loads(b']](a.')
[[]]

But in case of complex data this can cause errors later.

With the patch C implementation always raises UnpicklingError with relevant message and Python implementation always raises IndexError.

>>> pickle.loads(b'}(NNs.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'pickle' is not defined
>>> import pickle, pickletools
>>> pickle.loads(b'}(NNs.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: unexpected MARK found
>>> pickle.loads(b'](Na.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: unexpected MARK found
>>> pickle.loads(b')(.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: unexpected MARK found
>>> pickle.loads(b']](a.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: unexpected MARK found
>>> pickle._loads(b'}(NNs.')
Traceback (most recent call last):
  ...
IndexError: list index out of range
>>> pickle._loads(b'](Na.')
Traceback (most recent call last):
  ...
IndexError: list index out of range
>>> pickle._loads(b')(.')
Traceback (most recent call last):
  ...
IndexError: pop from empty list
>>> pickle._loads(b']](a.')
Traceback (most recent call last):
  ...
IndexError: pop from empty list
msg256028 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-06 20:02
New changeset 5c670af0100f by Serhiy Storchaka in branch 'default':
Issue #25761: Improved detecting errors in broken pickle data.
https://hg.python.org/cpython/rev/5c670af0100f
msg256032 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-06 21:36
When pickle stream is unexpectedly ended, different exceptions can be raised. EOFError("Ran out of input") is raised when the stream is unexpectedly ended without the STOP opcode. But it can be raised also when the data for the opcode is incomplete. Other possible exceptions are UnpicklingError, AttributeError and ValueError.

Examples:

>>> pickle.loads(b'L')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: pickle data was truncated
>>> pickle.loads(b'L10')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
>>> pickle.loads(b'L10L')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '10L'
>>> pickle.loads(b"S'abc'")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: the STRING opcode argument must be quoted
>>> pickle.loads(b'(cbuiltins\nlist')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: Can't get attribute 'lis' on <module 'builtins' (built-in)>

Following patch makes C implementation of unpickler always raise UnpicklingError("pickle data was truncated") if the data for the opcode is truncated (above examples). EOFError("Ran out of input") is raised when the stream is unexpectedly ended without the STOP opcode, as before.

I'm not sure, may be always raise UnpicklingError or EOFError? Or change error message?
msg256055 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-07 09:32
New changeset 001514146c21 by Serhiy Storchaka in branch 'default':
Issue #25761: Fixed reference leak added in previous changeset (5c670af0100f).
https://hg.python.org/cpython/rev/001514146c21
msg274626 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-06 20:55
New changeset 231f578dfd3d by Serhiy Storchaka in branch 'default':
Issue #25761: Improved error reporting about truncated pickle data in
https://hg.python.org/cpython/rev/231f578dfd3d
History
Date User Action Args
2016-09-08 08:07:23serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2016-09-06 20:55:32python-devsetmessages: + msg274626
2015-12-07 09:32:41python-devsetmessages: + msg256055
2015-12-06 21:36:37serhiy.storchakasetfiles: + unpickling_eof_errors.patch

messages: + msg256032
2015-12-06 20:02:15python-devsetmessages: + msg256028
2015-11-29 19:09:45serhiy.storchakasetnosy: + pitrou, alexandre.vassalotti
2015-11-29 12:19:34serhiy.storchakasetfiles: + unpickling_mark_errors.patch
versions: + Python 3.6
messages: + msg255571

keywords: + patch
stage: patch review
2015-11-29 11:14:44python-devsetnosy: + python-dev
messages: + msg255569
2015-11-29 10:59:25serhiy.storchakacreate