Message 325430 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	serhiy.storchaka
Recipients	alexandre.vassalotti, serhiy.storchaka, shuoz, xtreak
Date	2018-09-15.11:02:11
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1537009331.62.0.956365154283.issue34656@psf.upfronthosting.co.za>
In-reply-to

Content

>>> import pickletools
>>> pickletools.dis(b'\x80\x04\x95\x1d\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03age\x94K\x17\x8c\x03jobr\x8c\x07student\x94u.')
    0: \x80 PROTO      4
    2: \x95 FRAME      29
   11: }    EMPTY_DICT
   12: \x94 MEMOIZE    (as 0)
   13: (    MARK
   14: \x8c     SHORT_BINUNICODE 'age'
   19: \x94     MEMOIZE    (as 1)
   20: K        BININT1    23
   22: \x8c     SHORT_BINUNICODE 'job'
   27: r        LONG_BINPUT 1953695628
   32: u        SETITEMS   (MARK at 13)
   33: d    DICT       no MARK exists on stack
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/pickletools.py", line 2457, in dis
    raise ValueError(errormsg)
ValueError: no MARK exists on stack

Ignore the unbalanced MARK error. The problematic opcode is LONG_BINPUT with the excessively large argument 1953695628. The C implementation of pickle tries to resize the memo list to twice this index, and that is where an integer overflow occurs.
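A minimal sketch of the two steps described above, using only the standard library: `pickletools.genops` locates the LONG_BINPUT opcode and its argument, and a hypothetical helper `wrap_int32` models what happens when that index is doubled in a fixed-width size type. The 32-bit signed size is an assumption, chosen because it matches the 2**30-1 threshold given in this message; it is not a transcription of `_pickle.c`. `find_long_binput` is likewise a hypothetical name.

```python
import pickletools
import struct

# The malformed pickle from the session above.
DATA = (b'\x80\x04\x95\x1d\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03age\x94'
        b'K\x17\x8c\x03jobr\x8c\x07student\x94u.')


def find_long_binput(data: bytes):
    """Return (position, argument) of the first LONG_BINPUT opcode, or None.

    genops() is a lazy generator, so stopping at the first match avoids
    parsing the trailing garbage bytes (which would raise ValueError).
    """
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == "LONG_BINPUT":
            return pos, arg
    return None


def wrap_int32(value: int) -> int:
    """Model two's-complement wraparound of a 32-bit signed integer (assumed size type)."""
    return (value + 2**31) % 2**32 - 2**31


pos, idx = find_long_binput(DATA)
# LONG_BINPUT's argument is a 4-byte little-endian unsigned integer; here
# those bytes are b'\x8c\x07st', the header and first letters of 'student'.
assert struct.unpack_from("<I", DATA, pos + 1)[0] == idx
print(pos, idx)              # 27 1953695628
print(wrap_int32(idx * 2))   # -387576040 under the 32-bit assumption
```

Any index of 2**30 or more doubles past 2**31 - 1, so under this model the requested memo size wraps to a negative number.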

This is unlikely to occur in the real world. A pickle would need more than 2**30-1 ≈ 10**9 memoized items to trigger this bug, which means its size on disk and in memory would be tens or hundreds of gigabytes. Pickle is not the best format for serializing that much data.
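As a rough check on the memory estimate, the sketch below computes a lower bound for 2**30 memoized objects. The 8-byte memo slot and the 16-byte minimum per object are assumptions for a typical 64-bit CPython build, and `min_memory_gib` is a hypothetical helper; real memoized objects (strings, containers) are larger, so the actual figure would be higher.

```python
GIB = 2**30


def min_memory_gib(n_items: int, pointer_size: int = 8, min_object_size: int = 16) -> float:
    """Lower bound on RAM (in GiB) to hold n_items memoized objects.

    Assumes one pointer per memo slot plus at least min_object_size bytes
    per object; both defaults are assumptions for a 64-bit build.
    """
    return n_items * (pointer_size + min_object_size) / GIB


print(2**30 - 1)               # 1073741823, i.e. about 10**9
print(min_memory_gib(2**30))   # 24.0
```

Even this deliberately low bound lands in the tens of gigabytes before counting the unpickled containers themselves.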

History
Date	User	Action	Args
2018-09-15 11:02:11	serhiy.storchaka	set	recipients: + serhiy.storchaka, alexandre.vassalotti, xtreak, shuoz
2018-09-15 11:02:11	serhiy.storchaka	set	messageid: <1537009331.62.0.956365154283.issue34656@psf.upfronthosting.co.za>
2018-09-15 11:02:11	serhiy.storchaka	link	issue34656 messages
2018-09-15 11:02:11	serhiy.storchaka	create