classification
Title: Issues with BINUNICODE8 and BINBYTES8 opcodes in pickle
Type: behavior Stage: resolved
Components: Extension Modules, Library (Lib), Tests Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: alexandre.vassalotti, pitrou, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-09-29 06:14 by serhiy.storchaka, last changed 2015-09-29 19:15 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
pickle_binbytes8.patch serhiy.storchaka, 2015-09-29 06:14
pickle_binbytes8_2.patch serhiy.storchaka, 2015-09-29 13:14 review
Messages (5)
msg251823 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-29 06:14
There are issues with BINUNICODE8 and BINBYTES8 opcodes in pickle.

1. Unpickling BINBYTES8 is not implemented in Python implementation.

2. Highest 32 bits of 64-bit size are silently ignored in C implementation on 32-bit platforms.

3. There are no tests for BINUNICODE8 and BINBYTES8.

Proposed patch fixes these issues.
msg251826 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-29 07:25
To access an array item, the type of the index variable must be size_t (or a type with the same size), otherwise the compiler produce less efficient machine code:
http://www.viva64.com/en/a/0050/

=> please keep Py_ssize_t type for i

(I didn't check for the specific case of Py_ssize_t: it's signed. Does GCC emit the most efficient machine code for it?)

diff -r 16c8278c03f6 Modules/_pickle.c
--- a/Modules/_pickle.c
+++ b/Modules/_pickle.c
@@ -4606,10 +4606,17 @@ static Py_ssize_t
 calc_binsize(char *bytes, int nbytes)
 {
     unsigned char *s = (unsigned char *)bytes;
-    Py_ssize_t i;
+    int i;
     size_t x = 0;
 
-    for (i = 0; i < nbytes && (size_t)i < sizeof(size_t); i++) {
+    if (nbytes > (int)sizeof(size_t)) {
+        for (i = (int)sizeof(size_t); i < nbytes; i++) {
+            if (s[i])
+                return -1;
+        }
+        nbytes = (int)sizeof(size_t);
+    }

Please add a comment here to explain that the first loop check for integer overflow, it's not obvious at the first read.

Does the Python implementation of pickle produce BINBYTES8? If not: why not?

Note: the patch is probably based on a private Mercurial revision, so it didn't get the [Review] button.
msg251856 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-29 13:14
> To access an array item, the type of the index variable must be size_t (or a
> type with the same size), otherwise the compiler produce less efficient
> machine code: http://www.viva64.com/en/a/0050/
> 
> => please keep Py_ssize_t type for i

i is small integer between 0 and 8. I don't think that this article is 
related. Py_ssize_t is signed, and i is compared with int in a loop. And isn't 
Viva64 work only with Visual C++?  At least I found statements that can be not 
correct on platforms not supported by Microsoft.

> Please add a comment here to explain that the first loop check for integer
> overflow, it's not obvious at the first read.

Done.

> Does the Python implementation of pickle produce BINBYTES8? If not: why not?

Yes, it produces BINBYTES8 on 64-bit platform.

> Note: the patch is probably based on a private Mercurial revision, so it
> didn't get the [Review] button.

Yes, I had made some refactoring for unpickling tests and forgot to push they. 
Here is rebased patch.
msg251876 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-29 19:14
New changeset d4f8316d0860 by Serhiy Storchaka in branch '3.4':
Issue #25262. Added support for BINBYTES8 opcode in Python implementation of
https://hg.python.org/cpython/rev/d4f8316d0860

New changeset da9ad20dd470 by Serhiy Storchaka in branch '3.5':
Issue #25262. Added support for BINBYTES8 opcode in Python implementation of
https://hg.python.org/cpython/rev/da9ad20dd470

New changeset 8de1967edfdb by Serhiy Storchaka in branch 'default':
Issue #25262. Added support for BINBYTES8 opcode in Python implementation of
https://hg.python.org/cpython/rev/8de1967edfdb
msg251877 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-29 19:15
Thank you Victor and Antoine for your reviews.
History
Date User Action Args
2015-09-29 19:15:57serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg251877

stage: patch review -> resolved
2015-09-29 19:14:27python-devsetnosy: + python-dev
messages: + msg251876
2015-09-29 13:14:31serhiy.storchakasetfiles: + pickle_binbytes8_2.patch

messages: + msg251856
2015-09-29 07:25:18vstinnersetnosy: + vstinner
messages: + msg251826
2015-09-29 06:14:22serhiy.storchakacreate