classification
Title: non-deterministic behavior of int subclass
Type: behavior
Stage: commit review
Components: Interpreter Core
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: mark.dickinson
Nosy List: brechtm, mark.dickinson, pitrou, python-dev, skrah
Priority: high
Keywords: patch

Created on 2012-04-20 07:51 by brechtm, last changed 2012-04-20 21:02 by mark.dickinson. This issue is now closed.

Files
File name         Uploaded                            Description
pdf.py            brechtm, 2012-04-20 07:51
issue14630.patch  mark.dickinson, 2012-04-20 17:24
Messages (15)
msg158803 - Author: Brecht Machiels (brechtm) Date: 2012-04-20 07:51
I have subclassed int to add an extra attribute:

class Integer(int):
    def __new__(cls, value, base=10, indirect=False):
        try:
            obj = int.__new__(cls, value, base)
        except TypeError:
            obj = int.__new__(cls, value)
        return obj

    def __init__(self, value, base=10, indirect=False):
        self.indirect = indirect

Using this class in my application, int(Integer(b'0')) sometimes returns a value of 48 (= ord('0')!) or 192, instead of the correct value 0. str(Integer(b'0')) always returns '0'. This seems to only occur for the value 0. First decoding b'0' to a string, or passing int(b'0') to Integer makes no difference. The problem lies with converting an Integer(0) to an int with int().

Furthermore, this occurs in a random way. Subsequent runs will produce 48 or 192 at different points in the application (a parser). Both Python 3.2.2 and 3.2.3 behave the same (32-bit, Windows XP). Apparently, the 64-bit Windows Python 3.2.3 does not show this behavior [1]. I haven't tested on other operating systems.

I cannot seem to reproduce this in a simple test program. The following produces no output:

for i in range(100000):
    integer = int(Integer(b'0'))
    if integer > 0:
        print(integer)

However, when I check for int(Integer(...)) > 0 in my application (at a point where I know the argument to Integer is b'0') and then print int(Integer(b'0')) a number of times, the results 48 and 192 do show up now and then.

As I can't reproduce the problem in a short test program, I have attached the relevant code. It is basically a PDF parser. The output for this [2] PDF file is, for example:

b'0' 0 Integer(0) 192 0 b'0' 16853712
b'0' 0 Integer(0) 48 0 b'0' 16938088
b'0' 0 Integer(0) 192 0 b'0' 17421696
b'0' 0 Integer(0) 48 0 b'0' 23144888
b'0' 0 Integer(0) 48 0 b'0' 23185408
b'0' 0 Integer(0) 48 0 b'0' 23323272

Search for print function calls in the code to see what this represents.

[1] http://stackoverflow.com/questions/10230604/non-deterministic-behavior-of-int-subclass#comment13156508_10230604
[2] http://www.gust.org.pl/projects/e-foundry/math-support/vieth2008.pdf
msg158812 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 10:01
I can reproduce this on a 32-bit OS X build of the default branch, so it doesn't seem to be Windows specific (though it may be 32-bit specific).

Brecht, if you can find a way to reduce the size of your example at all that would be really helpful.
msg158814 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-20 10:53
Reproduced under 32-bit Linux.
The problem seems to be that Py_SIZE(x) == 0 when x is Integer(0), but ob_digit[0] is still supposed to be significant. There's probably some overwriting with the trailing attributes.
By forcing Py_SIZE(x) == 1, the bug disappears, but it probably breaks lots of other stuff in longobject.c.
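
A rough pure-Python model of the aliasing described above may help. It assumes 16-bit digit slots and a little-endian layout (typical of a 32-bit build), and it treats the bytes after the digits as opaque "trailing" data. The names make_object and read_digit0 and the pointer-like value are made up for illustration; this is not CPython code.

```python
import struct

# Assumption: digits are stored in 16-bit slots, little-endian.
def make_object(ob_size, digits, trailing):
    """Lay out a toy int-subclass instance: real digits first, then trailing data."""
    body = b"".join(struct.pack("<H", d) for d in digits[:abs(ob_size)])
    return {"ob_size": ob_size, "memory": body + trailing}

def read_digit0(obj):
    """Read the first digit slot unconditionally, ignoring ob_size."""
    return struct.unpack_from("<H", obj["memory"], 0)[0]

# Hypothetical trailing bytes (for example, part of a pointer to the attributes).
trailing = struct.pack("<I", 0x01020030)

zero = make_object(0, [], trailing)   # zero value: no digit slots at all
one = make_object(1, [1], trailing)   # value 1: one real digit slot

print(read_digit0(zero))  # reads the trailing bytes, not a digit
print(read_digit0(one))   # reads the real digit
```

With zero digits allocated, slot 0 overlaps whatever follows the header, so the value read depends on memory contents rather than on the integer itself.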
msg158815 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 10:56
If we're accessing ob_digit[0] when Py_SIZE(x) == 0, that sounds like a bug to me.
msg158816 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-20 11:07
> If we're accessing ob_digit[0] when Py_SIZE(x) == 0, that sounds like a 
> bug to me.

_PyLong_Copy does.
It's ok as long as the object is int(0), because it's part of the small ints and its allocated size is one digit.

The following hack seems to fix the issue here. Perhaps we can simply fix _PyLong_Copy, but I wonder how many other parts of longobject.c rely on accessing ob_digit[0].


diff --git a/Objects/longobject.c b/Objects/longobject.c
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -4194,6 +4194,8 @@ long_subtype_new(PyTypeObject *type, PyO
     n = Py_SIZE(tmp);
     if (n < 0)
         n = -n;
+    if (n == 0)
+        n = 1;
     newobj = (PyLongObject *)type->tp_alloc(type, n);
     if (newobj == NULL) {
         Py_DECREF(tmp);
diff --git a/Objects/object.c b/Objects/object.c
--- a/Objects/object.c
+++ b/Objects/object.c
@@ -1010,6 +1010,8 @@ PyObject **
         tsize = ((PyVarObject *)obj)->ob_size;
         if (tsize < 0)
             tsize = -tsize;
+        if (tsize == 0 && PyLong_Check(obj))
+            tsize = 1;
         size = _PyObject_VAR_SIZE(tp, tsize);
 
         dictoffset += (long)size;
@@ -1090,6 +1092,8 @@ PyObject *
                 tsize = ((PyVarObject *)obj)->ob_size;
                 if (tsize < 0)
                     tsize = -tsize;
+                if (tsize == 0 && PyLong_Check(obj))
+                    tsize = 1;
                 size = _PyObject_VAR_SIZE(tp, tsize);
 
                 dictoffset += (long)size;
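
The effect of clamping the digit count can be sketched in Python. This is a toy calculation, not the real allocator: the header size (12), item size (2) and alignment (4) are hypothetical numbers, and var_size and digits_to_allocate are illustrative names.

```python
def var_size(basicsize, itemsize, nitems, align=4):
    """Toy variable-size allocation: header plus items, rounded up to the alignment."""
    raw = basicsize + nitems * itemsize
    return (raw + align - 1) // align * align

def digits_to_allocate(ob_size):
    """Clamp to at least one digit, as the hack does for zero-valued ints."""
    n = abs(ob_size)
    return n if n > 0 else 1

# Hypothetical sizes: 12-byte header, 2-byte digits.
print(var_size(12, 2, 0))                      # no room for a digit slot
print(var_size(12, 2, digits_to_allocate(0)))  # room for one digit slot
```

Under these assumptions, the clamp moves any trailing data past the first digit slot, so reading slot 0 no longer touches it. The same clamp has to be applied wherever the size is recomputed, which is why the hack touches both files.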
msg158817 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 11:35
> _PyLong_Copy does.

Grr.  So it does.  That at least should be fixed, but I agree that it would be good to have the added protection of ensuring that we always allocate space for at least one limb.

We should also check whether 2.7 is susceptible.
msg158819 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 11:53
Self-contained example that fails for me on 32-bit OS X:

class Integer(int):
    def __new__(cls, value, base=10, indirect=False):
        try:
            obj = int.__new__(cls, value, base)
        except TypeError:
            obj = int.__new__(cls, value)
        return obj

    def __init__(self, value, base=10, indirect=False):
        self.indirect = indirect


integers = []
for i in range(1000):
    integer = Integer(b'0')
    integers.append(integer)

for integer in integers:
    assert int(integer) == 0
msg158822 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-20 12:06
The fix for _PyLong_Copy is the following:

diff --git a/Objects/longobject.c b/Objects/longobject.c
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -156,7 +156,7 @@ PyObject *
     if (i < 0)
         i = -(i);
     if (i < 2) {
-        sdigit ival = src->ob_digit[0];
+        sdigit ival = (i == 0) ? 0 : src->ob_digit[0];
         if (Py_SIZE(src) < 0)
             ival = -ival;
         CHECK_SMALL_INT(ival);
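
One possible reading of why the failure is intermittent, sketched as a toy Python model of the copy routine. It assumes the small-int shortcut covers the range -5 to 256 and that the slow path rebuilds the value from the real digits only. copy_value, its parameters and the sample slot contents are hypothetical.

```python
NSMALLNEGINTS, NSMALLPOSINTS = 5, 257  # assumed bounds of the small-int cache

def copy_value(ob_size, digits, slot0, fixed):
    """Toy copy: return the value the copied int would have.

    slot0 is whatever sits in memory at digit position 0, which is
    only a real digit when abs(ob_size) >= 1.
    """
    i = abs(ob_size)
    if i < 2:
        ival = 0 if (fixed and i == 0) else slot0
        if ob_size < 0:
            ival = -ival
        if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:
            return ival  # shortcut: hand back the cached small int
    # Slow path: rebuild from the i real digits (15 bits each, assumed).
    value = sum(d << (15 * k) for k, d in enumerate(digits[:i]))
    return -value if ob_size < 0 else value

print(copy_value(0, [], 48, fixed=False))     # garbage in small-int range leaks out
print(copy_value(0, [], 10960, fixed=False))  # larger garbage falls through to 0
print(copy_value(0, [], 48, fixed=True))      # with the fix, always 0
```

In this model the wrong answer only appears when the stray slot contents happen to look like a small int, which would be consistent with seeing 48 or 192 on some runs and 0 on others.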
msg158823 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 12:18
Using MEDIUM_VALUE also works.

I'll cook up a patch tonight, after work.


diff -r 6762b943ee59 Objects/longobject.c
--- a/Objects/longobject.c	Tue Apr 17 21:42:07 2012 -0400
+++ b/Objects/longobject.c	Fri Apr 20 13:18:01 2012 +0100
@@ -156,9 +156,7 @@
     if (i < 0)
         i = -(i);
     if (i < 2) {
-        sdigit ival = src->ob_digit[0];
-        if (Py_SIZE(src) < 0)
-            ival = -ival;
+        sdigit ival = MEDIUM_VALUE(src);
         CHECK_SMALL_INT(ival);
     }
     result = _PyLong_New(i);
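
For comparison, the old logic and the MEDIUM_VALUE approach can be modelled in a few lines of Python. This is a sketch of the intended behavior (a size of 0 means the value 0, regardless of what the first slot holds), not the macro itself; old_ival and medium_value are illustrative names.

```python
def old_ival(ob_size, digit0):
    """Pre-fix logic: always trust digit0, then apply the sign."""
    ival = digit0
    if ob_size < 0:
        ival = -ival
    return ival

def medium_value(ob_size, digit0):
    """Model of the MEDIUM_VALUE approach: check the size before using digit0."""
    if ob_size == 0:
        return 0
    return -digit0 if ob_size < 0 else digit0

print(old_ival(0, 48))      # stray slot contents are returned
print(medium_value(0, 48))  # size 0 short-circuits to 0
```

For sizes -1 and 1 the two functions agree, so the change only affects the zero case.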
msg158854 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 17:23
Here's the patch.  I searched through the rest of Objects/longobject.c for other occurrences of [0], and found nothing else that looked suspicious, so I'm reasonably confident that this was an isolated case.
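
A search of this kind could be scripted roughly as follows. This is only an illustration of scanning source text for [0] subscripts, not the method actually used for the patch; find_index0 and the sample snippet are hypothetical.

```python
import re

def find_index0(source):
    """Return (line_number, stripped_line) pairs for lines containing a [0] subscript."""
    pattern = re.compile(r"\[\s*0\s*\]")
    return [(n, line.strip())
            for n, line in enumerate(source.splitlines(), start=1)
            if pattern.search(line)]

# Small sample standing in for the real file contents.
sample = """\
if (i < 2) {
    sdigit ival = src->ob_digit[0];
    CHECK_SMALL_INT(ival);
}
"""

hits = find_index0(sample)
for lineno, text in hits:
    print(lineno, text)
```

Each hit still needs a manual check that the surrounding code has already ruled out a size of 0.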
msg158861 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 17:52
Also, Python 2.7 looks safe here.
msg158863 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-20 17:57
The patch works fine here, and the test exercises the issue correctly.
msg158877 - Author: Stefan Krah (skrah) * (Python committer) Date: 2012-04-20 19:49
The patch looks good to me.
msg158886 - Author: Roundup Robot (python-dev) Date: 2012-04-20 20:44
New changeset cdcc6b489862 by Mark Dickinson in branch '3.2':
Issue #14630: Fix an incorrect access of ob_digit[0] for a zero instance of an int subclass.
http://hg.python.org/cpython/rev/cdcc6b489862

New changeset c7b0f711dc15 by Mark Dickinson in branch 'default':
Issue #14630: Merge fix from 3.2.
http://hg.python.org/cpython/rev/c7b0f711dc15
msg158888 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-20 20:45
Fixed.  Thanks Brecht for the report (and Antoine for diagnosing the problem).
History
Date                 User            Action  Args
2012-04-20 21:02:16  mark.dickinson  set     status: open -> closed
2012-04-20 20:45:28  mark.dickinson  set     resolution: fixed; messages: + msg158888
2012-04-20 20:44:30  python-dev      set     nosy: + python-dev; messages: + msg158886
2012-04-20 19:49:51  skrah           set     messages: + msg158877
2012-04-20 17:57:51  pitrou          set     messages: + msg158863
2012-04-20 17:52:15  mark.dickinson  set     stage: needs patch -> commit review
2012-04-20 17:52:08  mark.dickinson  set     messages: + msg158861; versions: - Python 2.7
2012-04-20 17:24:06  mark.dickinson  set     files: + issue14630.patch; keywords: + patch
2012-04-20 17:23:55  mark.dickinson  set     messages: + msg158854
2012-04-20 12:18:37  mark.dickinson  set     assignee: mark.dickinson; messages: + msg158823
2012-04-20 12:06:37  pitrou          set     components: + Interpreter Core, - None; stage: needs patch
2012-04-20 12:06:28  pitrou          set     assignee: mark.dickinson -> (no value); messages: + msg158822
2012-04-20 11:53:11  mark.dickinson  set     messages: + msg158819
2012-04-20 11:45:03  mark.dickinson  set     assignee: mark.dickinson
2012-04-20 11:35:59  mark.dickinson  set     messages: + msg158817; versions: + Python 2.7
2012-04-20 11:07:07  pitrou          set     messages: + msg158816
2012-04-20 10:56:45  mark.dickinson  set     messages: + msg158815
2012-04-20 10:53:32  pitrou          set     nosy: + skrah, pitrou; messages: + msg158814
2012-04-20 10:01:56  mark.dickinson  set     priority: normal -> high; messages: + msg158812; versions: + Python 3.3
2012-04-20 08:07:05  mark.dickinson  set     nosy: + mark.dickinson
2012-04-20 07:51:22  brechtm         create