int arithmetic relies on C signed overflow behaviour #51655

mdickinson · 2009-11-29T11:56:23Z

BPO	7406
Nosy	@tim-one, @terryjreedy, @gpshead, @mdickinson, @ericvsmith, @tiran, @serhiy-storchaka

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2021-10-20.12:51:01.576>
created_at = <Date 2009-11-29.11:56:22.624>
labels = ['interpreter-core']
title = 'int arithmetic relies on C signed overflow behaviour'
updated_at = <Date 2021-10-20.12:51:01.575>
user = 'https://github.com/mdickinson'

bugs.python.org fields:

activity = <Date 2021-10-20.12:51:01.575>
actor = 'christian.heimes'
assignee = 'none'
closed = True
closed_date = <Date 2021-10-20.12:51:01.576>
closer = 'christian.heimes'
components = ['Interpreter Core']
creation = <Date 2009-11-29.11:56:22.624>
creator = 'mark.dickinson'
dependencies = []
files = []
hgrepos = []
issue_num = 7406
keywords = []
message_count = 10.0
messages = ['95803', '95912', '95956', '95957', '95959', '95974', '95976', '95986', '245791', '404443']
nosy_count = 9.0
nosy_names = ['tim.peters', 'terry.reedy', 'gregory.p.smith', 'zooko', 'mark.dickinson', 'eric.smith', 'christian.heimes', 'serhiy.storchaka', 'Kevin Shweh']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue7406'
versions = ['Python 2.6', 'Python 2.7']

mdickinson · 2009-11-29T11:56:18Z

Much of the code in Objects/intobject.c assumes that an arithmetic
operation on signed longs will wrap modulo 2**(bits_in_long) on
overflow. However, signed overflow causes undefined behaviour according
to the C standards (e.g., C99 6.5, para. 5), and gcc is known to assume
that signed overflow never occurs in correct code, and to make use of
this assumption when optimizing.

An obvious example is found in int_add, which looks like this:

static PyObject *
int_add(PyIntObject *v, PyIntObject *w)
{
	register long a, b, x;
	CONVERT_TO_LONG(v, a);
	CONVERT_TO_LONG(w, b);
	x = a + b;
	if ((x^a) >= 0 || (x^b) >= 0)
		return PyInt_FromLong(x);
	return PyLong_Type.tp_as_number->nb_add((PyObject *)v, (PyObject 
*)w);
}

Here Python is relying on the line 'x = a + b' wrapping on overflow.
While this code doesn't seem to have caused any problems to date, it's
not at all inconceivable that some future version of GCC is clever
enough to figure out that (with its assumption that correct code never
includes signed overflow) the if condition is always false, so can be
optimized away. At that point, a Python interpreter built with this
version of GCC would produce incorrect results for int addition.

More generally, Python's source makes a number of assumptions about
integer arithmetic that aren't guaranteed by the C standards. Most of
these assumptions are likely to be harmless on modern machines, but the
assumptions should probably at least be documented somewhere, and
ideally also checked somewhere in the configuration, so that attempts to
port Python to machines that don't obey these assumptions complain
loudly. Namely, the source assumes at least that:

C signed ints are represented in two's complement, not ones'
complement or sign-and-magnitude.
the bit pattern 1000....000 is not a trap representation (so
e.g., INT_MIN = -INT_MAX-1, not -INT_MAX).
conversion from an unsigned integer type to the corresponding signed
type wraps modulo 2**(appropriate_number_of_bits).

(Relevant standard sections: C99 6.2.6.2, C99 6.3.1.3p3.)

See also bpo-1621.

mdickinson · 2009-12-02T17:44:21Z

Fixed int_sub, int_add, int_mul, and the fast paths for BINARY_ADD and
BINARY_SUB in ceval.c, in r76629 (trunk) and r76630 (release26-maint).

zooko · 2009-12-04T05:22:02Z

Here is a way to test for overflow which is correct for any C implementation:

static PyObject *
int_add(PyIntObject *v, PyIntObject *w)
{
	register long a, b;
	CONVERT_TO_LONG(v, a);
	CONVERT_TO_LONG(w, b);
	if (((a>0)&&(b>0)&&((LONG_MAX-a)<b))
		||((a<0)&&(b<0)&&((LONG_MIN-a)>b))) {
		/* would overflow the long type */
	    return PyLong_Type.tp_as_number->nb_add((PyObject *)v, (PyObject *)w);
	}
	
	return PyInt_FromLong(a+b);
}

zooko · 2009-12-04T05:50:57Z

Here is a set of macros that I use to test for overflow:

http://allmydata.org/trac/libzutil/browser/libzutil/zutilimp.h

mdickinson · 2009-12-04T10:58:49Z

Zooko: Yes; that's the sort of solution that's needed if we're not
allowed to assume two's complement with the extraordinary value (-
sys.maxint - 1) not a trap representation. If we are allowed to
assume this, then more efficient solutions are available.

Also, if we're not allowed to assume two's complement + no trap
representation, then int_and, int_xor, int_or are plain wrong:

For ones' complement or sign-and-magnitude, the result of these
logical operations won't match the result of the corresponding
operations on longs, for negative operands.

For two's complement with (-sys.maxint-1) a trap representation,
int_and and int_xor should be producing a Python long instead
of a Python int in some cases: -sys.maxint ^ 1 should be -sys.maxint - 1,
which wouldn't be representable as a Python int.

That's why I want to make these extra assumptions beyond what's
guaranteed by the C standards; working around them introduces
inefficiencies for all implementations, for the benefit
of implementations that (probably) don't even exist.

terryjreedy · 2009-12-04T22:32:12Z

I consider the binary bitwise operations, for negative ints, to be
either undefined or wrongly implemented. Consider the following (3.1)
>>> 3^2
1
>>> -3^2
-1
>>> 3^-2
-3
>>> -3^-2
3
>>> 2^3
1
>>> -2^3
-3

Something change sign just flips the sign of the result, sometimes it
also changes the magnitude. From the viewpoint of arithmetic, and signed
base-two representations, the latter seems senseless.

The doc says only "The ^ operator yields the bitwise XOR (exclusive OR)
of its arguments, which must be integers." But it does not define what
bitwise XOR means for signed ints, as opposed to unsigned bit strings,
possible interpreted as (unsigned) counts, which is the traditional
domain of bit-operation definition. So there is no way to predict the
result for negative ints. Or rather, the sensible prediction does not
match the observed behavior.

My impression is that Python longs are signed magnitudes. If so, the
bitwise ops should arguably be the signed result of the op on the
magnitudes.

tim-one · 2009-12-04T23:23:04Z

Terry, the language reference also says:

"""
For the purpose of shift and mask operations, a binary representation is
assumed, and negative numbers are represented in a variant of 2's
complement which gives the illusion of an infinite string of sign bits
extending to the left.
"""

That explains every result you saw:

3 = ...000011
2 = ...000010
1 = ...000001

-3 = ...111101
2 = ...000010
-1 = ...111111

3 = ...000011
-2 = ...111110
-3 = ...111101

-3 = ...111101
-2 = ...111110
3 = ...000011

2 = ...000010
3 = ...000011
1 = ...000001

-2 = ...111110
3 = ...000011
-3 = ...111101

In every case, the result is simply the xor of the inputs viewed as
infinite bitstrings. And it works exactly the same way if you use |, &,
<<, >>, or unary ~.

It's true that CPython's longs are /implemented/ via sign-magnitude, but
that should never be visible in the result of any operation.

terryjreedy · 2009-12-05T08:03:35Z

Thanks Tim. I see that is back in 3.2 rather than in the shift and mask
sections. At least I know what to refer to now.

KevinShweh · 2015-06-25T03:12:06Z

It looks like the fast paths for INPLACE_ADD and INPLACE_SUBTRACT in Python 2 don't have the cast-to-unsigned fix, so they're still relying on undefined behavior. For example, in INPLACE_ADD:

            /* INLINE: int + int */
            register long a, b, i;
            a = PyInt_AS_LONG(v);
            b = PyInt_AS_LONG(w);
            i = a + b;
            if ((i^a) < 0 && (i^b) < 0)
                goto slow_iadd;

tiran · 2021-10-20T12:51:02Z

Python 2 is no longer supported. Python 3's _PyLong_Add() function doesn't rely on overflow.

mdickinson added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 29, 2009

tiran closed this as completed Oct 20, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

int arithmetic relies on C signed overflow behaviour #51655

int arithmetic relies on C signed overflow behaviour #51655

mdickinson commented Nov 29, 2009

mdickinson commented Nov 29, 2009

mdickinson commented Dec 2, 2009

zooko mannequin commented Dec 4, 2009

zooko mannequin commented Dec 4, 2009

mdickinson commented Dec 4, 2009

terryjreedy commented Dec 4, 2009

tim-one commented Dec 4, 2009

terryjreedy commented Dec 5, 2009

KevinShweh mannequin commented Jun 25, 2015

tiran commented Oct 20, 2021

int arithmetic relies on C signed overflow behaviour #51655

int arithmetic relies on C signed overflow behaviour #51655

Comments

mdickinson commented Nov 29, 2009

mdickinson commented Nov 29, 2009

mdickinson commented Dec 2, 2009

zooko mannequin commented Dec 4, 2009

zooko mannequin commented Dec 4, 2009

mdickinson commented Dec 4, 2009

terryjreedy commented Dec 4, 2009

tim-one commented Dec 4, 2009

terryjreedy commented Dec 5, 2009

KevinShweh mannequin commented Jun 25, 2015

tiran commented Oct 20, 2021