classification
Title: Overflow in unicode_hash
Type: crash Stage: needs patch
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Do not assume signed integer overflow behavior (#1621)
Assigned To: Nosy List: ezio.melotti, georg.brandl, lemburg, skrah
Priority: critical Keywords:

Created on 2011-02-10 09:14 by skrah, last changed 2011-02-10 19:25 by skrah. This issue is now closed.

Messages (4)
msg128270 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-02-10 09:14
Due to a signed integer overflow in unicode_hash, the Python interpreter
aborts during startup if built with -ftrapv:

./configure --with-pydebug CFLAGS="-ftrapv"



Starting program: /home/stefan/svn/py3k/python 
[Thread debugging using libthread_db enabled]

Program received signal SIGABRT, Aborted.
0x00007ffff71e6a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  0x00007ffff71e6a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff71ea5c0 in *__GI_abort () at abort.c:92
#2  0x00000000005e30a0 in __mulvdi3 ()
#3  0x000000000046304b in unicode_hash (self=0x7ffff7fab110) at Objects/unicodeobject.c:7600
#4  0x000000000041a313 in PyObject_Hash (v=0x7ffff7fab110) at Objects/object.c:762
#5  0x00000000005a9093 in PyDict_GetItem (op=0x8be030, key=0x7ffff7fab110) at Objects/dictobject.c:715
#6  0x000000000046d88c in PyUnicode_InternInPlace (p=0x7fffffffdf38) at Objects/unicodeobject.c:10026
#7  0x000000000046da8b in PyUnicode_InternFromString (cp=0x5e7c99 "__len__") at Objects/unicodeobject.c:10065
#8  0x0000000000445eba in init_slotdefs () at Objects/typeobject.c:5801
#9  0x000000000044633b in add_operators (type=0x846400) at Objects/typeobject.c:5955
#10 0x000000000043e950 in PyType_Ready (type=0x846400) at Objects/typeobject.c:3860
#11 0x000000000043e87e in PyType_Ready (type=0x846000) at Objects/typeobject.c:3824
#12 0x000000000041c786 in _Py_ReadyTypes () at Objects/object.c:1513
#13 0x00000000004c99a6 in Py_InitializeEx (install_sigs=1) at Python/pythonrun.c:229
#14 0x00000000004c9d78 in Py_Initialize () at Python/pythonrun.c:321
#15 0x00000000004ead8c in Py_Main (argc=1, argv=0x7ffff7fa9040) at Modules/main.c:597
#16 0x00000000004187cf in main (argc=1, argv=0x7fffffffe3c8) at ./Modules/python.c:59



Breakpoint 1, unicode_hash (self=0x7ffff7fab110) at Objects/unicodeobject.c:7594
7594        if (self->hash != -1)
(gdb) n
7596        len = Py_SIZE(self);
(gdb) n
7597        p = self->str;
(gdb) n
7598        x = *p << 7;
(gdb) n
7599        while (--len >= 0)
(gdb) p x
$1 = 12160
(gdb) n
7600            x = (1000003*x) ^ *p++;
(gdb) n
7599        while (--len >= 0)
(gdb) n
7600            x = (1000003*x) ^ *p++;
(gdb) n
7599        while (--len >= 0)
(gdb) n
7600            x = (1000003*x) ^ *p++;
(gdb) n

Program received signal SIGABRT, Aborted.
0x00007ffff71e6a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) quit
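The backtrace shows the string being hashed is "__len__" (frame #7), which is consistent with x = 12160 = '_' << 7 in the session above, and frame #2, __mulvdi3, is libgcc's overflow-trapping 64-bit multiply. The C sketch below replays the loop `x = (1000003*x) ^ *p++` with a range check before each multiplication, so it reports where the signed product would overflow instead of actually overflowing. It assumes a 64-bit signed hash type (int64_t stands in for Py_hash_t on this x86-64 build) and ASCII input; the function name first_overflow_step is hypothetical and not part of CPython.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiplier from the loop at Objects/unicodeobject.c:7600 in the trace. */
#define HASH_MULT 1000003

/* Hypothetical helper: returns the 1-based loop iteration at which
   1000003*x would overflow int64_t (assumed stand-in for Py_hash_t),
   or 0 if no multiplication overflows. Returns before any overflow
   happens, so the replay itself has no undefined behavior. */
int first_overflow_step(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;
    int64_t x;

    assert(s != NULL);
    len = strlen(s);
    x = p[0] << 7;                    /* same seed as "x = *p << 7" */
    for (size_t i = 0; i < len; i++) {
        /* 1000003*x fits in int64_t only if x lies in this range. */
        if (x > INT64_MAX / HASH_MULT || x < INT64_MIN / HASH_MULT)
            return (int)(i + 1);      /* a -ftrapv build would abort here */
        x = (HASH_MULT * x) ^ p[i];   /* product is known to fit */
    }
    return 0;                         /* no multiplication overflowed */
}
```

For "__len__" this returns 3: after two passes x is roughly 1.2e16, far above INT64_MAX / 1000003 (about 9.2e12), so the third multiplication is the first one out of range, matching the third step onto line 7600 before SIGABRT.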



This might be related to issue #10156 (unicode initialization is
not clearly defined).
msg128272 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-02-10 09:28
Ok, this is known, see #1621. Closing.
msg128273 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2011-02-10 09:40
Could you try the same in Python 2.7?

The overflow is intended (after all, it's a hash function), but we should probably add a cast to Py_hash_t to the hash building line in order to make the compiler aware of this.
msg128331 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-02-10 19:25
Marc-Andre Lemburg <report@bugs.python.org> wrote:
> 
> Marc-Andre Lemburg <mal@egenix.com> added the comment:
> 
> Could you try the same in Python 2.7 ?

It's the same, just in stringobject.c. Many hash functions have this issue.

> The overflow is intended (after all, it's a hash function), but we should
> probably add a cast to Py_hash_t to the hash building line in order to make
> the compiler aware of this.

I think I'd just do the hash calculation in unsigned arithmetic and cast at the end
of the function. For the conversion from unsigned to signed we'd still
rely on implementation-defined behavior [1], but at least the signed
integer overflow would be gone.
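A minimal sketch of the approach described above, not the actual CPython patch: the loop runs in uint64_t, where overflow wraps modulo 2^64 by definition, and the result is converted to the signed hash type once at the end. It assumes a 64-bit Py_hash_t (int64_t here) and NUL-terminated ASCII input; the name string_hash_unsigned is hypothetical. The final XOR with the length and the remap of -1 to -2 (-1 marks "no hash cached", as in `if (self->hash != -1)` above) follow CPython's usual string hash tail but are not shown in the trace, so treat them as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unicode_hash's loop done in unsigned arithmetic.
   int64_t is an assumed stand-in for a 64-bit Py_hash_t. */
int64_t string_hash_unsigned(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;
    uint64_t x;

    assert(s != NULL);
    len = strlen(s);
    x = (uint64_t)p[0] << 7;          /* p[0] is the terminating NUL for "" */
    for (size_t i = 0; i < len; i++)
        x = (1000003u * x) ^ p[i];    /* unsigned: wraps, nothing for -ftrapv to trap */
    x ^= (uint64_t)len;               /* assumed tail, see lead-in */

    /* The single unsigned-to-signed conversion. For x > INT64_MAX the
       result is implementation-defined (C99 6.3.1.3p3); GCC documents
       that it reduces the value modulo 2^64. */
    int64_t h = (int64_t)x;
    if (h == -1)
        h = -2;                       /* -1 is reserved, as in CPython */
    return h;
}
```

Under these assumptions `string_hash_unsigned("a")` yields 12416037344 and the empty string hashes to 0; "__len__" no longer overflows a signed type at any step.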

[1] Mark Dickinson made an effort to document assumptions about unsigned
to signed conversions. I don't know if this has found its way into
the developer docs:

http://mail.python.org/pipermail/python-dev/2009-December/094388.html
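If relying on the implementation-defined unsigned-to-signed conversion mentioned in [1] is undesirable, the final cast can be replaced by a conversion whose result C fully defines. The sketch below is one such technique, not necessarily what was adopted for #1621; the helper name u64_to_i64 is hypothetical. It relies on int64_t using two's complement, which C99 guarantees for the exact-width types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maps u to u - 2^64 when u > INT64_MAX, using only
   conversions of values that fit in int64_t, so no step is
   implementation-defined. */
int64_t u64_to_i64(uint64_t u)
{
    if (u <= (uint64_t)INT64_MAX)
        return (int64_t)u;                       /* value fits: exact */
    /* Here u >= 2^63, so UINT64_MAX - u lies in [0, INT64_MAX]. */
    int64_t r = -(int64_t)(UINT64_MAX - u) - 1;  /* equals u - 2^64 */
    assert(r < 0);
    return r;
}
```

With this helper, the hash loop can stay entirely in uint64_t and call u64_to_i64 once on the final value.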
History
Date                 User          Action  Args
2011-02-10 19:25:16  skrah         set     nosy: lemburg, georg.brandl, ezio.melotti, skrah
                                           messages: + msg128331
2011-02-10 09:40:29  lemburg       set     nosy: + lemburg
                                           messages: + msg128273
2011-02-10 09:28:35  skrah         set     status: open -> closed
                                           superseder: Do not assume signed integer overflow behavior
                                           messages: + msg128272
                                           nosy: georg.brandl, ezio.melotti, skrah
                                           resolution: duplicate
2011-02-10 09:18:52  ezio.melotti  set     nosy: + ezio.melotti
2011-02-10 09:14:16  skrah         create