classification
Title: Possible integer overflow in operations with addresses and sizes.
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, jcea, mark.dickinson, pitrou, python-dev, serhiy.storchaka
Priority: low Keywords: needs review, patch

Created on 2012-06-22 20:02 by serhiy.storchaka, last changed 2012-09-28 11:45 by jcea. This issue is now closed.

Files
File name Uploaded Description Edit
align_operations.patch serhiy.storchaka, 2012-06-22 20:02 review
align_operations2.patch serhiy.storchaka, 2012-09-20 12:35 review
Messages (22)
msg163473 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-22 20:02
In unicodeobject.c and stringlib aligned addresses and sizes are used for optimization. pointer->integer and implicit integer->integer conversions may overflow or underflow on platforms with sizeof(size_t) != sizeof(void *) or sizeof(size_t) != sizeof(int). The proposed patch fixes such unsafe things in unicodeobject.c, stringlib and some other files.

There are still a few unsafe places in libffi, but in this library Py_uintptr_t nor uintptr_t are not available.
msg164799 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:01
If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations.  E.g., from ascii_decode:

    unsigned long value = *(const unsigned long *) _p;

(here _p has type const char *).  This should really be fixed;  compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur.
msg164800 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-07 10:08
> This should really be fixed;  compilers are known to make optimizations 
> based on the assumption that this sort of undefined behaviour doesn't
> occur.

Doesn't the compiler have all the necessary information here? It's not like a subroutine is called.
(also, I'm not sure what optimization it could actually make)
msg164801 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:11
> (also, I'm not sure what optimization it could actually make)

Me neither, but it doesn't seem safe to assume that no compiler will take advantage of this.  I don't want to start guessing what compilers might or might not do;  it would be much better simply to stick to valid C where possible.
msg164804 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:14
N.B. This could probably be fixed without affecting performance by using the usual union trick.  (IIUC, that trick was technically still undefined behaviour for a while, but was eventually made legal by C99 + TC3.)  As far as I know there aren't any instances of compilers causing problems with that construct.
msg164805 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-07 10:16
> N.B. This could probably be fixed without affecting performance by
> using the usual union trick.

How would it work? We would have to add various unions to the
PyUnicode_Object definition?
msg164809 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:24
> How would it work? We would have to add various unions to the
> PyUnicode_Object definition?

No, you'd just need a temporary union defined in unicodeobject.c that would look something like:

typedef union { unsigned long v; char s[SIZEOF_LONG]; } U;

(with better choices of names).  Python/dtoa.c does a similar thing to read / write the pieces of a C double using integers safely.
msg164810 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:26
I'll see if I can come up with a patch, and open a new issue for it (since I've successfully derailed this issue from its original topic)
msg164818 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-07 10:49
> If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations.  E.g., from ascii_decode:
> 
>     unsigned long value = *(const unsigned long *) _p;
> 
> (here _p has type const char *).

I don't see what the undefined behavior. Can you specify exactly the
item of the Standard, in which it is mentioned?

>   This should really be fixed;  compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur.

I don't know how else you can rewrite it, without destroying completely
the effect of optimization.

In any case, I don't think that the original patch introduces some new
undefined behavior.
msg164821 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 11:09
> I don't see what the undefined behavior. Can you specify exactly the
> item of the Standard, in which it is mentioned?

It's C99 section 6.5, paragraph 7:  "An object shall have its stored value accessed only by an lvalue expression that has one of the following types ...".  It's the dereferencing of the pointer that's the problem: that's accessing a stored value of type 'char' by an lvalue expression that has type 'unsigned long'.

> In any case, I don't think that the original patch introduces some new
> undefined behavior.

Ah;  were the strict aliasing problems already there before the patch?  I didn't check.
msg167480 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-08-05 10:39
> Ah;  were the strict aliasing problems already there before the patch?  I didn't check.

Please open separate issue for this.
msg170568 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-16 18:47
Please, Mark or some other C expert, can you do a code review for the patch?
msg170620 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-17 19:09
Perhaps the three new macros should be made available in a .h file?
(in which case they should be private, i.e. start with _).
msg170623 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-17 20:09
> Perhaps the three new macros should be made available in a .h file?
> (in which case they should be private, i.e. start with _).

Perhaps. There are similar macros in other files (Include/objimpl.h, 
Objects/obmalloc.c, Python/pyarena.c, Modules/expat/xmlparse.c).
msg170684 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-09-18 22:27
> Perhaps the three new macros should be made available in a .h file?

Good idea. Maybe pymacros.h? These macros need to be documented (with a comment in the .h file)
msg170809 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-20 12:35
Well, here is a new patch. The five new macros moved to pymacros.h and used in more files.
msg170811 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-09-20 12:52
Apologies; I got distracted from the main point of this issue with the strict aliasing stuff, and then it fell off the to-do list.  Unassigning;  Antoine or Victor, do you want to take this?
msg170817 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-20 15:14
Mark, please open a new discussion, so we don't lose it and that was the place for discussion.
msg170828 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-20 18:51
Thanks for the patch! These macros will be useful.
msg170830 - (view) Author: Roundup Robot (python-dev) Date: 2012-09-20 18:57
New changeset 99112b851b25 by Antoine Pitrou in branch 'default':
Issue #15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
http://hg.python.org/cpython/rev/99112b851b25
msg170831 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-20 18:58
Committed in 3.3(.1).
msg170842 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-09-20 20:17
> Mark, please open a new discussion.

Done:  issue 15992.
History
Date User Action Args
2012-09-28 11:45:51jceasetnosy: + jcea
2012-09-20 20:17:26mark.dickinsonsetmessages: + msg170842
2012-09-20 18:58:50pitrousetstatus: open -> closed
versions: + Python 3.3, - Python 3.4
messages: + msg170831

resolution: fixed
stage: patch review -> resolved
2012-09-20 18:57:45python-devsetnosy: + python-dev
messages: + msg170830
2012-09-20 18:51:51pitrousetmessages: + msg170828
2012-09-20 15:14:30serhiy.storchakasetmessages: + msg170817
2012-09-20 12:52:12mark.dickinsonsetassignee: mark.dickinson ->
messages: + msg170811
2012-09-20 12:35:14serhiy.storchakasetfiles: + align_operations2.patch

messages: + msg170809
2012-09-18 22:27:41hayposetnosy: + haypo
messages: + msg170684
2012-09-17 20:09:53serhiy.storchakasetmessages: + msg170623
2012-09-17 19:09:26pitrousettype: security -> behavior
messages: + msg170620
versions: + Python 3.4, - Python 3.3
2012-09-16 18:47:02serhiy.storchakasetmessages: + msg170568
2012-08-05 10:40:15serhiy.storchakasetkeywords: + needs review
priority: normal -> low
2012-08-05 10:39:16serhiy.storchakasetmessages: + msg167480
stage: patch review
2012-07-07 11:09:28mark.dickinsonsetmessages: + msg164821
2012-07-07 10:49:58serhiy.storchakasetmessages: + msg164818
2012-07-07 10:26:13mark.dickinsonsetmessages: + msg164810
2012-07-07 10:24:48mark.dickinsonsetmessages: + msg164809
2012-07-07 10:16:10pitrousetmessages: + msg164805
2012-07-07 10:14:25mark.dickinsonsetmessages: + msg164804
2012-07-07 10:11:33mark.dickinsonsetmessages: + msg164801
2012-07-07 10:08:07pitrousetnosy: + pitrou
messages: + msg164800
2012-07-07 10:01:18mark.dickinsonsetmessages: + msg164799
2012-06-23 12:05:25mark.dickinsonsetassignee: mark.dickinson
2012-06-22 20:03:22mark.dickinsonsetnosy: + mark.dickinson
2012-06-22 20:02:48serhiy.storchakacreate