msg100106 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-25 21:34 |
Using the Py_ADDRESS_IN_RANGE macro can result in a crash under certain threading conditions. Since it is intentionally possible for (POOL)->arenaindex to reference memory which is not under Python's control, another thread may modify that memory. In particular, the following sequence of operations is possible and leads to a crash:
1. A thread running Python code enters the Py_ADDRESS_IN_RANGE macro to test a pointer value which was allocated by the system allocator, not the pool-based allocator.
2. That thread intentionally reads a value (POOL)->arenaindex from memory which was not allocated by the Python interpreter.
3. The value (POOL)->arenaindex is tested and is less than maxarenas.
4. An unrelated thread which actually allocated the memory Py_ADDRESS_IN_RANGE is in the middle of looking at modifies the value referenced by (POOL)->arenaindex, changing it to a different value which is larger than maxarenas.
5. The original thread running Python code subscripts arenas with the new value and reads memory past the end of the arenas array, leading to unpredictable behavior or an access violation.
It's possible to fix this problem by changing Py_ADDRESS_IN_RANGE to a function so that it reads (POOL)->arenaindex only once, but the adjacent comment suggests that it's important for performance that it be a macro.
I observed this crash on Windows Vista x64 using an embedded Python 2.6.4 interpreter, although the same code appears to exist in later versions of Python as well.
|
msg100108 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-02-25 22:00 |
It's certainly possible to keep it as a macro while avoiding reading (POOL)->arenaindex twice. Just make it so that the macro expands as a block rather than an expression.
Something like:
#define Py_ADDRESS_IN_RANGE(P, POOL, RESULT) do { \
uint _arenaindex = (POOL)->arenaindex; \
RESULT = (_arenaindex < maxarenas && \
(uptr)(P) - arenas[_arenaindex].address < (uptr)ARENA_SIZE) && \
arenas[_arenaindex].address != 0); \
} while(0)
|
msg100109 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-25 22:03 |
You can't add a block without changing the way the macro is used -- it's usually used like:
if (Py_ADDRESS_IN_RANGE(p, pool))
This won't work with your proposed change since you can't put a do {} loop into the test expression of the if statement.
|
msg100110 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-02-25 22:06 |
Obviously it also involves changing the code where the macro is invoked. It would be quite non-controversial though.
I've a got a more interesting question: how does the memory allocator end up being invoked from two threads at once? Usually memory allocations should be serialized by the GIL (the Global Interpreter Lock).
|
msg100111 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-25 22:10 |
Note that nothing in obmalloc is _intended_ to be thread-safe. Almost all Python API functions make the caller responsible for serializing calls (which is usually accomplished "by magic" via the GIL). This includes calls to all facilities supplied by obmalloc.
|
msg100113 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-02-25 22:17 |
No, the description above says it: only the first thread calls Py_ADDRESS_IN_RANGE. The other thread happens to modify the memory at the beginning of the page, just at the address (POOL)->arenaindex.
Could we use an "inline" function instead of a macro? I suppose the compiler would optimize it and generate code as fast as a macro.
|
msg100114 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-25 22:24 |
Amaury is correct -- in the case I observed, the other thread had nothing to do with Python and was not calling Python functions that would have required obtaining the GIL at any time (again, I'm using an embedded Python interpreter in a larger app, not the standalone interpreter). The other thread is just unlucky to have allocated non-Python related memory in the same page where a Python object was allocated.
|
msg100115 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-25 22:25 |
Amaury, I have no real idea what your "No, the description above says it: ..." is a response to, but if it's to my point that obmalloc is not intended to be thread-safe, it's not addressing that point. Any case in which another thread can affect a thread currently running in obmalloc is forbidden: forbidden, not allowed, begging for undefined behavior, etc etc.
> Could we use an "inline" function instead of a macro?
If every C compiler on every platform Python runs on supports such a thing with exactly the same spelling, sure ;-)
|
msg100117 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-25 22:36 |
Just noting there are ways to trick the macro into reading a thing once. For example, embed something like a
((x = arenas[(POOL)->arenaindex].address) || 1)
clause early on & refer to x later (where x is some file-scope new variable).
obmalloc is not intended to be thread-safe _period_, but I agree the current behavior in the case described is obnoxious enough that if it's easy to worm around it, fine.
|
msg100119 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-25 22:44 |
It's a pretty major limitation on the embedding case if you can't allow other threads that aren't related to Python to run at any time that another thread may be in obmalloc, and one I haven't seen documented anywhere. The only other fix that occurs to me would be to ensure that any memory Python allocates is not in the same page as memory not allocated by Python, but that seems like it would be much more complex than the current "just call malloc()" solution. Your proposed change to the macro sounds more reasonable to me.
|
msg100122 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-25 22:59 |
Right, I already agreed it would be fine to fix this if it's cheap ;-)
I didn't give any real thought to the macro solution, but that's fine by me too. It would be best to embed the assignment in a _natural_ place, though; like, say:
#define Py_ADDRESS_IN_RANGE(P, POOL) \
((POOL)->arenaindex < maxarenas && \
(uptr)(P) - (x = arenas[(POOL)->arenaindex].address) < (uptr)ARENA_SIZE && \
x != 0)
where some better name than `x` is picked, and the trick is clearly documented at x's declaration point. Nothing wrong with expediency here! This is time-critical code and has always been skating on the edge.
|
msg100124 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-25 23:10 |
That should probably be:
#define Py_ADDRESS_IN_RANGE(P, POOL) \
((x = (POOL)->arenaindex) < maxarenas && \
(uptr)(P) - arenas[x].address < (uptr)ARENA_SIZE && \
arenas[x].address != 0)
The address in the arena shouldn't change since it does belong to Python, so no one should be monkeying with it without the GIL. The arenaindex is vulnerable since POOL can point to something not owned by Python.
|
msg100125 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-25 23:12 |
Fine by me!
|
msg100126 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-02-25 23:15 |
`x` should be a local variable. Py_ADDRESS_IN_RANGE is used in only 2 functions, and assembler code will be smaller.
|
msg100128 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-25 23:35 |
> `x` should be a local variable. ... assembler code will be smaller.
??? It should make no difference to an optimizing compiler. Lifetime analysis (even if it's feeble) should be able to determine that all uses are purely local, with no need to store the value away.
|
msg100157 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-26 16:14 |
I tried this out in VC++ 2008 with the new temporary as both a global variable and a local. At least for VC++, Amaury is correct -- the compiler is generating the store to the global even though it is never read, so making it a local instead does save an instruction.
|
msg100160 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-26 16:26 |
Whatever works best. Unsure what you mean by "global", though, because it's not a C concept. File-scope variables in C can have internal or external linkage; to get internal linkage, so that lifetime analysis is effective without needing (roughly speaking) cross-file analysis, you have to use the "static" declaration modifier. Without "static", linkage is external, and then in the absence of (roughly speaking) cross-file analysis the compiler must assume the value can be read by code in other files.
So, did you use "static", or not?
|
msg100161 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-26 16:33 |
Sorry, I should have been more precise. I did use static for the variable I declared at file scope. Looking at the disassembly again also reminded me of something I should have mentioned in the original bug report: at least when using Visual C++ 2008, in the release build the compiler optimizes out the repeated reads even with the original code, so the generated assembly with the local variable patch is actually identical. I encountered the bug using the debug build of the interpreter, where that optimization does not occur and the memory is actually read three times.
|
msg100164 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2010-02-26 16:36 |
Cool! Python has a long history of catering to feeble compilers too ;-), so using function locals is fine. Speed matters more than obscurity here.
|
msg100167 - (view) |
Author: Matt Bandy (mbandy) |
Date: 2010-02-26 17:36 |
Here's my patch using the local variable solution.
|
msg125716 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-01-07 21:51 |
I've committed Matt's patch in r87835 (3.2), r87836 (3.1) and r87837 (2.7). Thank you for contributing!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:58 | admin | set | github: 52268 |
2011-01-07 21:51:44 | pitrou | set | status: open -> closed versions:
- Python 2.6 nosy:
tim.peters, amaury.forgeotdarc, pitrou, mbandy messages:
+ msg125716
resolution: fixed stage: needs patch -> resolved |
2010-02-26 17:36:57 | mbandy | set | files:
+ py_address_in_range.patch keywords:
+ patch messages:
+ msg100167
|
2010-02-26 16:36:30 | tim.peters | set | messages:
+ msg100164 |
2010-02-26 16:34:00 | mbandy | set | messages:
+ msg100161 |
2010-02-26 16:26:21 | tim.peters | set | messages:
+ msg100160 |
2010-02-26 16:14:00 | mbandy | set | messages:
+ msg100157 |
2010-02-25 23:35:14 | tim.peters | set | messages:
+ msg100128 |
2010-02-25 23:15:23 | amaury.forgeotdarc | set | messages:
+ msg100126 |
2010-02-25 23:12:17 | tim.peters | set | messages:
+ msg100125 |
2010-02-25 23:10:18 | mbandy | set | messages:
+ msg100124 |
2010-02-25 22:59:26 | tim.peters | set | messages:
+ msg100122 |
2010-02-25 22:44:53 | mbandy | set | messages:
+ msg100119 |
2010-02-25 22:36:31 | tim.peters | set | messages:
+ msg100117 |
2010-02-25 22:25:16 | tim.peters | set | messages:
+ msg100115 |
2010-02-25 22:24:47 | mbandy | set | messages:
+ msg100114 |
2010-02-25 22:17:01 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg100113
|
2010-02-25 22:10:18 | tim.peters | set | nosy:
+ tim.peters messages:
+ msg100111
|
2010-02-25 22:06:01 | pitrou | set | messages:
+ msg100110 |
2010-02-25 22:03:11 | mbandy | set | messages:
+ msg100109 |
2010-02-25 22:00:40 | pitrou | set | nosy:
+ pitrou messages:
+ msg100108
|
2010-02-25 21:54:28 | pitrou | set | priority: high stage: needs patch type: crash versions:
+ Python 3.1, Python 2.7, Python 3.2 |
2010-02-25 21:34:20 | mbandy | create | |