This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Crash in Py_ADDRESS_IN_RANGE macro
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, mbandy, pitrou, tim.peters
Priority: high Keywords: patch

Created on 2010-02-25 21:34 by mbandy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
py_address_in_range.patch mbandy, 2010-02-26 17:36 review
Messages (21)
msg100106 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-25 21:34
Using the Py_ADDRESS_IN_RANGE macro can result in a crash under certain threading conditions.  Since it is intentionally possible for (POOL)->arenaindex to reference memory which is not under Python's control, another thread may modify that memory.  In particular, the following sequence of operations is possible and leads to a crash:

1. A thread running Python code enters the Py_ADDRESS_IN_RANGE macro to test a pointer value which was allocated by the system allocator, not the pool-based allocator.
2. That thread intentionally reads a value (POOL)->arenaindex from memory which was not allocated by the Python interpreter.
3. The value (POOL)->arenaindex is tested and is less than maxarenas.
4. An unrelated thread which actually allocated the memory Py_ADDRESS_IN_RANGE is in the middle of looking at modifies the value referenced by (POOL)->arenaindex, changing it to a different value which is larger than maxarenas.
5. The original thread running Python code subscripts arenas with the new value and reads memory past the end of the arenas array, leading to unpredictable behavior or an access violation.

It's possible to fix this problem by changing Py_ADDRESS_IN_RANGE to a function so that it reads (POOL)->arenaindex only once, but the adjacent comment suggests that it's important for performance that it be a macro.

I observed this crash on Windows Vista x64 using an embedded Python 2.6.4 interpreter, although the same code appears to exist in later versions of Python as well.
msg100108 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-25 22:00
It's certainly possible to keep it as a macro while avoiding reading (POOL)->arenaindex twice. Just make it so that the macro expands as a block rather than an expression.

Something like:

#define Py_ADDRESS_IN_RANGE(P, POOL, RESULT) do { \
    uint _arenaindex = (POOL)->arenaindex; \
    RESULT = (_arenaindex < maxarenas && \
        (uptr)(P) - arenas[_arenaindex].address < (uptr)ARENA_SIZE) && \
        arenas[_arenaindex].address != 0); \
} while(0)
msg100109 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-25 22:03
You can't add a block without changing the way the macro is used -- it's usually used like:

if (Py_ADDRESS_IN_RANGE(p, pool))

This won't work with your proposed change since you can't put a do {} loop into the test expression of the if statement.
msg100110 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-25 22:06
Obviously it also involves changing the code where the macro is invoked. It would be quite non-controversial though.

I've a got a more interesting question: how does the memory allocator end up being invoked from two threads at once? Usually memory allocations should be serialized by the GIL (the Global Interpreter Lock).
msg100111 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-25 22:10
Note that nothing in obmalloc is _intended_ to be thread-safe.  Almost all Python API functions make the caller responsible for serializing calls (which is usually accomplished "by magic" via the GIL).  This includes calls to all facilities supplied by obmalloc.
msg100113 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-02-25 22:17
No, the description above says it: only the first thread calls Py_ADDRESS_IN_RANGE. The other thread happens to modify the memory at the beginning of the page, just at the address (POOL)->arenaindex.

Could we use an "inline" function instead of a macro? I suppose the compiler would optimize it and generate code as fast as a macro.
msg100114 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-25 22:24
Amaury is correct -- in the case I observed, the other thread had nothing to do with Python and was not calling Python functions that would have required obtaining the GIL at any time (again, I'm using an embedded Python interpreter in a larger app, not the standalone interpreter).  The other thread is just unlucky to have allocated non-Python related memory in the same page where a Python object was allocated.
msg100115 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-25 22:25
Amaury, I have no real idea what your "No, the description above says it: ..." is a response to, but if it's to my point that obmalloc is not intended to be thread-safe, it's not addressing that point.  Any case in which another thread can affect a thread currently running in obmalloc is forbidden:  forbidden, not allowed, begging for undefined behavior, etc etc.

> Could we use an "inline" function instead of a macro?

If every C compiler on every platform Python runs on supports such a thing with exactly the same spelling, sure ;-)
msg100117 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-25 22:36
Just noting there are ways to trick the macro into reading a thing once.  For example, embed something like a

((x = arenas[(POOL)->arenaindex].address) || 1)

clause early on & refer to x later (where x is some file-scope new variable).

obmalloc is not intended to be thread-safe _period_, but I agree the current behavior in the case described is obnoxious enough that if it's easy to worm around it, fine.
msg100119 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-25 22:44
It's a pretty major limitation on the embedding case if you can't allow other threads that aren't related to Python to run at any time that another thread may be in obmalloc, and one I haven't seen documented anywhere.  The only other fix that occurs to me would be to ensure that any memory Python allocates is not in the same page as memory not allocated by Python, but that seems like it would be much more complex than the current "just call malloc()" solution.  Your proposed change to the macro sounds more reasonable to me.
msg100122 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-25 22:59
Right, I already agreed it would be fine to fix this if it's cheap ;-)

I didn't give any real thought to the macro solution, but that's fine by me too.  It would be best to embed the assignment in a _natural_ place, though; like, say:

#define Py_ADDRESS_IN_RANGE(P, POOL)			\
	((POOL)->arenaindex < maxarenas &&		\
	 (uptr)(P) - (x = arenas[(POOL)->arenaindex].address) < (uptr)ARENA_SIZE && \
	 x != 0)

where some better name than `x` is picked, and the trick is clearly documented at x's declaration point.  Nothing wrong with expediency here!  This is time-critical code and has always been skating on the edge.
msg100124 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-25 23:10
That should probably be:

#define Py_ADDRESS_IN_RANGE(P, POOL)			\
	((x = (POOL)->arenaindex) < maxarenas &&		\
	 (uptr)(P) - arenas[x].address < (uptr)ARENA_SIZE && \
	 arenas[x].address != 0)

The address in the arena shouldn't change since it does belong to Python, so no one should be monkeying with it without the GIL.  The arenaindex is vulnerable since POOL can point to something not owned by Python.
msg100125 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-25 23:12
Fine by me!
msg100126 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-02-25 23:15
`x` should be a local variable. Py_ADDRESS_IN_RANGE is used in only 2 functions, and assembler code will be smaller.
msg100128 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-25 23:35
> `x` should be a local variable. ... assembler code will be smaller.

??? It should make no difference to an optimizing compiler. Lifetime analysis (even if it's feeble) should be able to determine that all uses are purely local, with no need to store the value away.
msg100157 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-26 16:14
I tried this out in VC++ 2008 with the new temporary as both a global variable and a local.  At least for VC++, Amaury is correct -- the compiler is generating the store to the global even though it is never read, so making it a local instead does save an instruction.
msg100160 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-26 16:26
Whatever works best.  Unsure what you mean by "global", though, because it's not a C concept.  File-scope variables in C can have internal or external linkage; to get internal linkage, so that lifetime analysis is effective without needing (roughly speaking) cross-file analysis, you have to use the "static" declaration modifier.  Without "static", linkage is external, and then in the absence of (roughly speaking) cross-file analysis the compiler must assume the value can be read by code in other files.

So, did you use "static", or not?
msg100161 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-26 16:33
Sorry, I should have been more precise.  I did use static for the variable I declared at file scope.  Looking at the disassembly again also reminded me of something I should have mentioned in the original bug report: at least when using Visual C++ 2008, in the release build the compiler optimizes out the repeated reads even with the original code, so the generated assembly with the local variable patch is actually identical.  I encountered the bug using the debug build of the interpreter, where that optimization does not occur and the memory is actually read three times.
msg100164 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-02-26 16:36
Cool!  Python has a long history of catering to feeble compilers too ;-), so using function locals is fine.  Speed matters more than obscurity here.
msg100167 - (view) Author: Matt Bandy (mbandy) Date: 2010-02-26 17:36
Here's my patch using the local variable solution.
msg125716 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-07 21:51
I've committed Matt's patch in r87835 (3.2), r87836 (3.1) and r87837 (2.7). Thank you for contributing!
History
Date User Action Args
2022-04-11 14:56:58adminsetgithub: 52268
2011-01-07 21:51:44pitrousetstatus: open -> closed
versions: - Python 2.6
nosy: tim.peters, amaury.forgeotdarc, pitrou, mbandy
messages: + msg125716

resolution: fixed
stage: needs patch -> resolved
2010-02-26 17:36:57mbandysetfiles: + py_address_in_range.patch
keywords: + patch
messages: + msg100167
2010-02-26 16:36:30tim.peterssetmessages: + msg100164
2010-02-26 16:34:00mbandysetmessages: + msg100161
2010-02-26 16:26:21tim.peterssetmessages: + msg100160
2010-02-26 16:14:00mbandysetmessages: + msg100157
2010-02-25 23:35:14tim.peterssetmessages: + msg100128
2010-02-25 23:15:23amaury.forgeotdarcsetmessages: + msg100126
2010-02-25 23:12:17tim.peterssetmessages: + msg100125
2010-02-25 23:10:18mbandysetmessages: + msg100124
2010-02-25 22:59:26tim.peterssetmessages: + msg100122
2010-02-25 22:44:53mbandysetmessages: + msg100119
2010-02-25 22:36:31tim.peterssetmessages: + msg100117
2010-02-25 22:25:16tim.peterssetmessages: + msg100115
2010-02-25 22:24:47mbandysetmessages: + msg100114
2010-02-25 22:17:01amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg100113
2010-02-25 22:10:18tim.peterssetnosy: + tim.peters
messages: + msg100111
2010-02-25 22:06:01pitrousetmessages: + msg100110
2010-02-25 22:03:11mbandysetmessages: + msg100109
2010-02-25 22:00:40pitrousetnosy: + pitrou
messages: + msg100108
2010-02-25 21:54:28pitrousetpriority: high
stage: needs patch
type: crash
versions: + Python 3.1, Python 2.7, Python 3.2
2010-02-25 21:34:20mbandycreate