Title: Mismatching use of memory APIs
Type: crash Stage:
Components: Interpreter Core Versions: Python 2.7, Python 2.6, Python 2.5
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: kristjan.jonsson, lemburg, ncoghlan, r.david.murray, tim.peters
Priority: normal Keywords: patch

Created on 2009-09-04 11:54 by kristjan.jonsson, last changed 2009-09-28 15:58 by kristjan.jonsson. This issue is now closed.

File name Uploaded Description Edit
debugmalloc.patch kristjan.jonsson, 2009-09-04 11:54 debugmalloc which catches API violations
parsetok.patch kristjan.jonsson, 2009-09-04 13:23
Messages (11)
msg92251 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-09-04 11:54
There are instances in python where memory allocated using one api 
(PyMem_*) is freed using the other (PyObject_*).
The provided patch (suggested for submission once we fix all instances) 
illustrates this.
It is sufficient to fire up python_d and "import traceback" to trigger the 
msg92252 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2009-09-04 12:03
Poking the timbot - this seems like a good idea to me (and it also means
C extension developers would be able to use a debug build of Python to
look for errors in their own use of these APIs).
msg92253 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-09-04 13:23
using the debugmalloc patch provided, I found only this one case when 
running the testsuite.
Patch is provided.  It is the simple solution of duplicating the string in 
this case.  A more cumbersome solution would be to allocate the "encoding" 
using PyObject_MALLOC in the first place.  But it doesn't seem the right 
api for arbitrary strings, and it might involve much more work.
msg92298 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2009-09-06 09:48
Yup, it's a good idea.  In fact, storing info in the debug malloc blocks
to identify the API family used was part of "the plan", but got dropped
when time ran out.

serialno should not be abused for this purpose, though.  On a 32-bit
box, a 24-bit real serialno is too small.  Mucking with serialno also
breaks the current straightforward use of data breakpoints (under
systems that support those) to rerun a deterministic program until a
specific value for serialno is reached.

The original intent was to use one of "forbidden" pad bytes for this
purpose, either the last one following the block or the first one
preceding the block.  That wouldn't interfere with anything, and the
code would be substantially simpler (no endless shifting and masking
needed when a byte's worth of data is stored /in/ a byte).

In any case, internal comments must document the possible values for the
"id" and their meanings.  It's just plain cruel to make the code reader
leap all over the code trying to reverse-engineer the intent ;-)
msg92327 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-09-06 19:51
Good point, Tim.  I'll rework it so that one of the border bytes is 
used, since it needs to be a "known" value anyway.  That should make 
things less messy.
Although resoning about the breakpoint is probably incorrect since you 
would put the breakpoint on the "serialno" global variable, and not on 
the location in each memory block.
Meanwhile, does anyone have anything to say about my suggested bugfix?  
I'm not familiar enough with the parser to know if there may be a 
performance penaltly in copying the string here, or whether we ought to 
try to widen the use of the PyObject_ api to skirt the issue.
msg92438 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2009-09-09 01:40
Right, I /was/ hallucinating about serialno -- good catch.

Mysterious little integers still suck, though ;-)  If you're going to
store it in a byte, then you can #define semi-meaningful letter codes
instead; e.g.,

#define _PYMALLOC_MEM_ID 'm'

The place where those are defined would be a good place to document what
the heck they mean too.
msg93207 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-09-28 13:09
Committed the fix to parsetok.c in revision 75103
msg93208 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-09-28 13:13
Committed changed debug memory API in revision 75104
msg93209 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-09-28 13:45
merged to py3k in 75105
msg93213 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-09-28 15:45
Krisjan, after your commit I'm getting the following when trying to make
python on both trunk and py3k:

Objects/obmalloc.c:1372: error: conflicting types for
msg93214 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-09-28 15:58
visual Studio didn't detect the missing 'const' for the void pointer.  
Fixed now in revision 75016 and revision 75107
Date User Action Args
2009-09-28 15:58:46kristjan.jonssonsetstatus: open -> closed

messages: + msg93214
2009-09-28 15:45:25r.david.murraysetstatus: closed -> open
nosy: + r.david.murray
messages: + msg93213

2009-09-28 13:45:28kristjan.jonssonsetstatus: open -> closed
resolution: accepted
messages: + msg93209
2009-09-28 13:13:09kristjan.jonssonsetmessages: + msg93208
2009-09-28 13:09:21kristjan.jonssonsetmessages: + msg93207
2009-09-09 01:40:21tim.peterssetmessages: + msg92438
2009-09-06 19:51:03kristjan.jonssonsetmessages: + msg92327
2009-09-06 09:48:57tim.peterssetmessages: + msg92298
2009-09-04 13:23:39kristjan.jonssonsetfiles: + parsetok.patch

messages: + msg92253
2009-09-04 12:03:39ncoghlansetnosy: + tim.peters
messages: + msg92252
2009-09-04 11:54:27kristjan.jonssoncreate