
classification
Title: Memory corruption in multiprocessing module, OS X 10.5.4
Type: crash
Stage:
Components: Extension Modules
Versions: Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: jnoller
Nosy List: amaury.forgeotdarc, jnoller, mark.dickinson
Priority: normal
Keywords: patch

Created on 2008-07-17 22:22 by mark.dickinson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
issue3399.patch mark.dickinson, 2008-08-01 15:37 Possible fix
issue3399_2.patch mark.dickinson, 2008-08-01 16:33 Updated patch
add_semicolons.diff jnoller, 2008-08-01 17:58
Messages (19)
msg69917 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-17 22:22
As of revision 65077 of the trunk, I'm getting errors in 
test_multiprocessing that seem to point to memory corruption in object 
allocation/deallocation.  The failures are intermittent, and of a 
similar nature to the errors I was seeing previously, outlined in issue 
3088.

The platform is OS X 10.5.4 (not a fresh install---it was an upgrade 
from OS X 10.4, in case this makes any difference), running on a MacBook 
Pro.  I'm running a freshly checked out debug build of the trunk.

Here's what I did:

(1) make a fresh svn+ssh checkout of the trunk
(2) ./configure --with-pydebug && make
(3) ./python.exe Lib/test/test_multiprocessing.py
(4) repeat step (3) until something nasty happens.
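
Steps (3) and (4) can also be scripted. The following is only a sketch and not part of the original report: the helper name `run_until_failure` and the `max_runs` cap are assumptions made for illustration, and the command in the comment assumes the same trunk checkout layout as in step (3).

```python
import subprocess


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly.

    Returns (attempt, returncode) for the first run that exits with a
    non-zero status, or None if all max_runs runs exit with status 0.
    """
    for attempt in range(1, max_runs + 1):
        returncode = subprocess.call(cmd)
        if returncode != 0:
            return attempt, returncode
    return None


# Hypothetical invocation mirroring step (3):
#   run_until_failure(["./python.exe", "Lib/test/test_multiprocessing.py"])
```

On POSIX systems a process killed by a signal (for example the abort raised by a failed C assertion) is reported with a negative return code, so it also counts as a failure here. A hung run is not detected by this sketch and would still need to be interrupted by hand.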

The results vary from run to run, and 80-90% of the runs of 
test_multiprocessing pass.  Here are 3 of the failures I've seen, 
occurring on three separate runs of test_multiprocessing.

Failure 1:

test_notify_all (__main__.WithManagerTestCondition) ... Assertion 
failed: (pool->ref.count > 0), function PyObject_Free, file 
Objects/obmalloc.c, line 1100.

Failure 2:

test_imap_unordered (__main__.WithManagerTestPool) ...
python.exe(32381,0xb0513000) malloc: *** error for object 0xdbdbdbdb:
pointer being reallocated was not allocated
*** set a breakpoint in malloc_error_break to debug
python.exe(32381,0xb0513000) malloc: *** error for object 0xdbdbdbdb:
Non-aligned pointer being freed
*** set a breakpoint in malloc_error_break to debug
Fatal Python error: UNREF invalid object
ERROR

Failure 3:

test_imap_unordered (__main__.WithManagerTestPool) ... Fatal Python
error: UNREF invalid object
ERROR


I have very little (i.e. no) experience of debugging this kind of 
failure, and little understanding of how the multiprocessing module 
works.  But I can and will follow instructions and suggestions about how 
to debug this.

Stupid question:  it appears from reading the comments in that file that 
obmalloc.c is (intentionally) not thread-safe.  Could this have anything 
to do with the failures above?
msg69921 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-17 22:46
And one more:

Failure 4:

test_make_pool (__main__.WithManagerTestPool) ... Assertion failed: (bp != 
NULL), function PyObject_Malloc, file Objects/obmalloc.c, line 746.
msg69924 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-17 23:13
And another:

Failure 5:

test_notify (__main__.WithManagerTestCondition) ... Assertion failed: 
(usable_arenas->freepools == NULL), function PyObject_Malloc, file 
Objects/obmalloc.c, line 809.
ERROR
msg69925 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-07-17 23:13
On Jul 17, 2008, at 6:22 PM, Mark Dickinson <report@bugs.python.org>  
wrote:

>
> New submission from Mark Dickinson <dickinsm@gmail.com>:
>
> As of revision 65077 of the trunk, I'm getting errors in
> test_multiprocessing that seem to point to memory corruption in object
> allocation/deallocation.  The failures are intermittent, and of a
> similar nature to the errors I was seeing previously, outlined in  
> issue
> 3088.
>
> The platform is OS X 10.5.4 (not a fresh install---it was an upgrade
> from OS X 10.4, in case this makes any difference), running on a  
> MacBook
> Pro.  I'm running a freshly checked out debug build of the trunk.
>
> Here's what I did:
>
> (1) make a fresh svn+ssh checkout of the trunk
> (2) ./configure --with-pydebug && make
> (3) ./python.exe Lib/test/test_multiprocessing.py
> (4) repeat step (3) until something nasty happens.
>
> The results vary from run to run, and 80-90% of the runs of
> test_multiprocessing pass.  Here are 3 of the failures I've seen,
> occurring on three separate runs of test_multiprocessing.
>

I was going to help you with this when you sent me your last email.
I'm disturbed that none of my machines, or the buildbots for that
matter, are seeing this. Can you post the output from:

echo $LD_LIBRARY_PATH
which gcc
gcc -v
msg69926 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-17 23:17
LD_LIBRARY_PATH isn't set.  gcc is the system gcc from Apple:

Macintosh-3:trunk dickinsm$ echo $LD_LIBRARY_PATH

Macintosh-3:trunk dickinsm$ which gcc
/usr/bin/gcc
Macintosh-3:trunk dickinsm$ gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5484~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5484)
msg69930 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-07-18 00:13
Can you try removing the --with-pydebug flag from configure and  
running that way?
msg69941 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-18 06:37
Okay:  I just tried the following:

(1) clean svn checkout
(2) ./configure && make
(3) 100 runs of test_multiprocessing, via the shell command:
for ((i=0;i<100;i+=1)); do ./python.exe Lib/test/test_multiprocessing.py; sleep 1; done

I got 4 failed runs out of those 100 runs (details below);  2 hangs in 
test_notify_all, a KeyError in test_remote, and a failure of 
test_number_of_objects.

Failed run 1
------------
test_notify_all (__main__.WithManagerTestCondition) ... Process Process-
48:
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File "Lib/test/test_multiprocessing.py", line 600, in f
    cond.acquire()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 946, in acquire
    return self._callmethod('acquire', (blocking,))
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 718, in _callmethod
    self._connect()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 705, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/connection.py", 
line 133, in Client
    c = SocketClient(address)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/connection.py", 
line 254, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
error: [Errno 61] Connection refused
^CProcess PoolWorker-5:4:
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
Process PoolWorker-5:3:
Traceback (most recent call last):
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/pool.py", line 
57, in worker
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    task = get()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/queues.py", 
line 337, in get
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/pool.py", line 
57, in worker
    racquire()
KeyboardInterrupt
    task = get()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/queues.py", 
line 337, in get
    racquire()
KeyboardInterrupt
Process PoolWorker-5:1:
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
Process Process-50:
Process Process-49:
Traceback (most recent call last):
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/pool.py", line 
57, in worker
    task = get()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/queues.py", 
line 339, in get
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File "Lib/test/test_multiprocessing.py", line 602, in f
    return recv()
KeyboardInterrupt
    cond.wait(timeout)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 959, in wait
    return self._callmethod('wait', (timeout,))
Traceback (most recent call last):
  File "Lib/test/test_multiprocessing.py", line 1786, in <module>
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 722, in _callmethod
    kind, result = conn.recv()
KeyboardInterrupt
    main()
  File "Lib/test/test_multiprocessing.py", line 1783, in main
    test_main(unittest.TextTestRunner(verbosity=2).run)
  File "Lib/test/test_multiprocessing.py", line 1773, in test_main
    run(suite)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 750, 
in run
Process PoolWorker-5:2:
    test(result)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 461, 
in __call__
    return self.run(*args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 457, 
in run
    test(result)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 461, 
in __call__
    return self.run(*args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 457, 
in run
    test(result)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 300, 
in __call__
    return self.run(*args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 279, 
in run
    testMethod()
  File "Lib/test/test_multiprocessing.py", line 701, in test_notify_all
    sleeping.acquire()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 946, in acquire
    return self._callmethod('acquire', (blocking,))
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 722, in _callmethod
    kind, result = conn.recv()
KeyboardInterrupt
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File "Lib/test/test_multiprocessing.py", line 602, in f
    cond.wait(timeout)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 959, in wait
    return self._callmethod('wait', (timeout,))
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 722, in _callmethod
    kind, result = conn.recv()
KeyboardInterrupt
Traceback (most recent call last):


Failed run 2
------------
test_task_done (__main__.WithManagerTestQueue) ... ok
test_remote (__main__.WithManagerTestRemoteManager) ... ERROR
test_bounded_semaphore (__main__.WithManagerTestSemaphore) ... ok
test_semaphore (__main__.WithManagerTestSemaphore) ... ok
test_timeout (__main__.WithManagerTestSemaphore) ... ok
test_getobj_getlock (__main__.WithManagerTestValue) ... ok
test_rawvalue (__main__.WithManagerTestValue) ... ok
test_value (__main__.WithManagerTestValue) ... ok
test_number_of_objects (__main__.WithManagerTestZZZNumberOfObjects) ... ok

======================================================================
ERROR: test_remote (__main__.WithManagerTestRemoteManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_multiprocessing.py", line 1157, in test_remote
    queue = manager2.get_queue()
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 635, in temp
    authkey=self._authkey, exposed=exp
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 887, in AutoProxy
    incref=incref)
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 696, in __init__
    self._incref()
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 743, in _incref
    dispatch(conn, None, 'incref', (self._id,))
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 79, in dispatch
    raise convert_to_error(kind, result)
RemoteError: 
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 181, in handle_request
    result = func(c, *args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 397, in incref
    self.id_to_refcount[ident] += 1
KeyError: '5bf968'
---------------------------------------------------------------------------

----------------------------------------------------------------------
Ran 121 tests in 9.230s

FAILED (errors=1)

Failed run 3
------------
test_number_of_objects (__main__.WithManagerTestZZZNumberOfObjects) ...   
680490:       refcount=1
    <threading._Semaphore object at 0x680490>
  680bd0:       refcount=1
    <multiprocessing.pool.Pool object at 0x680bd0>
FAIL

======================================================================
FAIL: test_number_of_objects (__main__.WithManagerTestZZZNumberOfObjects)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_multiprocessing.py", line 1042, in test_number_of_objects
    self.assertEqual(refs, EXPECTED_NUMBER)
AssertionError: 2 != 1

----------------------------------------------------------------------
Ran 121 tests in 9.228s

FAILED (failures=1)

Failed run 4
------------
test_notify_all (__main__.WithManagerTestCondition) ... Process Process-
50:
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File "Lib/test/test_multiprocessing.py", line 600, in f
    cond.acquire()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 946, in acquire
    return self._callmethod('acquire', (blocking,))
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 718, in _callmethod
    self._connect()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 705, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/connection.py", 
line 133, in Client
    c = SocketClient(address)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/connection.py", 
line 254, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
error: [Errno 61] Connection refused
^CProcess PoolWorker-5:4:
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
Process PoolWorker-5:3:
Traceback (most recent call last):
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/pool.py", line 
57, in worker
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    task = get()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/queues.py", 
line 337, in get
    racquire()
    self.run()
KeyboardInterrupt
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/pool.py", line 
57, in worker
    task = get()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/queues.py", 
line 337, in get
    racquire()
KeyboardInterrupt
Process PoolWorker-5:1:
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
Process Process-48:
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/pool.py", line 
57, in worker
    task = get()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/queues.py", 
line 339, in get
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    return recv()
KeyboardInterrupt
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File "Lib/test/test_multiprocessing.py", line 602, in f
    cond.wait(timeout)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 959, in wait
Traceback (most recent call last):
  File "Lib/test/test_multiprocessing.py", line 1786, in <module>
    return self._callmethod('wait', (timeout,))
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 722, in _callmethod
    kind, result = conn.recv()
KeyboardInterrupt
    main()
  File "Lib/test/test_multiprocessing.py", line 1783, in main
    test_main(unittest.TextTestRunner(verbosity=2).run)
  File "Lib/test/test_multiprocessing.py", line 1773, in test_main
Process PoolWorker-5:2:
    run(suite)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 750, 
in run
Traceback (most recent call last):
    test(result)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 461, 
in __call__
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    return self.run(*args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 457, 
in run
    test(result)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 461, 
in __call__
    self.run()
    return self.run(*args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 457, 
in run
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/pool.py", line 
57, in worker
    test(result)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 300, 
in __call__
    return self.run(*args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/unittest.py", line 279, 
in run
    task = get()
    testMethod()
  File "Lib/test/test_multiprocessing.py", line 701, in test_notify_all
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/queues.py", 
line 337, in get
    sleeping.acquire()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 946, in acquire
    racquire()
KeyboardInterrupt
    return self._callmethod('acquire', (blocking,))
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 722, in _callmethod
    kind, result = conn.recv()
KeyboardInterrupt
Process Process-49:
Traceback (most recent call last):
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 232, in _bootstrap
    self.run()
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/process.py", 
line 88, in run
    self._target(*self._args, **self._kwargs)
  File "Lib/test/test_multiprocessing.py", line 602, in f
    cond.wait(timeout)
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 959, in wait
    return self._callmethod('wait', (timeout,))
  File 
"/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", 
line 722, in _callmethod
    kind, result = conn.recv()
KeyboardInterrupt
msg69942 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-18 06:40
I should add to the previous message that this was revision 65090, and
that it was a non-debug build.
msg69956 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-18 15:08
It looks like this isn't just me.  See the buildbot output at:

http://www.python.org/dev/buildbot/all/x86%20osx.5%20trunk/builds/33/step-test/0

which shows:

test_multiprocessing
Assertion failed: (bp != NULL), function PyObject_Malloc, file 
Objects/obmalloc.c, line 746.
test test_multiprocessing failed -- errors occurred; run in verbose mode 
for details
msg70023 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-07-19 13:24
Ok, so for the moment, let's set aside the connection refused messages: 
that may be a case of not cleaning up a socket correctly (which is still 
bad, but not memory corruption).

Of note from the buildbot failure:
Assertion failed: (bp != NULL), function PyObject_Malloc, file 
Objects/obmalloc.c, line 746.
test test_multiprocessing failed -- errors occurred; run in verbose mode 
for details

I don't know enough about obmalloc.c to say whether this is a problem 
with it not being thread-safe.

Here's another failure (from my own buildbot to boot):

test_multiprocessing
/Users/buildbot/buildarea/trunk.noller-osx86/build/Lib/multiprocessing/__init__.py:82: ImportWarning: Not importing directory '/Users/buildbot/buildarea/trunk.noller-osx86/build/Modules/_multiprocessing': missing __init__.py
  import _multiprocessing
Fatal Python error: Objects/tupleobject.c:169 object at 0x539d538 has negative ref count -606348326
make: *** [buildbottest] Abort trap
program finished with exit code 2
msg70551 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-01 13:50
I finally found some more time to look at this.  I cut down the test suite 
to try to find a minimal failing example.  I can fairly reliably make a 
debug build of the trunk crash using the following nine lines:

import multiprocessing.managers
def sqr(x): return x*x
manager = multiprocessing.managers.SyncManager()
manager.start()
pool = manager.Pool(4)
it = pool.imap_unordered(sqr, range(10000))
assert sorted(it) == map(sqr, range(10000))
pool.terminate()
manager.shutdown()

Typical output is:

Fatal Python error: UNREF invalid object
(followed by traceback)

or:

Assertion failed: (bp != NULL), function PyObject_Malloc, file 
Objects/obmalloc.c, line 755.

or:

Debug memory block at address p=0x247778:
    26 bytes originally requested
    The 4 pad bytes at p-4 are not all FORBIDDENBYTE (0xfb):
        at p-4: 0xdb *** OUCH
        at p-3: 0xdb *** OUCH
        at p-2: 0xdb *** OUCH
        at p-1: 0xdb *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 4 pad bytes at tail=0x247792 are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x35 *** OUCH
        at tail+1: 0x00 *** OUCH
        at tail+2: 0xfb
        at tail+3: 0xfb
    The block was made by call #4227530756 to debug malloc/realloc.
    Data at p: 00 00 00 00 00 00 00 00 ... 00 00 08 00 00 00 b0 72
Fatal Python error: bad leading pad byte
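
For readers unfamiliar with this report format: in a --with-pydebug build the allocator surrounds each block with guard bytes and fills freed memory with a marker byte. The sketch below is not CPython code. It only mirrors the check, using the marker values defined in Objects/obmalloc.c of that era (0xcb, 0xdb, 0xfb); the helper names are made up for illustration. Read this way, the leading pad bytes above suggest the block had already been freed when it was touched again, which would fit the 0xdbdbdbdb pointer in Failure 2.

```python
# Marker values used by the debug allocator in Objects/obmalloc.c at the time.
CLEANBYTE = 0xCB      # freshly allocated memory, not yet written by the caller
DEADBYTE = 0xDB       # memory that has been freed
FORBIDDENBYTE = 0xFB  # guard bytes placed before and after each block


def bad_pad_bytes(pad):
    """Return (offset, value) pairs for guard bytes that were overwritten."""
    return [(i, b) for i, b in enumerate(pad) if b != FORBIDDENBYTE]


def looks_freed(pad):
    """True if every guard byte carries the freed-memory marker."""
    return len(pad) > 0 and all(b == DEADBYTE for b in pad)


# Values copied from the report above.
leading = bytes([0xDB, 0xDB, 0xDB, 0xDB])    # p-4 .. p-1
trailing = bytes([0x35, 0x00, 0xFB, 0xFB])   # tail+0 .. tail+3

print(bad_pad_bytes(leading), looks_freed(leading))
print(bad_pad_bytes(trailing), looks_freed(trailing))
```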
msg70554 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-08-01 14:20
> Assertion failed: (bp != NULL), function PyObject_Malloc, file 
> Objects/obmalloc.c, line 755.

This one gives one probable cause of the problem:

- in Modules/_multiprocessing/connection.h, connection_send_obj()
releases the GIL around a call to conn_send_string().
- in Modules/_multiprocessing/socket_connection.c, conn_send_string()
uses PyMem_Malloc()

This is wrong (the GIL must be held when using the PyMem_* and
PyObject_* functions), and is probably the cause of the failed assertion.
msg70556 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-01 14:31
> This is wrong (the GIL must be held when using the PyMem_* and
> PyObject_* functions), and is probably the cause of the failed
> assertion.

This sounds quite likely.

I just managed (using the low-tech method of setting a static variable 
on entry and clearing it on exit) to confirm that PyObject_Malloc in 
obmalloc.c is being accessed simultaneously by multiple threads when
test_multiprocessing is run.
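
The flag technique can be sketched in Python as follows. This illustrates the idea only and is not the C instrumentation that was actually used; `OverlapDetector` and `demo` are hypothetical names, and the two events exist solely to force the overlap deterministically.

```python
import threading


class OverlapDetector:
    """Set a flag on entry, clear it on exit, and count entries that
    happen while the flag is already set. Deliberately unlocked, like
    a plain static variable."""

    def __init__(self):
        self.busy = False
        self.overlaps = 0

    def __enter__(self):
        if self.busy:
            self.overlaps += 1   # someone else is already inside
        self.busy = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.busy = False
        return False


def demo():
    """Force one thread to enter while another is still inside."""
    detector = OverlapDetector()
    inside = threading.Event()
    release = threading.Event()

    def first():
        with detector:
            inside.set()      # signal that we are inside the region
            release.wait()    # stay inside until the second thread is done

    def second():
        inside.wait()         # wait until the first thread is inside
        with detector:
            pass
        release.set()

    threads = [threading.Thread(target=first), threading.Thread(target=second)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return detector.overlaps
```

Because the flag itself is unprotected, a check like this can miss overlaps, but any overlap it does report is real, which is all that is needed to confirm concurrent access.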
msg70560 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-01 15:37
Here's a patch that fixes the problem for me.  It releases the GIL around
the calls to _conn_sendall within conn_send_string, instead of releasing 
the GIL for the whole call to conn_send_string.
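
The shape of the change can be illustrated with a Python analogy, in which an ordinary `threading.Lock` stands in for the GIL. This is a sketch under stated assumptions, not the patch itself: `GuardedAllocator`, `send_string_before`, and `send_string_after` are hypothetical names, and the length-prefixed framing is only illustrative.

```python
import io
import struct
import threading


class GuardedAllocator:
    """Stand-in for PyMem_Malloc: refuses to run unless the lock is held."""

    def __init__(self, lock):
        self.lock = lock

    def alloc(self, size):
        # Lock.locked() only says that *someone* holds the lock, which is
        # enough for this single-threaded illustration.
        if not self.lock.locked():
            raise RuntimeError("allocation attempted without holding the lock")
        return bytearray(size)


def send_string_before(lock, allocator, sink, payload):
    """Old shape: the lock is dropped around the whole call."""
    lock.release()
    try:
        buf = allocator.alloc(len(payload) + 4)   # raises: lock not held
        buf[:4] = struct.pack("!I", len(payload))
        buf[4:] = payload
        sink.write(buf)
    finally:
        lock.acquire()


def send_string_after(lock, allocator, sink, payload):
    """Patched shape: allocate under the lock, drop it only for the I/O."""
    buf = allocator.alloc(len(payload) + 4)
    buf[:4] = struct.pack("!I", len(payload))
    buf[4:] = payload
    lock.release()
    try:
        sink.write(buf)
    finally:
        lock.acquire()
```

Both functions expect the caller to hold the lock on entry and return with it held again. With the lock held, `send_string_after(lock, GuardedAllocator(lock), io.BytesIO(), b"abc")` completes, while `send_string_before` with the same arguments raises `RuntimeError` from the allocator.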
msg70564 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-08-01 15:57
To be complete, the patch should also deal with conn_recv_string() which
has the same problem.
And please do not forget the win32 implementation, in pipe_connection.c.
msg70569 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-01 16:33
Thanks, Amaury!  How's this?

I have no access to a Windows machine, so this patch is untested on 
Windows.
msg70577 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-08-01 17:54
Mark,
There are 3 semicolons missing in your patch, in pipe_connection.c, just
after the calls to WriteFile and ReadFile.
After this, it compiles and runs correctly. Tests pass.

Note that on Windows, your "nine lines" cannot work as is, because the
processes are spawned, not forked: the sqr() function is not copied.
And if you save the lines in a script file, it will be imported by every
subprocess, and every subprocess will start its own manager... and
memory explodes.

import multiprocessing.managers
def sqr(x): return x*x
if __name__ == '__main__':
    manager = multiprocessing.managers.SyncManager()
    manager.start()
    pool = manager.Pool(4)
    it = pool.imap_unordered(sqr, range(1000))
    assert sorted(it) == [sqr(x) for x in range(1000)]
    pool.terminate()
    manager.shutdown()
msg70580 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-08-01 17:58
I added the semicolons Amaury, and have it teed up in my local repo for 
submit. Can you review this diff just to confirm?
msg70589 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2008-08-01 19:50
I've committed this as-is based off my last patch. I will watch the 
buildbots for failures. 

Mark/Amaury - if I see you guys at PyCon, I owe you a drink.
History
Date | User | Action | Args
2022-04-11 14:56:36 | admin | set | github: 47649
2008-08-01 19:50:50 | jnoller | set | status: open -> closed; resolution: fixed; messages: + msg70589
2008-08-01 17:58:52 | jnoller | set | files: + add_semicolons.diff; messages: + msg70580
2008-08-01 17:54:11 | amaury.forgeotdarc | set | messages: + msg70577
2008-08-01 16:33:08 | mark.dickinson | set | files: + issue3399_2.patch; messages: + msg70569
2008-08-01 15:57:02 | amaury.forgeotdarc | set | messages: + msg70564
2008-08-01 15:37:24 | mark.dickinson | set | files: + issue3399.patch; keywords: + patch; messages: + msg70560
2008-08-01 14:31:25 | mark.dickinson | set | messages: + msg70556
2008-08-01 14:20:53 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg70554
2008-08-01 13:51:00 | mark.dickinson | set | messages: + msg70551
2008-07-19 13:25:00 | jnoller | set | messages: + msg70023
2008-07-18 15:08:51 | mark.dickinson | set | messages: + msg69956
2008-07-18 06:40:18 | mark.dickinson | set | messages: + msg69942
2008-07-18 06:37:47 | mark.dickinson | set | messages: + msg69941
2008-07-18 00:13:26 | jnoller | set | messages: + msg69930
2008-07-17 23:17:39 | mark.dickinson | set | messages: + msg69926
2008-07-17 23:13:54 | jnoller | set | messages: + msg69925
2008-07-17 23:13:37 | mark.dickinson | set | messages: + msg69924
2008-07-17 22:46:55 | mark.dickinson | set | messages: + msg69921
2008-07-17 22:22:25 | mark.dickinson | create |