This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Crash with mmap and sparse files on Mac OS X
Type: crash Stage: resolved
Components: Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ixokai, nadeem.vawda, ned.deily, neologix, pitrou, python-dev, ronaldoussoren, sdaoden, skrah, vstinner
Priority: high Keywords: patch

Created on 2011-02-21 22:13 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                  Uploaded
11277.5.diff               sdaoden, 2011-04-27 14:17
11277-test_mmap.1.py       sdaoden, 2011-05-06 15:33
11277-test_mmap-27.1.py    sdaoden, 2011-05-06 15:33
11277.apple-fix-3.diff     sdaoden, 2011-07-06 12:19
Messages (108)
msg129002 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-21 22:13
Following r88460 (issue10276), test_zlib crashes on the Snow Leopard buildbot (apparently in the new "test_big_buffer" test case).
msg129003 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-02-21 22:21
Do adler32() and crc32() support lengths up to UINT32_MAX? Or should we maybe limit the length to INT32_MAX?
msg129004 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-21 22:22
I've tried INT_MAX and it didn't change anything.
msg129006 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-02-21 22:34
Current OS X zlib is 1.2.3.  Test crashes with most recently released zlib, 1.2.5, as well.
msg129011 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-02-21 23:43
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 10 at address: 0x000000010170e000
0x00000001016eeaa0 in crc32 ()

(gdb) backtrace
#0  0x00000001016eeaa0 in crc32 ()
#1  0x00000001016e806d in PyZlib_crc32 (self=0x1016aa588, args=0x1016bf220) at /private/tmp/a/py3k/Modules/zlibmodule.c:993

PyZlib_crc32(PyObject *self, PyObject *args)
...
        while (len > (size_t) UINT_MAX) {
            crc32val = crc32(crc32val, buf, UINT_MAX);
...
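Because zlib's C-level crc32() and adler32() take an unsigned int length, the wrapper feeds large buffers in slices of at most UINT_MAX bytes, carrying the running checksum from one call to the next. The same idea can be sketched in Python, since zlib.crc32() and zlib.adler32() accept the running value as their second argument. The helper name and the small chunk size are illustrative assumptions, chosen so the example runs instantly; it only shows that chunking does not change the result.

```python
import zlib

def chunked_checksums(data, chunk):
    """Compute (crc32, adler32) of `data`, feeding at most `chunk` bytes per call."""
    crc = 0      # zlib.crc32's default starting value
    adler = 1    # zlib.adler32's default starting value
    view = memoryview(data)  # slicing a memoryview does not copy the bytes
    for start in range(0, len(view), chunk):
        piece = view[start:start + chunk]
        crc = zlib.crc32(piece, crc)
        adler = zlib.adler32(piece, adler)
    return crc, adler

# Chunked results must match a single call over the whole buffer.
data = bytes(1000) + b"asdf"
assert chunked_checksums(data, 255) == (zlib.crc32(data), zlib.adler32(data))
```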
msg129023 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2011-02-22 01:30
So on my system, that 'while' loop is executed once (I put a printf() after the buf and len adjustments and it was never hit).
msg129029 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-02-22 03:15
>>> from test.support import _4G
>>> _4G
4294967296
>>> mapping.size()
4294967300

pbuf.len = 4294967300, len = 4294967300
UINT_MAX = 4294967295
msg129034 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2011-02-22 04:06
Does it matter that _4G < UINT_MAX?
msg129050 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 11:04
> Does it matter that _4G < UINT_MAX?

You mean _4G > UINT_MAX, right?
Yes, it matters, otherwise that defeats the point of the test :)
msg129052 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 11:29
I have no clue, but Ned Deily's patch (msg129011) seems to be required for adler, too.  (You know...)
msg129053 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 11:30
Well, it's not a patch, just a traceback :)
msg129054 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 11:42
Wait a few minutes, I'll write this simple patch for adler and crc.  But extensive testing and such is beyond my current capabilities.
msg129056 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 12:15
File: issue11277.patch.
Hmm.  Two non-register constants and identical code on 32 and 64 bit.  Does Python have a '64 bit' switch or the like?  PY_SSIZE_T_MAX is not preprocessor-clean, I would guess.
msg129057 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 12:17
Sorry - that was a mess.
msg129058 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 12:18
> File: issue11277.patch.
> Hmm.  Two non-register constants and equal code on 32 and 64 bit.
> Does Python have a '64 bit' switch or the like - PY_SSIZE_T_MAX is not
> preprocessor-clean, I would guess.

Er, how is this patch different from r88460?
msg129061 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 12:24
I guess not at all.  Well.
msg129063 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 12:40
test_zlib.py (with my patch, which is essentially identical in the end) gives:

.............................s.......
----------------------------------------------------------------------
Ran 37 tests in 1.809s

OK (skipped=1)

This is on Snow Leopard 64 bit, 02b70cb59701 (r88451) -> Python 3.3a0.
Is there a switch I must trigger?  Just pulled 24 changesets, recompiling and trying again with r88460.
msg129066 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 12:50
> This is on Snow Leopard 64 bit, 02b70cb59701 (r88451) -> Python 3.3a0.
> Is there a switch I must trigger?  Just pulled 24 changesets,
> recompiling and trying again with r88460.

Have you tried "./python -m test -v -uall test_zlib" ?
msg129067 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 12:56
No, I've got no idea of this framework... I just ran 'python3 test_zlib.py' directly.  Thanks for the switch.  But I can't test your change due to issue11285, so this may take a while (others have more knowledge anyway).

(P.S.: your constant-folding stack patch is a great thing, just wanted to say this once..)
msg129069 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 13:06
So here is this (with my patch, but this is for real: issue11277.2.patch):

== CPython 3.3a0 (py3k, Feb 22 2011, 14:00:52) [GCC 4.2.1 (Apple Inc. build 5664)]
==   Darwin-10.6.0-i386-64bit little-endian
==   /private/var/folders/Da/DaZX3-k5G8a57zw6MSmjJ++++TM/-Tmp-/test_python_89365
Testing with flags: sys.flags(debug=0, division_warning=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0)
[1/1] test_zlib
test_adler32empty (test.test_zlib.ChecksumTestCase) ... ok
test_adler32start (test.test_zlib.ChecksumTestCase) ... ok
test_crc32_adler32_unsigned (test.test_zlib.ChecksumTestCase) ... ok
test_crc32empty (test.test_zlib.ChecksumTestCase) ... ok
test_crc32start (test.test_zlib.ChecksumTestCase) ... ok
test_penguins (test.test_zlib.ChecksumTestCase) ... ok
test_same_as_binascii_crc32 (test.test_zlib.ChecksumTestCase) ... ok
test_badargs (test.test_zlib.ExceptionTestCase) ... ok
test_badcompressobj (test.test_zlib.ExceptionTestCase) ... ok
test_baddecompressobj (test.test_zlib.ExceptionTestCase) ... ok
test_badlevel (test.test_zlib.ExceptionTestCase) ... ok
test_decompressobj_badflush (test.test_zlib.ExceptionTestCase) ... ok
test_big_compress_buffer (test.test_zlib.CompressTestCase) ... ok
test_big_decompress_buffer (test.test_zlib.CompressTestCase) ... ok
test_incomplete_stream (test.test_zlib.CompressTestCase) ... ok
test_length_overflow (test.test_zlib.CompressTestCase) ... skipped 'not enough free memory, need at least 4 GB'
test_speech (test.test_zlib.CompressTestCase) ... ok
test_speech128 (test.test_zlib.CompressTestCase) ... ok
test_badcompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
test_baddecompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
test_big_compress_buffer (test.test_zlib.CompressObjectTestCase) ... ok
test_big_decompress_buffer (test.test_zlib.CompressObjectTestCase) ... ok
test_compresscopy (test.test_zlib.CompressObjectTestCase) ... ok
test_compressincremental (test.test_zlib.CompressObjectTestCase) ... ok
test_compressoptions (test.test_zlib.CompressObjectTestCase) ... ok
test_decompimax (test.test_zlib.CompressObjectTestCase) ... ok
test_decompinc (test.test_zlib.CompressObjectTestCase) ... ok
test_decompincflush (test.test_zlib.CompressObjectTestCase) ... ok
test_decompress_incomplete_stream (test.test_zlib.CompressObjectTestCase) ... ok
test_decompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
test_decompressmaxlen (test.test_zlib.CompressObjectTestCase) ... ok
test_decompressmaxlenflush (test.test_zlib.CompressObjectTestCase) ... ok
test_empty_flush (test.test_zlib.CompressObjectTestCase) ... ok
test_flushes (test.test_zlib.CompressObjectTestCase) ... ok
test_maxlenmisc (test.test_zlib.CompressObjectTestCase) ... ok
test_odd_flush (test.test_zlib.CompressObjectTestCase) ... ok
test_pair (test.test_zlib.CompressObjectTestCase) ... ok

----------------------------------------------------------------------
Ran 37 tests in 1.789s

OK (skipped=1)
1 test OK.
msg129071 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 13:08
(That is not much help for a >4GB error, huh?)
msg129072 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 13:22
Just stepping ... with c8d1f99f25eb/r88476:

== CPython 3.3a0 (py3k, Feb 22 2011, 14:18:19) [GCC 4.2.1 (Apple Inc. build 5664)]
==   Darwin-10.6.0-i386-64bit little-endian
==   /private/var/folders/Da/DaZX3-k5G8a57zw6MSmjJ++++TM/-Tmp-/test_python_5126
Testing with flags: sys.flags(debug=0, division_warning=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0)
[1/1] test_zlib
test_adler32empty (test.test_zlib.ChecksumTestCase) ... ok
test_adler32start (test.test_zlib.ChecksumTestCase) ... ok
test_crc32_adler32_unsigned (test.test_zlib.ChecksumTestCase) ... ok
test_crc32empty (test.test_zlib.ChecksumTestCase) ... ok
test_crc32start (test.test_zlib.ChecksumTestCase) ... ok
test_penguins (test.test_zlib.ChecksumTestCase) ... ok
test_same_as_binascii_crc32 (test.test_zlib.ChecksumTestCase) ... ok
test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase) ... 
^C
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
Bus error
msg129073 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 13:30
> Just stepping ... with c8d1f99f25eb/r88476:

Right, that's what we should investigate :)
Could you try to diagnose the crash?
msg129086 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 15:37
.. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing.
The problem is not crc32(), but the buffer itself:

   if (pbuf.len > 1024*5) {
        unsigned char *buf = pbuf.buf;
        Py_ssize_t len = pbuf.len;
        Py_ssize_t i;
fprintf(stderr, "CRC 32 2.1\n");
for(i=0; (size_t)i < (size_t)len;++i)
    *buf++ = 1;
fprintf(stderr, "CRC 32 2.2\n");

2.2 is never reached (in fact, accessing buf[1] already causes a fault).
Thus the problem is not zlib, but PyArg_ParseTuple().
But just don't ask me more on that!
msg129087 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 15:39
(P.S.: of course talking about ChecksumBigBufferTestCase and the 4GB, say.)
msg129090 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 15:56
> .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing.
> The problem is not crc32(), but the buffer itself:
> 
>    if (pbuf.len > 1024*5) {
>         unsigned char *buf = pbuf.buf;
>         Py_ssize_t len = pbuf.len;
>         Py_ssize_t i;
> fprintf(stderr, "CRC 32 2.1\n");
> for(i=0; (size_t)i < (size_t)len;++i)
>     *buf++ = 1;
> fprintf(stderr, "CRC 32 2.2\n");

Thank you! So it's perhaps a bug in mmap on Snow Leopard.
Could you try to debug a bit more precisely and see at which buffer
offset (from the start) the fault occurs?
msg129091 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 15:58
Snippet

    if (pbuf.len > 1024*5) {
        volatile unsigned char *buf = pbuf.buf;
        Py_ssize_t len = pbuf.len;
Py_ssize_t i = 0;
volatile unsigned char au[100];
volatile unsigned char*x = au;
        fprintf(stderr, "CRC ENTER, buffer=%p\n", buf);
for (i=0; (size_t)i < (size_t)len; ++i) {
    fprintf(stderr, "%ld, buf=%p\n", (signed long)i, buf);
    *x = *buf++;
}

results in

test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase) ... CRC ENTER, buffer=0x1014ab000
0, buf=0x1014ab000
msg129093 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 16:37
Out of curiosity, could you try the following patch?

Index: Lib/test/test_zlib.py
===================================================================
--- Lib/test/test_zlib.py	(revision 88500)
+++ Lib/test/test_zlib.py	(working copy)
@@ -70,7 +70,7 @@
         with open(support.TESTFN, "wb+") as f:
             f.seek(_4G)
             f.write(b"asdf")
-            f.flush()
+        with open(support.TESTFN, "rb") as f:
             self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
 
     def tearDown(self):
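The idea behind this patch (close the handle used to write the file, then map the file through a fresh read-only handle) can be sketched as a standalone function. The helper name and the small offset are illustrative assumptions; the real test seeks to _4G, and this sketch makes no claim about why the reopen avoids the OS X fault.

```python
import mmap
import os
import tempfile

def map_sparse_file(path, offset, tail=b"asdf"):
    """Write `tail` at `offset` (leaving a hole before it), then map the file read-only."""
    with open(path, "wb") as f:
        f.seek(offset)
        f.write(tail)
    # The writing handle is flushed and closed here; map via a new read-only handle.
    with open(path, "rb") as f:
        # mmap duplicates the underlying descriptor, so closing f afterwards is fine.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

path = os.path.join(tempfile.mkdtemp(), "sparse.bin")
m = map_sparse_file(path, 1 << 20)
print(len(m), m[-4:])
m.close()
os.unlink(path)
```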
msg129107 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-02-22 18:07
> .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing.
> The problem is not crc32(), but the buffer itself:
> 
>    if (pbuf.len > 1024*5) {
>         unsigned char *buf = pbuf.buf;
>         Py_ssize_t len = pbuf.len;
>         Py_ssize_t i;
> fprintf(stderr, "CRC 32 2.1\n");
> for(i=0; (size_t)i < (size_t)len;++i)
>     *buf++ = 1;
> fprintf(stderr, "CRC 32 2.2\n");

Unless I'm mistaken, in the test the file is mapped with PROT_READ, so it's normal to get SIGSEGV when writing to it:

    def setUp(self):
        with open(support.TESTFN, "wb+") as f:
            f.seek(_4G)
            f.write(b"asdf")
            f.flush()
            self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
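At the Python level, a mapping created with access=mmap.ACCESS_READ rejects item assignment with a TypeError before any memory is touched; only C code writing through the raw buffer pointer, as in the debugging loop, actually reaches the read-only page and gets a signal. A small sketch (the temporary file and its contents are arbitrary):

```python
import mmap
import os
import tempfile

def try_write_readonly_map():
    """Map a small file with ACCESS_READ, attempt a write, return the exception name."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abcd")  # mmap refuses to map an empty file
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
            try:
                m[0] = 0x31
            except TypeError as exc:  # "mmap can't modify a readonly memory map."
                return type(exc).__name__
            return "no error"
    finally:
        os.close(fd)
        os.unlink(path)

print(try_write_readonly_map())  # prints TypeError
```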

> for(i=0; (size_t)i < (size_t)len;++i)
>     *buf++ = 1;

But it seems you're also getting segfaults when only reading it, right ?

I've got a stupid question: how much memory do you have?
Because there seem to be some issues with the page cache when reading mmapped files on OS X:
http://lists.apple.com/archives/darwin-development/2003/Jun/msg00141.html

On Linux, the page cache won't fill forever, so you don't need enough free memory to accommodate the whole file (the page cache should grow, but not indefinitely). But on OS X, the page replacement algorithm seems to retain mmapped pages in the page cache much longer, which could potentially trigger an OOM later (because of overcommitting, mmap can very well return a valid address range which leads to a segfault when accessed later).
I'm not sure why it would segfault on the first page, though.
msg129120 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 20:23
I have a MacBook with 2 GB RAM.  Of course I'm a little bit messy, so an entry is half written before it comes to an ... end.  msg129091 is real life, though.

Antoine, your msg129093 patch of test_zlib.py does it (with and without fprintf(3)s).  CRC ok etc., it just works.
(Seems mmap(2) has a problem here, I would say; the mentioned bug report is from 2003, so the golden sunset watchers may need some additional time, if you allow me that comment.)
msg129124 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 20:32
(That is to say: I think it's better not to assume that these boys plan to *ever* fix it.  (Though mmap(2) is not CoreAudio/AudioUnit.))
msg129125 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 20:35
(neologix: SIGBUS is not the same as SIGSEGV.  You know.  Thanks for this nice bug report.  Eight years is a .. time in computer programming - unbelievable, thinking of all these nervous wrecks who ever reported a bug to Apple!  Man!!!)
msg129126 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 20:37
neologix: even with 2 GB RAM, top(1) shows more than 600 MB of free memory with the 4 GB test up and running ... in a Mac OS X environment ...  Lucky me, I don't believe a single word of it...
msg129133 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 21:39
> Antoine, your msg129093 patch of test_zlib.py does it (with and
> without fprintf(3)s).  CRC ok etc., it just works.

Indeed, and it also seems to work on the buildbot. I will commit the
patch soon. Thanks for your help!

> (Seems mmap(2) has a problem here, I would say; the mentioned bug
> report is from 2003, so the golden sunset watchers may need some more
> additional time, if you allow me that comment.)
msg129140 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-22 22:37
Committed in r88511 (3.3) and r88514 (3.2).
msg129177 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-23 11:52
I attach a doc_lib_mmap.patch which may be helpful for those poor creatures who plan to write Python scripts for Mac OS X.  (It may be a useful addition anyway.)
msg129184 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-23 12:44
Sorry, I've got that kid running around which sometimes doesn't know what it is doing.  But this documentation patch may really help.  It's my first doc patch, so it surely needs to be revised, if there is interest in such a patch for mmap at all.  Thanks for your understanding.
msg129391 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-02-25 17:05
Could you try with this:

     def setUp(self):
         with open(support.TESTFN, "wb+") as f:
             f.seek(_4G)
             f.write(b"asdf")
             f.flush()
+            os.fsync(f.fileno())
             self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

HFS+ doesn't seem to support sparse files, so the file is actually
zero-filled asynchronously.
Maybe the mapping gets done before the blocks have been allocated,
which triggers a segfault when the first page is accessed.
I'm not sure it'll make any difference, but I'm curious...

Also, I'd be curious to see the result of

"""
import os

name = '/tmp/foo'
f = open(name, 'wb')
f.seek(1 << 32)
f.write(b'asdf')
f.flush()
print(os.fstat(f.fileno()))
f.close()
print(os.stat(name))
"""

Thanks !
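Whether a filesystem really stored the hole can be checked by comparing st_size with the space allocated on disk. This sketch assumes a POSIX platform, where os.stat_result.st_blocks counts 512-byte blocks (it is absent on Windows); the function name and the 1 MiB offset are illustrative, not taken from the report.

```python
import os
import tempfile

def allocation_report(path):
    """Return (logical size, bytes allocated on disk) for `path`."""
    st = os.stat(path)
    allocated = st.st_blocks * 512  # st_blocks is in 512-byte units (POSIX only)
    return st.st_size, allocated

path = os.path.join(tempfile.mkdtemp(), "hole.bin")
with open(path, "wb") as f:
    f.seek(1 << 20)
    f.write(b"asdf")
size, allocated = allocation_report(path)
# With sparse-file support, `allocated` is far below `size`; without it
# (the suspicion for HFS+ here), the hole is backed by real blocks.
print(size, allocated, "sparse" if allocated < size else "not sparse")
```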
msg129520 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-26 10:34
11:12 ~/tmp $ python3 ~/usr/opt/py3k/lib/python3.3/test_zlib.py
Bus error

Your code snippet:

11:21 ~/tmp $ /usr/bin/time -lp python3 test.py

posix.stat_result(st_mode=33184, st_ino=10066605, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298715813, st_mtime=1298715813, st_ctime=1298715813)
posix.stat_result(st_mode=33184, st_ino=10066605, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298715813, st_mtime=1298715813, st_ctime=1298715813)
real        71.66
user         0.06
sys          3.71
          0  maximum resident set size
          0  average shared memory size
          0  average unshared data size
          0  average unshared stack size
          0  page reclaims
          0  page faults
          0  swaps
          0  block input operations
         57  block output operations
          0  messages sent
          0  messages received
          0  signals received
       2112  voluntary context switches
          0  involuntary context switches

msg129531 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-26 11:30
I'll give you the same result again but with additional clock(), 
just for a heart's pleasure:

clock(): 0.100958 , fstat(): posix.stat_result(st_mode=33184, st_ino=10075508, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298719201, st_mtime=1298719305, st_ctime=1298719305)
> f.close()
> print('clock():', time.clock(), ', stat():', os.stat(name))
clock(): 3.75792 , stat(): posix.stat_result(st_mode=33184, st_ino=10075508, st_dev=234881025, st_nlink=1, st_uid=502, st_gid=20, st_size=4294967300, st_atime=1298719201, st_mtime=1298719305, st_ctime=1298719305)

Please don't assume I go for Mac OS X ... 
In the end you *always* need to implement an expensive state 
machine to get around long-known bugs, mis-implementations or 
other poops there.
msg132938 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-04 11:30
This issue is not dead: test_zlib failed twice on "AMD64 Snow Leopard 3.x" buildbot: build 30 (024967cdc2f0e850f0b338e7593a12d965017a6a, Mar 31 01:40:00 2011) and 44 (ebc03d7e711052c0b196aacdbec6778c0a6d5c0c, Apr 4 10:11:20 2011).

Build 44 has a traceback thanks to faulthandler:
--------------------
...
[ 79/354] test_time
[ 80/354] test_zlib
Fatal Python error: Bus error

Traceback (most recent call first):
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_zlib.py", line 85 in test_big_buffer
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 387 in _executeTestPart
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 442 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/case.py", line 494 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1078 in run
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1166 in _run_suite
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/support.py", line 1192 in run_unittest
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86/build/Lib/test/test_zlib.py", line 611 in test_main
  File "./Lib/test/regrtest.py", line 1032 in runtest_inner
  File "./Lib/test/regrtest.py", line 826 in runtest
  File "./Lib/test/regrtest.py", line 650 in main
  File "./Lib/test/regrtest.py", line 1607 in <module>
make: *** [buildbottest] Bus error
program finished with exit code 2
elapsedTime=1400.363321
--------------------
http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%203.x/builds/44/steps/test/logs/stdio

test_zlib.py:85 is the crc32(+4 GB) test:
----------------------
# Issue #10276 - check that inputs >=4GB are handled correctly.
class ChecksumBigBufferTestCase(unittest.TestCase):

    def setUp(self):
        with open(support.TESTFN, "wb+") as f:
            f.seek(_4G)
            f.write(b"asdf")
        with open(support.TESTFN, "rb") as f:
            self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def tearDown(self):
        self.mapping.close()
        support.unlink(support.TESTFN)

    @unittest.skipUnless(mmap, "mmap() is not available.")
    @unittest.skipUnless(sys.maxsize > _4G, "Can't run on a 32-bit system.")
    @unittest.skipUnless(support.is_resource_enabled("largefile"),
                         "May use lots of disk space.")
    def test_big_buffer(self):
        self.assertEqual(zlib.crc32(self.mapping), 3058686908) <~~~ HERE
        self.assertEqual(zlib.adler32(self.mapping), 82837919)
----------------------
msg132940 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-04 11:31
Issue #11760 has been marked as a duplicate of this issue.
msg132941 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-04 11:57
Is the SIGBUS generated on the first page access ?
How much memory does this buildbot have ?
msg132983 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-04 22:34
The new FreeBSD buildbot had a sporadic SIGKILL in http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%208.2%203.x/builds/1/steps/test/logs/stdio

(apparently, faulthandler didn't dump a traceback)

By the way, we can be fairly certain now that the problem is on the OS side rather than on our (Python) side, so I'm lowering the priority.
msg132984 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-04 22:45
By the way, at this point I think we could simply skip the test on BSDs and OS X. The tested functionality is cross-platform, so testing under a limited set of systems should be ok.
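Such a skip could be expressed with unittest.skipIf keyed on sys.platform. This is a purely illustrative sketch: the platform prefixes, the helper and the class name are assumptions, not necessarily what was committed for this issue.

```python
import sys
import unittest

# Hypothetical prefixes of sys.platform values where the sparse-file mmap test is skipped.
UNRELIABLE_PLATFORMS = ("darwin", "freebsd")

def mmap_test_unreliable(platform=sys.platform):
    """True if `platform` starts with one of the listed prefixes (e.g. 'freebsd8')."""
    return platform.startswith(UNRELIABLE_PLATFORMS)

class BigBufferTest(unittest.TestCase):
    @unittest.skipIf(mmap_test_unreliable(), "sparse-file mmap unreliable on this platform")
    def test_big_buffer(self):
        pass  # the crc32()/adler32() checks over the 4 GB mapping would go here
```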
msg132985 - Author: Stefan Krah (skrah) * (Python committer) Date: 2011-04-04 22:47
For the new FreeBSD bot, the issue was simply insufficient swap space.
With 1GB of memory and 2GB of swap test_zlib runs fine.
msg133154 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-06 18:48
I can't confirm this for my MacBook:


20:39 ~ $ time python3 -E -Wd -m test -r -w -uall test_zlib
Using random seed 1960084
[1/1] test_zlib
1 test OK.
[91618 refs]

real	4m1.051s
user	0m15.031s
sys	0m26.908s

...

20:40 ~ $ ll tmp/test_python_6778/
4194308 -rw-r-----  1 steffen  staff  4294967300  6 Apr 20:40 @test_6778_tmp

...

Processes: 63 total, 2 running, 3 stuck, 58 sleeping, 246 threads                20:40:30
Load Avg: 0.59, 0.65, 0.56  CPU usage: 6.79% user, 13.10% sys, 80.9% idle
SharedLibs: 8260K resident, 9972K data, 0B linkedit.
MemRegions: 6043 total, 218M resident, 13M private, 185M shared.
PhysMem: 446M wired, 328M active, 138M inactive, 912M used, 1135M free.
VM: 143G vsize, 1042M framework vsize, 29610(0) pageins, 0(0) pageouts.
Networks: packets: 807/440K in, 933/129K out. Disks: 13881/581M read, 26057/16G written.

PID   COMMAND      %CPU TIME     #TH  #WQ  #PORT #MRE RPRVT  RSHRD  RSIZE  VPRVT  VSIZE
6778  python3      4.5  00:00.94 2    0    37    139  13M    320K   15M    38M    2403M

...

Processes: 63 total, 3 running, 60 sleeping, 253 threads                         20:41:30
Load Avg: 0.54, 0.62, 0.55  CPU usage: 12.98% user, 14.90% sys, 72.11% idle
SharedLibs: 8260K resident, 9972K data, 0B linkedit.
MemRegions: 6062 total, 269M resident, 13M private, 274M shared.
PhysMem: 443M wired, 329M active, 184M inactive, 955M used, 1091M free.
VM: 147G vsize, 1042M framework vsize, 41530(11520) pageins, 0(0) pageouts.
Networks: packets: 807/440K in, 933/129K out. Disks: 13950/627M read, 29598/19G written.

PID   COMMAND      %CPU TIME     #TH  #WQ  #PORT #MRE RPRVT  RSHRD  RSIZE  VPRVT  VSIZE
6778  python3      11.6 00:03.74 2    0    37    140  60M+   320K   62M+   4134M  6499M

...

20:43 ~ $ ll tmp/test_python_6778/
4194308 -rw-r-----  1 steffen  staff  4294967300  6 Apr 20:40 @test_6778_tmp


As I've stated in #11779, maybe these random errors of the bot
are caused by some strange hardware-based fault?
msg133677 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-04-13 16:30
> By the way, at this point I think we could simply skip the test on BSDs
> and OS X. The tested functionality is cross-platform, so testing under
> a limited set of systems should be ok.

Another solution would be to rewrite the test to not use mmap() at all:

    @precisionbigmemtest(size=_4G + 4, memuse=1)
    def test_big_buffer(self, size):
        if size < _4G + 4:
            self.skipTest("not enough free memory, need at least 4 GB")
        data = bytearray(_4G + 4)
        data[-4:] = b"asdf"
        self.assertEqual(zlib.crc32(data), 3058686908)
        self.assertEqual(zlib.adler32(data), 82837919)

This is more consistent with the other bigmem tests in test_zlib, but
I'm guessing it will mean that the test gets run much less often (since a
lot of machines won't have enough memory). If that's OK, then I'd prefer
doing it this way (since it keeps things simpler). Otherwise, skipping
the test on OS X sounds fine to me.
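Both variants hand zlib the same bytes (zeros followed by b"asdf"); they differ only in where the memory comes from. A scaled-down sketch comparing them, assuming a 1 MiB prefix instead of _4G so it runs anywhere:

```python
import mmap
import os
import tempfile
import zlib

SIZE = 1 << 20  # stand-in for _4G

# Variant 1 (bytearray): needs SIZE bytes of RAM up front.
data = bytearray(SIZE + 4)
data[-4:] = b"asdf"

# Variant 2 (mmap): a sparse file mapped read-only.
path = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(path, "wb") as f:
    f.seek(SIZE)
    f.write(b"asdf")
with open(path, "rb") as f:
    mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

same_crc = zlib.crc32(data) == zlib.crc32(mapping)
same_adler = zlib.adler32(data) == zlib.adler32(mapping)
mapping.close()
os.unlink(path)
print(same_crc, same_adler)
```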
msg133687 - Author: Stefan Krah (skrah) * (Python committer) Date: 2011-04-13 19:27
Just to give another data point: A couple of days ago I reduced the
memory on the AMD64 FreeBSD bot to (375MB RAM, 2GB swap) and the zlib
tests still pass.
msg133689 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-13 20:35
> Another solution would be to rewrite the test to not use mmap() at all:
> 
>     @precisionbigmemtest(size=_4G + 4, memuse=1)
>     def test_big_buffer(self, size):
>         if size < _4G + 4:
>             self.skipTest("not enough free memory, need at least 4 GB")
>         data = bytearray(_4G + 4)
>         data[-4:] = b"asdf"
>         self.assertEqual(zlib.crc32(data), 3058686908)
>         self.assertEqual(zlib.adler32(data), 82837919)
> 
> This is more consistent with the other bigmem tests in test_zlib, but
> I'm guessing it will mean that the test gets run much less often (since a
> lot of machines won't have enough memory). If that's OK, then I'd prefer
> doing it this way (since it keeps things simpler).

I think there's basically no one and nothing (even among the buildbots)
that runs bigmem tests on a regular basis, so I'd much rather keep the
mmap() solution, even if that means it must be skipped on OS X.
msg133697 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-04-13 21:43
> I think there's basically no one and nothing (even among the buildbots)
> that runs bigmem tests on a regular basis, so I'd much rather keep the
> mmap() solution, even if that means it must be skipped on OS X.

Fair enough.

(As an aside, if it is preferable to use an mmap() hack for this sort of
test, it would be good to add some machinery to test.support to make it
easier to use. But that's something for a separate issue.)
msg133741 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-14 14:16
Pfff.
Now i really spent some time on Mac OS X memory management.
I did it for Wanda.  :)

(First: I don't know why you want to drop that nice mmap(2),
it's wonderful for testing hard disk performance!
And if you want to support Apple, then you have to spend
some blood - right???
Implement a state machine, disallow mmap(2) if the file has
been written to.  Maybe I should spend some more hours and
try to find the rules behind this??)

The situation is as follows.
There are some nice sysctl(8) variables here, too:

    hw.memsize: 2147483648
    hw.usermem = 1651843072
    vm.global_no_user_wire_amount: 67108864
    vm.global_user_wire_limit: 1811939328
    vm.user_wire_limit: 1811939328

That doesn't mean much though.
I've searched Apple's developer pages with their unbelievably
stupid search engine and found
    developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KernelProgramming/About/About.html

I've downloaded that as a PDF (upper right corner).
That doesn't mean much though.

So I finally did some tests using Nadeem's code snippet
from msg133677.  The largest top(1) line I ever got was
    30477  python3      2.7       00:09.77 1    0    18    77    912M+  240K
but the system is unusable then.

At first I killed every test that ran longer than about three minutes;
later I did so whenever 900M was not reached in the top(1) output -
it seems to me that Apple's VM is not intelligent enough to detect
that it has effectively entered an endless loop!!!

As a result, I think I can safely give the following advice
for my MacBook (Mac OS X 10.6.7, although uname reports 10.7):

    x = bytearray(1651843072 // 2)    # hw.usermem // 2
returns within a fraction of a second, almost regardless of system
load, and gives this top(1) line:
    31369  python3      0.0       00:01.29 1    0    18    71    794M   240K

    x = bytearray(1811939328 // 2)    # vm.user_wire_limit // 2
takes noticeably longer to return and gives the
following top(1) line:
    32899  python3      0.0       00:01.39 1    0    18    71    870M   240K

Note that the system seemed to handle the first case fairly
easily, whereas the latter made window switching etc.
unresponsive, so it seems as if...

I don't know whether Python exposes the available memory size values
somewhere, but note that Apple has moved sysctl(8) from /sbin
to /usr/sbin, which is a lie, if you ask my opinion.
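As far as I know, Python does not expose hw.usermem or the wire limits directly. A minimal sketch, assuming the macOS sysctl(8) binary lives at /usr/sbin/sysctl and supports `-n` (print the value only): query it through subprocess, and fall back to `os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")` on platforms where those sysconf names exist. Both helper names are hypothetical.

```python
import os
import subprocess


def sysctl_int(name, sysctl="/usr/sbin/sysctl"):
    """Return an integer sysctl(8) value such as 'hw.usermem', or None.

    Assumes a BSD/macOS-style sysctl binary that understands -n.
    Any failure (missing binary, unknown name, non-integer output) -> None.
    """
    try:
        out = subprocess.run([sysctl, "-n", name], capture_output=True,
                             text=True, check=True).stdout
        return int(out.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def physical_memory():
    """Total physical memory in bytes, or None if it cannot be determined."""
    value = sysctl_int("hw.memsize")           # macOS
    if value is not None:
        return value
    try:                                       # POSIX systems with sysconf names
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
```

Following the advice above, a cautious allocation ceiling on OS X could then be `sysctl_int("hw.usermem") // 2`, when that call returns a value.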

Apart from all this, I don't understand this thread.
Isn't that bot the one on which haypo noticed those
random failures?
msg133764 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-14 19:29
And about mmap(2):

import mmap
import os
import sys
import time

MP0 = (2**30) - 1
MP1 = (2**31) - 1
MP2 = (2**32) - 1
MPV = (2**20) * 100
SIZES = (MP0 - MPV, MP0, MP0 + MPV,
         MP1 - MPV, MP1, MP1 + MPV,
         MP2 - MPV, MP2, MP2 + MPV)

FILE = 'test.dat'

print('Start:', time.gmtime())
for i in SIZES:
    print('Testing file size ', i, ': ', sep='', end='')
    sys.stdout.flush()
    with open(FILE, 'wb+') as f:
        # Create a sparse file: seek past the end, then write 4 bytes.
        f.seek(i)
        f.write(b'asdf')
        f.flush()
        sb = os.stat(FILE)
        if sb.st_size != i + 4:
            print('size failure: ', sb.st_size, ' != ', i + 4, ' ', sep='', end='')
            sys.stdout.flush()
        mem = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # In Python 3, indexing an mmap returns an int, not bytes.
            if mem[0] != 0:
                print('offset 0 failed: ', mem[0], ' ', sep='', end='')
            else:
                print('offset 0 ok ', sep='', end='')
            sys.stdout.flush()
            if mem[i] != ord('a'):
                print('offset i failed: ', mem[i], ' ', sep='', end='')
            else:
                print('offset i ok ', sep='', end='')
        finally:
            mem.close()    # unmap before the file is closed and unlinked
        print()
        sys.stdout.flush()
    os.unlink(FILE)
print('End:', time.gmtime())

...

Start: time.struct_time(tm_year=2011, tm_mon=4, tm_mday=14, tm_hour=17, tm_min=27, tm_sec=30, tm_wday=3, tm_yday=104, tm_isdst=0)
Testing file size 968884223: offset 0 ok offset i ok 
Testing file size 1073741823: offset 0 ok offset i ok 
Testing file size 1178599423: offset 0 ok offset i ok 
Testing file size 2042626047: offset 0 ok offset i ok 
Testing file size 2147483647: offset 0 ok offset i ok 
Testing file size 2252341247: offset 0 ok offset i ok 
Testing file size 4190109695: offset 0 ok offset i ok 
Testing file size 4294967295: offset 0 ok offset i ok 
Testing file size 4399824895: offset 0 ok offset i ok 
End: time.struct_time(tm_year=2011, tm_mon=4, tm_mday=14, tm_hour=17, tm_min=27, tm_sec=30, tm_wday=3, tm_yday=104, tm_isdst=0)

Now I don't think that can get any faster.
Changing to
    MP0 = ((2**30)-0)
    MP1 = ((2**31)-0)
    MP2 = ((2**32)-0)
results in

Start: time.struct_time(tm_year=2011, tm_mon=4, tm_mday=14, tm_hour=17, tm_min=27, tm_sec=55, tm_wday=3, tm_yday=104, tm_isdst=0)
Testing file size 968884224: offset 0 ok offset i ok 
Testing file size 1073741824: offset 0 ok offset i ok 
Testing file size 1178599424: offset 0 ok offset i ok 
Testing file size 2042626048: offset 0 ok offset i ok 
Testing file size 2147483648: offset 0 ok offset i ok 
Testing file size 2252341248: offset 0 ok offset i ok 
Testing file size 4190109696: offset 0 ok offset i ok 
Testing file size 4294967296: <- EOF here

Manually adjusted SIZES:

Testing file size 4294967295: offset 0 ok offset i ok 
Testing file size 4296015872: offset 0 ok offset i ok (MP2+1024*1024)
Testing file size 4295491584: offset 0 ok offset i ok (MP2+1024*512)
Testing file size 4295229440: offset 0 ok offset i ok (MP2+1024*256)
...
Testing file size 4294971392: offset 0 ok offset i ok (MP2+1024*4)
Testing file size 4294969344: <- EOF here (MP2+1024*2)

The page size is 4096.
I think the state machine can be simpler than I thought.
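The failing sizes above sit just past 2**32, within one 4096-byte page. A hypothetical helper expressing that observed window as a range check (the bounds come only from these experiments on OS X 10.6 with a freshly written sparse file; they are not a documented limit, and the edges of the window were not perfectly reproducible):

```python
import mmap

UINT32_MAX = 2**32 - 1


def in_suspect_window(size, pagesize=mmap.PAGESIZE):
    """True if `size` lies in UINT32_MAX + 1 .. UINT32_MAX + pagesize.

    Empirical window from this issue, not an OS guarantee.
    """
    return UINT32_MAX + 1 <= size <= UINT32_MAX + pagesize
```

A state machine along these lines would only need to treat mappings whose size passes this check (on a file that has just been written to) specially.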
msg133837 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-15 15:20
I was able to spend more time on that this afternoon.
I got an unkillable diff(1) along the way, which forced me to do
a cold reboot. Well.

I attach a C version (11277.mmap.c) which I've used for testing.
The file 11277.zsum32.c is a quick-and-dirty C program to
calculate CRC-32 and Adler-32 checksums (I had none for the
latter, and maybe you want to test some more); it requires zlib.
I also attach 11277.1.diff, which updates test/test_zlib.py, though
this is rather useless because it still results in a bus error.

This is the really interesting thing, however, because the C version
actually works quite well for the chosen value, and the resulting
files are identical, as zsum32 shows:

    Adler-32 <14b9018b> CRC-32 <c6e340bf> -- test_python_413/@test_413_tmp
    Adler-32 <14b9018b> CRC-32 <c6e340bf> -- c-mmap-testfile

I thought
             os.fsync(f.fileno())
would do the trick, because that is what the C version does (hi, Charles-Francois),
but no.
So what do I come up with?
Nothing. A quick look at 11277.mmap.c will show you this:

    /* *Final* sizes (string written after lseek(2): "abcd") */
...
        /* Tested good */
        //0x100000000 - PAGESIZE - 5,
        //0x100000000 - 4,
        //0x100000000 - 3,
        //0x100000000 - 1,
        0x100000000 + PAGESIZE + 4,
        //0x100000000 + PAGESIZE + 5,
        /* Tested bad */
        //0x100000000,
        //0x100000000 + PAGESIZE,
        //0x100000000 + PAGESIZE + 1,
        //0x100000000 + PAGESIZE + 3,

Hm!
Now I have to go, but maybe I can do some more testing tomorrow to
answer the question of why test_zlib.py fails even though there is
an fsync() and even though the values work in C.
Any comments?
msg133860 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-15 18:12
My last idea for today was to split the writes.
This also works for the C version, but not for test_zlib.py.
I attach the updated files. And for completeness:

    Adler-32 <7a54018b> CRC-32 <7f1be672> -- @test_13713_tmp
    Adler-32 <7a54018b> CRC-32 <7f1be672> -- c-mmap-testfile
msg133892 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-04-16 14:55
> So i finally did some tests using Nadeem's code snippet
> from msg133677.  The largest top(1) i ever got was
>    30477  python3      2.7       00:09.77 1    0    18    77    912M+  240K
> but the system is unusable then.

The code I posted was only intended to run on machines with at least
4GB of free memory. The precisionbigmemtest decorator should cause it to
be skipped on machines with less RAM.

> Except for all this i don't understand this thread.
> Isn't that bot the one for which haypo has noticed those
> random failures???

The current status of this issue (as I understand it) is:
* test_zlib's test_big_buffer() is failing sporadically on the AMD64
   Snow Leopard buildbot (http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%203.x/)
* The cause seems to be a bug in OS X's handling of mmap()'d files, not a
   problem with the Python code.
* Antoine has proposed skipping this test on OS X as a workaround, and
   no-one has objected to this.

I don't think it is necessary to further investigate the behaviour of
Snow Leopard's mmap() - we know that it's broken, and we have a fix.

At the moment, we need someone to actually write and commit the fix.
I would do it myself, but I'm hesitant to commit code without testing it,
and don't have access to a Mac system.
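A minimal sketch of the proposed workaround, assuming the test keeps its current name in Lib/test/test_zlib.py. The method body here is a placeholder and the reason string is illustrative:

```python
import sys
import unittest


class ChecksumBigBufferTestCase(unittest.TestCase):

    @unittest.skipIf(sys.platform == "darwin",
                     "mmap() of large sparse files is unreliable on OS X "
                     "(issue #11277)")
    def test_big_buffer(self):
        pass  # placeholder: the real crc32()/adler32() checks go here
```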
msg133894 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-16 16:13
Yet another Mac OS X bug: it sometimes creates messed-up sparse
regions:

14:00 ~/tmp/test $ ~/src/cpython/python.exe test_mmap.py 
..
14:01 ~/tmp/test $ zsum32 py-mmap-testfile 
Adler-32 <db8d743c> CRC-32 <78ebae7a> -- py-mmap-testfile
14:03 ~/tmp/test $ ./test_mmap
Size 4294971396/0x100001004: open. lseek. write. fsync. fstat. mmap. [0]. [s.st_size-4]. munmap.
14:04 ~/tmp/test $ zsum32 c-mmap-testfile 
Adler-32 <14b9018b> CRC-32 <c6e340bf> -- c-mmap-testfile
14:08 ~/tmp/test $ hexdump -C -s 4000 -n 128 c-mmap-testfile 
00000fa0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001020
14:08 ~/tmp/test $ hexdump -C -s 4000 -n 128 py-mmap-testfile 
00000fa0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  db db db db db db db db  db db db db db db db db  |................|
*
00001020
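To locate where two supposedly identical test files first diverge without hexdumping them by hand, a cmp(1)-like check in Python is enough. `first_difference` is a hypothetical helper; in the dumps above, the first visible difference is at 0x1000, where the 0xdb bytes begin.

```python
def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if identical.

    Files are compared chunk by chunk so that multi-gigabyte test files do
    not have to fit in memory. A length mismatch counts as a difference at
    the end of the shorter file.
    """
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # One block is a prefix of the other: lengths differ.
                return offset + min(len(block_a), len(block_b))
            if not block_a:          # both files exhausted, no difference
                return None
            offset += len(block_a)
```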

Conclusions:

1. It is unwise to create memory regions larger than
       hw.usermem (1651843072) // 2
   and extremely unwise to do so for regions larger than
       vm.user_wire_limit (1811939328) // 2

   If this limit is exceeded, Mac OS X effectively enters an endless
   loop, which may cause so much paging activity that the system
   almost locks up.

   (P.S.: if you invoke diff(1) on two extremely large files, you
   may produce an unkillable process which survives SIGKILL and
   "Force Quit"s initiated from "Activity Monitor", not to mention
   termination of the parent shell.)

2. Mac OS X does not reliably produce sparse files.
   If the attached files 11277.mmap-2.c and 11277.mmap-2.py are
   modified not to unlink(2) the produced files (not hard for the
   Python version), then:

       cmp --verbose py-mmap-testfile c-mmap-testfile | wc
       95832  287496 1820808

3. At least for sparse files, the VM system of Mac OS X is unable to
   create an accessible mmap(2) mapping if the size of the mapping is
   in the inclusive range
       UINT32_MAX + 1 .. UINT32_MAX + PAGESIZE (== 4096)
   and the file has been written to.

   Closing the file and reopening it makes the mapping not
   only succeed but also accessible (in Python, that is).

4. If you choose a size which does not fail immediately, and you
   don't reopen the file but only instrument mmapmodule.c, you get:
       subscript self=0x100771350
           CALCULATED SUBSCRIPT 4095
       subscript self=0x100771350
           CALCULATED SUBSCRIPT 4096
       Bus Error
   Thus, accessing the first byte of the second page causes
   Python to fail with SIGBUS, *even* if you explicitly fsync()
   the fd in new_mmap_object() (the fstat(2) code runs anyway).
   The C version does *not* have this problem; there, fsync() alone
   does the trick.

5. Python's C code: mumble mumble mumble.
   That really needs to be said at least.

6. The error is in mmapmodule.c, function new_mmap_object().
   It is the call to mmap(2).
   Whether I dup(2) or not, whatever I do.
   Even if I reduce new_mmap_object() to the working code from
   11277.mmap-2.c:

/* Debugging replacement for the body of new_mmap_object(): map the
 * whole file read-only and touch every byte. */
if (fd != -1 && fstat(fd, &st) == 0 && S_ISREG(st.st_mode) &&
    map_size == 0)
    map_size = st.st_size;
fprintf(stderr, "before mmap(2): size=%zu,fd=%d\n", (size_t)map_size, fd);
{
    void *addr = mmap(NULL, (size_t)map_size, PROT_READ, MAP_SHARED, fd, 0);
    size_t j;
    fprintf(stderr, "after mmap(2): size=%zu,fd=%d got address=%p\n",
            (size_t)map_size, fd, addr);
    if (addr == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }
    for (j = 0; j < (size_t)map_size; ++j) {
        char x;
        if (j % 1024 == 0)
            fprintf(stderr, "INDEX %zu\n", j);
        /* volatile read, so the compiler cannot optimize the access away */
        x = ((volatile char *)addr)[j];
        (void)x;
    }
    fprintf(stderr, "PASSED ALL INDICES\n");
    exit(1);
}

...

17:41 ~/tmp/test $ ~/src/cpython/python.exe 11277.mmap-2.py
DESCRIPTOR FLAGS WILL BE 0
DESCRIPTOR FLAGS WILL BE 0
Start: time.struct_time(tm_year=2011, tm_mon=4, tm_mday=16, tm_hour=15, tm_min=41, tm_sec=22, tm_wday=5, tm_yday=106, tm_isdst=0)
Testing file size 4294971400: DESCRIPTOR FLAGS WILL BE 1538
new_mmap_object
_GetMapSize o=0x1001f5d10
before mmap(2): size=4294971396,fd=3
after mmap(2): size=4294971396,fd=3 got address=0x101140000
INDEX 0
INDEX 1024
INDEX 2048
INDEX 3072
INDEX 4096
Bus error

7. Note that the C version also works if I prepend many malloc(3)
   calls.

8. I have no idea what Python does here.
   Maybe it's related to ld(1) and dynamic module loading.
   Maybe Apple's VM gets confused about memory regions if several
   conditions come together. I have no idea what Python does
   along its way to initializing itself. It's a lot.

   And I'm someone who has not even looked into Doc/c-api/ at all
   yet, except for a
        grep -Fr tp_as_buf Doc/
   today (the first version of the iterate-cpl-buffer used the buffer
   interface). So please explain any comments you might give.
   Maybe I'll write a patch to add tests to test_mmap.py.
   Besides that, I'm out of this.

9. Maybe it's really better to simply skip this on Mac OS X.

Z. ... and maybe someone with a name should ask someone with
   a name-name-name to ask those Californian ocean surfers to fix
   at least some of the OS bugs? My bug reports are not even
   acted upon by Opera, even if I attach reproducible scripts or
   URLs...
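Observation 3 suggests a workaround: write the sparse file, close it, and map a freshly opened read-only descriptor instead of the one that was just written to. A sketch under that assumption (the function name is hypothetical, and whether reopening is sufficient on every OS X release is not established here; the fix eventually committed in this issue uses F_FULLFSYNC instead):

```python
import mmap
import os


def map_sparse_file_reopened(path, size, tail=b"abcd"):
    """Create a sparse file of size + len(tail) bytes and map it read-only.

    The file is written and closed first; the mapping is then created from
    a new read-only descriptor rather than from the writing one.
    """
    with open(path, "wb") as f:
        f.seek(size)
        f.write(tail)
    # Reopen: map a fresh descriptor, not the one used for writing.
    f = open(path, "rb")
    try:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    finally:
        # The mmap object keeps its own reference to the file, so the
        # mapping remains valid after the file object is closed.
        f.close()
```

The caller is responsible for closing the returned mapping and removing the file.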
msg133896 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-16 16:19
On Sat, Apr 16, 2011 at 02:55:29PM +0000, Nadeem Vawda wrote:
> I don't think it is necessary to further investigate the behaviour of
> Snow Leopard's mmap() - we know that it's broken, and we have a fix.
> 
> At the moment, we need someone to actually write and commit the fix.
> I would do it myself, but I'm hesitant to commit code without testing it,
> and don't have access to a Mac system.

I see.
msg134032 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-19 10:08
It took some time, but here is a patch that makes mmap(2) work on
Mac OS X.
This also applies to #11779.

Background:
on OS X, fsync(2) seems to behave like fdatasync(2).
To still give people a way to do some kind of real fsync(2),
a new fcntl(2) command has been introduced: F_FULLFSYNC.
If you use that, the 'sparse' file is synchronized with the physical
backing store immediately and all is fine.
Vampire magic!
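In Python, the same flush can be requested through the fcntl module, which defines F_FULLFSYNC on OS X. A minimal sketch; the helper name and the fallback to os.fsync() (when the constant is missing or the file system rejects the call) are my assumptions, not what mmapmodule.c does:

```python
import fcntl  # POSIX-only module
import os


def full_fsync(fd):
    """Flush fd to stable storage as thoroughly as the platform allows.

    Uses fcntl(fd, F_FULLFSYNC) where the constant exists (OS X),
    otherwise, or if that call fails, falls back to os.fsync(fd).
    """
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        try:
            fcntl.fcntl(fd, full)
            return
        except OSError:
            # Assumption: a plain fsync() is better than nothing here.
            pass
    os.fsync(fd)
```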
msg134033 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-19 10:10
@sdaoden: This issue has a lot of patches; can you remove the old ones?
msg134035 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-19 10:13
(The working patch is http://bugs.python.org/file21715/11277.3.diff.)
msg134036 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-19 10:15
(Dropped the tests, too.)
msg134038 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-19 10:34
11277.3.diff: this patch looks correct (I'm unable to test it), but can you add a sentence to the mmap docs explaining that mmap.mmap() flushes the file on Mac OS X and VMS?
msg134039 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-19 10:43
Oh, fcntl already has an F_FULLFSYNC constant, so we can use fcntl.fcntl(fd, fcntl.F_FULLFSYNC) in Python.

> can you add a sentence in mmap doc to explain that mmap.mmap()
> does flush the file on Mac OS X and VMS?

Hmm, it does flush the file on VMS using fsync(), but on Mac OS X it just calls fcntl with F_FULLFSYNC. It doesn't call fsync() explicitly. Does mmap() "call fsync()" implicitly?
msg134040 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-19 10:47
Oh, and can you add a comment to your patch explaining why F_FULLFSYNC is needed on Mac OS X? (If I understood correctly, it is needed to avoid a crash with sparse files.)
msg134041 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-19 11:14
The updated 11277.4.diff also includes an mmap.rst update.
(Maybe os.fsync() and os.sync() should be modified to really do
that fcntl, too? I think I'll open an issue with a patch soon.)
msg134044 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-19 12:07
On Tue, Apr 19, 2011 at 10:34:11AM +0000, STINNER Victor wrote:
> 11277.3.diff: this patch looks correct

Unbelievable: you really fought your way through this immense
bunch of code in such a short time!
:)
msg134045 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-19 12:29
(My last reply-mail changed the title.  Fixing.)
msg134047 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-19 12:56
> (My last reply-mail changed the title.  Fixing.)

Yeah, it's a common problem if you use the email interface :-/
msg134566 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-27 14:17
What do you think? I think this issue can really be closed now.
I'll attach a final 11277.5.diff, which has a less confused and
thus more understandable comment than .4.diff.
I'll also drop .3 and .4.
A lot of noise again 8|
msg134943 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-01 23:15
New changeset cb464f8fb3a1 by Victor Stinner in branch '3.1':
Issue #11277: mmap calls fcntl(fd, F_FULLFSYNC) on Mac OS X to get around a
http://hg.python.org/cpython/rev/cb464f8fb3a1

New changeset e9d298376dde by Victor Stinner in branch '3.2':
(Merge 3.1) Issue #11277: mmap.mmap() calls fcntl(fd, F_FULLFSYNC) on Mac OS X
http://hg.python.org/cpython/rev/e9d298376dde

New changeset d578fdc9b157 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #11277: mmap.mmap() calls fcntl(fd, F_FULLFSYNC) on Mac OS X
http://hg.python.org/cpython/rev/d578fdc9b157
msg134945 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-01 23:22
I am not able to check the fix, but the buildbots are :-)

What should be done for Python 2.7? In Python 2.7, zlib.crc32() stores the buffer length in an int (so the maximum length is INT_MAX), and so test_zlib doesn't test a (sparse) 4 GB file (ChecksumBigBufferTestCase).

But I suppose that the mmap bug can also occur with a 2 GB file.

@sdaoden: Can you try on Python 2.7?
msg134974 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-02 13:57
On Mon,  2 May 2011 01:22:41 +0200, STINNER Victor <report@bugs.python.org> wrote:
> @sdaoden: Can you try on Python 2.7?

@haypo: Python 2.7 is absolute horror.
But I tried, and produced a (terrible: I don't know the test
framework, and the test_support stuff seems to have changed
a lot since 2.7) 2+ gigabyte big-buffer test for 2.7.
(Of course: even though Python uses int, zlib uses uInt.)
It took some time because I stumbled over #1202 from 2007 unprepared.

The (nasty) test works quite well on Apple, which is not such
a big surprise, because Apple's OS X is especially designed for
artists who need to work on large files, like video editors,
sound designers with sample databases, etc., so I would be
terribly disappointed if that didn't work! Apple even
advertises OS X for, and makes money with, that very
task; I really couldn't understand your doubts here.
msg134977 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-02 14:53
> @haypo: Python 2.7 is absolute horror.

Oh, zlib doesn't use PY_SSIZE_T_CLEAN in Python 2.7.

11277-27.1.diff contains "# Issue #10276 - check that inputs >=4GB are handled correctly.". I don't understand this comment because the test uses a buffer of 2 GB + 2 bytes.

How is it possible to pass a buffer of 2 GB + 2 bytes to crc32() when it stores the size in an int? The maximum size is INT_MAX, which is 2 GB - 1 byte. It looks like the "i" format of PyArg_ParseTuple() doesn't check for integer overflow => issue #8651. This issue was fixed in 3.1, 3.2 and 3.3, but not in Python 2.
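The wrap-around is easy to reproduce: ctypes integer types do not check for overflow, so `ctypes.c_int32(n).value` shows what an unchecked conversion to a 32-bit C int would produce. This only models the truncation; it is not the code path PyArg_ParseTuple() actually takes.

```python
import ctypes

INT_MAX = 2**31 - 1


def as_c_int(n):
    """Value an unchecked conversion of n to a 32-bit C int would yield."""
    return ctypes.c_int32(n).value   # ctypes masks to 32 bits silently


# 2 GB + 2 bytes does not fit: it wraps around to a negative length.
print(as_c_int(INT_MAX))      # 2147483647
print(as_c_int(2**31 + 2))    # -2147483646
```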

Should we fix Python 2.7?
 - backport issue #8651
 - use PY_SSIZE_T_CLEAN in zlibmodule.c
msg135030 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-03 12:14
> Should we fix Python 2.7?
>  - backport issue #8651
>  - use PY_SSIZE_T_CLEAN in zlibmodule.c

I really thought about this overnight.
I'm a C programmer and thus:
- Produce no bugs
- If you've produced a bug, fix it at once
- If you've fixed a bug, scream out loud "BUGFIX!" -
  or at least incorporate the patch in the very next patch release

But I have no experience maintaining a scripting language.
My experience with something like this spans about three months now.
And if even such a well-known, serious bug as #1202 survives at least two
minor releases (2.6 and 2.7) without being fixed, then maybe no
more effort should be put into 2.7 at all.

> 11277-27.1.diff contains "# Issue #10276 - check that inputs
> >=4GB are handled correctly.". I don't understand this comment
> because the test uses a buffer of 2 GB + 2 bytes.
> How is it possible to pass a buffer of 2 GB+2 bytes to crc32(),
> whereas it stores the size into an int. The maximum size is
> INT_MAX which is 2 GB-1 byte. It looks like the "i" format of
> PyArg_ParseTuple() doesn't check for integer overflow => issue
> #8651. This issue was fixed in 3.1, 3.2 and 3.3, but not in
> Python 2

11277-27.2.diff uses INT_MAX and thus avoids any such pitfall.
Maybe it brings up memory mapping errors somewhere, which I would
certainly try to fix wherever I can.
msg135031 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-03 12:36
New changeset 618c3e971e80 by Victor Stinner in branch '2.7':
(Merge 3.1) Issue #11277: mmap.mmap() calls fcntl(fd, F_FULLFSYNC) on Mac OS X
http://hg.python.org/cpython/rev/618c3e971e80
msg135037 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-03 13:23
I committed the mmap fix for Mac OS X, the crc test on a 2 GB file, and the issue #8651 fix into Python 2.7.

Using PY_SSIZE_T_CLEAN in zlibmodule.c is a new feature. I don't want to implement it, I don't need it, and I don't feel comfortable in zlibmodule.c. Open a new issue if you want it.

I think that we are done with this issue. If you see buildbot failures, reopen it.

Thanks again Steffen for your diagnosis and fixes.
msg135123 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-04 12:00
Reopening: test_zlib fails with Python 2.7 on Windows:

======================================================================
ERROR: test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\cygwin\home\db3l\buildarea\2.7.bolen-windows\build\lib\test\test_zlib.py", line 91, in test_big_buffer
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
WindowsError: [Error 87] The parameter is incorrect

http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%202.7/builds/854/steps/test/logs/stdio
msg135124 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-04 12:02
New changeset 1ef2a7319849 by Victor Stinner in branch '2.7':
Issue #11277: fix issue number in a test_zlib comment
http://hg.python.org/cpython/rev/1ef2a7319849
msg135125 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-04 12:08
"x86 debian parallel 2.7", "x86 Ubuntu Shared 2.7" and "x86 Tiger 2.7" fail with mmap.error('[Errno 12] Cannot allocate memory').

http://www.python.org/dev/buildbot/all/builders/x86%20Ubuntu%20Shared%202.7/builds/866/steps/test/logs/stdio
http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%202.7/builds/776/steps/test/logs/stdio
http://www.python.org/dev/buildbot/all/builders/x86%20debian%20parallel%202.7/builds/739/steps/test/logs/stdio
======================================================================
ERROR: test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/buildbot/buildarea/2.7.bolen-ubuntu/build/Lib/test/test_zlib.py", line 91, in test_big_buffer
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
error: [Errno 12] Cannot allocate memory
msg135129 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-04 13:48
> error: [Errno 12] Cannot allocate memory

@haypo: Well, I told you I have no idea. Are these bots 32 bit?

I'll attach 11277-27.3.diff, which does @skipUnless(not 32 bit).
Note that I test against > _4G; does this work (on 32 bit and in
Python)? A pity that Python does not offer a 'condition is
always true due to data type storage restriction' check!

And I don't think it makes sense to test a _1GB mmap on 32 bit at
all (though at least the address space shouldn't be exhausted by that).
So, sorry, also for the two bugs in that two-liner, but
especially for the 'm' case.
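On the `> _4G` question: Python integers do not overflow, so comparing `sys.maxsize` with 4 GiB is meaningful on any build. On a 32-bit build sys.maxsize is 2**31 - 1 and the comparison is simply False. A sketch of choosing the test buffer size that way (the function name is illustrative; INT_MAX and 1 GiB are the sizes discussed in this thread):

```python
import sys

_1G = 1024**3
_4G = 4 * _1G
INT_MAX = 2**31 - 1


def big_buffer_size():
    """Pick the buffer size for a big-buffer test (sketch).

    64-bit builds: INT_MAX, the largest length 2.7's zlib accepts.
    32-bit builds: 1 GiB, to stay well inside the address space.
    """
    if sys.maxsize > _4G:   # never "always true": Python ints are unbounded
        return INT_MAX
    return _1G
```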
msg135150 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-04 19:27
New changeset 7f3cab59ef3e by Victor Stinner in branch '2.7':
Issue #11277: test_zlib tests a buffer of 1 GB on 32 bits
http://hg.python.org/cpython/rev/7f3cab59ef3e
msg135151 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-04 19:41
New changeset e6a4deb84e47 by Victor Stinner in branch '2.7':
Issue #11277: oops, fix checksum values of test_zlib on 32 bits
http://hg.python.org/cpython/rev/e6a4deb84e47
msg135152 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-04 19:46
@haypo: Oh. Not:

    if sys.maxsize > _4G:
        # (64 bits system) crc32() and adler32() stores the buffer size into an
        # int, the maximum filesize is INT_MAX (0x7FFFFFFF)
        filesize = 0x7FFFFFFF
        crc_res = 0x709418e7
        adler_res = -2072837729
    else:
        # (32 bits system) On a 32 bits OS, a process cannot usually address
        # more than 2 GB, so test only 1 GB
        filesize = _1G
        crc_res = 0x2b09ee11
        adler_res = -1002962529

                    self.assertEqual(zlib.crc32(m), self.crc_res)
                    self.assertEqual(zlib.adler32(m), self.adler_res)

I'm not that fast.
msg135193 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-05 12:33
@sdaoden(, @pitrou): Antoine proposes skipping the zlib "big buffer" (1 GB) test on 32-bit systems. What do you think?

On 64-bit systems, we check a buffer of 2 GB - 1 byte (0x7FFFFFFF bytes). Is the test useful or not? What do we test?

Can you check whether the test crashes on Mac OS X on a 32-bit system (1 GB buffer) if you disable F_FULLFSYNC in mmapmodule.c? Same question for a 64-bit system (2 GB - 1 byte buffer)?

The most important test is to test crc32 & adler32 with a buffer bigger than 4 GB, but we cannot write such a test in Python 2.7 because the zlib module stores buffer sizes in int variables. So the "big buffer" test of Python 2.7 test_zlib is maybe just useless (on 32 and 64 bits). Can we just remove the test?
msg135203 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 14:01
@haypo: trouble, trouble on the dev list, as I've just seen.
Sorry, sorry. (I cannot subscribe; my dynamic IPs are often blacklisted ;)
Of course my comments were completely wrong, as Ethan has correctly
pointed out.

This is all s**t. These are mmap(2)-related issues and should be
tested in Lib/test/test_mmap.py. However, that does not yet use
    with open:
        create sparse file
        materialize
so the Pear OS X sparse file bug doesn't show up. In fact,
it doesn't do a full beam-me-up test at all yet?

> Is the test useful or not? What do we test?

We test that mmap.mmap materializes a buffer which can be
accessed (read-only) from [0]..[len-1],
and that the checksums zlib produces for that buffer are
correct. Unfortunately we can no longer test 0x80000000+ because
Python prevents such a buffer from being used; that's a shame.
Maybe we could test 0x7FFFFFFF*2 aka 0xfffffffe in two iterations.

> Can you check if the test crashs on Mac OS X on a 32 bits system
> (1 GB buffer) if you disable F_FULLFSYNC in mmapmodule.c? Same
> question on a 64 bits system (2 GB-1 byte buffer)?

Er, F_FULLFSYNC had not yet been committed to 2.7 at that time.

> Can we just remove the test?

If I (choke!) need to write tests, I try to catch corner cases.
The corner cases would be 0, MAX_LEN(-1) and some (rather pseudo-)
random values around these, and maybe some in the middle.
(Plus some invalid inputs.)

Can we remove it? I would keep it: Apple is preparing the next
major release (uname -a already states 10.7.0 even though it's
10.6.7), and maybe then mmap() will fail for 0xDEADBEEF.
Who would detect that otherwise?
msg135239 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-05 20:28
In fact, I like my idea of using iterations.
I have some time tomorrow, so if nobody complains by then,
I'll write diffs for the tests of 3.x and 2.7 with these updates:

- Two different target sizes:
    1. 0xFFFFFFFF + x (7)
    2. 0x7FFFFFFF + x (7)
- On 32 bit systems, use iterations on a potentially safe buffer
  size. I think 0x40000000 a.k.a. 1024*1024*1024 is affordable,
  but 512 MB is probably safer? I'll make that a variable.
- The string will be 'DeadAffe' (8).
- The last 4 bytes of the string will always be read on their own
  (just in case the large buffer sizes confused something down
  the path).
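The iteration idea relies on zlib.crc32() and zlib.adler32() accepting a running checksum as their second argument. A minimal sketch (the helper name is hypothetical). Because each call only sees chunk_size bytes, this style of test verifies the checksum of a large region but cannot catch truncation of a single very large length.

```python
import zlib


def chunked_checksums(data, chunk_size):
    """CRC-32 and Adler-32 of `data`, computed chunk_size bytes at a time.

    Feeding the chunks in order, each time passing the previous result as
    the running value, gives the same result as one call on the whole buffer.
    """
    crc = 0          # starting value of zlib.crc32
    adler = 1        # starting value of zlib.adler32
    view = memoryview(data)   # slicing a memoryview does not copy
    for start in range(0, len(view), chunk_size):
        chunk = view[start:start + chunk_size]
        crc = zlib.crc32(chunk, crc)
        adler = zlib.adler32(chunk, adler)
    return crc & 0xFFFFFFFF, adler & 0xFFFFFFFF
```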
msg135255 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-05-06 00:54
haypo> Can we just remove the test?

I think so. The test was originally intended to catch the case where crc32() or
adler32() would get a buffer of >=4GB, and then silently truncate the size and
produce an incorrect result (issue10276). However, 2.7's zlib doesn't define
PY_SSIZE_T_CLEAN, so passing in a buffer of >=2GB raises an exception. So the
condition that it was testing for can't happen in 2.7.


sdaoden> Can we remove it?  I would keep it, Apple is preparing the next
sdaoden> major release (uname -a yet states 10.7.0 even though it's
sdaoden> 10.6.7), and maybe then mmap() will fail for 0xDEADBEEF.
sdaoden> Who will be the one which detects that otherwise??

I initially thought the same thing, but it turns out that the OS X sparsefile
crash is also covered by LargeMmapTests.test_large_offset() in test_mmap.
That test had also been failing sporadically before the F_FULLFSYNC patch was
committed (see issue11779). So keeping this test around would be redundant.


sdaoden> Unfortunately we cannot test 0x80000000+ no more because
sdaoden> Python prevents that such a buffer can be used - that's a shame.
sdaoden> Maybe we could test 0x7FFFFFFF*2 aka 0xfffffffe in two iterations.

That wouldn't accomplish the same thing. The point of the test is to pick up
truncation issues that occur when you pass in a big buffer. These issues
won't show up if you split the data up into smaller pieces. And in any case,
they can't happen at all in 2.7, because the functions don't accept big
buffers in the first place ;)
msg135308 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-06 15:33
On Fri,  6 May 2011 02:54:07 +0200, Nadeem Vawda wrote:
> I think so. [.]
> it turns out that the OS X sparsefile crash is also covered by
> LargeMmapTests.test_large_offset() in test_mmap [!!!]. [.]

So I followed your suggestion and did nothing more on zlib,
even if that means that there is no test which checksums an
entire superlarge mmap() region.
Instead i've changed/added test cases in test_mmap.py:

- Removed all context-manager usage from LargeMmapTests().
  This feature was introduced in 3.2 and is already tested
  elsewhere. This way, the test is almost identical on 2.7 and 3.x.
- I've dropped _working_largefile(). It creates a useless large
  file only to unlink it immediately. Instead, the necessary
  try/except is done directly in the tests.
- (Testing directly after .flush(), without reopening the file.)
- These new tests don't run on 32 bit.
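A rough sketch of a test in that spirit: the large-file creation wrapped in try/except that skips rather than fails, no reopen before mmap(), and a skip on builds where the mapping would not fit. The class name and details are hypothetical, not the committed test_mmap code; the `size` attribute exists so the test can also be exercised with a small file.

```python
import mmap
import os
import sys
import tempfile
import unittest


class LargeSparseMmapSketch(unittest.TestCase):
    size = 0x100000000   # offset of the trailing data: 4 GiB

    def setUp(self):
        if sys.maxsize < self.size + 4:
            self.skipTest("mapping would not fit in this build's address space")
        fd, self.path = tempfile.mkstemp()
        self.f = os.fdopen(fd, "wb+")
        self.addCleanup(os.unlink, self.path)   # cleanups run LIFO:
        self.addCleanup(self.f.close)           # close first, then unlink
        try:
            self.f.seek(self.size)
            self.f.write(b"abcd")
            self.f.flush()   # map right after flush(), without reopening
        except (OSError, OverflowError):
            self.skipTest("file system does not support large sparse files")

    def test_first_and_last_bytes(self):
        m = mmap.mmap(self.f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            self.assertEqual(len(m), self.size + 4)
            self.assertEqual(m[0], 0)
            self.assertEqual(m[self.size:], b"abcd")
        finally:
            m.close()
```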

May the juice be with you
msg135376 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-05-06 22:40
Thanks for the tests; I'll review and commit them tomorrow morning.

> Even if that means that there is no test which checksums an
> entire superlarge mmap() region.

Bear in mind that the test is only to be removed from 2.7; it will still
be present in the 3.* branches, where crc32() and adler32() actually can
accept such large inputs.

@haypo, @pitrou: Are there any objections to removing test_big_buffer()
from Lib/test/test_zlib.py? If not, I think we can close this issue once
that and the additional mmap tests are committed.
msg135417 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-05-07 08:30
> @haypo, @pitrou: Are there any objections to removing test_big_buffer()
> from Lib/test/test_zlib.py?

I now agree with Antoine: the test is useless. It can be removed today.

About mmap: adding a new test for this issue (mmap on Mac OS X and F_FULLFSYNC) is a good idea. I suppose that we will need to backport the F_FULLFSYNC fix too.
msg135429 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-07 09:42
New changeset 201dcfc56e86 by Nadeem Vawda in branch '2.7':
Issue #11277: Remove useless test from test_zlib.
http://hg.python.org/cpython/rev/201dcfc56e86
msg135445 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-07 11:12
New changeset d5d4f2967879 by Nadeem Vawda in branch '3.1':
Issue #11277: Add tests for mmap crash when using large sparse files on OS X.
http://hg.python.org/cpython/rev/d5d4f2967879

New changeset e447a68742e7 by Nadeem Vawda in branch '3.2':
Merge: #11277: Add tests for mmap crash when using large sparse files on OS X.
http://hg.python.org/cpython/rev/e447a68742e7

New changeset bc13badf10a1 by Nadeem Vawda in branch 'default':
Merge: #11277: Add tests for mmap crash when using large sparse files on OS X.
http://hg.python.org/cpython/rev/bc13badf10a1
msg135446 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-07 11:17
New changeset 8d27d2b22394 by Nadeem Vawda in branch '2.7':
Issue #11277: Add tests for mmap crash when using large sparse files on OS X.
http://hg.python.org/cpython/rev/8d27d2b22394
msg135448 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-07 11:57
@Nadeem: note that the committed versions of the tests would not
show up the Mac OS X mmap() bug AFAIK, because there is an
intermediate .close() of the file to be mmapped.  The OS X bug is
that the VMS/VFS interaction fails to provide a valid memory
region for <<pages which are not yet physically present on disc>>
- i.e. there is no true sparse file support as on Linux, which
simply uses references to a single COW zero page.
(I've not tried it out for real yet, but I'm foolish like a proud
cock, so I've looked at the changeset :)
msg135450 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-05-07 12:05
(Of course this may also be intentional.
But then I would vote against it :), because it's better for the
tests to bring out errors than for end-user apps to.)
msg135452 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-07 12:19
New changeset 9b9f0de19684 by Nadeem Vawda in branch '2.7':
Issue #11277: Fix tests - crash will not trigger if the file is closed and reopened.
http://hg.python.org/cpython/rev/9b9f0de19684

New changeset b112c72f8c01 by Nadeem Vawda in branch '3.1':
Issue #11277: Fix tests - crash will not trigger if the file is closed and reopened.
http://hg.python.org/cpython/rev/b112c72f8c01

New changeset a9da17fcb564 by Nadeem Vawda in branch '3.2':
Merge: #11277: Fix tests - crash will not trigger if the file is closed and reopened.
http://hg.python.org/cpython/rev/a9da17fcb564

New changeset b3a94906c4a0 by Nadeem Vawda in branch 'default':
Merge: #11277: Fix tests - crash will not trigger if the file is closed and reopened.
http://hg.python.org/cpython/rev/b3a94906c4a0
msg135455 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-05-07 12:27
sdaoden> @Nadeem: note that the committed versions of the tests would not
sdaoden> show up the Mac OS X mmap() bug AFAIK, because there is an
sdaoden> intermediate .close() of the file to be mmapped.

Thanks for catching that. Should be fixed now.

haypo> I now agree with Antoine: the test is useless. It can be removed today.
haypo>
haypo> About mmap: adding a new test for this issue (mmap on Mac OS X and
haypo> F_FULLFSYNC) is a good idea.

Done and done.

haypo> I suppose that we will need to backport the F_FULLFSYNC fix too.

It has already been backported, as changeset 618c3e971e80.
msg137817 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-06-07 12:00
Ahem, note that Apple has fixed the mmap(2) bug!!
I'm still surprised and can't really believe it, but it's true!
Just in case you're interested, I'll attach an updated patch.

Maybe Ned Deily should have a look at the version check, which
does not apply yet, but I don't know any other way to perform exact
version checking.  (Using 10.6.7 is not enough, it must be 10.7.0;
uname -a reports that correctly, but no CPP symbol does!?)
msg137868 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-07 17:43
Thanks for the update.  Since the fix will be in a future version of OS X, 10.7 Lion, which has not been released yet, it is not appropriate to change mmap until there has been an opportunity to test it.  But even then, we would need to be careful about adding a compile-time test, as OS X binaries are often built to be compatible with a range of operating system versions, so avoid adding compilation conditionals unless really necessary.  If, after 10.7 is released, someone is able to test that it works as expected, the standard way to support it would be to use the Apple-supplied availability macros to test for the minimum supported OS level of the build, assuming it makes enough of a performance difference to bother to do so: http://developer.apple.com/library/mac/#technotes/tn2064/_index.html

(Modules/_ctypes/darwin/dlfcn_simple.c is one of the few that has this kind of test.)
msg137889 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-06-07 22:14
@ Ned Deily <report@bugs.python.org> wrote (2011-06-07 19:43+0200):
> Thanks for the update.  Since the fix will be in a future
> version of OS X 10.7 Lion, and which has not been released yet,
> so it is not appropriate to change mmap until there has been an
> opportunity to test it.

It really works fine.  I never thought I'd see the day!
(Not that they have started fixing the CoreAudio crashes...)

> But even then, we would need to be careful about adding
> a compile-time test as OS X binaries are often built to be
> compatible for a range of operating system version so avoid
> adding compilation conditionals unless really necessary.
> If after 10.7 is released and someone is able to test that it
> works as expected, the standard way to support it would be to
> use the Apple-supplied availability macros to test for the
> minimum supported OS level of the build assuming it makes enough
> of a performance difference to bother to do so

Of course, it only moves the delay from before mmap(2) to after
close(2).  Well, I don't know; if hardcoding is not an option,
a dynamic sysctl(2) lookup may do:

    kern.version = Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011

This is obviously not the right one.  :)
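If the kernel version string were used for a runtime check, extracting the numeric Darwin version could look like this minimal sketch (`darwin_version` is a hypothetical helper). On a Mac the input string could come from `sysctl -n kern.version`; that source is an assumption here, and the function itself only parses text. Later in this thread, Darwin 10.8.0 is reported on OS X 10.6.8, so the Darwin number is not the OS X marketing version.

```python
import re

# Matches e.g. "Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011"
_DARWIN_RE = re.compile(r"Darwin Kernel Version (\d+)\.(\d+)\.(\d+)")


def darwin_version(kern_version):
    """Return the Darwin kernel version as an int tuple, or None."""
    m = _DARWIN_RE.search(kern_version)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())


print(darwin_version("Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011"))
```

Returning a tuple keeps comparisons such as `>= (10, 8, 0)` simple and correct, unlike comparing version strings.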
--
Ciao, Steffen
sdaoden(*)(gmail.com)
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
msg137891 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-07 22:37
Yes, you should check the Mac OS X version at runtime (as you should check the Linux kernel at runtime). platform.mac_ver() uses something like:

sysv = _gestalt.gestalt('sysv')
if sysv:
  major = (sysv & 0xFF00) >> 8
  minor = (sysv & 0x00F0) >> 4
  patch = (sysv & 0x000F)

Note: patch is not reliable with 'sysv', you have to use ('sys1','sys2','sys3').

So if you would like to check that you have Mac OS 10.7 or later, you can do something like:

sysv = _gestalt.gestalt('sysv')
__MAC_10_7 = (sysv and (sysv >> 4) >= 0x107)  # 'sysv' is BCD: 10.7.x is 0x107x

In C, it should be something like:
-------
const OSType SYSV = 0x73797376U; /* 'sysv' in big endian */
SInt32 response;
OSErr iErr;
iErr = Gestalt(SYSV, &response);
if (iErr == 0 && (response >> 4) >= 0x107)  /* 'sysv' is BCD: 10.7.x is 0x107x */
  /* have Mac OS >= 10.7 */
-------

I'm not sure of 0x73797376, I used hex(struct.unpack('!I', 'sysv')[0]).
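The 'sysv' value is binary-coded decimal (for example 0x1067 for 10.6.7), which is why platform.py converts the major part with a BCD helper. A minimal decoding sketch with hypothetical helper names follows. Because of the BCD encoding, "10.7" becomes 0x107 once the patch nibble is shifted off; the plain hexadecimal value 0xa7 is not the BCD form. The single patch nibble also cannot represent patch levels above 9, which matches the note that 'sys1'/'sys2'/'sys3' are needed for a reliable patch number.

```python
def decode_sysv(sysv):
    """Decode a BCD Gestalt 'sysv' value, e.g. 0x1067 -> (10, 6, 7)."""
    major = ((sysv >> 12) & 0xF) * 10 + ((sysv >> 8) & 0xF)
    minor = (sysv >> 4) & 0xF
    patch = sysv & 0xF  # only one BCD digit, so patch levels > 9 are lost
    return (major, minor, patch)


def is_10_7_or_later(sysv):
    """Compare the BCD major/minor digits directly against 10.7 (0x107)."""
    return bool(sysv) and (sysv >> 4) >= 0x107


print(decode_sysv(0x1067), is_10_7_or_later(0x1067), is_10_7_or_later(0x1070))
```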
msg137892 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-08 00:22
Victor, please do not use magic constants like that in C.  The symbolic values are available in include files:

#include <CoreServices/CoreServices.h>
SInt32 major = 0;
SInt32 minor = 0;   
Gestalt(gestaltSystemVersionMajor, &major);
Gestalt(gestaltSystemVersionMinor, &minor);
if ((major == 10 && minor >= 7) || major >= 11) { ... }

(See, for instance, http://www.cocoadev.com/index.pl?DeterminingOSVersion and http://stackoverflow.com/questions/2115373/os-version-checking-in-cocoa. The code in platform and _gestalt.c could stand to be updated at some point.)

But, again, mmap should *not* be changed until 10.7 has been released and the Apple fix is verified and only if it makes sense to add the additional complexity.
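For comparison, the condition `(major == 10 && minor >= 7) || major >= 11` is a lexicographic comparison against (10, 7). In Python it could be sketched with tuples, assuming the release string comes from `platform.mac_ver()[0]`, which is an empty string on non-Mac systems. The helper names are hypothetical.

```python
import platform


def parse_release(release):
    """'10.6.8' -> (10, 6, 8); '' (e.g. not running on a Mac) -> ()."""
    return tuple(int(part) for part in release.split(".") if part.isdigit())


def at_least(release, minimum=(10, 7)):
    """True if `release` is at least `minimum`, compared element by element."""
    version = parse_release(release)
    return bool(version) and version >= minimum


print(at_least(platform.mac_ver()[0]))
```

Tuple comparison handles both the "same major, newer minor" and the "newer major" cases in a single expression, and an unknown or empty release string conservatively yields False.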
msg137901 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-06-08 12:21
Ok, this patch could be used.
*Unless* the code is not protected by the GIL.

- Gestalt usage is a bit more complicated according to

    http://www.cocoadev.com/index.pl?DeterminingOSVersion

  unless Python only supports OS X 10.4 and later.
  (And platform.py correctly states that in _mac_ver_gestalt(),
  but see below.)

- Because Gestalt is used, mmapmodule.c must be linked with
  '-framework CoreServices'.
  The Python build configuration is new to me; I managed
  compilation by adding the line

    mmap mmapmodule.c -framework CoreServices

  to Modules/Setup, but I guess only OS X is happy
  about that.

platform.py: _mac_ver_xml() should be dropped entirely according
to one of Ned Deily's links ("never officially supported"), and
_mac_ver_gestalt() is obviously never executed, because I guess it
would fail due to "versioninfo", unless I missed something.

By the way: where did you get the info about "sys1", "sys2" and
"sys3" from?  I cannot find it anywhere, only the long names, e.g.
gestaltSystemVersionXy.

Note that I've mailed Apple.  I did not pay $99 or even $249, so
I don't know if there will be a response.
msg137907 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2011-06-08 13:51
Steffen: _mac_ver_xml should not be dropped, it is a perfectly fine way to determine the system version.  Discussing it is also off-topic for this issue, please keep the discussion focussed.

Wrt. mailing Apple: I wouldn't expect an answer. Is there something specific you want to know? I'm currently at WWDC and might be able to ask the question at one of the labs (where Apple's engineers hang out).

If it is really necessary to check for the OS version to enable the OSX-specific bugfix, it is possible to look at the uname information instead of using gestalt.  In particular, something similar to this Python code:

   import os

   v = os.uname()[2]               # kernel release, e.g. '10.7.0'
   major = int(v.split('.')[0])
   if major <= 10:
       # We're on OSX 10.6 or earlier
       enableWorkaround()

This tests the kernel version instead of the system version, but until now the major version of the kernel has increased with every major release of the OS and I have no reason to suspect that Lion will be any different.

BTW2: OSX 10.7 is not released yet and should not be discussed in public fora, as you should know if you have legal access.
msg137964 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-06-09 12:39
@ Ronald Oussoren wrote:
>    if major <= 10:
>       # We're on OSX 10.6 or earlier
>       enableWorkaround()

(You sound as if you are participating in an interesting audiophonic
event.  27" iMacs are indeed great recording studio hardware.
But no Coffee Shops in California - brrrrr.)
msg137967 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2011-06-09 13:52
steffen: I have no idea what you are trying to say in your last message. Could you please try to stay on topic.
msg139931 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-07-06 12:19
Sorry for pressing on this; hopefully it's the final message.
Apple's iterative kernel-update strategy resulted in these versions:

    14:02 ~/tmp $ /usr/sbin/sysctl kern.version
    kern.version: Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386
    14:02 ~/tmp $ gcc -o zt osxversion.c -framework CoreServices
    14:03 ~/tmp $ ./zt 
    OS X version: 10.6.8
    apple_osx_needs_fullsync: -1

I.e. the new patch uses >10.7.0 or >=10.6.8 to avoid the
FULLFSYNC disaster (even slower than the Macrohard memory
allocator during the "Wintel" partnership!), and we end up with:

    14:03 ~/src/cpython $ ./python.exe -E -Wd -m test -r -w -uall test_mmap
    Using random seed 8466468
    [1/1] test_mmap
    1 test OK.

P.S.: I still have no idea how to add '-framework CoreServices'
properly.  As I said in #11046, I have never used GNU Autoconf/M4,
sorry.  Maybe the version check should be moved
somewhere else and simply be exported, even replacing the code
in platform.py?  I don't know.  Bye.
History
Date User Action Args
2022-04-11 14:57:13adminsetgithub: 55486
2011-07-06 12:21:27sdaodensetfiles: - 11277.apple-fix-2.diff
2011-07-06 12:19:43sdaodensetfiles: + 11277.apple-fix-3.diff

messages: + msg139931
2011-06-09 13:52:53ronaldoussorensetmessages: + msg137967
2011-06-09 12:39:31sdaodensetmessages: + msg137964
2011-06-08 13:51:59ronaldoussorensetmessages: + msg137907
2011-06-08 12:27:47sdaodensetfiles: - 11277.apple-fix.diff
2011-06-08 12:21:14sdaodensetfiles: + 11277.apple-fix-2.diff

messages: + msg137901
2011-06-08 00:22:16ned.deilysetnosy: + ronaldoussoren
messages: + msg137892
2011-06-07 22:37:54vstinnersetmessages: + msg137891
2011-06-07 22:14:13sdaodensetmessages: + msg137889
2011-06-07 17:43:39ned.deilysetmessages: + msg137868
2011-06-07 12:00:20sdaodensetfiles: + 11277.apple-fix.diff

messages: + msg137817
2011-05-07 12:27:39nadeem.vawdasetstatus: open -> closed
resolution: fixed
messages: + msg135455

stage: needs patch -> resolved
2011-05-07 12:19:22python-devsetmessages: + msg135452
2011-05-07 12:05:51sdaodensetmessages: + msg135450
2011-05-07 11:57:49sdaodensetmessages: + msg135448
2011-05-07 11:17:51python-devsetmessages: + msg135446
2011-05-07 11:12:41python-devsetmessages: + msg135445
2011-05-07 09:42:42python-devsetmessages: + msg135429
2011-05-07 08:30:34vstinnersetmessages: + msg135417
2011-05-06 22:40:28nadeem.vawdasetmessages: + msg135376
2011-05-06 15:35:10sdaodensetfiles: - 11277-27.3.diff
2011-05-06 15:34:51sdaodensetfiles: - 11277-27.2.diff
2011-05-06 15:33:06sdaodensetfiles: + 11277-test_mmap.1.py, 11277-test_mmap-27.1.py

messages: + msg135308
2011-05-06 00:54:06nadeem.vawdasetmessages: + msg135255
2011-05-05 20:28:49sdaodensetmessages: + msg135239
2011-05-05 14:01:52sdaodensetmessages: + msg135203
2011-05-05 12:33:42vstinnersetmessages: + msg135193
2011-05-04 19:46:17sdaodensetmessages: + msg135152
2011-05-04 19:41:22python-devsetmessages: + msg135151
2011-05-04 19:27:50python-devsetmessages: + msg135150
2011-05-04 13:48:12sdaodensetfiles: + 11277-27.3.diff

messages: + msg135129
2011-05-04 12:08:11vstinnersetmessages: + msg135125
2011-05-04 12:02:21python-devsetmessages: + msg135124
2011-05-04 12:00:36vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg135123
2011-05-03 13:23:33vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg135037
2011-05-03 12:36:44python-devsetmessages: + msg135031
2011-05-03 12:18:20sdaodensetfiles: - 11277-27.1.diff
2011-05-03 12:14:38sdaodensetfiles: + 11277-27.2.diff

messages: + msg135030
2011-05-02 14:53:19vstinnersetmessages: + msg134977
2011-05-02 13:58:12sdaodensetfiles: - 11277.zsum32.c
2011-05-02 13:57:57sdaodensetfiles: + 11277-27.1.diff

messages: + msg134974
2011-05-01 23:22:40vstinnersetmessages: + msg134945
2011-05-01 23:15:31python-devsetnosy: + python-dev
messages: + msg134943
2011-04-27 14:17:34sdaodensetfiles: - 11277.4.diff
2011-04-27 14:17:23sdaodensetfiles: - 11277.3.diff
2011-04-27 14:17:06sdaodensetfiles: + 11277.5.diff

messages: + msg134566
2011-04-19 12:56:26vstinnersetmessages: + msg134047
2011-04-19 12:29:26sdaodensetmessages: + msg134045
title: test_zlib.test_big_buffer crashes under BSD (Mac OS X and FreeBSD) -> Crash with mmap and sparse files on Mac OS X
2011-04-19 12:07:54sdaodensetmessages: + msg134044
title: Crash with mmap and sparse files on Mac OS X -> test_zlib.test_big_buffer crashes under BSD (Mac OS X and FreeBSD)
2011-04-19 11:14:55sdaodensetfiles: + 11277.4.diff

messages: + msg134041
2011-04-19 10:47:41vstinnersetmessages: + msg134040
2011-04-19 10:43:04vstinnersetmessages: + msg134039
2011-04-19 10:38:46vstinnersettitle: test_zlib.test_big_buffer crashes under BSD (Mac OS X and FreeBSD) -> Crash with mmap and sparse files on Mac OS X
2011-04-19 10:34:10vstinnersetmessages: + msg134038
2011-04-19 10:15:16sdaodensetmessages: + msg134036
2011-04-19 10:15:03sdaodensetfiles: - 11277.mmap-2.py
2011-04-19 10:14:59sdaodensetfiles: - 11277.mmap-2.c
2011-04-19 10:14:55sdaodensetfiles: - 11277.mmap-1.c
2011-04-19 10:14:50sdaodensetfiles: - 11277.mmap.c
2011-04-19 10:13:17sdaodensetmessages: + msg134035
2011-04-19 10:12:24sdaodensetfiles: - 11277.2.diff
2011-04-19 10:12:15sdaodensetfiles: - 11277.1.diff
2011-04-19 10:12:09sdaodensetfiles: - doc_lib_mmap.patch
2011-04-19 10:10:37vstinnersetmessages: + msg134033
2011-04-19 10:08:12sdaodensetfiles: + 11277.3.diff

messages: + msg134032
2011-04-16 16:19:56sdaodensetmessages: + msg133896
2011-04-16 16:13:53sdaodensetfiles: + 11277.mmap-2.c, 11277.mmap-2.py

messages: + msg133894
2011-04-16 14:55:27nadeem.vawdasetmessages: + msg133892
stage: resolved -> needs patch
2011-04-15 18:12:23sdaodensetfiles: + 11277.2.diff, 11277.mmap-1.c

messages: + msg133860
2011-04-15 18:04:28sdaodensetfiles: - issue11277.2.patch
2011-04-15 15:20:54sdaodensetfiles: + 11277.1.diff, 11277.mmap.c, 11277.zsum32.c

messages: + msg133837
2011-04-14 19:29:46sdaodensetmessages: + msg133764
2011-04-14 14:16:56sdaodensetmessages: + msg133741
2011-04-13 21:43:24nadeem.vawdasetmessages: + msg133697
2011-04-13 20:35:06pitrousetmessages: + msg133689
2011-04-13 19:27:16skrahsetmessages: + msg133687
2011-04-13 16:30:17nadeem.vawdasetmessages: + msg133677
2011-04-08 16:26:26nadeem.vawdasetnosy: + nadeem.vawda
2011-04-06 18:48:14sdaodensetmessages: + msg133154
2011-04-05 10:54:28vstinnersettitle: test_zlib crashes under Snow Leopard buildbot -> test_zlib.test_big_buffer crashes under BSD (Mac OS X and FreeBSD)
2011-04-04 22:47:10skrahsetmessages: + msg132985
2011-04-04 22:45:39pitrousetmessages: + msg132984
2011-04-04 22:34:26pitrousetpriority: critical -> high
nosy: + skrah
messages: + msg132983

2011-04-04 11:57:25neologixsetmessages: + msg132941
2011-04-04 11:31:23vstinnersetmessages: + msg132940
2011-04-04 11:30:39vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg132938
2011-02-26 19:39:45brett.cannonsetnosy: - brett.cannon
2011-02-26 11:30:27sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129531
2011-02-26 10:34:10sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129520
2011-02-25 17:05:19neologixsetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129391
2011-02-23 12:44:33sdaodensetfiles: + doc_lib_mmap.patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129184
2011-02-23 12:21:28sdaodensetfiles: - doc_lib_mmap.patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
2011-02-23 11:52:03sdaodensetfiles: + doc_lib_mmap.patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129177
2011-02-22 22:37:03pitrousetstatus: open -> closed
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129140

resolution: fixed
stage: needs patch -> resolved
2011-02-22 21:39:15pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129133
2011-02-22 20:37:45sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129126
2011-02-22 20:35:38sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129125
2011-02-22 20:32:26sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129124
2011-02-22 20:23:28sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, neologix, sdaoden
messages: + msg129120
2011-02-22 18:07:19neologixsetnosy: + neologix
messages: + msg129107
2011-02-22 16:37:26pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129093
2011-02-22 15:58:30sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129091
2011-02-22 15:56:28pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129090
2011-02-22 15:39:17sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129087
2011-02-22 15:37:57sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129086
2011-02-22 13:30:11pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129073
2011-02-22 13:22:39sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129072
2011-02-22 13:08:16sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129071
2011-02-22 13:07:07sdaodensetfiles: - issue11277.patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
2011-02-22 13:06:10sdaodensetfiles: + issue11277.2.patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129069
2011-02-22 12:56:35sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129067
2011-02-22 12:50:31pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129066
2011-02-22 12:40:01sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129063
2011-02-22 12:24:31sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129061
2011-02-22 12:18:08pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129058
2011-02-22 12:17:19sdaodensetfiles: + issue11277.patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129057
2011-02-22 12:17:02sdaodensetfiles: - issue11277.patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
2011-02-22 12:15:00sdaodensetfiles: + issue11277.patch

messages: + msg129056
keywords: + patch
nosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
2011-02-22 11:42:05sdaodensetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129054
2011-02-22 11:30:01pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily, sdaoden
messages: + msg129053
2011-02-22 11:29:03sdaodensetnosy: + sdaoden
messages: + msg129052
2011-02-22 11:04:10pitrousetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily
messages: + msg129050
2011-02-22 04:06:21brett.cannonsetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily
messages: + msg129034
stage: needs patch
2011-02-22 03:15:38ned.deilysetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily
messages: + msg129029
2011-02-22 01:30:46brett.cannonsetnosy: brett.cannon, ixokai, pitrou, vstinner, ned.deily
messages: + msg129023
2011-02-21 23:45:29brett.cannonsetnosy: + brett.cannon
2011-02-21 23:43:35ned.deilysetnosy: ixokai, pitrou, vstinner, ned.deily
messages: + msg129011
2011-02-21 22:34:16ned.deilysetnosy: ixokai, pitrou, vstinner, ned.deily
messages: + msg129006
2011-02-21 22:22:54pitrousetnosy: ixokai, pitrou, vstinner, ned.deily
messages: + msg129004
2011-02-21 22:21:59vstinnersetnosy: + vstinner
messages: + msg129003
2011-02-21 22:13:15pitroucreate