Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with mmap and sparse files on Mac OS X #55486

Closed
pitrou opened this issue Feb 21, 2011 · 108 comments
Closed

Crash with mmap and sparse files on Mac OS X #55486

pitrou opened this issue Feb 21, 2011 · 108 comments
Labels
type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@pitrou
Copy link
Member

pitrou commented Feb 21, 2011

BPO 11277
Nosy @ronaldoussoren, @pitrou, @vstinner, @ned-deily, @skrah
Files
  • 11277.5.diff
  • 11277-test_mmap.1.py
  • 11277-test_mmap-27.1.py
  • 11277.apple-fix-3.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-05-07.12:27:39.381>
    created_at = <Date 2011-02-21.22:13:15.174>
    labels = ['type-crash']
    title = 'Crash with mmap and sparse files on Mac OS X'
    updated_at = <Date 2011-07-06.12:21:27.143>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2011-07-06.12:21:27.143>
    actor = 'sdaoden'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-05-07.12:27:39.381>
    closer = 'nadeem.vawda'
    components = []
    creation = <Date 2011-02-21.22:13:15.174>
    creator = 'pitrou'
    dependencies = []
    files = ['21798', '21909', '21910', '22593']
    hgrepos = []
    issue_num = 11277
    keywords = ['patch']
    message_count = 108.0
    messages = ['129002', '129003', '129004', '129006', '129011', '129023', '129029', '129034', '129050', '129052', '129053', '129054', '129056', '129057', '129058', '129061', '129063', '129066', '129067', '129069', '129071', '129072', '129073', '129086', '129087', '129090', '129091', '129093', '129107', '129120', '129124', '129125', '129126', '129133', '129140', '129177', '129184', '129391', '129520', '129531', '132938', '132940', '132941', '132983', '132984', '132985', '133154', '133677', '133687', '133689', '133697', '133741', '133764', '133837', '133860', '133892', '133894', '133896', '134032', '134033', '134035', '134036', '134038', '134039', '134040', '134041', '134044', '134045', '134047', '134566', '134943', '134945', '134974', '134977', '135030', '135031', '135037', '135123', '135124', '135125', '135129', '135150', '135151', '135152', '135193', '135203', '135239', '135255', '135308', '135376', '135417', '135429', '135445', '135446', '135448', '135450', '135452', '135455', '137817', '137868', '137889', '137891', '137892', '137901', '137907', '137964', '137967', '139931']
    nosy_count = 10.0
    nosy_names = ['ixokai', 'ronaldoussoren', 'pitrou', 'vstinner', 'nadeem.vawda', 'ned.deily', 'skrah', 'neologix', 'sdaoden', 'python-dev']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue11277'
    versions = ['Python 3.2', 'Python 3.3']

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 21, 2011

    Following r88460 (bpo-10276), test_zlib crashes on the Snow Leopard buildbot (apparently in the new "test_big_buffer" test case).

    @pitrou pitrou added the type-crash A hard crash of the interpreter, possibly with a core dump label Feb 21, 2011
    @vstinner
    Copy link
    Member

    Do adler32() and crc32() support length up to UINT32_MAX? Or should we maybe limit the length to INT32_MAX?

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 21, 2011

    I've tried INT_MAX and it didn't change anything.

    @ned-deily
    Copy link
    Member

    Current OS X zlib is 1.2.3. Test crashes with most recently released zlib, 1.2.5, as well.

    @ned-deily
    Copy link
    Member

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: 10 at address: 0x000000010170e000
    0x00000001016eeaa0 in crc32 ()

    (gdb) backtrace
    #0 0x00000001016eeaa0 in crc32 ()
    #1 0x00000001016e806d in PyZlib_crc32 (self=0x1016aa588, args=0x1016bf220) at /private/tmp/a/py3k/Modules/zlibmodule.c:993

    PyZlib_crc32(PyObject *self, PyObject *args)
    ...
            while (len > (size_t) UINT_MAX) {
                crc32val = crc32(crc32val, buf, UINT_MAX);
    ...

    @brettcannon
    Copy link
    Member

    So on my system, that 'while' loop is executed once (put a printf() after the bug and len adjustments and it was never hit).

    @ned-deily
    Copy link
    Member

    >>> from test.support import _4G
    >>> _4G
    4294967296
    >>> mapping.size()
    4294967300
    pbuf.len = 4294967300, len = 4294967300
    UINT_MAX = 4294967295

    @brettcannon
    Copy link
    Member

    Does it matter that _4G < UINT_MAX?

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 22, 2011

    Does it matter that _4G < UINT_MAX?

    You mean _4G > UINT_MAX, right?
    Yes, it matters, otherwise that defeats the point of the test :)

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    'Have no glue, but Ned Daily's patch (msg129011) seems to be required for adler, too. (You know...)

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 22, 2011

    Well, it's not a patch, just a traceback :)

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    Wait a few minutes, i'll write this simple patch for adler and crc. But excessive testing and such is beyond my current capabilities.

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    File: bpo-11277.patch.
    Hmm. Two non-register constants and equal code on 32 and 64 bit. Does Python has a '64 bit' switch or the like - PY_SSIZE_T_MAX is not preprocessor-clean, i would guess.

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    Sorry - that was a mess.

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 22, 2011

    File: bpo-11277.patch.
    Hmm. Two non-register constants and equal code on 32 and 64 bit.
    Does Python has a '64 bit' switch or the like - PY_SSIZE_T_MAX is not
    preprocessor-clean, i would guess.

    Er, how is this patch different from r88460?

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    I guess not at all. Well.

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    test_zlib.py (with my patch but that's somewhat identical in the end, say) does

    .............................s.......
    ----------------------------------------------------------------------
    Ran 37 tests in 1.809s

    OK (skipped=1)

    This is on Snow Leopard 64 bit, 02b70cb59701 (r88451) -> Python 3.3a0.
    Is there a switch i must trigger? Just pulled 24 changesets, recompiling and trying again with r88460.

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 22, 2011

    This is on Snow Leopard 64 bit, 02b70cb59701 (r88451) -> Python 3.3a0.
    Is there a switch i must trigger? Just pulled 24 changesets,
    recompiling and trying again with r88460.

    Have you tried "./python -m test -v -uall test_zlib" ?

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    No, i've got no idea of this framework... Just did 'python3 test_zlib.py' directly. Thanks for the switch. But i can't test your thing due to bpo-11285, so this may take a while (others have more knowledge anyway)..

    (P.S.: your constant-folding stack patch is a great thing, just wanted to say this once..)

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    So here is this (with my patch, but this is for real: bpo-11277.2.patch):

    == CPython 3.3a0 (py3k, Feb 22 2011, 14:00:52) [GCC 4.2.1 (Apple Inc. build 5664)]
    == Darwin-10.6.0-i386-64bit little-endian
    == /private/var/folders/Da/DaZX3-k5G8a57zw6MSmjJ++++TM/-Tmp-/test_python_89365
    Testing with flags: sys.flags(debug=0, division_warning=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0)
    [1/1] test_zlib
    test_adler32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_adler32start (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32_adler32_unsigned (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32start (test.test_zlib.ChecksumTestCase) ... ok
    test_penguins (test.test_zlib.ChecksumTestCase) ... ok
    test_same_as_binascii_crc32 (test.test_zlib.ChecksumTestCase) ... ok
    test_badargs (test.test_zlib.ExceptionTestCase) ... ok
    test_badcompressobj (test.test_zlib.ExceptionTestCase) ... ok
    test_baddecompressobj (test.test_zlib.ExceptionTestCase) ... ok
    test_badlevel (test.test_zlib.ExceptionTestCase) ... ok
    test_decompressobj_badflush (test.test_zlib.ExceptionTestCase) ... ok
    test_big_compress_buffer (test.test_zlib.CompressTestCase) ... ok
    test_big_decompress_buffer (test.test_zlib.CompressTestCase) ... ok
    test_incomplete_stream (test.test_zlib.CompressTestCase) ... ok
    test_length_overflow (test.test_zlib.CompressTestCase) ... skipped 'not enough free memory, need at least 4 GB'
    test_speech (test.test_zlib.CompressTestCase) ... ok
    test_speech128 (test.test_zlib.CompressTestCase) ... ok
    test_badcompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_baddecompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_big_compress_buffer (test.test_zlib.CompressObjectTestCase) ... ok
    test_big_decompress_buffer (test.test_zlib.CompressObjectTestCase) ... ok
    test_compresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_compressincremental (test.test_zlib.CompressObjectTestCase) ... ok
    test_compressoptions (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompimax (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompinc (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompincflush (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompress_incomplete_stream (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompresscopy (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompressmaxlen (test.test_zlib.CompressObjectTestCase) ... ok
    test_decompressmaxlenflush (test.test_zlib.CompressObjectTestCase) ... ok
    test_empty_flush (test.test_zlib.CompressObjectTestCase) ... ok
    test_flushes (test.test_zlib.CompressObjectTestCase) ... ok
    test_maxlenmisc (test.test_zlib.CompressObjectTestCase) ... ok
    test_odd_flush (test.test_zlib.CompressObjectTestCase) ... ok
    test_pair (test.test_zlib.CompressObjectTestCase) ... ok

    ----------------------------------------------------------------------
    Ran 37 tests in 1.789s

    OK (skipped=1)
    1 test OK.

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    (Is not that much help for a >4GB error, huh?)

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    Just stepping ... with c8d1f99f25eb/r88476:

    == CPython 3.3a0 (py3k, Feb 22 2011, 14:18:19) [GCC 4.2.1 (Apple Inc. build 5664)]
    == Darwin-10.6.0-i386-64bit little-endian
    == /private/var/folders/Da/DaZX3-k5G8a57zw6MSmjJ++++TM/-Tmp-/test_python_5126
    Testing with flags: sys.flags(debug=0, division_warning=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0)
    [1/1] test_zlib
    test_adler32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_adler32start (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32_adler32_unsigned (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32empty (test.test_zlib.ChecksumTestCase) ... ok
    test_crc32start (test.test_zlib.ChecksumTestCase) ... ok
    test_penguins (test.test_zlib.ChecksumTestCase) ... ok
    test_same_as_binascii_crc32 (test.test_zlib.ChecksumTestCase) ... ok
    test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase) ...
    ^C
    ^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
    Bus error

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 22, 2011

    Just stepping ... with c8d1f99f25eb/r88476:

    Right, that's what we should investigate :)
    Could try to diagnose the crash?

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing.
    The problem is not crc32(), but the buffer itself:

       if (pbuf.len > 1024*5) {
            unsigned char *buf = pbuf.buf;
            Py_ssize_t len = pbuf.len;
            Py_ssize_t i;
    fprintf(stderr, "CRC 32 2.1\n");
    for(i=0; (size_t)i < (size_t)len;++i)
        *buf++ = 1;
    fprintf(stderr, "CRC 32 2.2\n");

    2.2 is never reached (in fact accessing buf[1] already causes fault).
    Thus the problem is not zlib, but PyArg_ParseTuple().
    But just don't ask me more on that!

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    (P.S.: of course talking about ChecksumBigBufferTestCase and the 4GB, say.)

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 22, 2011

    .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing.
    The problem is not crc32(), but the buffer itself:

    if (pbuf.len > 1024*5) {
    unsigned char *buf = pbuf.buf;
    Py_ssize_t len = pbuf.len;
    Py_ssize_t i;
    fprintf(stderr, "CRC 32 2.1\n");
    for(i=0; (size_t)i < (size_t)len;++i)
    *buf++ = 1;
    fprintf(stderr, "CRC 32 2.2\n");

    Thank you! So it's perhaps a bug in mmap on Snow Leopard.
    Could you try to debug a bit more precisely and see at which buffer
    offset (from the start) the fault occurs?

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Feb 22, 2011

    Snippet

        if (pbuf.len > 1024*5) {
            volatile unsigned char *buf = pbuf.buf;
            Py_ssize_t len = pbuf.len;
    Py_ssize_t i = 0;
    volatile unsigned char au[100];
    volatile unsigned char*x = au;
            fprintf(stderr, "CRC ENTER, buffer=%p\n", buf);
    for (i=0; (size_t)i < (size_t)len; ++i) {
        fprintf(stderr, "%ld, buf=%p\n", (signed long)i, buf);
        *x = *buf++;
    }

    results in

    test_big_buffer (test.test_zlib.ChecksumBigBufferTestCase) ... CRC ENTER, buffer=0x1014ab000
    0, buf=0x1014ab000

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 22, 2011

    Out of curiosity, could you try the following patch?

    Index: Lib/test/test_zlib.py
    ===================================================================

    --- Lib/test/test_zlib.py	(révision 88500)
    +++ Lib/test/test_zlib.py	(copie de travail)
    @@ -70,7 +70,7 @@
             with open(support.TESTFN, "wb+") as f:
                 f.seek(_4G)
                 f.write(b"asdf")
    -            f.flush()
    +        with open(support.TESTFN, "rb") as f:
                 self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
     
         def tearDown(self):

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Feb 22, 2011

    .. even with a self-compiled 1.2.3, INT_MAX/1000 ... nothing.
    The problem is not crc32(), but the buffer itself:

    if (pbuf.len > 1024*5) {
    unsigned char *buf = pbuf.buf;
    Py_ssize_t len = pbuf.len;
    Py_ssize_t i;
    fprintf(stderr, "CRC 32 2.1\n");
    for(i=0; (size_t)i < (size_t)len;++i)
    *buf++ = 1;
    fprintf(stderr, "CRC 32 2.2\n");

    Unless I'm mistaken, in the test the file is mapped with PROT_READ, so it's normal to get SIGSEGV when writting to it:

       def setUp(self): 
                with open(support.TESTFN, "wb+") as f: 
                    f.seek(_4G) 
                    f.write(b"asdf") 
                    f.flush() 
                    self.mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) 

    for(i=0; (size_t)i < (size_t)len;++i)
    *buf++ = 1;

    But it seems you're also getting segfaults when only reading it, right ?

    I've got a stupid question: how much memory do you have ?
    Cause there seems to be some issues with page cache when reading mmaped files on OS-X:
    http://lists.apple.com/archives/darwin-development/2003/Jun/msg00141.html

    On Linux, the page cache won't fill forever, so you don't need to have enough free memory to accomodate the whole file (the page cache should grow, but not forever). But on OS-X, it seems that the page replacement algorithm seems to retain mmaped pages in the page cache much longer, which could potentially trigger an OOM later (because of overcommitting, mmap can very well return a valid address range which leads to a segfault when accessed later).
    I'm not sure why it would segfault on the first page, though.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 4, 2011

    New changeset 7f3cab59ef3e by Victor Stinner in branch '2.7':
    Issue bpo-11277: test_zlib tests a buffer of 1 GB on 32 bits
    http://hg.python.org/cpython/rev/7f3cab59ef3e

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 4, 2011

    New changeset e6a4deb84e47 by Victor Stinner in branch '2.7':
    Issue bpo-11277: oops, fix checksum values of test_zlib on 32 bits
    http://hg.python.org/cpython/rev/e6a4deb84e47

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented May 4, 2011

    @Haypo: Oh. Not:

       if sys.maxsize > _4G:
            # (64 bits system) crc32() and adler32() stores the buffer size into an
            # int, the maximum filesize is INT_MAX (0x7FFFFFFF)
            filesize = 0x7FFFFFFF
            crc_res = 0x709418e7
            adler_res = -2072837729
        else:
            # (32 bits system) On a 32 bits OS, a process cannot usually address
            # more than 2 GB, so test only 1 GB
            filesize = _1G
            crc_res = 0x2b09ee11
            adler_res = -1002962529
                    self.assertEqual(zlib.crc32(m), self.crc_res)
                    self.assertEqual(zlib.adler32(m), self.adler_res)
    

    I'm not that fast.

    @vstinner
    Copy link
    Member

    vstinner commented May 5, 2011

    @sdaoden(, @pitrou): Antoine proposes to skip the zlib "big buffer" (1 GB) test on 32 bits system. What do you think?

    On 64 bits system, we check a buffer of 2 GB-1 byte (0x7FFFFFFF bytes). Is the test useful or not? What do we test?

    Can you check if the test crashs on Mac OS X on a 32 bits system (1 GB buffer) if you disable F_FULLFSYNC in mmapmodule.c? Same question on a 64 bits system (2 GB-1 byte buffer)?

    The most important test if to test crc32 & adler32 with a buffer bigger than 4 GB, but we cannot write such test in Python 2.7 because the zlib module stores buffer sizes into int variables. So the "big buffer" test of Python 2.7 test_zlib is maybe just useful (on 32 and 64 bits). Can we just remove the test?

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented May 5, 2011

    @Haypo: trouble, trouble on the dev-list, as i've seen currently.
    Sorry, sorry. (Cannot subscribe, my DynIP's are often blacklisted ;)
    Of course my comments were completely wrong, as Ethan has pointed
    out correctly.

    This is all s**t. These are mmap(2) related issues and should be
    tested in Lib/test/test_mmap.py. However that does not use
    with open:
    create sparse file
    materialize
    yet so that the Pear OS X sparsefile bug doesn't show up. In fact
    it doesn't do a full beam-me-up test at all yet?

    Is the test useful or not? What do we test?

    We do test that mmap.mmap materializes a buffer which can be
    accessed (readonly) from [0]..[len-1].
    And that the checksums that zlib produces for that buffer are
    correct. Unfortunately we cannot test 0x80000000+ no more because
    Python prevents that such a buffer can be used - that's a shame.
    Maybe we could test 0x7FFFFFFF*2 aka 0xfffffffe in two iterations.

    Can you check if the test crashs on Mac OS X on a 32 bits system
    (1 GB buffer) if you disable F_FULLFSYNC in mmapmodule.c? Same
    question on a 64 bits system (2 GB-1 byte buffer)?

    Aeh - F_FULLFSYNC was not yet committed at that time in 2.7.

    Can we just remove the test?

    If i - choke! - need to write tests, i try to catch corner cases.
    The corner cases would be 0,MAX_LEN(-1) and some (rather pseudo)
    random values around these and maybe some in the middle.
    (Plus some invalid inputs.)

    Can we remove it? I would keep it, Apple is preparing the next
    major release (uname -a yet states 10.7.0 even though it's
    10.6.7), and maybe then mmap() will fail for 0xDEADBEEF.
    Who will be the one which detects that otherwise??

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented May 5, 2011

    In fact i like my idea of using iterations.
    I have some time tomorrow, so if nobody complains until then,
    i write diffs for the tests of 3.x and 2.7 with these updates:

    • Two different target sizes:
      1. 0xFFFFFFFF + x (7)
      2. 0x7FFFFFFF + x (7)
    • On 32 bit systems, use iterations on a potentially safe buffer
      size. I think 0x40000000 a.k.a 1024*1024*1024 is affordable,
      but 512 MB are probably more safe? I'll make that a variable.
    • The string will be 'DeadAffe' (8).
    • The last 4 bytes of the string will always be read on their own
      (just in case the large buffer sizes irritated something down
      the path).

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented May 6, 2011

    haypo> Can we just remove the test?

    I think so. The test was originally intended to catch the case where crc32() or
    adler32() would get a buffer of >=4GB, and then silently truncate the size and
    produce an incorrect result (bpo-10276). However, 2.7's zlib doesn't define
    PY_SSIZE_T_CLEAN, so passing in a buffer of >=2GB raises an exception. So the
    condition that it was testing for can't happen in 2.7.

    sdaoden> Can we remove it? I would keep it, Apple is preparing the next
    sdaoden> major release (uname -a yet states 10.7.0 even though it's
    sdaoden> 10.6.7), and maybe then mmap() will fail for 0xDEADBEEF.
    sdaoden> Who will be the one which detects that otherwise??

    I initially thought the same thing, but it turns out that the OS X sparsefile
    crash is also covered by LargeMmapTests.test_large_offset() in test_mmap.
    That test had also been failing sporadically before the F_FULLSYNC patch was
    committed (see bpo-11779). So keeping this test around would be redundant.

    sdaoden> Unfortunately we cannot test 0x80000000+ no more because
    sdaoden> Python prevents that such a buffer can be used - that's a shame.
    sdaoden> Maybe we could test 0x7FFFFFFF*2 aka 0xfffffffe in two iterations.

    That wouldn't accomplish the same thing. The point of the test is to pick up
    truncation issues that occur when you pass in a big buffer. These issues
    won't show up if you split the data up into smaller pieces. And in any case,
    they can't happen at all in 2.7, because the functions don't accept big
    buffers in the first place ;)

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented May 6, 2011

    On Fri, 6 May 2011 02:54:07 +0200, Nadeem Vawda wrote:

    I think so. [.]
    it turns out that the OS X sparsefile crash is also covered by
    LargeMmapTests.test_large_offset() in test_mmap [!!!]. [.]

    So i followed your suggestion and did not do something on zlib no
    more. Even if that means that there is no test which checksums an
    entire superlarge mmap() region.
    Instead i've changed/added test cases in test_mmap.py:

    • Removed all context-manager usage from LargeMmapTests().
      This feature has been introduced in 3.2 and is already tested
      elsewhere. Like this the test is almost identical on 2.7 and 3.x.
    • I've dropped _working_largefile(). This creates a useless large
      file only to unlink it directly. Instead the necessary try:catch:
      is done directly in the tests.
    • (Directly testing after .flush() without reopening the file.)
    • These new tests don't run on 32 bit.

    May the juice be with you

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented May 6, 2011

    Thanks for the tests; I'll review and commit them tomorrow morning.

    Even if that means that there is no test which checksums an
    entire superlarge mmap() region.

    Bear in mind that the test is only to be removed from 2.7; it will still
    be present in the 3.* branches, where crc32() and adler32() actually can
    accept such large inputs.

    @Haypo, @pitrou: Are there any objections to removing test_big_buffer()
    from Lib/test/test_zlib.py? If not, I think we can close this issue once
    that and the additional mmap tests are committed.

    @vstinner
    Copy link
    Member

    vstinner commented May 7, 2011

    @Haypo, @pitrou: Are there any objections to removing test_big_buffer()
    from Lib/test/test_zlib.py?

    I now agree Antoine: the test is useless. It can be removed today.

    About mmap: add a new test for this issue (mmap on Mac OS X and F_FULLSYNC) is a good idea. I suppose that we will need to backport the F_FULLSYNC fix too.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 7, 2011

    New changeset 201dcfc56e86 by Nadeem Vawda in branch '2.7':
    Issue bpo-11277: Remove useless test from test_zlib.
    http://hg.python.org/cpython/rev/201dcfc56e86

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 7, 2011

    New changeset d5d4f2967879 by Nadeem Vawda in branch '3.1':
    Issue bpo-11277: Add tests for mmap crash when using large sparse files on OS X.
    http://hg.python.org/cpython/rev/d5d4f2967879

    New changeset e447a68742e7 by Nadeem Vawda in branch '3.2':
    Merge: bpo-11277: Add tests for mmap crash when using large sparse files on OS X.
    http://hg.python.org/cpython/rev/e447a68742e7

    New changeset bc13badf10a1 by Nadeem Vawda in branch 'default':
    Merge: bpo-11277: Add tests for mmap crash when using large sparse files on OS X.
    http://hg.python.org/cpython/rev/bc13badf10a1

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 7, 2011

    New changeset 8d27d2b22394 by Nadeem Vawda in branch '2.7':
    Issue bpo-11277: Add tests for mmap crash when using large sparse files on OS X.
    http://hg.python.org/cpython/rev/8d27d2b22394

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented May 7, 2011

    @nadeem: note that the committed versions of the tests would not
    show up the Mac OS X mmap() bug AFAIK, because there is an
    intermediate .close() of the file to be mmapped. The OS X bug is
    that the VMS/VFS interaction fails to provide a valid memory
    region for <<pages which are not yet physically present on disc>>

    • i.e. there is no true sparse file support as on Linux, which
      simply uses references to a single COW zero page.
      (I've not tried it out for real yet, but i'm foolish like a prowd
      cock, so i've looked at the changeset :)

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented May 7, 2011

    (Of course this may also be intentional, say.
    But then i would vote against it :), because it's better the
    tests bring out errors than end-user apps.)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 7, 2011

    New changeset 9b9f0de19684 by Nadeem Vawda in branch '2.7':
    Issue bpo-11277: Fix tests - crash will not trigger if the file is closed and reopened.
    http://hg.python.org/cpython/rev/9b9f0de19684

    New changeset b112c72f8c01 by Nadeem Vawda in branch '3.1':
    Issue bpo-11277: Fix tests - crash will not trigger if the file is closed and reopened.
    http://hg.python.org/cpython/rev/b112c72f8c01

    New changeset a9da17fcb564 by Nadeem Vawda in branch '3.2':
    Merge: bpo-11277: Fix tests - crash will not trigger if the file is closed and reopened.
    http://hg.python.org/cpython/rev/a9da17fcb564

    New changeset b3a94906c4a0 by Nadeem Vawda in branch 'default':
    Merge: bpo-11277: Fix tests - crash will not trigger if the file is closed and reopened.
    http://hg.python.org/cpython/rev/b3a94906c4a0

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented May 7, 2011

    sdaoden> @nadeem: note that the committed versions of the tests would not
    sdaoden> show up the Mac OS X mmap() bug AFAIK, because there is an
    sdaoden> intermediate .close() of the file to be mmapped.

    Thanks for catching that. Should be fixed now.

    haypo> I now agree Antoine: the test is useless. It can be removed today.
    haypo>
    haypo> About mmap: add a new test for this issue (mmap on Mac OS X and
    haypo> F_FULLSYNC) is a good idea.

    Done and done.

    haypo> I suppose that we will need to backport the F_FULLSYNC fix too.

    It has already been backported, as changeset 618c3e971e80.

    @nadeemvawda nadeemvawda mannequin closed this as completed May 7, 2011
    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Jun 7, 2011

    Aehm, note that Apple has fixed the mmap(2) bug!!
    I'm still surprised and can't really believe it, but it's true!
    Just in case you're interested, i'll apply an updated patch.

    Maybe Ned Deily should have a look at the version check, which
    does not apply yet, but i don't know any other way to perform exact
    version checking. (Using 10.6.7 is not enough, it must be 10.7.0;
    uname -a yet reports that all through, but no CPP symbol does!?)

    @ned-deily
    Copy link
    Member

    Thanks for the update. Since the fix will be in a future version of OS X 10.7 Lion, and which has not been released yet, so it is not appropriate to change mmap until there has been an opportunity to test it. But even then, we would need to be careful about adding a compile-time test as OS X binaries are often built to be compatible for a range of operating system version so avoid adding compilation conditionals unless really necessary. If after 10.7 is released and someone is able to test that it works as expected, the standard way to support it would be to use the Apple-supplied availability macros to test for the minimum supported OS level of the build assuming it makes enough of a performance difference to bother to do so: http://developer.apple.com/library/mac/#technotes/tn2064/_index.html

    (Modules/_ctypes/darwin/dlfcn_simple.c is one of the few that has this kind of test.)

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Jun 7, 2011

    @ Ned Deily <report@bugs.python.org> wrote (2011-06-07 19:43+0200):

    Thanks for the update. Since the fix will be in a future
    version of OS X 10.7 Lion, and which has not been released yet,
    so it is not appropriate to change mmap until there has been an
    opportunity to test it.

    It's really working fine. That i see that day!
    (Not that they start to fix the CoreAudio crashes...)

    But even then, we would need to be careful about adding
    a compile-time test as OS X binaries are often built to be
    compatible for a range of operating system version so avoid
    adding compilation conditionals unless really necessary.
    If after 10.7 is released and someone is able to test that it
    works as expected, the standard way to support it would be to
    use the Apple-supplied availability macros to test for the
    minimum supported OS level of the build assuming it makes enough
    of a performance difference to bother to do so

    Of course it only moves the delay from before mmap(2) to after
    close(2). Well, i don't know, if hardcoding is not an option,
    a dynamic sysctl(2) lookup may do:

        kern.version = Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011

    This is obviously not the right one. :)

    Ciao, Steffen
    sdaoden(*)(gmail.com)
    () ascii ribbon campaign - against html e-mail
    /\ www.asciiribbon.org - against proprietary attachments

    @vstinner
    Copy link
    Member

    vstinner commented Jun 7, 2011

    Yes, you should check the Mac OS X version at runtime (as you should check the Linux kernel at runtime). platform.mac_ver() uses something like:

    sysv = _gestalt.gestalt('sysv')
    if sysv:
      major = (sysv & 0xFF00) >> 8
      minor = (sysv & 0x00F0) >> 4
      patch = (sysv & 0x000F)

    Note: patch is not reliable with 'sysv', you have to use ('sys1','sys2','sys3').

    So if you would like to check that you have Mac OS 10.7 or later, you can do something like:

    sysv = _gestalt.gestalt('sysv')
    __MAC_10_7 = (sysv and (sysv >> 4) >= 0x0a7)

    In C, it should be something like:
    -------
    const OSType SYSV = 0x73797376U; /* 'sysv' in big endian */
    SInt32 response;
    OSErr iErr;
    iErr = Gestalt(SYSV, &response);
    if (iErr == 0 && (response >> 4) >= 0x0a7)
    /* have Mac OS >= 10.7 */
    -------

    I'm not sure of 0x73797376, I used hex(struct.unpack('!I', 'sysv')[0]).

    @ned-deily
    Copy link
    Member

    Victor, please do not use magic constants like that in C. The symbolic values are available in include files:

    #include <CoreServices/CoreServices.h>
    SInt32 major = 0;
    SInt32 minor = 0;   
    Gestalt(gestaltSystemVersionMajor, &major);
    Gestalt(gestaltSystemVersionMinor, &minor);
    if ((major == 10 && minor >= 7) || major >= 11) { ... }

    (See, for instance, http://www.cocoadev.com/index.pl?DeterminingOSVersion and http://stackoverflow.com/questions/2115373/os-version-checking-in-cocoa. The code in platform and _gestalt.c could stand to be updated at some point.)

    But, again, mmap should *not* be changed until 10.7 has been released and the Apple fix is verified and only if it makes sense to add the additional complexity.

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Jun 8, 2011

    Ok, this patch could be used.
    *Unless* the code is not protected by the GIL.

    • Gestalt usage is a bit more complicated according to

      http://www.cocoadev.com/index.pl?DeterminingOSVersion

      unless Python only supports OS X 10.4 and later.
      (And platform.py correctly states that in _mac_ver_gestalt(),
      but see below.)

    • Due to usage of Gestalt, '-framework CoreServices' must be
      linked against mmapmodule.c.
      The Python configuration stuff is interesting for me, i managed
      compilation by adding the line

      mmap mmapmodule.c -framework CoreServices

      to Modules/Setup, but i guess it's only OS X which is happy
      about that.

    platform.py: _mac_ver_xml() should be dropped entirely according
    to one of Ned Deily's links ("never officially supported"), and
    _mac_ver_gestalt() obviously never executed because i guess it
    would fail due to "versioninfo". Unless i missed something.

    By the way: where do you get the info from? "sys1", "sys2",
    "sys3"? Cannot find it anywhere, only the long names, e.g.
    gestaltSystemVersionXy.

    Note that i've mailed Apple. I did not pay 99$ or even 249$, so
    i don't know if there will be a response.

    Ciao, Steffen
    sdaoden(*)(gmail.com)
    () ascii ribbon campaign - against html e-mail
    /\ www.asciiribbon.org - against proprietary attachments

    @ronaldoussoren
    Copy link
    Contributor

    Steffen: _mac_ver_xml should not be dropped, it is a perfectly fine way to determine the system version. Discussing it is also off-topic for this issue, please keep the discussion focussed.

    Wrt. mailing Apple: I wouldn't expect and answer. Is there something specific you want to know? I'm currently at WWDC and might be able to ask the question at one of the labs (where Apple's engineers hang out).

    If it is really necessary to check for the OS version to enable the OSX-specific bugfix it is possible to look at the uname information instead of using gestalt. In particular something simular to this Python code:

       v = os.uname()[2]
       major = int(v.split('.')[0])
       if major <= 10:
          # We're on OSX 10.6 or earlier
          enableWorkaround()

    This tests the kernel version instead of the system version, but until now the major version of the kernel has increased with every major release of the OS and I have no reason to suspect that Lion will be any different.

    BTW2: OSX 10.7 is not released yet and should not be discussed in public fora, as you should know if you have legal access.

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Jun 9, 2011

    @ Ronald Oussoren wrote:

    if major <= 10:
    # We're on OSX 10.6 or earlier
    enableWorkaround()

    (You sound as if you participate in an interesting audiophonic
    event. 27" imac's are indeed great recording studio hardware.
    But no Coffee Shops in California - brrrrr.)

    Ciao, Steffen
    sdaoden(*)(gmail.com)
    () ascii ribbon campaign - against html e-mail
    /\ www.asciiribbon.org - against proprietary attachments

    @ronaldoussoren
    Copy link
    Contributor

    steffen: I have no idea what you are trying to say in your last message. Could you please try to stay on topic.

    @sdaoden
    Copy link
    Mannequin

    sdaoden mannequin commented Jul 6, 2011

    So sorry that i'm stressing this, hopefully it's the final message.
    Apples iterative kernel-update strategy resulted in these versions:

    14:02 ~/tmp $ /usr/sbin/sysctl kern.version
    kern.version: Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386
    14:02 ~/tmp $ gcc -o zt osxversion.c -framework CoreServices
    14:03 ~/tmp $ ./zt 
    OS X version: 10.6.8
    apple_osx_needs_fullsync: -1
    

    I.e. the new patch uses >10.7.0 or >=10.6.8 to avoid that
    FULLFSYNC disaster (even slower than the Macrohard memory
    allocator during "Wintel" partnership!), and we end up as:

    14:03 ~/src/cpython $ ./python.exe -E -Wd -m test -r -w -uall test_mmap
    Using random seed 8466468
    [1/1] test_mmap
    1 test OK.
    

    P.S.: i still have no idea how to do '-framework CoreServices'
    regulary. Like i've said in bpo-11046 i never used GNU Autoconf/M4,
    sorry. You know. Maybe the version check should be moved
    somewhere else and simply be exported, even replacing the stuff
    from platform.py? I don't know. Bye.

    Ciao, Steffen
    sdaoden(*)(gmail.com)
    () ascii ribbon campaign - against html e-mail
    /\ www.asciiribbon.org - against proprietary attachments

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants