classification
Title: configure --with-threads on cygwin => crash on thread related tests
Type: crash Stage:
Components: Extension Modules Versions: Python 3.0, Python 2.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, ocean-city, rpetrov
Priority: normal Keywords: patch

Created on 2008-09-23 16:55 by ocean-city, last changed 2010-04-27 20:31 by loewis. This issue is now closed.

Files
File name Uploaded Description
a.py ocean-city, 2008-09-23 16:55 the code to reproduce
reproduce.zip ocean-city, 2008-09-26 19:11
disable_setup_ssl_threads_on_cygwin.patch ocean-city, 2008-09-27 03:44
cygwin_crash.zip amaury.forgeotdarc, 2008-11-20 01:17
Messages (13)
msg73649 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-23 16:55
I'm not sure whether this is Python's bug or Cygwin's, but a thread-enabled
Python crashes in thread-related tests on Cygwin (e.g. test_exit in
test_sys.py, test_threading.py, etc.).

After some investigation, I found that the following workaround avoids the crash.

Index: Modules/_ssl.c
===================================================================
--- Modules/_ssl.c      (revision 66562)
+++ Modules/_ssl.c      (working copy)
@@ -1580,7 +1580,7 @@

        /* Init OpenSSL */
        SSL_load_error_strings();
-#ifdef WITH_THREAD
+#if defined(WITH_THREAD) && !defined(__CYGWIN__)
        /* note that this will start threading if not already started */
        if (!_setup_ssl_threads()) {
                return;

To investigate, I applied the following debugging patch (after reverting the workaround above).


Index: Modules/_ssl.c
===================================================================
--- Modules/_ssl.c      (revision 66562)
+++ Modules/_ssl.c      (working copy)
@@ -1517,6 +1517,8 @@
           lock. They can be useful for debugging.
        */

+        printf("-------> %d (%u) %s %d: %ul\n", n, mode & CRYPTO_LOCK,
+               file, line, PyThread_get_thread_ident());
+
        if ((_ssl_locks == NULL) ||
            (n < 0) || ((unsigned)n >= _ssl_locks_count))
                return;

And this is the result.

-------> 20 (1) mem_dbg.c 161: 6684680l
-------> 20 (0) mem_dbg.c 221: 6684680l
-------> 20 (1) mem_dbg.c 161: 6684680l
-------> 20 (0) mem_dbg.c 221: 6684680l
-------> 16 (1) ssl_ciph.c 273: 6684680l
-------> 16 (0) ssl_ciph.c 276: 6684680l
-------> 16 (1) ssl_ciph.c 277: 6684680l
-------> 20 (1) mem_dbg.c 161: 6684680l
-------> 20 (0) mem_dbg.c 221: 6684680l
-------> 20 (1) mem_dbg.c 161: 6684680l
-------> 20 (0) mem_dbg.c 221: 6684680l
-------> 16 (0) ssl_ciph.c 308: 6684680l
    started worker thread
    trying nonsensical thread id
    waiting for worker thread to get started
    verifying worker hasn't exited
    attempting to raise asynch exception in worker
    waiting for worker to say it caught the exception
-------> 1 (1) err.c 418: 7282896l
    all OK -- joining worker
-------> 1 (1) err.c 418: 7282896l
   6020 [unknown (0x650)] python 1156 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
Illegal instruction (core dumped)


Thread 7282896l tries to lock the same object twice. I'm not familiar with
OpenSSL or Python's thread support, so I cannot fix this.
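
For readers who want to check a trace like the one above mechanically, here is a minimal sketch in plain C. It is not part of the attached patch: the struct, the function name and the SKETCH_MAX_LOCKS bound are hypothetical, and it assumes each printed line has been turned into a (lock index, acquire/release, thread id) record by hand. It only flags the pattern "same thread acquires a lock it already holds"; it says nothing about whether that pattern is the cause of the crash.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_LOCKS 64  /* assumed upper bound, for this sketch only */

/* One line of the debug trace, transcribed by hand. */
struct lock_event {
    int n;                 /* lock index passed to the locking callback */
    int acquire;           /* nonzero if (mode & CRYPTO_LOCK) was set, 0 for a release */
    unsigned long thread;  /* thread ident printed at the end of the line (nonzero) */
};

/* Replays the trace and returns the index of the first lock that a thread
   acquires while that same thread already holds it, or -1 if none is found. */
int first_double_acquire(const struct lock_event *events, size_t count)
{
    unsigned long held_by[SKETCH_MAX_LOCKS] = {0};  /* 0 means "not held" */

    assert(events != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        int n = events[i].n;

        if (n < 0 || n >= SKETCH_MAX_LOCKS)
            continue;  /* skip indexes outside the sketch's table */

        if (events[i].acquire) {
            if (held_by[n] != 0 && held_by[n] == events[i].thread)
                return n;
            held_by[n] = events[i].thread;
        } else {
            held_by[n] = 0;
        }
    }
    return -1;
}
```

Fed the last two "1 (1) err.c 418" lines of the trace, both from thread 7282896, the function reports lock 1; a balanced acquire/release pair such as the mem_dbg.c lines is not reported.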

# Can the callback function registered with CRYPTO_set_locking_callback()
be called like this? How does PyThread_allocate_lock behave in this
situation? I don't know.

I used OpenSSL 0.9.8h installed via Cygwin setup.
msg73650 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-23 16:57
release25-maint is fine, probably because CRYPTO_set_locking_callback()
is not used in Modules/_ssl.c there. I haven't tried configure --with-threads
on py3k, but it is probably the same as on trunk.
msg73879 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-26 19:11
>Thread 7282896l tries to lock same object twice.
This was not the cause of the problem. I saw a crash after a single lock on another thread.

I was able to write C code that reproduces the crash (reproduce.zip).
But strangely, main.exe did not crash when it was built against
http://www.openssl.org/source/openssl-0.9.8h.tar.gz (the same version).

I compiled OpenSSL with "config -dCygwin" and "make". (I needed to fix
one broken link in the Include dir, though.)
msg73886 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-26 20:59
So it is a "Vendor OS problem", python is not involved.
msg73895 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-26 21:49
Sorry, I noticed one more thing. If main.exe is linked against libssl.dll.a
and libcrypto.dll.a it crashes, but linked against libssl.a and libcrypto.a
it does not. (I temporarily renamed the *.dll.a files.)

I'll try to build http://www.openssl.org/source/openssl-0.9.8h.tar.gz
in shared mode if possible.

>So it is a "Vendor OS problem", python is not involved.

Yes, possibly, since no other platform is crashing on the buildbots.
msg73901 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-26 22:27
I think I have the beginning of an explanation:
libssl.dll implements a DllMain function, whose DLL_THREAD_DETACH handler
calls ERR_remove_state.
At that point, the (POSIX) thread function has already exited;
pthread::exit() has already been called and the pthread object has been deleted.

The same (Win32) thread then calls sem_wait()... and may access
freed resources.

Linking against the static library does not have this problem.
msg73904 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-26 23:04
Thank you for the great explanation! You are probably right... I'll look
into the code.
msg73905 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-26 23:36
Maybe I could fix this OpenSSL bug with pthread_cleanup_push, but since it
is an OpenSSL bug, we cannot fix it directly.

I propose committing the workaround from msg73649 for the 2.6 release.
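
To make the pthread_cleanup_push idea concrete, here is a minimal sketch using only POSIX threads (no Python, no OpenSSL). The point is that the handler runs inside the POSIX thread, while its pthread object is still valid, rather than later from DLL_THREAD_DETACH. The names per_thread_cleanup, worker and run_worker_with_cleanup are hypothetical, and the counter merely stands in for whatever per-thread cleanup (for example ERR_remove_state) would really be done; whether OpenSSL could adopt this is an assumption, not something tested here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-thread cleanup; it only counts its calls. */
static void per_thread_cleanup(void *arg)
{
    int *calls = arg;

    assert(calls != NULL);
    ++*calls;
}

static void *worker(void *arg)
{
    /* Register the handler while the pthread object is still alive. */
    pthread_cleanup_push(per_thread_cleanup, arg);

    /* ... the thread's real work would go here ... */

    /* A nonzero argument pops the handler and runs it immediately. */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Starts one worker, joins it, and returns how many times the cleanup
   handler ran, or -1 if the thread could not be created or joined. */
int run_worker_with_cleanup(void)
{
    int calls = 0;
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, &calls) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return calls;
}
```

The handler also runs if the thread leaves through pthread_exit() or is cancelled while the handler is still pushed, which is why this mechanism is attractive for per-thread teardown.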
msg73906 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-26 23:52
And once OpenSSL is fixed, change it to

#if defined(WITH_THREAD) && !(defined(__CYGWIN__) && OPENSSL_VERSION_NUMBER < ???)
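
The same condition can be spelled out as an ordinary function, which makes the intended truth table easier to check. This is only a sketch: the function name and parameters are hypothetical, and the first fixed OpenSSL version is still unknown (the "???" above), so the caller has to supply it.

```c
#include <assert.h>

/* Returns 1 if _setup_ssl_threads() should be called, 0 otherwise.
   Mirrors: defined(WITH_THREAD) &&
            !(defined(__CYGWIN__) && OPENSSL_VERSION_NUMBER < first_fixed_version) */
int should_setup_ssl_threads(int with_thread, int is_cygwin,
                             unsigned long openssl_version,
                             unsigned long first_fixed_version)
{
    /* The real value is not known yet; the caller must pass a placeholder. */
    assert(first_fixed_version != 0);

    if (!with_thread)
        return 0;  /* no threads: never set up the locking callbacks */
    if (is_cygwin && openssl_version < first_fixed_version)
        return 0;  /* Cygwin with an affected OpenSSL: skip the setup */
    return 1;
}
```

So non-Cygwin threaded builds are unaffected, and Cygwin builds start calling _setup_ssl_threads() again as soon as the linked OpenSSL reaches the fixed version.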
msg73919 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-27 03:44
That workaround leaves an unused-function warning. The attached patch is a revised version.
msg76079 - (view) Author: Roumen Petrov (rpetrov) * Date: 2008-11-19 23:29
I'm not sure that the reported issue is an OpenSSL bug.
I just tested a GCC (MinGW) build of the test case in reproduce.zip with
OpenSSL 0.9.8i and "pthreads-w32". The test runs without problems on
NT 5.1 (XP).
msg76086 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-20 01:17
With Cygwin, calling sem_wait() in the DLL_THREAD_DETACH section of a DllMain function
can crash the program.
See the attached zip file: it contains two C files which only include pthread.h and
semaphore.h (no Python, no OpenSSL). The resulting program crashes ~30% of the time.

If this pattern (in dll.c) is not allowed, it's a problem in the OpenSSL code.
If it is allowed, it's a bug in Cygwin's threads implementation.

We should really move this discussion to Cygwin. This is no longer a Python issue.
msg76090 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-11-20 06:04
OK, I'll close this entry and post a message to the Cygwin mailing list
about this issue.
# I already posted it to openssl-dev, but there was no response.
http://www.nabble.com/Bug%3A-crash-on-cygwin-if-uses-CRYPTO_set_locking_callback-and-shared-library-to19699111.html#a19712690
History
Date User Action Args
2010-04-27 20:31:50 loewis set priority: normal
2008-11-20 06:04:24 ocean-city set status: open -> closed
    priority: critical -> (no value)
    resolution: wont fix
    messages: + msg76090
    keywords: - needs review
2008-11-20 01:18:00 amaury.forgeotdarc set files: + cygwin_crash.zip
    messages: + msg76086
2008-11-19 23:29:00 rpetrov set nosy: + rpetrov
    messages: + msg76079
2008-09-29 00:35:39 ocean-city set priority: critical
    components: + Extension Modules
    versions: + Python 3.0
2008-09-27 03:44:10 ocean-city set keywords: + patch, needs review
    files: + disable_setup_ssl_threads_on_cygwin.patch
    messages: + msg73919
2008-09-26 23:52:49 ocean-city set messages: + msg73906
2008-09-26 23:36:55 ocean-city set messages: + msg73905
2008-09-26 23:04:38 ocean-city set messages: + msg73904
2008-09-26 22:27:24 amaury.forgeotdarc set messages: + msg73901
2008-09-26 21:49:06 ocean-city set messages: + msg73895
2008-09-26 20:59:38 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg73886
2008-09-26 19:11:11 ocean-city set files: + reproduce.zip
    messages: + msg73879
2008-09-23 16:57:59 ocean-city set messages: + msg73650
2008-09-23 16:55:45 ocean-city create