classification
Title: time.strftime may hung while trying to open /etc/localtime but does not release GIL
Type: performance Stage: patch review
Components: Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, dorontal, haypo, loewis, pitrou
Priority: normal Keywords: patch

Created on 2010-01-19 09:23 by dorontal, last changed 2011-01-11 00:11 by belopolsky.

Files
File name Uploaded Description Edit
time_strftime_gil.patch haypo, 2010-01-21 01:37
Messages (10)
msg98049 - (view) Author: Doron Tal (dorontal) Date: 2010-01-19 09:23
I've encountered a hung of python process for more than a second. It appears that the stall happens due to time.strftime call, which internally opens a file ('/etc/localtime'). I think it is best if the GIL would have been released to allow other threads to continue working.

Snippet from strace on:
-----------------------
import time
time.strftime('%Z')
-----------------------
Note the line:
open("/etc/localtime", O_RDONLY)        = 4


----=== strace output Starts here  ===---
stat("/usr/local/lib/python2.6/lib-old/time", 0x7fff871deff0) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/time.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/timemodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/time.py", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-old/time.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/python2.6/lib-dynload/time", 0x7fff871deff0) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/python2.6/lib-dynload/time.so", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=46995, ...}) = 0
futex(0x393b4030ec, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/local/lib/python2.6/lib-dynload/time.so", O_RDONLY) = 4
read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\30\0\0\0\0\0\0"..., 832) = 832
fstat(4, {st_mode=S_IFREG|0755, st_size=46995, ...}) = 0
mmap(NULL, 2115944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x2adf26fba000
mprotect(0x2adf26fbd000, 2097152, PROT_NONE) = 0
mmap(0x2adf271bd000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x3000) = 0x2adf271bd000
close(4)                                = 0
time(NULL)                              = 1263890749
open("/etc/localtime", O_RDONLY)        = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=2197, ...}) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=2197, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2adf271bf000
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 4096) = 2197
lseek(4, -1394, SEEK_CUR)               = 803
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 4096) = 1394
close(4)                                = 0
munmap(0x2adf271bf000, 4096)            = 0
close(3)                                = 0
time(NULL)                              = 1263890749
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2197, ...}) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x393b60e930}, {0x4c4730, [], SA_RESTORER, 0x393b60e930}, 8) = 0
exit_group(0)                           = ?
msg98055 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-19 13:54
You might want to investigate which C library call opens this file, and propose a patch to release the GIL before doing that call. As far as I can tell, Python itself does not open /etc/localtime in any place.
msg98066 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-01-19 20:27
I'd also be curious why it takes a second to open /etc/localtime. That sounds like a bug in the C library, or a mismanaged system.
msg98069 - (view) Author: Doron Tal (dorontal) Date: 2010-01-19 21:36
We didn't prove this point yet, currently it's merely circumstantial.
The application calls strftime few dozen times each second from different
threads. We spot the stall using another two threads, the first takes a time
stamp before calling a python function and reset the time stamp when the
python function is done. The other thread watches the timestamp and if it
gets over a certain limit (2 seconds) it aborts the process, leaving a core
dump behind.

We've build this infrastructure to tackle python stalls caused by
misbehaving C calls which takes long but does not release the GIL.

Here is the traceback (one of the threads):

#0  0x0000003424ac5800 in __open_nocancel () from /lib64/libc.so.6

#1  0x0000003424a6c383 in __GI__IO_file_open () from /lib64/libc.so.6

#2  0x0000003424a6c43c in _IO_new_file_fopen () from /lib64/libc.so.6

#3  0x0000003424a61764 in __fopen_internal () from /lib64/libc.so.6

#4  0x0000003424a8d666 in __tzfile_read () from /lib64/libc.so.6

#5  0x0000003424a8c56f in tzset_internal () from /lib64/libc.so.6

#6  0x0000003424a8d010 in tzset () from /lib64/libc.so.6

#7  0x0000003424a91934 in strftime_l () from /lib64/libc.so.6

#8  0x00002aea01bced34 in ?? () from

/usr/lib64/python2.4/lib-dynload/timemodule.so

#9  0x00000034304360f0 in PyObject_Call () from
/usr/lib64/libpython2.4.so.1.0 #10 0x0000003430436bbc in PyObject_CallMethod
() from /usr/lib64/libpython2.4.so.1.0

#11 0x00002aea01dda64a in ?? () from

/usr/lib64/python2.4/lib-dynload/datetime.so

#12 0x00002aea01dda9c3 in ?? () from

/usr/lib64/python2.4/lib-dynload/datetime.so

#13 0x00000034304950ba in PyEval_EvalFrame () from
/usr/lib64/libpython2.4.so.1.0

#14 0x0000003430494b66 in PyEval_EvalFrame () from
/usr/lib64/libpython2.4.so.1.0

#15 0x0000003430495fe5 in PyEval_EvalCodeEx () from
/usr/lib64/libpython2.4.so.1.0

#16 0x000000343044c45a in ?? () from /usr/lib64/libpython2.4.so.1.0

#17 0x00000034304360f0 in PyObject_Call () from
/usr/lib64/libpython2.4.so.1.0

#18 0x000000343049352c in PyEval_EvalFrame () from
/usr/lib64/libpython2.4.so.1.0

#19 0x0000003430495fe5 in PyEval_EvalCodeEx () from
/usr/lib64/libpython2.4.so.1.0 #20 0x000000343044c45a in ?? () from
/usr/lib64/libpython2.4.so.1.0

#21 0x00000034304360f0 in PyObject_Call () from
/usr/lib64/libpython2.4.so.1.0

#22 0x000000343049352c in PyEval_EvalFrame () from
/usr/lib64/libpython2.4.so.1.0

#23 0x0000003430494b66 in PyEval_EvalFrame () from
/usr/lib64/libpython2.4.so.1.0

#24 0x0000003430495fe5 in PyEval_EvalCodeEx () from
/usr/lib64/libpython2.4.so.1.0

#25 0x000000343044c367 in ?? () from /usr/lib64/libpython2.4.so.1.0

#26 0x00000034304360f0 in PyObject_Call () from
/usr/lib64/libpython2.4.so.1.0

#27 0x000000343043c1ef in ?? () from /usr/lib64/libpython2.4.so.1.0

#28 0x00000034304360f0 in PyObject_Call () from
/usr/lib64/libpython2.4.so.1.0

#29 0x000000343048fc3d in PyEval_CallObjectWithKeywords () from
/usr/lib64/libpython2.4.so.1.0 #30 0x00000034304bbbbd in ?? () from
/usr/lib64/libpython2.4.so.1.0

#31 0x00000034256064a7 in start_thread () from /lib64/libpthread.so.0

#32 0x0000003424ad3c2d in clone () from /lib64/libc.so.6

So, we didn't actually proved that opening /etc/localtime takes so long. I
assumed that it is better to release the GIL prior to opening a file.

In any case, it won't solve our problem, since we use old Python 2.4.3
(CentOS5.4)...

--doron

On Tue, Jan 19, 2010 at 10:27 PM, Martin v. Löwis <report@bugs.python.org>wrote:

>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
> I'd also be curious why it takes a second to open /etc/localtime. That
> sounds like a bug in the C library, or a mismanaged system.
>
> ----------
> nosy: +loewis
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue7739>
> _______________________________________
>
msg98088 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-01-21 01:01
> It appears that the stall happens due to time.strftime call, which internally opens 
> a file ('/etc/localtime')

In the GNU libc, strftime() calls tzset() and tzset() uses a lock to be thread safe. Yes, releasing the GIL before calling strftime() would speed up your use case.
msg98092 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-01-21 01:37
Here you have a patch releasing the GIL for the call to strftime().
msg98093 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-01-21 01:47
time_tzset() might also release the GIL, but:
 - i hope that multithreaded programs don't call regulary tzset() in different threads
 - time_tzset() calls tzset() and inittimezone()
 - inittimezone() modify the time module attributes and may call tzset() (depending on the OS / C library). it's also called from inittime()

So I prefer to leave it unchanged :-)
msg107415 - (view) Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-06-09 20:55
Victor,

I understand your last comment as wanting to keep the GIL while calling tzset(), but release it around the strftime() call.  You still want to apply your patch, right?

I think the problem is not with a concurrent tzset() call, but with another thread changing TZ environment variable.

See http://sourceware.org/bugzilla/show_bug.cgi?id=4350
msg107614 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-06-12 00:20
> I understand your last comment as wanting to keep the GIL while calling
> tzset(), but release it around the strftime() call.

Yes

> You still want to apply your patch, right?

http://sourceware.org/bugzilla/show_bug.cgi?id=4350 describes a crash. This issue is not a crash but a concurrency problem. I don't know if my patch fixes the initial problem: dorontal didn't try it and he doesn't explain us how to reproduce it.

I also not sure that releasing the GIL may not introduce a new concurrency issue: it might crash if a thread calls strftime() and another thread calls setenv("TZ", "...") (if I understood correctly the GNU libc bug 4350).
msg125964 - (view) Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2011-01-11 00:11
-1 on releasing the GIL around strftime().  Until we have an implementation that can support multiple locales and multiple timezones, people will work around this limitation by messing up with the TZ environment variable or global locale settings.
History
Date User Action Args
2011-01-11 00:11:27belopolskysetversions: + Python 3.3, - Python 2.6
nosy: loewis, belopolsky, pitrou, haypo, dorontal
messages: + msg125964

assignee: belopolsky ->
type: performance
stage: patch review
2010-06-12 00:20:19hayposetmessages: + msg107614
2010-06-09 20:55:10belopolskysetassignee: belopolsky

messages: + msg107415
nosy: + belopolsky
2010-01-21 01:47:39hayposetmessages: + msg98093
2010-01-21 01:37:42hayposetfiles: + time_strftime_gil.patch
keywords: + patch
messages: + msg98092
2010-01-21 01:37:06hayposetfiles: - unnamed
2010-01-21 01:01:23hayposetnosy: + haypo
messages: + msg98088
2010-01-19 21:36:53dorontalsetfiles: + unnamed

messages: + msg98069
2010-01-19 20:27:45loewissetnosy: + loewis
messages: + msg98066
2010-01-19 13:54:46pitrousetnosy: + pitrou
messages: + msg98055
2010-01-19 09:23:55dorontalcreate