Message 168812 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	grahamd
Recipients	grahamd, ncoghlan, pitrou
Date	2012-08-21.22:14:30
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1345587272.98.0.214034944684.issue15751@psf.upfronthosting.co.za>
In-reply-to

Content
In both embedded mode and daemon mode of mod_wsgi, albeit how thread pool is managed is different, there is a fixed number of threads with those being dedicated to handling web requests. On a request arriving next available thread from the thread pool handles accepting of request at C code level, that thread may then map to any WSGI application and so any sub interpreter, or even the main interpreter. Thus there is no one to one mapping between thread and (sub)interpreter. The way the mod_wsgi code works now is that when it knows it will be calling into the main interpreter, it uses PyGILState_Ensure(). If not, it will use a thread state for that thread specific to the sub interpreter it is calling in to. At the end of the request, the thread state is remembered and not thrown away so that thread locals still work for that thread across requests for that sub interpreter. Thus, there can be more than one thread state per thread, but this is fine so long as it is only used against the sub interpreter it was created for. This is actually an enforced requirement of Python, because if you create more than one thread state for a thread for the same sub interpreter, or even an additional one for the main interpreter when there is also the auto TLS, then Python will die if you compile and run it is in debug mode. Now, since mod_wsgi always knows which interpreter it is calling into, the intent was that there was this single API call so that mod_wsgi could say that at this time, this thread is going to be calling into that interpreter. It could then just call PyGILState_Ensure(). Any third party module then which uses the simplistic calling sequence of calling PyGILState_Release() on exiting Python code and thence within the same thread calling PyGILState_Ensure() when coming back into Python with a callback will work, as mod_wsgi has specified the interpreter context for that thread at that time. As pointed out, if a third party module was creating its own background threads at C level and calling PyGILState_Ensure() when calling back into Python code, this could pose a problem. This could also be an issue for Python created background threads. In the case of the latter, if a Python thread is created in a specific sub interpreter, it should automatically designate for that thread that that is its interpreter context, so if it calls out and does the Release/Ensure dance, that it goes back into the same sub interpreter. The C initiated thread is a bit more complicated though and may not be solvable, but a lot of the main third party modules which don't work in sub interpreters, such as lxml, don't use background threads, so the simplistic approach means that will work at least. So, in summary, saw a single API call which allowed designation of which interpreter a thread is operating against, overriding the implicit default of the main interpreter. PyGILState API will need to manage a set of interpreter states for each interpreter, with potential for more than one thread state for a thread due to a thread being able to call into multiple interpreters at different times.

In both embedded mode and daemon mode of mod_wsgi, albeit how thread pool is managed is different, there is a fixed number of threads with those being dedicated to handling web requests.

On a request arriving next available thread from the thread pool handles accepting of request at C code level, that thread may then map to any WSGI application and so any sub interpreter, or even the main interpreter.

Thus there is no one to one mapping between thread and (sub)interpreter.

The way the mod_wsgi code works now is that when it knows it will be calling into the main interpreter, it uses PyGILState_Ensure(). If not, it will use a thread state for that thread specific to the sub interpreter it is calling in to. At the end of the request, the thread state is remembered and not thrown away so that thread locals still work for that thread across requests for that sub interpreter.

Thus, there can be more than one thread state per thread, but this is fine so long as it is only used against the sub interpreter it was created for.

This is actually an enforced requirement of Python, because if you create more than one thread state for a thread for the same sub interpreter, or even an additional one for the main interpreter when there is also the auto TLS, then Python will die if you compile and run it is in debug mode.

Now, since mod_wsgi always knows which interpreter it is calling into, the intent was that there was this single API call so that mod_wsgi could say that at this time, this thread is going to be calling into that interpreter. It could then just call PyGILState_Ensure().

Any third party module then which uses the simplistic calling sequence of calling PyGILState_Release() on exiting Python code and thence within the same thread calling PyGILState_Ensure() when coming back into Python with a callback will work, as mod_wsgi has specified the interpreter context for that thread at that time.

As pointed out, if a third party module was creating its own background threads at C level and calling PyGILState_Ensure() when calling back into Python code, this could pose a problem. This could also be an issue for Python created background threads.

In the case of the latter, if a Python thread is created in a specific sub interpreter, it should automatically designate for that thread that that is its interpreter context, so if it calls out and does the Release/Ensure dance, that it goes back into the same sub interpreter.

The C initiated thread is a bit more complicated though and may not be solvable, but a lot of the main third party modules which don't work in sub interpreters, such as lxml, don't use background threads, so the simplistic approach means that will work at least.

So, in summary, saw a single API call which allowed designation of which interpreter a thread is operating against, overriding the implicit default of the main interpreter. PyGILState API will need to manage a set of interpreter states for each interpreter, with potential for more than one thread state for a thread due to a thread being able to call into multiple interpreters at different times.

History
Date	User	Action	Args
2012-08-21 22:14:33	grahamd	set	recipients: + grahamd, ncoghlan, pitrou
2012-08-21 22:14:32	grahamd	set	messageid: <1345587272.98.0.214034944684.issue15751@psf.upfronthosting.co.za>
2012-08-21 22:14:32	grahamd	link	issue15751 messages
2012-08-21 22:14:30	grahamd	create