classification
Title: Race in test_docxmlrpc.py
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: earl.chew, martin.panter, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-07-25 15:05 by earl.chew, last changed 2016-08-20 09:24 by martin.panter. This issue is now closed.

Files
File name Uploaded Description Edit
test_docxmlrpc.py earl.chew, 2016-07-25 15:05
setup-before-thread.patch martin.panter, 2016-08-04 02:49 review
setup-before-thread.v2.patch martin.panter, 2016-08-04 03:58 review
Messages (12)
msg271279 - (view) Author: (earl.chew) * Date: 2016-07-25 15:05
The test test_docxmlrpc.py will sometimes hang because of a timing race.

I've verified that this code is the same up to version 3.5 and master at https://github.com/python/cpython/blob/master/Lib/test/test_docxmlrpc.py

A proposed patch is attached.
msg271309 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-26 00:17
I don’t understand why we have so many tests that assign the server port in the server thread, and then use some sort of synchronization to get it to the client thread. IMO it would be simpler in this case to do something like:

def setUp(self):
    serv = DOCXMLRPCServer(...)
    self.addCleanup(serv.server_close)
    [_, PORT] = serv.server_address  # Eliminates “ready“ event
    # Other server setup here
    thread = threading.Thread(target=serv.serve_forever)
    thread.start()
    self.addCleanup(thread.join)  # Instead of self.evt
    self.addCleanup(serv.shutdown)
    self.client = httplib.HTTPConnection("localhost:%d" % PORT)
    self.addCleanup(self.client.close)
msg271314 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-26 01:04
Because the real port number doesn't exist until the server thread starts running and binds the socket using port number 0.
msg271316 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-26 01:28
The whole point of my suggestion was to bind and set the server socket to listing mode before starting the other thread. The socketserver constructor should do this before returning:

>>> s = DocXMLRPCServer(("localhost", 0))  # Calls bind() and listen()
>>> s.server_address  # Non-zero port has been allocated
('127.0.0.1', 49806)
>>> c = HTTPConnection(*s.server_address)
>>> c.request("OPTIONS", "*")  # No connection refused error
msg271376 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-26 15:01
OK, that's a good point.  So I don't know the answer to your question.  In some cases it may be mostly that the tests are old and written when the tooling was not as good.
msg271939 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-04 02:49
Here is a patch with my idea of how it should work
msg271946 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-04 03:58
It seems the test infrastructure likes all references to thread objects to be deleted, even if they are no longer running.
msg271959 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-04 08:56
I have a doubt. Added a question on Rietveld.
msg272022 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-05 10:05
Earl: Can you give any more details on your original hang or race condition? Was it related to setting PORT, or shutting down the server, or something else? It is not clear from your patch. I have tried adding artificial sleep() calls at various points but that did not uncover anything.

I’m sorry, but in my enthusiasm for rewriting the test I didn’t properly understand your original problem :)
msg272037 - (view) Author: (earl.chew) * Date: 2016-08-05 15:16
In the original code, the key to the failure I observed is:

        # wait for port to be assigned
        n = 1000
        while n > 0 and PORT is None:
            time.sleep(0.001)
            n -= 1

This gives a fixed deadline for the server thread to create the DocXMLRPCServer instance and provide the corresponding TCP port on which it is listening.

If the server thread is late (or if an exception is thrown -- but this case might work out ok), the TCP port will not be available in the variable PORT. In this case, the client thread blunders on, and inadvertently fails because PORT == None.

Upon failure, the test case tries to complete by tearing down the test:

     def tearDown(self):
         self.client.close()
 
         self.evt.wait()

The test case waits for self.evt to indicate that the server has completed. However, the server thread is running this:

        while numrequests > 0:
            serv.handle_request()
            numrequests -= 1

In other words, the test is deadlocked because the server is waiting to process requests (actually it's waiting for a single request) from the client, but the client has already given up and is waiting for the server thread to complete.
msg272492 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-12 03:20
Thanks for the explanation. It seems a bit strange that the server thread was running so slow while the main thread did one thousand polls over at least one second. Perhaps there is a blocking DNS call hidden somewhere in it somewhere? In any case, I am pretty confident my patch should help, so I will commit it unless there are any objections.
msg273192 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-08-20 07:45
New changeset 397f05044172 by Martin Panter in branch '3.5':
Issue #27614: Avoid race in test_docxmlrpc server setup
https://hg.python.org/cpython/rev/397f05044172

New changeset 7136304ecf4c by Martin Panter in branch '2.7':
Issue #27614: Avoid race in test_docxmlrpc server setup
https://hg.python.org/cpython/rev/7136304ecf4c

New changeset 17d688dedfca by Martin Panter in branch 'default':
Issue #27614: Merge test_docxmlrpc from 3.5
https://hg.python.org/cpython/rev/17d688dedfca
History
Date User Action Args
2016-08-20 09:24:51martin.pantersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2016-08-20 07:45:17python-devsetnosy: + python-dev
messages: + msg273192
2016-08-12 03:20:18martin.pantersetmessages: + msg272492
2016-08-05 15:16:49earl.chewsetmessages: + msg272037
2016-08-05 10:06:00martin.pantersetmessages: + msg272022
2016-08-04 08:56:15serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg271959
2016-08-04 03:58:22martin.pantersetfiles: + setup-before-thread.v2.patch

messages: + msg271946
2016-08-04 02:49:44martin.pantersetfiles: + setup-before-thread.patch
keywords: + patch
messages: + msg271939
2016-07-26 15:01:35r.david.murraysetmessages: + msg271376
2016-07-26 01:28:53martin.pantersetmessages: + msg271316
2016-07-26 01:04:08r.david.murraysetnosy: + r.david.murray
messages: + msg271314
2016-07-26 00:17:50martin.pantersetversions: - Python 3.2, Python 3.3, Python 3.4
nosy: + martin.panter

messages: + msg271309

components: + Tests, - Library (Lib)
stage: patch review
2016-07-25 15:05:39earl.chewcreate