Issue14632
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012-04-20 13:57 by phlogistonjohn, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
test.py | phlogistonjohn, 2012-04-20 13:57 | |||
watchedfilehandler.patch | phlogistonjohn, 2012-04-20 15:10 | patch | ||
watchedfilehandler.patch | phlogistonjohn, 2012-04-22 18:41 | patch, 2nd try | ||
test_logging_race.diff | neologix, 2012-04-24 18:55 | review |
Messages (20) | |||
---|---|---|---|
msg158829 - (view) | Author: JohnM (phlogistonjohn) | Date: 2012-04-20 13:57 | |
The (very handy) logging.handlers.WatchedFileHandler appears to be susceptible to a race condition that causes occasional OSErrors during the normal course of operations: Traceback (most recent call last): File "test.py", line 58, in test_race log.warning('Foo bar %d', ii) File "/usr/lib64/python2.7/logging/__init__.py", line 1144, in warning self._log(WARNING, msg, args, **kwargs) File "/usr/lib64/python2.7/logging/__init__.py", line 1250, in _log self.handle(record) File "/usr/lib64/python2.7/logging/__init__.py", line 1260, in handle self.callHandlers(record) File "/usr/lib64/python2.7/logging/__init__.py", line 1300, in callHandlers hdlr.handle(record) File "/usr/lib64/python2.7/logging/__init__.py", line 744, in handle self.emit(record) File "/usr/lib64/python2.7/logging/handlers.py", line 412, in emit stat = os.stat(self.baseFilename) OSError: [Errno 2] No such file or directory: '/tmp/tmpG6_luRTestRaceWatchedFileHandler' Looking at the implementation of the handler's emit function, I've commented on where I think the race can occur. Logrotate (or some similar application) is deleting/unlinking the logfile between the first os.path.exists and os.stat calls: def emit(self, record): """ Emit a record. First check if the underlying file has changed, and if it has, close the old stream and reopen the file to get the current stream. """ if not os.path.exists(self.baseFilename): # ^^^ race occurs between here stat = None changed = 1 else: stat = os.stat(self.baseFilename) # ^^^ and this stat call changed = (stat[ST_DEV] != self.dev) or (stat[ST_INO] != self.ino) if changed and self.stream is not None: self.stream.flush() self.stream.close() self.stream = self._open() if stat is None: stat = os.stat(self.baseFilename) self.dev, self.ino = stat[ST_DEV], stat[ST_INO] logging.FileHandler.emit(self, record) I've attached a test script that attempts to recreate the issue. On my Linux system running the script is able to trigger the issue about every 7 of 10 runs. |
|||
msg158832 - (view) | Author: JohnM (phlogistonjohn) | Date: 2012-04-20 15:10 | |
I'm attaching a patch of an emit function implementation that reduces the number of stats and uses fstat were we can. Using fstat should also prevent any races between opening the new file and stating the path in the file system. Using this patch or something along these lines should avoid the time of check to time of use issues with the current version with repeated stat calls. |
|||
msg158915 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2012-04-21 11:15 | |
Thanks for the report and patch. I'm making Vinay nosy, as he's the logging expert. A couple remarks: - it would be nice to add your testcase to the logging tests (Lib/test/test_logging.py) - I think you could probably save the whole fstat() result (instead of st_dev and st_inode): this would allow you to use os.pathsamestat() to check whether we're still refering to the same file, and it could maybe be useful someday (maybe use st_mtime) - typo: "existance" -> "existence" - I see you're catching OSError when calling fstat(): however, I don't see what error could be raised apart from EBADF (if the FD is closed, then it's likely a bug that shouldn't be masked, and it will blow up on the next write()), or I/O errors, that shouldn't be silenced |
|||
msg158976 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-04-22 17:47 | |
There's certainly a problem to be addressed here, but integrating the test into the regular test suite represents a problem in that the test takes around 10 minutes to run, and sometimes completes successfully. I'm not sure a 10-minute extension of the time taken to run regrtest will be acceptable, but if the test is skipped unless explicitly enabled, then I'm not sure it'll add much value. I will look at the patch and Charles-François' review comments soon. |
|||
msg158979 - (view) | Author: JohnM (phlogistonjohn) | Date: 2012-04-22 18:41 | |
Thank you both for looking at this. I've added an updated version of the patch that incorporates the last two suggestions that Charles-François made. I agree that this test may not be appropriate for the python test suite, due to the length and non-determinism of the test. I don't know what the suite's policy on monkey patching stdlib in the tests is, but monkey patching either os.path.exists or os.stat to remove the file at the appropriate time could be one way to make the test fast and deterministic. Seems a bit dirty to me though. :-) I am up for adding some kind of test to the suite for the WatchedFileHandler, though. I'm pretty ambivalent about keeping the whole stat structure around, since the samestat method is just a wrapper around what the emit function is already doing, and we'd be keeping more memory (although a small amount) around. I doubt we'd want to look at the timestamps because they could legitimately change in ways this check doesn't care about. |
|||
msg158980 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2012-04-22 18:42 | |
> There's certainly a problem to be addressed here, but integrating the test into the regular test suite represents a problem in that the test takes around 10 minutes to run, and sometimes completes successfully. I'm not sure a 10-minute extension of the time taken to run regrtest will be acceptable, but if the test is skipped unless explicitly enabled, then I'm not sure it'll add much value. There's no reason to wait for 10 minutes: one can reduce the number of iterations greatly, so that it can run in a matter of seconds. Since the test is run by all the builders multiple times per day, should the race condition resurface, it will trigger a buildbot failure some day. No need for it to trigger a failure every time. |
|||
msg158982 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-04-22 19:16 | |
John: Thanks for the updated patch. Charles-François: Certainly I can reduce the iterations to make the test faster. As it is, I did reproduce the failure on my dev system, but only once, after running John's test script about a dozen times: every other time, it completed successfully :-( Isn't it simpler if I just replace the os.path.exists() calls with os.stat(), and check for ENOENT if an exception of type OSError or WindowsError occurs? |
|||
msg159046 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2012-04-23 17:05 | |
> Charles-François: Certainly I can reduce the iterations to make the test > faster. As it is, I did reproduce the failure on my dev system, but only > once, after running John's test script about a dozen times: every other > time, it completed successfully :-( Juste reduce the sleep times, e.g.: """ @@ -17,7 +17,7 @@ os.unlink(fname) except OSError: pass - stime = 0.04 * random.randint(0, 4) + stime = 0.004 * random.randint(0, 4) time.sleep(stime) @@ -50,7 +50,7 @@ log.setLevel(logging.INFO) for ii in range(LOGCOUNT): - stime = 0.05 # * random.randint(0, 4) + stime = 0.005 # * random.randint(0, 4) time.sleep(stime) log.warning('Foo bar %d', ii) """ With this change, I can trigger a failure reliably in around 1s, and my computer is rather slow. > Isn't it simpler if I just replace the os.path.exists() calls with > os.stat(), and check for ENOENT if an exception of type OSError or > WindowsError occurs? The problem is that it would leave a race window if the file is changed between the time it's opened (I guess in logging.FileHandler.__init__()) and the first call to stat(). John's patch is safe in this regard, thanks to fstat(). |
|||
msg159070 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-04-23 20:02 | |
[snip] >With this change, I can trigger a failure reliably in around 1s, and >my computer is rather slow. I'm working in a VM, and although I can get John's script to fail more regularly (with the reduced timeouts and counts of 1000), a version of the test which I added to test_logging always succeeds. That code is: @unittest.skipUnless(threading, 'Threading required for this test.') def test_race(self): # Issue #14632 refers. def remove_loop(fname, tries): for _ in range(tries): try: os.unlink(fname) except OSError: pass time.sleep(0.004 * random.randint(0, 4)) def cleanup(remover, fn, handler): handler.close() remover.join() if os.path.exists(fn): os.unlink(fn) fd, fn = tempfile.mkstemp('.log', 'test_logging-3-') os.close(fd) del_count = 1000 log_count = 1000 remover = threading.Thread(target=remove_loop, args=(fn, del_count)) remover.daemon = True remover.start() h = logging.handlers.WatchedFileHandler(fn) self.addCleanup(cleanup, remover, fn, h) f = logging.Formatter('%(asctime)s: %(levelname)s: %(message)s') h.setFormatter(f) for _ in range(log_count): time.sleep(0.005) r = logging.makeLogRecord({'msg': 'testing' }) h.handle(r) I can't see why this always works, while John's script sometimes fails. >The problem is that it would leave a race window if the file is >changed between the time it's opened (I guess in >logging.FileHandler.__init__()) and the first call to stat(). >John's patch is safe in this regard, thanks to fstat(). Oh, right - missed that. |
|||
msg159190 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2012-04-24 18:55 | |
> I can't see why this always works, while John's script sometimes fails. It does fail consistently on my machine. I'm attaching the diff just in case. |
|||
msg159226 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2012-04-24 22:33 | |
New changeset 15a33d7d2b50 by Vinay Sajip in branch '2.7': Issue #14632: Updated WatchedFileHandler to deal with race condition. Thanks to John Mulligan for the problem report and patch. http://hg.python.org/cpython/rev/15a33d7d2b50 New changeset 5de7c3d64f2a by Vinay Sajip in branch '3.2': Issue #14632: Updated WatchedFileHandler to deal with race condition. Thanks to John Mulligan for the problem report and patch. http://hg.python.org/cpython/rev/5de7c3d64f2a New changeset 380821b47872 by Vinay Sajip in branch 'default': Issue #14632: Updated WatchedFileHandler to deal with race condition. Thanks to John Mulligan for the problem report and patch. http://hg.python.org/cpython/rev/380821b47872 |
|||
msg159228 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-04-24 22:42 | |
I will leave this open for a few days and see if the buildbots throw up anything. |
|||
msg159387 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-04-26 15:47 | |
Buildbots seem not to be complaining, so closing. |
|||
msg159770 - (view) | Author: Brian Curtin (brian.curtin) * ![]() |
Date: 2012-05-02 01:55 | |
The test for this issue seems to fail about half of the time on Windows. ====================================================================== ERROR: test_race (test.test_logging.HandlerTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\python-dev\cpython-main\lib\test\test_logging.py", line 605, in test_race h = logging.handlers.WatchedFileHandler(fn, delay=delay) File "c:\python-dev\cpython-main\lib\logging\handlers.py", line 421, in __init__ logging.FileHandler.__init__(self, filename, mode, encoding, delay) File "c:\python-dev\cpython-main\lib\logging\__init__.py", line 965, in __init__ StreamHandler.__init__(self, self._open()) File "c:\python-dev\cpython-main\lib\logging\__init__.py", line 987, in _open return open(self.baseFilename, self.mode, encoding=self.encoding) PermissionError: [Errno 13] Permission denied: 'c:\\users\\brian\\appdata\\local\\temp\\test_logging-3-lxjo5t.log' I also get this failure when running in the VS2010 branch: ====================================================================== ERROR: test_race (test.test_logging.HandlerTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\python-dev\porting\cpython\lib\test\test_logging.py", line 600, in cleanup os.unlink(fn) PermissionError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\brian\\appdata\\local\\temp\\test_logging3-6ujeil.log' |
|||
msg159820 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-05-02 21:01 | |
Re. the error on open(), I have no idea why it's failing. That's just open, and it appears to be trying to open a file in a directory for which one would expect you to have write access. Re. the error on unlink(), could that be an older version of the code that you're running? There was a cleanup() function, but I have removed it (and re-tested) before closing the issue. The 3.x buildbots appear not to be showing this failure. |
|||
msg159821 - (view) | Author: Brian Curtin (brian.curtin) * ![]() |
Date: 2012-05-02 21:22 | |
I'm seeing this with the current tip 8635825b9734. I wouldn't trust the build slaves with a race condition test since they're incredibly slow machines, but this issue isn't about the race. That path really should be accessible so I'm not sure why it's failing. It's also failing for Richard (sbt) as noted while working on the VS2010 port. |
|||
msg159825 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-05-03 00:14 | |
The only thing I can think of is that the file is kept open by something else (the other thread). |
|||
msg159829 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-05-03 01:19 | |
The changeset you mentioned doesn't have a cleanup function in test_race (an older version used to) - so I'm not sure where that error is coming from. It's going to be hard for me to reproduce this - I was finding it pretty hard to throw up the original race condition which led to the test being added. If you are getting the error repeatably, it would be really handy if you could tweak the test to generate some more diagnostic information - e.g. if the file already exists when the open fails (=> it's held open by something else). Are you running indexing or anti-virus software on the machine where you get the failures? That can hold a file open when you're not expecting it. |
|||
msg159830 - (view) | Author: Brian Curtin (brian.curtin) * ![]() |
Date: 2012-05-03 01:22 | |
I have exemptions set in AV for my dev folders for exactly that reason :) I'll try and poke around and get more info. |
|||
msg159862 - (view) | Author: Vinay Sajip (vinay.sajip) * ![]() |
Date: 2012-05-03 15:31 | |
I noticed that in my cleanup code, I had the lines h.close() remover.join() but it makes more sense for these to be in the opposite order. I've made that changed and pushed it up (for 2.7, 3.2 and default). |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:29 | admin | set | github: 58837 |
2012-05-03 15:31:00 | vinay.sajip | set | messages: + msg159862 |
2012-05-03 01:22:52 | brian.curtin | set | messages: + msg159830 |
2012-05-03 01:19:23 | vinay.sajip | set | messages: + msg159829 |
2012-05-03 00:14:38 | vinay.sajip | set | messages: + msg159825 |
2012-05-02 21:22:14 | brian.curtin | set | messages: + msg159821 |
2012-05-02 21:01:11 | vinay.sajip | set | messages: + msg159820 |
2012-05-02 01:55:46 | brian.curtin | set | nosy:
+ brian.curtin messages: + msg159770 |
2012-04-26 15:47:52 | vinay.sajip | set | status: open -> closed messages: + msg159387 |
2012-04-24 22:42:10 | vinay.sajip | set | resolution: fixed messages: + msg159228 |
2012-04-24 22:33:51 | python-dev | set | nosy:
+ python-dev messages: + msg159226 |
2012-04-24 18:55:05 | neologix | set | files:
+ test_logging_race.diff messages: + msg159190 |
2012-04-23 20:02:55 | vinay.sajip | set | messages: + msg159070 |
2012-04-23 17:05:35 | neologix | set | messages: + msg159046 |
2012-04-22 19:16:30 | vinay.sajip | set | messages: + msg158982 |
2012-04-22 18:42:18 | neologix | set | messages: + msg158980 |
2012-04-22 18:41:57 | phlogistonjohn | set | files:
+ watchedfilehandler.patch messages: + msg158979 |
2012-04-22 17:47:04 | vinay.sajip | set | messages: + msg158976 |
2012-04-21 22:51:28 | vinay.sajip | set | assignee: vinay.sajip |
2012-04-21 11:15:19 | neologix | set | nosy:
+ vinay.sajip, neologix messages: + msg158915 versions: + Python 3.2, Python 3.3 |
2012-04-20 15:10:10 | phlogistonjohn | set | files:
+ watchedfilehandler.patch keywords: + patch messages: + msg158832 |
2012-04-20 13:57:07 | phlogistonjohn | create |