classification
Title: test.test_pathlib.PosixPathTest.test_touch_common fails on FreeBSD with ZFS
Type: behavior Stage: patch review
Components: Tests Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Claudiu.Popa, ezio.melotti, francismb, jcea, koobs, larry, larstiq, pitrou, puppet
Priority: normal Keywords: patch

Created on 2013-11-30 08:44 by Claudiu.Popa, last changed 2014-11-01 16:05 by larstiq.

Files
File name Uploaded Description Edit
issue19838.patch Claudiu.Popa, 2014-06-10 13:44 review
c_test_case_for_issue_19838.c larry, 2014-08-05 00:09 Simple test case written in C reproducing the behavior
Messages (36)
msg204786 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-11-30 08:44
Hi!

test_touch_common fails when using 8.3-STABLE FreeBSD 8.3-STABLE and Python 3.4.0b1 (default:a0ec33efa743+, Nov 30 2013, 10:36:58). 

Here are the tracebacks:


======================================================================
FAIL: test_touch_common (test.test_pathlib.PathTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tank/libs/cpython/Lib/test/test_pathlib.py", line 1402, in test_touch_common
    self.assertGreaterEqual(st.st_mtime_ns, old_mtime_ns)
AssertionError: 1385800632000000000 not greater than or equal to 1385800632871814968

======================================================================
FAIL: test_touch_common (test.test_pathlib.PosixPathTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tank/libs/cpython/Lib/test/test_pathlib.py", line 1402, in test_touch_common
    self.assertGreaterEqual(st.st_mtime_ns, old_mtime_ns)
AssertionError: 1385800633000000000 not greater than or equal to 1385800633042814928

----------------------------------------------------------------------
Ran 319 tests in 0.368s

FAILED (failures=2, skipped=85)
test test_pathlib failed
1 test failed:
    test_pathlib

This issue seems to be related with issue15745.
msg204794 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-30 10:18
I don't really know what to do with this. I think you'll have to investigate a bit and find out exactly what happens during the test.
msg204795 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-30 10:19
As a data point, if not for ZFS, test_pathlib passes on FreeBSD 6.4:
http://buildbot.python.org/all/builders/x86%20FreeBSD%206.4%203.x/builds/4261/steps/test/logs/stdio
and FreeBSD 7.2:
http://buildbot.python.org/all/builders/x86%20FreeBSD%207.2%203.x/builds/4731/steps/test/logs/stdio
msg204813 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2013-11-30 14:37
Is this similar/related to #15745?

I took both of my buildbots (koobs-freebsd9, koobs-freebsd10) off ZFS until it could be resolved
msg204814 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2013-11-30 14:39
Sorry Claudiu I missed the issue reference in your comment
msg204815 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-11-30 14:41
I believe it's similar, both test_os and test_pathlib fails when executed from within a ZFS container. I checked, I did a fresh checkout of Python inside a normal directory and run the tests there, they ran without problems.
msg220143 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-06-10 07:03
Since issue15745 hasn't been fixed yet, would be okay to skip these tests when the test suite runs from a ZFS container? Currently, these failures are a nuissance when running the test suite.
msg220144 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2014-06-10 07:12
I'd like to put the buildbot slave instances back onto ZFS for broader Disk/IO test coverage for Python and other projects as well as to gain some administrative disk utilisation flexibility.

These two issues have unfortunately precluded that, and there's much more value to projects with a reliably green buildbot than one that is red, ending up ignored, and hiding other issues or regressions in the meantime.

+1 on disabling these tests and leaving the issues open so someone can pick them up at a later date and have a ZFS environment in which to reproduce/resolve them in a custom builder
msg220167 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-06-10 13:44
Here's a patch that marks those tests as expected failures if the source checkout is inside a zfs container. It uses `df -t zfs`, it was the easiest way I could find to detect that we are running from a zfs container.
msg224056 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-07-26 14:33
This test also fails on Linux when using ZFS.
msg224063 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-07-26 15:08
Sorry to have such an awful configuration, but this approach won't fix the problem for me.

I use a Linux "encrypted home directory", which uses crazy loopback mount logic to create an on-the-fly decrypted representation of my home directory.  So my home directory is actually on an "ecryptfs" device:

    % df -T
    Filesystem            Type      Mounted on
    ...
    home                  zfs       /home
    /home/larry/.Private  ecryptfs  /home/larry

Maybe we could ignore deltas below a certain race-condition threshold?  Perhaps a millisecond?  I think there are already tests like that.
msg224064 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-07-26 15:09
(By "this approach" I meant "the approach employed in the first patch posted".  Sorry for the ambiguity.)
msg224066 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-07-26 15:09
So, I'm not against the patch, but it would be nice to diagnose where exactly the issue comes from.
msg224625 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-03 11:19
@Antoine: We have investigated a bit here and it seems that ZFS just keeps a timestamp in seconds in the memory.


See function uberblock_update in uberblock.c
( http://people.freebsd.org/~gibbs/zfs_doxygenation/html/d3/d65/uberblock_8c_source.html - Row 57)
    ub->ub_timestamp = gethrestime_sec();
msg224627 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-03 12:11
After thinking and looking a bit on it, this patch isn't the best to put into the code what I can see.

If any of these things really break, because of something completely different - we won't notice it.

A more accurate solution is to maybe look on a different way to compare the timestamps that works on systems that just support seconds too?
msg224628 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-08-03 12:27
I agree. The patch was merely a temporary solution to the real problem.
msg224640 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-03 14:57
#15745 is probably related to this issue
msg224642 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-03 15:17
So me and @larstiq have been sitting with this during the sprint here and we have figured out some things but haven't had the time to put together a patch.

If you put two p.touch() before starting the testing, it will not fail on BSD.

What we have found out so far is that during file creation the resolution of the timestamp is higher then at the touch attempt when a file exists.

The main issue as we see it is that os.stat can return a timestamp with more granularity than os.utime can set.
msg224656 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-03 20:34
os.stat can return something more precise than nanosecond resolution?
msg224664 - (view) Author: Francis MB (francismb) * Date: 2014-08-03 21:43
> What we have found out so far is that during file creation the
> resolution of the timestamp is higher then at the touch attempt
> when a file exists.

Could it help to create 2 files (file 1, wait a bit, file 2) and then do the checks only with file 1?
msg224679 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-04 05:07
@Larry

The issue is the following:

In the test after the first p.touch
(Pdb) old_mtime
1407128672.4133856
(Pdb) old_mtime_ns
1407128672413385711

After second p.touch
(Pdb) st.st_mtime_ns
1407128689000000000
(Pdb) st.st_mtime
1407128689.0

So the issue is that when utime is used it can't set the time with nanosecond, but just with second resolution.

@Francis - A quick fix that would work, but it is a bit dirty but it will work correctly is to just add two p.touch() before doing anything.

Not really the best solution but could work.
msg224683 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-04 05:40
So why can't you use the "ns" parameter for os.utime()?
msg224684 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-04 05:43
@Larry - the problem arrives after the second touch, which does os.utime(FILE, None) if the file exists.

The os.utime that is in the test-code doesn't really break anything, what breaks the test is the second touch.
msg224685 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-04 05:50
A cursory read of the source code suggests everybody is doing the right thing.  Can you run this with a debug build of CPython, put a breakpoint in posix_utime, and check that when it's called for the second p.touch that it's correctly calling the underlying function to say "set the time of this file to the current time"?
msg224686 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-04 05:57
@Larry - we did that yesterday and on the FreeBSD-machine the regular utimes was used and not utimensat.
utimensat is not available on FreeBSD 10 so it fall backs to regular utimes to be used in os.utime.
But when a file is created it is the file system that sets the utime, so it will have nanoseconds.
msg224687 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-04 05:59
And was the "time" argument passed in to utimes() set to NULL?
msg224698 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-04 08:13
I'm no expert at all at GDB, but I will give it a go and if this is wrong I can try a lot more.

Breakpoint 1, posix_utime (self=0xf00918ed8, args=0xf014459f8, kwargs=0x0) at ./Modules/posixmodule.c:4838
4838	    PyObject *times = NULL;
Current language:  auto; currently minimal
4839	    PyObject *ns = NULL;
4840	    int dir_fd = DEFAULT_DIR_FD;
4841	    int follow_symlinks = 1;
4843	                        "follow_symlinks", NULL};
4856	    memset(&path, 0, sizeof(path));
4857	    path.function_name = "utime";
4858	    memset(&utime, 0, sizeof(utime_t));
4860	    path.allow_fd = 1;
4862	    if (!PyArg_ParseTupleAndKeywords(args, kwargs,
4875	    if (times && (times != Py_None) && ns) {
4903	    else if (ns) {
4919	        utime.now = 1;
4927	    if (path_and_dir_fd_invalid("utime", &path, dir_fd) ||
963	    if (!path->narrow && !path->wide && (dir_fd != DEFAULT_DIR_FD)) {
4928	        dir_fd_and_fd_invalid("utime", dir_fd, path.fd) ||
974	    if ((dir_fd != DEFAULT_DIR_FD) && (fd != -1)) {
4929	        fd_and_follow_symlinks_invalid("utime", path.fd, follow_symlinks))
986	    if ((fd > 0) && (!follow_symlinks)) {
4933	    if ((dir_fd != DEFAULT_DIR_FD) && (!follow_symlinks)) {
4974	    Py_BEGIN_ALLOW_THREADS
4977	    if ((!follow_symlinks) && (dir_fd == DEFAULT_DIR_FD))
4989	    if (path.fd != -1)
4994	    result = utime_default(&utime, path.narrow);
4800	    UTIME_TO_TIMEVAL;
4801	    return utimes(path, time);
(gdb) print time
$1 = {<text variable, no debug info>} 0xf01191d20 <time>
(gdb) print path
$2 = {function_name = 0x58bb79 "utime", argument_name = 0x0, nullable = 0, allow_fd = 1, wide = 0x0, narrow = 0xf0099c0d0 "/tmp/test", fd = -1, length = 9, object = 0xf01445f90,
  cleanup = 0xf0099c0a0}
4996	    Py_END_ALLOW_THREADS
4998	    if (result < 0) {
5006	    Py_INCREF(Py_None);
787	    if (path->cleanup) {
788	        Py_CLEAR(path->cleanup);
5016	}
msg224703 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-04 09:54
The "no debug info" makes me think you're using a release build.  Please recompile making sure that "-O0" and "-g" are both set.  By default the makefile has "-O3" and "-g", so normally you just change the "-O3" to "-O0", then "make clean" and "make".
msg224705 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-04 10:12
I used configure with --with-pydebug, will look closer on it when I get to
the office
msg224706 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-04 10:28
To confirm, my OPT-line is
OPT=            -g -O0 -Wall -Wstrict-prototypes
msg224750 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-04 18:52
Yup, you've got a debug build there.  I'm baffled by that gdb "no debug info" line then.

At line 4801 could you also 
    print *utime
    print tv
msg224751 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-04 18:57
Larry: Result of that:
(gdb) print *utime
Structure has no component named operator*.
(gdb) print tv
No symbol "tv" in current context.

You can also catch me on IRC with the same nickname as here
msg224770 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-04 20:37
I talked to puppet on IRC for a while and we figured out the following about his OS:

* He has utime() and utimes(), but no utimensat().
* utimes() can write with *microsecond* resolution.
* stat() reads the time with *nanosecond* resolution.  (He has HAVE_STAT_TV_NSEC defined.)
* utimes(path, NULL) sets the file to the current time with *second* resolution.  Which means if it happens within the same second as the previous update, it will set mtime to an earlier value.

Just to confirm, he ran this script:
--
import os
b = '/tmp/test2'
open(b,'a').close()
before = os.stat(b)
os.utime(b, None)
after = os.stat(b)
os.unlink(b)
print("before:", before.st_mtime_ns)
print(" after:", after.st_mtime_ns)
print("before <= after", before.st_mtime_ns <= after.st_mtime_ns)
--

and it consistently prints "before <= after False".

*facepalm*

Since utimes supports microsecond resolution, we could in theory work around this problem by explicitly specifying the current time when using utimes().  If we did that, we might want to also see if this behavior affects futimes() and lutimes().

Functions in posixmodule are expected to be atomic, and this would mean two system calls instead of one--so maybe we should only use this workaround on platforms with the bug?
msg224779 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-04 22:34
And to further confirm, on my Linux machine with a ZFS partition:

* stat() returns nanosecond resolution.
* I have utimensat(), which supports nanosecond resolution.
* The test code correctly sets "utime.now = 1", so it correctly calls utimensat() with a time pointer of NULL.

It still fails, but here the race is much closer:

    before: 1407191476502407010
     after: 1407191476502090829
    before <= after False

If I run the test on a different filesystem (e.g. on /tmp which is ext4) the problem doesn't occur.

My guess: it's ZFS's fault in both cases, it's not using the highest-precision timer available.
msg224789 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-08-05 00:09
Attached is a simple C test case for the same behavior.  On my Linux machine this prints "This is insane!" on a ZFS partition and "Everything is okay" on an ext4 partition.

The test case won't work unmodified on FreeBSD.  Try commenting out the call to "utimensat", and uncommenting the call to "utimes".  I don't guarantee that's sufficient.

If nothing else, we can be certain it's not a Python bug.
msg224810 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-05 06:10
Result:

Before: 1407219004 988716892
 After: 1407219004 0
Before <= After 0
This is insane!
History
Date User Action Args
2014-11-01 16:05:31larstiqsetnosy: + larstiq
2014-08-05 06:10:23puppetsetmessages: + msg224810
2014-08-05 03:30:08jceasetnosy: + jcea
2014-08-05 00:09:45larrysetfiles: + c_test_case_for_issue_19838.c

messages: + msg224789
2014-08-04 22:52:56ezio.melottisetnosy: + ezio.melotti
2014-08-04 22:34:46larrysetmessages: + msg224779
2014-08-04 20:37:15larrysetmessages: + msg224770
2014-08-04 18:57:49puppetsetmessages: + msg224751
2014-08-04 18:52:24larrysetmessages: + msg224750
2014-08-04 10:28:35puppetsetmessages: + msg224706
2014-08-04 10:12:40puppetsetmessages: + msg224705
2014-08-04 09:54:08larrysetmessages: + msg224703
2014-08-04 08:13:20puppetsetmessages: + msg224698
2014-08-04 05:59:30larrysetmessages: + msg224687
2014-08-04 05:57:09puppetsetmessages: + msg224686
2014-08-04 05:50:43larrysetmessages: + msg224685
2014-08-04 05:43:11puppetsetmessages: + msg224684
2014-08-04 05:40:52larrysetmessages: + msg224683
2014-08-04 05:07:07puppetsetmessages: + msg224679
2014-08-03 21:43:08francismbsetnosy: + francismb
messages: + msg224664
2014-08-03 20:34:14larrysetmessages: + msg224656
2014-08-03 15:17:40puppetsetmessages: + msg224642
2014-08-03 14:57:26puppetsetmessages: + msg224640
2014-08-03 12:27:50Claudiu.Popasetmessages: + msg224628
2014-08-03 12:11:44puppetsetmessages: + msg224627
2014-08-03 11:19:12puppetsetnosy: + puppet
messages: + msg224625
2014-07-26 15:09:34pitrousetmessages: + msg224066
stage: patch review
2014-07-26 15:09:14larrysetmessages: + msg224064
2014-07-26 15:08:37larrysetmessages: + msg224063
2014-07-26 14:33:39larrysetnosy: + larry
messages: + msg224056
2014-06-10 13:44:49Claudiu.Popasetfiles: + issue19838.patch
keywords: + patch
messages: + msg220167
2014-06-10 07:12:26koobssetmessages: + msg220144
2014-06-10 07:03:07Claudiu.Popasetmessages: + msg220143
2013-11-30 14:41:42Claudiu.Popasetmessages: + msg204815
2013-11-30 14:39:01koobssetmessages: + msg204814
2013-11-30 14:37:51koobssetnosy: + koobs
messages: + msg204813
2013-11-30 10:19:59pitrousetmessages: + msg204795
2013-11-30 10:18:45pitrousetmessages: + msg204794
2013-11-30 09:03:59ned.deilysetnosy: + pitrou
2013-11-30 08:44:09Claudiu.Popacreate