Change os.fsync() to support physical backing store syncs #56086

sdaoden · 2011-04-19T11:35:38Z

BPO	11877
Nosy	@ronaldoussoren, @pitrou, @vstinner
Files	11877.9.diff 11877-standalone.1.diff 11877.fullsync-1.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2011-04-19.11:35:37.946>
labels = ['library']
title = 'Change os.fsync() to support physical backing store syncs'
updated_at = <Date 2013-05-16.00:04:55.785>
user = 'https://bugs.python.org/sdaoden'

bugs.python.org fields:

activity = <Date 2013-05-16.00:04:55.785>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2011-04-19.11:35:37.946>
creator = 'sdaoden'
dependencies = []
files = ['22016', '22046', '22759']
hgrepos = []
issue_num = 11877
keywords = ['patch']
message_count = 51.0
messages = ['134043', '134063', '134071', '134084', '134142', '134143', '134144', '134145', '134147', '134148', '134151', '134154', '134179', '134213', '134214', '134215', '134221', '134236', '134252', '134253', '134379', '135453', '135457', '135458', '135462', '135520', '135573', '135655', '135699', '135803', '135824', '135847', '135862', '135865', '135870', '136037', '136048', '136158', '136175', '136177', '136180', '136214', '136219', '136431', '136432', '140608', '140657', '140920', '140982', '141121', '189331']
nosy_count = 7.0
nosy_names = ['ronaldoussoren', 'pitrou', 'vstinner', 'nadeem.vawda', 'neologix', 'santoso.wijaya', 'sdaoden']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue11877'
versions = ['Python 3.3']

sdaoden · 2011-04-19T11:35:38Z

bpo-11277 revealed that Mac OS X fsync() does not always
and gracefully synchronize file data to physical backing store.
To gain real fsync(2) behaviour fcntl(2) must be called with
the F_FULLFSYNC flag.

vstinner · 2011-04-19T15:04:00Z

Oh, it remembers a long story around ext3/ext4 and write barrier, with a specific problem in Firefox with SQLite.

http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man2/fsync.2.html
" For applications that require tighter guarantees about the integrity of their data, Mac OS X provides
the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent
storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLF-
SYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more
detail."

http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/
"fsync on Mac OS X: Since on Mac OS X the fsync command does not make the guarantee that bytes are written, SQLite sends a F_FULLFSYNC request to the kernel to ensures that the bytes are actually written through to the drive platter."

http://lwn.net/Articles/283161/

http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-September/001356.html
"
OTP-7471 On Mac OS X, file:sync/1 now guarantees that all filesystem
buffers are written to the disk by using the fcntl() with
F_FULLFSYNC option. Previously, file:sync/1 called fsync(),
which only guaranteed that the data had been transferred to
the disk drive. (Thanks to Jan Lehnardt.)"

http://lists.mindrot.org/pipermail/portawiki-discuss/2005-November/000002.html

ronaldoussoren · 2011-04-19T15:44:42Z

Traditionally the functions in os are thin wrappers around OS functions, this patch changes that for os.fsync.

Two nits wrt. the patch itself:

The behaviour on OSX should be documented in the stdlib reference,
that currently says that os.fsync will call the C function fsync
The comment in the __APPLE__ case can be clearer, and should
explain the issue (as short as possible).

I'm -1 on merging this patch though, as the fsync function on OSX behaves just like the one on Linux, the fnctl option adds a stronger guarantee in that it forces a flush of the buffers on the storage device as well. The manpage for fsync on a linux host says:

In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.

Adding the F_FULLSYNC option to os.fsync would be fine though (if it isn't already implemented)

The linux

neologix · 2011-04-19T17:39:22Z

I know that POSIX makes no guarantee regarding durable writes, but IMHO that's definitely wrong, in the sense that when one calls fsync, he expects the data to be committed to disk and be durable.
Fixing this deficiency through Python's exposed fsync might thus be a good idea (for example, the Window version probably doesn't call fsync, so it's already not a direct mapping).

sdaoden · 2011-04-20T12:08:08Z

when one calls fsync, he expects [...]
Fixing this deficiency through Python's exposed fsync [...]

I think so, too.
http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
even permits "null implementation"s etc. etc. etc.
(I like the last paragraph of "Rationale" the most.)
os.rst states for fsync():

[...] to ensure that all internal buffers associated with *f*
are written to disk.

If a platform offers the opportunity to actually implement the
desired behaviour then i would do so, regardless of what needs to
be done internally to achieve it.

(And the single question on apple is simply what to do otherwise
with that VMS/VFS bug for at least large sparse files.
I can only imagine adding multiple notes in the documentation,
here and there.)

ronaldoussoren · 2011-04-20T12:16:39Z

See <http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html\> for linux' behavior, in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.

This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees.

With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.

BTW. The need to use F_FULLSYNC in bpo-11277 might be a bug in OSX itself, have you filed an issue in Apple's tracker?

vstinner · 2011-04-20T12:18:46Z

A different approach for this issue is to document fcntl.fcntl(fd, fcntl.F_FULLFSYNC) in fsync() documentation.

pitrou · 2011-04-20T12:38:20Z

Agreed with Ronald. If Apple thinks fsync() shouldn't flush the hardware cache, there's no reason for us to override that.

neologix · 2011-04-20T12:50:25Z

in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.

Barriers are now enable by default in EXT4, and Theodore Tso has been favourable to that for quite some time now:
http://lwn.net/Articles/283288/

As for OS-X, this is definitely a bug (I mean, having to call fsync before mmap is a huge bug in itself).

sdaoden · 2011-04-20T13:32:17Z

On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:

This is the same behavior as fsync on OSX, and OSX also has
a second API that provides stronger guarantees.
With your patch it is no longer possible to call the C function
fsync on OSX, even though it is good enough for a lot of use
cases. As os.fcntl already supports F_FULLSYNC I see no good
reason to change the implementation of os.fsync on OSX.

I don't see it that way for multiple reasons.

One of it is that Python states that it "is a beginner language",
and the sentence i've quoted a part of in my last message is very
beginner friendly. Now i don't think you can require that much
further knowledge from a beginner at all than what is described in
that very sentence.

(Just think of an african kid with a sponsored laptop and possibly
10 minutes internet access per day which tries hard to rise up and
puts *galaxies of trust* in what it reads in os.rst, maybe because
it is the only documentation he has available!)

Then Python is a high-level language. It's true that there are
many shallow wrappers around lower-level functions, but still.
It is cross-platform. Just have a look into a, say, file I/O
implementation of a C library/program which aims to be portable.
If all that #if#endif would be necessary in a high-level language,
then why don't write a Makefile and compile that thing?
Not that much more work, then - many thanks to the GCC people!
You know - i wouldn't talk about subtleties of filesystems here.

And then - at least half a dozen of programmers with altogether
maybe many decades of experience, full-time internet access
(talking about you :) required two months to get around the Apple
bug. A simple "man 2 fsync" would have sufficed for a starter!
And how ridiculous - i've spend the free time of four days :-(
It's a shame, it's a bug, i wouldn't it let pass through to users.

Apple? No, i'm looking forward to return to a private BSD/Linux
X/ahwm/aterm/vim laptop.

sdaoden · 2011-04-20T13:47:23Z

On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:

This is the same behavior as fsync on OSX, and OSX also has
a second API that provides stronger guarantees.
With your patch it is no longer possible to call the C function
fsync on OSX, even though it is good enough for a lot of use
cases. As os.fcntl already supports F_FULLSYNC I see no good
reason to change the implementation of os.fsync on OSX.

I don't see it that way for multiple reasons.

One of it is that Python states that it "is a beginner language",
and the sentence i've quoted a part of in my last message is very
beginner friendly. Now i don't think you can require that much
further knowledge from a beginner at all than what is described in
that very sentence.

(Just think of an african kid with a sponsored laptop and possibly
10 minutes internet access per day which tries hard to rise up and
puts *galaxies of trust* in what it reads in os.rst, maybe because
it is the only documentation he has available!)

Then Python is a high-level language. It's true that there are
many shallow wrappers around lower-level functions, but still.
It is cross-platform. Just have a look into a, say, file I/O
implementation of a C library/program which aims to be portable.
If all that #if#endif would be necessary in a high-level language,
then why don't write a Makefile and compile that thing?
Not that much more work, then - many thanks to the GCC people!
You know - i wouldn't talk about subtleties of filesystems here.

And then - at least half a dozen of programmers with altogether
maybe many decades of experience, full-time internet access
(talking about you :) required two months to get around the Apple
bug. A simple "man 2 fsync" would have sufficed for a starter!
And how ridiculous - i've spend the free time of four days :-(
It's a shame, it's a bug, i wouldn't it let pass through to users.

Apple? No, i'm looking forward to return to a private BSD/Linux
X/ahwm/aterm/vim laptop.

sdaoden · 2011-04-20T14:06:23Z

8-}
The real message was:

On Wed, Apr 20, 2011 at 12:16:39PM +0000, Antoine Pitrou wrote:

If Apple thinks [...] there's no reason for us

Apple thinks (KernelProgramming.pdf, BPFileSystem.pdf):

Kernel code must be nearly perfect.

Kernel programming is a black art that should be avoided if at
all possible. Fortunately, kernel programming is usually
unnecessary.  You can write most software entirely in user
space.

Sparse files and zero filling. UFS supports sparse files,
which are a way for the file system to store the data in files
without storing unused space allocated for those files. HFS+
does not support sparse files and, in fact, zero-fills all
bytes allocated for a file until end-of-file.
[Steffen thinks HFS+ is driven from "user-space" through IOKit.]

POSIX says:

It would also not be unreasonable to omit testing for fsync(),
allowing it to be treated as a quality-of-implementation
issue.

It seems someone needs a hand.
Nice afternoon with an untrashed INBOX.

sdaoden · 2011-04-20T20:38:00Z

Ronald Oussoren wrote:

Adding the F_FULLSYNC option to os.fsync would be fine though

To show you that i'm not unteachable i'll attach a patch which
does that.
This approach can be heavily extended, then, e.g. by using
sync_file_range(all_the_flags) on Linux if the "full_fsync"
argument is true?! (And then there is sync_file_range2() on
PowerPC and nice ARM...)

(The first time that i use PyArg_xy plus plus - please review.
test_os is ok and using "os.fsync(xy, True)" makes test_zlib
succeed on unpatched hg checkout.
os.rst change is really ugly.)

sdaoden · 2011-04-21T14:20:11Z

I'm convinced. And i've rewritten the patch.
Now fsync(2) is always called first *regardless* of the new
optional argument. The documentation is (a little bit) better
now. And i've added support for NetBSD, AIX and Linux; though:

for AIX i'm testing (O_SYNC && O_DSYNC) which is almost
definitely the wrong test for this, but (a) i don't know when
fsync_range() has been introduced (seems to be many years) and
(b) i've never really worked on AIX.
(I only have the documentation:
http://www.filibeto.org/unix/aix/lib/rel/5.3/basetrf1.pdf.)
i've added sync_file_range() on Linux because of
SYNC_FILE_RANGE_WAIT_AFTER in the hope that it means something
even though the manual page says something else - but Linux and
documentation is something by itself.
http://lwn.net/Articles/178199/ states

Providing all three flags guarantees that those pages are
actually on disk when the call returns.
it seems OpenBSD, FreeBSD, Solaris 11 and HP/UX provide data
reliability through fsync(2) alone, see e.g. the notes near EOF
of the following Solaris 11 file:

http://www.filibeto.org/sun/lib/solaris11-express-docs/2010.11/E19963_01/html/821-1464/fcntl.h-3head.html#fcntl.h-3head

sdaoden · 2011-04-21T14:28:06Z

P.S.: dito Linux and NetBSD. I've reused preprocessor tests
we've discovered internally over the past years, but
i cannot test this here.
I could test Linux next Tuesday, though.

vstinner · 2011-04-21T14:42:31Z

11877.2.diff: On Mac OS X, fsync(fd, full_sync=True); fsync(fd) do a full sync twice. You should restore the old file flags at fsync() exit. Or this surprising behaviour should be documented.

sdaoden · 2011-04-21T15:19:23Z

Ok, 11877.3.diff uses either-or.

neologix · 2011-04-21T18:53:07Z

I'm -10 on sync_file_range on Linux:

it doesn't update the file metadata, so there's a high chance of corruption after a crash
last time I checked, it didn't flush the disk cache (well, it probably does if barriers are enabled, but that's also the case with fsync)

sdaoden · 2011-04-21T20:27:08Z

Charles-Francois Natali wrote:

I'm -10 on sync_file_range on Linux:
[...] last time I checked [...]

I just looked at
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=fs/sync.c;h=c38ec163da6ccba00a0146c75606c1b548b31343;hb=HEAD
and it seems - as far as i understand what i read - that you're
still right; and, furthermore, that fsync() does everything
anyway. (But here an idiot is talking about *very* complicated
stuff.)

I've also "search"ed for the called filemap_write_and_wait_range()
and found the commit message for
2daea67e966dc0c42067ebea015ddac6834cef88 as part of that;
very interesting in respect to our issue here.

I will wait before i update the patch though, just in case some
experienced NetBSD or AIX user posts some message. For OpenBSD
i think i can confirm that fsync(2) alone is enough after taking
a (shallow, all shallow) look into kernel/vfs_syscalls.c and
ufs/ffs/{ffs_softdep.c,softdep.h}.

neologix · 2011-04-21T21:24:19Z

and it seems - as far as i understand what i read - that you're
still right; and, furthermore, that fsync() does everything
anyway. (But here an idiot is talking about *very* complicated
stuff.)

I just double-checked, and indeed, fsync does flush the disk cache
when barriers are enabled on several FS, while sync_file_range does
not. So sync_file_range should definitely not be used on Linux.

sdaoden · 2011-04-25T11:39:09Z

I'll attach 11877.4.diff:

Docu change (i hope to be better)
Kept NetBSD after looking into
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.423&content-type=text/x-cvsweb-markup&only_with_tag=MAIN
because fsync_range() with FDISKSYNC set causes a FSYNC_CACHE
flag to be passed through to VOP_FSYNC() in addition to
FSYNC_WAIT, which is what plain fsync() passes exclusively.
(I have not furtherly discovered that. FDISKSYNC was introduced
2005. I believe existence of such a flag will sooner or later
force somebody to make a difference in wether it is set or not.)
Drop of AIX. According to the systems manual (link in
msg134213) there does not seem to be a difference in between
fsync() and fsync_range()
Drop of Linux sync_file_range() (see messages above).
By the way, does Linux still require

neologix · 2011-05-07T12:20:49Z

I'll attach 11877.4.diff

A couple comments:

static PyObject *
posix_fsync(PyObject *self, PyObject *args, PyObject *kwargs)
{
    PyObject *retval = NULL;
    auto PyObject *fdobj;
    auto int full_fsync = 1;

Why are you using the "auto" storage class specifier? I know that "explicit is better than implicit", but I really can't see a good reason for using it here (and anywhere, see http://c-faq.com/decl/auto.html).

# ifdef __APPLE__
    res = fcntl(fd, F_FULLFSYNC);
# endif
# ifdef __NetBSD__
    res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
# endif

Since __APPLE__ and __NetBSD__ are exclusive, you could use something like
# if defined(APPLE)
res = fcntl(fd, F_FULLFSYNC);
# elif defined(NetBSD)
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
# endif

Do you really need to use goto for such a simple code?

sdaoden · 2011-05-07T12:50:07Z

On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:

I really can't see a good reason for using it here (and
anywhere, see http://c-faq.com/decl/auto.html).

You're right.

Why are you using the "auto" storage class specifier? I know
that "explicit is better than implicit"

Yup. I'm doing what is happening for real in (x86) assembler.
Thus "auto" means (at a glance!) that this one will need space on
the stack.
(Conversely i'm not using "register" because who knows if that is
really true? ;))

Do you really need to use goto for such a simple code?

Yup. Ok, this is more complicated. The reason is that my funs
have exactly one entry point and exactly one place where they're
left. This is because we here do manual instrumentalisation as in

ret
fun(args)
{
locals
s_NYD_ENTER;

assertions

...

jleave:
s_NYD_LEAVE;
return;

[maybe only, if large: possibly "predict-false" code blocks]

[possible error jumps here
goto jleave;]
}

We're prowd of that. N(ot)Y(et)D(ead) is actually pretty cool,
i've used debuggers exactly 5 times in the past about ten years!
I don't even know exactly *how to use debuggers*. 8---} (NYD can
do backtracing, or, with NDEBUG and optional, profiling.)
A really good optimizing compiler will produce the very same code!
And i love nasm, but it's pretty non-portable.
But C is also nice.

But of course i can change this (in C) to simply use return, this
is easy here, no resources to be freed.

Thanks for looking at this, by the way. :)

sdaoden · 2011-05-07T12:52:39Z

On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:

ifdef __APPLE__
res = fcntl(fd, F_FULLFSYNC);
endif

ifdef __NetBSD__
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
endif

Since __APPLE__ and __NetBSD__ are exclusive, you could use something like

if defined(APPLE)
res = fcntl(fd, F_FULLFSYNC);
elif defined(NetBSD)
res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
endif

Yes, you're right, i'll update the patch accordingly.

�--
Steffen, sdaoden(*)(gmail.com)

sdaoden · 2011-05-07T13:37:36Z

11877.5.diff incorporates all changes suggested by
Charles-François except for the 'auto' keyword, which is extremely
important and which would need to be invented if it would not
yet exist.

I'm dropping the old stuff. And i think this is the final version
of the patch. I've changed the default argument to 'True' as
i really think it's better to be on the safe side here. (The
french are better off doing some gracy and dangerous sports to
discover edges of life!) I'm still of the opinion that this
should be completely hidden, but since it's completely transparent
wether a Python function gets yet another named argument or not...

So, thanks to Ronald, i detected that also NetBSD introduced
a FDISKSYNC flag in 2005 and that you really need fsync_range()
there (at least by definition)! How could they do that? But i'm
also happy to see that all other systems try hard to achieve
security transparently and by default and unless i missed
something once again.

ronaldoussoren · 2011-05-08T08:33:53Z

Steffen, I don't understand your comment about "auto". Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code.

sdaoden · 2011-05-09T10:16:30Z

Ronald Oussoren wrote (2011-05-08 10:33+0200):

Steffen, I don't understand your comment about "auto". Declaring
variables as "auto" is not necessary in C code and not used
anywhere else in Python's source code.

Well, as long as i can keep my underwear all is fine.
(I also looked at Google translate because i first wanted to start
the reply with "croak .. pip .. twist .. wrench .. groan .. ugh".)

neologix · 2011-05-09T21:36:33Z

Steffen, you changed the default to doing a "full sync" in your last patch: while I was favoring that initially, I now agree with Ronald and Antoine, and think that we shouldn't change the default behaviour. The reason being that Apple and NetBSD folks should know what they're doing, and that there might be some subtle side effects (see for example my comment on sync_file_range on Linux). Also, given how many bugs you seem to encouter on OS-X, it's probably better to stick to a syscall known to be safe instead of silently replacing it by a maybe-not-so-tested syscall. Finally, depending on the workload, it could have a significant performance impact. People requiring write durability will probably manage to find this full_sync parameter.

sdaoden · 2011-05-10T11:59:52Z

I don't agree with you and i don't believe it is implemented like
that. But it seems i am the only one on this issue who sees it
like that. Thus i apply 11877.6.diff.

Declaring variables as "auto" is not necessary in C code and not
used anywhere else in Python's source code

Changed.

Steffen, you changed the default to doing a "full sync" in your
last patch

Changed all through.

The reason being that Apple and NetBSD folks should know what
they're doing [.]
People requiring write durability will probably manage to find
this full_sync parameter.

This sounds logical. I've changed the doc in os.rst so that it
includes a note on *which* operating systems actually do something
dependend on that argument.
I should have done that from the very beginning.

Finally, depending on the workload, it could have a significant
performance impact.

:-)

sdaoden · 2011-05-11T20:47:36Z

Ouch, ouch, ouch!!
I'll have to send 11877.7.diff which extends 11877.6.diff.
This is necessary because using fcntl(2) with F_FULLFSYNC may fail
with ENOTTY (inapprobiate ioctl for device) in situations where
a normal fsync(2) succeeds (e.g. STDOUT_FILENO).
By the way - i have no idea of Redmoondian Horror at all
(except for http://msdn.microsoft.com/en-us/sync/bb887623.aspx).

Dropping .5 and .6 - and sorry for the noise.
Good night, Europe.

sdaoden · 2011-05-12T10:43:35Z

Just adding more notes on that by reactivating one of haypo's
links from bpo-8604. (And: maybe some Linux documentation should be
updated?) From Theodore Ts'o,
http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don’t-fear-fsync:

As the Eat My Data presentation points out very clearly, the only
safe way according that POSIX allows for requesting data written
to a particular file descriptor be safely stored on stable storage
is via the fsync() call.  Linux’s close(2) man page makes this
point very clearly:

A successful close does not guarantee that the data has been
successfully saved to disk, as the kernel defers writes. It is not
common for a file system to flush the buffers when the stream is
closed. If you need to be sure that the data is physically stored
use fsync(2).

Why don’t application programmers follow these sage words?  These
three reasons are most often given as excuses:

- (Perceived) performance problems with fsync()
- The application only needs atomicity, but not durability
- The fsync() causing the hard drive to spin up unnecessarily in
  laptop_mode

(Don't ask me why i'm adding this note though.
I should have searched for it once i've opened that issue?
Bah!!! Ts'o did not write that article for me. He'd better hacked.)

neologix · 2011-05-12T18:14:36Z

Calling fsync on a file descriptor referring to a tty doesn't make much sense.
On Linux, this fails with EINVAL:
$ python -c 'import os; os.fsync(1)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OSError: [Errno 22] Invalid argument

So if the full sync fails on ttys, it shouldn't be a problem: since
the default performs a classical fsync, this won't break compatibility
with existing code anyway.
So I think you should stick with the previous version (well, if the
full sync fails on other FDs, then it's another story, but in that
case it should just be dropped altogether if it's not reliable...).

By the way, it's "appropriate", not "approbiate". You made the same
typo in your patch.

sdaoden · 2011-05-12T20:39:10Z

[.]

OSError: [Errno 22] Invalid argument

Sorry, i didn't know that. Mac OS X (2.5 and 2.6 Apple shipped):

21:43 ~/tmp $ python2.5 -c 'import os; os.fsync(1)'; echo $?
0
21:43 ~/tmp $ python2.6 -c 'import os; os.fsync(1)'; echo $?
0
21:43 ~/tmp $ python2.7 -c 'import os; os.fsync(1)'; echo $?
0
21:43 ~/tmp $ python3 -c 'import os; os.fsync(1)'; echo $?
0

sdaoden · 2011-05-12T20:56:55Z

So I think you should stick with the previous version (well, if the
full sync fails on other FDs, then it's another story, but in that
case it should just be dropped altogether if it's not reliable...).

Strong stuff.
*This* is the version which should have been implemented from the
beginning, but Apple states

F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to
flush all buffered data to the permanent storage
device (arg is ignored). This is currently
implemented on HFS, MS-DOS (FAT), and Universal Disk
Format (UDF) file systems.

and i thought

fsync (maybe move buffers to Queue; do reorder Queue as approbiate)
do call fsys impl. to .. whatever
That's why i had a version of the patch which did 'fsync();fcntl();'
because it would have been an additional syscall but the fsync()
part would possibly have been essentially "skipped over" ..unless..
Linux RC scripts had 'sync && sync && sync' but it does not seem
to be necessary any more (was it ever - i don't know).

But who knows if that fcntl will fail on some non-noted fsys?
I think it's better to be on the safe side.
Quoting you, Charles-François

People requiring write durability will probably manage to find
this full_sync parameter
and if they do they thus really strive for data integrity, so call
fsync() as a fallback for the security which Apple provides.
Also: we cannot let os.fsync() fail with ENOTTY!?

By the way, it's "appropriate", not "approbiate". You made the same
typo in your patch.

8~I That was not a typo. Thanks.
I'll changed that.

neologix · 2011-05-12T21:40:30Z

and if they do they thus really strive for data integrity, so call
fsync() as a fallback for the security which Apple provides.

Why?
If I ask a full sync and it fails, I'd rather have an error returned so that I can take the appropriate decision (abort, roll-back, try a standard fsync) rather than have Python silently replace it by an fsync.

Also: we cannot let os.fsync() fail with ENOTTY!?

Why not, since that's what the kernel returns?
Once again, since the default behaviour doesn't change, this won't break any existing application.

sdaoden · 2011-05-15T16:54:07Z

Finally, depending on the workload, it could have a significant
performance impact.

Oh yes (first replaces os.fsync() with an in-python wrapper):

18:12 ~/tmp $ ll mail
ls: mail: No such file or directory
18:12 ~/tmp $ ll X-MAIL
312 -rw-r----- 1 steffen staff 315963 15 May 17:49 X-MAIL
18:12 ~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL
Dispatched 37 tickets to 4 targets.

real 0m4.638s
user 0m0.974s
sys 0m0.160s
18:13 ~/tmp $ rm -rf mail
18:13 ~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL
Dispatched 37 tickets to 4 targets.

real 0m1.228s
user 0m0.976s
sys 0m0.122s

(I'm using the first one.)

neologix · 2011-05-15T20:11:28Z

(I'm not sure Rietveld sent the message so I post it here, sorry in case of duplicate).

Steffen, I've made a quick review of your patch, in case you're interested.
I think that this functionality can be really useful to some people, and it'd be
nice if your patch could stabilize somewhat so that committers can review it
properly and eventually merge it.

Concerning your benchmark:
I don't know exactly what you're measuring, but when performing I/O-related
benchmarks, it's always a good idea to run each test several times in a row, or
flush the page/buffer cache before each run: the reason is that the the second
run benefits from the page/buffer cache, which often speeds things up
dramatically.
Example:

# time find /lib -type f -exec cat {} \; > /dev/null
real 0m20.455s
user 0m8.145s
sys 0m5.256s

# time find /lib -type f -exec cat {} \; > /dev/null
real 0m6.827s
user 0m8.477s
sys 0m4.804s

# echo 3 > /proc/sys/vm/drop_caches

# time find /lib -type f -exec cat {} \; > /dev/null
real 0m19.954s
user 0m8.069s
sys 0m5.364s

sdaoden · 2011-05-17T13:49:33Z

Thank you, thank you, thank you.
I'm a bit irritated that a french man treats a "wet-her" as a typo!
What if *i* like it?? In fact it is a fantastic physical backing
store. Unbeatable.

Well and about dropping the fsync() in case the fcntl() fails with
ENOTTY. This is "<Esc>2dd", which shouldn't hurt a committer.
I'm convinced that full_fsync=False is optional and false by
default, but i don't trust Apple. I've seen a reference to an
"atomic file" somewhere on bugs.python.org and that does fsync()
first followed by fcntl() if FULLFSYNC is available. Thus, if
someone knows about that, she may do so, but otherwise i would
guess he doesn't, and in that case i would not expect ENOTTY from
an fsync() - still i want a full flush!
This is what NetBSD describes:

NOTES
For optimal efficiency, the fsync_range() call requires that
the file system containing the file referenced by fd support
partial synchronization of file data. For file systems which
do not support partial synchronization, the entire file will
be synchronized and the call will be the equivalent of calling
fsync().

But Apple is *soooo*speeeeciaaaal* again. Happy birthday.

sdaoden · 2011-05-17T20:17:23Z

I've dropped wet-her!
I hope now you're satisfied!!!!!
So the buffer cache is all which remains hot.
How deserted!

And you could also add a test (I guess that just calling fsync
with full_sync=True on a valid FD would be enough.

I was able to add two tests as an extension to what is yet tested
about os.fsync(), but that uses an invalid fd.
(At least it enters the conditional and fails as expected.)

I'm not sure static is necessary, I'd rather make it const.

Yes..

This code is correct as it is, see other extension modules in
the stdlib for other examples of this pattern

..but i've used copy+paste here.

And you could also add a test (I guess that just calling fsync
with full_sync=True on a valid FD would be enough.

The alternative would be that full_sync

Ok, i've renamed full_fsync to full_sync.

pitrou · 2011-05-17T20:33:17Z

I've dropped wet-her!
I hope now you're satisfied!!!!!
So the buffer cache is all which remains hot.
How deserted!

I'm not sure I'm always understanding your messages well (I'm not a
native English speaker), but I don't think this kind of joke is
appropriate for the bug tracker.
Thank you.

vstinner · 2011-05-17T20:59:39Z

I don't like the API because it gives a different behaviour depending on the OS and I don't see how to check that the function does really a full sync.

I would prefer a new option os.fullsync() function which is like your os.fsync(fd, full_sync=False), except that the function doesn't exist if the OS doesn't implement it.

sdaoden · 2011-05-18T09:17:29Z

Excusing myself seems to be the only "probates Mittel".
@antoine Pitrou: It was a real shame to read your mail.
(It's sometimes so loud that "i don't even hear what i write".)

vstinner · 2011-05-18T10:23:08Z

See also issue bpo-12102.

sdaoden · 2011-05-21T12:22:52Z

diff --git a/Doc/library/os.rst b/Doc/library/os.rst
--- a/Doc/library/os.rst
+++ b/Doc/library/os.rst
@@ -810,6 +810,35 @@
Availability: Unix, and Windows.

+.. function:: fullfsync(fd)
+
+ The POSIX standart requires that :c:func:`fsync` must transfer the buffered
+ data to the storage device, not that the data is actually written by the
+ device itself. It explicitely leaves it up to operating system implementors
+ whether users are given stronger guarantees on data integrity or not. Some
+ systems also offer special functions which overtake the part of making such
+ stronger guarantees, i.e., Mac OS X and NetBSD.
+
+ This non-standart function is *optionally* made available to access such
+ special functionality when feasible. It will force write of file buffers to
+ disk and the flush of disk caches of the file given by file descriptor *fd*.
+ To strive for best-possible data integrity, the following can be done::
+
+ # Force writeout of local buffer modifications
+ f.flush()
+ # Then synchronize the changes to physical backing store
+ if hasattr(os, 'fullfsync'):
+ os.fullfsync(f.fileno())
+ elif hasattr(os, 'fsync'):
+ os.fsync(f.fileno())
+
+ ..note::
+ Calling this function may take a long time, since it may block
+ until the disk reports that the transfer has been completed.
+
+ Availability: Unix.
+
+
.. function:: ftruncate(fd, length)

Truncate the file corresponding to file descriptor \*fd*, so that it is at most

diff --git a/Lib/test/test_os.py b/Lib/test/test_os.py
--- a/Lib/test/test_os.py
+++ b/Lib/test/test_os.py
@@ -835,12 +835,12 @@

 class TestInvalidFD(unittest.TestCase):
     singles = ["fchdir", "dup", "fdopen", "fdatasync", "fstat",
-               "fstatvfs", "fsync", "tcgetpgrp", "ttyname"]
+               "fstatvfs", "fsync", "fullfsync", "tcgetpgrp", "ttyname"]
     #singles.append("close")
-    #We omit close because it doesn'r raise an exception on some platforms
+    # We omit close because it doesn't raise an exception on some platforms
     def get_single(f):
         def helper(self):
-            if  hasattr(os, f):
+            if hasattr(os, f):
                 self.check(getattr(os, f))
         return helper
     for f in singles:
diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -174,6 +174,11 @@
 #endif /* ! __IBMC__ */

 #ifndef _MSC_VER
+  /* os.fullfsync()? */
+# if (defined HAVE_FSYNC && ((defined __APPLE__ && defined F_FULLFSYNC) || \
+                             (defined __NetBSD__ && defined FDISKSYNC)))
+#  define PROVIDE_FULLFSYNC
+# endif

 #if defined(__sgi)&&_COMPILER_VERSION>=700
 /* declare ctermid_r if compiling with MIPSPro 7.x in ANSI C mode
@@ -2129,6 +2134,41 @@
 {
     return posix_fildes(fdobj, fsync);
 }
+
+# ifdef PROVIDE_FULLFSYNC
+PyDoc_STRVAR(fullfsync__doc__,
+"fullfsync(fd)\n\n"
+"force write of file buffers to disk, and the flush of disk caches\n"
+"of the file given by file descriptor fd.");
+
+static PyObject *
+fullfsync(PyObject *self, PyObject *fdobj)
+{
+    /* See issue 11877 discussion */
+    int res, fd = PyObject_AsFileDescriptor(fdobj);
+    if (fd < 0)
+        return NULL;
+    if (!_PyVerify_fd(fd))
+        return posix_error();
+
+    Py_BEGIN_ALLOW_THREADS
+#  if defined __APPLE__
+    /* F_FULLFSYNC is not supported for all types of FDs/FSYSs;
+     * be on the safe side and test for inappropriate ioctl errors */
+    res = fcntl(fd, F_FULLFSYNC);
+    if (res < 0 && errno == ENOTTY)
+        res = fsync(fd);
+#  elif defined __NetBSD__
+    res = fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
+#  endif
+    Py_END_ALLOW_THREADS
+
+    if (res < 0)
+        return posix_error();
+    Py_INCREF(Py_None);
+    return Py_None;
+}
+# endif /* PROVIDE_FULLFSYNC */
 #endif /* HAVE_FSYNC */

 #ifdef HAVE_SYNC
@@ -9473,6 +9513,9 @@
 #endif
 #ifdef HAVE_FSYNC
     {"fsync",       posix_fsync, METH_O, posix_fsync__doc__},
+# ifdef PROVIDE_FULLFSYNC
+    {"fullfsync",   fullfsync, METH_O, fullfsync__doc__},
+# endif
 #endif
 #ifdef HAVE_SYNC
     {"sync",        posix_sync, METH_NOARGS, posix_sync__doc__},

sdaoden · 2011-05-21T12:36:13Z

(This was an attachment to an empty mail message.)

neologix · 2011-07-18T17:26:30Z

Trying to revive this issue, here's a comment I left on Rietveld:

"""

I don't agree, the documentation states that full_sync will cause a flush to
stable storage where supported and a plain fsync where it isn't. This code
does
just that.

I think I wasn't clear enough, so let me rephrase: the call to fcntl is guarded
by #ifdef, so it is only called on platforms where it is supported, otherwise
fsync() is called. What I was referring to is the fact that an fsync is tried if
the call to fcntl fails with ENOTTY: if I ask my operating system to flush disk
buffers and it can't, then I'd rather have an error returned, so that I can take
the right decision (abort, roll-back, try a classical fsync). IMHO, silently
masking an error by falling back to another syscall is wrong.
"""

Anyway, I think this patch can be useful (see for example issue bpo-8604 as a typical use case). I personally don't care since I use Linux, but OS-X and *BSD folks might find some interest in this.

sdaoden · 2011-07-19T12:22:32Z

Here is something unsorted and loose:

@neologix:
One could argue that something had happened before the fsync(2),
so that code which blindly did so is too dumb to do any right
decision anyway. Even PEP-3151 won't help.
I favour haypos fullsync() approach, because that could probably
make it into it. Yet Python doesn't offer any possibility to
access NetBSD DISKSYNC stuff sofar.
Currently the programmer must be aware of any platform specific
problems. I, for example, am not aware of Windows. How can
i give any guarantee to users which (will) use my S-Postman on
Windows? I need to put trust in turn into the Framework i am
using - Python. And that makes me feel pretty breathless.
Fortunately Python is dynamic, so that one simply can replace
os.fsync(). Works once only though (think signal handlers :=).

+ That is indeed the solution i'm using for my S-Postman,
because *only* like this i can actually make Python's
mailbox.py module reliable on Mac OS X! I can't give any
guarantee for NetBSD, though i document it!

+ Aaarrg! I'm a liar!! I lie about - data integrity!!!

--Steffen
Ciao, sdaoden(*)(gmail.com)
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments

neologix · 2011-07-22T22:26:03Z

One could argue that something had happened before the fsync(2),
so that code which blindly did so is too dumb to do any right
decision anyway. Even PEP-3151 won't help.

I don't understand. If the syscall supposed to flush the disk's buffer
cache fails - be it fcntl() or sync_file_range() - I think the error
should be propagated, not silently ignored and replaced with another
syscall which doesn't have the same semantic. That's all.

I favour haypos fullsync() approach, because that could probably
make it into it. Yet Python doesn't offer any possibility to
access NetBSD DISKSYNC stuff sofar.

Are you willing to update your patch accordingly?

Currently the programmer must be aware of any platform specific
problems. I, for example, am not aware of Windows. How can
i give any guarantee to users which (will) use my S-Postman on
Windows? I need to put trust in turn into the Framework i am
using - Python. And that makes me feel pretty breathless.

I agree.

Fortunately Python is dynamic, so that one simply can replace
os.fsync(). Works once only though (think signal handlers :=).

Yes, but since it's a fairly reasonable and useful feature, I think it
could be merged.

Aaarrg! I'm a liar!! I lie about - data integrity!!!

Well, actually, some hard disks lie about this too :-)

sdaoden · 2011-07-23T12:36:38Z

> Even PEP-3151 won't help.

I don't understand. If the syscall supposed to flush the disk's buffer
cache fails - be it fcntl() or sync_file_range() - I think the error
should be propagated, not silently ignored and replaced with another
syscall which doesn't have the same semantic. That's all.

I'm with you theoretically - of course errors should be propagated
to users so that they can act accordingly.
And this is exactly what the patches do, *unless* it is an error
which is not produced by the native fsync(2) call:

-- >8 --
?0%0[steffen@sherwood tmp]$ cat t.c 
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
	int r  = fcntl(2, F_FULLFSYNC);
	fprintf(stderr, "1. %d: %d, %s\n", r, errno, strerror(errno));
    errno = 0;
	r = fsync(2);
	fprintf(stderr, "2. %d: %d, %s\n", r, errno, strerror(errno));
    return 0;
}
?0%0[steffen@sherwood tmp]$ gcc -o t t.c && ./t
1. -1: 25, Inappropriate ioctl for device
2. 0: 0, Unknown error: 0
?0%0[steffen@sherwood tmp]$ grep -F 25 /usr/include/sys/errno.h 
#define ENOTTY      25      /* Inappropriate ioctl for device */
-- >8 --

So in fact the patches do what is necessary to make the changed
version act just as the plain systemcall.

> - I favour haypos fullsync() approach
Are you willing to update your patch accordingly?

Both patches still apply onto the tip of friday noon:
http://bugs.python.org/file22016/11877.9.diff,
http://bugs.python.org/file22046/11877-standalone.1.diff.

Again: i personally would favour os.fsync(fd, fullsync=True), because
that is the only way to put reliability onto unaware facilities
unaware (e.g. my S-Postman replaces os.fsync() with a special function
so that reliability is injected in- and onto Python's mailbox.py,
which calls plain os.fsync()), but i've been convinced that this is
impossible to do. It seems to be impossible to change os.fsync()
at all, because it has a standartized function prototype.

So what do you mean? Shall i rewrite 11877-standalone.1.diff to
always offer fullsync() whenever there is fsync()? This sounds to
be a useful change, because testing hasattr() of the one would
imply availability of the other.

> + Aaarrg! I'm a liar!! I lie about - data integrity!!!
Well, actually, some hard disks lie about this too :-)

Yeah. But hey:
"I feel save in New York City. I feel save in New York City."
Nice weekend - and may the juice be with you!

--Steffen
Ciao, sdaoden(*)(gmail.com)
ASCII ribbon campaign ( ) More nuclear fission plants
against HTML e-mail X can serve more coloured
and proprietary attachments / \ and sounding animations

sdaoden · 2011-07-25T20:27:17Z

Are you willing to update your patch accordingly?

I'm a vain rooster! I've used fullfsync() in 11877-standalone.1.diff!
Sorry for that, haypo. :-/

11877.fullsync-1.diff uses fullsync() and will instead always be
provided when fsync() is available, to which it will fall back if
no special operating system functionality is available.

I really think this is the cleanest solution, because like this
a user can state "i want the strongest guarantees available on
data integrity", and Python does just that.

--Steffen
Ciao, sdaoden(*)(gmail.com)
ASCII ribbon campaign ( ) More nuclear fission plants
against HTML e-mail X can serve more coloured
and proprietary attachments / \ and sounding animations

vstinner · 2013-05-16T00:04:56Z

What is the status of this issue? This issue is interesting in the implementation of bpo-8604 (add shutil.atomic_write()).

sdaoden mannequin added the stdlib Python modules in the Lib dir label Apr 19, 2011

sdaoden mannequin changed the title ~~Mac OS X fsync() should really be fcntl(F_FULLFSYNC)~~ Change os.fsync() to support physical backing store syncs Apr 21, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022

Change os.fsync() to support physical backing store syncs #56086

Change os.fsync() to support physical backing store syncs #56086

Comments

sdaoden mannequin commented Apr 19, 2011

sdaoden mannequin commented Apr 19, 2011

vstinner commented Apr 19, 2011

ronaldoussoren commented Apr 19, 2011

neologix mannequin commented Apr 19, 2011

sdaoden mannequin commented Apr 20, 2011

ronaldoussoren commented Apr 20, 2011

vstinner commented Apr 20, 2011

pitrou commented Apr 20, 2011

neologix mannequin commented Apr 20, 2011

sdaoden mannequin commented Apr 20, 2011

sdaoden mannequin commented Apr 20, 2011

sdaoden mannequin commented Apr 20, 2011

sdaoden mannequin commented Apr 20, 2011

sdaoden mannequin commented Apr 21, 2011

sdaoden mannequin commented Apr 21, 2011

vstinner commented Apr 21, 2011

sdaoden mannequin commented Apr 21, 2011

neologix mannequin commented Apr 21, 2011

sdaoden mannequin commented Apr 21, 2011

neologix mannequin commented Apr 21, 2011

sdaoden mannequin commented Apr 25, 2011

neologix mannequin commented May 7, 2011

sdaoden mannequin commented May 7, 2011

sdaoden mannequin commented May 7, 2011

ifdef __APPLE__

endif

ifdef __NetBSD__

endif

if defined(APPLE)

elif defined(NetBSD)

endif

sdaoden mannequin commented May 7, 2011

ronaldoussoren commented May 8, 2011

sdaoden mannequin commented May 9, 2011

neologix mannequin commented May 9, 2011

sdaoden mannequin commented May 10, 2011

sdaoden mannequin commented May 11, 2011

sdaoden mannequin commented May 12, 2011

neologix mannequin commented May 12, 2011

sdaoden mannequin commented May 12, 2011

sdaoden mannequin commented May 12, 2011

neologix mannequin commented May 12, 2011

sdaoden mannequin commented May 15, 2011

neologix mannequin commented May 15, 2011

sdaoden mannequin commented May 17, 2011

sdaoden mannequin commented May 17, 2011

pitrou commented May 17, 2011

vstinner commented May 17, 2011

sdaoden mannequin commented May 18, 2011

vstinner commented May 18, 2011

sdaoden mannequin commented May 21, 2011

sdaoden mannequin commented May 21, 2011

neologix mannequin commented Jul 18, 2011

sdaoden mannequin commented Jul 19, 2011

neologix mannequin commented Jul 22, 2011

sdaoden mannequin commented Jul 23, 2011

sdaoden mannequin commented Jul 25, 2011

vstinner commented May 16, 2013

ifdef APPLE

ifdef NetBSD