Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change os.fsync() to support physical backing store syncs #56086

Open
sdaoden mannequin opened this issue Apr 19, 2011 · 51 comments
Open

Change os.fsync() to support physical backing store syncs #56086

sdaoden mannequin opened this issue Apr 19, 2011 · 51 comments
Labels
stdlib Python modules in the Lib dir

Comments

@sdaoden
Copy link
Mannequin

sdaoden mannequin commented Apr 19, 2011

BPO 11877
Nosy @ronaldoussoren, @pitrou, @vstinner
Files
  • 11877.9.diff
  • 11877-standalone.1.diff
  • 11877.fullsync-1.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-04-19.11:35:37.946>
    labels = ['library']
    title = 'Change os.fsync() to support physical backing store syncs'
    updated_at = <Date 2013-05-16.00:04:55.785>
    user = 'https://bugs.python.org/sdaoden'

    bugs.python.org fields:

    activity = <Date 2013-05-16.00:04:55.785>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2011-04-19.11:35:37.946>
    creator = 'sdaoden'
    dependencies = []
    files = ['22016', '22046', '22759']
    hgrepos = []
    issue_num = 11877
    keywords = ['patch']
    message_count = 51.0
    messages = ['134043', '134063', '134071', '134084', '134142', '134143', '134144', '134145', '134147', '134148', '134151', '134154', '134179', '134213', '134214', '134215', '134221', '134236', '134252', '134253', '134379', '135453', '135457', '135458', '135462', '135520', '135573', '135655', '135699', '135803', '135824', '135847', '135862', '135865', '135870', '136037', '136048', '136158', '136175', '136177', '136180', '136214', '136219', '136431', '136432', '140608', '140657', '140920', '140982', '141121', '189331']
    nosy_count = 7.0
    nosy_names = ['ronaldoussoren', 'pitrou', 'vstinner', 'nadeem.vawda', 'neologix', 'santoso.wijaya', 'sdaoden']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue11877'
    versions = ['Python 3.3']

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 19, 2011

    bpo-11277 revealed that Mac OS X fsync() does not always
    and gracefully synchronize file data to physical backing store.
    To gain real fsync(2) behaviour fcntl(2) must be called with
    the F_FULLFSYNC flag.

    @sdaoden sdaoden mannequin added the stdlib Python modules in the Lib dir label Apr 19, 2011
    @vstinner
    Copy link
    Member

    Oh, it remembers a long story around ext3/ext4 and write barrier, with a specific problem in Firefox with SQLite.

    http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man2/fsync.2.html
    " For applications that require tighter guarantees about the integrity of their data, Mac OS X provides
    the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent
    storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLF-
    SYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more
    detail."

    http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/
    "fsync on Mac OS X: Since on Mac OS X the fsync command does not make the guarantee that bytes are written, SQLite sends a F_FULLFSYNC request to the kernel to ensures that the bytes are actually written through to the drive platter."

    http://lwn.net/Articles/283161/

    http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-September/001356.html
    "
    OTP-7471 On Mac OS X, file:sync/1 now guarantees that all filesystem
    buffers are written to the disk by using the fcntl() with
    F_FULLFSYNC option. Previously, file:sync/1 called fsync(),
    which only guaranteed that the data had been transferred to
    the disk drive. (Thanks to Jan Lehnardt.)"

    http://lists.mindrot.org/pipermail/portawiki-discuss/2005-November/000002.html

    @ronaldoussoren
    Copy link
    Contributor

    Traditionally the functions in os are thin wrappers around OS functions, this patch changes that for os.fsync.

    Two nits wrt. the patch itself:

    1. The behaviour on OSX should be documented in the stdlib reference,
      that currently says that os.fsync will call the C function fsync

    2. The comment in the __APPLE__ case can be clearer, and should
      explain the issue (as short as possible).

    I'm -1 on merging this patch though, as the fsync function on OSX behaves just like the one on Linux, the fnctl option adds a stronger guarantee in that it forces a flush of the buffers on the storage device as well. The manpage for fsync on a linux host says:

    In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.

    Adding the F_FULLSYNC option to os.fsync would be fine though (if it isn't already implemented)

    The linux

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 19, 2011

    I know that POSIX makes no guarantee regarding durable writes, but IMHO that's definitely wrong, in the sense that when one calls fsync, he expects the data to be committed to disk and be durable.
    Fixing this deficiency through Python's exposed fsync might thus be a good idea (for example, the Window version probably doesn't call fsync, so it's already not a direct mapping).

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 20, 2011

    when one calls fsync, he expects [...]
    Fixing this deficiency through Python's exposed fsync [...]

    I think so, too.
    http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
    even permits "null implementation"s etc. etc. etc.
    (I like the last paragraph of "Rationale" the most.)
    os.rst states for fsync():

    [...] to ensure that all internal buffers associated with *f*
    are written to disk.
    

    If a platform offers the opportunity to actually implement the
    desired behaviour then i would do so, regardless of what needs to
    be done internally to achieve it.

    (And the single question on apple is simply what to do otherwise
    with that VMS/VFS bug for at least large sparse files.
    I can only imagine adding multiple notes in the documentation,
    here and there.)

    @ronaldoussoren
    Copy link
    Contributor

    See <http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html\> for linux' behavior, in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.

    This is the same behavior as fsync on OSX, and OSX also has a second API that provides stronger guarantees.

    With your patch it is no longer possible to call the C function fsync on OSX, even though it is good enough for a lot of use cases. As os.fcntl already supports F_FULLSYNC I see no good reason to change the implementation of os.fsync on OSX.

    BTW. The need to use F_FULLSYNC in bpo-11277 might be a bug in OSX itself, have you filed an issue in Apple's tracker?

    @vstinner
    Copy link
    Member

    A different approach for this issue is to document fcntl.fcntl(fd, fcntl.F_FULLFSYNC) in fsync() documentation.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 20, 2011

    Agreed with Ronald. If Apple thinks fsync() shouldn't flush the hardware cache, there's no reason for us to override that.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 20, 2011

    in particular: linux doesn't guarantee that data gets writting to the disk when you call fsync, only that the data gets pushed to the storage device.

    Barriers are now enable by default in EXT4, and Theodore Tso has been favourable to that for quite some time now:
    http://lwn.net/Articles/283288/

    As for OS-X, this is definitely a bug (I mean, having to call fsync before mmap is a huge bug in itself).

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 20, 2011

    On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:

    This is the same behavior as fsync on OSX, and OSX also has
    a second API that provides stronger guarantees.
    With your patch it is no longer possible to call the C function
    fsync on OSX, even though it is good enough for a lot of use
    cases. As os.fcntl already supports F_FULLSYNC I see no good
    reason to change the implementation of os.fsync on OSX.

    I don't see it that way for multiple reasons.

    One of it is that Python states that it "is a beginner language",
    and the sentence i've quoted a part of in my last message is very
    beginner friendly. Now i don't think you can require that much
    further knowledge from a beginner at all than what is described in
    that very sentence.

    (Just think of an african kid with a sponsored laptop and possibly
    10 minutes internet access per day which tries hard to rise up and
    puts *galaxies of trust* in what it reads in os.rst, maybe because
    it is the only documentation he has available!)

    Then Python is a high-level language. It's true that there are
    many shallow wrappers around lower-level functions, but still.
    It is cross-platform. Just have a look into a, say, file I/O
    implementation of a C library/program which aims to be portable.
    If all that #if#endif would be necessary in a high-level language,
    then why don't write a Makefile and compile that thing?
    Not that much more work, then - many thanks to the GCC people!
    You know - i wouldn't talk about subtleties of filesystems here.

    And then - at least half a dozen of programmers with altogether
    maybe many decades of experience, full-time internet access
    (talking about you :) required two months to get around the Apple
    bug. A simple "man 2 fsync" would have sufficed for a starter!
    And how ridiculous - i've spend the free time of four days :-(
    It's a shame, it's a bug, i wouldn't it let pass through to users.

    Apple? No, i'm looking forward to return to a private BSD/Linux
    X/ahwm/aterm/vim laptop.

    1 similar comment
    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 20, 2011

    On Wed, Apr 20, 2011 at 12:16:39PM +0000, Ronald Oussoren wrote:

    This is the same behavior as fsync on OSX, and OSX also has
    a second API that provides stronger guarantees.
    With your patch it is no longer possible to call the C function
    fsync on OSX, even though it is good enough for a lot of use
    cases. As os.fcntl already supports F_FULLSYNC I see no good
    reason to change the implementation of os.fsync on OSX.

    I don't see it that way for multiple reasons.

    One of it is that Python states that it "is a beginner language",
    and the sentence i've quoted a part of in my last message is very
    beginner friendly. Now i don't think you can require that much
    further knowledge from a beginner at all than what is described in
    that very sentence.

    (Just think of an african kid with a sponsored laptop and possibly
    10 minutes internet access per day which tries hard to rise up and
    puts *galaxies of trust* in what it reads in os.rst, maybe because
    it is the only documentation he has available!)

    Then Python is a high-level language. It's true that there are
    many shallow wrappers around lower-level functions, but still.
    It is cross-platform. Just have a look into a, say, file I/O
    implementation of a C library/program which aims to be portable.
    If all that #if#endif would be necessary in a high-level language,
    then why don't write a Makefile and compile that thing?
    Not that much more work, then - many thanks to the GCC people!
    You know - i wouldn't talk about subtleties of filesystems here.

    And then - at least half a dozen of programmers with altogether
    maybe many decades of experience, full-time internet access
    (talking about you :) required two months to get around the Apple
    bug. A simple "man 2 fsync" would have sufficed for a starter!
    And how ridiculous - i've spend the free time of four days :-(
    It's a shame, it's a bug, i wouldn't it let pass through to users.

    Apple? No, i'm looking forward to return to a private BSD/Linux
    X/ahwm/aterm/vim laptop.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 20, 2011

    8-}
    The real message was:

    On Wed, Apr 20, 2011 at 12:16:39PM +0000, Antoine Pitrou wrote:

    If Apple thinks [...] there's no reason for us

    Apple thinks (KernelProgramming.pdf, BPFileSystem.pdf):

    Kernel code must be nearly perfect.
    
    Kernel programming is a black art that should be avoided if at
    all possible. Fortunately, kernel programming is usually
    unnecessary.  You can write most software entirely in user
    space.
    
    Sparse files and zero filling. UFS supports sparse files,
    which are a way for the file system to store the data in files
    without storing unused space allocated for those files. HFS+
    does not support sparse files and, in fact, zero-fills all
    bytes allocated for a file until end-of-file.
    [Steffen thinks HFS+ is driven from "user-space" through IOKit.]
    

    POSIX says:

    It would also not be unreasonable to omit testing for fsync(),
    allowing it to be treated as a quality-of-implementation
    issue.
    

    It seems someone needs a hand.
    Nice afternoon with an untrashed INBOX.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 20, 2011

    Ronald Oussoren wrote:

    Adding the F_FULLSYNC option to os.fsync would be fine though

    To show you that i'm not unteachable i'll attach a patch which
    does that.
    This approach can be heavily extended, then, e.g. by using
    sync_file_range(all_the_flags) on Linux if the "full_fsync"
    argument is true?! (And then there is sync_file_range2() on
    PowerPC and nice ARM...)

    (The first time that i use PyArg_xy plus plus - please review.
    test_os is ok and using "os.fsync(xy, True)" makes test_zlib
    succeed on unpatched hg checkout.
    os.rst change is really ugly.)

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 21, 2011

    I'm convinced. And i've rewritten the patch.
    Now fsync(2) is always called first *regardless* of the new
    optional argument. The documentation is (a little bit) better
    now. And i've added support for NetBSD, AIX and Linux; though:

    @sdaoden sdaoden mannequin changed the title Mac OS X fsync() should really be fcntl(F_FULLFSYNC) Change os.fsync() to support physical backing store syncs Apr 21, 2011
    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 21, 2011

    P.S.: dito Linux and NetBSD. I've reused preprocessor tests
    we've discovered internally over the past years, but
    i cannot test this here.
    I could test Linux next Tuesday, though.

    @vstinner
    Copy link
    Member

    11877.2.diff: On Mac OS X, fsync(fd, full_sync=True); fsync(fd) do a full sync twice. You should restore the old file flags at fsync() exit. Or this surprising behaviour should be documented.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 21, 2011

    Ok, 11877.3.diff uses either-or.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 21, 2011

    I'm -10 on sync_file_range on Linux:

    • it doesn't update the file metadata, so there's a high chance of corruption after a crash
    • last time I checked, it didn't flush the disk cache (well, it probably does if barriers are enabled, but that's also the case with fsync)

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 21, 2011

    Charles-Francois Natali wrote:

    I'm -10 on sync_file_range on Linux:
    [...] last time I checked [...]

    I just looked at
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=fs/sync.c;h=c38ec163da6ccba00a0146c75606c1b548b31343;hb=HEAD
    and it seems - as far as i understand what i read - that you're
    still right; and, furthermore, that fsync() does everything
    anyway. (But here an idiot is talking about *very* complicated
    stuff.)

    I've also "search"ed for the called filemap_write_and_wait_range()
    and found the commit message for
    2daea67e966dc0c42067ebea015ddac6834cef88 as part of that;
    very interesting in respect to our issue here.

    I will wait before i update the patch though, just in case some
    experienced NetBSD or AIX user posts some message. For OpenBSD
    i think i can confirm that fsync(2) alone is enough after taking
    a (shallow, all shallow) look into kernel/vfs_syscalls.c and
    ufs/ffs/{ffs_softdep.c,softdep.h}.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 21, 2011

    and it seems - as far as i understand what i read - that you're
    still right; and, furthermore, that fsync() does everything
    anyway.  (But here an idiot is talking about *very* complicated
    stuff.)

    I just double-checked, and indeed, fsync does flush the disk cache
    when barriers are enabled on several FS, while sync_file_range does
    not. So sync_file_range should definitely not be used on Linux.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Apr 25, 2011

    I'll attach 11877.4.diff:

    • Docu change (i hope to be better)
    • Kept NetBSD after looking into
      http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.423&content-type=text/x-cvsweb-markup&only_with_tag=MAIN
      because fsync_range() with FDISKSYNC set causes a FSYNC_CACHE
      flag to be passed through to VOP_FSYNC() in addition to
      FSYNC_WAIT, which is what plain fsync() passes exclusively.
      (I have not furtherly discovered that. FDISKSYNC was introduced
      2005. I believe existence of such a flag will sooner or later
      force somebody to make a difference in wether it is set or not.)
    • Drop of AIX. According to the systems manual (link in
      msg134213) there does not seem to be a difference in between
      fsync() and fsync_range()
    • Drop of Linux sync_file_range() (see messages above).
      By the way, does Linux still require

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented May 7, 2011

    I'll attach 11877.4.diff

    A couple comments:

    static PyObject *
    posix_fsync(PyObject *self, PyObject *args, PyObject *kwargs)
    {
        PyObject *retval = NULL;
        auto PyObject *fdobj;
        auto int full_fsync = 1;

    Why are you using the "auto" storage class specifier? I know that "explicit is better than implicit", but I really can't see a good reason for using it here (and anywhere, see http://c-faq.com/decl/auto.html).

    # ifdef __APPLE__
        res = fcntl(fd, F_FULLFSYNC);
    # endif
    # ifdef __NetBSD__
        res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
    # endif

    Since __APPLE__ and __NetBSD__ are exclusive, you could use something like
    # if defined(APPLE)
    res = fcntl(fd, F_FULLFSYNC);
    # elif defined(NetBSD)
    res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
    # endif

    Do you really need to use goto for such a simple code?

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 7, 2011

    On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:

    I really can't see a good reason for using it here (and
    anywhere, see http://c-faq.com/decl/auto.html).

    You're right.

    Why are you using the "auto" storage class specifier? I know
    that "explicit is better than implicit"

    Yup. I'm doing what is happening for real in (x86) assembler.
    Thus "auto" means (at a glance!) that this one will need space on
    the stack.
    (Conversely i'm not using "register" because who knows if that is
    really true? ;))

    Do you really need to use goto for such a simple code?

    Yup. Ok, this is more complicated. The reason is that my funs
    have exactly one entry point and exactly one place where they're
    left. This is because we here do manual instrumentalisation as in

    ret
    fun(args)
    {
    locals
    s_NYD_ENTER;

    assertions
    
    ...
    

    jleave:
    s_NYD_LEAVE;
    return;

    [maybe only, if large: possibly "predict-false" code blocks]

    [possible error jumps here
    goto jleave;]
    }

    We're prowd of that. N(ot)Y(et)D(ead) is actually pretty cool,
    i've used debuggers exactly 5 times in the past about ten years!
    I don't even know exactly *how to use debuggers*. 8---} (NYD can
    do backtracing, or, with NDEBUG and optional, profiling.)
    A really good optimizing compiler will produce the very same code!
    And i love nasm, but it's pretty non-portable.
    But C is also nice.

    But of course i can change this (in C) to simply use return, this
    is easy here, no resources to be freed.

    Thanks for looking at this, by the way. :)

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 7, 2011

    On Sat, 7 May 2011 14:20:51 +0200, Charles-François Natali wrote:

    ifdef __APPLE__

    res = fcntl(fd, F_FULLFSYNC);
    

    endif

    ifdef __NetBSD__

    res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
    

    endif

    Since __APPLE__ and __NetBSD__ are exclusive, you could use something like

    if defined(APPLE)

    res = fcntl(fd, F_FULLFSYNC);
    

    elif defined(NetBSD)

    res = fsync_range(fd, FFILESYNC|FDISKSYNC, 0, 0);
    

    endif

    Yes, you're right, i'll update the patch accordingly.

    �--
    Steffen, sdaoden(*)(gmail.com)

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 7, 2011

    11877.5.diff incorporates all changes suggested by
    Charles-François except for the 'auto' keyword, which is extremely
    important and which would need to be invented if it would not
    yet exist.

    I'm dropping the old stuff. And i think this is the final version
    of the patch. I've changed the default argument to 'True' as
    i really think it's better to be on the safe side here. (The
    french are better off doing some gracy and dangerous sports to
    discover edges of life!) I'm still of the opinion that this
    should be completely hidden, but since it's completely transparent
    wether a Python function gets yet another named argument or not...

    So, thanks to Ronald, i detected that also NetBSD introduced
    a FDISKSYNC flag in 2005 and that you really need fsync_range()
    there (at least by definition)! How could they do that? But i'm
    also happy to see that all other systems try hard to achieve
    security transparently and by default and unless i missed
    something once again.

    @ronaldoussoren
    Copy link
    Contributor

    Steffen, I don't understand your comment about "auto". Declaring variables as "auto" is not necessary in C code and not used anywhere else in Python's source code.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 9, 2011

    Ronald Oussoren wrote (2011-05-08 10:33+0200):

    Steffen, I don't understand your comment about "auto". Declaring
    variables as "auto" is not necessary in C code and not used
    anywhere else in Python's source code.

    Well, as long as i can keep my underwear all is fine.
    (I also looked at Google translate because i first wanted to start
    the reply with "croak .. pip .. twist .. wrench .. groan .. ugh".)

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented May 9, 2011

    Steffen, you changed the default to doing a "full sync" in your last patch: while I was favoring that initially, I now agree with Ronald and Antoine, and think that we shouldn't change the default behaviour. The reason being that Apple and NetBSD folks should know what they're doing, and that there might be some subtle side effects (see for example my comment on sync_file_range on Linux). Also, given how many bugs you seem to encouter on OS-X, it's probably better to stick to a syscall known to be safe instead of silently replacing it by a maybe-not-so-tested syscall. Finally, depending on the workload, it could have a significant performance impact. People requiring write durability will probably manage to find this full_sync parameter.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 10, 2011

    I don't agree with you and i don't believe it is implemented like
    that. But it seems i am the only one on this issue who sees it
    like that. Thus i apply 11877.6.diff.

    Declaring variables as "auto" is not necessary in C code and not
    used anywhere else in Python's source code

    Changed.

    Steffen, you changed the default to doing a "full sync" in your
    last patch

    Changed all through.

    The reason being that Apple and NetBSD folks should know what
    they're doing [.]
    People requiring write durability will probably manage to find
    this full_sync parameter.

    This sounds logical. I've changed the doc in os.rst so that it
    includes a note on *which* operating systems actually do something
    dependend on that argument.
    I should have done that from the very beginning.

    Finally, depending on the workload, it could have a significant
    performance impact.

    :-)

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 11, 2011

    Ouch, ouch, ouch!!
    I'll have to send 11877.7.diff which extends 11877.6.diff.
    This is necessary because using fcntl(2) with F_FULLFSYNC may fail
    with ENOTTY (inapprobiate ioctl for device) in situations where
    a normal fsync(2) succeeds (e.g. STDOUT_FILENO).
    By the way - i have no idea of Redmoondian Horror at all
    (except for http://msdn.microsoft.com/en-us/sync/bb887623.aspx).

    Dropping .5 and .6 - and sorry for the noise.
    Good night, Europe.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 12, 2011

    Just adding more notes on that by reactivating one of haypo's
    links from bpo-8604. (And: maybe some Linux documentation should be
    updated?) From Theodore Ts'o,
    http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don’t-fear-fsync:

    As the Eat My Data presentation points out very clearly, the only
    safe way according that POSIX allows for requesting data written
    to a particular file descriptor be safely stored on stable storage
    is via the fsync() call.  Linux’s close(2) man page makes this
    point very clearly:
    
    A successful close does not guarantee that the data has been
    successfully saved to disk, as the kernel defers writes. It is not
    common for a file system to flush the buffers when the stream is
    closed. If you need to be sure that the data is physically stored
    use fsync(2).
    
    Why don’t application programmers follow these sage words?  These
    three reasons are most often given as excuses:
    
    - (Perceived) performance problems with fsync()
    - The application only needs atomicity, but not durability
    - The fsync() causing the hard drive to spin up unnecessarily in
      laptop_mode
    

    (Don't ask me why i'm adding this note though.
    I should have searched for it once i've opened that issue?
    Bah!!! Ts'o did not write that article for me. He'd better hacked.)

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented May 12, 2011

    Calling fsync on a file descriptor referring to a tty doesn't make much sense.
    On Linux, this fails with EINVAL:
    $ python -c 'import os; os.fsync(1)'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    OSError: [Errno 22] Invalid argument

    So if the full sync fails on ttys, it shouldn't be a problem: since
    the default performs a classical fsync, this won't break compatibility
    with existing code anyway.
    So I think you should stick with the previous version (well, if the
    full sync fails on other FDs, then it's another story, but in that
    case it should just be dropped altogether if it's not reliable...).

    By the way, it's "appropriate", not "approbiate". You made the same
    typo in your patch.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 12, 2011

    [.]

    OSError: [Errno 22] Invalid argument

    Sorry, i didn't know that. Mac OS X (2.5 and 2.6 Apple shipped):

    21:43 ~/tmp $ python2.5 -c 'import os; os.fsync(1)'; echo $?
    0
    21:43 ~/tmp $ python2.6 -c 'import os; os.fsync(1)'; echo $?
    0
    21:43 ~/tmp $ python2.7 -c 'import os; os.fsync(1)'; echo $?
    0
    21:43 ~/tmp $ python3 -c 'import os; os.fsync(1)'; echo $?
    0

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 12, 2011

    So I think you should stick with the previous version (well, if the
    full sync fails on other FDs, then it's another story, but in that
    case it should just be dropped altogether if it's not reliable...).

    Strong stuff.
    *This* is the version which should have been implemented from the
    beginning, but Apple states

    F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to
    flush all buffered data to the permanent storage
    device (arg is ignored). This is currently
    implemented on HFS, MS-DOS (FAT), and Universal Disk
    Format (UDF) file systems.

    and i thought

    • fsync (maybe move buffers to Queue; do reorder Queue as approbiate)
    • do call fsys impl. to .. whatever
      That's why i had a version of the patch which did 'fsync();fcntl();'
      because it would have been an additional syscall but the fsync()
      part would possibly have been essentially "skipped over" ..unless..
      Linux RC scripts had 'sync && sync && sync' but it does not seem
      to be necessary any more (was it ever - i don't know).

    But who knows if that fcntl will fail on some non-noted fsys?
    I think it's better to be on the safe side.
    Quoting you, Charles-François

    People requiring write durability will probably manage to find
    this full_sync parameter
    and if they do they thus really strive for data integrity, so call
    fsync() as a fallback for the security which Apple provides.
    Also: we cannot let os.fsync() fail with ENOTTY!?

    By the way, it's "appropriate", not "approbiate". You made the same
    typo in your patch.

    8~I That was not a typo. Thanks.
    I'll changed that.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented May 12, 2011

    and if they do they thus really strive for data integrity, so call
    fsync() as a fallback for the security which Apple provides.

    Why?
    If I ask a full sync and it fails, I'd rather have an error returned so that I can take the appropriate decision (abort, roll-back, try a standard fsync) rather than have Python silently replace it by an fsync.

    Also: we cannot let os.fsync() fail with ENOTTY!?

    Why not, since that's what the kernel returns?
    Once again, since the default behaviour doesn't change, this won't break any existing application.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 15, 2011

    Finally, depending on the workload, it could have a significant
    performance impact.

    Oh yes (first replaces os.fsync() with an in-python wrapper):

    18:12 ~/tmp $ ll mail
    ls: mail: No such file or directory
    18:12 ~/tmp $ ll X-MAIL
    312 -rw-r----- 1 steffen staff 315963 15 May 17:49 X-MAIL
    18:12 ~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL
    Dispatched 37 tickets to 4 targets.

    real 0m4.638s
    user 0m0.974s
    sys 0m0.160s
    18:13 ~/tmp $ rm -rf mail
    18:13 ~/tmp $ time s-postman.py --folder=~/tmp/mail --dispatch=X-MAIL
    Dispatched 37 tickets to 4 targets.

    real 0m1.228s
    user 0m0.976s
    sys 0m0.122s

    (I'm using the first one.)

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented May 15, 2011

    (I'm not sure Rietveld sent the message so I post it here, sorry in case of duplicate).

    Steffen, I've made a quick review of your patch, in case you're interested.
    I think that this functionality can be really useful to some people, and it'd be
    nice if your patch could stabilize somewhat so that committers can review it
    properly and eventually merge it.

    Concerning your benchmark:
    I don't know exactly what you're measuring, but when performing I/O-related
    benchmarks, it's always a good idea to run each test several times in a row, or
    flush the page/buffer cache before each run: the reason is that the the second
    run benefits from the page/buffer cache, which often speeds things up
    dramatically.
    Example:

    # time find /lib -type f -exec cat {} \; > /dev/null
    real 0m20.455s
    user 0m8.145s
    sys 0m5.256s

    # time find /lib -type f -exec cat {} \; > /dev/null
    real 0m6.827s
    user 0m8.477s
    sys 0m4.804s

    # echo 3 > /proc/sys/vm/drop_caches

    # time find /lib -type f -exec cat {} \; > /dev/null
    real 0m19.954s
    user 0m8.069s
    sys 0m5.364s

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 17, 2011

    Thank you, thank you, thank you.
    I'm a bit irritated that a french man treats a "wet-her" as a typo!
    What if *i* like it?? In fact it is a fantastic physical backing
    store. Unbeatable.

    Well and about dropping the fsync() in case the fcntl() fails with
    ENOTTY. This is "<Esc>2dd", which shouldn't hurt a committer.
    I'm convinced that full_fsync=False is optional and false by
    default, but i don't trust Apple. I've seen a reference to an
    "atomic file" somewhere on bugs.python.org and that does fsync()
    first followed by fcntl() if FULLFSYNC is available. Thus, if
    someone knows about that, she may do so, but otherwise i would
    guess he doesn't, and in that case i would not expect ENOTTY from
    an fsync() - still i want a full flush!
    This is what NetBSD describes:

    NOTES
    For optimal efficiency, the fsync_range() call requires that
    the file system containing the file referenced by fd support
    partial synchronization of file data. For file systems which
    do not support partial synchronization, the entire file will
    be synchronized and the call will be the equivalent of calling
    fsync().

    But Apple is *soooo*speeeeciaaaal* again. Happy birthday.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 17, 2011

    I've dropped wet-her!
    I hope now you're satisfied!!!!!
    So the buffer cache is all which remains hot.
    How deserted!

    And you could also add a test (I guess that just calling fsync
    with full_sync=True on a valid FD would be enough.

    I was able to add two tests as an extension to what is yet tested
    about os.fsync(), but that uses an invalid fd.
    (At least it enters the conditional and fails as expected.)

    I'm not sure static is necessary, I'd rather make it const.

    Yes..

    This code is correct as it is, see other extension modules in
    the stdlib for other examples of this pattern

    ..but i've used copy+paste here.

    And you could also add a test (I guess that just calling fsync
    with full_sync=True on a valid FD would be enough.

    The alternative would be that full_sync

    Ok, i've renamed full_fsync to full_sync.

    @pitrou
    Copy link
    Member

    pitrou commented May 17, 2011

    I've dropped wet-her!
    I hope now you're satisfied!!!!!
    So the buffer cache is all which remains hot.
    How deserted!

    I'm not sure I'm always understanding your messages well (I'm not a
    native English speaker), but I don't think this kind of joke is
    appropriate for the bug tracker.
    Thank you.

    @vstinner
    Copy link
    Member

    I don't like the API because it gives a different behaviour depending on the OS and I don't see how to check that the function does really a full sync.

    I would prefer a new option os.fullsync() function which is like your os.fsync(fd, full_sync=False), except that the function doesn't exist if the OS doesn't implement it.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 18, 2011

    Excusing myself seems to be the only "probates Mittel".
    @antoine Pitrou: It was a real shame to read your mail.
    (It's sometimes so loud that "i don't even hear what i write".)

    @vstinner
    Copy link
    Member

    See also issue bpo-12102.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 21, 2011

    diff --git a/Doc/library/os.rst b/Doc/library/os.rst
    --- a/Doc/library/os.rst
    +++ b/Doc/library/os.rst
    @@ -810,6 +810,35 @@
    Availability: Unix, and Windows.

    +.. function:: fullfsync(fd)
    +
    + The POSIX standart requires that :c:func:`fsync` must transfer the buffered
    + data to the storage device, not that the data is actually written by the
    + device itself. It explicitely leaves it up to operating system implementors
    + whether users are given stronger guarantees on data integrity or not. Some
    + systems also offer special functions which overtake the part of making such
    + stronger guarantees, i.e., Mac OS X and NetBSD.
    +
    + This non-standart function is *optionally* made available to access such
    + special functionality when feasible. It will force write of file buffers to
    + disk and the flush of disk caches of the file given by file descriptor *fd*.
    + To strive for best-possible data integrity, the following can be done::
    +
    + # Force writeout of local buffer modifications
    + f.flush()
    + # Then synchronize the changes to physical backing store
    + if hasattr(os, 'fullfsync'):
    + os.fullfsync(f.fileno())
    + elif hasattr(os, 'fsync'):
    + os.fsync(f.fileno())
    +
    + ..note::
    + Calling this function may take a long time, since it may block
    + until the disk reports that the transfer has been completed.
    +
    + Availability: Unix.
    +
    +
    .. function:: ftruncate(fd, length)

    Truncate the file corresponding to file descriptor \*fd*, so that it is at most
    

    diff --git a/Lib/test/test_os.py b/Lib/test/test_os.py
    --- a/Lib/test/test_os.py
    +++ b/Lib/test/test_os.py
    @@ -835,12 +835,12 @@

     class TestInvalidFD(unittest.TestCase):
         singles = ["fchdir", "dup", "fdopen", "fdatasync", "fstat",
    -               "fstatvfs", "fsync", "tcgetpgrp", "ttyname"]
    +               "fstatvfs", "fsync", "fullfsync", "tcgetpgrp", "ttyname"]
         #singles.append("close")
    -    #We omit close because it doesn'r raise an exception on some platforms
    +    # We omit close because it doesn't raise an exception on some platforms
         def get_single(f):
             def helper(self):
    -            if  hasattr(os, f):
    +            if hasattr(os, f):
                     self.check(getattr(os, f))
             return helper
         for f in singles:
    diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
    --- a/Modules/posixmodule.c
    +++ b/Modules/posixmodule.c
    @@ -174,6 +174,11 @@
     #endif /* ! __IBMC__ */
     #ifndef _MSC_VER
    +  /* os.fullfsync()? */
    +# if (defined HAVE_FSYNC && ((defined __APPLE__ && defined F_FULLFSYNC) || \
    +                             (defined __NetBSD__ && defined FDISKSYNC)))
    +#  define PROVIDE_FULLFSYNC
    +# endif
     #if defined(__sgi)&&_COMPILER_VERSION>=700
     /* declare ctermid_r if compiling with MIPSPro 7.x in ANSI C mode
    @@ -2129,6 +2134,41 @@
     {
         return posix_fildes(fdobj, fsync);
     }
    +
    +# ifdef PROVIDE_FULLFSYNC
    +PyDoc_STRVAR(fullfsync__doc__,
    +"fullfsync(fd)\n\n"
    +"force write of file buffers to disk, and the flush of disk caches\n"
    +"of the file given by file descriptor fd.");
    +
    +static PyObject *
    +fullfsync(PyObject *self, PyObject *fdobj)
    +{
    +    /* See issue 11877 discussion */
    +    int res, fd = PyObject_AsFileDescriptor(fdobj);
    +    if (fd < 0)
    +        return NULL;
    +    if (!_PyVerify_fd(fd))
    +        return posix_error();
    +
    +    Py_BEGIN_ALLOW_THREADS
    +#  if defined __APPLE__
    +    /* F_FULLFSYNC is not supported for all types of FDs/FSYSs;
    +     * be on the safe side and test for inappropriate ioctl errors */
    +    res = fcntl(fd, F_FULLFSYNC);
    +    if (res < 0 && errno == ENOTTY)
    +        res = fsync(fd);
    +#  elif defined __NetBSD__
    +    res = fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
    +#  endif
    +    Py_END_ALLOW_THREADS
    +
    +    if (res < 0)
    +        return posix_error();
    +    Py_INCREF(Py_None);
    +    return Py_None;
    +}
    +# endif /* PROVIDE_FULLFSYNC */
     #endif /* HAVE_FSYNC */
    
     #ifdef HAVE_SYNC
    @@ -9473,6 +9513,9 @@
     #endif
     #ifdef HAVE_FSYNC
         {"fsync",       posix_fsync, METH_O, posix_fsync__doc__},
    +# ifdef PROVIDE_FULLFSYNC
    +    {"fullfsync",   fullfsync, METH_O, fullfsync__doc__},
    +# endif
     #endif
     #ifdef HAVE_SYNC
         {"sync",        posix_sync, METH_NOARGS, posix_sync__doc__},

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented May 21, 2011

    (This was an attachment to an empty mail message.)

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Jul 18, 2011

    Trying to revive this issue, here's a comment I left on Rietveld:

    """

    I don't agree, the documentation states that full_sync will cause a flush to
    stable storage where supported and a plain fsync where it isn't. This code
    does
    just that.

    I think I wasn't clear enough, so let me rephrase: the call to fcntl is guarded
    by #ifdef, so it is only called on platforms where it is supported, otherwise
    fsync() is called. What I was referring to is the fact that an fsync is tried if
    the call to fcntl fails with ENOTTY: if I ask my operating system to flush disk
    buffers and it can't, then I'd rather have an error returned, so that I can take
    the right decision (abort, roll-back, try a classical fsync). IMHO, silently
    masking an error by falling back to another syscall is wrong.
    """

    Anyway, I think this patch can be useful (see for example issue bpo-8604 as a typical use case). I personally don't care since I use Linux, but OS-X and *BSD folks might find some interest in this.

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Jul 19, 2011

    Here is something unsorted and loose:

    • @neologix:
      One could argue that something had happened before the fsync(2),
      so that code which blindly did so is too dumb to do any right
      decision anyway. Even PEP-3151 won't help.

    • I favour haypos fullsync() approach, because that could probably
      make it into it. Yet Python doesn't offer any possibility to
      access NetBSD DISKSYNC stuff sofar.

    • Currently the programmer must be aware of any platform specific
      problems. I, for example, am not aware of Windows. How can
      i give any guarantee to users which (will) use my S-Postman on
      Windows? I need to put trust in turn into the Framework i am
      using - Python. And that makes me feel pretty breathless.

    • Fortunately Python is dynamic, so that one simply can replace
      os.fsync(). Works once only though (think signal handlers :=).

      + That is indeed the solution i'm using for my S-Postman,
      because *only* like this i can actually make Python's
      mailbox.py module reliable on Mac OS X! I can't give any
      guarantee for NetBSD, though i document it!

      + Aaarrg! I'm a liar!! I lie about - data integrity!!!

    --Steffen
    Ciao, sdaoden(*)(gmail.com)
    () ascii ribbon campaign - against html e-mail
    /\ www.asciiribbon.org - against proprietary attachments

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Jul 22, 2011

    One could argue that something had happened before the fsync(2),
    so that code which blindly did so is too dumb to do any right
    decision anyway. Even PEP-3151 won't help.

    I don't understand. If the syscall supposed to flush the disk's buffer
    cache fails - be it fcntl() or sync_file_range() - I think the error
    should be propagated, not silently ignored and replaced with another
    syscall which doesn't have the same semantic. That's all.

    • I favour haypos fullsync() approach, because that could probably
      make it into it. Yet Python doesn't offer any possibility to
      access NetBSD DISKSYNC stuff sofar.

    Are you willing to update your patch accordingly?

    • Currently the programmer must be aware of any platform specific
      problems. I, for example, am not aware of Windows. How can
      i give any guarantee to users which (will) use my S-Postman on
      Windows? I need to put trust in turn into the Framework i am
      using - Python. And that makes me feel pretty breathless.

    I agree.

    • Fortunately Python is dynamic, so that one simply can replace
      os.fsync(). Works once only though (think signal handlers :=).

    Yes, but since it's a fairly reasonable and useful feature, I think it
    could be merged.

    • Aaarrg! I'm a liar!! I lie about - data integrity!!!

    Well, actually, some hard disks lie about this too :-)

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Jul 23, 2011

    > Even PEP-3151 won't help.

    I don't understand. If the syscall supposed to flush the disk's buffer
    cache fails - be it fcntl() or sync_file_range() - I think the error
    should be propagated, not silently ignored and replaced with another
    syscall which doesn't have the same semantic. That's all.

    I'm with you theoretically - of course errors should be propagated
    to users so that they can act accordingly.
    And this is exactly what the patches do, *unless* it is an error
    which is not produced by the native fsync(2) call:

    -- >8 --
    ?0%0[steffen@sherwood tmp]$ cat t.c 
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    int main(void) {
    	int r  = fcntl(2, F_FULLFSYNC);
    	fprintf(stderr, "1. %d: %d, %s\n", r, errno, strerror(errno));
        errno = 0;
    	r = fsync(2);
    	fprintf(stderr, "2. %d: %d, %s\n", r, errno, strerror(errno));
        return 0;
    }
    ?0%0[steffen@sherwood tmp]$ gcc -o t t.c && ./t
    1. -1: 25, Inappropriate ioctl for device
    2. 0: 0, Unknown error: 0
    ?0%0[steffen@sherwood tmp]$ grep -F 25 /usr/include/sys/errno.h 
    #define ENOTTY      25      /* Inappropriate ioctl for device */
    -- >8 --

    So in fact the patches do what is necessary to make the changed
    version act just as the plain systemcall.

    > - I favour haypos fullsync() approach
    Are you willing to update your patch accordingly?

    Both patches still apply onto the tip of friday noon:
    http://bugs.python.org/file22016/11877.9.diff,
    http://bugs.python.org/file22046/11877-standalone.1.diff.

    Again: i personally would favour os.fsync(fd, fullsync=True), because
    that is the only way to put reliability onto unaware facilities
    unaware (e.g. my S-Postman replaces os.fsync() with a special function
    so that reliability is injected in- and onto Python's mailbox.py,
    which calls plain os.fsync()), but i've been convinced that this is
    impossible to do. It seems to be impossible to change os.fsync()
    at all, because it has a standartized function prototype.

    So what do you mean? Shall i rewrite 11877-standalone.1.diff to
    always offer fullsync() whenever there is fsync()? This sounds to
    be a useful change, because testing hasattr() of the one would
    imply availability of the other.

    > + Aaarrg! I'm a liar!! I lie about - data integrity!!!
    Well, actually, some hard disks lie about this too :-)

    Yeah. But hey:
    "I feel save in New York City. I feel save in New York City."
    Nice weekend - and may the juice be with you!

    --Steffen
    Ciao, sdaoden(*)(gmail.com)
    ASCII ribbon campaign ( ) More nuclear fission plants
    against HTML e-mail X can serve more coloured
    and proprietary attachments / \ and sounding animations

    @sdaoden
    Copy link
    Mannequin Author

    sdaoden mannequin commented Jul 25, 2011

    Are you willing to update your patch accordingly?

    I'm a vain rooster! I've used fullfsync() in 11877-standalone.1.diff!
    Sorry for that, haypo. :-/

    11877.fullsync-1.diff uses fullsync() and will instead always be
    provided when fsync() is available, to which it will fall back if
    no special operating system functionality is available.

    I really think this is the cleanest solution, because like this
    a user can state "i want the strongest guarantees available on
    data integrity", and Python does just that.

    --Steffen
    Ciao, sdaoden(*)(gmail.com)
    ASCII ribbon campaign ( ) More nuclear fission plants
    against HTML e-mail X can serve more coloured
    and proprietary attachments / \ and sounding animations

    @vstinner
    Copy link
    Member

    What is the status of this issue? This issue is interesting in the implementation of bpo-8604 (add shutil.atomic_write()).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants