Author neologix
Recipients mrjbq7, neologix, pitrou, sbt
Date 2013-03-27.21:09:55
Message-id <CAH_1eM0EDSccmWdcCRNqXSo1JfjyMkEw+wob25h=+_OULHOV+A@mail.gmail.com>
In-reply-to <515359ED.8000403@gmail.com>
Content
> I meant when there is no memory pressure.

http://lwn.net/Articles/326552/
"""
The kernel page cache contains in-memory copies of data blocks
belonging to files kept in persistent storage. Pages which are written
to by a processor, but not yet written to disk, are accumulated in
cache and are known as "dirty" pages. The amount of dirty memory is
listed in /proc/meminfo. Pages in the cache are flushed to disk after
an interval of 30 seconds. Pdflush is a set of kernel threads which
are responsible for writing the dirty pages to disk, either explicitly
in response to a sync() call, or implicitly in cases when the page
cache runs out of pages, if the pages have been in memory for too
long, or there are too many dirty pages in the page cache (as
specified by /proc/sys/vm/dirty_ratio).
"""
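
To watch the amount of dirty memory mentioned above, something like the following could be used. It is only a sketch: the helper name dirty_kib and the sample text are made up for illustration, and it assumes the usual "Dirty:  <n> kB" line format of /proc/meminfo.

```python
def dirty_kib(meminfo_text):
    """Return the 'Dirty' value (in kB) from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("Dirty:"):
            # The line looks like: "Dirty:               128 kB"
            return int(line.split()[1])
    return None


# Hypothetical sample, so the sketch also runs on systems without /proc.
SAMPLE_MEMINFO = "MemTotal:  8000000 kB\nDirty:  128 kB\nWriteback:  0 kB\n"
print(dirty_kib(SAMPLE_MEMINFO))
```

On a Linux box, dirty_kib(open("/proc/meminfo").read()) called before and after writing to a mapping should show the value climb and then fall back as writeback kicks in.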

>>> FreeBSD has a MAP_NOSYNC flag which gives Linux behaviour (otherwise
>>> dirty pages are flushed every 30-60 seconds).
>>
>> It's the same on Linux, depending on your mount options, data will be
>> committed to disk every 5 seconds or so, when the journal is
>> committed.
>
> Googling suggests that MAP_SHARED on Linux is equivalent to MAP_SHARED
> | MAP_NOSYNC on FreeBSD.  I don't think it has anything to do with mount
> options.

"""
MAP_NOSYNC        Causes data dirtied via this VM map to be flushed to
                       physical media only when necessary (usually by the
                       pager) rather than gratuitously.
[...]
"""

This just means that it will reduce synchronous writeback, but
writeback will still occur (by what they call the pager).

On Linux, writeback can be done by background kernel threads
(pdflush), or synchronously on behalf of the process.

The "mount option" thing is the following:
if the file system is mounted with data=journal or data=ordered, data
is written to disk before corresponding metadata is committed. And
metadata is written when the journal is committed, by default every 5
seconds:

man mount:
"""
ext3

       data={journal|ordered|writeback}
              Specifies the journalling mode for file data.  Metadata is
              always journaled.  To use modes other than ordered on the
              root filesystem, pass the mode to the kernel as boot
              parameter, e.g. rootflags=data=journal.

              journal
                     All data is committed into the journal prior to
                     being written into the main filesystem.

              ordered
                     This is the default mode.  All data is forced
                     directly out to the main file system prior to its
                     metadata being committed to the journal.

              writeback
                     Data ordering is not preserved - data may be written
                     into the main filesystem after its metadata has been
                     committed to the journal.  This is rumoured to be
                     the highest-throughput option.  It guarantees
                     internal filesystem integrity, however it can allow
                     old data to appear in files after a crash and
                     journal recovery.

       commit=nrsec
              Sync all data and metadata every nrsec seconds.  The
              default value is 5 seconds.  Zero means default.
"""
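
To check which of these settings a given mount uses, the option string (for example the fourth column of /proc/mounts) can be picked apart. This is a rough sketch: the function names are made up, and the fallback values are simply the ext3 defaults from the man page excerpt above, so they may not match other filesystems or kernels.

```python
def parse_mount_options(options):
    """Split a comma-separated mount option string into a dict."""
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags without "=" (e.g. "rw") are stored as True.
        parsed[key] = value if sep else True
    return parsed


def journal_settings(options):
    """Return (data mode, commit interval in seconds) for an option string."""
    opts = parse_mount_options(options)
    # Assumed defaults, taken from the quoted man page: ordered, 5 seconds.
    mode = opts.get("data", "ordered")
    commit = int(opts.get("commit", 0)) or 5  # zero means default
    return mode, commit


print(journal_settings("rw,relatime,data=writeback,commit=30"))
```
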
> The Linux man page refuses to specify
>
>    MAP_SHARED
>      Share this mapping. Updates to the mapping are visible to other
>      processes that map this file, and are carried through to the
>      underlying file. **The file may not actually be updated until
>      msync(2) or munmap() is called.**

*may*: just as fsync() is required to make sure a file's data is
committed to disk, msync() is required for a mapping. But data is
committed asynchronously or synchronously depending on several
criteria (ratio of dirty pages, free memory, age of dirty pages, etc).
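
From Python, the msync() step corresponds to mmap.flush(). A minimal sketch, assuming a throwaway temporary file; write_and_sync and demo_path are names invented for the example:

```python
import mmap
import os
import tempfile


def write_and_sync(path, payload):
    """Write payload through a shared file-backed mapping, then flush it."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; ACCESS_WRITE is a write-through
        # (shared) mapping, so changes reach the underlying file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[:len(payload)] = payload
            # On Unix this calls msync(); without it the kernel still
            # writes the page back eventually, just at a time of its choosing.
            m.flush()


# The file must be non-empty before it can be mapped.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"\0" * mmap.PAGESIZE)
os.close(fd)

write_and_sync(demo_path, b"hello")
with open(demo_path, "rb") as f:
    first_bytes = f.read(5)
os.remove(demo_path)
print(first_bytes)
```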

> Can you demonstrate a slowdown with a benchmark?

I could, but I don't have to: shared memory won't incur any I/O or
copy (unless it is swapped out).
A file-backed mmap will incur a *lot* of I/O: really, just try
writing a 1GB file, and you'll see your disk spin, or use cat
/proc/diskstats.
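
For anyone who wants numbers rather than a spinning disk, the counters in /proc/diskstats can be sampled before and after the write. A rough sketch with made-up helper names; it assumes the standard field layout, where the tenth whitespace-separated column is the number of sectors written:

```python
def sectors_written(diskstats_text):
    """Map device name -> sectors written, from /proc/diskstats-style text."""
    written = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Columns: major, minor, name, then the I/O counters; column 10
        # (index 9) is assumed to be "sectors written".
        if len(fields) >= 10:
            written[fields[2]] = int(fields[9])
    return written


def written_delta(before, after):
    """Sectors written per device between two sectors_written() samples."""
    return {dev: after[dev] - before.get(dev, 0) for dev in after}


# Hypothetical samples standing in for two reads of /proc/diskstats.
BEFORE = "   8       0 sda 100 5 2000 30 50 2 4096 40 0 60 70\n"
AFTER = "   8       0 sda 100 5 2000 30 90 2 8192 80 0 90 110\n"
print(written_delta(sectors_written(BEFORE), sectors_written(AFTER)))
```

Taking one sample before and one after filling a file-backed mapping, and comparing with the same experiment on an anonymous mapping, should make the difference visible.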
History
Date User Action Args
2013-03-27 21:09:56 neologix set recipients: + neologix, pitrou, mrjbq7, sbt
2013-03-27 21:09:55 neologix link issue17560 messages
2013-03-27 21:09:55 neologix create