classification
Title: Always rewrite single-file mailboxes in-place
Type: enhancement Stage: patch review
Components: email Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, petri.lehtinen, r.david.murray
Priority: normal Keywords: needs review, patch

Created on 2012-06-21 10:45 by petri.lehtinen, last changed 2012-09-09 10:51 by petri.lehtinen.

Files
File name Uploaded Description Edit
issue15122.patch petri.lehtinen, 2012-09-09 10:51 review
Messages (10)
msg163322 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-21 10:45
This seems like a common feature request. Many people suffer from the fact that upon flush, the contents of single-file mailboxes are written into a new file which is then renamed over the old file.

For example: #1599254, #5346, #7359, #7360, #9559, 

The original design rationale was probably to prepare for crashes. When changes are made like this, a power loss, other sytem crash, or even a bug in the mailbox.py code in the middle of writing the mailbox, cannot destroy all the data in the mailbox file.

We could add a flag to the constructors of all single-file mailboxes that changes this behavior to in-place rewriting. This would of course need accompanying documentation that warns about that the same safety guarantees don't apply with this flag.
msg163324 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-21 11:00
It would be nice to do some research on what MUAs that support mbox format do here.  (eg: mutt, pine if it still exists, etc).
msg163326 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-21 11:29
I actually already researched mutt 1.5.21 (on Ubuntu), and it seems to overwrite the file in-place. At least the inode number doesn't change when I append and/or delete files from an mbox file and save it.
msg163327 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-21 11:39
Now I also tested rmail (the Emacs email client). It seems to write-and-rename, at least the inode number changes. It seems to even use the mboxo format nowadays (used to use Babyl).
msg163328 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-21 11:46
Alpine (the successor of pine) also seems to overwrite in-place, just like mutt.
msg163342 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-21 15:37
Any chance you'd be willing to look in the source code and see if they do any locking mailbox doesn't?  The evidence you've already gathered is probably enough to justify adding the option, though.
msg163358 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-21 18:31
I looked at the source code of mutt to see how it rewrites mbox files. It does roughly this:

1. Block some signals.

2. Lock the mbox file (with dotlock, fcntl and flock).

3. Create a temporary file in /tmp.

4. Write messages to the temporary file, from the point where the first change is up to the end of the mbox file. This saves a lot of writing when the first change is near the end of the file.

5. Write changes from the temporary file to the mbox file, modifying the mbox file in-place.

6. Truncate the mbox file to the correct length.

7. Unblock signals.

If writing the changes back to the mbox file fails, the temporary file is copied from /tmp to near the mbox file.
msg164522 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-07-02 18:33
This might be the only way to fix #1599254, see http://bugs.python.org/issue1599254#msg30590. If another program is waiting on the fcntl lock (and doesn't use dotlocking), as soon as mailbox.py closes the original file (to replace it with a new file) and releases the lock, the other program writes its message to the now deleted original file.
msg167601 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-08-07 04:46
Thinking about this again, I guess the original design rationale was not to prepare for crashes, but for the ease of implementation. It's not generally possible to rewrite the mailbox fully in-place, because the messages are not loaded into memory. If the order of messages changes, for example, a message can be overwritten in the mailbox file, and its contents need to be read afterwards.

Mutt copes with this by writing the changes to a temporary file, and then copying them over to the original file. This is what we should also be doing.
msg170096 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-09-09 10:50
Attaching a patch that changes all single-file mailboxes to be rewritten in-place instead of rewriting and renaming. The patch doesn't add an option, because IMO this should be the default behavior.

In addition to solving the locking problems, this makes it significantly faster and less disk space consuming to remove or replace messages in very large mailbox files, because the whole mailbox is not rewritten.

With this patch, the code now writes messages to a temporary file, starting from the first changed message to the end of the mailbox file. After this, the contents of the temporary file are copied back to the mailbox file and the mailbox file is truncated to the correct length.

Care is taken that the process can be restarted if a crash (e.g. KeyboardInterrupt) occurs before starting to copy the changes back to the mailbox file, by altering self._toc and self._pending at the right moment.

I'm not sure whether this would be a bug fix or a new feature. Suggestions?
History
Date User Action Args
2012-09-09 10:51:23petri.lehtinensetfiles: + issue15122.patch
2012-09-09 10:50:34petri.lehtinensetkeywords: + patch, needs review

title: Add an option to always rewrite single-file mailboxes in-place. -> Always rewrite single-file mailboxes in-place
messages: + msg170096
stage: patch review
2012-08-07 04:46:41petri.lehtinensetmessages: + msg167601
2012-07-02 18:33:04petri.lehtinensetmessages: + msg164522
2012-06-26 06:39:00petri.lehtinenlinkissue7360 superseder
2012-06-21 18:31:29petri.lehtinensetmessages: + msg163358
2012-06-21 15:37:17r.david.murraysetmessages: + msg163342
2012-06-21 11:46:33petri.lehtinensetmessages: + msg163328
2012-06-21 11:39:35petri.lehtinensetmessages: + msg163327
2012-06-21 11:29:43petri.lehtinensetmessages: + msg163326
2012-06-21 11:00:14r.david.murraysetmessages: + msg163324
2012-06-21 10:46:16petri.lehtinensetnosy: + barry, r.david.murray
components: + email
2012-06-21 10:45:50petri.lehtinencreate