Title: mailbox.mbox creates new file when adding message to mbox
Type: behavior Stage: resolved
Components: email, Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: akuchling, barry, chrisisbd, lilydjwg, petri.lehtinen, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2010-08-10 16:40 by chrisisbd, last changed 2012-06-29 11:15 by petri.lehtinen. This issue is now closed.

File name Uploaded Description Edit
issue9559.patch petri.lehtinen, 2012-06-26 09:28
Messages (9)
msg113547 - (view) Author: Chris Green (chrisisbd) Date: 2010-08-10 16:40
When you call mailbox.mbox.add() the old mbox file is copied, the new file is modified and then renamed to the name of the'old' mbox file.

This breaks the way that many MUAs detect and manage new mail in an mbox, in particular I discovered this with mutt.  If the python process writing the mbox and mutt are on the same system writing a local file then you get the message "Mailbox was externally modified. Flags may be wrong." from mutt (and various odd things can happen).  If mutt is reading the mbox over NFS then you get a "Stale NFS file handle" error.

This should be strongly noted in the documentation for mailbox.mbox, in addition it would be really nice if there was a mailbox.mbox.append() method which *really* appends the data to the end of the mbox rather than changing it completely.

Most MDAs (all?) do just append new mail to the end of the mbox and I feel that python should really try and do the same.
msg163093 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-18 08:01
This is actually not true. When calling add(), mbox (and MMDF and Babyl) append the message to the file without rewriting it.

It's the following flush() call that rewrites the whole mailbox contents. I think this could be changed to work correctly by not setting self._pending = True in _singlefileMailbox.add. This way, the file wouldn't be rewritten by flush() if messages are only appended.

OTOH, flush() should still fsync the mailbox file (if we want to ensure that the changes are really written to disk). This would probably require a new flag in addition to self._pending, to indicate that there are unsynced changes.
msg164061 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-26 09:28
Attached a patch that doesn't rewrite+rename if messages have only been added. In this case, flush() only syncs the mailbox file to make sure all changes have been written to disk.

David & Barry: what do you think about including this on bugfix releases? Could someone depend on the file being rewritten in all situations?
msg164087 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-26 17:27
Could someone be depending on it?  Sure.  Is that likely enough to block this as a bug fix?  Personally I think not.  Appending to the mailbox when adding messages is, I think, the expected behavior, and always rewriting it is the surprising behavior.

The patch looks good to me, though personally I'd eliminate the extra blank lines.
msg164240 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-28 10:59
New changeset c37cb11b546f by Petri Lehtinen in branch '2.7':
#9559: Append data to single-file mailbox files if messages are only added

New changeset 5f447a005d67 by Petri Lehtinen in branch '3.2':
#9559: Append data to single-file mailbox files if messages are only added

New changeset 0bacbab678ed by Petri Lehtinen in branch 'default':
#9559: Append data to single-file mailbox files if messages are only added
msg164241 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-28 11:00
Fixed. I removed the extra newlines.
msg164242 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-28 11:01
See #15122 for always modifying single-file mailboxes in-place.
msg164254 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-28 13:20
The _pre_mailbox_hook may be called twice, like this:

babyl = mailbox.Babyl('new_file')

This only affects Babyl, that writes the mailbox header in _pre_mailbox_hook. The mailbox is corrupted on disk until flush() is called.
msg164321 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-29 10:55
New changeset 3d7a75e945ee by Petri Lehtinen in branch '2.7':
#9559: Don't call _pre_mailbox_hook more than once

New changeset 7cf5a629fde2 by Petri Lehtinen in branch '3.2':
#9559: Don't call _pre_mailbox_hook more than once

New changeset 5a0ec296b287 by Petri Lehtinen in branch 'default':
#9559: Don't call _pre_mailbox_hook more than once
Date User Action Args
2012-06-29 11:15:52petri.lehtinensetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2012-06-29 10:55:34python-devsetmessages: + msg164321
2012-06-29 05:14:58lilydjwgsetnosy: + lilydjwg
2012-06-28 13:20:00petri.lehtinensetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg164254
2012-06-28 11:01:08petri.lehtinensetmessages: + msg164242
2012-06-28 11:00:02petri.lehtinensetstatus: open -> closed
resolution: fixed
messages: + msg164241
2012-06-28 10:59:04python-devsetnosy: + python-dev
messages: + msg164240
2012-06-26 17:27:35r.david.murraysetmessages: + msg164087
2012-06-26 09:28:39petri.lehtinensetfiles: + issue9559.patch

components: + email

keywords: + patch
nosy: + barry, r.david.murray
messages: + msg164061
stage: needs patch -> patch review
2012-06-18 08:01:09petri.lehtinensetnosy: + petri.lehtinen
messages: + msg163093
2012-05-06 22:32:35ezio.melottisetstage: needs patch
versions: + Python 3.3, - Python 3.1
2010-08-10 19:01:12r.david.murraysetnosy: + akuchling

versions: + Python 3.1, Python 3.2
2010-08-10 16:40:07chrisisbdcreate