Issue 1599254: mailbox: other programs' messages can vanish without trace

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/44251

classification

Title:	mailbox: other programs' messages can vanish without trace
Type:	behavior	Stage:	patch review
Components:	Library (Lib)	Versions:	Python 3.11, Python 3.10, Python 3.9

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Jim.Jewett, ajaksu2, baikie, barry, iritkatriel, jwilk, loewis, nnorwitz, peter.ll, petri.lehtinen, r.david.murray, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2006-11-19 16:03 by baikie, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description	Edit
mailbox-copy-back.diff	baikie, 2006-11-19 16:03	Rewrite single-file mailbox by copying	review
mailbox-test.patch	akuchling, 2006-12-15 13:17	Adds a test showing problem	review
length-checking.diff	akuchling, 2006-12-20 14:46	Add length-checking to single-file mailboxes	review
mailbox-update-toc.diff	baikie, 2006-12-20 19:17	Update _toc on locking
mailbox-test-lock.diff	baikie, 2006-12-20 19:19	Lock/unlock around message count in test case
mailbox-docs.diff	akuchling, 2007-01-05 19:24	Doc patch	review
mailbox-update-toc-fixed.diff	baikie, 2007-01-06 15:24	Return correct key from add()
mailbox-copy-back-53287.diff	baikie, 2007-01-06 15:30	Update to be against current version	review
mailbox-copy-back-new.diff	baikie, 2007-01-06 19:57	Rewrite single-file mailbox by copying	review
mailbox-fcntl-warn.diff	baikie, 2007-01-13 18:32	Warn when flush() may release lock
mailbox-pending-lock.diff	akuchling, 2007-01-17 21:05	Small diff
mailbox-unified-patch.diff	akuchling, 2007-01-17 21:06	Unified patch that fixes everything.	review
mailbox-unified2-test.diff	baikie, 2007-01-18 18:14	New version (test part)	review
mailbox-unified2-module.diff	baikie, 2007-01-18 18:15	New version (module part, semi-broken)	review
mailbox-update-toc-new.diff	baikie, 2007-01-21 22:10	Preserve message keys when rereading
mailbox-update-toc-again.diff	baikie, 2007-04-13 13:45	New version of _update_toc()	review
test_mailbox-reread.diff	baikie, 2007-04-13 13:46	Tests for concurrency/rereading _toc	review
mailbox-babyl-fix.diff	baikie, 2007-04-13 13:47	In Babyl class, write preamble before adding messages to an empty file
mailbox-copy-back-2.7.diff	baikie, 2014-03-23 22:57
mailbox-all-tests-2.7.diff	baikie, 2014-03-23 22:57
mailbox-tests-2.7-part1-for-copy-back.diff	baikie, 2014-03-30 21:50
mailbox-tests-2.7-part2.diff	baikie, 2014-03-30 21:50

Messages (52)
msg30590 - (view)	Author: David Watson (baikie)	Date: 2006-11-19 16:03
The mailbox classes based on _singlefileMailbox (mbox, MMDF, Babyl) implement the flush() method by writing the new mailbox contents into a temporary file which is then renamed over the original. Unfortunately, if another program tries to deliver messages while mailbox.py is working, and uses only fcntl() locking, it will have the old file open and be blocked waiting for the lock to become available. Once mailbox.py has replaced the old file and closed it, making the lock available, the other program will write its messages into the now-deleted "old" file, consigning them to oblivion. I've caused Postfix on Linux to lose mail this way (although I did have to turn off its use of dot-locking to do so). A possible fix is attached. Instead of new_file being renamed, its contents are copied back to the original file. If file.truncate() is available, the mailbox is then truncated to size. Otherwise, if truncation is required, it's truncated to zero length beforehand by reopening self._path with mode wb+. In the latter case, there's a check to see if the mailbox was replaced while we weren't looking, but there's still a race condition. Any alternative ideas? Incidentally, this fixes a problem whereby Postfix wouldn't deliver to the replacement file as it had the execute bit set.
msg30591 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2006-11-19 20:02
Mailbox locking was invented precisely to support this kind of operation. Why do you complain that things break if you deliberately turn off the mechanism preventing breakage? I fail to see a bug here.
msg30592 - (view)	Author: David Watson (baikie)	Date: 2006-11-19 20:44
This is a bug. The point is that the code is subverting the protection of its own fcntl locking. I should have pointed out that Postfix was still using fcntl locking, and that should have been sufficient. (In fact, it was due to its use of fcntl locking that it chose precisely the wrong moment to deliver mail.) Dot-locking does protect against this, but not every program uses it - which is precisely the reason that the code implements fcntl locking in the first place.
msg30593 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2006-12-12 21:04
I agree with David's analysis; this is in fact a bug. I'll try to look at the patch.
msg30594 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2006-12-15 13:17
The attached patch adds a test case to test_mailbox.py that demonstrates the problem. No modifications to mailbox.py are needed to show data loss. Now looking at the patch... File Added: mailbox-test.patch
msg30595 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2006-12-15 14:06
I'm testing the fix using two Python processes running mailbox.py, and my test case fails even with your patch. This is due to another bug, even in the patched version. mbox has a dictionary attribute, _toc, mapping message keys to positions in the file. flush() writes out all the messages in self._toc and constructs a new _toc with the new file offsets. It doesn't re-read the file to see if new messages were added by another process. One fix that seems to work: instead of doing 'self._toc = new_toc' after flush() has done its work, do self._toc = None. The ToC will be regenerated the next time _lookup() is called, causing a re-read of all the contents of the mbox. Inefficient, but I see no way around the necessity for doing this. It's not clear to me that my suggested fix is enough, though. Process #1 opens a mailbox, reads the ToC, and the process does something else for 5 minutes. In the meantime, process #2 adds a file to the mbox. Process #1 then adds a message to the mbox and writes it out; it never notices process #2's change. Maybe the _toc has to be regenerated every time you call lock(), because at this point you know there will be no further updates to the mbox by any other process. Any unlocked usage of _toc should also really be regenerating _toc every time, because you never know if another process has added a message... but that would be really inefficient.
msg30596 - (view)	Author: David Watson (baikie)	Date: 2006-12-16 19:09
Yes, I see what you mean. I had tried multiple flushes, but only inside a single lock/unlock. But this means that in the no-truncate() code path, even this is currently unsafe, as the lock is momentarily released after flushing. I think _toc should be regenerated after every lock(), as with the risk of another process replacing/deleting/rearranging the messages, it isn't valid to carry sequence numbers from one locked period to another anyway, or from unlocked to locked. However, this runs the risk of dangerously breaking code that thinks it is valid to do so, even in the case where the mailbox was not modified (i.e. turning possible failure into certain failure). For instance, if the program removes message 1, then as things stand, the key "1" is no longer used, and removing message 2 will remove the message that followed 1. If _toc is regenerated in between, however (using the current code, so that the messages are renumbered from 0), then the old message 2 becomes message 1, and removing message 2 will therefore remove the wrong message. You'd also have things like pending deletions and replacements (also unsafe generally) being forgotten. So it would take some work to get right, if it's to work at all...
msg30597 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2006-12-18 17:43
Eep, correct; changing the key IDs would be a big problem for existing code. We could say 'discard all keys' after doing lock() or unlock(), but this is an API change that means the fix couldn't be backported to 2.5-maint. We could make generating the ToC more complicated, preserving key IDs when possible; that may not be too difficult, though the code might be messy. Maybe it's best to just catch this error condition: save the size of the mailbox, updating it in _append_message(), and then make .flush() raise an exception if the mailbox size has unexpectedly changed.
msg30598 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2006-12-20 14:46
Attaching a patch that adds length checking: before doing a flush() on a single-file mailbox, seek to the end and verify its length is unchanged. It raises an ExternalClashError if the file's length has changed. (Should there be a different exception for this case, perhaps a subclass of ExternalClashError?) I verified that this change works by running a program that added 25 messages, pausing between each one, and then did 'echo "new line" > /tmp/mbox' from a shell while the program was running. I also noticed that the self._lookup() call in self.flush() wasn't necessary, and replaced it by an assertion. I think this change should go on both the trunk and 25-maint branches. File Added: length-checking.diff
msg30599 - (view)	Author: David Watson (baikie)	Date: 2006-12-20 19:17
Yeah, I think that should definitely go in. ExternalClashError or a subclass sounds fine to me (although you could make a whole taxonomy of these errors, really). It would be good to have the code actually keep up with other programs' changes, though; a program might just want to count the messages at first, say, and not make changes until much later. I've been trying out the second option (patch attached, to apply on top of mailbox-copy-back), regenerating _toc on locking, but preserving existing keys. The patch allows existing _generate_toc()s to work unmodified, but means that _toc now holds the entire last known contents of the mailbox file, with the 'pending' (user-visible) mailbox state being held in a new attribute, _user_toc, which is a mapping from keys issued to the program to the keys of _toc (i.e. sequence numbers in the file). When _toc is updated, any new messages that have appeared are given keys in _user_toc that haven't been issued before, and any messages that have disappeared are removed from it. The code basically assumes that messages with the same sequence number are the same message, though, so even if most cases are caught by the length check, programs that make deletions/replacements before locking could still delete the wrong messages. This behaviour could be trapped, though, by raising an exception in lock() if self._pending is set (after all, code like that would be incorrect unless it could be assumed that the mailbox module kept hashes of each message or something). Also attached is a patch to the test case, adding a lock/unlock around the message count to make sure _toc is up-to-date if the parent process finishes first; without it, there are still intermittent failures. File Added: mailbox-update-toc.diff
msg30600 - (view)	Author: David Watson (baikie)	Date: 2006-12-20 19:19
File Added: mailbox-test-lock.diff
msg30601 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2006-12-20 19:48
Committed length-checking.diff to trunk in rev. 53110.
msg30602 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-05 19:24
As a step toward improving matters, I've attached the suggested doc patch (for both 25-maint and trunk). It encourages people to use Maildir :), explicitly states that modifications should be bracketed by lock(), and fixes the examples to match. It does not say that keys are invalidated by doing a flush(), because we're going to try to avoid the necessity for that. File Added: mailbox-docs.diff
msg30603 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-05 19:51
Question about mailbox-update-doc: the add() method still returns self._next_key - 1; should this be self._next_user_key - 1? The keys in _user_toc are the ones returned to external users of the mailbox, right? (A good test case would be to initialize _next_key to 0 and _next_user_key to a different value like 123456.) I'm still staring at the patch, trying to convince myself that it will help -- haven't spotted any problems, but this bug is making me nervous...
msg30604 - (view)	Author: David Watson (baikie)	Date: 2007-01-06 15:24
Aack, yes, that should be _next_user_key. Attaching a fixed version. I've been thinking, though: flush() does in fact invalidate the keys on platforms without a file.truncate(), when the fcntl() lock is momentarily released afterwards. It seems hard to avoid this as, perversely, fcntl() locks are supposed to be released automatically on all file descriptors referring to the file whenever the process closes any one of them - even one the lock was never set on. So, code using mailbox.py such platforms could inadvertently be carrying keys across an unlocked period, which is not made safe by the update-toc patch (as it's only meant to avert disasters resulting from doing this and rebuilding the table of contents, assuming that another process hasn't deleted or rearranged messages). File Added: mailbox-update-toc-fixed.diff
msg30605 - (view)	Author: David Watson (baikie)	Date: 2007-01-06 15:30
File Added: mailbox-copy-back-53287.diff
msg30606 - (view)	Author: David Watson (baikie)	Date: 2007-01-06 19:57
Oops, length checking had made the first two lines of this patch redundant; update-toc applies OK with fuzz. File Added: mailbox-copy-back-new.diff
msg30607 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-12 17:12
So shall we document flush() as invalidating keys, then?
msg30608 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-12 18:41
I realized that making flush() invalidate keys breaks the final example in the docs, which loops over inbox.iterkeys() and removes messages, doing a pack() after each message. Which platforms lack file.truncate()? Windows has it; POSIX has it, so modern Unix variants should all have it. Maybe mailbox should simply raise an exception (or trigger a warning?) if truncate is missing, and we should then assume that flush() has no effect upon keys.
msg30609 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-12 19:41
One OS/2 port lacks truncate(), and so does RISCOS.
msg30610 - (view)	Author: David Watson (baikie)	Date: 2007-01-13 18:32
I like the warning idea - it seems appropriate if the problem is relatively rare. How about this? File Added: mailbox-fcntl-warn.diff
msg30611 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-15 19:01
Comment from Andrew MacIntyre (os2vacpp is the OS/2 that lacks ftruncate()): ================ I actively support the OS/2 EMX port (sys.platform == "os2emx"; build directory is PC/os2emx). I would like to keep the VAC++ port alive, but the reality is I don't have the resources to do so. The VAC++ port was the subject of discussion about removal of build support support from the source tree for 2.6 - I don't recall there being a definitive outcome, but if someone does delete the PC/os2vacpp directory, I'm not in a position to argue. AMK: (mailbox.py has a separate section of code used when file.truncate() isn't available, and the existence of this section is bedevilling me. It would be convenient if platforms without file.truncate() weren't a factor; then this branch could just be removed. In your opinion, would it hurt OS/2 users of mailbox.py if support for platforms without file.truncate() was removed?) aimacintyre: No. From what documentation I can quickly check, ftruncate() operates on file descriptors rather than FILE pointers. As such I am sure that, if it became an issue, it would not be difficult to write a ftruncate() emulation wrapper for the underlying OS/2 APIs that implement the required functionality.
msg30612 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2007-01-17 06:48
Andrew, do you need any help with this?
msg30613 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-17 19:56
Committed a modified version of the doc. patch in rev. 53472 (trunk) and rev. 53474 (release25-maint).
msg30614 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-17 20:53
mailbox-unified-patch contains David's copy-back-new and fcntl-warn patches, plus the test-mailbox patch and some additional changes to mailbox.py from me. (I'll upload a diff to David's diffs in a minute.) This is the change I'd like to check in. test_mailbox.py now passes, as does the mailbox-break.py script I'm using.
msg30615 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-17 21:05
mailbox-pending-lock is the difference between David's copy-back-new + fcntl-warning and my -unified patch, uploaded so that he can comment on the changes. The only change is to make _singleFileMailbox.lock() clear self._toc, forcing a re-read of the mailbox file. If _pending is true, the ToC isn't cleared and a warning is logged. I think this lets existing code run (albeit possibly with a warning if the mailbox is modified before .lock() is called), but fixes the risk of missing changes written by another process. Triggering a new warning is sort of an API change, but IMHO it's still worth backporting; code that triggers this warning is at risk of losing messages or corrupting the mailbox. Clearing the _toc on .lock() is also sort of an API change, but it's necessary to avoid the potential data loss. It may lead to some redundant scanning of mailbox files, but this price is worth paying, I think; people probably don't have 2Gb mbox files (I hope not, anyway!) and no extra read is done if you create the mailbox and immediately lock it before looking anything up. Neal: if you want to discuss this patch or want an explanation of something, feel free to chat with me about it. I'll wait a day or two and see if David spots any problems. If nothing turns up, I'll commit it to both trunk and release25-maint. File Added: mailbox-pending-lock.diff
msg30616 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-17 21:06
Add mailbox-unified-patch. File Added: mailbox-unified-patch.diff
msg30617 - (view)	Author: David Watson (baikie)	Date: 2007-01-18 18:14
Unfortunately, there is a problem with clearing _toc: it's basically the one I alluded to in my comment of 2006-12-16. Back then I thought it could be caught by the test you issue the warning for, but the application may instead do its second remove() after the lock() (so that self._pending is not set at lock() time), using the key carried over from before it called unlock(). As before, this would result in the wrong message being deleted. I've added a test case for this (diff attached), and a bug I found in the process whereby flush() wasn't updating the length, which could cause subsequent flushes to fail (I've got a fix for this). These seem to have turned up a third bug in the MH class, but I haven't looked at that yet. File Added: mailbox-unified2-test.diff
msg30618 - (view)	Author: David Watson (baikie)	Date: 2007-01-18 18:15
This version passes the new tests (it fixes the length checking bug, and no longer clears self._toc), but goes back to failing test_concurrent_add. File Added: mailbox-unified2-module.diff
msg30619 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-19 15:24
After reflection, I don't think the potential changing actually makes things any worse. _generate() always starts numbering keys with 1, so if a message's key changes because of lock()'s, re-reading, that means someone else has already modified the mailbox. Without the ToC clearing, you're already fated to have a corrupted mailbox because the new mailbox will be written using outdated file offsets. With the ToC clearing, you delete the wrong message. Neither outcome is good, but data is lost either way. The new behaviour is maybe a little bit better in that you're losing a single message but still generating a well-formed mailbox, and not a randomly jumbled mailbox. I suggest applying the patch to clear self._toc, and noting in the documentation that keys might possibly change after doing a lock().
msg30620 - (view)	Author: David Watson (baikie)	Date: 2007-01-20 18:20
Hang on. If a message's key changes after recreating _toc, that does not mean that another process has modified the mailbox. If the application removes a message and then (inadvertently) causes _toc to be regenerated, the keys of all subsequent messages will be decremented by one, due only to the application's own actions. That's what happens in the "broken locking" test case: the program intends to remove message 0, flush, and then remove message 1, but because _toc is regenerated in between, message 1 is renumbered as 0, message 2 is renumbered as 1, and so the program deletes message 2 instead. To clear _toc in such code without attempting to preserve the message keys turns possible data loss (in the case that another process modified the mailbox) into certain data loss. That's what I'm concerned about.
msg30621 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-21 03:16
<sigh> I'm starting to lose track of all the variations on the bug. Maybe we should just add more warnings to the documentation about locking the mailbox when modifying it and not try to fix this at all.
msg30622 - (view)	Author: David Watson (baikie)	Date: 2007-01-21 22:10
Hold on, I have a plan. If _toc is only regenerated on locking, or at the end of a flush(), then the only way self._pending can be set at that time is if the application has made modifications before calling lock(). If we make that an exception-raising offence, then we can assume that self._toc is a faithful representation of the last known contents of the file. That means we can preserve the existing message keys on a reread without any of that _user_toc nonsense. Diff attached, to apply on top of mailbox-unified2. It's probably had even less review and testing than the previous version, but it appears to pass all the regression tests and doesn't change any existing semantics. File Added: mailbox-update-toc-new.diff
msg30623 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-22 15:46
This would be an API change, and therefore out-of-bounds for 2.5. I suggest giving up on this for 2.5.1 and only fixing it in 2.6. I'll add another warning to the docs, and maybe to the module as well.
msg30624 - (view)	Author: David Watson (baikie)	Date: 2007-01-22 20:24
So what you propose to commit for 2.5 is basically mailbox-unified2 (your mailbox-unified-patch, minus the _toc clearing)?
msg30625 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-01-24 20:48
I've strengthened the warning again. The MH bug in unified2 is straightforward: MH.remove() opens a file object, locks it, closes the file object, and then tries to unlock it. Presumably the MH test case never bothered locking the mailbox before making changes before.
msg30626 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2007-03-28 16:56
Created a branch in SVN at svn+ssh://pythondev@svn.python.org/python/branches/amk-mailbox to work on this for 2.5.1. I've committed the unified2 module patch and the test_concurrent_ad() test.
msg30627 - (view)	Author: David Watson (baikie)	Date: 2007-04-13 13:45
Here's a possible solution to the rereading problem. It should allow existing applications to work whether they behave properly or not, as well as (probably) most third-party subclasses. The idea is to compare the new table of contents to the list of known offset pairs, and raise ExternalClashError if any of them have changed or disappeared. Any new pairs can then be added to _toc under new keys. To maintain the list of known pairs, a special dict subclass is used on self._toc that records every offset pair ever set in it - even those that are subsequently removed from the mapping. However, if self._pending is not set when rereading, then the code doesn't rely on this special behaviour: it just uses self._toc.itervalues(), which will work even if a subclass has replaced the special _toc with a normal dictionary. Ways the code can break: - If a subclass replaces self._toc and then the application tries to lock the mailbox after making modifications (so that _update_toc() finds self._pending set, and looks for the special attribute on _toc). - If a subclass tries to store something other than sequences in _toc. - If a subclass' _generate_toc() can produce offsets for messages that don't match those they were written under. File Added: mailbox-update-toc-again.diff
msg30628 - (view)	Author: David Watson (baikie)	Date: 2007-04-13 13:46
Some new test cases for this stuff. File Added: test_mailbox-reread.diff
msg30629 - (view)	Author: David Watson (baikie)	Date: 2007-04-13 13:47
This fixes the Babyl breakage. Perhaps it should be in the superclass? File Added: mailbox-babyl-fix.diff
msg84585 - (view)	Author: Daniel Diniz (ajaksu2) *	Date: 2009-03-30 17:24
Barry: is this on scope for the email sprint?
msg116185 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-09-12 12:46
Does any core committer fancy reviewing the 18 attached patches? :)
msg138216 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2011-06-12 19:46
D. Watson, could you remove obsolete patches and leave just the ones a patch reviewer should look at?
msg164004 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2012-06-25 19:13
It seems to me that there are three separate issues here: 1) Concurrent access to a single-file mailbox will discard some messages even if locking is correctly used. This is demonstrated by mailbox-test.patch, which despite its age applies on top of 3.2 and fails (at least for me) 2) Detecting changes to mbox files is hard and should be improved. 3) When the TOC of an mbox is re-read, the message keys should probably be kept intact. Otherwise, the application would need to prepare for the fact that keys can change without notice in the middle of a loop, for example. The first point is a serious bug that should be fixed somehow. The second point has been discussed in greath lengths, but it strikes me that the mtime tracking has not been suggested. AFAIK, this is what most/all mbox readers do. The toc should be re-read when the mtime changes and/or is newer than atime. The third point also sounds important, but might be tricky to implement.
msg213979 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2014-03-18 15:24
What is the status here? As best I can tell from a skim, (a) It should be broken into at least three separate issues. (Though maybe some can just be closed instead of separated?) (b) None of them are security holes, so we missed 2.5 and 2.6, and should now remove 3.2 and possibly 3.3. (c) There are patches with at least test cases (and maybe fixes) for all identified issues. (d) None of these patches have been applied, so we should probably add 3.4 and and 3.5 to the affected list, and then apply at least the test cases and the uncontroversial fixes. (e) Ideally, David Watson and/or Andrew Kuchling should sign off on which patches are uncontroversial.
msg214007 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2014-03-18 19:08
The bug got so complicated, with so many variations and failure modes, that I just lost track of what to do next. Let's reboot. Problem #1 is that the single-file mailboxes subvert their own fcntl locking by writing to a temp. file and renaming; David's first patch takes care of that by truncating the original and copying. Problem #2 is that the mailbox classes keep an internal table-of-contents mapping messages to positions in the file. If some other process writes to the mailbox, this table-of-contents becomes obsolete, so we need to update it. But the ToC is also used to derive keys, the identifiers kept by scripts using the mailbox, so updating the ToC means these keys become invalid. And here's where I got confused: do we invalidate the ToC in all sorts of places? Or do we just document that you shouldn't trust keys to remain the same unless you've locked the mailbox? (Except this is only true for single-file mailbox formats - Maildir uses unique strings that don't change.) I suggest we apply the fix for #1, and for #2 just discard and update the ToC when we lock the mailbox, ignoring other possible routes to corruption (so no detecting the problem and raising an ExternalClash exception). Since 2007 the docs have said "If you’re modifying a mailbox, you must lock it by calling the lock() and unlock() methods before reading any messages in the file or making any changes". (I'm also reducing the priority of this bug; clearly it's not "high".)
msg214043 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2014-03-19 01:20
OK, if I understand that correctly, (1) The locking mechanism doesn't really work, and that is too complicated to fix in this issue. A new issue would be fine. (2) The locking failure messes up the Table Of Contents, but that is too comprehensive a change to fix here, and should be handled in the new issue. (3) The locking failure also causes data loss/corruption in the messages themselves, and that can be fixed simply by truncate+append, instead of renaming. (Is this actually true, or only true if you also implement better change-detection to catch the other process?) (4) While the patches do (and test for) #3, there is not currently a patch that only does #3. (5) This is a pure bug fix; the only reason to change documentation would be to help make triggering the bugs less likely. === Also note that (6) At least some of the changes do seem to have been included at some point, so regenerating the patches is not mechanical. (7) The documentation has a big warning note, but the wording could use some improvement to emphasize that even reading is unsafe, if changes (even status changes, such as a "Seen" flag) could happen later. Again, based on my possibly faulty understanding: "If you’re modifying a mailbox, you must lock it by calling the lock() and unlock() methods before reading any messages in the file or making any changes" --> "Keys may become invalid at any time, unless the mailbox is locked. Notably, another process may modify the underlying file storage so that the key used to retrieve an existing message becomes invalid, or points to a different message. To prevent this, a mailbox must be locked prior to even reading a message, and must not be unlocked until all changes -- even status changes, such as whether or not a Message was viewed -- for all keys have been completed and flushed."
msg214648 - (view)	Author: David Watson (baikie)	Date: 2014-03-23 22:57
On Tue 18 Mar 2014, A.M. Kuchling wrote: > I suggest we apply the fix for #1, and for #2 just discard and > update the ToC when we lock the mailbox, ignoring other > possible routes to corruption (so no detecting the problem and > raising an ExternalClash exception). Since 2007 the docs have > said "If you’re modifying a mailbox, you must lock it by > calling the lock() and unlock() methods before reading any > messages in the file or making any changes". Well, the warning you're referring to begins "Be very cautious when modifying mailboxes that might be simultaneously changed by some other process. The safest mailbox format to use for such tasks is Maildir...", suggesting that it only applies to mailboxes that might in fact be modified by another process. But if a reread is forced simply by discarding the ToC, then the application's own keys become invalid if it has, e.g. previously deleted a message, even if it's a private mailbox that no other process ever touches. So such a change would make the warning apply to such mailboxes, whereas previously it (in effect) did not. (That does raise the question of why the application is calling .lock() at all on a mailbox that won't be modified by another process, but if the programmer thought the warning didn't apply to their situation, then they presumably wouldn't think that calling .lock() after modifying might actually be dangerous - and it currently isn't dangerous for such a mailbox.) Anyway, I've rebased the mailbox-unified2 patch onto 2.7, renaming it as mailbox-copy-back-2.7.diff; this fixes the original (renaming and fcntl locking) issue and adds some warnings about locking. I've combined all the tests (I think) into a separate patch, and ported them to the multiprocessing module (possibly unwise as I'm not very familiar with it - it would be nice if someone could test it on Windows). I haven't looked at the tests closely again, but they appear to check for many of the ToC issues. There actually isn't a test for the original locking issue, as the tests all use the mailbox API, which doesn't provide a way to turn off dot-locking. Also, the module no longer rewrites the mailbox if messages have only been appended to it - it just syncs it instead. However, I have verified by hand that the problem is still there when the mailbox is rewritten due to deletions, etc.
msg215207 - (view)	Author: David Watson (baikie)	Date: 2014-03-30 21:50
On Sun 23 Mar 2014, David Watson wrote: > There actually isn't a test for the original locking issue, as > the tests all use the mailbox API, which doesn't provide a way to > turn off dot-locking. ... On second thought, the underlying error doesn't actually have anything to do with locking - another process can open the mailbox, and mailbox.py can replace the file before that process even tries to lock it. Andrew's test_concurrent_append() originally did test for this, but commit c37cb11b546f changed the single-file mailbox classes to just sync the file rather than replacing it when messages have only been added, as in that test, meaning it's no longer effective (for that issue at least). I've now made a test for the original renaming-vs-copying issue which just uses the simple _subprocess mechanism that the other tests use, rather than trying to set up any special locking conditions. I've also split the tests into two patches - mailbox-tests-2.7-part1-for-copy-back.diff has the test for the original issue, and passes with the copy-back patch applied, while mailbox-tests-2.7-part2.diff applies on top and includes the rest (these involve the separate issues around rereading, and mostly fail).
msg217378 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2014-04-28 15:12
pinging David Watson: What is the status? If I understand correctly, (and I may well not), you have already opened other issues for parts of this, and (only) the final patch is ready for patch (and hopefully) commit review. Is this correct?
msg217713 - (view)	Author: David Watson (baikie)	Date: 2014-05-01 18:11
On Mon 28 Apr 2014, Jim Jewett wrote: > pinging David Watson: What is the status? If I understand correctly, (and I may well not), you have already opened other issues for parts of this, and (only) the final patch is ready for patch (and hopefully) commit review. Is this correct? No, I haven't opened any separate issues (I would be perfectly happy with fixing the original renaming-vs-copying problem and then leaving this issue open to deal with the rest). The patches mailbox-copy-back-2.7.diff and mailbox-tests-2.7-part1-for-copy-back.diff are what I suggest applying for the renaming-vs-copying problem. I haven't looked at the other patches in much detail, but for what it's worth, the tests in mailbox-tests-2.7-part2.diff do pass with mailbox-update-toc-again.diff applied.
msg408393 - (view)	Author: Irit Katriel (iritkatriel) *	Date: 2021-12-12 16:08
Reproduced on 3.11 - the test in mailbox-test.patch fails.

History
Date	User	Action	Args
2022-04-11 14:56:21	admin	set	github: 44251
2021-12-12 16:08:57	iritkatriel	set	nosy: + iritkatriel messages: + msg408393 versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.4, Python 3.5
2014-12-31 16:25:26	akuchling	set	nosy: - akuchling
2014-05-12 09:18:30	jwilk	set	nosy: + jwilk
2014-05-01 18:11:37	baikie	set	messages: + msg217713
2014-04-28 15:12:56	Jim.Jewett	set	messages: + msg217378
2014-03-30 21:50:12	baikie	set	files: + mailbox-tests-2.7-part1-for-copy-back.diff, mailbox-tests-2.7-part2.diff messages: + msg215207
2014-03-23 22:57:20	baikie	set	files: + mailbox-copy-back-2.7.diff, mailbox-all-tests-2.7.diff messages: + msg214648
2014-03-19 01:20:48	Jim.Jewett	set	messages: + msg214043
2014-03-18 19:13:10	terry.reedy	set	versions: + Python 3.4, Python 3.5, - Python 3.2, Python 3.3
2014-03-18 19:08:48	akuchling	set	priority: high -> normal messages: + msg214007
2014-03-18 15:24:27	Jim.Jewett	set	nosy: + Jim.Jewett messages: + msg213979
2012-06-25 19:13:21	petri.lehtinen	set	nosy: + petri.lehtinen messages: + msg164004
2011-06-12 19:46:14	terry.reedy	set	nosy: + r.david.murray, terry.reedy, - BreamoreBoy messages: + msg138216 versions: + Python 3.3, - Python 3.1
2010-11-12 21:00:18	akuchling	set	assignee: akuchling ->
2010-09-12 12:46:30	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg116185 versions: + Python 3.1, Python 3.2, - Python 2.6
2010-05-11 20:50:53	terry.reedy	set	versions: + Python 2.7
2009-03-30 17:24:49	ajaksu2	set	type: behavior versions: + Python 2.6, - Python 2.5 keywords: + patch nosy: + barry, ajaksu2 messages: + msg84585 stage: patch review
2007-09-17 10:50:06	peter.ll	set	nosy: + peter.ll
2006-11-19 16:03:08	baikie	create