classification
Title: mhlib fails on Btrfs filesystem (test_mhlib failure)
Type: behavior Stage: commit review
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: BreamoreBoy, David.Edelsohn, akuchling, nascheme, pitrou, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2010-01-22 22:22 by nascheme, last changed 2015-11-16 20:45 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
mhlib_nlinks.txt nascheme, 2010-01-22 22:22
mhlib_nlinks_2.patch serhiy.storchaka, 2015-11-11 07:54
mhlib_nlinks_3.patch serhiy.storchaka, 2015-11-11 08:07
Messages (20)
msg98169 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2010-01-22 22:22
Btrfs does not maintain a link count for directories (MacOS does the same, I think). That confuses mhlib.py because it uses os.stat().st_nlink as an optimization.

The attached patch removes the optimization and makes test_mhlib pass on Btrfs (and probably HFS+) filesystems.
msg98181 - Author: Chris Withers (cjw296) * (Python committer) Date: 2010-01-23 12:21
Please can you write a test for your patch?
msg98190 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-23 18:09
The documentation mentions that mhlib is deprecated and mailbox should be used instead. Is there any point in trying to fix it?
msg98201 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2010-01-23 23:27
On Sat, Jan 23, 2010 at 06:09:33PM +0000, Antoine Pitrou wrote:
> The documentation mentions that mhlib is deprecated and mailbox
> should be used instead. Is there any point in trying to fix it?

It looks like Btrfs will eventually conform to traditional st_nlink
behavior. However, that still leaves HFS+.  Perhaps the easiest fix
would be to have the unit test check for weird st_nlink behavior by
creating a directory with a subdirectory.  If something is weird,
skip testing mhlib.  The downside to that solution is that someone
might use mhlib on an HFS+ filesystem and encounter buggy behavior.

I can imagine that removing the optimization can make mhlib much
slower for large mail boxes.  Maybe that would be better than
risking lost mail though.  On modern machines maybe it doesn't
matter much.

  Neil
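
The probe suggested above (create a directory with a subdirectory and see whether st_nlink behaves traditionally) could look roughly like the sketch below. The function name nlink_counts_subdirs is hypothetical and is not taken from the attached patch or from test_mhlib; the sketch only assumes that a traditional filesystem reports st_nlink == 2 for a directory with no subdirectories and st_nlink == 3 once one subdirectory exists.

```python
import os
import shutil
import tempfile


def nlink_counts_subdirs():
    """Return True if st_nlink of a directory tracks its subdirectories.

    Hypothetical helper: a traditional filesystem reports 2 links for a
    directory without subdirectories and 3 after one subdirectory is added.
    """
    root = tempfile.mkdtemp()
    try:
        before = os.stat(root).st_nlink
        os.mkdir(os.path.join(root, "sub"))
        after = os.stat(root).st_nlink
        return before == 2 and after == 3
    finally:
        # Always remove the scratch directory, even if a call above fails.
        shutil.rmtree(root)


print(nlink_counts_subdirs())
```

A test could call this and skip itself when the result is False. The result depends on the filesystem holding the temporary directory, which may differ from the one holding the mail folders.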
msg98204 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-24 01:25
> > The documentation mentions that mhlib is deprecated and mailbox
> > should be used instead. Is there any point in trying to fix it?
> 
> It looks like Btrfs will eventually conform to traditional st_nlink
> behavior. However, that still leaves HFS+.

That wasn't really my question. What I ask is: since mhlib is
deprecated, why do we need to fix it while people are encouraged to use
mailbox instead?
And, besides, does mailbox show the same problem?
msg98231 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2010-01-24 16:26
On Sun, Jan 24, 2010 at 01:25:18AM +0000, Antoine Pitrou wrote:
> That wasn't really my question. What I ask is: since mhlib is
> deprecated, why do we need to fix it while people are encouraged to use
> mailbox instead?

Sorry, I don't understand what you are proposing. Do you mean we
should just let the test fail for people who develop on HFS+ and
Btrfs filesystems? That seems not so good.

> And, besides, does mailbox show the same problem?

No, it doesn't have that optimization.

  Neil
msg98232 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-24 16:30
> Sorry, I don't understand what you are proposing. Do you mean we
> should just let the test fail for people who develop on HFS+ and
> Btrfs filesystems? That seems not so good.

Hmm, you are right. From a quick glance, the patch looks ok. I assume you've checked it doesn't break anything else :)
msg111825 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-28 15:44
Since mhlib has gone from py3k is there any interest in applying this to 2.6 or 2.7, given that there's been no response to msg98232?
msg119091 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2010-10-18 22:54
Closing this bug.  I don't think it makes sense to change the mhlib module in a bugfix release.  My patch is fairly simple but not simple enough to make me feel comfortable.
msg254479 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-11 07:54
The new buildbot edelsohn-sles-z has been red since it was set up on 19 Aug 2015. test_mhlib is the only failing test.

http://buildbot.python.org/all/builders/s390x%20SLES%202.7/builds/114/steps/test/logs/stdio
======================================================================
FAIL: test_listfolders (test.test_mhlib.MhlibTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/2.7.edelsohn-sles-z/build/Lib/test/test_mhlib.py", line 185, in test_listfolders
    eq(folders, tfolders)
AssertionError: Lists differ: [] != ['deep', 'deep/f1', 'deep/f2',...

Second list contains 6 additional elements.
First extra element 0:
deep

- []
+ ['deep', 'deep/f1', 'deep/f2', 'deep/f2/f3', 'inbox', 'wide']

----------------------------------------------------------------------

I think we should fix this issue. The proposed patch adds a check for whether the link count can be used to count the number of subdirectories.
msg254480 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-11 08:07
Here is an even simpler and more reliable patch. It works even if the subfolder is a symlink to a directory on a filesystem that doesn't support link counting for directories.
msg254487 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-11 14:22
Sure, why not.  Having buildbots be green is good, and I doubt anyone is using mhlib any more even in python2.  And if they are, the chance that this will break something seems extremely small.
msg254489 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-11 15:33
New changeset 37431d9abbcd by Serhiy Storchaka in branch '2.7':
Issue #7759: Fixed the mhlib module on filesystems that doesn't support
https://hg.python.org/cpython/rev/37431d9abbcd
msg254493 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-11 16:53
The test now passes.
msg254584 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2015-11-13 03:07
I don't see how that patch can be correct.  The logic is now if the directory has two links inside it then skip it.  The filesystems that don't count '.' and '..' will have zero links when empty and will have two links when two real files exist in them.

I think my original patch is safer.
msg254595 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-13 09:59
AFAIK the filesystem either counts directory references, in which case st_nlink >= 2 and st_nlink == 2 only for an empty directory (the root directory is an exception, but it is not relevant here), or it doesn't count directory references, in which case st_nlink == 1.
msg254626 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2015-11-13 22:02
So what happens for the filesystems that don't count '.' and '..'?  It looks to me like if there are exactly two messages in a folder then the revised code will return [] (i.e. it will think the folder is empty).  Probably we should revise the unit test to make a folder with two messages.
msg254630 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-13 22:35
st_nlink is not related to the number of messages in a folder. It is the number of hard links.

If the filesystem supports hard link counting for directories, every directory (except /) has at least two links: one from its parent directory, and one from itself (via "."). Every subdirectory adds one more hard link via "..". Non-directory files don't add hard links to the directory.

A typical mail folder can contain thousands of messages and no or only a few subfolders. Subfolders (if there are any) are usually created before messages and hence encountered first in the directory listing. Thus the optimization can have a significant effect.

If there is a real case where st_nlink != 1 and is less than the number of subdirectories + 2, we should consider removing the optimization.
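
The early-exit logic discussed in this thread can be sketched as a pure function. This is an illustration only, not the committed mhlib code. The name find_subfolders and the (name, is_dir) input format are hypothetical stand-ins for os.listdir() plus os.path.isdir(), and the use of an equality test against 2 is an assumption based on the description above.

```python
def find_subfolders(entries, nlink):
    """Return the sorted names of the subdirectories among `entries`.

    entries -- (name, is_dir) pairs in directory-listing order
               (hypothetical stand-in for os.listdir + os.path.isdir).
    nlink   -- the folder's st_nlink as reported by os.stat().
    """
    if nlink == 2:
        # Link-counting filesystem and no subdirectories: skip the scan.
        return []
    found = []
    for name, is_dir in entries:
        if is_dir:
            found.append(name)
            nlink -= 1
            if nlink == 2:
                # Every counted subdirectory has been seen; stop early.
                break
    return sorted(found)


# Two subfolders listed first, followed by a thousand messages.
entries = [("f1", True), ("f2", True)]
entries += [(str(n), False) for n in range(1, 1001)]

# Traditional filesystem: st_nlink == 2 + number of subdirectories.
print(find_subfolders(entries, 4))
# Filesystem that does not count directory links: st_nlink == 1.
print(find_subfolders(entries, 1))
```

With st_nlink == 1 the counter starts below 2 and only decreases, so it never equals 2 and the whole listing is scanned. The result is still correct and only the speed-up is lost.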
msg254749 - Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2015-11-16 20:09
Okay, feel free to close this bug.  I had heard that HFS+ counts files but I don't have a way to verify that.
msg254755 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-16 20:45
Yes, from what I have read, HFS+ counts files. st_nlink is the number of files and directories + 2 (perhaps for fake "." and ".."). This value is never less than 2 + number_of_subdirectories, hence the code should work. The optimization simply has no effect (the same as on filesystems where st_nlink == 1).
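
The HFS+ case can be checked with a short calculation. It assumes, as stated above, that st_nlink equals the number of files plus the number of subdirectories plus 2, and that the scan decrements a counter for each subdirectory it sees and stops when the counter reaches exactly 2. The symbols f (files), d (subdirectories) and k (subdirectories seen so far) are introduced here for illustration.

```latex
\begin{align*}
\text{st\_nlink} &= f + d + 2 \;\ge\; d + 2, \\
\text{counter after } k \text{ subdirectories} &= f + d + 2 - k, \qquad 0 \le k \le d, \\
f + d + 2 - k = 2 &\iff k = f + d \iff (f = 0 \text{ and } k = d).
\end{align*}
```

Under these assumptions the early exit fires only when the folder holds no files and every subdirectory has already been found. In all other cases the whole listing is scanned.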
History
Date | User | Action | Args
2015-11-16 20:45:18 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg254755
2015-11-16 20:09:28 | nascheme | set | messages: + msg254749
2015-11-13 22:35:48 | serhiy.storchaka | set | messages: + msg254630
2015-11-13 22:02:30 | nascheme | set | messages: + msg254626
2015-11-13 09:59:21 | serhiy.storchaka | set | messages: + msg254595
2015-11-13 03:07:38 | nascheme | set | status: closed -> open; messages: + msg254584; assignee: serhiy.storchaka; resolution: fixed -> (no value); stage: resolved -> commit review
2015-11-11 16:53:39 | serhiy.storchaka | set | status: open -> closed; resolution: wont fix -> fixed; messages: + msg254493; stage: patch review -> resolved
2015-11-11 15:33:48 | python-dev | set | nosy: + python-dev; messages: + msg254489
2015-11-11 14:22:53 | r.david.murray | set | nosy: + r.david.murray; messages: + msg254487
2015-11-11 08:07:33 | serhiy.storchaka | set | files: + mhlib_nlinks_3.patch; messages: + msg254480
2015-11-11 07:54:13 | serhiy.storchaka | set | status: closed -> open; files: + mhlib_nlinks_2.patch; versions: - Python 2.6; keywords: + patch; nosy: + David.Edelsohn, serhiy.storchaka; messages: + msg254479
2010-10-18 22:54:19 | nascheme | set | status: open -> closed; resolution: wont fix; messages: + msg119091
2010-07-28 15:44:37 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg111825; components: + Library (Lib)
2010-01-24 16:32:12 | cjw296 | set | nosy: - cjw296
2010-01-24 16:31:00 | pitrou | set | messages: + msg98232
2010-01-24 16:26:39 | nascheme | set | messages: + msg98231
2010-01-24 01:25:16 | pitrou | set | messages: + msg98204
2010-01-23 23:27:49 | nascheme | set | messages: + msg98201
2010-01-23 18:12:51 | pitrou | set | nosy: + akuchling
2010-01-23 18:12:28 | pitrou | set | stage: test needed -> patch review; versions: + Python 2.6, - Python 3.2
2010-01-23 18:09:30 | pitrou | set | nosy: + pitrou; messages: + msg98190
2010-01-23 12:21:45 | cjw296 | set | nosy: + cjw296; messages: + msg98181; stage: test needed
2010-01-22 22:22:12 | nascheme | create |