classification
Title: simple bug in mmap size check
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jazzer, neologix, pitrou
Priority: normal Keywords: patch

Created on 2011-10-11 00:45 by jazzer, last changed 2011-10-11 09:10 by jazzer. This issue is now closed.

Files
File name Uploaded Description Edit
mmap-greater.patch jazzer, 2011-10-11 00:45 patch against python 2.7.2
Messages (8)
msg145319 - (view) Author: Maxim Yanchenko (jazzer) Date: 2011-10-11 00:45
The condition contradicts the exception text:
            if (offset >= st.st_size) {
                PyErr_SetString(PyExc_ValueError,
                                "mmap offset is greater than file size");
                return NULL;
            }
The condition should be changed to (offset > st.st_size), similar to the later condition which is correct:
        } else if (offset + (size_t)map_size > st.st_size) {
            PyErr_SetString(PyExc_ValueError,
                            "mmap length is greater than file size");
            return NULL;
        }

The patch is attached.
msg145321 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 01:04
Well, you have to understand what this code does: it tries to prevent non-meaningful offsets. If the offset is equal to the file size, mapping from that offset would not map anything in the file (and the system call may actually fail).
msg145323 - (view) Author: Maxim Yanchenko (jazzer) Date: 2011-10-11 01:27
First of all, it doesn't fail (at least on Linux), I tested it before posting.

And the latest, it's not like I'm just stalking around and reading Python sources for fun. It's a real and pretty valid case, I hit it while upgrading our production code to python 2.7.
I'm using NumPy (linear algebra module) that uses Python's core mmap module under the hood.
In NumPy, it's pretty valid to have arrays of size 0.
I have a file with a fixed-size header that holds size of the array and some other meta-data. I mmap this file as a NumPy array using the offset equal to header size. Of course, if there is no data in the array then the file will be just header, and the offset will be equal to the size of the file - here is where this bug hits us as I can't load this file with Python 2.7.2 anymore (while I was able to with Python 2.5).
This patch fixes this and everything works as expected, giving an array with zero dimensions ('shape' in terms of NumPy):
>>> x.shape
(0,)
>>> x.size
0

Please kindly consider applying the patch.
msg145324 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 01:31
I don't think it makes sense to accept mmap'ing empty contents when at offset "n" but not at offset "n + 1". Either we remove the check entirely and let people deal with the consequences, or we keep the check as-is.
msg145325 - (view) Author: Maxim Yanchenko (jazzer) Date: 2011-10-11 01:52
Well, "n+1" is clearly outside the file, wile "n" is within and therefore valid.

Also, if your position is to forbid zero-size mmapping completely, then the checks inside "if (map_size == 0) {" don't make any sense, especially as they may or may fail.
From the existing code, zero-size mmapping is OK as long as offset is OK, so the question is whether we consider offset pointing to the end of the file OK. To me, it's fine and valid, and there are valid cases like NumPy's zero-size arrays, hence the proposed patch.

Removing the check completely is a viable option too, it was already requested for special files:
http://bugs.python.org/issue12556

I believe users should have an ability to enjoy whatever their OS provides, and "deal with the consequences" as you said.
msg145332 - (view) Author: Maxim Yanchenko (jazzer) Date: 2011-10-11 06:02
tried on newer Linux - crashes with my patch.
So I'll be thinking about a workaround, probably a fix for NumPy to avoid using mmap in such cases but still provide uniform interface to avoid making special conditional handling in all my scripts.
Therefore, I'm no longer pushing for this change, I will need another workaround anyway.
msg145334 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-11 07:45
> The condition contradicts the exception text:

Why?
The offset is zero-based, so 0 <= offset < size is a valid check.

> First of all, it doesn't fail (at least on Linux), I tested it before 
> posting.

Hmmm.
You got lucky, since the offset must be a multiple of the page size.

> tried on newer Linux - crashes with my patch.

That's exactly why we perform such checks. Here's what POSIX says:
"""
[EINVAL]
The addr argument (if MAP_FIXED was specified) or off is not a multiple of the page size as returned by sysconf(), or are considered invalid by the implementation.
[ENXIO]
Addresses in the range [off, off + len) are invalid for the object specified by fildes.
"""

Since we usually want to avoid implementation-defined behavior (and crashes), I think it's better to stick with the current checks (note that issue #12556 concerned a really corner case: /proc entries supporting mmap but reporting a zero-length).

> Therefore, I'm no longer pushing for this change, I will need another 
> workaround anyway.

Alright, closing then.
msg145336 - (view) Author: Maxim Yanchenko (jazzer) Date: 2011-10-11 09:10
> You got lucky, since the offset must be a multiple of the page size.
That's why our header is exactly the page size :)

> Here's what POSIX says
Then it's just another discrepancy between POSIX and Linux, as I received ENOMEM instead of EINVAL (RHEL6 on 2.6.32).


Regarding the contradiction, it's probably still worth changing the exception message to "mmap offset is greater than _or equal to_ file size", to match the condition. Just 'greater than' means the '>' check, not the '>=' check from the code, mathematically.
History
Date User Action Args
2011-10-11 09:10:54jazzersetresolution: rejected ->
messages: + msg145336
2011-10-11 07:45:57neologixsetstatus: open -> closed
resolution: rejected
messages: + msg145334

stage: resolved
2011-10-11 06:02:52jazzersetmessages: + msg145332
2011-10-11 01:52:13jazzersetmessages: + msg145325
2011-10-11 01:31:40pitrousetnosy: + neologix

messages: + msg145324
stage: resolved -> (no value)
2011-10-11 01:27:26jazzersetstatus: closed -> open
resolution: not a bug -> (no value)
messages: + msg145323
2011-10-11 01:04:20pitrousetstatus: open -> closed

nosy: + pitrou
messages: + msg145321

resolution: not a bug
stage: resolved
2011-10-11 00:45:47jazzercreate