Author terry.reedy
Recipients brian.curtin, pitrou, schlamar, serhiy.storchaka, terry.reedy, tim.golden
Date 2012-12-24.22:35:29
Message-id <1356388529.62.0.524653964536.issue16743@psf.upfronthosting.co.za>
In-reply-to
Content
Windows memory-maps multi-gigabyte files just fine as long as one uses the proper build (64-bit), which we provide.

Given that mmap produces a finite-length sequence object, as documented, slicing is working as it should. Slicing beyond the length returns an empty sequence. This is no different from 'abc'[4:6]==''.
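A minimal sketch of that slicing behavior, using a small anonymous mapping (fileno -1) as a stand-in for a file-backed one. The 16-byte size and the variable names are only for illustration:

```python
import mmap

# Anonymous 16-byte mapping; a file-backed mapping slices the same way.
with mmap.mmap(-1, 16) as m:
    past_end = m[100:200]   # slice entirely beyond the mapped length
    partial = m[10:100]     # slice that starts inside and runs past the end

str_past_end = 'abc'[4:6]   # the str analogue mentioned above

print(past_end, len(partial), repr(str_past_end))
```

As with any other sequence, the out-of-range slice is clamped to the actual length instead of raising.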

Running Python with finite memory has many memory-associated limitations. They are mostly undocumented as the exact details may depend on hardware, OS, implementation, version, and build. One practical limitation is that mmap with a 32-bit build cannot completely map multi-gigabyte files.
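For code that wants to know which kind of build it is running on, a rough check might look like the following. The helper names are hypothetical, and the pointer size only tells you the build width, not the actual mapping limit on a given system:

```python
import struct
import sys


def pointer_bits():
    """Width of a C pointer in this build, in bits (hypothetical helper)."""
    return struct.calcsize("P") * 8


def is_64bit_build():
    """True when sys.maxsize exceeds what a 32-bit Py_ssize_t can hold."""
    return sys.maxsize > 2**32


print(pointer_bits(), is_64bit_build())
```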

The current doc says:
"class mmap.mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT[, offset]) 
(Windows version) Maps length bytes from the file specified by the file handle fileno, and creates a mmap object. If length is larger than the current size of the file, the file is extended to contain length bytes. If length is 0, the maximum length of the map is the current size of the file, except that if the file is empty Windows raises an exception (you cannot create an empty mapping on Windows)."

It does not say what happens if the requested length is larger than the max possible on a particular system. In particular, there is no mention of exception raising. So failure to raise is not a bug for tracker purposes.

The two possibilities for what to do in such situations are best effort and bailout. The current choice (at least on Windows, and whether by us, Microsoft, or the original mmap authors, I don't know) is best effort. I think that is fine, but it should be documented. Users who care can compare the mmap object length with the file length or needed length and raise or do whatever if the mmap length is too short.
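The check described above could be sketched roughly as follows. The name map_or_raise and the choice of OSError are assumptions made for illustration, not anything mmap itself provides:

```python
import mmap
import os


def map_or_raise(fileobj, needed=None):
    """Map the whole file read-only; raise if the mapping came back short.

    Hypothetical helper: 'needed' defaults to the current file size.
    """
    size = os.fstat(fileobj.fileno()).st_size
    if needed is None:
        needed = size
    # length=0 asks for the whole file (an empty file raises ValueError).
    m = mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_READ)
    if len(m) < needed:
        got = len(m)
        m.close()
        raise OSError("mapped only %d of %d bytes" % (got, needed))
    return m
```

A caller who prefers bailout over best effort would use this in place of a bare mmap.mmap() call and handle the exception.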

So I think we should change this to a doc issue and add something like "If the requested length is larger than the limit for the current system, then that limit is used as the length."
or
"The length of the returned mmap object has a limit that depends on the details of the running system."

Or the header should say that there is a system limit, and two of the sentences above should be revised. In the first, change 'length bytes' to 'min(length, system limit) bytes'. (I am presuming this is true also when length is not given as 0.) In the last sentence, change 'current size' to 'min(current size, system limit)'.

The Unix version doc should also clarify behavior.
---

If we were to change mmap() (but only in a future release), then users who want the current behavior would have to discover, hard-code, and explicitly but conditionally pass the limit for each system their code might ever run on. I do not know that that is sensibly possible. I would not be surprised if the limit for a given 32-bit build varies for different Windows versions and setups.