mmap offset should be off_t instead of ssize_t, and the size calculation needs to be corrected #48931
Comments
In trying to use the mmap offset to allow me to work with a 12GB file, I |
I don't think fpos_t is a suitable type here. It isn't guaranteed to be |
Sorry, I should have explained that the bulk of my patch was derived from |
This is different from bz2, though. In bzlib, there is no API to seek to |
Ah, I think I'm beginning to understand. Thanks for the explanation. So, |
Correct. |
Thanks again, revised patch attached. |
There are bigger problems here. When the code calculates the size to map |
Notes on the problem.

The Python mmap documentation says: If length is 0, the maximum length

Currently, on a system where off_t is 64-bit and size_t is 32-bit, if

In addition, it seems weird that a caller would have to check the size

Finally, it appears that there isn't a reliable, architecture

With these points in mind, I am going to work on coming up with a patch

Of course, comments and suggestions are welcome. However, I'm going to |
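A hypothetical arithmetic sketch (not from the issue itself) of the truncation hazard described above, assuming a 32-bit size_t alongside a 64-bit off_t:

```python
# Hypothetical illustration: assume size_t is 32 bits while off_t is
# 64 bits, as in the scenario described above.
SIZE_T_BITS = 32
file_size = 6 * 1024 ** 3                 # 6 GiB fits in a 64-bit off_t...
wrapped = file_size % (1 << SIZE_T_BITS)  # ...but wraps when cast to size_t
print(file_size, wrapped)                 # 6442450944 2147483648
```

The wrapped value shows how a naive cast could silently map only part of the file instead of raising an error.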
Ok, put a fork in me, 'cuz I'm done with this. The latest is that

Without the size of the data, it is unclear how one would traverse a

I'm going to leave this up to someone with more knowledge of how this |
Attached is a fix to make offset use off_t. This means that mmap will work with offset > 2GB on 32-bit systems. It also fixes mmap.size() so that it returns the correct value for files > 2GB on 32-bit systems.

The first issue of msg78055 was fixed in bpo-10916; this also fixes the second part, raising an exception if the mmap length is too large instead of mmap()ing an invalid or wrong size. |
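A minimal sketch (not taken from the patch) of what the off_t offset enables: mapping a small window far past the 2 GiB ssize_t limit into a large sparse file. The 3 GiB offset here is an illustrative choice; it must be a multiple of mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import tempfile

OFFSET = 3 * 1024 ** 3  # 3 GiB, beyond the old 2 GiB ssize_t limit
assert OFFSET % mmap.ALLOCATIONGRANULARITY == 0

f = tempfile.TemporaryFile()
f.truncate(OFFSET + 4096)   # sparse file: no 3 GiB of real disk usage
f.seek(OFFSET)
f.write(b"hello")
f.flush()

# Map only a 4 KiB window starting 3 GiB into the file.
m = mmap.mmap(f.fileno(), 4096, offset=OFFSET)
data = m[:5]
m.close()
f.close()
print(data)                 # b'hello'
```

Because the file is sparse, this runs cheaply even though the nominal file size exceeds 3 GiB.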
Thanks for doing this. Looking at the patch: Modules/_io/_iomodule.h has macros for dealing with off_t (it also defines Py_off_t for Windows); perhaps you want to use them. Also, support.unlink() takes care of ignoring ENOENT for you.

Regarding the sizes used in the test: I think using hexadecimal literals (e.g. 0x180000000 instead of 6442450944) would make things a bit more readable :)

Part of the patch doesn't apply cleanly to the latest py3k. Could you regenerate it, please? |
OK, this new patch applies cleanly, uses support.unlink and hexadecimal constants. I left the off_t handling as is (it seems to work on *nix testing). Perhaps someone can handle the Windows side? |
Ok, after looking at the code, it appears Windows should already have proper handling of 64-bit sizes under 32-bit builds. |
Will this patch support off_t for "mmap.resize" as well? ftruncate uses off_t for the length of a file too. |
No, it doesn't, because resize() resizes the amount mapped in memory, which can't be more than ssize_t anyway. However, resizing does work on a 12GiB file with a large offset, e.g. 6GiB. |
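A hedged sketch of the point above: the in-memory window stays small (ssize_t-limited), but the offset can sit far past 2 GiB, and resize() grows both the window and the underlying file. The 6 GiB figure mirrors the example in the comment; the file is sparse, so this is cheap on a 64-bit system.

```python
import mmap
import os
import tempfile

GiB = 1024 ** 3

f = tempfile.TemporaryFile()
f.truncate(6 * GiB + 4096)              # sparse 6 GiB file

# Map a 4 KiB window at a 6 GiB offset, then grow it to 8 KiB.
m = mmap.mmap(f.fileno(), 4096, offset=6 * GiB)
m.resize(2 * 4096)                      # also ftruncates the file
new_size = os.fstat(f.fileno()).st_size
m.close()
f.close()
print(new_size)                         # offset + new window size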
I am a bit lost. ftruncate is used to enlarge the file, isn't it? Please consider the script:

-- rr.py ---
import mmap

f = open('test.x', 'w+b')
f.write(b'\x00' * 10)
f.flush()
fm = mmap.mmap(f.fileno(), 0)
fm.resize(5 * 1024 ** 3)
fm.close()
f.close()

The script above resizes the file from 10 bytes. Now, on 32-bit Linux (with off_t):

$ python3 rr.py
Traceback (most recent call last):
  File "rr.py", line 8, in <module>
    fm.resize(5 * 1024 ** 3)
OverflowError: Python int too large to convert to C ssize_t

The script above works perfectly on a 64-bit version of Linux. Maybe I do not understand some magic here, but clearly one |
32-bit computers can address up to 4GiB of memory, and 64-bit computers can address much more than this. mmap() allows a file to be mapped to a location in memory; the actual amount of memory that exists doesn't matter. This is the reason why a 5GiB file can be mmapped on 64-bit but not 32-bit. Also, Python uses the ssize_t type for sequences, which allows about 2GiB to be mapped on a 32-bit platform.

The mmap.resize() operation, as well as changing the size of the underlying file with ftruncate(), also changes the size of the map in memory. Thus in your sample script, the resizing would need to ftruncate the file to 5GiB and then map all of it into memory, which is not possible on 32-bit.

I think the following might be a possible way of doing it:

f = open('test.x', 'w+b')
f.write(b'\x00' * 10)
f.flush()
fm = mmap.mmap(f.fileno(), 0)
fm.close()
f.truncate(5 * 1024 ** 3)
# mmap only 1GiB of the 5GiB file
fm = mmap.mmap(f.fileno(), 1024**3)
fm.close()
f.close() |
On Thursday, 10 February 2011 at 16:41 +0000, Ross Lagerwall wrote:
... at least 4 GB. With PAE (Physical Address Extension), we can address

Victor |
On Thursday, 10 February 2011 at 17:03 +0000, STINNER Victor wrote:
We are talking about virtual addresses, not physical addresses. PAE |
Ross, you are completely right. Thanks for the tips! |
I was a bit optimistic concerning 32-bit Windows. I had to make some changes, in part because off_t is 32-bit there. The final patch is committed in r88486 (tested under 32-bit and 64-bit Linux and Windows). |
Backported in r88487 to 3.2, and in r88488 to 2.7. |