classification
Title: io.FileIO cannot write more than 2GB (-4096) bytes??? must be documented (if not fixed)
Type: behavior Stage: resolved
Components: IO Versions: Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Yaroslav.Halchenko, benjamin.peterson, eryksun, serhiy.storchaka
Priority: normal Keywords:

Created on 2017-09-30 20:58 by Yaroslav.Halchenko, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
longwrite.py Yaroslav.Halchenko, 2017-09-30 20:58
Messages (5)
msg303429 - (view) Author: Yaroslav Halchenko (Yaroslav.Halchenko) Date: 2017-09-30 20:58
Originally detected on Python 2.7, but replicated with Python 3.5.3: apparently io.FileIO, if given a bytestring of 2GB or more, cannot write it all at once. It saves (and returns as the size written) only 2GB - 4096 bytes.

I found no indication of such behavior anywhere in the documentation. It is especially surprising to me since a regular file.write does it just fine! Attached is a code snippet, also listed below, that demonstrates it:

$> python3 --version; python3 longwrite.py
Python 3.5.3
Written 2147479552 out of 2147483648
4096 bytes were not written
Traceback (most recent call last):
  File "longwrite.py", line 28, in <module>
    assert in_digest == out_digest, "Digests do not match"
AssertionError: Digests do not match
python3 longwrite.py  7.03s user 5.80s system 99% cpu 12.848 total
1 11365 ->1.....................................:Sat 30 Sep 2017 04:56:26 PM EDT:.
smaug:/mnt/btrfs/scrap/tmp
$> cat longwrite.py
# -*- coding: utf-8 -*-
import io
import os
import hashlib

s = u' '*(256**4//2)  #+ u"перфекто"
s=s.encode('utf-8')
#s=' '*(10)

in_digest = hashlib.md5(s).hexdigest()
fname = 'outlong.dat'

if os.path.exists(fname):
    os.unlink(fname)

with io.FileIO(fname, 'wb') as f:
#with open(fname, 'wb') as f:
    n = f.write(s)

#n = os.stat(fname).st_size
print("Written %d out of %d" % (n, len(s)))
if n != len(s):
    print("%d bytes were not written" % (len(s) - n))

# checksum
with open(fname, 'rb') as f:
    out_digest = hashlib.md5(f.read()).hexdigest()
assert in_digest == out_digest, "Digests do not match"
print("all ok")
msg303443 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-10-01 01:53
The docs to look at here are https://docs.python.org/3/library/io.html#io.RawIOBase.write, which point out that short writes can happen.
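
A minimal sketch of how a caller might cope with short writes on a raw stream: keep calling write() until everything has been accepted. The names write_all and LimitedRaw are hypothetical and are not part of the io module. LimitedRaw only simulates a per-call limit so the loop can be exercised without a 2GB buffer.

```python
import io


def write_all(raw, data):
    """Call raw.write() repeatedly until every byte of data is written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        n = raw.write(view[total:])
        if n is None:
            # A non-blocking raw stream may return None when it cannot
            # accept any data right now; this sketch simply gives up.
            raise BlockingIOError("raw stream would block")
        total += n
    return total


class LimitedRaw(io.RawIOBase):
    """Hypothetical raw stream accepting at most `limit` bytes per write()."""

    def __init__(self, limit):
        self.limit = limit
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        chunk = bytes(b[:self.limit])
        self.chunks.append(chunk)
        return len(chunk)


raw = LimitedRaw(4)
written = write_all(raw, b"0123456789")
print(written)          # 10
print(len(raw.chunks))  # 3 short writes: 4 + 4 + 2 bytes
```

The same loop should work with an io.FileIO object in place of LimitedRaw, since it relies only on the return value of write().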
msg303447 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-10-01 03:10
Additionally, the FileIO documentation states the following:

    The read() (when called with a positive argument), readinto() and 
    write() methods on this class will only make one system call.

The Linux man page for write() in turn states this:

    On Linux, write() (and similar system calls) will transfer at most 
    0x7ffff000 (2,147,479,552) bytes, returning the number of bytes 
    actually transferred.  (This is true on both 32-bit and 64-bit 
    systems.)
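
The numbers in the original report line up with this limit. A quick arithmetic check, assuming a 4096-byte page size (the variable names are illustrative only):

```python
INT_MAX = 2**31 - 1
PAGE_SIZE = 4096  # assumption: typical Linux page size

# Largest page-aligned byte count that still fits in a signed 32-bit int.
derived_limit = INT_MAX & ~(PAGE_SIZE - 1)

LINUX_MAX_RW = 0x7ffff000    # limit quoted from the write() man page
requested = 256 ** 4 // 2    # size of the buffer in longwrite.py

print(derived_limit == LINUX_MAX_RW)  # True
print(LINUX_MAX_RW)                   # 2147479552
print(requested - LINUX_MAX_RW)       # 4096
```

This matches the "Written 2147479552 out of 2147483648" and "4096 bytes were not written" lines in the report.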
msg303448 - (view) Author: Yaroslav Halchenko (Yaroslav.Halchenko) Date: 2017-10-01 05:16
Thank you for the follow-ups!  

Wouldn't it be better if the Python documentation said exactly that:

On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes 
actually transferred.  (This is true on both 32-bit and 64-bit 
systems.)

Also, it might be nice to add a note at the top saying that this module is a 'low level' IO interface, and that it is recommended to use the regular file type for typical file operations (not io.FileIO) to avoid having to deal with limitations such as the one mentioned.
msg303452 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-01 08:30
> On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes 
> actually transferred.  (This is true on both 32-bit and 64-bit 
> systems.)

This is a platform-dependent limitation. It can be changed in the future. In addition, there are other reasons why not all of the data may be written (see `man 2 write`).

> Also, it might be nice to add a note at the top saying that this module is a 'low level' IO interface, and that it is recommended to use the regular file type for typical file operations (not io.FileIO) to avoid having to deal with limitations such as the one mentioned.

This is not true for the module overall. And this is already documented for io.RawIOBase:

"""
Raw binary I/O typically provides low-level access to an underlying OS device or API, and does not try to encapsulate it in high-level primitives (this is left to Buffered I/O and Text I/O, described later in this page).
"""
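
To illustrate the layering described in that passage, here is a small sketch showing what the built-in open() returns in binary mode. The temporary directory and the file name demo.dat are arbitrary choices for the example.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.dat")  # arbitrary demo file name

    # Default binary mode: a buffered object wrapping a raw FileIO.
    with open(path, "wb") as f:
        is_buffered = isinstance(f, io.BufferedWriter)
        wraps_fileio = isinstance(f.raw, io.FileIO)
        n = f.write(b"x" * 1000)

    # buffering=0 disables the buffered layer and returns the raw object.
    with open(path, "wb", buffering=0) as f:
        unbuffered_is_fileio = isinstance(f, io.FileIO)

print(is_buffered, wraps_fileio, unbuffered_is_fileio)  # True True True
print(n)  # 1000
```

The buffered layer is the part that takes care of handing all the data to the raw stream, which is consistent with the report that a regular open(fname, 'wb') wrote the whole buffer.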
History
Date User Action Args
2022-04-11 14:58:53 admin set github: 75832
2017-10-01 08:30:32 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg303452
2017-10-01 05:16:59 Yaroslav.Halchenko set messages: + msg303448
2017-10-01 03:10:39 eryksun set nosy: + eryksun; messages: + msg303447
2017-10-01 01:53:54 benjamin.peterson set status: open -> closed; nosy: + benjamin.peterson; messages: + msg303443; resolution: not a bug; stage: resolved
2017-09-30 20:58:07 Yaroslav.Halchenko create