classification
Title: f.write(s) for s > 2GB hangs in win64 (and win32?)
Type: crash
Stage:
Components: Extension Modules, IO, Windows
Versions: Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To:
Nosy List: Bill.Steinmetz, amaury.forgeotdarc, cgohlke, mspacek, vstinner
Priority: normal
Keywords:

Created on 2010-06-16 23:20 by Bill.Steinmetz, last changed 2011-07-05 09:46 by vstinner. This issue is now closed.

Messages (9)
msg107964 - Author: Bill Steinmetz (Bill.Steinmetz) Date: 2010-06-16 23:20
Here's my Python version info:
Python 2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)] on win32


Here's my code that won't return (Start with a file > 4GB "hugefile.bin"):

import array

siz = (1<<32)

print "making array (%d) bytes" % siz
fin = open("hugefile.bin","rb")
a = array.array("B")
a.fromfile(fin, siz)
fin.close()

print "writing array (%d) bytes" % siz
fout = open("foo.bin","wb")
a.tofile(fout)
print "wrote 2^32 bytes with array.tofile"



I never get the third print statement :(
msg107966 - Author: Bill Steinmetz (Bill.Steinmetz) Date: 2010-06-16 23:53
Looks like the issue is Microsoft's fwrite
msg120313 - Author: Christoph Gohlke (cgohlke) Date: 2010-11-03 08:35
This seems to be related: http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/7c913001-227e-439b-bf07-54369ba07994
msg120379 - Author: Martin Spacek (mspacek) Date: 2010-11-04 04:44
NumPy is addressing this with a workaround in its ndarray, calling fwrite multiple times in reasonably sized chunks. See http://projects.scipy.org/numpy/ticket/1660
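The chunking idea can be sketched in pure Python as follows. This is only an illustration, not NumPy's actual code: the helper name write_in_chunks and the 256 MiB chunk size are hypothetical choices, and the sketch is written against current Python 3 rather than the 2.6 from the original report. It assumes the data is a bytes-like object with one-byte items (bytes, bytearray, or array.array("B")).

```python
import io

# Hypothetical chunk size: 256 MiB, chosen only to stay well below 2**31.
CHUNK_SIZE = 1 << 28


def write_in_chunks(fileobj, data, chunk_size=CHUNK_SIZE):
    """Write `data` to `fileobj` in slices of at most `chunk_size` bytes.

    Returns the total number of bytes handed to fileobj.write().
    """
    view = memoryview(data)  # slicing a memoryview does not copy the buffer
    total = 0
    for start in range(0, len(view), chunk_size):
        chunk = view[start:start + chunk_size]
        fileobj.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Small in-memory demonstration with a tiny chunk size.
    buf = io.BytesIO()
    written = write_in_chunks(buf, b"0123456789012345" * 4, chunk_size=10)
    print(written, len(buf.getvalue()))
```

With a real file, the same helper would be called as write_in_chunks(open("foo.bin", "wb"), a) so that no single write call is asked to handle 2**31 bytes or more.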
msg120385 - Author: Martin Spacek (mspacek) Date: 2010-11-04 08:49
It turns out this isn't just a problem with array.array. It's a problem with Python's file.write() as well. Here's my test code:

# file.write() test:
FOURGBMINUS = 2**32 - 16
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(FOURGBMINUS//len(s))])
assert len(longs) == FOURGBMINUS
f = open('test.txt', 'w')
f.write(longs) # completes successfully
f.close()

FOURGB = 2**32
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(FOURGB//len(s))])
assert len(longs) == FOURGB
f = open('test.txt', 'w')
f.write(longs) # hangs with 100% CPU, file is 0 bytes
f.close()

SIXGB = 2**32 + 2**31
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(SIXGB//len(s))])
assert len(longs) == SIXGB
f = open('test.txt', 'w')
f.write(longs) # hangs with 100% CPU, file is 2**31 bytes
f.close()

# file.read test:
TWOGB = 2**31
TWOGBPLUS = TWOGB + 16
s = '0123456789012345' # 16 bytes
longs = ''.join([s for i in xrange(TWOGBPLUS//len(s))])
assert len(longs) == TWOGBPLUS
f = open('test.txt', 'w')
f.write(longs) # completes successfully
f.close()
f = open('test.txt', 'r')
longs = f.read() # works, but takes >30 min, memory usage keeps jumping around
f.close()
del longs
# maybe f.read() reads 1 char at a time til it hits EOL. try this instead:
f = open('test.txt', 'r')
longs = f.read(TWOGBPLUS) # OverflowError: long int too large to convert to int
longs = f.read(TWOGB) # OverflowError: long int too large to convert to int
longs = f.read(TWOGB - 1) # works, takes only seconds
f.close()


So, I guess on Windows (I've only tested 64-bit Windows 7 with Python 2.6.6 amd64), file.write() should call fwrite multiple times in chunks no greater than 2**31 bytes or so. Also, calling f.read(nbytes) where nbytes >= 2**31 raises "OverflowError: long int too large to convert to int". I don't have either of these problems in 64-bit Linux (Ubuntu 10.10) on the same machine (i7, 12GB).
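On the read side, a caller can avoid requesting 2**31 bytes or more in a single call by looping over smaller reads. The sketch below is a minimal illustration written against current Python 3, not the fix that went into CPython; the helper name read_in_chunks and the default chunk size are hypothetical, and MAX_READ simply records the largest request that worked in the test above.

```python
import io

# Largest single read that succeeded in the test above (2**31 - 1 bytes).
MAX_READ = (1 << 31) - 1


def read_in_chunks(fileobj, nbytes, chunk_size=1 << 28):
    """Read up to `nbytes` from `fileobj`, at most `chunk_size` bytes per call."""
    if not 0 < chunk_size <= MAX_READ:
        raise ValueError("chunk_size must be between 1 and 2**31 - 1")
    parts = []
    remaining = nbytes
    while remaining > 0:
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:  # end of file reached before nbytes were read
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    # Small in-memory demonstration with a tiny chunk size.
    src = io.BytesIO(b"abcdefghij" * 5)
    print(len(read_in_chunks(src, 37, chunk_size=8)))
```

The file must be opened in binary mode ("rb") for the byte counts to be exact; in text mode on Windows, newline translation changes how many bytes each read consumes.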
msg120386 - Author: Martin Spacek (mspacek) Date: 2010-11-04 08:53
I suppose someone should confirm this problem on Py > 2.6?
msg120387 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2010-11-04 09:37
It's still an issue with 2.7, and even with 3.2a2, see issue9611.
msg125259 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-01-04 00:33
r87722 should fix the issue, but I haven't tested the fix; see #9611 for more information.
msg139839 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-07-05 09:46
This issue is a duplicate of #9611.
History
Date                 User                Action  Args
2011-07-05 09:46:58  vstinner            set     status: open -> closed; resolution: duplicate; messages: + msg139839
2011-01-04 00:33:27  vstinner            set     nosy: + vstinner; messages: + msg125259
2010-11-04 09:37:33  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg120387
2010-11-04 08:53:33  mspacek             set     nosy: cgohlke, Bill.Steinmetz, mspacek; messages: + msg120386; components: + Extension Modules, Windows
2010-11-04 08:49:50  mspacek             set     nosy: cgohlke, Bill.Steinmetz, mspacek; messages: + msg120385; components: + IO, - Extension Modules; title: array.array.tofile cannot write arrays of sizes > 4GB, even compiled for amd64 -> f.write(s) for s > 2GB hangs in win64 (and win32?)
2010-11-04 04:44:18  mspacek             set     nosy: + mspacek; type: crash; messages: + msg120379
2010-11-03 08:35:03  cgohlke             set     nosy: + cgohlke; messages: + msg120313
2010-06-16 23:53:15  Bill.Steinmetz      set     messages: + msg107966
2010-06-16 23:20:44  Bill.Steinmetz      create