
Author mspacek
Recipients Bill.Steinmetz, cgohlke, mspacek
Date 2010-11-04.08:49:49
Content
It turns out this isn't just a problem with array.array. It's a problem with Python's file.write() as well. Here's my test code:

# file.write() test:
FOURGBMINUS = 2**32 - 16
s = '0123456789012345' # 16 bytes
longs = s * (FOURGBMINUS // len(s)) # repeat s up to the target length
assert len(longs) == FOURGBMINUS
f = open('test.txt', 'wb') # binary mode avoids newline translation on Windows
f.write(longs) # completes successfully
f.close()

FOURGB = 2**32
s = '0123456789012345' # 16 bytes
longs = s * (FOURGB // len(s))
assert len(longs) == FOURGB
f = open('test.txt', 'wb')
f.write(longs) # hangs with 100% CPU, file is 0 bytes
f.close()

SIXGB = 2**32 + 2**31
s = '0123456789012345' # 16 bytes
longs = s * (SIXGB // len(s))
assert len(longs) == SIXGB
f = open('test.txt', 'wb')
f.write(longs) # hangs with 100% CPU, file is 2**31 bytes
f.close()
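
In the meantime, chunking the write at the Python level should avoid the hang. Here's a minimal sketch (chunked_write is just an illustrative helper of mine, and 2**30 is an arbitrary size safely below 2**31):

def chunked_write(f, data, chunk=2**30):
    # write in pieces below 2**31 bytes, so each underlying fwrite()
    # gets a size that fits in a 32-bit int
    i, n = 0, len(data)
    while i < n: # plain while loop: xrange can't span values > 2**31 on win64
        f.write(data[i:i+chunk])
        i += chunk

f = open('test.txt', 'wb')
chunked_write(f, longs) # intended to complete where a single f.write(longs) hangs
f.close()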

# file.read() test:
TWOGB = 2**31
TWOGBPLUS = TWOGB + 16
s = '0123456789012345' # 16 bytes
longs = s * (TWOGBPLUS // len(s))
assert len(longs) == TWOGBPLUS
f = open('test.txt', 'wb')
f.write(longs) # completes successfully
f.close()
f = open('test.txt', 'rb')
longs = f.read() # works, but takes >30 min, memory usage keeps jumping around
f.close()
del longs
# maybe f.read() reads 1 char at a time until it hits EOF. try this instead:
f = open('test.txt', 'rb')
longs = f.read(TWOGBPLUS) # OverflowError: long int too large to convert to int
longs = f.read(TWOGB) # OverflowError: long int too large to convert to int
longs = f.read(TWOGB - 1) # works, takes only seconds
f.close()
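
The same chunking idea should work for reads, avoiding both the OverflowError and the slow unbounded read(). A minimal sketch (chunked_read is again just an illustrative helper):

def chunked_read(f, nbytes, chunk=2**30):
    # read in pieces below 2**31 bytes and join at the end
    parts = []
    remaining = nbytes
    while remaining > 0:
        data = f.read(min(chunk, remaining))
        if not data:
            break # hit EOF early
        parts.append(data)
        remaining -= len(data)
    return ''.join(parts)

f = open('test.txt', 'rb')
longs = chunked_read(f, TWOGBPLUS) # should finish in seconds, like f.read(TWOGB - 1)
f.close()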


So, I guess that on Windows (I've only tested 64-bit Windows 7 with Python 2.6.6 amd64), file.write() should call fwrite() multiple times, in chunks of no more than about 2**31 bytes. Also, calling f.read(nbytes) with nbytes >= 2**31 raises "OverflowError: long int too large to convert to int". Neither problem occurs on 64-bit Linux (Ubuntu 10.10) on the same machine (i7, 12 GB RAM).