classification
Title: Error reading files larger than 4GB
Type: behavior
Stage: test needed
Components: IO, Windows
Versions: Python 3.1, Python 2.6
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder: Newline skipped in "for line in file" for huge file (issue 1744752)
Assigned To: tim.golden
Nosy List: JosephArmbruster, amaury.forgeotdarc, benjamin.peterson, casevh, christian.heimes, joearmbruster, pitrou, tim.golden
Priority: normal
Keywords:

Created on 2007-03-03 08:01 by casevh, last changed 2010-09-17 20:21 by amaury.forgeotdarc. This issue is now closed.

Files
File name Uploaded Description Edit
bigfiletest1.py casevh, 2007-03-03 08:01
Messages (7)
msg31414 - Author: Case Van Horsen (casevh) Date: 2007-03-03 08:01
When reading text files larger than 4GB, sometimes two lines are merged into a single line. The problem is reproducible on Windows 2K SP4 with both Python 2.4 and 2.5. It does not appear to occur on Linux.

I have attached a test case that creates the problem.
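
For readers without the attachment, a merged line can be detected by checking line lengths. The following is only a sketch and is not the attached bigfiletest1.py: the function name find_merged_lines and the assumption that every line in the file has the same known length are hypothetical.

```python
def find_merged_lines(path, expected_len, mode="r"):
    """Return (line_number, length) pairs for lines of unexpected length.

    Assumes every line in the file, including its trailing newline,
    should be exactly expected_len characters long. A line that is
    longer suggests that a newline was skipped and two lines merged.
    """
    bad = []
    with open(path, mode) as fh:
        for lineno, line in enumerate(fh, 1):
            if len(line) != expected_len:
                bad.append((lineno, len(line)))
    return bad
```

An empty result means no merged lines were seen; reproducing the reported problem would still require a file larger than 4GB on an affected Windows build.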

msg31415 - Author: Joseph Armbruster (joearmbruster) Date: 2007-04-26 01:40
URL:  http://svn.python.org/projects/python/trunk
Revision: 54922

Note:  Reproduced Error using test application on the following system

OS Name:    Microsoft Windows XP Professional
OS Version: 5.1.2600 Service Pack 2 Build 2600

Try modifying line 13:

fh = open(fname, 'r')
to:
fh = open(fname, 'rb')

and notice the behavior.
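
The text-mode versus binary-mode comparison suggested above can also be scripted. This is a rough sketch, not part of the attached test; the helper names count_lines and modes_agree are made up for illustration.

```python
def count_lines(path, mode):
    """Count the lines produced by iterating over the file in the given mode."""
    with open(path, mode) as fh:
        return sum(1 for _ in fh)


def modes_agree(path):
    """Return True if text mode ('r') and binary mode ('rb') see the same number of lines.

    Binary mode bypasses newline translation, so a mismatch points at
    the text-mode layer rather than at the file contents.
    """
    return count_lines(path, "r") == count_lines(path, "rb")
```

If modes_agree() returns False for the big file, the difference between the two counts shows how many newlines the text-mode read lost.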
msg31416 - Author: Joseph Armbruster (joearmbruster) Date: 2007-04-26 02:11
josepharmbruster on #python or #python-dev in freenode
msg63153 - Author: Joseph Armbruster (JosephArmbruster) Date: 2008-03-01 01:02
Using: http://svn.python.org/projects/python/trunk  @  61127
OS Name:    Microsoft Windows XP Professional
OS Version: 5.1.2600 Service Pack 2 Build 2600

I would like to report a positive follow-up on this issue. The output I
received was as follows, indicating this specific issue may be resolved.
The resulting bigfile produced from the test totaled 11.5 GB
(12,382,502,912 bytes).

python bigfiletest1.py
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
110000
120000
130000
140000
150000
160000
170000
180000
190000
200000
210000
220000
230000
240000
250000
260000
270000
280000
290000
300000
310000
320000
330000
340000
350000
360000
370000
380000
390000
400000
410000
420000
430000
440000
450000
460000
470000
480000
490000
500000
510000
520000
530000
540000
550000
560000
570000
580000
590000
600000
610000
620000
630000
640000
650000
660000
670000
680000
690000
700000
710000
720000
730000
740000
750000
760000
770000
780000
790000
800000
810000
820000
830000
840000
850000
860000
870000
880000
890000
900000
910000
920000
930000
940000
950000
960000
970000
980000
990000
1000000
1010000
1020000
1030000
1040000
1050000
1060000
1070000
1080000
1090000
1100000
1110000
1120000
1130000
1140000
1150000
1160000
1170000
1180000
1190000
1200000
1210000
1220000
1230000
1240000
1250000
1260000
1270000
1280000
1290000
1300000
1310000
1320000
1330000
1340000
1350000
1360000
1370000
1380000
1390000
1400000
1410000
1420000
1430000
1440000
1450000
1460000
1470000
1480000
1490000
1500000
msg63162 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-03-01 15:46
How about Python 2.5? Have you tested the latest release as well?
msg63186 - Author: Joseph Armbruster (JosephArmbruster) Date: 2008-03-02 20:47
Just got in from New Smyrna Beach... and the verdict is:

URL:  http://svn.python.org/projects/python/branches/release25-maint
Revision: 61182

python bigfiletest1.py
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
110000
120000
130000
140000
150000
160000
170000
180000
190000
200000
210000
220000
230000
240000
250000
260000
270000
280000
290000
300000
310000
320000
330000
340000
350000
360000
370000
380000
390000
400000
410000
420000
430000
440000
450000
460000
470000
480000
490000
500000
510000
520000
530000
540000
550000
560000
570000
580000
590000
600000
610000
620000
630000
640000
650000
660000
670000
680000
690000
700000
710000
720000
730000
740000
750000
760000
770000
780000
790000
800000
810000
820000
830000
840000
850000
860000
870000
880000
890000
900000
910000
920000
930000
940000
950000
960000
970000
980000
990000
1000000
1010000
1020000
1030000
1040000
1050000
1060000
1070000
1080000
1090000
1100000
1110000
1120000
1130000
1140000
1150000
1160000
1170000
1180000
1190000
1200000
1210000
1220000
1230000
1240000
1250000
1260000
1270000
1280000
1290000
1300000
1310000
1320000
1330000
1340000
1350000
1360000
1370000
1380000
1390000
1400000
1410000
1420000
1430000
1440000
1450000
1460000
1470000
1480000
1490000
1500000
msg116718 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-09-17 20:21
issue1744752 describes why it's probably a bug in the C library.
Possible workarounds are to open the files in universal mode, to use io.open(), or to switch to Python 3!
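
As a rough illustration of the io.open() workaround: the io module does its own buffering and newline translation rather than relying on the C library's text mode. The helper name iter_lines and the ASCII default encoding are assumptions made for this sketch.

```python
import io


def iter_lines(path, encoding="ascii"):
    """Yield lines from path using io.open() instead of the builtin open().

    newline=None turns on universal newlines, so "\r\n" and "\r" are
    translated to "\n" by Python's io layer, not by the C runtime.
    On Python 2, the "universal mode" workaround was open(path, "rU").
    """
    with io.open(path, "r", encoding=encoding, newline=None) as fh:
        for line in fh:
            yield line
```

On Python 3 the builtin open() is io.open(), which is why switching versions has the same effect.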
History
Date  User  Action  Args
2010-09-17 20:21:27  amaury.forgeotdarc  set  status: open -> closed; nosy: + amaury.forgeotdarc; messages: + msg116718; superseder: Newline skipped in "for line in file" for huge file; resolution: wont fix
2010-08-06 15:20:28  tim.golden  set  assignee: tim.golden; nosy: + tim.golden
2009-05-12 13:25:55  ajaksu2  set  nosy: + benjamin.peterson, pitrou; versions: + Python 2.6, Python 3.1, - Python 2.5; components: + IO; stage: test needed
2009-03-30 06:32:45  ajaksu2  link  issue1451466 dependencies
2008-03-02 20:47:49  JosephArmbruster  set  messages: + msg63186
2008-03-01 15:46:46  christian.heimes  set  type: behavior; messages: + msg63162
2008-03-01 01:02:31  JosephArmbruster  set  nosy: + christian.heimes, JosephArmbruster; messages: + msg63153
2007-03-03 08:01:41  casevh  create