
Author Richard.Christen@unice.fr
Recipients Richard.Christen@unice.fr
Date 2007-09-10.12:45:02
Message-id <1189428303.96.0.66125933613.issue1141@psf.upfronthosting.co.za>
In-reply-to
Content
On September 11, 2007, I downloaded Python 3k.

The good news:
Under Windows, Python 3k correctly reads files larger than 4 GB (in
contrast to Python 2.5, which skips some lines; see below).
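
One way to cross-check a line count independently of line iteration is to count newline bytes in binary mode. The following is a minimal sketch, assuming a modern Python 3; the function name count_newlines and the chunk_size parameter are hypothetical and not part of the original report. A final line without a trailing newline would not be counted.

```python
def count_newlines(path, chunk_size=1024 * 1024):
    """Count newline bytes by reading the file in fixed-size binary chunks."""
    total = 0
    # Binary mode avoids any newline translation or text decoding.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += chunk.count(b"\n")
    return total
```

If the result agrees with the count obtained by iterating over the file, the line iteration is probably not dropping lines.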

The bad news: Python 3k is very slow compared to Python 2.5; see the results below.
The code below reads a 4.9 GB file of 81,017,719 lines (a GenBank entry of bacterial
sequences):

#######################
import time

# Read the whole file line by line, printing a timestamp every million lines.
print(time.localtime())
fichin = open(r'D:\pythons\16s\total_gb_161_16S.gb')
t0 = time.localtime()
print(t0)
i = 0

for li in fichin:
    i += 1
    if i % 1000000 == 0:
        print(i, time.localtime())

fichin.close()
print()
print(i)  # total number of lines read
print(time.localtime())
#########################
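
Because time.localtime() has one-second resolution, short intervals are hard to compare with the script above. The following is a minimal sketch of a finer-grained measurement, assuming a modern Python 3 (time.perf_counter does not exist in Python 2.5 or in the 3k alphas); the function name time_line_iteration and its parameters are hypothetical and not part of the original script.

```python
import time


def time_line_iteration(path, mode="r", report_every=1000000, **open_kwargs):
    """Iterate over a file line by line; return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    with open(path, mode, **open_kwargs) as handle:
        for _ in handle:
            count += 1
            if count % report_every == 0:
                # Progress report: lines read so far and seconds elapsed.
                print(count, round(time.perf_counter() - start, 2))
    return count, time.perf_counter() - start
```

Calling it once in text mode, for example time_line_iteration(path, "r", encoding="latin-1"), and once with mode "rb" is one way to see how much of the time goes to the text layer. The measurements in this report do not by themselves establish where the time is spent.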


I got the following results on the same machine (Windows XP 64), using
either Python 3k or Python 2.5.
As soon as my BSD and Linux machines are done with their calculations, I will
try this on them.
Best,
Richard Christen


Python 3k

(2007, 9, 10, 13, 53, 36, 0, 253, 1)
(2007, 9, 10, 13, 53, 36, 0, 253, 1)
1000000 (2007, 9, 10, 13, 53, 49, 0, 253, 1)
2000000 (2007, 9, 10, 13, 54, 3, 0, 253, 1)
3000000 (2007, 9, 10, 13, 54, 18, 0, 253, 1)
4000000 (2007, 9, 10, 13, 54, 32, 0, 253, 1)
5000000 (2007, 9, 10, 13, 54, 47, 0, 253, 1)
....
77000000 (2007, 9, 10, 14, 14, 55, 0, 253, 1)
78000000 (2007, 9, 10, 14, 15, 9, 0, 253, 1)
79000000 (2007, 9, 10, 14, 15, 22, 0, 253, 1)
80000000 (2007, 9, 10, 14, 15, 36, 0, 253, 1)
81000000 (2007, 9, 10, 14, 15, 49, 0, 253, 1)

81017719    #this is the proper number of lines 
(2007, 9, 10, 14, 15, 50, 0, 253, 1)


Python 2.5

(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(1000000, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(2000000, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(3000000, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(4000000, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(5000000, (2007, 9, 10, 14, 18, 36, 0, 253, 1))
...
(77000000, (2007, 9, 10, 14, 19, 10, 0, 253, 1))
(78000000, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(79000000, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(80000000, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
(81000000, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
()
81014962      #Python 2.5 missed some lines!
(2007, 9, 10, 14, 19, 12, 0, 253, 1)
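
The size of the gap can be worked out from the printed tuples. The sketch below takes the timestamp printed after open() and the final timestamp of each run, copied from the output above, and derives the elapsed seconds, the slowdown factor, and the number of lines Python 2.5 missed. The helper name elapsed_seconds is hypothetical, and the code assumes a modern Python 3.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two time.localtime()-style tuples (first six fields)."""
    return (datetime(*end[:6]) - datetime(*start[:6])).total_seconds()


# Start and end timestamps copied from the two runs above.
py3k = elapsed_seconds((2007, 9, 10, 13, 53, 36, 0, 253, 1),
                       (2007, 9, 10, 14, 15, 50, 0, 253, 1))
py25 = elapsed_seconds((2007, 9, 10, 14, 18, 33, 0, 253, 1),
                       (2007, 9, 10, 14, 19, 12, 0, 253, 1))
ratio = round(py3k / py25, 1)

# Line counts reported by each run.
missed = 81017719 - 81014962

print(py3k, py25, ratio)  # 1334.0 39.0 34.2
print(missed)             # 2757
```

So the Python 3k run took about 22 minutes against 39 seconds for Python 2.5, roughly 34 times longer, while the Python 2.5 run came up 2,757 lines short.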