September 11, 2007: I downloaded Python 3k.
The good news:
Under Windows, Python 3k properly reads files larger than 4 GB (in
contrast to Python 2.5, which skips some lines; see below).
The bad news: Python 3k is very slow compared to Python 2.5; see the results below.
The code is given below.
It reads a 4.9 GB file of 81,017,719 lines (a GenBank entry of bacterial
sequences).
#######################
import time

print (time.localtime())
# Path to the 4.9 GB GenBank file
fichin=open(r'D:\pythons\16s\total_gb_161_16S.gb')
t0= time.localtime()
print (t0)
i=0  # line counter
for li in fichin:
    i+=1
    # Report progress every million lines
    if i%1000000==0:
        print (i,time.localtime())
fichin.close()
print ()
print (i)
print (time.localtime())
#########################
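A variant of the loop above that reports elapsed seconds instead of wall-clock tuples may make two runs easier to compare. This is only a sketch, not the script used for the results in this post: the function name count_lines, the report_every parameter, and the small throwaway test file are made up for illustration. Opening the file in binary mode is also an assumption; it avoids newline translation and text decoding, so its timings are not directly comparable to a text-mode run.

```python
import os
import tempfile
import time


def count_lines(path, report_every=1000000):
    """Count the lines of a file and return (line_count, elapsed_seconds)."""
    start = time.time()
    count = 0
    # Binary mode (an assumption): no newline translation, no text decoding.
    with open(path, "rb") as handle:
        for count, _line in enumerate(handle, 1):
            # Report progress as elapsed seconds since the start of the loop.
            if count % report_every == 0:
                print(count, round(time.time() - start, 1))
    return count, time.time() - start


# Small self-check on a throwaway file (hypothetical data, not the GenBank file).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as demo:
    demo.write(b"LOCUS\n" * 2500)
demo_lines, demo_seconds = count_lines(demo_path, report_every=1000)
os.remove(demo_path)
print(demo_lines, demo_seconds)
```

To try it on a real file, call count_lines with that file's path and keep the default report_every of one million lines.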
I got the following results on the same machine (Windows XP 64), running
the script with Python 3k and then with Python 2.5.
As soon as my BSD and Linux machines are done with calculations, I will
try this on them too.
Best
Richard Christen
Python 3k
(2007, 9, 10, 13, 53, 36, 0, 253, 1)
(2007, 9, 10, 13, 53, 36, 0, 253, 1)
1000000 (2007, 9, 10, 13, 53, 49, 0, 253, 1)
2000000 (2007, 9, 10, 13, 54, 3, 0, 253, 1)
3000000 (2007, 9, 10, 13, 54, 18, 0, 253, 1)
4000000 (2007, 9, 10, 13, 54, 32, 0, 253, 1)
5000000 (2007, 9, 10, 13, 54, 47, 0, 253, 1)
....
77000000 (2007, 9, 10, 14, 14, 55, 0, 253, 1)
78000000 (2007, 9, 10, 14, 15, 9, 0, 253, 1)
79000000 (2007, 9, 10, 14, 15, 22, 0, 253, 1)
80000000 (2007, 9, 10, 14, 15, 36, 0, 253, 1)
81000000 (2007, 9, 10, 14, 15, 49, 0, 253, 1)
81017719 #this is the proper number of lines
(2007, 9, 10, 14, 15, 50, 0, 253, 1)
Python 2.5
(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(1000000, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(2000000, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(3000000, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(4000000, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(5000000, (2007, 9, 10, 14, 18, 36, 0, 253, 1))
...
(77000000, (2007, 9, 10, 14, 19, 10, 0, 253, 1))
(78000000, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(79000000, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(80000000, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
(81000000, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
()
81014962 #python 2.5 missed some lines !!!!
(2007, 9, 10, 14, 19, 12, 0, 253, 1)
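
For reference, the totals can be worked out from the first and last timestamps of each log and from the two final line counts. This is just a sketch of the arithmetic; the variable names are made up for illustration, and only the first six fields of each time tuple (year, month, day, hour, minute, second) are used.

```python
from datetime import datetime

# First and last timestamps copied from the two logs above.
py3k_start = datetime(2007, 9, 10, 13, 53, 36)
py3k_end = datetime(2007, 9, 10, 14, 15, 50)
py25_start = datetime(2007, 9, 10, 14, 18, 33)
py25_end = datetime(2007, 9, 10, 14, 19, 12)

# Total run time of each interpreter, in seconds.
py3k_seconds = (py3k_end - py3k_start).total_seconds()
py25_seconds = (py25_end - py25_start).total_seconds()

# Lines reported by Python 3k minus lines reported by Python 2.5.
missed = 81017719 - 81014962

print(py3k_seconds, py25_seconds, round(py3k_seconds / py25_seconds, 1), missed)
# prints: 1334.0 39.0 34.2 2757
```

So the Python 3k run took about 22 minutes against 39 seconds for Python 2.5, roughly 34 times slower, while Python 2.5 came up 2,757 lines short.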