Author yanlinlin82
Recipients yanlinlin82
Date 2014-06-02.13:39:15
I noticed this problem when a Python 2 program (MACS) ran very inefficiently on large storage attached to a high-performance server (64-bit Linux). It was much slower there (more than two days) than on a normal PC (less than two hours).

After ruling out many other possible causes, I finally traced the problem to the seek() function of Python 2. I can now reproduce it with a very simple example:

f = open("Input.sort.bam", "rb")
f.seek(0, 2)

Here, the file 'Input.sort.bam' is 4,110,535,920 bytes in size. Running the program under 'strace' on Linux shows the following system calls:

$ strace python2
open("Input.sort.bam", O_RDONLY)        = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=4110535920, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=4110535920, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f23d4492000
fstat(3, {st_mode=S_IFREG|0644, st_size=4110535920, ...}) = 0
lseek(3, 4110532608, SEEK_SET)          = 4110532608
read(3, "f\203\337<\334\350\313\315\345&T\227\211\fC\212a\260\204P\235\366\326\353\230\327>\373\361\221\357\373"..., 3312) = 3312
close(3)                                = 0

It seems that Python 2 just moves the file cursor to a specific position (4110532608 in this case) and reads the remaining bytes, rather than seeking directly to the end of the file. When I ran exactly the same program on the large storage, the position changed to 1073741824, leaving 889310448 bytes to read before reaching the end of the file, which hurt performance badly!
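
The offsets in the trace look consistent with the seek target being the file size rounded down to a multiple of some block size, with the remainder then read sequentially. The sketch below only checks that arithmetic against the numbers reported here. The helper name `seek_end_plan` is hypothetical, and the idea that the alignment comes from a block size is an assumption inferred from the trace, not something the trace confirms.

```python
def seek_end_plan(file_size, block_size):
    """Split a seek-to-end into an aligned lseek() target and a trailing read.

    Hypothetical helper: assumes the target is file_size rounded down to a
    multiple of block_size, which is inferred from the trace, not confirmed.
    """
    blocks, remainder = divmod(file_size, block_size)
    return blocks * block_size, remainder


FILE_SIZE = 4110535920  # size of Input.sort.bam as reported by fstat()

# 4096-byte alignment reproduces the lseek() offset and read() length in the trace.
print(seek_end_plan(FILE_SIZE, 4096))        # (4110532608, 3312)

# 1 GiB alignment leaves the 889310448-byte remainder seen on the large storage.
print(seek_end_plan(FILE_SIZE, 1073741824))  # (3221225472, 889310448)
```

If that assumption holds, the amount of data read after the lseek() can be anything up to one block minus one byte, so the cost of seeking to the end would grow with the block size in use.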

Date                 User         Action  Args
2014-06-02 13:39:16  yanlinlin82  set     recipients: + yanlinlin82
2014-06-02 13:39:16  yanlinlin82  set     messageid: <>
2014-06-02 13:39:16  yanlinlin82  link    issue21638 messages
2014-06-02 13:39:15  yanlinlin82  create