Message83441
OK, I think I see where I went wrong in my understanding of the file
protocol. I thought that readlines() returned an iterator, not a
list, but I see in the library reference manual on File Objects that
it returns a list. I think I got confused because there is no
equivalent of __iter__ for writing to streams. For input, I always
use 'for line in file_object' (in other words,
file_object.__iter__) and never the readlines method, so I had assumed
that writelines was the mirror image of iteration and took an iterator.
Then, in my mind, readlines became the mirror image of writelines, so I
assumed that readlines returned an iterator as well. I wonder whether
this misperception is common.
So, the StreamWriter interface matches the file protocol: readlines()
and writelines() deal with lists. It shouldn't be changed, because it
follows the protocol.
The example I wrote would then become:
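As a rough illustration of the distinction described above, the sketch below contrasts readlines(), which builds a whole list, with plain iteration, which yields one line at a time. It uses io.StringIO on Python 3 as a stand-in for a real file object (an assumption, since the message concerns Python 2 file objects and codecs.StreamWriter), and the sample data is made up.

```python
import io

# In-memory stand-in for a text file (sample data is invented).
buf = io.StringIO(u"a\tb\nc\td\n")

# readlines() reads everything and returns a list of lines.
lines = buf.readlines()
is_list = isinstance(lines, list)

# Iterating over the file object yields lines one at a time instead.
buf.seek(0)
first = next(iter(buf))

# writelines() is the counterpart of readlines(): it takes the lines
# and writes them without adding separators.
out = io.StringIO()
out.writelines(lines)
result = out.getvalue()
```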
rows = (line[:-1].split('\t') for line in in_file)
projected = (keep_fields(row, 0, 3, 7) for row in rows)
filtered = (row for row in projected if row[2]=='1')
formatted = (u'\t'.join(row)+'\n' for row in filtered)
write = out_file.write
for line in formatted:
    write(line)
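For anyone who wants to run the pipeline above, here is a self-contained sketch. The keep_fields helper is not shown in this message, so the version below is a hypothetical guess at its behaviour (select the given column indexes), and in_file, out_file, and the sample rows are in-memory stand-ins invented for the example.

```python
import io

def keep_fields(row, *indexes):
    # Hypothetical helper: assumed to return the selected columns of a row.
    return [row[i] for i in indexes]

def run_pipeline(in_file, out_file):
    # Each stage is a generator expression, so rows flow through lazily.
    rows = (line[:-1].split('\t') for line in in_file)
    projected = (keep_fields(row, 0, 3, 7) for row in rows)
    filtered = (row for row in projected if row[2] == '1')
    formatted = (u'\t'.join(row) + '\n' for row in filtered)
    write = out_file.write
    for line in formatted:
        write(line)

# Invented sample input: eight tab-separated fields per line.
in_file = io.StringIO(
    u"id1\tx\tx\tfoo\tx\tx\tx\t1\n"
    u"id2\tx\tx\tbar\tx\tx\tx\t0\n"
    u"id3\tx\tx\tbaz\tx\tx\tx\t1\n"
)
out_file = io.StringIO()
run_pipeline(in_file, out_file)
```

Only the rows whose eighth field is '1' reach out_file, and each is reduced to fields 0, 3, and 7.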
I think it's correct that the file object's C write code only does
1000-line chunks for sequences that have a defined length: a defined
length implies that the data exists now and can be concatenated and
written now. Something without a defined length may be a generator
whose items arrive later.
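The reasoning above can be sketched in Python. This is not the C implementation, only an illustration of the idea under stated assumptions: write_lines and CountingWriter are hypothetical names, "defined length" is approximated by checking for a list or tuple, and the code targets Python 3.

```python
import io

CHUNK = 1000  # chunk size mentioned in the message

def write_lines(out, lines, chunk=CHUNK):
    # Hypothetical helper, not a real library function.
    if isinstance(lines, (list, tuple)):
        # The data already exists: join it in chunks and write each chunk.
        for start in range(0, len(lines), chunk):
            out.write(u"".join(lines[start:start + chunk]))
    else:
        # Possibly a generator: write each item as soon as it arrives.
        for line in lines:
            out.write(line)

class CountingWriter(io.StringIO):
    # Hypothetical test double that counts calls to write().
    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, s):
        self.calls += 1
        return super().write(s)

seq_out = CountingWriter()
write_lines(seq_out, [u"x\n"] * 2500)

gen_out = CountingWriter()
write_lines(gen_out, (u"x\n" for _ in range(2500)))
```

Both writers end up with the same contents; they differ only in how many write calls were needed to get there.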
Date | User | Action | Args
2009-03-10 17:36:04 | dlesco | set | recipients: + dlesco, lemburg, facundobatista
2009-03-10 17:36:03 | dlesco | set | messageid: <1236706563.47.0.656290158457.issue5445@psf.upfronthosting.co.za>
2009-03-10 17:36:01 | dlesco | link | issue5445 messages
2009-03-10 17:36:00 | dlesco | create |