Author dlesco
Recipients dlesco, facundobatista, lemburg
Date 2009-03-10.17:35:59
Message-id <1236706563.47.0.656290158457.issue5445@psf.upfronthosting.co.za>
In-reply-to
Content
OK, I think I see where I went wrong in my perceptions of the file 
protocol.  I thought that readlines() returned an iterator, not a 
list, but I see in the library reference manual on File Objects that 
it returns a list.  I think I got confused because there is no 
equivalent of __iter__ for writing to streams.  For input, I'm always 
using 'for line in file_object' (in other words, 
file_object.__iter__), so I had assumed that writelines was the mirror 
image of that, because I never use the readlines method.  Then, in my 
mind, readlines became the mirror image of writelines, which I had 
assumed took an iterator, so I assumed that readlines returned an 
iterator.  I wonder if this perception problem is common or not.
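
The difference between the two read styles can be checked directly. The following is a minimal sketch written for Python 3, using an in-memory io.StringIO as a stand-in for a real file; the sample data is made up for illustration.

```python
import io

# In-memory text stream standing in for a real file (illustrative data).
in_file = io.StringIO("a\nb\nc\n")

# readlines() reads everything that is left and returns it as a list.
lines = in_file.readlines()

# Iterating over the file object is lazy: one line per next() call.
in_file.seek(0)
line_iter = iter(in_file)
first = next(line_iter)

print(type(lines).__name__)   # list
print(repr(first))            # 'a\n'
```

So 'for line in file_object' never builds the full list, while readlines() always does.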

So, the StreamWriter interface matches the file protocol; readlines() 
and writelines() deal with lists.  There shouldn't be any change to 
it, because it follows the protocol.
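
As an illustration of that symmetry, here is a small sketch (Python 3, with io.BytesIO standing in for a real byte stream and made-up sample strings) that passes a list to a codecs StreamWriter's writelines() and gets a list back from a StreamReader's readlines().

```python
import codecs
import io

# Byte stream standing in for a file opened in binary mode.
raw = io.BytesIO()

# Wrap it in a UTF-8 StreamWriter and hand writelines() a list of strings.
writer = codecs.getwriter("utf-8")(raw)
writer.writelines(["caf\u00e9\n", "na\u00efve\n"])
encoded = raw.getvalue()

# Reading back through a StreamReader: readlines() returns a list.
reader = codecs.getreader("utf-8")(io.BytesIO(encoded))
decoded_lines = reader.readlines()
```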

Then, the example I wrote would be instead:

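# keep_fields() is assumed to be defined elsewhere; it is not shown in this message.
# Each stage is a generator expression, so lines are processed one at a time.
rows = (line[:-1].split('\t') for line in in_file)        # drop trailing newline, split on tabs
projected = (keep_fields(row, 0, 3, 7) for row in rows)   # keep only the selected columns
filtered = (row for row in projected if row[2] == '1')    # keep rows whose third kept field is '1'
formatted = (u'\t'.join(row) + '\n' for row in filtered)  # rebuild tab-separated lines
write = out_file.write                                    # bind the method once, outside the loop
for line in formatted:
    write(line)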
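
For anyone who wants to run the pipeline, here is a self-contained sketch for Python 3. The keep_fields helper below is a hypothetical stand-in (its real definition is not part of this message), and the in-memory streams and sample rows are invented for the example.

```python
import io


def keep_fields(row, *indexes):
    # Hypothetical helper: return only the columns at the given positions.
    return [row[i] for i in indexes]


# Two tab-separated rows with eight fields each (illustrative data).
in_file = io.StringIO(
    "id1\tx\tx\tname1\tx\tx\tx\t1\n"
    "id2\tx\tx\tname2\tx\tx\tx\t0\n"
)
out_file = io.StringIO()

rows = (line.rstrip("\n").split("\t") for line in in_file)
projected = (keep_fields(row, 0, 3, 7) for row in rows)
filtered = (row for row in projected if row[2] == "1")
formatted = ("\t".join(row) + "\n" for row in filtered)

# Nothing has been read yet; the loop below pulls one line at a time
# through every stage of the pipeline.
for line in formatted:
    out_file.write(line)

result = out_file.getvalue()
```

Only the first row survives the filter, because its last field is '1'.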

I think it's correct that the file object write C code only does 
1000-line chunks for sequences that have a defined length: if it has a 
defined length, then that implies that the data exists now, and can be 
concatenated and written now.  Something without a defined length may 
be a generator with items arriving later.
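
The "defined length" distinction, and one way to batch writes for input that lacks it, can be sketched as follows. This is Python 3 and is not the file object's C implementation; write_in_chunks is a hypothetical helper, and the 1000-line chunk size is borrowed from the description above.

```python
import io
from itertools import islice


def write_in_chunks(out_file, lines, chunk_size=1000):
    """Write an iterable of strings in groups; return the number of write() calls."""
    iterator = iter(lines)
    calls = 0
    while True:
        # Pull at most chunk_size items; works even when len() is unavailable.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        out_file.write("".join(chunk))
        calls += 1
    return calls


# A list has a defined length; a generator does not.
list_has_len = hasattr(["a\n"], "__len__")
gen = (str(n) + "\n" for n in range(2500))
gen_has_len = hasattr(gen, "__len__")

out_file = io.StringIO()
calls = write_in_chunks(out_file, gen)   # chunks of 1000, 1000 and 500 lines
```

Note that batching like this delays output until a chunk fills up, which is the trade-off for a generator whose items arrive later.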
History
Date                 User    Action  Args
2009-03-10 17:36:04  dlesco  set     recipients: + dlesco, lemburg, facundobatista
2009-03-10 17:36:03  dlesco  set     messageid: <1236706563.47.0.656290158457.issue5445@psf.upfronthosting.co.za>
2009-03-10 17:36:01  dlesco  link    issue5445 messages
2009-03-10 17:36:00  dlesco  create