
Author dlesco
Recipients dlesco, facundobatista, lemburg
Date 2009-03-10.14:48:50
Message-id <1236696532.96.0.90491164511.issue5445@psf.upfronthosting.co.za>
Content
In Python's file protocol, readlines and writelines are the protocol for
iterating over a file. If one doesn't want to iterate over the file, one
calls read() with no argument to read the whole file in, or calls write()
with the complete contents one wants to write.
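A small sketch of the two styles described above, written for Python 3 and using io.StringIO as a stand-in for a real file (an assumption for illustration; a regular file object accepts the same calls):

```python
import io

# Iterative read: consume the file one line at a time.
src = io.StringIO("alpha\nbeta\ngamma\n")
lines = [line for line in src]          # equivalent to src.readlines()

# Whole-file read: a single read() with no argument.
src.seek(0)
whole = src.read()

# Iterative write: writelines() accepts any iterable of strings.
out_iter = io.StringIO()
out_iter.writelines(line.upper() for line in lines)

# Whole-content write: a single write() call.
out_whole = io.StringIO()
out_whole.write(whole.upper())

print(lines)                                       # ['alpha\n', 'beta\n', 'gamma\n']
print(out_iter.getvalue() == out_whole.getvalue()) # True
```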

If writelines is implemented with ''.join(), then passing an iterator to
writelines will not write to the file iteratively. It will accumulate
everything in memory until the iterator raises StopIteration, and only
then write to the file. So if one is tailing the output file, one will
not see anything until the end, instead of seeing content appear as it
is produced. That breaks the promise of the file protocol, in which
writelines means "write iteratively".
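The difference can be made visible with a stream that logs every write() call and a generator that logs every item it produces. RecordingStream, produce, writelines_join and writelines_loop are hypothetical names for this sketch, not part of any library; the two functions only model a join-based and a loop-based writelines:

```python
class RecordingStream:
    """Hypothetical stream that logs each write() call instead of writing."""

    def __init__(self, log):
        self.log = log

    def write(self, data):
        self.log.append(("write", data))


def produce(log, n):
    # Generator standing in for data that arrives over time.
    for i in range(n):
        log.append(("produced", i))
        yield "line %d\n" % i


def writelines_join(stream, iterable):
    # join() must exhaust the iterator before write() is called once.
    stream.write("".join(iterable))


def writelines_loop(stream, iterable):
    # Each item is written as soon as the iterator yields it.
    for value in iterable:
        stream.write(value)


join_log = []
writelines_join(RecordingStream(join_log), produce(join_log, 2))

loop_log = []
writelines_loop(RecordingStream(loop_log), produce(loop_log, 2))

print(join_log)  # both items produced, then a single write at the end
print(loop_log)  # produce and write interleave
```

With the join-based version, a reader tailing the output sees nothing until the generator is exhausted; with the loop, each line is handed to the stream immediately.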

I think following the protocol is more important than performance. If
the application has performance problems, it is up to the application
to buffer the data in memory and make a single write call.
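A minimal sketch of that application-side buffering, assuming Python 3. CountingStringIO and buffered_write are hypothetical helpers used only to show that the stream receives exactly one write() call:

```python
import io


class CountingStringIO(io.StringIO):
    """Hypothetical StringIO subclass that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, s):
        self.write_calls += 1
        return super().write(s)


def buffered_write(stream, iterable):
    # The application collects everything in memory itself,
    # then hands the stream a single string.
    stream.write("".join(iterable))


out = CountingStringIO()
buffered_write(out, ("record %d\n" % i for i in range(3)))
print(out.write_calls)  # 1
```

The choice to trade memory for fewer calls stays with the caller, which leaves writelines free to write iteratively.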

However, here is an alternative implementation that is slightly more
complicated but may perform better when passed a list. It covers three
cases:

1. Passed an empty sequence: do not call self.write at all.
2. Passed a sequence with a length: that implies all the data is
available immediately, so one can concatenate it and write it with a
single self.write call.
3. Passed an iterable with no length: that implies not all the data is
necessarily available immediately, so write it iteratively.

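    def writelines(self, sequence):
        """ Writes the sequence of strings to the stream
            using .write().
        """
        try:
            sequence_len = len(sequence)
        except TypeError:
            # Case 3: no length (e.g. an iterator or generator),
            # so write each item as soon as it is produced.
            write = self.write
            for value in sequence:
                write(value)
            return
        # Case 1: an empty sequence makes no write() call at all.
        # Case 2: all data is available, so concatenate and write once.
        if sequence_len:
            self.write(''.join(sequence))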
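To check the three cases, the proposed method can be dropped into a hypothetical RecordingWriter class (not part of codecs) whose write() just records its argument:

```python
class RecordingWriter:
    """Hypothetical stream: records each write() call instead of writing."""

    def __init__(self):
        self.calls = []

    def write(self, data):
        self.calls.append(data)

    def writelines(self, sequence):
        # Same logic as the proposed implementation above.
        try:
            sequence_len = len(sequence)
        except TypeError:
            write = self.write
            for value in sequence:
                write(value)
            return
        if sequence_len:
            self.write(''.join(sequence))


empty = RecordingWriter()
empty.writelines([])                          # case 1: no write() call

sized = RecordingWriter()
sized.writelines(["a\n", "b\n"])              # case 2: one joined write()

lazy = RecordingWriter()
lazy.writelines(s for s in ["a\n", "b\n"])    # case 3: one write() per item

print(empty.calls)  # []
print(sized.calls)  # ['a\nb\n']
print(lazy.calls)   # ['a\n', 'b\n']
```

Note that the len() probe treats any sized object as "all data available", so a very large list is still joined into one string before writing.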

I'm not sure which is better. But one last point: Python is moving
further toward iterators. For example, in Py3K, dict's keys(), values(),
and items() take over the lazy behavior of iterkeys(), itervalues(), and
iteritems() instead of building lists.
History
Date                 User    Action  Args
2009-03-10 14:48:53  dlesco  set     recipients: + dlesco, lemburg, facundobatista
2009-03-10 14:48:52  dlesco  set     messageid: <1236696532.96.0.90491164511.issue5445@psf.upfronthosting.co.za>
2009-03-10 14:48:51  dlesco  link    issue5445 messages
2009-03-10 14:48:50  dlesco  create