classification
Title: [doc] File protocol should document if writelines must handle generators sensibly
Type:
Stage: resolved
Components: Documentation, IO
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: JanKanis, JelleZijlstra, benjamin.peterson, dhaffey, dlesco, docs@python, hynek, josh.r, lemburg, miss-islington, pitrou, slateny, stutzbach, terry.reedy
Priority: normal
Keywords: easy, patch

Created on 2014-07-03 09:38 by JanKanis, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 31245 merged slateny, 2022-02-10 07:00
PR 31647 merged miss-islington, 2022-03-03 01:21
PR 31648 merged miss-islington, 2022-03-03 01:21
Messages (8)
msg222165 - Author: Jan Kanis (JanKanis) Date: 2014-07-03 09:38
The resolution of issue 5445 should be documented somewhere properly, so people can depend on it or not.

IOBase.writelines handles generator arguments without problems, i.e. without first draining the entire generator and then writing the result in one go. That would require large amounts of memory if the generator is large, and fail entirely if the generator is infinite. 

codecs.StreamWriter.writelines uses self.write(''.join(argument)) as implementation, which fails on very large or infinite arguments.
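
The difference between the two implementations can be observed with a small sketch. RecordingRaw and logged_lines are hypothetical helpers written only for this illustration; the sketch assumes nothing beyond what is described above, namely that IOBase.writelines calls write() once per item while codecs.StreamWriter.writelines joins its whole argument before a single write().

```python
import codecs
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw stream that logs each write() call instead of storing data."""

    def __init__(self, events):
        super().__init__()
        self.events = events

    def writable(self):
        return True

    def write(self, data):
        self.events.append("write")
        return len(data)


def logged_lines(events, count, encode=False):
    """Hypothetical generator that logs each item just before yielding it."""
    for i in range(count):
        events.append(f"yield {i}")
        line = f"line {i}\n"
        yield line.encode("ascii") if encode else line


# IOBase.writelines: each item is written before the next one is requested.
streamed_events = []
RecordingRaw(streamed_events).writelines(
    logged_lines(streamed_events, 3, encode=True)
)

# codecs.StreamWriter.writelines: the generator is exhausted before any write.
joined_events = []
writer = codecs.getwriter("utf-8")(RecordingRaw(joined_events))
writer.writelines(logged_lines(joined_events, 3))

print(streamed_events)
print(joined_events)
```

In the first list the "yield" and "write" entries alternate; in the second all "yield" entries come before a single "write", which is why a very large or infinite generator cannot work there.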

According to issue 5445 it is not part of the file protocol that .writelines must handle (large/infinite) generators, only list-like iterables. However, as far as I know, this is not documented anywhere, and sometimes people assume that writelines is meant for this case. E.g. jinja (https://github.com/mitsuhiko/jinja2/blob/master/jinja2/environment.py#L1153, the dump method is explicitly documented to stream). The guarantees that .writelines makes or does not make in this regard should be documented somewhere, so that either .writelines implementations that don't handle large generators can be pointed out as bugs, or code that makes assumptions about .writelines handling large generators can be.

I personally think .writelines should handle large generators, since in the Python 3 world a lot of APIs were iterator-ified and it is what a lot of people would probably expect. But having a clear and documented decision on this is more important.

(note: I've copied most of the nosy list from #5445)
msg222252 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-07-04 00:48
+1. I've been assuming writelines handled arbitrary generators without an issue; guess I've gotten lucky and only used the ones that do. I've fed stuff populated by enormous (though not infinite) generators created from stuff like itertools.product and the like into it on the assumption that it would safely write it without generating len(seq) ** repeat values in memory.

I'd definitely appreciate a documented guarantee of this. I don't need it to explicitly guarantee that each item is written before the next item is pulled off the iterator or anything; if it wants to buffer a reasonable amount of data in memory before triggering a real I/O that's fine (generators returning mutable objects and mutating them when the next object comes along are evil anyway, and forcing one-by-one output can prevent some useful optimizations). But anything that uses argument unpacking, collection as a list, ''.join (or at the C level, PySequence_Fast and the like), forcing the whole generator to exhaust before writing byte one, is a bad idea.
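
A minimal sketch of the kind of bounded buffering described above, assuming a text stream with a write() method. write_in_batches and CountingStringIO are hypothetical names used only for this illustration, and the batch size is arbitrary; this is not how any standard library class is implemented.

```python
import io
import itertools


def write_in_batches(stream, lines, batch_size=1024):
    """Write `lines` to `stream`, joining at most `batch_size` items per write()."""
    iterator = iter(lines)
    while True:
        # Pull a bounded number of items, so memory use does not grow with
        # the total length of the iterable.
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        stream.write("".join(batch))


class CountingStringIO(io.StringIO):
    """Hypothetical in-memory stream that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, text):
        self.write_calls += 1
        return super().write(text)


out = CountingStringIO()
write_in_batches(out, (f"{n}\n" for n in range(10)), batch_size=4)
print(out.write_calls, repr(out.getvalue()))
```

Ten items with a batch size of four result in three write() calls (4 + 4 + 2 items), and at most four items are held in memory at once.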
msg226120 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-08-30 04:17
Security-fix-only versions do not get doc fixes.
msg261399 - Author: Dan Haffey (dhaffey) Date: 2016-03-09 03:00
+1, I just lost an hour-plus compute job to this. It sure violates POLA. I've been passing large generators to file.writelines since about as long as generators have existed, so I never would have guessed that a class named "StreamWriter" of all things wouldn't, you know, stream its writelines argument.
msg409248 - Author: Stanley (slateny) * Date: 2021-12-28 04:47
I'd be interested in taking a look at this - would these changes clarify things?

Current (https://docs.python.org/3/library/codecs.html#codecs.StreamWriter):

Writes the concatenated list of strings to the stream (possibly by reusing the write() method). The standard bytes-to-bytes codecs do not support this method.

Proposed:

Writes the concatenated list of strings to the stream by reusing the write() method, and thus does not support infinite or very large generators. The standard bytes-to-bytes codecs do not support this method.
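
For callers affected by this limitation, one possible workaround is to write each item individually. The sketch below uses a hypothetical subclass, StreamingUTF8Writer, that overrides writelines to call write() once per item; it is an illustration only and not part of the codecs module.

```python
import codecs
import io
import itertools


class StreamingUTF8Writer(codecs.getwriter("utf-8")):
    """Hypothetical subclass that writes each item as soon as it is produced."""

    def writelines(self, lines):
        for line in lines:
            self.write(line)


buffer = io.BytesIO()
writer = StreamingUTF8Writer(buffer)

# itertools.count() never ends; islice bounds it here only so the demo terminates.
numbers = itertools.islice(itertools.count(), 5)
writer.writelines(f"{n}\n" for n in numbers)

print(buffer.getvalue())
```

Because no intermediate joined string is built, memory use stays independent of how many items the iterable produces.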
msg414395 - Author: Jelle Zijlstra (JelleZijlstra) * (Python committer) Date: 2022-03-03 01:21
New changeset a8c87a239ee1414d6dd0b062fe9ec3e5b0c50cb8 by slateny in branch 'main':
bpo-21910: Clarify docs for codecs writelines method (GH-31245)
https://github.com/python/cpython/commit/a8c87a239ee1414d6dd0b062fe9ec3e5b0c50cb8
msg414396 - Author: miss-islington (miss-islington) Date: 2022-03-03 01:43
New changeset 60b561c246da2073672a016340457e4534dfdf5b by Miss Islington (bot) in branch '3.10':
bpo-21910: Clarify docs for codecs writelines method (GH-31245)
https://github.com/python/cpython/commit/60b561c246da2073672a016340457e4534dfdf5b
msg414397 - Author: miss-islington (miss-islington) Date: 2022-03-03 01:45
New changeset cf8aff6319794807aa578215710e6caa4479516f by Miss Islington (bot) in branch '3.9':
bpo-21910: Clarify docs for codecs writelines method (GH-31245)
https://github.com/python/cpython/commit/cf8aff6319794807aa578215710e6caa4479516f
History
Date                 User            Action  Args
2022-04-11 14:58:05  admin           set     github: 66109
2022-03-03 01:45:50  miss-islington  set     messages: + msg414397
2022-03-03 01:43:04  miss-islington  set     messages: + msg414396
2022-03-03 01:22:45  JelleZijlstra   set     status: open -> closed
                                             resolution: fixed
                                             stage: patch review -> resolved
2022-03-03 01:21:56  miss-islington  set     pull_requests: + pull_request29768
2022-03-03 01:21:52  miss-islington  set     nosy: + miss-islington
                                             pull_requests: + pull_request29767
2022-03-03 01:21:44  JelleZijlstra   set     nosy: + JelleZijlstra
                                             messages: + msg414395
2022-02-10 07:00:34  slateny         set     keywords: + patch
                                             stage: needs patch -> patch review
                                             pull_requests: + pull_request29414
2021-12-28 04:47:00  slateny         set     nosy: + slateny
                                             messages: + msg409248
2021-12-13 18:23:05  iritkatriel     set     keywords: + easy
                                             title: File protocol should document if writelines must handle generators sensibly -> [doc] File protocol should document if writelines must handle generators sensibly
                                             versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.4, Python 3.5
2016-03-12 00:51:09  martin.panter   set     stage: needs patch
2016-03-09 03:00:24  dhaffey         set     nosy: + dhaffey
                                             messages: + msg261399
2014-08-30 04:17:47  terry.reedy     set     messages: + msg226120
                                             versions: - Python 3.1, Python 3.2, Python 3.3
2014-07-04 00:48:40  josh.r          set     nosy: + josh.r
                                             messages: + msg222252
2014-07-03 09:38:29  JanKanis        create