Issue 21910: [doc] File protocol should document if writelines must handle generators sensibly


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/66109

classification

Title:	[doc] File protocol should document if writelines must handle generators sensibly
Type:		Stage:	resolved
Components:	Documentation, IO	Versions:	Python 3.11, Python 3.10, Python 3.9

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	JanKanis, JelleZijlstra, benjamin.peterson, dhaffey, dlesco, docs@python, hynek, josh.r, lemburg, miss-islington, pitrou, slateny, stutzbach, terry.reedy
Priority:	normal	Keywords:	easy, patch

Created on 2014-07-03 09:38 by JanKanis, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 31245	merged	slateny, 2022-02-10 07:00
PR 31647	merged	miss-islington, 2022-03-03 01:21
PR 31648	merged	miss-islington, 2022-03-03 01:21

Messages (8)
msg222165	Author: Jan Kanis (JanKanis)	Date: 2014-07-03 09:38
The resolution of issue 5445 should be documented properly somewhere, so that people know whether they can depend on it. IOBase.writelines handles generator arguments without problems, i.e. without first draining the entire generator and then writing the result in one go. Draining first would require large amounts of memory if the generator is large, and would fail entirely if the generator is infinite. codecs.StreamWriter.writelines uses self.write(''.join(argument)) as its implementation, which fails on very large or infinite arguments.

According to issue 5445, it is not part of the file protocol that .writelines must handle (large/infinite) generators, only list-like iterables. However, as far as I know this is not documented anywhere, and people sometimes assume that writelines is meant for this case, e.g. Jinja (https://github.com/mitsuhiko/jinja2/blob/master/jinja2/environment.py#L1153; the dump method is explicitly documented to stream). The guarantees that .writelines does or does not make in this regard should be documented somewhere, so that either .writelines implementations that don't handle large generators can be pointed out as bugs, or code that assumes .writelines handles large generators can be.

I personally think .writelines should handle large generators, since in the Python 3 world a lot of APIs were iterator-ified and it is what a lot of people would probably expect. But having a clear, documented decision on this is more important. (Note: I've copied most of the nosy list from #5445.)
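The contrast described in this message can be observed directly. The sketch below assumes CPython's `io` and `codecs` modules behave as described above: `io.StringIO` inherits `IOBase.writelines`, which calls `write()` once per item, while `codecs.StreamWriter.writelines` joins its whole argument before calling `write()`. `tracking_lines` is a hypothetical helper that records how much output has already reached the underlying buffer each time it is asked for another item.

```python
import codecs
import io


def tracking_lines(buffer, n, log):
    """Hypothetical generator: before yielding each item, record how much
    output has already reached `buffer`."""
    for i in range(n):
        log.append(len(buffer.getvalue()))
        yield f"line {i}\n"


# io.StringIO inherits IOBase.writelines, which calls write() once per item.
text_buf = io.StringIO()
io_log = []
text_buf.writelines(tracking_lines(text_buf, 3, io_log))

# codecs.StreamWriter.writelines joins the whole iterable, then calls write() once.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
codec_log = []
writer.writelines(tracking_lines(raw, 3, codec_log))

print(io_log)     # [0, 7, 14]: earlier items were written before later ones were produced
print(codec_log)  # [0, 0, 0]: nothing was written until the generator was exhausted
```

With an infinite generator, the second case would never reach its single `write()` call.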
msg222252	Author: Josh Rosenberg (josh.r) *	Date: 2014-07-04 00:48
+1. I've been assuming writelines handled arbitrary generators without an issue; I guess I've gotten lucky and only used the ones that do. I've fed it enormous (though not infinite) generators built from itertools.product and the like, on the assumption that it would safely write them without generating len(seq) ** repeat values in memory. I'd definitely appreciate a documented guarantee of this.

I don't need it to explicitly guarantee that each item is written before the next item is pulled off the iterator or anything; if it wants to buffer a reasonable amount of data in memory before triggering real I/O, that's fine (generators that return mutable objects and mutate them when the next object comes along are evil anyway, and forcing one-by-one output can prevent some useful optimizations). But anything that uses argument unpacking, collection into a list, ''.join (or, at the C level, PySequence_Fast and the like), forcing the whole generator to be exhausted before the first byte is written, is a bad idea.
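The middle ground described here (some bounded buffering is acceptable, exhausting the iterable is not) can be sketched as follows. `chunked_writelines` and its `max_chars` parameter are hypothetical names used only for illustration; they are not part of any standard library API. The helper pulls items lazily and hands them to `write()` in batches of roughly `max_chars` characters, overshooting by at most one item.

```python
import io
import itertools
from typing import Iterable, TextIO


def chunked_writelines(stream: TextIO, lines: Iterable[str], max_chars: int = 64 * 1024) -> None:
    """Hypothetical helper: write `lines` in bounded batches instead of all at once.

    Items are pulled from `lines` one at a time, so the pending batch never
    exceeds `max_chars` characters by more than one item, even for huge generators.
    """
    batch: list[str] = []
    size = 0
    for line in lines:
        batch.append(line)
        size += len(line)
        if size >= max_chars:
            stream.write("".join(batch))  # one write() per bounded batch
            batch.clear()
            size = 0
    if batch:
        stream.write("".join(batch))  # write whatever is left over


# Usage: 8 lines of 4 characters each, handed to write() every 8 characters.
out = io.StringIO()
lines = ("".join(p) + "\n" for p in itertools.product("ab", repeat=3))
chunked_writelines(out, lines, max_chars=8)
print(out.getvalue().splitlines()[:3])  # ['aaa', 'aab', 'aba']
```

Joining within a batch keeps the number of `write()` calls low, which is the kind of optimization the message mentions, while memory stays bounded regardless of how long the generator runs.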
msg226120	Author: Terry J. Reedy (terry.reedy) *	Date: 2014-08-30 04:17
Security-fix-only versions do not get doc fixes.
msg261399	Author: Dan Haffey (dhaffey)	Date: 2016-03-09 03:00
+1, I just lost an hour-plus compute job to this. It sure violates POLA (the principle of least astonishment). I've been passing large generators to file.writelines for about as long as generators have existed, so I never would have guessed that a class named "StreamWriter", of all things, wouldn't, you know, stream its writelines argument.
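For code that is stuck with a `codecs` stream writer, one workaround consistent with this discussion is to avoid its `writelines()` and either call `write()` per item, or switch to `io.TextIOWrapper`, which inherits the streaming `IOBase.writelines`. This is a sketch under those assumptions; `numbered` is a hypothetical stand-in for a long-running generator.

```python
import codecs
import io


def numbered(n):
    """Hypothetical stand-in for a large or slow generator of text lines."""
    for i in range(n):
        yield f"{i}\n"


# Option 1: keep the codecs writer but feed it one item at a time.
raw_codecs = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw_codecs)
for line in numbered(1000):
    writer.write(line)  # each item is encoded and written immediately

# Option 2: io.TextIOWrapper, whose writelines() comes from IOBase and
# calls write() per item instead of joining everything first.
raw_io = io.BytesIO()
text = io.TextIOWrapper(raw_io, encoding="utf-8", newline="")  # newline="": no "\n" translation
text.writelines(numbered(1000))
text.flush()  # push TextIOWrapper's internal buffer down to raw_io

print(raw_codecs.getvalue() == raw_io.getvalue())  # True
```

`newline=""` is chosen here only so that both outputs are byte-for-byte comparable on every platform; real code may prefer the default newline handling.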
msg409248	Author: Stanley (slateny) *	Date: 2021-12-28 04:47
I'd be interested in taking a look at this. Would these changes clarify things?

Current (https://docs.python.org/3/library/codecs.html#codecs.StreamWriter):
Writes the concatenated list of strings to the stream (possibly by reusing the write() method). The standard bytes-to-bytes codecs do not support this method.

Proposed:
Writes the concatenated list of strings to the stream by reusing the write() method, and thus does not support infinite or very large generators. The standard bytes-to-bytes codecs do not support this method.
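The limitation in the proposed wording follows from `writelines()` calling `''.join()` on its argument before anything is written. As a sketch, assuming one controls the writer class, a subclass can override `writelines()` to write items as they arrive. `StreamingUtf8Writer`, `lines_then_fail`, and `written_before_failure` are hypothetical names; the example uses a generator that fails partway through to show what each writer has written by that point.

```python
import codecs
import io

Utf8Writer = codecs.getwriter("utf-8")  # the standard UTF-8 StreamWriter class


class StreamingUtf8Writer(Utf8Writer):
    """Hypothetical subclass whose writelines() streams instead of joining."""

    def writelines(self, lines):
        for line in lines:
            self.write(line)


def lines_then_fail():
    """Hypothetical producer that yields two lines and then fails."""
    yield "first\n"
    yield "second\n"
    raise RuntimeError("producer failed")


def written_before_failure(writer_cls):
    """Return the bytes that reached the stream before the producer failed."""
    raw = io.BytesIO()
    try:
        writer_cls(raw).writelines(lines_then_fail())
    except RuntimeError:
        pass
    return raw.getvalue()


print(written_before_failure(Utf8Writer))           # b''
print(written_before_failure(StreamingUtf8Writer))  # b'first\nsecond\n'
```

The stock writer never reaches `write()` because the join raises first; the streaming override has already written everything produced before the failure.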
msg414395	Author: Jelle Zijlstra (JelleZijlstra) *	Date: 2022-03-03 01:21
New changeset a8c87a239ee1414d6dd0b062fe9ec3e5b0c50cb8 by slateny in branch 'main': bpo-21910: Clarify docs for codecs writelines method (GH-31245) https://github.com/python/cpython/commit/a8c87a239ee1414d6dd0b062fe9ec3e5b0c50cb8
msg414396	Author: miss-islington (miss-islington)	Date: 2022-03-03 01:43
New changeset 60b561c246da2073672a016340457e4534dfdf5b by Miss Islington (bot) in branch '3.10': bpo-21910: Clarify docs for codecs writelines method (GH-31245) https://github.com/python/cpython/commit/60b561c246da2073672a016340457e4534dfdf5b
msg414397	Author: miss-islington (miss-islington)	Date: 2022-03-03 01:45
New changeset cf8aff6319794807aa578215710e6caa4479516f by Miss Islington (bot) in branch '3.9': bpo-21910: Clarify docs for codecs writelines method (GH-31245) https://github.com/python/cpython/commit/cf8aff6319794807aa578215710e6caa4479516f

History
Date	User	Action	Args
2022-04-11 14:58:05	admin	set	github: 66109
2022-03-03 01:45:50	miss-islington	set	messages: + msg414397
2022-03-03 01:43:04	miss-islington	set	messages: + msg414396
2022-03-03 01:22:45	JelleZijlstra	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2022-03-03 01:21:56	miss-islington	set	pull_requests: + pull_request29768
2022-03-03 01:21:52	miss-islington	set	nosy: + miss-islington pull_requests: + pull_request29767
2022-03-03 01:21:44	JelleZijlstra	set	nosy: + JelleZijlstra messages: + msg414395
2022-02-10 07:00:34	slateny	set	keywords: + patch stage: needs patch -> patch review pull_requests: + pull_request29414
2021-12-28 04:47:00	slateny	set	nosy: + slateny messages: + msg409248
2021-12-13 18:23:05	iritkatriel	set	keywords: + easy title: File protocol should document if writelines must handle generators sensibly -> [doc] File protocol should document if writelines must handle generators sensibly versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.4, Python 3.5
2016-03-12 00:51:09	martin.panter	set	stage: needs patch
2016-03-09 03:00:24	dhaffey	set	nosy: + dhaffey messages: + msg261399
2014-08-30 04:17:47	terry.reedy	set	messages: + msg226120 versions: - Python 3.1, Python 3.2, Python 3.3
2014-07-04 00:48:40	josh.r	set	nosy: + josh.r messages: + msg222252
2014-07-03 09:38:29	JanKanis	create