Issue 15049: [doc] say in open() doc that line buffering only applies to write

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/59254

classification

Title:	[doc] say in open() doc that line buffering only applies to write
Type:	behavior	Stage:	needs patch
Components:	Documentation	Versions:	Python 3.10, Python 3.9, Python 3.8

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	r.david.murray	Nosy List:	loewis, pitrou, r.david.murray
Priority:	normal	Keywords:

Created on 2012-06-12 01:15 by r.david.murray, last changed 2022-04-11 14:57 by admin.

Messages (4)
msg162656 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-06-12 01:15
rdmurray@hey:~/python/p32>cat bad.py
This line is just ascii
A second line for good measure.
This comment contains undecodable stuff: "�" or "\\xe9" in "pass�"" cannot be decoded.

The last line above is in latin-1, with an é inside those quotes.

rdmurray@hey:~/python/p32>cat bug.py
import sys
with open('./bad.py', buffering=int(sys.argv[1])) as f:
    for line in f:
        print(line, end='')

rdmurray@hey:~/python/p32>python3 bug.py -1
Traceback (most recent call last):
  File "bug.py", line 3, in <module>
    for line in f:
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 99: invalid continuation byte

rdmurray@hey:~/python/p32>python3 bug.py 1
Traceback (most recent call last):
  File "bug.py", line 3, in <module>
    for line in f:
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 99: invalid continuation byte

rdmurray@hey:~/python/p32>python3 bug.py 2
This line is just ascii
A second line for good measure.
Traceback (most recent call last):
  File "bug.py", line 3, in <module>
    for line in f:
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 0: invalid continuation byte

So, line buffering does not appear to buffer line by line.

I ran into this problem because I had a much larger file that I thought was in utf-8. When I got the encoding error, I was annoyed that the error message didn't really tell me which line the error was on, but I figured, OK, I'll just set line buffering and then I'll be able to tell. But that didn't work.
Fortunately, using '2' did work, but at a minimum the docs need to be updated to indicate when line buffering really is line buffering and when it isn't.
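For the underlying goal described above (finding which line contains the undecodable byte), one approach that does not depend on buffer sizes is to read the file in binary mode and decode each line separately. The following is only a sketch: `find_undecodable_lines` is a hypothetical helper name, not something from the standard library, and it assumes an ASCII-compatible encoding such as UTF-8 or latin-1, where splitting on newline bytes is safe.

```python
def find_undecodable_lines(binary_lines, encoding="utf-8"):
    """Return (line_number, error) pairs for lines that fail to decode.

    binary_lines is any iterable of bytes objects, one per line, for
    example a file opened with mode "rb". (Hypothetical helper.)
    """
    problems = []
    for number, raw_line in enumerate(binary_lines, start=1):
        try:
            raw_line.decode(encoding)
        except UnicodeDecodeError as exc:
            problems.append((number, exc))
    return problems


# Stand-in for the contents of a file like bad.py: the third line holds a
# latin-1 encoded e-acute (byte 0xe9), which is not valid UTF-8 here.
sample = [
    b"This line is just ascii\n",
    b"A second line for good measure.\n",
    b'This comment contains a latin-1 byte: "\xe9"\n',
]

for number, exc in find_undecodable_lines(sample):
    print(f"line {number}: {exc.reason} (byte offset {exc.start} within the line)")
```

With a real file, the same helper could be called as `find_undecodable_lines(open(path, "rb"))`, since iterating over a binary file yields one bytes object per line.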
msg162678 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-06-12 14:46
Without looking at the code, it seems that http://docs.python.org/release/3.1.5/library/io.html?highlight=io#io.TextIOWrapper gives the answer: "If line_buffering is True, flush() is implied when a call to write contains a newline character." So, "line buffering" may have a meaning only for writing. I don't think there is a reasonable way to implement it for reading.
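The write-side behaviour quoted from the io documentation can be observed with in-memory streams. This is a minimal sketch: `bytes_reaching_raw` is a hypothetical helper, and `newline="\n"` is passed only so that the result does not depend on the platform's line separator.

```python
import io


def bytes_reaching_raw(line_buffering):
    """Record what has reached the underlying stream after each write()."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(
        io.BufferedWriter(raw),
        encoding="utf-8",
        newline="\n",
        line_buffering=line_buffering,
    )
    snapshots = []
    text.write("no newline yet")
    snapshots.append(raw.getvalue())  # nothing flushed so far
    text.write(" ... done\n")
    snapshots.append(raw.getvalue())  # flushed only if line_buffering is true
    return snapshots


print(bytes_reaching_raw(line_buffering=True))
print(bytes_reaching_raw(line_buffering=False))
```

With `line_buffering=True` the second write contains a newline, so the data is flushed through to the underlying stream. With `line_buffering=False` such a short write stays in the buffers until an explicit `flush()` or `close()`.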
msg162682 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-06-12 15:13
That makes sense. I'll add a mention of this to the 'open' docs that discuss the buffering parameter.
msg162706 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-06-13 12:56
Indeed, line buffering on the read side would be very slow (since you would have to read and decode one byte at a time from the raw stream to make sure you don't overshoot the line boundaries).
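The read-ahead described here can be seen by checking how far the underlying stream has advanced after a single `readline()`. This is a sketch using in-memory streams rather than a real file; the variable names are illustrative only.

```python
import io

data = b"first line\nsecond line\nthird line\n"
raw = io.BytesIO(data)
text = io.TextIOWrapper(raw, encoding="utf-8", line_buffering=True)

first = text.readline()
consumed = raw.tell()  # bytes already pulled from the underlying stream

# line_buffering=True does not limit the read to one line: the wrapper
# fetches and decodes a whole chunk, then hands back the first line.
print(repr(first), "consumed", consumed, "of", len(data), "bytes")

# For the same reason, an undecodable byte on a later line can surface
# while the first line is being read.
bad = io.TextIOWrapper(
    io.BytesIO(b"ascii line\nanother line\nlatin-1 byte: \xe9\"\n"),
    encoding="utf-8",
    line_buffering=True,
)
try:
    bad.readline()
    error_on_first_readline = False
except UnicodeDecodeError:
    error_on_first_readline = True
print("error on first readline:", error_on_first_readline)
```

This matches the original report, where the traceback appeared before any line was printed even though the first two lines were plain ASCII.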

History
Date	User	Action	Args
2022-04-11 14:57:31	admin	set	github: 59254
2020-11-06 17:20:11	iritkatriel	set	title: line buffering isn't always -> [doc] say in open() doc that line buffering only applies to write versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.2, Python 3.3
2012-06-13 12:56:28	pitrou	set	messages: + msg162706
2012-06-12 15:13:33	r.david.murray	set	assignee: r.david.murray messages: + msg162682 components: + Documentation
2012-06-12 14:46:24	loewis	set	nosy: + loewis messages: + msg162678
2012-06-12 01:15:24	r.david.murray	create