Issue 36172: csv module internal consistency



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80353

classification

Title:	csv module internal consistency
Type:	behavior	Stage:	resolved
Components:		Versions:

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	Shane Smith, josh.r, martin.panter
Priority:	normal	Keywords:

Created on 2019-03-03 19:37 by Shane Smith, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg337042 - (view)	Author: Shane (Shane Smith)	Date: 2019-03-03 19:37
It occurred to me there is a slight mismatch in the behavioral consistency of the csv module (at least on Windows, Python 3.X). Specifically, csv.writer() and csv.reader() treat the line terminator slightly differently. To boil it down to a concise example:

#==================================================
import csv

data = [[1, 2, 3], [4, 5, 6]]

with open('test.csv', 'w') as fout:
    csv.writer(fout).writerows(data)

with open('test.csv', 'r') as fin:
    data2 = list(csv.reader(fin))

print(data, data2, sep='\n')

>>>
[[1, 2, 3], [4, 5, 6]]
[['1', '2', '3'], [], ['4', '5', '6'], []]
#==================================================

So because csv.writer() uses lineterminator = '\r\n', data and data2 have a different structure (data2 has empty rows). To me this seems undesirable, so I always go out of my way to use lineterminator = '\n'.

#==================================================
import csv

data = [[1, 2, 3], [4, 5, 6]]

with open('test.csv', 'w') as fout:
    csv.writer(fout, lineterminator='\n').writerows(data)

with open('test.csv', 'r') as fin:
    data2 = list(csv.reader(fin))

print(data, data2, sep='\n')

>>>
[[1, 2, 3], [4, 5, 6]]
[['1', '2', '3'], ['4', '5', '6']]
#==================================================

Then the input and output have the same structure. I assume there was a reason lineterminator = '\r\n' was chosen as default, but for me there is no benefit wrt csv files. It seems like we would be better off with the more consistent, "reversible" behavior. Alternatively, the default behavior of csv.reader() could be changed. But in either case, I feel like their default behaviors should be in alignment.

Thoughts? Thanks for reading.
msg337046 - (view)	Author: Martin Panter (martin.panter) *	Date: 2019-03-03 21:08
The documentation <https://docs.python.org/dev/library/csv.html#module-contents> says you should “open the files with newline=''.” IMO this is an unfortunate quirk of the CSV module. Everything else that I know of in the Python built-in library either works with binary files, which typically do no newline translation in Python 3, or is fine with newline translation enabled in text mode. See also Issue 10954 about making the behaviour stricter.
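For illustration, a minimal sketch of the newline='' advice, runnable on any platform. The helper name write_then_read and the temporary file are assumptions made for this example, not part of the report. Opening the write side with newline='\r\n' is used here only to imitate the text-mode translation that Windows applies by default.

```python
import csv
import os
import tempfile

data = [[1, 2, 3], [4, 5, 6]]


def write_then_read(write_newline, read_newline):
    """Write `data` with csv.writer, then return (raw bytes, rows read back).

    Hypothetical helper: both arguments are passed straight through to the
    `newline` parameter of open().
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=write_newline) as fout:
            csv.writer(fout).writerows(data)
        with open(path, "rb") as fraw:
            raw = fraw.read()
        with open(path, "r", newline=read_newline) as fin:
            rows = list(csv.reader(fin))
        return raw, rows
    finally:
        os.remove(path)


# Documented usage: newline='' on both sides, so no translation happens.
good_raw, good_rows = write_then_read("", "")
print(good_raw)   # b'1,2,3\r\n4,5,6\r\n'
print(good_rows)  # [['1', '2', '3'], ['4', '5', '6']]

# Imitating Windows defaults: '\n' is rewritten to '\r\n' on write, so the
# writer's own '\r\n' terminator ends up on disk as '\r\r\n'.
bad_raw, bad_rows = write_then_read("\r\n", None)
print(bad_raw)    # b'1,2,3\r\r\n4,5,6\r\r\n'
print(bad_rows)   # [['1', '2', '3'], [], ['4', '5', '6'], []]
```

The extra '\r' on disk is what the reader later sees as an empty row, which matches the output in the original report.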
msg337143 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2019-03-04 18:38
Unless someone disagrees soon, I'm going to close this as documented behavior/not a bug. AFAICT, the only "fixes" available for this are:

1. Changing the default dialect from 'excel' to something else. Problem: Breaks correct code dependent on the excel dialect, but code could explicitly opt back in.

2. Change the 'excel' dialect. Problem: Breaks correct code dependent on the excel dialect, with no obvious way to opt back in.

3. Per #10954, check the file object to ensure it's not translating newlines and raise an exception otherwise. Problem: AFAICT, there is no documented API to check this (the result of calling open, with or without passing newline='', looks identical initially, never changes in write mode, and even in read mode, only exposes the newlines observed through the .newlines attribute, not whether or not they were translated), adding one wouldn't change all other file-like objects, so the change would need to propagate to all other built-in and third-party file APIs, and for some file-like objects, it wouldn't make sense to have this API at all (io.StringIO, being purely in memory, doesn't need to do translation of any kind)

4. (Extreme solution) Add io APIs (or add arguments to APIs) for reading/writing without newline translation (that is, whether or not newline is passed to open, you can read/write without translation), e.g. read(size) becomes read(size, translate_newlines=None) where None indicates default behavior, or we add read_untranslated(size) as an independent API. Problem: Like #3, this requires us to create new, mandatory APIs in the io module that would then need to propagate to all other built-in and third-party file APIs.

Point is, the simple solutions (1/2) break correct code, and the complex solutions (3/4) involve major changes to the io module (and all other file-like object producers) and/or the csv module. Even then, nothing shy of #4 would make broken code just work, they just fail loudly. Both #3 and #4 would require cascading changes to every file-like object (both built-in and third-party) to make them work; for the file-like objects that aren't updated, we're stuck choosing between issuing a warning that most folks won't see, then ignoring the problem, or making those file-like objects without the necessary API cause true exceptions (making them unusable until the third party package is updated).

If a fix is needed, I think my suggestion would be to do one or both of:

1. Emphasize the newline='' warning in the csv.reader/writer/DictReader/DictWriter docs (right now it's just one more unemphasized line in a fairly long wall of text for each)

2. Put a large, top-of-module warning about this at the top of the csv module docs, so people reading the basic module description are exposed to the warning before they even reach the API. Might help a few folks who are skimming without reading for detail.
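As a rough illustration of the "explicitly opt in" idea behind options 1 and 2, the sketch below leaves the 'excel' dialect untouched and registers a separate one. The class name ExcelLF, the registered name 'excel-lf', and the helper dump are hypothetical names chosen for this example; nothing like them ships with the csv module. io.StringIO is used because, with its default settings, it performs no newline translation.

```python
import csv
import io

data = [[1, 2, 3], [4, 5, 6]]


class ExcelLF(csv.excel):
    # Hypothetical dialect: same rules as 'excel', but rows end with '\n'.
    lineterminator = "\n"


# 'excel-lf' is an assumed name; any unused dialect name would do.
csv.register_dialect("excel-lf", ExcelLF)


def dump(dialect):
    """Return the text csv.writer produces for `data` under `dialect`."""
    buf = io.StringIO()
    csv.writer(buf, dialect=dialect).writerows(data)
    return buf.getvalue()


default_text = dump("excel")
lf_text = dump("excel-lf")

print(repr(csv.excel.lineterminator))  # '\r\n'
print(repr(default_text))              # '1,2,3\r\n4,5,6\r\n'
print(repr(lf_text))                   # '1,2,3\n4,5,6\n'
```

Code that relies on the stock 'excel' output keeps working, and only callers that name the new dialect see the changed terminator. This does not remove the need for newline='' when writing to real files.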
msg337152 - (view)	Author: Shane (Shane Smith)	Date: 2019-03-04 21:39
Thank you both for having a look. I just find these sorts of gotchas rather annoying (nonsensical mental burden of having to memorize behavior that does not behave like most other features for "hysterical raisins"). I think making the documentation more visible would be far better than nothing. Currently, help(csv) does not even mention the newline parameter as an available option in any context, nor does help(csv.writer). I think ideally, the user should be able to rely on a given module's help documentation for most things without having to leave the interpreter to consult the manual. Thoughts?

History
Date	User	Action	Args
2022-04-11 14:59:11	admin	set	github: 80353
2020-08-29 03:58:10	josh.r	set	status: pending -> closed resolution: not a bug stage: resolved
2019-03-06 03:58:43	josh.r	set	status: open -> pending
2019-03-04 21:39:44	Shane Smith	set	messages: + msg337152
2019-03-04 18:38:46	josh.r	set	nosy: + josh.r messages: + msg337143
2019-03-03 21:08:33	martin.panter	set	nosy: + martin.panter messages: + msg337046
2019-03-03 19:37:05	Shane Smith	create