Issue 3436: csv.DictReader inconsistency

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/47686

classification

Title:	csv.DictReader inconsistency
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.0, Python 2.6

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	skip.montanaro	Nosy List:	barry, gvanrossum, mishok13, ncoghlan, rhettinger, skip.montanaro
Priority:	normal	Keywords:	patch

Created on 2008-07-24 15:30 by mishok13, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
csv.diff	skip.montanaro, 2008-08-01 00:59

Messages (21)
msg70207 - (view)	Author: Andrii V. Mishkovskyi (mishok13)	Date: 2008-07-24 15:30
I had to use csv module recently and ran into a "problem" with DictReader. I had to get headers of CSV file and only after that iterate throgh each row. But AFAIU there is no way to do it, other then subclassing. So, basically, right now we have this: Python 3.0b2+ (unknown, Jul 24 2008, 12:15:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import csv >>> r = csv.DictReader(open('test.csv')) >>> r.fieldnames >>> next(r) {'baz': '13', 'foo': '42', 'bar': '27'} >>> r.fieldnames ['foo', 'bar', 'baz'] I think it would be much more useful, if DictReader got 'fieldnames' on calling __init__ method, so this would look like this: >>> r = csv.DictReader(open('test.csv')) >>> r.fieldnames ['foo', 'bar', 'baz'] The easy way to do this is to subclass csv.DictReader. The hard way to do this is to apply the patches I'm attaching. :) These patches also remove redundant check for self.fieldnames being None for each next()/__next__() call
msg70213 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-07-24 16:18
I think this is the wrong approach. It would be better to have a separate getheader() method. Having __init__ do the deed is at odds with other uses of __init__ that only do setup but don't start reading.
msg70214 - (view)	Author: Andrii V. Mishkovskyi (mishok13)	Date: 2008-07-24 16:29
And how this method should look? Something like this, I suppose: def getheader(self): if self.fieldnames is None: try: self.fieldnames = self.reader.next() except StopIteration: pass return self.fieldnames Well, adding new API after beta2 is a "no-no" as I understand, so this getheader() method can be added only in 2.7/3.1 releases. Should I post updated patches or just live with it?
msg70238 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-07-25 02:41
That would be a fairly easy change to the DictReader class (see the attached patch) but probably can't be applied at this point in the 2.6 release cycle even though all csv module tests pass with it.
msg70310 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-07-27 03:23
I should also point out that I've generally used this technique to populate the fieldnames attribute from the file: f = open("somefile.csv", "rb") rdr = csv.DictReader(f, fieldnames=csv.reader(f).next()) So it is fairly trivial to set the fieldnames attribute before actually reading any data rows.
msg70311 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2008-07-27 04:15
Like Raymond, I have issues with the idea of implicitly reading the headers in __init__, but would be fine with the idea of a separate method in 2.7/3.1. As far as working around the absence of such a method goes, I personally use itertools.chain if I happen to need the headers before I start iterating: r = csv.DictReader(open('test.csv')) first = next(r) # Do something with r.fieldnames for row in chain(first, r): # Do something with each row
msg70341 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-07-28 10:59
The consensus seems to be that __init__ shouldn't "magically" read the header row, even though by not specifying a fieldnames arg that's exactly what you're telling the DictReader where to find the column headers. Given that case, my argument is that we not make any changes (no getheaders method, etc) since there are at least a couple different ways mentioned already to do what you want.
msg70342 - (view)	Author: Andrii V. Mishkovskyi (mishok13)	Date: 2008-07-28 11:03
I'm ok with that. :) Looks like you can close this one as "won't fix".
msg70343 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-07-28 11:17
Done...
msg70413 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2008-07-30 00:17
I know this has been closed, but perhaps the fieldnames attribute could be made into a property that reads the first line of the file if it hasn't been read yet?
msg70434 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-07-30 17:41
Guido> I know this has been closed, but perhaps the fieldnames attribute Guido> could be made into a property that reads the first line of the Guido> file if it hasn't been read yet? It's a nice thought. I tried the straightforward implementation in my sandbox and one of the more obscure tests failed. I have yet to look into the cause. Skip
msg70442 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2008-07-30 22:58
Re-opened for consideration of GvR's suggestion.
msg70501 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2008-07-31 13:23
I personally like the idea of making fieldnames a property - while having merely reading an attribute cause disk I/O is slightly questionable, it seems like a better option than returning a misleading value for that property and also a better option than reading the first line of the file in __init__. Hopefully Skip can track down that obscure failure and this change can be made at some point in the future.
msg70521 - (view)	Author: Andrii V. Mishkovskyi (mishok13)	Date: 2008-07-31 15:24
I like the idea of fieldnames attribute being a property, so i've uploaded patches that implement them as such. Both patches ran through make test without problems.
msg70532 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-07-31 22:56
Nick, Working with Andrii's patch I'm trying to add a couple test cases to make sure the methods you and I both demonstrated still work. Mine is no problem, but I must be doing something wrong trying to use/adapt your example. I freely admit I am not an itertools user, but I can't get your example to work as written: >>> r = csv.DictReader(open("foo.csv", "rb")) >>> r.fieldnames ['f1', 'f2', 'f3'] >>> r.next() {'f1': '1', 'f2': '2', 'f3': 'abc'} >>> r = csv.DictReader(open("foo.csv", "rb")) >>> first = next(r) >>> first {'f1': '1', 'f2': '2', 'f3': 'abc'} >>> import itertools >>> for x in itertools.chain(first, r): ... print x ... f1 f2 f3 If I place first in a list it works: >>> r = csv.DictReader(open("foo.csv", "rb")) >>> first = next(r) >>> for x in itertools.chain([first], r): ... print x ... {'f1': '1', 'f2': '2', 'f3': 'abc'} That makes intuitive sense to me. Is that what you intended? S
msg70537 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-08-01 00:59
I added a comment to Andrii's patch and added simple test cases which check to make sure the way Nick and I currently use the DictReader class (or at least envision using it) still works.
msg70538 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-08-01 01:06
Andrii, If my view of the Python 3.0 development process is correct and this change makes it into the 2.6 code, one of the 3.0 developers will merge to the py3k branch.
msg70544 - (view)	Author: Andrii V. Mishkovskyi (mishok13)	Date: 2008-08-01 08:21
Oh, so this is how the process looks like... /me removes patches I've uploaded both py3k and trunk patches just because I'm fixing things the other way round -- first I write a patch for 3.0 and only after that I backport it to 2.6. Stupid me. :)
msg70547 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2008-08-01 10:10
Skip's patch looks good to me (as Skip discovered, I left out the necessary step of putting the first row back into an iterable before invoking chain in my example code)
msg70917 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2008-08-08 22:06
Making an existing attribute a property is a nice, API-neutral way to handle this. Let's call the inconsistency a bug and this a bug fix <wink> so that it's fine to add to 2.6 and 3.0 at this point.
msg70921 - (view)	Author: Skip Montanaro (skip.montanaro) *	Date: 2008-08-08 22:53
Committed as revision 65605.

History
Date	User	Action	Args
2022-04-11 14:56:36	admin	set	github: 47686
2008-08-08 22:53:44	skip.montanaro	set	status: open -> closed assignee: skip.montanaro messages: + msg70921
2008-08-08 22:06:29	barry	set	nosy: + barry resolution: accepted messages: + msg70917
2008-08-01 10:10:31	ncoghlan	set	messages: + msg70547
2008-08-01 08:21:16	mishok13	set	files: - issue3436.trunk.csv.py.diff
2008-08-01 08:21:12	mishok13	set	files: - issue3436.py3k.csv.py.diff
2008-08-01 08:21:07	mishok13	set	messages: + msg70544
2008-08-01 01:06:05	skip.montanaro	set	messages: + msg70538
2008-08-01 01:00:04	skip.montanaro	set	files: - csv.diff
2008-08-01 00:59:54	skip.montanaro	set	files: + csv.diff messages: + msg70537
2008-07-31 22:56:36	skip.montanaro	set	messages: + msg70532
2008-07-31 15:24:55	mishok13	set	messages: + msg70521
2008-07-31 15:14:16	mishok13	set	files: + issue3436.trunk.csv.py.diff
2008-07-31 15:13:53	mishok13	set	files: + issue3436.py3k.csv.py.diff
2008-07-31 13:23:21	ncoghlan	set	messages: + msg70501
2008-07-30 22:58:35	ncoghlan	set	status: closed -> open resolution: wont fix -> (no value) messages: + msg70442
2008-07-30 17:41:01	skip.montanaro	set	messages: + msg70434
2008-07-30 13:48:30	mishok13	set	files: - trunk.csv.py.diff
2008-07-30 13:48:28	mishok13	set	files: - py3k.csv.py.diff
2008-07-30 00:17:28	gvanrossum	set	nosy: + gvanrossum messages: + msg70413
2008-07-28 11:17:41	skip.montanaro	set	status: open -> closed resolution: wont fix messages: + msg70343
2008-07-28 11:03:06	mishok13	set	messages: + msg70342
2008-07-28 10:59:32	skip.montanaro	set	messages: + msg70341
2008-07-27 04:15:47	ncoghlan	set	nosy: + ncoghlan messages: + msg70311
2008-07-27 03:23:42	skip.montanaro	set	messages: + msg70310
2008-07-25 02:41:17	skip.montanaro	set	files: + csv.diff nosy: + skip.montanaro messages: + msg70238
2008-07-24 16:29:59	mishok13	set	messages: + msg70214
2008-07-24 16:18:49	rhettinger	set	nosy: + rhettinger messages: + msg70213
2008-07-24 15:33:38	mishok13	set	type: behavior components: + Library (Lib)
2008-07-24 15:33:22	mishok13	set	files: + trunk.csv.py.diff
2008-07-24 15:32:53	mishok13	set	files: - trunk.csv.py.diff
2008-07-24 15:31:51	mishok13	set	files: + trunk.csv.py.diff
2008-07-24 15:30:21	mishok13	create