Message 389528 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	skip.montanaro
Recipients	ejacq, rhettinger, skip.montanaro
Date	2021-03-25.21:03:40
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1616706221.48.0.554616358128.issue43625@roundup.psfhosted.org>
In-reply-to

Content

I assume the OP is referring to this sort of usage:

>>> import csv
>>> sniffer = csv.Sniffer()
>>> raw = open("mixed.csv").read()
>>> sniffer.has_header(raw)
False

*sigh*

I really wish the Sniffer class had never been added to the CSV module. I can't recall who wrote it (the author is long gone). Although I am responsible for the initial commits, it wasn't written by me or by the main authors of csvmodule.c. As far as I know, it never really worked well, and I can't recall ever using it.

A simpler heuristic would be: if the first row contains a bunch of strings and the second row contains a bunch of numbers, the file has a header. That assumes CSV files consist mostly of numeric data.
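The simpler heuristic could look roughly like the following minimal sketch. The function names are hypothetical, "a number" is assumed to mean anything `float()` accepts, and "a bunch" is assumed to mean more than half the cells in a row (an arbitrary threshold).

```python
import csv
import io


def looks_like_number(cell):
    # Hypothetical helper: treat anything float() accepts as numeric.
    try:
        float(cell)
        return True
    except ValueError:
        return False


def simple_has_header(sample):
    # Hypothetical alternative to Sniffer.has_header, not part of the csv module.
    rows = list(csv.reader(io.StringIO(sample)))
    if len(rows) < 2:
        return False
    first, second = rows[0], rows[1]
    first_strings = sum(not looks_like_number(c) for c in first)
    second_numbers = sum(looks_like_number(c) for c in second)
    # Assumed threshold: a majority of non-numeric cells in row 1
    # and a majority of numeric cells in row 2.
    return first_strings > len(first) / 2 and second_numbers > len(second) / 2


print(simple_has_header("name,x,y\na,1,2.5\nb,3,4\n"))  # True
print(simple_has_header("1,2,3\n4,5,6\n"))              # False
```

As noted above, this only works when the data rows are mostly numeric; a file whose body is all text would always be reported as header-less.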

Looking at has_header, I see this:

    for thisType in [int, float, complex]:

I think this particular problem would be solved if the order of those types were reversed. The attached diff suggests that as well. Note that the Sniffer class currently has no test cases, so the fact that the test I added failed before the change and passes after it doesn't mean the change won't break someone's mission-critical Sniffer usage.
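A minimal sketch of why the order matters, assuming has_header infers a type for each cell by trying each type in turn, falls back to the cell's length, and drops any column whose inferred type changes from row to row. This is a simplified paraphrase of how the loop around that line behaves, not the actual csv.py code, and the helper names are hypothetical.

```python
def infer_cell_type(cell, type_order):
    # Return the first type that accepts the cell, else fall back to its length.
    for this_type in type_order:
        try:
            this_type(cell)
            return this_type
        except (ValueError, OverflowError):
            pass
    return len(cell)


def column_type(cells, type_order):
    # Return the column's single inferred type, or None if it is inconsistent
    # (has_header drops such columns from the header vote).
    seen = None
    for cell in cells:
        t = infer_cell_type(cell, type_order)
        if seen is None:
            seen = t
        elif t != seen:
            return None
    return seen


column = ["1", "2.5", "3"]  # a column mixing ints and floats
print(column_type(column, [int, float, complex]))  # None
print(column_type(column, [complex, float, int]))  # <class 'complex'>
```

With the current order, "1" is typed as int and "2.5" as float, so the column is discarded; if every column is discarded, no votes are cast and has_header returns False. Trying complex first accepts every numeric cell, so the column keeps a consistent type and can still vote.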

(Sorry, Raymond. My GitHub-fu is insufficient to allow me to fork, apply the diff and create a PR.)

History
Date	User	Action	Args
2021-03-25 21:03:41	skip.montanaro	set	recipients: + skip.montanaro, rhettinger, ejacq
2021-03-25 21:03:41	skip.montanaro	set	messageid: <1616706221.48.0.554616358128.issue43625@roundup.psfhosted.org>
2021-03-25 21:03:41	skip.montanaro	link	issue43625 messages
2021-03-25 21:03:40	skip.montanaro	create