
Author skip.montanaro
Recipients ejacq, rhettinger, skip.montanaro
Date 2021-03-25.21:03:40
Message-id <1616706221.48.0.554616358128.issue43625@roundup.psfhosted.org>
Content
I assume the OP is referring to this sort of usage:

>>> import csv
>>> sniffer = csv.Sniffer()
>>> raw = open("mixed.csv").read()
>>> sniffer.has_header(raw)
False

*sigh*

I really wish the Sniffer class had never been added to the csv module. I can't recall who wrote it (the author is long gone). Though I was responsible for the initial commits, the author wasn't me or one of the main authors of csvmodule.c. As far as I know, it never worked well, and I can't recall ever using it.

A simpler heuristic would be: if the first row contains mostly strings and the second row contains mostly numbers, the file has a header. That assumes CSV files consist mostly of numeric data.

Looking at has_header, I see this:

    for thisType in [int, float, complex]:

I think this particular problem would be solved if the order of those types were reversed, and the attached diff makes that change. Note that the Sniffer class currently has no test cases, so the fact that the test I added fails before the change and passes after it doesn't mean the change won't break someone's mission-critical Sniffer usage.
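To show why the order matters, here is a simplified sketch of the per-field typing done in has_header. The helper names classify and column_type are hypothetical, and the sketch only mirrors the loop quoted above plus the "inconsistent type means drop the column" rule; it is not the module's actual code. With int tried first, a column holding "1" and "2.5" is typed int on one row and float on another, so the column drops out of the vote. With complex tried first, every numeric field gets the same type.

```python
def classify(field, type_order):
    """Return the first type in type_order that accepts field,
    falling back to the string length, like the loop in has_header."""
    for this_type in type_order:
        try:
            this_type(field)
            break
        except (ValueError, OverflowError):
            pass
    else:
        # No numeric type accepted the field: use its length instead.
        this_type = len(field)
    return this_type


def column_type(values, type_order):
    """Return the type shared by all values, or None if they disagree
    (has_header discards a column whose type is inconsistent)."""
    types = {classify(v, type_order) for v in values}
    return types.pop() if len(types) == 1 else None


column = ["1", "2.5", "3"]
print(column_type(column, [int, float, complex]))  # None: int and float disagree
print(column_type(column, [complex, float, int]))  # <class 'complex'>
```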

(Sorry, Raymond. My GitHub-fu is insufficient to let me fork the repository, apply the diff, and create a PR.)