Message 358645 - Python tracker

Author	evan.whitfield
Recipients	evan.whitfield
Date	2019-12-18.20:50:42
Message-id	<1576702242.94.0.201901935137.issue39092@roundup.psfhosted.org>

Content
I observed a false positive from the csv Sniffer's has_header method: it reported a header when the file had none. has_header determines the csv dialect by sniffing it, and the sniffer failed to detect that the file I was using has an escape character of '\'. Because the escape character was not set, the first line of the file was split into the wrong columns: the reader encountered an escaped quote within a quoted column and treated it as the end of that column. (The sniffer correctly determined that the dialect wasn't doublequote, but the escape character apparently still needs to be set for an escaped quotechar to be handled.)
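
A minimal sketch of the mis-split described above, using csv.reader directly rather than the Sniffer. The sample line is made up for illustration and is not the reporter's data; the only assumption is a dialect with doublequote disabled, with and without escapechar set.

```python
import csv
import io

# One record with two columns; the first column contains backslash-escaped
# quotes and a comma. (Illustrative data, not taken from the report.)
line = '"he said \\"hi\\", twice",1\n'

# Without escapechar, the backslash is an ordinary character, so the first
# escaped quote closes the quoted column and the embedded comma splits it.
without_escape = next(csv.reader(io.StringIO(line), doublequote=False))

# With escapechar set, the escaped quotes stay inside the quoted column.
with_escape = next(
    csv.reader(io.StringIO(line), doublequote=False, escapechar="\\")
)

print(len(without_escape), without_escape)
print(len(with_escape), with_escape)
```

The column count changes between the two calls, which is the kind of mis-parsed first row that has_header then compares against the remaining rows.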

I think one (or both) of these things should be done here to avoid this false positive:
1.) Allow a dialect to be passed to has_header, so that someone could specify the escape character of the dialect if it were known.
2.) Allow the sniff method of the Sniffer class to detect and set the escapechar.
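
Until something like option 1 exists, one possible workaround is to subclass Sniffer so that sniff returns a dialect the caller already knows. This is only a sketch: FixedDialectSniffer and BackslashEscaped are hypothetical names, the sample data is made up, and the approach relies on has_header calling self.sniff internally, which is an implementation detail of CPython's csv module rather than documented behavior.

```python
import csv


class FixedDialectSniffer(csv.Sniffer):
    """Hypothetical helper: skip detection and always use a known dialect."""

    def __init__(self, dialect):
        super().__init__()
        self._dialect = dialect

    def sniff(self, sample, delimiters=None):
        # has_header() calls self.sniff() in CPython's implementation
        # (an implementation detail), so returning the known dialect here
        # makes has_header() parse the sample with it.
        return self._dialect


class BackslashEscaped(csv.excel):
    """Hypothetical dialect: comma-delimited, backslash-escaped quotes."""

    doublequote = False
    escapechar = "\\"


# Headerless sample whose first row contains escaped quotes (made-up data).
sample = (
    '"he said \\"hi\\", twice",1\n'
    '"plain",2\n'
    '"others",3\n'
)

sniffer = FixedDialectSniffer(BackslashEscaped)
headerless_result = sniffer.has_header(sample)
print(headerless_result)
```

With the escape character supplied this way, the first row parses into the same two columns as the other rows, so the header heuristic compares like with like.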
