Author Tiago Wright
Recipients Tiago Wright, peter.otten, r.david.murray, skip.montanaro
Date 2015-08-06.01:40:47
Message-id <CAFxr9VppN__LfF5nKY-hkpK9m_N+rLh4n=A4X3wVuFCdic9dKg@mail.gmail.com>
In-reply-to <1438728718.4.0.834365575859.issue24787@psf.upfronthosting.co.za>
Content
I've run the Sniffer against 1614 csv files on my computer and compared the
delimiter it detects to what I have set manually. Here are the results:

           Sniffer
Human          ,   ;   \t   |  (blank)  Error   :   )   c   e   M   p  Grand Total  Error rate
,            498            2        1     10           1                      512        2.7%
;                  1                                                             1        0.0%
\t             3      922            6     91   2   1           2  27         1054       12.5%
|                          33                                                   33        0.0%
space                                9      1               4                   14       35.7%
Grand Total  501   1  922  35       16    102   2   1   1   4   2  27         1614
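
A comparison like the one above can be reproduced roughly as follows. This is only a minimal sketch, not the script I actually ran: the helper names (detect_delimiter, tally, error_rate) are made up for illustration, and it assumes each file's text sample and hand-set delimiter are already available as pairs. A failed detection is recorded as "Error", matching the column in the table.

```python
import csv
from collections import Counter


def detect_delimiter(sample):
    """Return the delimiter csv.Sniffer detects, or "Error" if it gives up."""
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        return "Error"


def tally(samples):
    """Count (human, sniffer) pairs.

    `samples` is an iterable of (text, human_delimiter) tuples
    (a hypothetical input format chosen for this sketch).
    """
    return Counter((human, detect_delimiter(text)) for text, human in samples)


def error_rate(counts, human):
    """Fraction of files labelled `human` where the Sniffer disagreed."""
    total = sum(n for (h, _), n in counts.items() if h == human)
    wrong = sum(n for (h, s), n in counts.items() if h == human and s != h)
    return wrong / total if total else 0.0
```

Each row of the table is then the set of counts sharing one human label, and the last column is error_rate for that label.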
-Tiago

On Tue, Aug 4, 2015 at 3:51 PM R. David Murray <report@bugs.python.org>
wrote:

>
> R. David Murray added the comment:
>
> If you look at the algorithm it is doing some fancy things with metrics,
> but does have a 'preferred delimiters' list that it checks.  It is possible
> things could be improved either by tweaking the threshold or by somehow
> giving added weight to the metrics when the candidate character is in the
> preferred delimiter list.
>
> We might have to do this with a feature flag to turn it on, though, since
> it could change the results for programs that happen to work with the
> current algorithm.
>
> ----------
> nosy: +r.david.murray
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue24787>
> _______________________________________
>
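
Until something along those lines exists, a caller can approximate a preference for common delimiters with the documented `delimiters` argument of `Sniffer.sniff`. The sketch below is a workaround under stated assumptions, not the change proposed above: LIKELY_DELIMITERS and sniff_delimiter are hypothetical names, and the candidate set is only a guess at what is common.

```python
import csv

# Hypothetical candidate set for illustration; adjust to the data at hand.
LIKELY_DELIMITERS = ",;\t| "


def sniff_delimiter(sample, likely=LIKELY_DELIMITERS):
    """Prefer a delimiter from `likely`; otherwise use the default search."""
    sniffer = csv.Sniffer()
    try:
        # First pass: only characters in `likely` are accepted as candidates.
        return sniffer.sniff(sample, delimiters=likely).delimiter
    except csv.Error:
        # Second pass: unrestricted search; may still raise csv.Error.
        return sniffer.sniff(sample).delimiter
```

Because the restricted pass runs first, a letter or punctuation mark outside the likely set can only be returned when none of the likely characters is consistent enough.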