Message 248093 - Python tracker


Author	Tiago Wright
Recipients	Tiago Wright, peter.otten, r.david.murray, skip.montanaro
Date	2015-08-06.01:40:47
Message-id	<CAFxr9VppN__LfF5nKY-hkpK9m_N+rLh4n=A4X3wVuFCdic9dKg@mail.gmail.com>
In-reply-to	<1438728718.4.0.834365575859.issue24787@psf.upfronthosting.co.za>

Content
I've run the Sniffer against 1614 csv files on my computer and compared the
delimiter it detects to what I have set manually. Here are the results:

             Sniffer
Human           ,   ;   \t   |  (blank)  Error   :   )   c   e   M   p  Grand Total  Error rate
,             498            2        1     10           1                      512        2.7%
;                   1                                                             1        0.0%
\t              3      922            6     91   2   1           2  27         1054       12.5%
|                           33                                                   33        0.0%
space                                 9      1               4                   14       35.7%
Grand Total   501   1  922  35       16    102   2   1   1   4   2  27         1614
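
A tally of this kind could be produced along the following lines. This is only a sketch, not the script behind the numbers above: the helper names (sniff_delimiter, tally, error_rate) and the shape of the input, a list of (sample text, human delimiter) pairs, are assumptions made for illustration. It also assumes the "Error" column counts the files for which the Sniffer raised csv.Error.

```python
import csv
from collections import Counter


def sniff_delimiter(sample):
    """Return the delimiter csv.Sniffer detects, or "Error" if it gives up."""
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        # Raised when the Sniffer cannot settle on a delimiter.
        return "Error"


def tally(samples):
    """Count (human, sniffer) pairs.

    `samples` is an iterable of (sample_text, human_delimiter) tuples
    (a hypothetical input format chosen for this sketch).
    """
    return Counter((human, sniff_delimiter(text)) for text, human in samples)


def error_rate(counts, human):
    """Share of files with the given human delimiter that were not matched."""
    total = sum(n for (h, _), n in counts.items() if h == human)
    wrong = sum(n for (h, s), n in counts.items() if h == human and s != h)
    return wrong / total if total else 0.0


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real files.
    samples = [
        ("a,b,c\n1,2,3\n4,5,6\n", ","),
        ("a;b;c\n1;2;3\n4;5;6\n", ";"),
    ]
    counts = tally(samples)
    for (human, sniffed), n in sorted(counts.items()):
        print(repr(human), repr(sniffed), n)
    print("error rate for ',':", error_rate(counts, ","))
```

Each row of the table then corresponds to one human delimiter, each column to one value returned by sniff_delimiter, and the error rate is the share of a row that falls outside the matching column.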
-Tiago

On Tue, Aug 4, 2015 at 3:51 PM R. David Murray <report@bugs.python.org>
wrote:

>
> R. David Murray added the comment:
>
> If you look at the algorithm it is doing some fancy things with metrics,
> but does have a 'preferred delimiters' list that it checks.  It is possible
> things could be improved either by tweaking the threshold or by somehow
> giving added weight to the metrics when the candidate character is in the
> preferred delimiter list.
>
> We might have to do this with a feature flag to turn it on, though, since
> it could change the results for programs that happen to work with the
> current algorithm.
>
> ----------
> nosy: +r.david.murray
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue24787>
> _______________________________________
>

History
Date	User	Action	Args
2015-08-06 01:40:48	Tiago Wright	set	recipients: + Tiago Wright, skip.montanaro, peter.otten, r.david.murray
2015-08-06 01:40:48	Tiago Wright	link	issue24787 messages
2015-08-06 01:40:47	Tiago Wright	create