Author skip.montanaro
Recipients jplaverdure, skip.montanaro, tds333
Date 2008-03-28.12:28:41
Message-id <18412.4165.440393.304270@montanaro-dyndns-org.local>
In-reply-to <1206650343.02.0.115382198287.issue2078@psf.upfronthosting.co.za>
Content
    Jean-Philippe> You're right, it does seem that using f.read(1024) to
    Jean-Philippe> feed the sniffer works OK in my case and allows me to
    Jean-Philippe> instantiate the DictReader correctly...  Why that is I'm
    Jean-Philippe> not sure though...

It works entirely based on character frequencies.  The more characters you
feed it the better it should be at guessing the correct delimiter.  In
particular, it pays attention to the frequency of the possible delimiters
per line and assumes the number of columns is the same for each line.
(Well, there's one place where it does use some knowledge of the structure
of a csv file, so my earlier assertion was incorrect.)  If you only feed it
one line it can't really use that frequency-per-line information.
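
To make the frequency-per-line idea concrete, here is a rough sketch.  It is
not the csv module's actual code: the function names, the candidate list and
the sample data are all made up for illustration, and the real sniffer
considers far more characters than these four.

```python
from collections import Counter

def per_line_counts(sample, candidates=",;\t|"):
    """Count each candidate delimiter on every non-empty line of the sample."""
    lines = [line for line in sample.splitlines() if line]
    return {c: [line.count(c) for line in lines] for c in candidates}

def guess_delimiter(sample, candidates=",;\t|"):
    """Pick the candidate whose per-line count is the same on the most lines.

    Simplified illustration only; ties go to the earlier candidate.
    """
    best, best_hits = None, 0
    for c, counts in per_line_counts(sample, candidates).items():
        mode, hits = Counter(counts).most_common(1)[0]
        if mode > 0 and hits > best_hits:
            best, best_hits = c, hits
    return best

# Hypothetical sample: a header containing a stray comma, then three data rows.
header = "city, state;population;area\n"
rows = "Paris;2100000;105\nOslo;700000;454\nLima;9700000;2672\n"

one_line_guess = guess_delimiter(header)        # only one line to go on
full_guess = guess_delimiter(header + rows)     # per-line counts now matter
```

With only the header, the comma and the semicolon each look equally
"consistent", so the guess is arbitrary.  Once the other rows are included,
only the semicolon shows up the same number of times on every line.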

    Jean-Philippe> I was submitting the first line as I thought it was the
    Jean-Philippe> right sample to provide the sniffer for it to sniff the
    Jean-Philippe> correct dialect regardless of the file format and file
    Jean-Philippe> content.

That's a good guess, but not quite spot on in this case.  In particular, the
character frequencies in the first line tend to differ markedly from those in
the other lines because it is usually a row of column headers, while the
remainder of the file (though not always ;-) is a table of numbers.
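
In practice that means handing the sniffer a chunk that spans several lines
rather than just the header.  A minimal sketch, assuming a seekable text file;
open_dict_reader and the sample data are hypothetical names made up for this
example, while csv.Sniffer().sniff() and csv.DictReader are the real csv
module calls:

```python
import csv
import io

def open_dict_reader(f, sample_size=1024):
    """Sniff the dialect from the first sample_size characters, then rewind."""
    sample = f.read(sample_size)
    f.seek(0)  # rewind so the DictReader sees the header row again
    dialect = csv.Sniffer().sniff(sample)
    return csv.DictReader(f, dialect=dialect), dialect

# Made-up data standing in for a real file: a header row, then numbers.
data = "name;qty;price\n" + "".join(
    f"item{i};{i};{i * 1.5}\n" for i in range(50)
)

reader, dialect = open_dict_reader(io.StringIO(data))
first_row = next(reader)
```

The 1024 figure is just the sample size mentioned above, not a magic number.
Anything large enough to cover a fair number of rows should give the sniffer
the per-line information it needs.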

Skip
History
Date                 User            Action  Args
2008-03-28 12:28:47  skip.montanaro  set     spambayes_score: 0.085326 -> 0.08532603; recipients: + skip.montanaro, tds333, jplaverdure
2008-03-28 12:28:45  skip.montanaro  link    issue2078 messages
2008-03-28 12:28:43  skip.montanaro  create