Message 291458 - Python tracker

Author	keef604
Recipients	Mariatta, keef604, r.david.murray
Date	2017-04-11.01:11:58
Message-id	<1491873119.94.0.245999605587.issue30034@psf.upfronthosting.co.za>
In-reply-to

Content
As you say, David, however much we would like the world to stick to a given CSV standard, the reality is that people don't, which is all the more reason for making the csv reader flexible and forgiving.

The csv module can and should be used for more than just "comma-separated-values" files.  I use it for all sorts of different delimited files, and it works very well.  Pandas uses it, as I'm sure do many other packages.  It's such a good module, it would be a pity to restrict its scope to just Excel-related scenarios.  Parsing delimited files is undoubtedly complex, and painfully slow if done with pure Python, so the more that can be done in C the better.
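To illustrate the point about non-comma files, here is a minimal sketch of reading a pipe-delimited file with the stock csv module. The sample data and the variable names are made up for the example; only csv.reader and its delimiter parameter come from the standard library.

```python
import csv
import io

# Made-up pipe-delimited data; commas inside a field need no quoting here
# because the delimiter is "|", not ",".
data = "id|name|note\n1|Ada|likes, commas\n2|Linus|plain\n"

# csv.reader accepts any single-character delimiter via the "delimiter" parameter.
rows = list(csv.reader(io.StringIO(data), delimiter="|"))

for row in rows:
    print(row)
```

The same call works for tab-separated or semicolon-separated input by changing the delimiter argument.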

I'm no C programmer, but my guesstimate is that the coding changes I'm proposing are relatively modest.  In the IN_QUOTED_FIELD section (https://github.com/python/cpython/blob/master/Modules/_csv.c#L690), it would mean checking for newline characters if the new "multiline" attribute is False (and probably "strict" is False too).  Of course there is more to this change than just that, but I'm guessing not that much more.
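As a rough illustration of the behaviour under discussion, the sketch below first shows how the current reader treats a stray opening quote (in non-strict mode the quoted field runs on across line boundaries), and then approximates in pure Python what a "multiline" set to False might do, on the assumption that the proposal means ending the record at the newline. The "multiline" attribute does not exist in the csv module; read_single_line_records is a hypothetical helper written for this example, not the proposed C change.

```python
import csv

def read_single_line_records(lines, **fmtparams):
    """Hypothetical helper: parse every input line as its own record.

    Each line gets a fresh csv.reader, so an unterminated quoted field
    cannot run on into the following lines. This only approximates the
    proposed multiline=False behaviour and is much slower than the C parser.
    """
    for line in lines:
        # Strip the line ending so it does not end up inside an open quoted field.
        yield next(csv.reader([line.rstrip("\r\n")], **fmtparams))

# Made-up sample: the first line has an opening quote that is never closed.
sample = ['a|"b|c\n', 'd|e|f\n']

# Current behaviour (strict=False): the quoted field swallows the second line,
# so both input lines come back as a single row.
swallowed = list(csv.reader(sample, delimiter="|"))

# Per-line parsing: the bad quote only affects the line it appears on.
per_line = list(read_single_line_records(sample, delimiter="|"))

print(swallowed)
print(per_line)
```

In a large file the difference matters more: with the current reader a single stray quote can absorb every remaining line up to the next quote character, whereas per-line parsing limits the damage to one record.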

History
Date	User	Action	Args
2017-04-11 01:12:00	keef604	set	recipients: + keef604, r.david.murray, Mariatta
2017-04-11 01:11:59	keef604	set	messageid: <1491873119.94.0.245999605587.issue30034@psf.upfronthosting.co.za>
2017-04-11 01:11:59	keef604	link	issue30034 messages
2017-04-11 01:11:58	keef604	create