Message 279058 - Python tracker

Author	hughdbrown
Recipients	hughdbrown, mtraskin, peter.otten, serhiy.storchaka, terry.reedy
Date	2016-10-20.17:25:36
Message-id	<1476984336.58.0.0253381000046.issue18219@psf.upfronthosting.co.za>
In-reply-to

Content
I came across this problem today when I was using a 1000+ column CSV from a client. It was taking about 15 minutes to process each file. I found the problem and made this change:

            # wrong_fields = [k for k in rowdict if k not in self.fieldnames]
            wrong_fields = set(rowdict.keys()) - set(self.fieldnames)

And my processing time went down to 12 seconds per file -- a 75x speedup.
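For anyone who wants to reproduce the effect outside the csv module, here is a minimal sketch of the two approaches side by side. The function names and the `compare` helper are made up for illustration and are not part of `csv.DictWriter`; the timings will vary by machine, so none are quoted here. Note that the set version returns a set rather than a list, so the order of the offending keys is not preserved.

```python
import timeit


def wrong_fields_list(rowdict, fieldnames):
    """Original approach: every `in` test scans the fieldnames list,
    so the cost per row grows with len(rowdict) * len(fieldnames)."""
    return [k for k in rowdict if k not in fieldnames]


def wrong_fields_set(rowdict, fieldnames):
    """Proposed approach: a set difference, roughly linear in the
    number of columns. Returns a set, so key order is lost."""
    return set(rowdict.keys()) - set(fieldnames)


def compare(n_columns=1000, repeats=200):
    """Hypothetical helper: time both approaches on one wide row."""
    fieldnames = [f"col{i}" for i in range(n_columns)]
    rowdict = dict.fromkeys(fieldnames, "")
    t_list = timeit.timeit(
        lambda: wrong_fields_list(rowdict, fieldnames), number=repeats
    )
    t_set = timeit.timeit(
        lambda: wrong_fields_set(rowdict, fieldnames), number=repeats
    )
    return t_list, t_set


if __name__ == "__main__":
    t_list, t_set = compare()
    print(f"list membership: {t_list:.4f}s  set difference: {t_set:.4f}s")
```

Both functions flag the same keys; only the container type and the cost differ. Building `set(fieldnames)` once per writer instead of once per row would presumably save a bit more, but that is an assumption and not something measured here.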

It's kind of sad that this change has been waiting for over three years when it is so simple. Any chance we could make one of the acceptable code changes and release it?

History
Date	User	Action	Args
2016-10-20 17:25:36	hughdbrown	set	recipients: + hughdbrown, terry.reedy, peter.otten, serhiy.storchaka, mtraskin
2016-10-20 17:25:36	hughdbrown	set	messageid: <1476984336.58.0.0253381000046.issue18219@psf.upfronthosting.co.za>
2016-10-20 17:25:36	hughdbrown	link	issue18219 messages
2016-10-20 17:25:36	hughdbrown	create