
Author poostenr
Recipients eric.smith, poostenr, steven.daprano, ubehera, vstinner
Date 2016-01-15.16:56:01
Message-id <1452876962.1.0.761703082009.issue26118@psf.upfronthosting.co.za>
Thank you for your feedback, Victor and Steven.

I just copied my scripts and 360MB of CSV files over to Linux.
The entire process finished in exactly 4 minutes, using the original Python scripts.
So something differs between my two environments.
If it were a fragmentation issue, I would expect performance on the Windows system to be consistently slow. Instead, I can change the performance just by switching between the two original statements:
s = "{0},".format(columnvalue)   # fast
s = "'{0}',".format(columnvalue) # ~30x slower
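One way to check whether `str.format` itself accounts for the gap is to time the two statements in isolation with the standard `timeit` module, with no file I/O involved. This is a minimal sketch: `unquoted`, `quoted`, and the sample value are hypothetical stand-ins, not code from the original script. If the two timings come out close, the ~30x slowdown more likely comes from somewhere else in the pipeline than from the formatting call.

```python
import timeit

# Hypothetical sample value standing in for one escaped column value.
columnvalue = "O\\'Brien"

def unquoted(value):
    return "{0},".format(value)    # the fast statement

def quoted(value):
    return "'{0}',".format(value)  # the reportedly ~30x slower statement

# Time each statement alone; no file writes are involved here.
runs = 200_000
t_unquoted = timeit.timeit(lambda: unquoted(columnvalue), number=runs)
t_quoted = timeit.timeit(lambda: quoted(columnvalue), number=runs)
print(f"unquoted: {t_unquoted:.3f}s  quoted: {t_quoted:.3f}s")
```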

I apologize for not being able to provide the entire code.
There is too much code to post at this time.

I am opening a file like this:
# logger = open(filename, mode, buffering, encoding)
# buffering=1 selects line buffering (text mode only)
logger = open('output.sql', 'a', 1, 'iso-8859-1')

I write to the file like this:
logger.write(text+'\n')
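With `buffering=1` in text mode, Python uses line buffering, so every `logger.write()` call whose text contains a newline also flushes the buffer to the operating system. Whether that contributes to the Windows slowdown is unverified; the sketch below only shows how to compare line buffering against the default (`buffering=-1`) on the same output. `write_lines`, the file names, and the line count are hypothetical.

```python
import os
import tempfile
import time

def write_lines(path, buffering, n=20_000):
    """Append n short quoted lines to path and return the elapsed seconds."""
    start = time.perf_counter()
    # Same mode and encoding as the original open() call; only buffering varies.
    with open(path, 'a', buffering, 'iso-8859-1') as logger:
        for i in range(n):
            logger.write("'{0}',".format(i) + '\n')
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as tmp:
    t_line = write_lines(os.path.join(tmp, 'line.sql'), buffering=1)
    t_default = write_lines(os.path.join(tmp, 'default.sql'), buffering=-1)
    print(f"buffering=1: {t_line:.3f}s  buffering=-1: {t_default:.3f}s")
```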

I'm using a library to escape each string before writing it to the file:
import pymysql.converters as conv
<...>
for key in listkeys:
    keyvalue = self.recordstats[key]
    fieldtype   = keyvalue[0]
    columnvalue = record[key]
    columnvalue = conv.escape_string(columnvalue)
    if count > 1:
        s = "{0},".format(columnvalue)  # No single quotes
    else:
        s = "{0},".format(columnvalue)  # No single quotes
    count -= 1
    logger.write(s+'\n')
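As a starting point for a minimal reproducer, the loop can be reduced to a self-contained function that runs both variants on identical data. This is a sketch under stated assumptions: `write_record` is a hypothetical name, `record` is assumed to map each key to a string, and the fallback `escape` is a hypothetical stand-in used only when pymysql is not installed. The `count` branches are dropped because both branches in the excerpt format the value identically.

```python
import io

try:
    from pymysql.converters import escape_string as escape
except ImportError:
    # Hypothetical stand-in so the sketch runs without pymysql; it only
    # escapes backslashes and single quotes, unlike the real escape_string.
    def escape(value):
        return value.replace('\\', '\\\\').replace("'", "\\'")

def write_record(logger, record, listkeys, quote):
    """Write each escaped column on its own line, optionally single-quoted."""
    template = "'{0}'," if quote else "{0},"
    for key in listkeys:
        logger.write(template.format(escape(record[key])) + '\n')

# Usage: run both variants on the same hypothetical record.
record = {'id': '42', 'name': "O'Brien"}
for quote in (False, True):
    out = io.StringIO()
    write_record(out, record, ['id', 'name'], quote)
    print(repr(out.getvalue()))
```

Writing to an `io.StringIO` keeps the formatting cost separate from disk I/O; swapping in a real file object brings the I/O back for comparison.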

I appreciate the feedback and ideas so far.
Trying the profiler is on my list, to see whether it provides more insight.
I am not using Anaconda3 on Linux. Could that make a difference?
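For the profiling step, the standard-library `cProfile` and `pstats` modules can report where the time goes. A minimal sketch; `process()` is a hypothetical placeholder for the script's real entry point. Running `python -m cProfile -s cumulative yourscript.py` (with `yourscript.py` standing in for the actual script name) gives a similar report without code changes.

```python
import cProfile
import io
import pstats

def process():
    # Hypothetical placeholder for the real script's entry point.
    return ["'{0}',".format(i) for i in range(10_000)]

profiler = cProfile.Profile()
profiler.enable()
result = process()
profiler.disable()

# Collect the report in a buffer, sorted by cumulative time, top 10 entries.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats(10)
print(buf.getvalue())
```

Profiling the same input once with the quoted statement and once without, on both Windows and Linux, would show whether the extra time lands in formatting, escaping, or the `write()` calls.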

I never suspected that inserting the two single quotes could cause such a performance problem. I noticed it when I parsed ~40GB of data and it took almost a week to complete instead of the expected 6-7 hours.
Just the other day I decided to remove the single quotes because that was the only remaining change I had made. For the past two weeks I had dismissed that change, assuming it couldn't be causing the performance problem.

Today, I wasn't expecting such a big difference between running my script on Linux and on Windows.

If I discover anything else, I will post an update.
When I get the chance, I can strip out redundant code and post the source.