Message 258308
Thank you for your feedback, Victor and Steven.
I just copied my scripts and 360MB of CSV files over to Linux.
The entire process finished in exactly 4 minutes, using the original Python scripts.
So there is something different between my environments.
If it were a fragmentation issue, I would expect performance on the Windows system to be consistently slow. But I can change the performance just by switching between these two original statements:
s = "{0},".format(columnvalue) # fast
s = "'{0}',".format(columnvalue) # ~30x slower
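To check whether the string formatting itself is slow, independent of escaping and file I/O, the two statements can be timed in isolation. A minimal sketch using the standard-library `timeit` module; the sample value and the call count are arbitrary assumptions, not taken from my scripts:

```python
import timeit

# Hypothetical sample value standing in for one escaped column value.
columnvalue = "some column value"

def unquoted():
    return "{0},".format(columnvalue)    # the fast variant

def quoted():
    return "'{0}',".format(columnvalue)  # the variant that is ~30x slower in my script

def time_variant(func, number=100_000):
    """Return the total seconds taken to call func `number` times."""
    return timeit.timeit(func, number=number)

if __name__ == "__main__":
    for name, func in (("unquoted", unquoted), ("quoted", quoted)):
        print(f"{name}: {time_variant(func):.4f} s")
```

If both variants take comparable time in isolation, the slowdown probably lies elsewhere in the pipeline rather than in `str.format` itself.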
I apologize for not being able to provide the entire code; there is too much to post at this time.
I am opening a file like this:
# open(file, mode, buffering, encoding); buffering=1 selects line buffering
logger = open('output.sql', 'a', 1, 'iso-8859-1')
I write to the file like this:
logger.write(text+'\n')
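With `buffering=1`, a text-mode file is line-buffered, so every `write()` containing a newline triggers a flush. A rough sketch for measuring what that costs on a given machine, writing the same lines with line buffering and with the default buffering; the line count, the generated values and the temporary file are assumptions, and this is not a claim about the cause of the slowdown:

```python
import os
import tempfile
import time

def write_lines(path, buffering, n=50_000):
    """Append n short lines to path and return the elapsed seconds."""
    start = time.perf_counter()
    # Same positional arguments as my script: file, mode, buffering, encoding.
    with open(path, 'a', buffering, 'iso-8859-1') as logger:
        for i in range(n):
            logger.write("'{0}',".format(i) + '\n')
    return time.perf_counter() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for label, buffering in (("line-buffered (1)", 1), ("default (-1)", -1)):
            path = os.path.join(tmp, "output.sql")
            print(f"{label}: {write_lines(path, buffering):.3f} s")
            os.remove(path)
```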
I'm using a library (pymysql) to escape each value before writing it to the file.
import pymysql.converters as conv
<...>
for key in listkeys:
    keyvalue = self.recordstats[key]
    fieldtype = keyvalue[0]
    columnvalue = record[key]
    columnvalue = conv.escape_string(columnvalue)
    if (count > 1):
        s = "{0},".format(columnvalue) # No single quotes
    else:
        s = "{0},".format(columnvalue) # No single quotes
    count -= 1
    logger.write(s+'\n')
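If the `count` check is meant to put a comma after every value except the last (an assumption; both branches above are currently identical), `str.join` does that without a counter. A sketch under that assumption; `escape` is a hypothetical stand-in for `conv.escape_string` so that pymysql is not required, and `record`/`listkeys` hold made-up sample data:

```python
def escape(value):
    """Hypothetical stand-in for pymysql's escape_string: it only
    backslash-escapes backslashes and single quotes (the real one does more)."""
    return value.replace("\\", "\\\\").replace("'", "\\'")

def format_row(record, listkeys, quote=True):
    """Escape each column value, optionally wrap it in single quotes,
    and join the values with commas (no trailing comma)."""
    values = []
    for key in listkeys:
        columnvalue = escape(record[key])
        if quote:
            values.append("'{0}'".format(columnvalue))
        else:
            values.append("{0}".format(columnvalue))
    return ",".join(values)

if __name__ == "__main__":
    record = {"name": "O'Brien", "city": "Dublin"}   # sample data
    listkeys = ["name", "city"]
    print(format_row(record, listkeys))
    print(format_row(record, listkeys, quote=False))
```

Building the whole row first also means one `write()` call per row instead of one per column.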
I appreciate the feedback and ideas so far.
Trying the profiler is on my list, to see whether it provides more insight.
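For the profiler, the standard-library `cProfile` and `pstats` modules can wrap a single function call and report where the time goes. A minimal sketch; `process_file` is a hypothetical placeholder for the real CSV-to-SQL conversion function:

```python
import cProfile
import io
import pstats

def process_file(lines):
    """Hypothetical placeholder for the real CSV-to-SQL conversion."""
    out = []
    for line in lines:
        out.append("'{0}',".format(line))
    return out

def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Show the `top` entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()

if __name__ == "__main__":
    result, report = profile_call(process_file, [str(i) for i in range(100_000)])
    print(report)
```

Running the same script on both machines and comparing the reports would show whether the extra time is spent in formatting, escaping or writing.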
I am not using Anaconda3 on Linux. Perhaps that has an impact somehow?
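To compare the two environments, it may help to record the interpreter details on each machine. A small sketch using only the standard library; which of these values actually matters here is an open question:

```python
import locale
import platform
import sys

def environment_report():
    """Collect interpreter and platform details that often differ
    between installations (e.g. a distribution build vs a system Python)."""
    return {
        "python_version": sys.version,
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }

if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```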
I never suspected that inserting the two single quotes could cause such a performance problem. I noticed it when I parsed ~40GB of data and it took almost a week to complete instead of the 6-7 hours I expected.
Just the other day I decided to remove the single quotes, because that was the only change left that I hadn't ruled out. For the past two weeks I had dismissed it, assuming it couldn't be causing the performance problem.
Today, I did not expect such a big difference between running my script on Linux and on Windows.
If I discover anything else, I will post an update.
When I get the chance, I can remove the redundant code and post the source.

Date | User | Action | Args
2016-01-15 16:56:02 | poostenr | set | recipients: + poostenr, vstinner, eric.smith, steven.daprano, ubehera
2016-01-15 16:56:02 | poostenr | set | messageid: <1452876962.1.0.761703082009.issue26118@psf.upfronthosting.co.za>
2016-01-15 16:56:02 | poostenr | link | issue26118 messages
2016-01-15 16:56:01 | poostenr | create |