classification
Title: sqlite3 400x-600x slower depending on formatting of an UPDATE statement in a string
Type: performance Stage:
Components: Interpreter Core, Library (Lib) Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, bforst, pitrou, r.david.murray
Priority: normal Keywords:

Created on 2017-12-05 00:40 by bforst, last changed 2017-12-06 05:00 by berker.peksag.

Files
File name Uploaded Description Edit
sqlite3_27_36_performance_bug.py bforst, 2017-12-05 00:40 Demo of the bug
Messages (6)
msg307609 - (view) Author: Brian Forst (bforst) * Date: 2017-12-05 00:40
We're moving some code from Python 2.7 to 3.6 and found a weird performance issue using SQLite in-memory and on-disk DBs with the built-in sqlite3 library. In Python 2.7, the two update statements below (excerpted from the attached file) run in the same amount of time. In Python 3.6 the update statement with the table name on a separate line runs 400x-600x slower with the example data provided in the file.

"""
UPDATE tbl
SET col2 = NULL
WHERE col1 = ?
"""

"""
UPDATE
  tbl
SET col2 = NULL
WHERE col1 = ?
"""

We have verified this using Python installs from python.org on macOS Sierra and Windows 7 for Python 2.7 and 3.6.

We have tried formatting the SQL strings in different ways and it appears that the speed change only occurs when the table name is on a different line than the "UPDATE".

This also appears to be hitting some type of quadratic behaviour as with 10x less records, it only takes 10-15x as long. With the demo in the file we are seeing it take 1.6s on the fast string and ~1000s on the slow string.
msg307611 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-05 01:11
I can confirm that there is a difference on linux as well, using the sqlite version for both 2.7 and 3.7:

rdmurray@pydev:~/python/p27[2.7]>./python sqlite3_27_36_performance_bug.py
First step: 3.22849011421
Second step: 3.2167429924

rdmurray@pydev:~/python/p37[master]>./python ../p27/sqlite3_27_36_performance_bug.py
First step: 3.2722721099853516
Second step: 4.094221353530884

(I changed time.clock() to time.time()).
msg307612 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-05 01:12
...using the *same* sqlite version...
msg307683 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-05 20:57
Brian, does the speed difference disappear when you add a space character just after "UPDATE"?
We may be hitting this path: https://github.com/python/cpython/blob/master/Modules/_sqlite/statement.c#L76-L93
msg307689 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-05 21:29
It disappears for me running it on linux with the blank added.
msg307696 - (view) Author: Brian Forst (bforst) * Date: 2017-12-05 22:35
Hi Antoine, yup, adding a space after the UPDATE makes the speed difference disappear on macOS Sierra and Windows 7.
History
Date User Action Args
2017-12-06 05:00:46berker.peksagsetnosy: + berker.peksag
2017-12-05 22:35:54bforstsetmessages: + msg307696
2017-12-05 21:29:13r.david.murraysetmessages: + msg307689
2017-12-05 20:57:02pitrousetnosy: + pitrou
messages: + msg307683
2017-12-05 01:12:58r.david.murraysetversions: + Python 3.7
2017-12-05 01:12:42r.david.murraysetmessages: + msg307612
2017-12-05 01:11:56r.david.murraysetnosy: + r.david.murray
messages: + msg307611
2017-12-05 00:40:52bforstcreate