classification
Title: sqlite3 400x-600x slower depending on formatting of an UPDATE statement in a string
Type: performance
Stage: patch review
Components: Library (Lib)
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: berker.peksag, bforst, pitrou, r.david.murray
Priority: normal
Keywords: patch

Created on 2017-12-05 00:40 by bforst, last changed 2018-09-20 17:19 by berker.peksag.

Files
File name Uploaded Description
sqlite3_27_36_performance_bug.py bforst, 2017-12-05 00:40 Demo of the bug
Pull Requests
URL Status Linked
PR 8511 merged berker.peksag, 2018-07-28 09:10
PR 9441 closed miss-islington, 2018-09-20 11:11
PR 9442 closed miss-islington, 2018-09-20 11:11
PR 9449 merged miss-islington, 2018-09-20 15:25
PR 9452 merged miss-islington, 2018-09-20 15:57
Messages (10)
msg307609 - Author: Brian Forst (bforst) Date: 2017-12-05 00:40
We're moving some code from Python 2.7 to 3.6 and found a weird performance issue using SQLite in-memory and on-disk DBs with the built-in sqlite3 library. In Python 2.7, the two update statements below (excerpted from the attached file) run in the same amount of time. In Python 3.6 the update statement with the table name on a separate line runs 400x-600x slower with the example data provided in the file.

"""
UPDATE tbl
SET col2 = NULL
WHERE col1 = ?
"""

"""
UPDATE
  tbl
SET col2 = NULL
WHERE col1 = ?
"""

We have verified this using Python installs from python.org on macOS Sierra and Windows 7 for Python 2.7 and 3.6.

We have tried formatting the SQL strings in different ways and it appears that the speed change only occurs when the table name is on a different line than the "UPDATE".

This also appears to be hitting some type of quadratic behaviour: with 10x fewer records, it takes only 10-15x as long. With the demo in the file, we are seeing it take 1.6s on the fast string and ~1000s on the slow string.
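
A minimal sketch of the comparison described above, for readers without the attached file. The table schema, row count, and the helper name `time_updates` are assumptions made for illustration and are not taken from sqlite3_27_36_performance_bug.py; only the two SQL strings come from this report.

```python
import sqlite3
import time

# The two statements from the report: identical except for the line
# break between UPDATE and the table name.
FAST_SQL = """
UPDATE tbl
SET col2 = NULL
WHERE col1 = ?
"""

SLOW_SQL = """
UPDATE
  tbl
SET col2 = NULL
WHERE col1 = ?
"""


def time_updates(sql, n_rows=2000):
    """Run one UPDATE per row; return (elapsed seconds, rows still non-NULL).

    Hypothetical helper: the schema and row count are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (col1 INTEGER, col2 TEXT)")
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?)", ((i, "x") for i in range(n_rows))
    )
    conn.commit()

    start = time.perf_counter()
    for i in range(n_rows):
        conn.execute(sql, (i,))
    conn.commit()
    elapsed = time.perf_counter() - start

    remaining = conn.execute(
        "SELECT COUNT(*) FROM tbl WHERE col2 IS NOT NULL"
    ).fetchone()[0]
    conn.close()
    return elapsed, remaining


if __name__ == "__main__":
    for label, sql in (("First step", FAST_SQL), ("Second step", SLOW_SQL)):
        elapsed, _ = time_updates(sql)
        print(f"{label}: {elapsed:.3f}s")
```

Both statements should leave the table in the same state; only the elapsed time is expected to differ on an affected interpreter. The size of the gap will depend on the data volume and on whether the database is in memory or on disk.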
msg307611 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-12-05 01:11
I can confirm that there is a difference on linux as well, using the sqlite version for both 2.7 and 3.7:

rdmurray@pydev:~/python/p27[2.7]>./python sqlite3_27_36_performance_bug.py
First step: 3.22849011421
Second step: 3.2167429924

rdmurray@pydev:~/python/p37[master]>./python ../p27/sqlite3_27_36_performance_bug.py
First step: 3.2722721099853516
Second step: 4.094221353530884

(I changed time.clock() to time.time()).
msg307612 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-12-05 01:12
...using the *same* sqlite version...
msg307683 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2017-12-05 20:57
Brian, does the speed difference disappear when you add a space character just after "UPDATE"?
We may be hitting this path: https://github.com/python/cpython/blob/master/Modules/_sqlite/statement.c#L76-L93
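
To illustrate why a space after "UPDATE" could matter, here is a Python sketch of the kind of statement-type check that would behave this way. It assumes the detection compares the start of the statement against a keyword followed by a literal space. The function names are hypothetical, and this is not a transcription of the C code in statement.c.

```python
# Characters skipped before the first keyword (an assumption for this sketch).
_LEADING_WHITESPACE = " \r\n\t"


def is_dml_strict(sql):
    """Hypothetical check: the keyword must be followed by a literal space."""
    head = sql.lstrip(_LEADING_WHITESPACE).lower()
    return head.startswith(("insert ", "update ", "delete ", "replace "))


def is_dml_relaxed(sql):
    """Hypothetical alternative: match on the keyword alone."""
    head = sql.lstrip(_LEADING_WHITESPACE).lower()
    return head.startswith(("insert", "update", "delete", "replace"))


one_line = "\nUPDATE tbl\nSET col2 = NULL\nWHERE col1 = ?\n"
split_line = "\nUPDATE\n  tbl\nSET col2 = NULL\nWHERE col1 = ?\n"

print(is_dml_strict(one_line), is_dml_strict(split_line))
print(is_dml_relaxed(one_line), is_dml_relaxed(split_line))
```

Under this assumption, the strict check recognises the first statement but not the second, because "UPDATE" is followed by a newline rather than a space. Adding a trailing space after "UPDATE" makes both forms match.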
msg307689 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-12-05 21:29
It disappears for me on Linux with the space added.
msg307696 - Author: Brian Forst (bforst) Date: 2017-12-05 22:35
Hi Antoine, yup, adding a space after the UPDATE makes the speed difference disappear on macOS Sierra and Windows 7.
msg322530 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2018-07-28 09:14
https://github.com/python/cpython/commit/ab994ed8b97e1b0dac151ec827c857f5e7277565 wasn't merged in the 2.7 branch, so this should only be reproduced in Python 3.6+.
msg325854 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2018-09-20 11:10
New changeset 8d1e190fc507a9e304f6817e761e9f628a23cbd8 by Berker Peksag in branch 'master':
bpo-32215: Fix performance regression in sqlite3 (GH-8511)
https://github.com/python/cpython/commit/8d1e190fc507a9e304f6817e761e9f628a23cbd8
msg325892 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2018-09-20 15:57
New changeset 015cd0f5cb17b1b208a92e549cd665dc38f2f699 by Berker Peksag (Miss Islington (bot)) in branch '3.7':
bpo-32215: Fix performance regression in sqlite3 (GH-8511)
https://github.com/python/cpython/commit/015cd0f5cb17b1b208a92e549cd665dc38f2f699
msg325912 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2018-09-20 17:19
New changeset 4fb672ff96ecbb87aaf2ecc4f04aed76aafe63b1 by Berker Peksag (Miss Islington (bot)) in branch '3.6':
bpo-32215: Fix performance regression in sqlite3 (GH-8511)
https://github.com/python/cpython/commit/4fb672ff96ecbb87aaf2ecc4f04aed76aafe63b1
History
Date User Action Args
2018-09-20 17:19:53 berker.peksag set messages: + msg325912
2018-09-20 15:57:46 miss-islington set pull_requests: + pull_request8867
2018-09-20 15:57:01 berker.peksag set messages: + msg325892
2018-09-20 15:25:53 miss-islington set pull_requests: + pull_request8864
2018-09-20 11:11:17 miss-islington set pull_requests: + pull_request8857
2018-09-20 11:11:10 miss-islington set pull_requests: + pull_request8856
2018-09-20 11:10:54 berker.peksag set messages: + msg325854
2018-07-28 09:14:36 berker.peksag set messages: + msg322530; components: - Interpreter Core; versions: + Python 3.8, - Python 2.7
2018-07-28 09:10:28 berker.peksag set keywords: + patch; stage: patch review; pull_requests: + pull_request8028
2017-12-06 05:00:46 berker.peksag set nosy: + berker.peksag
2017-12-05 22:35:54 bforst set messages: + msg307696
2017-12-05 21:29:13 r.david.murray set messages: + msg307689
2017-12-05 20:57:02 pitrou set nosy: + pitrou; messages: + msg307683
2017-12-05 01:12:58 r.david.murray set versions: + Python 3.7
2017-12-05 01:12:42 r.david.murray set messages: + msg307612
2017-12-05 01:11:56 r.david.murray set nosy: + r.david.murray; messages: + msg307611
2017-12-05 00:40:52 bforst create