Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite3 400x-600x slower depending on formatting of an UPDATE statement in a string #76396

Closed
bforst mannequin opened this issue Dec 5, 2017 · 11 comments
Closed

sqlite3 400x-600x slower depending on formatting of an UPDATE statement in a string #76396

bforst mannequin opened this issue Dec 5, 2017 · 11 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@bforst
Copy link
Mannequin

bforst mannequin commented Dec 5, 2017

BPO 32215
Nosy @pitrou, @bitdancer, @berkerpeksag, @bforst, @iritkatriel
PRs
  • bpo-32215: Fix performance regression in sqlite3 #8511
  • [3.7] bpo-32215: Fix performance regression in sqlite3 (GH-8511) #9441
  • [3.6] bpo-32215: Fix performance regression in sqlite3 (GH-8511) #9442
  • [3.7] bpo-32215: Fix performance regression in sqlite3 (GH-8511) #9449
  • [3.6] bpo-32215: Fix performance regression in sqlite3 (GH-8511) #9452
  • Files
  • sqlite3_27_36_performance_bug.py: Demo of the bug
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-06-25.22:41:09.025>
    created_at = <Date 2017-12-05.00:40:52.911>
    labels = ['3.8', '3.7', 'library', 'performance']
    title = 'sqlite3 400x-600x slower depending on formatting of an UPDATE statement in a string'
    updated_at = <Date 2021-06-25.22:41:09.025>
    user = 'https://github.com/bforst'

    bugs.python.org fields:

    activity = <Date 2021-06-25.22:41:09.025>
    actor = 'iritkatriel'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-06-25.22:41:09.025>
    closer = 'iritkatriel'
    components = ['Library (Lib)']
    creation = <Date 2017-12-05.00:40:52.911>
    creator = 'bforst'
    dependencies = []
    files = ['47315']
    hgrepos = []
    issue_num = 32215
    keywords = ['patch']
    message_count = 11.0
    messages = ['307609', '307611', '307612', '307683', '307689', '307696', '322530', '325854', '325892', '325912', '378756']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'r.david.murray', 'berker.peksag', 'bforst', 'iritkatriel']
    pr_nums = ['8511', '9441', '9442', '9449', '9452']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue32215'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @bforst
    Copy link
    Mannequin Author

    bforst mannequin commented Dec 5, 2017

    We're moving some code from Python 2.7 to 3.6 and found a weird performance issue using SQLite in-memory and on-disk DBs with the built-in sqlite3 library. In Python 2.7, the two update statements below (excerpted from the attached file) run in the same amount of time. In Python 3.6 the update statement with the table name on a separate line runs 400x-600x slower with the example data provided in the file.

    """
    UPDATE tbl
    SET col2 = NULL
    WHERE col1 = ?
    """

    """
    UPDATE
    tbl
    SET col2 = NULL
    WHERE col1 = ?
    """

    We have verified this using Python installs from python.org on macOS Sierra and Windows 7 for Python 2.7 and 3.6.

    We have tried formatting the SQL strings in different ways and it appears that the speed change only occurs when the table name is on a different line than the "UPDATE".

    This also appears to be hitting some type of quadratic behaviour as with 10x less records, it only takes 10-15x as long. With the demo in the file we are seeing it take 1.6s on the fast string and ~1000s on the slow string.

    @bforst bforst mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir performance Performance or resource usage labels Dec 5, 2017
    @bitdancer
    Copy link
    Member

    I can confirm that there is a difference on linux as well, using the sqlite version for both 2.7 and 3.7:

    rdmurray@pydev:~/python/p27[2.7]>./python sqlite3_27_36_performance_bug.py
    First step: 3.22849011421
    Second step: 3.2167429924

    rdmurray@pydev:~/python/p37[master]>./python ../p27/sqlite3_27_36_performance_bug.py
    First step: 3.2722721099853516
    Second step: 4.094221353530884

    (I changed time.clock() to time.time()).

    @bitdancer
    Copy link
    Member

    ...using the *same* sqlite version...

    @bitdancer bitdancer added the 3.7 (EOL) end of life label Dec 5, 2017
    @pitrou
    Copy link
    Member

    pitrou commented Dec 5, 2017

    Brian, does the speed difference disappear when you add a space character just after "UPDATE"?
    We may be hitting this path: https://github.com/python/cpython/blob/master/Modules/_sqlite/statement.c#L76-L93

    @bitdancer
    Copy link
    Member

    It disappears for me running it on linux with the blank added.

    @bforst
    Copy link
    Mannequin Author

    bforst mannequin commented Dec 5, 2017

    Hi Antoine, yup, adding a space after the UPDATE makes the speed difference disappear on macOS Sierra and Windows 7.

    @berkerpeksag
    Copy link
    Member

    ab994ed wasn't merged in the 2.7 branch, so this should only be reproduced in Python 3.6+.

    @berkerpeksag berkerpeksag added 3.8 only security fixes and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jul 28, 2018
    @berkerpeksag
    Copy link
    Member

    New changeset 8d1e190 by Berker Peksag in branch 'master':
    bpo-32215: Fix performance regression in sqlite3 (GH-8511)
    8d1e190

    @berkerpeksag
    Copy link
    Member

    New changeset 015cd0f by Berker Peksag (Miss Islington (bot)) in branch '3.7':
    bpo-32215: Fix performance regression in sqlite3 (GH-8511)
    015cd0f

    @berkerpeksag
    Copy link
    Member

    New changeset 4fb672f by Berker Peksag (Miss Islington (bot)) in branch '3.6':
    bpo-32215: Fix performance regression in sqlite3 (GH-8511)
    4fb672f

    @iritkatriel
    Copy link
    Member

    This seems complete, can it be closed?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants