classification
Title: sqlite3 bug handling column names that contain square braces
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: berker.peksag, elrull, ghaering, miss-islington, serhiy.storchaka, simonw
Priority: normal Keywords: patch

Created on 2020-02-16 17:10 by simonw, last changed 2020-03-21 14:35 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18533 open python-dev, 2020-02-17 04:21
PR 18942 merged serhiy.storchaka, 2020-03-11 21:11
PR 19103 merged miss-islington, 2020-03-21 14:07
PR 19104 merged serhiy.storchaka, 2020-03-21 14:10
Messages (10)
msg362081 - (view) Author: Simon Willison (simonw) * Date: 2020-02-16 17:10
Bit of an obscure bug this one. SQLite allows column names to contain [ and ] characters, even though those are often used as delimiters in SQLite. Here's how to create such a database with bash:

```
sqlite3 /tmp/demo.db <<EOF
    BEGIN TRANSACTION;
    CREATE TABLE "data" (
        "MTU (CET)" TEXT,
        "Day-ahead Price [EUR/MWh]" TEXT
    );
    INSERT INTO "data" VALUES('01.01.2016 00:00 - 01.01.2016 01:00','23.86');
    COMMIT;
EOF
```

If you query that database from sqlite3 in Python, the `[Eur/MWh]` piece is removed from that column name:
```
In [1]: import sqlite3                                                                                                         
In [2]: conn = sqlite3.connect("/tmp/demo.db")                                                                                 
In [3]: cursor = conn.cursor()                                                                                                 
In [4]: cursor.execute("select * from data")                                                                                   
Out[4]: <sqlite3.Cursor at 0x10c70a0a0>
In [5]: cursor.fetchall()                                                                                                      
Out[5]: [('01.01.2016 00:00 - 01.01.2016 01:00', '23.86')]
In [6]: cursor.description                                                                                                     
Out[6]: 
(('MTU (CET)', None, None, None, None, None, None),
 ('Day-ahead Price', None, None, None, None, None, None))
In [7]: conn.row_factory = sqlite3.Row                                                                                         
In [8]: cursor = conn.cursor()                                                                                                 
In [9]: cursor.execute("select * from data")                                                                                   
Out[9]: <sqlite3.Cursor at 0x10c7a8490>
In [10]: row = cursor.fetchall()                                                                                                     
In [12]: row                                                                                                                   
Out[12]: <sqlite3.Row at 0x10c3fe670>
In [15]: row.keys()                                                                                                            
Out[15]: ['MTU (CET)', 'Day-ahead Price']
```
As you can see, it is missing from both `cursor.description` and from `row.keys()` here.

But... if you query that database using SQLite directly (with `.headers on` so you can see the name of the columns) it works as expected:
```
$ sqlite3 /tmp/demo.db 
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE IF NOT EXISTS "data" (
        "MTU (CET)" TEXT,
        "Day-ahead Price [EUR/MWh]" TEXT
    );
sqlite> .headers on
sqlite> select * from data;
MTU (CET)|Day-ahead Price [EUR/MWh]
01.01.2016 00:00 - 01.01.2016 01:00|23.86
sqlite> 
```
It looks to me like this is a bug in Python's SQLite3 module.

This was first reported here: https://github.com/simonw/sqlite-utils/issues/86
msg362117 - (view) Author: Raul Gallegos (elrull) * Date: 2020-02-17 04:28
Thanks for reporting this issue, I have a PR here https://github.com/python/cpython/pull/18533 but that still needs some clarification, because there were some tests that were actually testing that we were stripping everything after the square brackets.

P.D. this is my first contribution :)
msg362119 - (view) Author: Simon Willison (simonw) * Date: 2020-02-17 06:40
Oh how interesting - yes it looks like this is deliberate behavior introduced in this commit: https://github.com/python/cpython/commit/0e3f591aeeef9ed715f8770320f4c4c7332a8794
msg363959 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2020-03-11 19:16
https://github.com/ghaering/pysqlite/commit/f3d452f2daeb432b8ad89fa4f087164bfd6ddc12 should probably give more context than that huge svnmerge commit :)

(Removed older Python versions since even if decide to change the behavior from 2006, it may break third-party programs that are relying on it.)
msg363964 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-11 20:24
It was originally introduced in https://github.com/ghaering/pysqlite/commit/780a76dcd8f4142b6ecbd423f52e0ccf067fc277. I think that original column names should be returned when PARSE_COLNAMES is not set.

Working on a PR.
msg363968 - (view) Author: Raul Gallegos (elrull) * Date: 2020-03-11 20:43
hi @serhiy.storchaka is this something that you think could be done by a new contributor? I'd really love to take care of this, I can improve on the PR I was preparing https://github.com/python/cpython/pull/18533
msg363976 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-11 22:02
Sorry Raul, but I already have a PR written. It includes minor refactoring,  better error handling in the nearby code, improved tests, fixed documentation and comments which incorrectly described the current behavior. It took time to determine what parts should be fixed, and unless you have found all this in your PR it would take much more time to explain to you what details should be checked than check and fix them myself.
msg364748 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-21 13:53
New changeset b146568dfcbcd7409c724f8917e4f77433dd56e4 by Serhiy Storchaka in branch 'master':
bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942)
https://github.com/python/cpython/commit/b146568dfcbcd7409c724f8917e4f77433dd56e4
msg364752 - (view) Author: miss-islington (miss-islington) Date: 2020-03-21 14:32
New changeset 687f5921a46cf95c2a648d8031f9e99cdcc3e6b7 by Miss Islington (bot) in branch '3.8':
bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942)
https://github.com/python/cpython/commit/687f5921a46cf95c2a648d8031f9e99cdcc3e6b7
msg364753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-21 14:33
New changeset 39680fb7043e555469e08d3c4f49073acca77b20 by Serhiy Storchaka in branch '3.7':
[3.7] bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942). (GH-19104)
https://github.com/python/cpython/commit/39680fb7043e555469e08d3c4f49073acca77b20
History
Date User Action Args
2020-03-21 14:35:30serhiy.storchakasetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.7, Python 3.8
2020-03-21 14:33:47serhiy.storchakasetmessages: + msg364753
2020-03-21 14:32:29miss-islingtonsetmessages: + msg364752
2020-03-21 14:10:34serhiy.storchakasetpull_requests: + pull_request18464
2020-03-21 14:07:01miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request18463
2020-03-21 13:53:49serhiy.storchakasetmessages: + msg364748
2020-03-11 22:02:11serhiy.storchakasetmessages: + msg363976
2020-03-11 21:11:47serhiy.storchakasetpull_requests: + pull_request18295
2020-03-11 20:43:35elrullsetmessages: + msg363968
2020-03-11 20:24:42serhiy.storchakasetassignee: serhiy.storchaka

messages: + msg363964
nosy: + serhiy.storchaka
2020-03-11 19:16:23berker.peksagsetmessages: + msg363959
versions: - Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8
2020-03-11 16:32:19xtreaksetnosy: + ghaering, berker.peksag
2020-02-17 06:40:35simonwsetmessages: + msg362119
2020-02-17 04:28:58elrullsetnosy: + elrull
messages: + msg362117
2020-02-17 04:21:55python-devsetkeywords: + patch
stage: patch review
pull_requests: + pull_request17908
2020-02-16 17:10:39simonwsettype: behavior
2020-02-16 17:10:07simonwcreate