classification
Title: sqlite3: optionally autoconvert table_info's DATETIME fields
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, ghaering, jondo, matrixise, p-ganssle
Priority: normal Keywords:

Created on 2018-11-02 12:16 by jondo, last changed 2019-07-25 11:30 by jondo.

Files
File name Uploaded Description Edit
SQLiteStudio-3449.sqlite jondo, 2018-11-05 11:49
2017v2-reduced.db jondo, 2018-11-06 09:43
Messages (14)
msg329128 - (view) Author: Robert Pollak (jondo) Date: 2018-11-02 12:16
Currently, fields are converted to datetime as described in https://docs.python.org/3/library/sqlite3.html#sqlite3.PARSE_COLNAMES :

'select x as "x [datetime]" from table'

In my use case I don't know the names and locations of the datetime fields in advance. So I would need to do pandas.read_sql_query('select * from table', con) and get correct datetime columns.
(My current workaround is try calling pandas.to_datetime on each text column afterwards.)

The type info has to be available in the sqlite database, because I see that SQLiteStudio correctly detects the datetime columns.
msg329129 - (view) Author: Robert Pollak (jondo) Date: 2018-11-02 12:25
(In fact, I am currently taking the first non-missing entry of each text column and trying to dateutil.parser.parse it. If that works, I use pandas.to_datetime on the column.)
msg329157 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2018-11-02 20:00
According to the sqlite documentation, there's no fundamental datetime type in sqlite: https://www.sqlite.org/datatype3.html#date_and_time_datatype


    SQLite does not have a storage class set aside for storing dates
    and/or times. Instead, the built-in Date And Time Functions of SQLite
    are capable of storing dates and times as TEXT, REAL, or INTEGER values


If you have an arbitrary database whose schema you don't know, I'm not sure it would be possible to automatically determine that it's a datetime, though it appears that Python already provides this functionality by exposing the converters "date" and "timestamp" ( https://docs.python.org/3/library/sqlite3.html#default-adapters-and-converters )

If you don't know the schema you can't be 100% accurate on which columns are datetime, but apparently datetime types that are text will be of the format "YYYY-MM-DD HH:MM:SS.SSS", which is a variant of iso8601, REAL columns will be Julian day numbers and integers will be epoch time.

If you assume that all your datetime columns will be TEXT and that any TEXT column that happens to be a valid date of is a datetime column, then you can either use:

    datetime.strftime(text_column, "%Y-%m-%d %H:%M:%S.%f")

Or if you want to be faster and less strict (this will allow several other variations on ISO 8601):

    datetime.fromisoformat(text_column)

I would not recommend using `dateutil.parser.parse`, as the dateutil parser is intended for taking something you know to be a string representing a datetime and getting you a datetime object from it. It is not designed to tell you whether something is or is not a datetime.
msg329214 - (view) Author: Robert Pollak (jondo) Date: 2018-11-04 00:13
Thank you, Paul, for your hints on sniffing. I have now asked at SqliteStudio for confirmation that they also do content sniffing to detect datetime text fields: https://github.com/pawelsalawa/sqlitestudio/issues/3449 .

So in this issue I am suggesting to embed this datetime text sniffing into sqlite3, for optional activation. Perhaps a new flag value of detect_types=DETECT_DATETIME to the sqlite3.connect() function would be suitable?
msg329225 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-11-04 12:00
For my part, we could close this issue just because I am sure they sniff the format of the string.
If you use sqliteman you get a TEXT and not "datetime" 

You can read this doc about the read_sql https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html

Now, sqlite3 does not support the datetime type and I think it is not the job of Python to do this operation but the caller (your code) but you can see the example in Doc/includes/sqlite3/pysqlite_datetime.py
```python
import sqlite3
import datetime

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
cur.execute("create table test(d date, ts timestamp)")

today = datetime.date.today()
now = datetime.datetime.now()

cur.execute("insert into test(d, ts) values (?, ?)", (today, now))
cur.execute("select d, ts from test")
row = cur.fetchone()
print(today, "=>", row[0], type(row[0]))
print(now, "=>", row[1], type(row[1]))

cur.execute('select current_date as "d [date]", current_timestamp as "ts [timestamp]"')
row = cur.fetchone()
print("current_date", row[0], type(row[0]))
print("current_timestamp", row[1], type(row[1]))
```

```shell
2018-11-04 => 2018-11-04 <class 'datetime.date'>
2018-11-04 12:58:01.399669 => 2018-11-04 12:58:01.399669 <class 'datetime.datetime'>
current_date 2018-11-04 <class 'datetime.date'>
current_timestamp 2018-11-04 11:58:01 <class 'datetime.datetime'>
```
msg329226 - (view) Author: Robert Pollak (jondo) Date: 2018-11-04 13:25
Stéphane, your example does not match my use case with unknown table structure.

Does everyone agree that such a sniffing is out of scope for sqlite3?
That's not what I'd call "batteries included" :-(
msg329227 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-11-04 13:49
ok in this case, I re-open the issue.
msg329291 - (view) Author: Robert Pollak (jondo) Date: 2018-11-05 11:49
I have created the attached sqlite database file (with SQLiteStudio) for testing. I contains a table `t` with a TEXT column `x` and a DATETIME (according to SQLiteStudio) column `y`. The table contains a single row with the value `2018-11-05 12:20:30` for both columns.

For some reason SQLiteStudio manages to know these types (although the columns contain the same value, and even after renaming the file and reconnecting to it), so sqlite3 should also be able to detect them.
msg329309 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2018-11-05 16:56
With regards to automatically deducing column types, I am -1 on that. It has way too many dangerous edge cases, and I know of countless bugs in software that are the direct result of stringly-typed data being coerced into a specific data type based on its form. For example, if you have an excel column of alphanumeric strings and happen to get one that looks like "13943E1", that will be coerced (lossily) into the number 139430.0, because it happens to take the form of exponential notation.

> I contains a table `t` with a TEXT column `x` and a DATETIME (according to SQLiteStudio) column `y`.

It sounds to me like SQLiteStudio is doing the same thing that Python is doing, by extending the "types" available to include a DATETIME. Presumably they do it the same way, with "datetime" in the column name. If that's the case then it's just a mismatch between what they call their datetime adapter and what python does, and you just need to rename the relevant columns to say "timestamp" instead of "datetime".

As an aside, this is what we get from people not offering real datetime types in their standards. JSON has the exact same problem - people need to transmit datetimes but there's no way to do it in the standard, so everyone extends the standard in a sightly different way and you end up with a bunch of incompatible ways of storing datetimes as strings. Bah humbug.
msg329319 - (view) Author: Robert Pollak (jondo) Date: 2018-11-05 21:47
Paul, the sniffing would be only active for people who explicitly add a connect() argument like detect_types=DETECT_DATETIME, which can have the appropriate warning in the docs. You can also extend the sniffing to all values, not just the first non-missing one. (I'd gladly pay the computational price.) How many cases are there where a column is full of `YYYY-MM-DD HH:MM:SS.SSS` data, DETECT_DATETIME is switched on, and the user _doesn't_ want this column to be interpreted as datetime?

(I'm of course not suggesting to try detecting REAL as Julian day, or INTEGER as Unix Time.)

Forget about my test file, by the way:
I have now found out that I can extract the DATETIME type information from my test file SQLiteStudio-3449.sqlite in the following two ways:
```
In [3]: pd.read_sql_query('pragma table_info("t")', con)
Out[3]: 
   cid name      type  notnull dflt_value  pk
0    0    x      TEXT        0       None   0
1    1    y  DATETIME        0       None   0

In [4]: pd.read_sql_query('SELECT SQL FROM sqlite_master WHERE name = "t"', con)
Out[4]: 
                                   sql
0  CREATE TABLE t (x TEXT, y DATETIME)
```
For my real use case however, those two statements yield empty results :-(
msg329350 - (view) Author: Robert Pollak (jondo) Date: 2018-11-06 09:43
Here is now a reduced version of my real use case: 2017v2-reduced.db. The file was originally created with "Oracle DBConvert". This file contains a table without rows.
I was wrong in my last comment: There's also no sniffing needed here, see either of the following useful outputs:
```
In [5]: pd.read_sql_query('PRAGMA table_info("t2")', con)['type']
Out[5]: 
0       DATETIME
1    VARCHAR (3)
Name: type, dtype: object

In [6]: pd.read_sql_query('SELECT SQL FROM sqlite_master WHERE name = "t2"', con).iloc[0,0]
Out[6]: 'CREATE TABLE t2 (localtime DATETIME DEFAULT NULL, freq VARCHAR (3))'
```
This also works with the original file from "Oracle DBConvert".

Wouldn't it make sense for sqlite3 to use this information, e.g. when connect() is called with something like `detect_types=CONVERT_DATETIME`? One could even call the option CONVERT_ORACLE_DATETIME to describe that one cannot expect it to work with files from other sources.

If yes, then I suggest changing this issue's title to '"SELECT *" should optionally autoconvert DATETIME fields if found in DDL'.

The question is of course how much sqlite generating software creates this metadata. But Oracle is certainly an important actor. And sqlite3 itself could also save this DDL instead of extending the field names (which seems more hacky).
msg330393 - (view) Author: Robert Pollak (jondo) Date: 2018-11-25 19:47
So my workaround is of course
```
table = pd.read_sql_query('select * from table', con)
column_type = pd.read_sql_query('PRAGMA table_info("table")', con)['type']
datetimes = table.columns[column_type == 'DATETIME']
table.loc[:, datetimes] = table.loc[:, datetimes].apply(pd.to_datetime)
```

I would be interested in creating a pull request for the `detect_types=CONVERT_DATETIME` argument that I described. Is there any chance to get that merged? I currently see 541 open cpython pull requests labeled "awaiting review" :-/
msg335544 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-02-14 16:48
Hi @jondo,

Of course you can submit a PR and we can help you with a review if we have time because we are volunteers and it's in function of our free time.

Have a nice day,
msg348435 - (view) Author: Robert Pollak (jondo) Date: 2019-07-25 11:30
Sorry for not providing a pull request yet.

By the way, the "DB Browser for SQLite" (https://sqlitebrowser.org/) also interprets DATETIME fields correctly.

(Could someone please change the "Stage" of this issue to "needs patch"?)
History
Date User Action Args
2019-07-25 11:30:28jondosetmessages: + msg348435
versions: + Python 3.8, - Python 3.7
2019-02-14 16:48:24matrixisesetmessages: + msg335544
2018-11-25 19:47:29jondosetmessages: + msg330393
2018-11-18 16:31:56jondosettitle: sqlite3: "select *" should optionally sniff and autoconvert TEXT datetime fields -> sqlite3: optionally autoconvert table_info's DATETIME fields
2018-11-06 09:43:06jondosetfiles: + 2017v2-reduced.db

messages: + msg329350
2018-11-05 21:47:21jondosetmessages: + msg329319
2018-11-05 16:56:49p-gansslesetmessages: + msg329309
2018-11-05 11:49:15jondosetfiles: + SQLiteStudio-3449.sqlite

messages: + msg329291
2018-11-04 13:49:35matrixisesetstatus: closed -> open
resolution: fixed ->
messages: + msg329227
2018-11-04 13:25:06jondosetmessages: + msg329226
title: sqlite3: "select *" should optionally autoconvert TEXT datetime fields -> sqlite3: "select *" should optionally sniff and autoconvert TEXT datetime fields
2018-11-04 12:00:04matrixisesetstatus: open -> closed

nosy: + matrixise
messages: + msg329225

resolution: fixed
stage: resolved
2018-11-04 00:19:45jondosettitle: sqlite3: "select *" should autoconvert datetime fields -> sqlite3: "select *" should optionally autoconvert TEXT datetime fields
2018-11-04 00:13:40jondosetmessages: + msg329214
2018-11-02 20:00:12p-gansslesetnosy: + p-ganssle
messages: + msg329157
2018-11-02 12:25:40serhiy.storchakasetnosy: + ghaering, berker.peksag
2018-11-02 12:25:14jondosetmessages: + msg329129
2018-11-02 12:16:24jondocreate