classification
Title: sqlite3 timestamp converter cannot handle timezone
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, bozo.kopic, ghaering, hassanbot, lemburg, serhiy.storchaka, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-12-28 23:04 by bozo.kopic, last changed 2020-09-18 07:14 by serhiy.storchaka.

Files
File name Uploaded Description Edit
sqlite3.patch bozo.kopic, 2016-12-28 23:04 review
sqlite3-2.patch bozo.kopic, 2016-12-29 17:49
sqlite3-2.patch serhiy.storchaka, 2016-12-29 18:21 Regenerated for review review
sqlite3-3.patch xiang.zhang, 2017-01-11 09:21 Based on -2, addressing comments. review
timestamptz.patch xiang.zhang, 2017-01-12 05:45 review
timestamptz-2.patch xiang.zhang, 2017-01-13 09:23 review
timestamptz-3.patch xiang.zhang, 2017-01-13 11:23 review
Messages (34)
msg284205 - (view) Author: Bozo Kopic (bozo.kopic) * Date: 2016-12-28 23:04
Current convert_timestamp function raises exception when parsing timestamps that contain timezone information and have microsecond set to zero.

Patch for this behavior is included.

Example script that demonstrates this problem:

import sys
import datetime
import sqlite3
import traceback
import contextlib


def main():
    db = sqlite3.connect('test.db', detect_types=sqlite3.PARSE_DECLTYPES)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS test (
            timestamp TIMESTAMP)
        """)

    t = datetime.datetime.now(tz=datetime.timezone.utc)
    t = t.replace(microsecond=0)
    db.executemany("INSERT INTO test VALUES (:timestamp)", [{'timestamp': t}])
    db.commit()

    with contextlib.closing(db.cursor()) as c:
        try:
            c.execute('SELECT * FROM test')
            c.fetchall()
            print('original implementation success')
        except Exception as e:
            print('original implementation failed')
            traceback.print_exc()
        c.close()

    sqlite3.register_converter("timestamp", _sqlite_convert_timestamp)

    with contextlib.closing(db.cursor()) as c:
        try:
            c.execute('SELECT * FROM test')
            c.fetchall()
            print('patched implementation success')
        except Exception as e:
            print('patched implementation failed')
            traceback.print_exc()
        c.close()


def _sqlite_convert_timestamp(val):
    datepart, timepart = val.split(b" ")

    # this is the patch
    timepart = timepart.split(b'+', 1)[0].split(b'-', 1)[0]

    year, month, day = map(int, datepart.split(b"-"))
    timepart_full = timepart.split(b".")
    hours, minutes, seconds = map(int, timepart_full[0].split(b":"))
    if len(timepart_full) == 2:
        microseconds = int('{:0<6.6}'.format(timepart_full[1].decode()))
    else:
        microseconds = 0
    val = datetime.datetime(year, month, day, hours, minutes, seconds,
                            microseconds)
    return val


if __name__ == '__main__':
    sys.exit(main())
msg284231 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-12-29 07:07
Thanks for your report Bozo.

I think this is due to timestamp converter doesn't handle timezone now. Besides this ValueError, if I understand correctly, even if you pass a datetime object with tzinfo, with microsecond set, the returned datetime object is a wrong one since the tzinfo is discarded.
msg284240 - (view) Author: Bozo Kopic (bozo.kopic) * Date: 2016-12-29 09:56
Yes, that is correct. Timezone information is discarded. Submitted patch only resolves occurrence of ValueError.

Is additional patch that enables parsing of timezone required?
msg284241 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-12-29 10:06
I would prefer add the support for timezone since the current behaviour seems not correct to me. If you'd like to work on it, don't forget to add a test case and sign the CLA.

But I am not sure this should be treated as a bug or new feature.
msg284243 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-29 10:21
Could you provide a unittest?
msg284245 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-12-29 10:38
import sqlite3, datetime
c = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = c.cursor()
cur.execute('create table test(t timestamp)')
t = datetime.datetime.now(tz=datetime.timezone.utc)
cur.execute("insert into test(t) values (?)", (t,))
cur.execute('select t from test')
l = cur.fetchone()[0]
t == l   # the result not equal to the original one
msg284252 - (view) Author: Bozo Kopic (bozo.kopic) * Date: 2016-12-29 12:31
I'm providing new patch that adds support for parsing timezone information.

Tests are included.
msg284291 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-29 19:04
In general the patch LGTM. Added comments on Rietveld.
msg284318 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-12-30 06:37
LGTM generally. :-)
msg285211 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-11 10:38
LGTM except that arguments of assertEqual() should be swapped.
msg285227 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-11 14:52
> LGTM except that arguments of assertEqual() should be swapped.

Thanks Serhiy! I'll take care of that. One problem remained is what to do with 2.7. There is no timezone object in 2.7. My preference is just pointing out the limitation that only naive datetime object is supported in the doc.
msg285229 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-11 15:29
Agreed. And maybe raise more informative error for datetimes with a timezone.

I'm also not sure about 3.5 and 3.6. Datetimes with timezones were never supported. Currently all returned ditetime objects are naive. Getting an aware datetime can confuse applications.
msg285230 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-01-11 15:50
I don't think this can be considered a bug fix, since it changes behavior for applications that read data from SQLite databases which were not created by Python.

Those application may now see datetime values with tz infos and will likely not be prepared to handle all the problems associated with this.

Is it possible to make the support optional for 3.5 and 3.6 and only enable it for 3.7 (with the possibility of disabling it again) ?
msg285231 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-11 16:04
I asked whether this should be treated as an enhancement or bug fix. Now with your concerns I think it fits as enhancement more.

> Is it possible to make the support optional for 3.5 and 3.6 and only enable it for 3.7 (with the possibility of disabling it again) ?

How about just make it just into 3.7 and just document only naive datetime objects are supported in 3.5, 3.6 and 2.7.
I am not sure it's worth to make it even optional in 3.7. Discarding tzinfo sounds weird, and worse, the behaviour is not deterministic, it could also fail with an exception.
msg285232 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-11 16:08
The naive datetime converter is registered under the name "timestamp". The aware datetime converter or the universal datetime converter (if it is needed) can be registered under different names.
msg285243 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-01-11 17:01
On 11.01.2017 17:08, Serhiy Storchaka wrote:
> 
> The naive datetime converter is registered under the name "timestamp". The aware datetime converter or the universal datetime converter (if it is needed) can be registered under different names.

This sounds like a good idea.

Perhaps use "timestamptz" or something similar.
msg285245 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-01-11 17:16
On 11.01.2017 17:04, Xiang Zhang wrote:
> I am not sure it's worth to make it even optional in 3.7. Discarding tzinfo sounds weird, and worse, the behaviour is not deterministic, it could also fail with an exception.

Best practice is to store all date/time values using UTC (or some
other fixed timezone) in databases and to manage the timezone/locale
information elsewhere.

The TZ info then becomes redundant and only results in more
database space being used as well as slower queries (due to the extra
TZ calculations being done; many databases internally store the
values as UTC to avoid having to do TZ calculations at query time).

Of course, there are use cases, where you'd still want to work
with TZ values in the database, so an extra converter for this
sounds like a good plan.
msg285287 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-12 05:45
timestamptz.patch implements a new converter that's able to convert aware datetime objects.
msg285288 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-12 06:06
I think the timestamptz converter should either interpret strings without timezone as UTC (and perhaps understand the "Z" suffix as sqlite3 date() function) or raises an error. It should never return naive datetime.

The timestamp converter needs better error reporting when get an input with a timezone.
msg285289 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-12 06:24
> I think the timestamptz converter should either interpret strings without timezone as UTC (and perhaps understand the "Z" suffix as sqlite3 date() function) or raises an error. It should never return naive datetime.

Currently timestamptz just convert back what the user passed in, no matter naive or aware objects. What to do with them is left to the app. If we raise an error, could users use naive and aware objects together? And interpreting strings without timezone as UTC seems will mistranslate the object. For example, pass in datetime.datetime.now() and translate it back as UTC may not be right.

> The timestamp converter needs better error reporting when get an input with a timezone.

I thought about it but our intention to introduce a new converter is not to break old code. Doesn't add error reporting violate the intention? Users' code may not catch the error now.
msg285291 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-12 07:28
Naive and aware objects should not be mixed. Their actually are different types. The converter doesn't return str or bytes when can't parse an input as a datetime, and it shouldn't return a naive datetime if can't find a timezone.

SQLite date and time functions imply UTC if timezone is omitted. This is a reason for returning aware UTC datetime objects. On other side, Python is more powerful programming language, it distinguish naive and aware datetime objects, and it is unlikely that these two types are mixed in one database column. It is better to raise an error that silently return possible wrong result. It exceptional case user can write special converter or just call SQLite datetime() for unifying data format.

I think the old code unlikely will be broken if preserve an exception type and don't change conditions for raising an error. New error message can contain full input string and suggest to use the timestamptz converter if it looks as a datetime with timezone.
msg285371 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-13 09:19
> I think the old code unlikely will be broken if preserve an exception type and don't change conditions for raising an error.

Currently there could be no error when the object gets a timezone. The timezone is simply discarded and only when microseconds comes to 0 there is a ValueError. I don't think the user code would prepare to catch the ValueError. I don't oppose make timestamp more strict but just not sure if it's suitable.
msg285372 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-13 09:23
timestamptz-2.patch make timestamp and timestamptz specilized for their own objects.
msg285375 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-13 10:21
I think ProgrammingError is not correct type of an exception. It is used for programming errors such as using uninitialized or closed objects or supplying incorrect number of bindings. But this error can be caused by invalid data in a database created not by Python. Currently converters raise ValueError or TypeError for invalid data. ValueError is raised for datetime without microseconds but with timezone offset. I think ValueError is more appropriate type for such errors.

Perhaps even TypeError should be converted to ValueError.

    try:
        # parsing date or datetime
    except (ValueError, TypeError):
        raise ValueError(...)
msg285377 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-13 10:28
> Perhaps even TypeError should be converted to ValueError.

But this is different issue of course.
msg285381 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-13 10:50
I am okay with ValueError´╝łactually I use it in the patch originally) but I am not in favour of catching the errors of the parser. The parser raises errors due to invalid data, not timezone. I think propagate it to users could make the reason more obvious.
msg285385 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-13 11:23
timestamptz-3.patch uses ValueError instead of ProgrammingError and treat suffix Z as UTC timezone.
msg285985 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-22 04:04
Ping for review for timestamptz-3.patch.
msg285989 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-22 06:18
Added few minor comments on Rietveld. Technically the patch LGTM. But the behavior change of the timestamp converter for input without timezone offset perhaps needs wider discussion. We have several options:

0. Current behavior. Silently drop the timezone if there are microseconds, and raise an error when there are no microseconds. This is bad for many reasons, but some user code can depend on this.

1. Just drop the timezone, don't raise an error when there are no microseconds. The converter will become returning incorrect result instead of just failing on unexpected input. This will fix the user code that is aware of this, but currently fails when input doesn't have microseconds.

2. Return aware datetime for input with timezone offset (as in original patch). Returning aware or naive datetime depending on input can break a user code that is not prepared for this.

3. Raise an error for input with timezone offset, always return naive datetime objects from this converter. This can break the user code that depends on current behavior and works with the input containing the same timezone offsets.

Any option can break some code. I prefer option 3, but want to hear thoughts of other core developers. Maybe discuss this on Python-Dev?
msg285990 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-22 06:24
> Maybe discuss this on Python-Dev?

It's fine. Could you compose a mail instead of me? I am not good at that. :-(
msg285992 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-22 06:32
I'm even worse at that. :-(
msg285993 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-22 06:33
> I'm even worse at that. :-(

LoL. Okay I'll do that. :-)
msg377068 - (view) Author: Hassanbot (hassanbot) Date: 2020-09-17 21:51
This still isn't fixed as of 3.8 (or in master I think).

I can understand why you wouldn't want to allow serializing and deserializing time zones, since tzinfo objects cannot be accurately serialized with a simple UTC offset, but you should at least get an error when trying to insert an aware object. Anything is better than it is now, where you get no warning or error when inserting the object, and get a hard to interpret error ("invalid literal for int() with base 10") when trying to retrieve it.

For deserialization, the datetime class now (since 3.7) includes a fromisoformat() method that could be used as a counterpart to the isoformat() method used when serializing. At least it would be consistent then.
msg377090 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-18 07:14
Xiang Zhang, was there a discussion on Python-Dev?
History
Date User Action Args
2020-09-18 07:14:50serhiy.storchakasetmessages: + msg377090
2020-09-17 21:51:20hassanbotsetnosy: + hassanbot

messages: + msg377068
versions: + Python 3.8, - Python 3.7
2017-01-22 06:33:59xiang.zhangsetmessages: + msg285993
2017-01-22 06:32:20serhiy.storchakasetmessages: + msg285992
2017-01-22 06:24:38xiang.zhangsetmessages: + msg285990
2017-01-22 06:18:23serhiy.storchakasetmessages: + msg285989
2017-01-22 04:04:44xiang.zhangsetmessages: + msg285985
2017-01-13 11:23:12xiang.zhangsetfiles: + timestamptz-3.patch

messages: + msg285385
2017-01-13 10:50:52xiang.zhangsetmessages: + msg285381
2017-01-13 10:28:31serhiy.storchakasetmessages: + msg285377
2017-01-13 10:21:51serhiy.storchakasetmessages: + msg285375
2017-01-13 09:23:59xiang.zhangsetfiles: + timestamptz-2.patch

messages: + msg285372
2017-01-13 09:19:45xiang.zhangsetmessages: + msg285371
2017-01-12 07:28:40serhiy.storchakasetmessages: + msg285291
2017-01-12 06:24:39xiang.zhangsetmessages: + msg285289
2017-01-12 06:06:22serhiy.storchakasetmessages: + msg285288
2017-01-12 05:45:38xiang.zhangsetfiles: + timestamptz.patch

messages: + msg285287
stage: needs patch -> patch review
2017-01-11 17:16:25lemburgsetmessages: + msg285245
2017-01-11 17:06:28serhiy.storchakasetstage: commit review -> needs patch
type: behavior -> enhancement
versions: - Python 3.5, Python 3.6
2017-01-11 17:01:32lemburgsetmessages: + msg285243
2017-01-11 16:08:15serhiy.storchakasetmessages: + msg285232
2017-01-11 16:04:29xiang.zhangsetmessages: + msg285231
2017-01-11 15:50:16lemburgsetmessages: + msg285230
2017-01-11 15:29:35serhiy.storchakasetmessages: + msg285229
2017-01-11 14:52:40xiang.zhangsetmessages: + msg285227
2017-01-11 10:38:56serhiy.storchakasetmessages: + msg285211
stage: patch review -> commit review
2017-01-11 09:21:51xiang.zhangsetfiles: + sqlite3-3.patch
2016-12-30 06:37:21xiang.zhangsetmessages: + msg284318
2016-12-29 19:04:43serhiy.storchakasetmessages: + msg284291
2016-12-29 18:21:21serhiy.storchakasetfiles: + sqlite3-2.patch
stage: test needed -> patch review
2016-12-29 17:49:15bozo.kopicsetfiles: + sqlite3-2.patch
2016-12-29 17:49:05bozo.kopicsetfiles: - sqlite3-2.patch
2016-12-29 12:36:00bozo.kopicsetfiles: + sqlite3-2.patch
2016-12-29 12:34:52bozo.kopicsetfiles: - sqlite3-2.patch
2016-12-29 12:31:59bozo.kopicsetfiles: + sqlite3-2.patch

messages: + msg284252
2016-12-29 10:38:36xiang.zhangsetmessages: + msg284245
2016-12-29 10:21:03serhiy.storchakasetnosy: + lemburg, ghaering, belopolsky

messages: + msg284243
stage: test needed
2016-12-29 10:06:56xiang.zhangsetnosy: + serhiy.storchaka
messages: + msg284241
2016-12-29 09:56:03bozo.kopicsetmessages: + msg284240
2016-12-29 07:07:00xiang.zhangsettitle: sqlite3 timestamp converter ValueError -> sqlite3 timestamp converter cannot handle timezone
nosy: + xiang.zhang

messages: + msg284231

type: crash -> behavior
2016-12-28 23:04:29bozo.kopiccreate