This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: valid datetimes can become invalid if the timezone is changed
Type:                Stage:
Components: Library (Lib)
Versions: Python 3.10, Python 3.9, Python 3.8

Process
Status: open         Resolution:
Dependencies:        Superseder:
Assigned To:
Nosy List: belopolsky, eric.araujo, eryksun, lemburg, p-ganssle, terry.reedy, zzzeek
Priority: normal     Keywords:

Created on 2021-03-13 03:30 by zzzeek, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg388585 - (view) Author: mike bayer (zzzeek) * Date: 2021-03-13 03:30
So I'm pretty sure this is "not a bug", but it's a bit of a problem, and I have a user ringing the "security vulnerability" bell on this one, and to be honest I don't even know what any library would do to "prevent" this.

Basically, the datetime() object limits its range based on the numerical year MINYEAR, rather than based on an actual logical date.

So I can create an "impossible" date as follows:


d = datetime.strptime("Mon Jan  1 00:00:00 0001 +01:00", "%c %z")

or like this:

d = datetime(year=1, month=1, day=1, tzinfo=timezone(timedelta(hours=1)))

and... you can see where this is going: it can't be converted to a timezone that pushes the year to zero:

>>> from datetime import datetime, timezone, timedelta
>>> d = datetime(year=1, month=1, day=1, tzinfo=timezone(timedelta(hours=1)))
>>> d.astimezone(timezone.utc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: date value out of range

This is because, after all, astimezone() is just subtraction or addition, and if it overflows past the artificial boundary, well, you're out of luck.

Why's this a security problem? Ish? Because PostgreSQL has a data type "TIMESTAMP WITH TIME ZONE", and if you take said date and INSERT it into your database, then SELECT it back using any Python DBAPI that returns datetime() objects, like psycopg2, and your server is in a timezone with a zero or negative offset relative to the given date, you get an error.  So a mischievous user can create that datetime for some reason, and now they've broken your website, which can't SELECT that table anymore without crashing.

So, suppose you maintain the database library that helps people send data in and out of psycopg2.  We have the end user's application, we have the database abstraction library, we have the psycopg2 driver, we have Python's datetime() object with MINYEAR, and finally we have PostgreSQL with the TIMESTAMP WITH TIME ZONE datatype that I've never liked.

Of those five roles, whose bug is this?  I'd like to say it's the end user's, for letting untrusted input put unusual timezones and timestamps into their system.  But at the same time it's a little weird that Python lets me create this date that can't be converted into UTC.  Thanks for reading!
msg389121 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-19 22:46
This tracker is for patching CPython, including the docs.  Questions for discussion should be posted to python-list.  Perhaps such a discussion would lead to a concrete change proposal.  In the meanwhile, I think that this should be closed as 'not a bug' (as you admit).
msg389123 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-03-19 23:02
That it allows creating the datetime instance looks like a bug to me, i.e. a time before 0001-01-01 00:00 UTC is invalid. What am I misunderstanding?
msg389127 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2021-03-20 00:05
> That it allows creating the datetime instance looks like a bug to me, i.e. a time before 0001-01-01 00:00 UTC is invalid. What am I misunderstanding?

`datetime.datetime(1, 1, 1, tzinfo=timezone(timedelta(hours=1)))` is a valid datetime; it's just that it cannot be converted to every other time zone, because in some time zones the same absolute time is out of datetime's range.

`datetime.datetime` is a representation of an abstract datetime, and it can also be annotated with a time zone to basically tag the civil time with a function for converting it into other representations of the same *absolute* time. The range of valid `datetime.datetime` objects is based entirely on the naïve portion of the datetime, and has nothing to do with the absolute time. So this is indeed a natural consequence of the chosen design.
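To make that concrete, here is a tiny sketch (reusing the offset from the original report) showing that the range check applies to the naive fields, while the UTC equivalent of the same instant does not fit:

from datetime import datetime, timedelta, timezone

# Accepted: the range check only looks at the naive (wall-clock) fields,
# even though the equivalent UTC instant falls before year 1.
d = datetime(1, 1, 1, tzinfo=timezone(timedelta(hours=1)))
assert datetime.min <= d.replace(tzinfo=None) <= datetime.max

# Rejected: the same absolute time expressed in UTC is not representable.
try:
    d.astimezone(timezone.utc)
except OverflowError as exc:
    print(exc)  # date value out of range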

If we wanted to change things, it would cause a number of problems, and the cure would be much worse than the "disease". For one thing, accessing UTC offsets is done lazily, so `.utcoffset()` is not currently called during `datetime` creation. The datetime documentation lays out that this is consistent with the raison d'être of `datetime`: "While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation." In order to determine whether a given `datetime` can always be converted to an equivalent datetime in any time zone, we'd need to actively determine its UTC offset, which would be a major performance regression in creating aware datetimes.

We could avoid this performance regression by only doing the `.utcoffset()` check when the datetime is within 2 days of `datetime.min` or `datetime.max`, but while this would be a more minor performance regression, it would also add new edge cases where `.utcoffset()` is sometimes called during the constructor and sometimes not, which is not ideal. Not to mention that if we were ever to open up the allowed return values for `.utcoffset()`, the logic might get hairier (depending on the nature of the allowed values).
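Purely as an illustration of that idea (not a proposed patch), a boundary-only check could look roughly like this; the helper name is made up, and the 48-hour margin corresponds to the 2-day window mentioned above:

from datetime import datetime, timedelta

# Every allowed UTC offset is strictly under 24 hours, so a 48-hour band
# around datetime.min/datetime.max covers any possible conversion swing.
_MARGIN = timedelta(hours=48)

def check_convertible(dt):
    """Hypothetical eager validation: only pay for .utcoffset() when the
    naive value sits close enough to the boundary that a conversion to UTC
    could overflow."""
    naive = dt.replace(tzinfo=None)
    if _MARGIN <= naive - datetime.min and _MARGIN <= datetime.max - naive:
        return  # nowhere near the boundary, so skip the offset lookup
    offset = dt.utcoffset()
    if offset is None:
        return  # naive datetime, nothing to check
    try:
        naive - offset  # the UTC equivalent; overflows if it leaves the range
    except OverflowError:
        raise ValueError("aware datetime cannot be converted to UTC") from None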

Another issue with "fixing" this is that it would take currently-valid datetimes and turn them into invalid datetimes, which violates backwards compatibility. I imagine in most cases this is only done as part of test suites, since TZ-aware datetimes near 0 and 10,000 CE are anachronistic and not likely to be of much instrumental value, but the same can be said of these potentially "invalid" dates in the first place.

Additionally, and even worse, even naïve datetimes can be converted to UTC or other time zones, and if we want to add a new constraint that `some_datetime.astimezone(some_timezone)` must always work, then you wouldn't even be able to *construct* `datetime.min` or `datetime.max`, since `datetime.min.astimezone(timezone(timedelta(hours=-23)))` would fail everywhere, and worse, the minimum datetime value you could construct would depend on your system time zone! Again, the alternative would be to make an exception for naïve datetimes, but given that this change is of dubious value to start with, I don't think it is worth it.

> So I'm pretty sure this is "not a bug", but it's a bit of a problem, and I have a user ringing the "security vulnerability" bell on this one, and to be honest I don't even know what any library would do to "prevent" this.

I don't really know why it would be a "security vulnerability", but presumably a library could either convert their datetimes to UTC as soon as they get them from the user, if they want to use them as UTC in the future, or it could simply refuse to accept any datetimes outside the range `datetime.datetime.min + timedelta(hours=48) < dt.replace(tzinfo=None) < datetime.datetime.max - timedelta(hours=48)`, or, if the concern is only about UTC, refuse datetimes outside the range `datetime.min.replace(tzinfo=timezone.utc) < dt < datetime.max.replace(tzinfo=timezone.utc)`.
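A sketch of the first of those checks as a library-side guard (the function name is made up; the 48-hour margin is the one from the expression above):

from datetime import datetime, timedelta, timezone

_MARGIN = timedelta(hours=48)

def accept_datetime(dt):
    """Reject any datetime whose naive value lies within 48 hours of
    datetime.min or datetime.max, so a later astimezone() call can never
    overflow, whatever time zone it targets."""
    naive = dt.replace(tzinfo=None)
    if not (datetime.min + _MARGIN < naive < datetime.max - _MARGIN):
        raise ValueError("datetime too close to the supported range to convert safely")
    return dt

# The datetime from the original report is rejected up front rather than
# blowing up later inside astimezone():
try:
    accept_datetime(datetime(1, 1, 1, tzinfo=timezone(timedelta(hours=1))))
except ValueError as exc:
    print(exc)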

> Why's this a security problem? Ish? Because PostgreSQL has a data type "TIMESTAMP WITH TIME ZONE", and if you take said date and INSERT it into your database, then SELECT it back using any Python DBAPI that returns datetime() objects, like psycopg2, and your server is in a timezone with a zero or negative offset relative to the given date, you get an error.  So a mischievous user can create that datetime for some reason, and now they've broken your website, which can't SELECT that table anymore without crashing.

Can you clarify why this crashes? Is it because it always returns the datetime value in UTC?

> So, suppose you maintain the database library that helps people send data in and out of psycopg2.  We have the end user's application, we have the database abstraction library, we have the psycopg2 driver, we have Python's datetime() object with MINYEAR, and finally we have PostgreSQL with the TIMESTAMP WITH TIME ZONE datatype that I've never liked.
>
> Of those five roles, whose bug is this?  I'd like to say it's the end user's, for letting untrusted input put unusual timezones and timestamps into their system.  But at the same time it's a little weird that Python lets me create this date that can't be converted into UTC.  Thanks for reading!

It sounds like the fundamental issue here is that PostgreSQL supports a different range of datetimes than Python does, regardless of the question of whether any Python datetime can be converted into another timezone. It sounds to me like that's a mismatch between PostgreSQL's data type and Python's data type, and I'm not sure how they are squared up now. If the database abstraction layer is making the guarantee that anything you can INSERT into the database can be SELECTed back out (meaning that it will artificially restrict the range of valid values to the intersection of those supported by Postgres and Python), then if the abstraction layer allows you to choose what time zone applies when you read the values out, it should probably make sure that it doesn't write something in that can't be represented in another time zone.

If the abstraction layer's guarantee is that it can read out anything that can be stored in a postgres database, then I still think it might be the abstraction layer's bug, since apparently Postgres lets you store values that cannot be represented in all the supported time zones that you could read them out as — the abstraction layer may need to use a different type (possibly a subclass) for "out-of-range" values.

I'll note that I don't actually understand the difference between "abstraction layer" and "psycopg2 driver", so I may be conflating those two, but at the end of the day I think one of those two is the right place to fix this. If there are good reasons for not fixing it in either of those two layers, then obviously the buck stops in the user's application — though obviously it would be better to fix it lower in the stack, since it's difficult for users to keep track of minutiae like this. I think it likely that neither CPython nor Postgres is the right level of abstraction to fix it at, since neither one is in the business of making their datetime types compatible with one another (and each presumably has a thousand other integrations to also worry about).

Hopefully that's helpful information! Understandable if the answer is somewhat disappointing.
msg389128 - (view) Author: mike bayer (zzzeek) * Date: 2021-03-20 00:15
> I don't really know why it would be a "security vulnerability", but presumably a library could either convert their datetimes to UTC as soon as they get them from the user, if they want to use them as UTC in the future, or it could simply refuse to accept any datetimes outside the range `datetime.datetime.min + timedelta(hours=48) < dt.replace(tzinfo=None) < datetime.datetime.max - timedelta(hours=48)`,

This is absolutely correct, but I'm not sure if you're aware that there's kind of a whole subsection of the tech community that considers anything a user might do without understanding all the consequences, anything that could in any way allow untrusted input to affect things, to be a "security risk".  In SQLAlchemy I had CVEs posted because we have a method called order_by() that accepted a string, and the notion was: someone will write a web app that takes an arbitrary string as input and sends it there!  CVE!  For you and me that would of course be crazy, as this is obviously part of the SQL string being sent to the database, but this is a particular programming subculture that has the ability to create a lot of havoc by filling up the CVE system with "Security Vulnerabilities" based on what many of us consider obviously wrong.

> Can you clarify why this crashes? Is it because it always returns the datetime value in UTC?

It returns the datetime value in the default timezone set up for the server, which could be UTC or a local timezone, but the idea is that it's potentially different from the timezone that was put in.

> I'll note that I don't actually understand the difference between "abstraction layer" and "psycopg2 driver", so I may be conflating those two, 

From my POV I have always thought PostgreSQL's TIMESTAMP WITH TIME ZONE datatype is nuts, and that you should only be sending UTC timestamps to a database.  But people want to use PG's type, and IMO they need to understand what they're doing.  Thanks for the response.
msg389130 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-03-20 00:28
Thank you for the thoughtful and detailed answer, Paul.
History
Date                 User         Action  Args
2022-04-11 14:59:42  admin        set     github: 87650
2021-03-20 00:28:49  eryksun      set     messages: + msg389130
2021-03-20 00:15:45  zzzeek       set     messages: + msg389128
2021-03-20 00:05:32  p-ganssle    set     messages: + msg389127
2021-03-19 23:02:04  eryksun      set     nosy: + eryksun; messages: + msg389123
2021-03-19 22:46:00  terry.reedy  set     nosy: + terry.reedy; messages: + msg389121; title: we can create valid datetime objects that become invalid if the timezone is changed -> valid datetimes can become invalid if the timezone is changed
2021-03-19 19:31:33  eric.araujo  set     nosy: + eric.araujo; versions: - Python 3.6, Python 3.7
2021-03-13 03:51:25  zach.ware    set     nosy: + lemburg, belopolsky, p-ganssle
2021-03-13 03:30:12  zzzeek       create