classification
Title: Create Lib/_pydatetime.py file to optimize "import datetime" when _datetime is available
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, p-ganssle, serhiy.storchaka, shihai1991, vstinner
Priority: normal Keywords: patch

Created on 2020-05-28 00:06 by vstinner, last changed 2020-06-26 14:58 by p-ganssle.

Pull Requests
URL Status Linked
PR 20472 open vstinner, 2020-05-28 00:10
Messages (8)
msg370153 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-28 00:06
Currently, "import datetime" starts by importing the time, math, sys and operator modules, then executes 2500 lines of Python code, defines 7 classes, etc. For what? Only to delete all these classes, functions, etc. and replace them with symbols from the _datetime module:
---
try:
    from _datetime import *
except ImportError:
    pass
else:
    # Clean up unused names
    del (_DAYNAMES, _DAYS_BEFORE_MONTH, _DAYS_IN_MONTH, _DI100Y, _DI400Y,
         _DI4Y, _EPOCH, _MAXORDINAL, _MONTHNAMES, _build_struct_time,
         _check_date_fields, _check_time_fields,
         _check_tzinfo_arg, _check_tzname, _check_utc_offset, _cmp, _cmperror,
         _date_class, _days_before_month, _days_before_year, _days_in_month,
         _format_time, _format_offset, _index, _is_leap, _isoweek1monday, _math,
         _ord2ymd, _time, _time_class, _tzinfo_class, _wrap_strftime, _ymd2ord,
         _divide_and_round, _parse_isoformat_date, _parse_isoformat_time,
         _parse_hh_mm_ss_ff, _IsoCalendarDate)
    # XXX Since import * above excludes names that start with _,
    # docstring does not get overwritten. In the future, it may be
    # appropriate to maintain a single module level docstring and
    # remove the following line.
    from _datetime import __doc__
---

I would prefer to use the same approach as the decimal module, which also has large C and Python implementations. Lib/decimal.py is just:
---
try:
    from _decimal import *
    from _decimal import __doc__
    from _decimal import __version__
    from _decimal import __libmpdec_version__
except ImportError:
    from _pydecimal import *
    from _pydecimal import __doc__
    from _pydecimal import __version__
    from _pydecimal import __libmpdec_version__
---
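For comparison, a Lib/datetime.py written in the same style as Lib/decimal.py could look roughly like the sketch below. It assumes a Lib/_pydatetime.py holding the current pure-Python code, as this issue proposes; it illustrates the pattern and is not the code of the attached PR.

```python
# Sketch of Lib/datetime.py in the style of Lib/decimal.py (assumes a
# Lib/_pydatetime.py containing today's pure-Python implementation).
try:
    from _datetime import *          # C implementation: no Python code executed
    from _datetime import __doc__    # "import *" does not copy __doc__
except ImportError:
    from _pydatetime import *        # pure-Python fallback
    from _pydatetime import __doc__
```

Only the except branch touches _pydatetime, so when _datetime is available neither the pure-Python code nor its imports of math, operator, etc. run.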

Advantages:

* Faster import time
* Avoid indirectly importing the time, math, sys and operator modules, which are not used when _datetime is available

IMO it also better separates the C and the Python implementations.


Attached PR implements this idea.


Currently, "import datetime" imports 4 modules:

  ['_operator', 'encodings.ascii', 'math', 'operator']

With the PR, "import datetime" imports only 1 module:

  ['encodings.ascii']
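Lists like the ones above can be reproduced approximately by diffing sys.modules around the import in a fresh interpreter. A minimal sketch; modules_added_by() is a hypothetical helper. Unlike the lists above, its output also includes the imported module and its accelerator (datetime, _datetime), and the exact result depends on the Python version and build.

```python
import subprocess
import sys

# Code run in a child interpreter: diff sys.modules around a single import.
PROBE = """\
import sys
before = set(sys.modules)
import {name}
print(" ".join(sorted(set(sys.modules) - before)))
"""


def modules_added_by(name: str) -> list[str]:
    """Return the modules that `import name` adds to a fresh interpreter."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", PROBE.format(name=name)],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.split()


print(modules_added_by("datetime"))
```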

Import performance:

  [ref] 814 us +- 32 us -> [change] 189 us +- 4 us: 4.31x faster (-77%)

Measured by:

  env/bin/python -m pyperf timeit -s 'import sys' 'import datetime; del sys.modules["datetime"]; del sys.modules["_datetime"]; del datetime'
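Where pyperf is not installed, a rough stdlib-only approximation of the same measurement can be written with timeit. A sketch under stated assumptions: time_fresh_import() is a hypothetical helper that, like the pyperf command, deletes only the listed modules between iterations, so dependencies such as math stay cached after the first run. pyperf's calibration and statistics are not reproduced, so the numbers are only indicative.

```python
import importlib
import sys
import timeit


def time_fresh_import(name: str, extra: tuple[str, ...] = (), number: int = 200) -> float:
    """Mean wall time, in microseconds, of importing `name` from scratch.

    Only `name` and the `extra` modules (e.g. its C accelerator) are dropped
    from sys.modules between runs; other dependencies stay cached.
    """
    to_drop = (name, *extra)
    saved = {m: sys.modules[m] for m in to_drop if m in sys.modules}

    def drop() -> None:
        for m in to_drop:
            sys.modules.pop(m, None)

    def one_import() -> None:
        importlib.import_module(name)
        drop()

    drop()  # make sure the first iteration is not a cache hit
    try:
        total = timeit.timeit(one_import, number=number)
    finally:
        drop()
        sys.modules.update(saved)  # restore the original module objects
    return total / number * 1e6


print(f"import datetime: {time_fresh_import('datetime', ('_datetime',)):.0f} us")
```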


Note: I noticed that "import datetime" imports the math module while working on minimizing "import test.support" imports, bpo-40275.
msg370165 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-28 07:03
What do decimals have to do with datetime?
msg370216 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-28 14:19
> What do decimals have to do with datetime?

Oops. Sorry, I confused "datetime" and "decimal" when I created this issue. I fixed the issue title.

My idea is to mimic the Lib/decimal.py design for Lib/datetime.py.
msg370222 - Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-05-28 14:39
I basically agree with this — this is one of the reasons I structured the zoneinfo module the way I did rather than mimicking the pattern in datetime.

I believe that there are other modules that have similar situations like heapq, but datetime is probably the worst offender.

I am inclined to say that we should restructure datetime into a folder, containing __init__.py, _datetime.py and possibly _strptime.py (which I think is also only used in datetime), but I think that sort of restructuring is way more sensitive to weird import bugs than this one.

As it is now, I would be shocked if this didn't break *someone*, because people are always relying on weird implementation details (knowingly or unknowingly), but I think it's worth doing; it's good to tackle it this early in the cycle.

@vstinner What do you think about restructuring into a folder-based submodule rather than _pydatetime.py? It's way more likely to break someone, but I think it might be the better way to organize the code, and I don't want to have to go through *two* refactors of this sort.
msg370223 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-28 14:53
> I believe that there are other modules that have similar situations like heapq, but datetime is probably the worst offender.

heapq seems to be a little bit different: _heapq is not a drop-in replacement for heapq.py. For example, nlargest() seems to be implemented only in pure Python.
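On CPython, this split can be checked by classifying each name in heapq.__all__ as a built-in (C) function or a regular Python function. A small sketch; split_by_implementation() is a hypothetical helper, and the result may differ on other implementations such as PyPy.

```python
import heapq
import inspect


def split_by_implementation(module) -> tuple[list[str], list[str]]:
    """Split module.__all__ into (C-implemented, Python-implemented) names."""
    c_names, py_names = [], []
    for name in module.__all__:
        obj = getattr(module, name)
        if inspect.isbuiltin(obj):      # builtin function, i.e. written in C
            c_names.append(name)
        elif inspect.isfunction(obj):   # regular function defined in Python
            py_names.append(name)
    return c_names, py_names


c_names, py_names = split_by_implementation(heapq)
print("C:", c_names)
print("Python:", py_names)
```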


> I am inclined to say that we should restructure datetime into a folder, containing __init__.py, _datetime.py and possibly _strptime.py (which I think is also only used in datetime), but I think that sort of restructuring is way more sensitive to weird import bugs than this one.

I have no idea what the side effects of converting the datetime.py file into a package would be.

A single _pydatetime.py file seems more convenient to me. I'm aware of _strptime.py, but I don't see it as a datetime submodule, and I don't see the value of moving it into datetime as a submodule.

I'm fine with _datetime accessing the _strptime module. It sounds more complex to me if _datetime were imported by a datetime package that also contains datetime._strptime. I see a higher risk of subtle import issues, since datetime has two implementations (C and Python). But I may be wrong :-)

Also, the other stdlib modules that have a C implementation are organized as files, not folders: io.py (_io and _pyio) and decimal.py (_decimal and _pydecimal) are good examples.

I mostly care about reducing the number of indirect imports and improving import performance. I don't have a strong opinion about file vs folder.


> As it is now, I would be shocked if this didn't break *someone*, because people are always relying on weird implementation details (knowingly or unknowingly), but I think it's worth doing; it's good to tackle it this early in the cycle.

I'm fine with breaking applications relying on implementation details. Also, we can adjust the code to fix such corner cases later if it's needed, possible and justified :-)
msg370584 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-02 00:50
About _strptime, I see that the time.strptime() imports internally the _strptime module. If we move _strptime inside datetime, does that mean calling time.strptime() would have to import the datetime module? That doesn't sound right to me. I see time as the low-level interface: it should not rely on the high-level interface. I prefer to keep the _strptime module separate from the datetime module.
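The lazy import of _strptime by time.strptime() can be observed in a fresh interpreter. A minimal sketch; strptime_import_state() is a hypothetical helper, and it runs the child interpreter in isolated mode (-I) so that user site-packages and PYTHON* environment variables do not interfere.

```python
import subprocess
import sys

# Child-interpreter script: is _strptime loaded before/after time.strptime()?
CHECK = """\
import sys, time
before = "_strptime" in sys.modules
time.strptime("2020-06-02", "%Y-%m-%d")
print(before, "_strptime" in sys.modules)
"""


def strptime_import_state() -> tuple[bool, bool]:
    """Return (loaded_before_call, loaded_after_call) for the _strptime module."""
    out = subprocess.run(
        [sys.executable, "-I", "-c", CHECK],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.split()
    return out[0] == "True", out[1] == "True"


print(strptime_import_state())
```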
msg372425 - Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-06-26 14:40
> About _strptime, I see that the time.strptime() imports internally the _strptime module.

Ah, sorry, my remark about including `_strptime` was off the cuff — I thought it was only used in `datetime`, which is why I said "possibly _strptime". If it's used for `time` as well, we should leave it where it is.
msg372427 - Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-06-26 14:58
As for deciding between moving to `datetime/` and moving to `_pydatetime`, I think we should send an e-mail to Python-Dev about it to get a wider perspective, because the import machinery is a lot of black magic, and I know that there are large proprietary code bases out there that pile weird stuff on top of it. I'm not sure I can fully appreciate the trade-offs.

The biggest advantage I see to moving `datetime` into its own folder is that it gives us a lot more freedom to expand into smaller sub-modules in the future. For example, in `zoneinfo`, we have zoneinfo/_common.py (https://github.com/python/cpython/blob/2e0a920e9eb540654c0bb2298143b00637dc5961/Lib/zoneinfo/_common.py), which is some logic shared between the C and Python implementations; `_zoneinfo.c` is able to rely directly on `_common.py` without importing `zoneinfo/_zoneinfo.py` (which saves us a bunch of other module imports as well).

Right now the C implementation of `datetime` only directly imports `time` and `_strptime`, but I could imagine future enhancements that would be stupidly inconvenient to implement in C, and where we wouldn't want to implement all of _pydatetime just to get a pure-Python implementation. Having a namespace available for such submodules would be useful.
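The zoneinfo-style layout described above, where an accelerator imports a shared helper submodule without pulling in the pure-Python implementation, can be sketched with a throwaway package. Everything here is hypothetical: toypkg stands in for a possible datetime/ folder, the top-level _toy_accel.py stands in for a C extension such as _zoneinfo, and toypkg/_common.py plays the role of zoneinfo/_common.py.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical toy files written to a temporary directory.
FILES = {
    "toypkg/__init__.py": """
        try:
            from _toy_accel import *       # preferred (normally C) implementation
        except ImportError:
            from toypkg._pyimpl import *   # pure-Python fallback
    """,
    "toypkg/_common.py": """
        def normalize(key):
            # Logic shared by both implementations.
            return key.strip().lower()
    """,
    "toypkg/_pyimpl.py": """
        from toypkg._common import normalize

        def lookup(key):
            return "py:" + normalize(key)
    """,
    "_toy_accel.py": """
        from toypkg._common import normalize  # needs only _common, not _pyimpl

        def lookup(key):
            return "accel:" + normalize(key)
    """,
}


def build_and_import(root: Path, with_accel: bool = True):
    """Write the toy files under `root` and import a fresh toypkg from there."""
    for rel, src in FILES.items():
        if rel == "_toy_accel.py" and not with_accel:
            continue
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(src))
    for mod in ("toypkg", "toypkg._common", "toypkg._pyimpl", "_toy_accel"):
        sys.modules.pop(mod, None)
    importlib.invalidate_caches()
    sys.path.insert(0, str(root))
    try:
        return importlib.import_module("toypkg")
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    pkg = build_and_import(Path(tmp))
    print(pkg.lookup("  Europe/Paris "))
    print("toypkg._pyimpl" in sys.modules)
```

With with_accel=False the same package falls back to toypkg._pyimpl, which is the case the try/except exists for.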
History
Date                 User              Action  Args
2020-06-26 14:58:45  p-ganssle         set     messages: + msg372427
2020-06-26 14:40:09  p-ganssle         set     messages: + msg372425
2020-06-25 17:26:02  shihai1991        set     nosy: + shihai1991
2020-06-02 00:50:52  vstinner          set     messages: + msg370584
2020-05-28 14:53:35  vstinner          set     messages: + msg370223
2020-05-28 14:39:10  p-ganssle         set     messages: + msg370222
2020-05-28 14:19:30  vstinner          set     messages: + msg370216
2020-05-28 14:18:38  vstinner          set     title: Create Lib/_pydecimal.py file to optimize "import datetime" when _decimal is available -> Create Lib/_pydatetime.py file to optimize "import datetime" when _datetime is available
2020-05-28 07:03:31  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg370165
2020-05-28 01:46:36  vstinner          set     nosy: + p-ganssle
2020-05-28 00:10:17  vstinner          set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request19724
2020-05-28 00:09:57  ezio.melotti      set     nosy: + ezio.melotti
2020-05-28 00:06:58  vstinner          create