classification
Title: Add pure Python implementation of datetime module to CPython
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: amaury.forgeotdarc, belopolsky, brett.cannon, brian.curtin, daniel.urban, davidfraser, giampaolo.rodola, haypo, lemburg, mark.dickinson, merwok, pitrou, r.david.murray, rhettinger, techtonik, tim.peters
Priority: normal Keywords: patch

Created on 2010-02-22 16:22 by brian.curtin, last changed 2010-07-23 19:32 by belopolsky. This issue is now closed.

Files
File name Uploaded Description
PyPy-2.7.diff belopolsky, 2010-06-17 16:46
datetime-sandbox-pypy.diff belopolsky, 2010-06-17 20:14
issue7989-2.7-3.2.diff belopolsky, 2010-06-19 01:03
issue7989-cmp.diff belopolsky, 2010-07-02 20:24
issue5288-proto.diff belopolsky, 2010-07-02 23:38
issue7989.diff belopolsky, 2010-07-03 05:50 Patch against py3k
issue7989b.diff belopolsky, 2010-07-13 03:19
issue7989c.diff belopolsky, 2010-07-14 01:28
issue7989d.diff belopolsky, 2010-07-22 19:10
datetimetester.py belopolsky, 2010-07-23 14:23
issue7989e.diff belopolsky, 2010-07-23 16:19
Messages (91)
msg99774 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-02-22 16:22
After discussion on numerous issues, python-dev, and here at the PyCon sprints, it seems to be a good idea to move timemodule.c to _timemodule.c and convert as much as possible into pure Python. The same change seems good for datetime.c as well.
msg99801 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-02-22 18:27
By 'convert', I believe you mean 'create a python implementation of'.  That is, there's no reason to drop any c code, just to create parallel python versions when possible.

Also note that one of the alternate implementations (IronPython I think) has an implementation of DateTime in Python that they plan to contribute.
msg99804 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-02-22 18:30
Correct, your wording is better than mine.

I'll ask around and see where that datetime module may be and what its state is.
msg107171 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-06 01:35
As far as I remember, the datetime module started as a pure python module and was reimplemented in C around year 2003 or so.  One of the important additions at that time was the C API to datetime functionality.  I am afraid that with the _timemodule.c/timemodule.py split there will be more and more functionality that is awkward to access from C API.
msg107258 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-07 08:06
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> As far as I remember, the datetime module started as a pure python module and was reimplemented in C around year 2003 or so.  One of the important additions at that time was the C API to datetime functionality.  I am afraid that with the _timemodule.c/timemodule.py split there will be more and more functionality that is awkward to access from C API.

That's correct, though the main reason for rewriting the module in
C was to gain performance - this is essential for basic types like
date/time types.

-1 on undoing the C rewrite.

It would be much better to spell out the problems you mention and
provide patches to implement solutions for them.
msg107272 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-07 17:17
For the datetime module there are also a few more subtle issues that would make it difficult to make a hybrid C/Python implementation.  The problem is that unlike the time module where most of the functionality is implemented in module level functions, datetime functionality is entirely in class methods.

One may suggest to use inheritance and derive datetime classes from those implemented in _datetime.c, but this will not work because for example datetime + timedelta will return a _datetime class unless __add__ is overridden in Python.  There is another even less obvious issue with inheriting from datetime classes: see issue #5516.

Therefore, it looks like there are only two choices:

1. Replicate all functionality in both _datetime.c and datetime.py and thus double the cost of implementing new features in CPython. (OK, maybe not double because Python development is easier than C, but increase instead of decrease.)

2. Incur a performance penalty in every method because every C call will have to be wrapped in Python.

Another -1 from me.
msg107275 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-06-07 19:14
It seems like this might not be worth it or a good idea, and I have no strong feeling for this being done. Feel free to close/reject this one.
msg107292 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-08 00:28
Brett Cannon wrote in a comment (msg106498) on another issue:

"""
The stated long-term goal of the stdlib is to minimize the C extension modules to only those that have to be written in C (modules can still have performance enhancing extension back-ends). Since datetime does not meet that requirement it's not a matter of "if" but "when" datetime will get a pure Python version and use the extension code only for performance.
"""

I'll keep this open for a while to give Brett and others a chance to respond to opposition.
msg107293 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-08 00:35
Also, my opposition is only to splitting datetime.  While I am not against splitting the time module, I believe it should be phased out eventually and posix compatibility portion folded into posix module.
msg107295 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-06-08 00:58
So I see a couple of objections here to the idea that I will try to address.

First is MAL's thinking that this will undo any C code, which it won't. The idea is that stdlib modules that do not inherently rely on other C code (e.g. sqlite3 does not fall underneath this) would have a pure Python implementation with possible C enhancements. In the case of datetime that  code is done, so it won't go anywhere. In this case it would be bringing in a pure Python implementation like the one PyPy maintains. You can look at heapq if you want an existing example of what it looks like to maintain a pure Python and C version of a module.

Alexander is worried about performance because of Python shims and duplication of work. For the performance issue, there is none. If you look at something like heapq you will see that there is an `import *` in there to pull in all extension code; there is no Python shim to pass through if you don't engineer the extension that way. So in datetime's case all of the extension code would just get pulled into datetime.py directly and not have any indirection cost.

As for duplication of work, we already have that with datetime in the other Python VMs. IronPython, Jython, and PyPy have to do their own version of datetime because the stdlib doesn't provide a pure Python version for them to use. So while CPython might not be duplicating work, other people are. Besides, people typically prototype in Python anyway (including you, Alexander, for the UTC patch) and then write the C code, so there really aren't any wasted development cycles from having the Python and C version already.

The key thing to remember is that this is so the VMs other than CPython are not treated as second-class. They are legit implementations just like CPython, but because we have this legacy stuff written in C in the stdlib they are put at a disadvantage. It would be better to pool resources and make sure everyone gets to use an equivalent standard library when possible.

And I should also mention I expect the PyPy folks to be the ones to move their code over; I would never expect anyone to re-implement datetime from scratch.
msg107298 - Author: anatoly techtonik (techtonik) Date: 2010-06-08 03:51
It would be nice to see the transition accompanied by some tutorial that could be used as an example for other similar tasks.
msg107302 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-08 04:30
Brett,

Thanks for your explanation.  It looks like I misunderstood the proposal.  I thought the idea was to have some methods of e.g. date type implemented in python and some in C.  What you propose is much simpler.  Effectively, your proposal is "let's distribute a pure python implementation of datetime with CPython."  I don't have any problem with this.  Patches welcome!
msg107303 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-08 04:42
I am changing the title to match Brett's explanation better.  Note that since the time module is a thin wrapper around C library calls, it falls under the "inherently relies on C code" exception.
msg107308 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-08 08:02
Brett Cannon wrote:
> 
> Brett Cannon <brett@python.org> added the comment:
> 
> So I see a couple of objections here to the idea that I will try to address.
> 
> First is MAL's thinking that this will undo any C code, which it won't. The idea is that stdlib modules that do not inherently rely on other C code (e.g. sqlite3 does not fall underneath this) would have a pure Python implementation with possible C enhancements. In the case of datetime that  code is done, so it won't go anywhere. In this case it would be bringing in a pure Python implementation like the one PyPy maintains. You can look at heapq if you want an existing example of what it looks like to maintain a pure Python and C version of a module.

So the proposal is to have something like we have for pickle, with
cPickle being the fast version and pickle.py the slow Python one ?

Since no CPython would use the Python version, who would be supporting
the Python-only version ?
msg107309 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-08 08:20
Oops, sorry. Looks like the Roundup email interface changed the ticket
title back to the old one again (I was replying to Brett's comment under the old title).
msg107312 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-06-08 11:06
Even from pypy perspective, a pure python implementation is not ideal because it makes it difficult to implement the C API.
msg107317 - Author: anatoly techtonik (techtonik) Date: 2010-06-08 12:27
On Tue, Jun 8, 2010 at 2:06 PM, Amaury Forgeot d'Arc
<report@bugs.python.org> wrote:
>
> Even from pypy perspective, a pure python implementation is not ideal because it makes it difficult to implement the C API.

C API
must die
a shadow of Go
msg107333 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-06-08 18:09
So yes, cPickle/pickle, cStringIO/StringIO, heapq, etc. are all examples of the approach. One could choose to write the pure Python version first, profile the code, and only write extension code for the hot spots, but obviously that typically doesn't happen.

As for who maintains it, that's python-dev, just like every other module that is structured like this. When the stdlib gets more of a clear separation from CPython I suspect the other VM maintainers will contribute more.

As for PyPy not specifically needing this, that still doesn't solve the problem that Jython and IronPython have with extension code or any other future VM that doesn't try to come up with a solution for extensions.
msg107451 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-10 09:54
Brett Cannon wrote:
> 
> Brett Cannon <brett@python.org> added the comment:
> 
> So yes, cPickle/pickle, cStringIO/StringIO, heapq, etc. are all examples of the approach. One could choose to write the pure Python version first, profile the code, and only write extension code for the hot spots, but obviously that typically doesn't happen.

That's what was done for the datetime module. The pure-Python version
just never made it into the stdlib, AFAIK.

Note that we've just dropped the pure-Python version of the io package
as well, so an approach where we keep the pure-Python prototype
would be a novelty in Python land and should probably be codified
in a PEP.

> As for who maintains it, that's python-dev, just like every other module that is structured like this. When the stdlib gets more of a clear separation from CPython I suspect the other VM maintainers will contribute more.

I'm not sure whether there would be much interest in this. Unless
the core devs are also active in other VM implementations, there's
little motivation to maintain two separate implementations of the
same thing.

Users of CPython will likely only use the C version anyway, so the
pure-Python code would also get little real-life testing.

Perhaps we should open up python-dev to external VM developers
that would have to rely on those pure-Python implementations ?!

> As for PyPy not specifically needing this, that still doesn't solve the problem that Jython and IronPython have with extension code or any other future VM that doesn't try to come up with a solution for extensions.

Both Jython and IronPython could add bridges to CPython extensions
(Jython via the JNI and IronPython via unmanaged code).

Still, you're right in that it's unlikely they will move away from
being pure-Java or pure-C# implementations, so they do have a need
for such pure-Python implementations.
msg107453 - Author: STINNER Victor (haypo) * (Python committer) Date: 2010-06-10 10:48
I like the idea of a pure Python implementation of the datetime module, for different reasons:
 - it will become the reference implementation
 - other Python interpreters can use it
 - it can be used to test another implementation, eg. the current C version
 - implementing/testing a new feature is much faster in Python than in C

About the last point: I already used _pyio many times to fix a bug or to develop a new feature. _pyio helps to choose the right solution because you can easily write a short patch and so compare different solutions.

If other Python interpreters already have their own Python implementation, we can just choose the best one, and patch it to add the latest features of the C version.
msg107454 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-10 11:03
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> I like the idea of a pure Python implementation of the datetime module, for different reasons:
>  - it will become the reference implementation
>  - other Python interpreters can use it
>  - it can be used to test another implementation, eg. the current C version
>  - implementing/testing a new feature is much faster in Python than in C
> 
> About the last point: I already used _pyio many times to fix a bug or to develop a new feature. _pyio helps to choose the right solution because you can easily write a short patch and so compare different solutions.

Ah, so that's where the Python io module hides. Thanks for the pointer.
msg108019 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 14:24
I would like to move this forward.  The PyPy implementation at

http://codespeak.net/pypy/dist/pypy/lib/datetime.py

claims to be based on the original CPython datetime implementation from the time when datetime was a python module.  I looked through the code and it seems to be very similar to datetime.c.  Some docstrings and comments are literal copies.  I think it will not be hard to port that to 3.x.

I have a few questions, though.

1. I remember seeing python-dev discussion that concluded that the best way to distribute parallel C and Python implementations was to have module.py with the following:

# pure python implementation

def foo():
    pass

def bar():
    pass

# ..

try:
    from _module import *
except ImportError:
    pass

Is this still the state of the art?  What about parsing overhead?

2. Is there a standard mechanism to ensure that unit tests run both python and C code?  I believe sys.modules['_module'] = None will prevent importing _module.  Is there direct regrtest support for this?
msg108020 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-17 14:31
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> I would like to move this forward.  The PyPy implementation at
> 
> http://codespeak.net/pypy/dist/pypy/lib/datetime.py
> 
> claims to be based on the original CPython datetime implementation from the time when datetime was a python module.  I looked through the code and it seems to be very similar to datetime.c.  Some docstrings and comments are literal copies.  I think it will not be hard to port that to 3.x.
> 
> I have a few questions, though.
> 
> 1. I remember seeing python-dev discussion that concluded that the best way to distribute parallel C and Python implementations was to have module.py with the following:
> 
> # pure python implementation
> 
> def foo():
>     pass
> 
> def bar():
>     pass
> 
> # ..
> 
> try:
>     from _module import *
> except ImportError:
>     pass
> 
> Is this still the state of the art?  What about parsing overhead?

That approach was used for modules where the C bits replaced the Python
ones. The Python bits were then typically removed altogether.

To avoid the wasted memory and import time, it's better to use:

try:
    from _cmodule import *
except ImportError:
    from _pymodule import *

> 2. Is there a standard mechanism to ensure that unit tests run both python and C code?  I believe sys.modules['_module'] = None will prevent importing _module.  Is there direct regrtest support for this?

Why not import the two modules directly ?

import _cmodule as module
and
import _pymodule as module
msg108021 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-17 14:32
> Is this still the state of the art?  What about parsing overhead?

The io module has three modules:
- io.py just imports everything from _io
- _io is the default C implementation
- _pyio.py must be imported explicitly to get the pure Python implementation

=> no parsing overhead for the default case of importing the C implementation

> Is there direct regrtest support for this?

You can take a look at test_io, test_memoryio or test_heapq for inspiration.
msg108023 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 14:45
On Thu, Jun 17, 2010 at 10:31 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
..
> To avoid the wasted memory and import time, it's better to use:
>
> try:
>    from _cmodule import *
> except ImportError:
>    from _pymodule import *
>

Hmm, I cannot find the relevant thread, but I thought this was rejected at some point.  Personally, I don't like this at all for the following reasons:

1. This introduces two _.. names instead of one.

2. This departs from established convention that C (or native) implementation for modulename is in _modulename, not _cmodulename.  Non-C implementations may still provide native _modulename, but would not want to call it _cmodulename.

3. Hiding python code in _pymodule makes it harder to find it.

..
> Why not import the two modules directly ?
>
> import _cmodule as module
> and
> import _pymodule as module
>

Because this requires having two modules in the first place.
msg108024 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 14:49
> To avoid the wasted memory and import time, it's better to use:
>
> try:
>    from _cmodule import *
> except ImportError:
>    from _pymodule import *
>

.. also this makes it harder to prototype things in Python or have mixed Python/C modules.  The goal is to use Python implementation unless native implementation exists on per function/class basis.  The syntax above makes it all or nothing.
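A minimal sketch of the per-name behaviour of the trailing star import, assuming a hypothetical accelerator module _example that implements only foo(). The accelerator is faked here with types.ModuleType and sys.modules purely for illustration; a real one would be a C extension. After the star import, foo comes from the accelerator while the pure-Python bar stays in place, which is the mix-and-match property the all-or-nothing alternative loses.

```python
import sys
import types

# Fake "native" accelerator (hypothetical): it only provides foo().
_accel = types.ModuleType("_example")
exec("def foo():\n    return 'native foo'\n", _accel.__dict__)
sys.modules["_example"] = _accel

# Pure-Python definitions for every public name of the hypothetical module.
def foo():
    return "python foo"

def bar():
    return "python bar"

# Fallback idiom: names the accelerator defines replace the Python ones;
# anything it lacks (bar here) keeps its Python definition.
try:
    from _example import *
except ImportError:
    pass
```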
msg108025 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-17 14:55
I think we have no standard for this yet, though it has been discussed.  If you can't find a python-dev thread about it, you should probably start a new one.

As one example, heapq does:

  try:
      from _heapq import *
  except ImportError:
      pass

after having defined the Python versions.  This does not incur parsing overhead in most real-world situations, since most distributions generate the .pyc files during install, but it does incur the execution overhead on first import.

On the other hand, io doesn't fall back to _pyio at all (perhaps this is a bug).

As for the tests, the way this is typically done is that you define a base test class that is *not* a TestCase, and then you define two subclasses that are TestCases and mix in the base class.  You then assign the appropriate module (or function or whatever) under test as attributes of the subclasses, and the base class uses those attributes to run the tests.  That way you know all the tests are run for both the Python and the C implementation.
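A minimal sketch of that mixin pattern. The two clamp functions are hypothetical stand-ins for the Python and C implementations, and the names py_clamp, alt_clamp, ClampTestsMixin and run_all are made up for illustration. Because a plain function stored as a class attribute would be bound to the test instance, each concrete class wraps it in staticmethod.

```python
import unittest


# Two hypothetical implementations of the same function; in practice one
# would come from the pure-Python module and one from the C accelerator.
def py_clamp(x, lo, hi):
    return max(lo, min(x, hi))


def alt_clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


class ClampTestsMixin:
    """Shared tests. Not a TestCase, so runners never collect it on its own."""

    clamp = None  # each concrete subclass supplies the implementation under test

    def test_inside_range(self):
        self.assertEqual(self.clamp(5, 0, 10), 5)

    def test_outside_range(self):
        self.assertEqual(self.clamp(-3, 0, 10), 0)
        self.assertEqual(self.clamp(42, 0, 10), 10)


class PyClampTests(ClampTestsMixin, unittest.TestCase):
    clamp = staticmethod(py_clamp)


class AltClampTests(ClampTestsMixin, unittest.TestCase):
    clamp = staticmethod(alt_clamp)


def run_all():
    """Run both concrete classes and return the TestResult."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(cls) for cls in (PyClampTests, AltClampTests))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Every test in the mixin therefore runs once per implementation; adding a third implementation is one more three-line subclass.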
msg108026 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-17 14:59
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> On Thu, Jun 17, 2010 at 10:31 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
> ..
>> To avoid the wasted memory and import time, it's better to use:
>>
>> try:
>>    from _cmodule import *
>> except ImportError:
>>    from _pymodule import *
>>
> 
> Hmm, I cannot find the relevant thread, but I thought this was rejected at some point.  Personally, I don't like this at all for the following reasons:
> 
> 1. This introduces two _.. names instead of one.
> 
> 2. This departs from established convention that C (or native) implementation for modulename is in _modulename, not _cmodulename.  Non-C implementations may still provide native _modulename, but would not want to call it _cmodulename.
> 
> 3. Hiding python code in _pymodule makes it harder to find it.

Well, you wanted to have two implementation of the same thing in the
stdlib :-) I personally don't think that's a good idea. We've had
trouble in the past of keeping pickle.py and cPickle.c in sync, it's
not going to be much different with those two datetime implementations.

In any case, we shouldn't make regular CPython use of datetime slower
and use more memory, just to make life easier for PyPy.

>> Why not import the two modules directly ?
>>
>> import _cmodule as module
>> and
>> import _pymodule as module
>>
> 
> Because this requires having two modules in the first place.

Where's the problem ? Disk space ?
msg108027 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 15:00
On Thu, Jun 17, 2010 at 10:32 AM, Antoine Pitrou <report@bugs.python.org> wrote:
..
>> Is there direct regrtest support for this?
>
> You can take a look at test_io, test_memoryio or test_heapq for inspiration.
>

I looked at test_io and don't like that approach.  It seems to require subclassing each TestCase twice for C and Python.  There is no mechanism to assure that all tests are replicated that way.
msg108028 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-17 15:02
> I looked at test_io and don't like that approach.  It seems to require
> subclassing each TestCase twice for C and Python.  There is no
> mechanism to assure that all tests are replicated that way.

Subclassing /is/ the mechanism :)
Furthermore, some rare tests are Py-specific and some rare others are
C-specific: you want specific test classes for them anyway.

The only alternative is to manually duplicate tests, which leads to very
poor test coverage because of the average developer's laziness (json is
an example).
msg108029 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-06-17 15:03
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
>> To avoid the wasted memory and import time, it's better to use:
>>
>> try:
>>    from _cmodule import *
>> except ImportError:
>>    from _pymodule import *
>>
> 
> .. also this makes it harder to prototype things in Python or have mixed Python/C modules.  The goal is to use Python implementation unless native implementation exists on per function/class basis.  The syntax above makes it all or nothing.

Why ?

You can have the Python parts that are used by both implementation
defined in the datetime.py module.

Alternatively, you could write:

try:
    # Use the faster C version
    from _module import *
except ImportError:
    # Use Python
    class datetime:
        ...

I find that rather ugly, though.
msg108030 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 15:35
> The only alternative is to manually duplicate tests, which leads to very
> poor test coverage because of the average developer's laziness (json is
> an example).

No, here is another alternative:


==> _example.py <==
def foo():
    print(__name__)

==> example.py <==
def foo():
    print(__name__)
try:
    from _example import *
except ImportError:
    pass

==> test_example.py <==
import sys
sys.modules['_example'] = None
import example
example.foo()
del sys.modules['_example']
import _example as example
example.foo()

With the code above,

$ ./python.exe test_example.py
example
_example


If we move import to setUp(), we can run each test case twice: with and without native code.  Tests that are specific to one implementation can be run once or skipped conditionally on per test method basis.
msg108035 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 16:46
Porting PyPy implementation to 2.7 was fairly easy.  I am posting the patch which makes PyPy datetime.py pass regression tests when dropped in the trunk.

I expect 3.x port to be uneventful as well.  Raising the priority because I would like to check this in before other datetime feature requests.
msg108037 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-17 17:48
I don't see how "moving the import to setUp" is going to avoid having to explicitly run each set of tests twice, though.
msg108047 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-06-17 18:54
A couple of things about all of this.

One, we should not blindly pull in the PyPy code without some core PyPy developer being in on this; just common courtesy and I don't think anyone participating in this discussion is a PyPy developer (but I could be wrong).

Two, as David pointed out, parsing overhead is a pretty minor thing to be worrying about thanks to bytecode. The import * solution at the end of the main file is the agreed-upon approach (it has been discussed at some point).

Three, for testing you can also look at test_warnings (the creation of _warnings led to the discussion of best practices for all of this). Basically you use test.support.import_fresh_module to get the pure Python version and the C-enhanced one, write your tests with the module being tested set on the class, and then subclass with the proper modules as a class attribute. I understand your worry, Alexander, about accidentally missing a test class for a module, but in practice that will be rare as people will be watching for that, and you just do the subclass first. There is in practice no need to get too fancy, and you have to make sure your tests are discoverable anyway by test runners that simply look for classes that inherit from unittest.TestCase.
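A rough sketch of the same idea using only importlib and sys.modules, applied to heapq. fresh_import is a hypothetical helper written for illustration in the spirit of test.support.import_fresh_module; it is not that helper's actual code or signature. Blocking _heapq with a None entry in sys.modules makes the module's trailing `from _heapq import *` fail, so the pure-Python functions survive; sys.modules is restored afterwards.

```python
import importlib
import sys
import unittest


def fresh_import(name, blocked=()):
    """Import a fresh copy of `name`, treating modules in `blocked` as absent.

    Hypothetical helper; the previous sys.modules entries are restored.
    """
    saved = {n: sys.modules.pop(n, None) for n in (name, *blocked)}
    try:
        for n in blocked:
            sys.modules[n] = None  # a None entry makes "import n" raise ImportError
        return importlib.import_module(name)
    finally:
        for n, mod in saved.items():
            if mod is None:
                sys.modules.pop(n, None)
            else:
                sys.modules[n] = mod


py_heapq = fresh_import("heapq", blocked=("_heapq",))  # pure-Python functions only
c_heapq = fresh_import("heapq")                        # picks up _heapq if present


class HeapTests:
    module = None  # set on each concrete subclass

    def test_pop_order(self):
        heap = []
        for x in [5, 1, 4, 2, 3]:
            self.module.heappush(heap, x)
        popped = [self.module.heappop(heap) for _ in range(5)]
        self.assertEqual(popped, [1, 2, 3, 4, 5])


class PyHeapTests(HeapTests, unittest.TestCase):
    module = py_heapq


class CHeapTests(HeapTests, unittest.TestCase):
    module = c_heapq
```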

Best alternative you could do is a metaclass that searches for tests that start with 'test' *and* take an argument for the module to test, and then auto-create methods that take *no* arguments and then call the test methods directly (after renaming them so that test runners don't try to use them). Then you can pass in the modules to test as arguments to the metaclass.
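One way the metaclass idea could look, as a hedged sketch rather than an established recipe: ParametrizeByModule and the modules= class keyword are hypothetical, and the two SimpleNamespace objects stand in for the Python and C modules. Each test_xxx(self, mod) is renamed to _check_xxx (so runners ignore it) and replaced by one zero-argument test per module.

```python
import types
import unittest


def _bind(func, module):
    # Separate factory so each generated test captures its own module.
    def test(self):
        return func(self, module)
    return test


class ParametrizeByModule(type):
    """Hypothetical metaclass: expand test_xxx(self, mod) into one
    zero-argument test per (label, module) pair given via `modules`."""

    def __new__(mcls, name, bases, namespace, modules=()):
        for attr, func in list(namespace.items()):
            if attr.startswith("test") and callable(func):
                # Rename the original so runners don't call it without a module.
                del namespace[attr]
                namespace["_check" + attr[len("test"):]] = func
                for label, module in modules:
                    namespace[f"{attr}_{label}"] = _bind(func, module)
        return super().__new__(mcls, name, bases, namespace)

    def __init__(cls, name, bases, namespace, modules=()):
        # Swallow the extra keyword so type.__init__ never sees it.
        super().__init__(name, bases, namespace)


# Hypothetical stand-ins for the pure-Python and C modules.
py_impl = types.SimpleNamespace(clamp=lambda x, lo, hi: max(lo, min(x, hi)))
alt_impl = types.SimpleNamespace(
    clamp=lambda x, lo, hi: lo if x < lo else (hi if x > hi else x))


class ClampTests(unittest.TestCase, metaclass=ParametrizeByModule,
                 modules=[("py", py_impl), ("alt", alt_impl)]):

    def test_inside(self, mod):
        self.assertEqual(mod.clamp(5, 0, 10), 5)

    def test_outside(self, mod):
        self.assertEqual(mod.clamp(-3, 0, 10), 0)
        self.assertEqual(mod.clamp(42, 0, 10), 10)
```

The trade-off is visible here: a missing subclass can no longer happen, but the generated names (test_inside_py, test_inside_alt, and so on) only exist after class creation, which makes the tests harder to read and to grep for.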
msg108048 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-06-17 19:12
> One, we should not blindly pull in the PyPy code 
> without some core PyPy developer being in on this

I concur.  Much of PyPy code is written for a restricted subset of Python instead of clean, idiomatic modern Python.

Also, this should not be marked as high priority.  It may be a personal priority for you, but it is by no means essential for Py3.2 or something that other developers should prioritize higher than other tasks like fixing bugs.
msg108051 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 19:31
> Also, this should not be marked as high priority.  It may be a
> personal priority for you, ...

Reverting priority.  I thought once an issue is assigned, the priority becomes the priority that assignee places on the issue. Sorry for the confusion.
msg108053 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 19:37
>> One, we should not blindly pull in the PyPy code
>> without some core PyPy developer being in on this
>
> I concur.  Much of PyPy code is written for a restricted subset of
> Python instead of clean, idiomatic modern Python.

Raymond, I think you misread Brett's comment the same way as I did when I first saw it.  Brett wrote "core PyPy developer", not "core CPython developer".  Of course this will go through normal patch review process and will be looked at by at least two cpython developers before it goes in.  I also agree with Brett that it would be great to get input from PyPy developers and they may see benefit from 2.7 and 3.x ports of their code.  I am just not familiar with PyPy community and will have to research how to approach them.
msg108055 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-06-17 19:47
I would simply email their developer mailing list (find it at http://pypy.org/contact.html) and say that you are willing to work on this. Maciej and I have discussed this before, so this won't be a total shock to them.

As for Raymond's comment, I think he understood what I meant. What he is worried about is that datetime as PyPy has implemented it is done in RPython which is a custom subset of Python, and not a "normal" Python implementation. But if they have simply been maintaining the pure Python version that Tim wrote way back in the day then I suspect it's not in RPython.
msg108056 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-17 20:14
I am attaching datetime-sandbox-pypy.diff, a plain diff between six-year-old sandbox and pypy versions.  (Plain diff is cleaner than unified diff.)

You can see that the differences are trivial.  I notice, however, that the original datetime implementation was returning subclass instances from operations on datetime subclass instances.  Brett, this is off-topic here, but I would appreciate your take on msg107410.

BTW, in order to preserve history, it may be a good idea to develop this in a branch off datetime sandbox and merge it back when ready.
msg108059 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-06-17 20:23
I would not worry about the history too much; the code has been forked and pulling it back in means there is already some history missing. Just do what is easiest.
msg108089 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-06-18 09:16
>> One, we should not blindly pull in the PyPy code
>> without some core PyPy developer being in on this

You can count me among the PyPy developers.

> I concur.  Much of PyPy code is written for a restricted subset of
> Python instead of clean, idiomatic modern Python.

Not this part. The module datetime.py is meant to be imported by the interpreter and has no such limitations (we call it "application-level" code, as opposed to interpreter-level code, which is translated to C and indeed has serious constraints).
msg108090 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-06-18 09:18
If both implementations can exist in the same interpreter, how will they cooperate?
For example, Time instances created with datetime.py won't pass PyTime_Check().
msg108122 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-06-18 18:24
I've been thinking about this feature request and am starting to question whether it is necessary.  Usually we need a pure python version when 1) the python module is developed first, with the subsequent C code needing to match, or 2) we expect a porting issue, or 3) for pedagogical purposes (e.g. showing how heaps work).

For example, sets.py preceded setobject.c and ultimately we dropped sets.py.  In the case of heapq.py, it was kept because of its teaching value and because some other implementations like Jython used it.  In other cases like collections.deque, the pure python version is maintained off-line in an ASPN recipe and we may provide a link to it in the docs.

For the datetime module, I don't think we get much value from having pure python and C versions in the distribution.  The semantics were worked out a long time ago, the algorithms aren't interesting, and other implementations already have their own conforming versions.  ISTM a pure python version in our standard distribution would never be run and would rarely be looked at.

While it may seem like a cool thing to do, no one has actually requested
this "feature" (I use quotes here because no new functionality is added).  The addition would be mostly harmless but it would increase the maintenance burden (I know because I've actively maintained pure python equivalents for itertools and it has been a PITA).

If this is a step in your development process, I recommend keeping it in a sandbox or publishing it on PyPI.

If we were to invest some efforts in writing pure python equivalents, I would like to see the docs include an equivalent of str.split() whose behavior is difficult to fully and correctly describe in plain English.  In contrast, the availability of a pure python version of datetime wouldn't add much that isn't already covered in the docs.
msg108124 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-18 18:39
Let me just add a story to show how an alternate python implementation may be useful for users.  As I was porting datetime.py to 3.x, I saw many failures from pickle tests.  It was not easy to figure out what was going on because C pickle code was calling buggy Python and pdb was unable to trace the full chain of calls.  To work around that, I added sys.modules['_pickle'] = None to my test run and there you go - the problem was found in minutes.   I am sure that someone debugging his tzinfo implementation, for example, may find datetime.py easier to work with.
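The `sys.modules['_pickle'] = None` trick works because the import system treats a `None` entry in `sys.modules` as "this module cannot be imported" and raises ImportError, so a module that tries its C accelerator first falls back to its pure Python code. A minimal sketch of a reusable helper, assuming the module follows that try/except ImportError pattern (CPython's `pickle.py` does so with `_pickle`); the name `import_without_accelerator` is hypothetical:

```python
import importlib
import sys


def import_without_accelerator(name, accelerator):
    """Return a fresh import of module `name` with `accelerator` blocked.

    A None entry in sys.modules makes `import accelerator` raise
    ImportError, so a module written as
    `try: from accelerator import ... except ImportError: ...`
    falls back to its pure Python definitions.
    """
    missing = object()
    saved = {key: sys.modules.get(key, missing) for key in (name, accelerator)}
    sys.modules.pop(name, None)      # force `name` to be executed again
    sys.modules[accelerator] = None  # block the C accelerator
    try:
        return importlib.import_module(name)
    finally:
        # Restore sys.modules so the rest of the program keeps using
        # whatever was imported before (normally the C-backed module).
        for key, module in saved.items():
            if module is missing:
                sys.modules.pop(key, None)
            else:
                sys.modules[key] = module


# A private copy of pickle whose Pickler is the pure Python class,
# which pdb can step through.
pure_pickle = import_without_accelerator("pickle", "_pickle")
```

The returned module is a separate copy, so it can be debugged side by side with the normal C-backed `pickle`.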

The story may be a bit self-serving: I was against this "feature" myself, but now I see enough use in it that I am actually working on it.

Yes, the work is in the sandbox, but I want to have py3k working version before I announce it.
msg108126 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-18 18:49
My understanding is that the desire to have pure python versions of standard library modules is to decouple the standard library from dependency on CPython as far as practical.  Perhaps all existing Python implementations have dealt with datetime somehow, but what about new implementations?  Having a pure Python version would help developers of new implementations get started, even if they later decided to reimplement it more natively (as CPython has done).
msg108129 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-06-18 19:34
> My understanding is that the desire to have pure 
> python versions of standard library modules is 
> to decouple the standard library from dependency 
> on CPython as far as practical.

That is a bit of a broad brush.  I do not know
of an approved project to give all C modules
a pure python equivalent (in fact modules like
pickle presented a long-term maintenance problem
in that the two versions differed and modules
like sets.py were dropped entirely).

Looking at svn.python.org/view/python/trunk/Modules/
I see some which are good candidates and many which
aren't.  We need to show judgment on which one to do
and recognize that maintaining dual code is a 
heavy maintenance burden and only do so where
there is a clear value add.

In my judgment, something like str.split() would
benefit quite a bit from a pure python equivalent
because its spec is somewhat opaque and hidden
in C code and because both docs and the test 
coverage are incomplete.

In contrast, I believe that dual code for datetime 
is a net loss.  There is a reason that Uncle Timmy 
didn't put it in in the first place.  

Also, for those who haven't tried it before, it is
not always easy to get good pure python equivalents
(e.g. C iterators check their arguments when first
called and Python versions do their argument checking
when next() is first called; C functions sometimes
do interesting keyword argument handling that cannot be 
done in pure python; and pure python versions differ
in their tracebacks).  Also, it's easy to make a mistake
and misspecify the pure python version.  You're relying
on the test suite to catch all semantic differences.
msg108158 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-19 01:03
I was probably misled by Brett's assertion that 'it's not a matter of "if" but "when" datetime will get a pure Python version.' (msg106498)  It looks like this is not a universally shared opinion.

I am not ready to form a final opinion on datetime.py.  I have ported it to 3.2 to the point where it passes the regression tests, but did not attempt to clean up the code in any way or match the C implementation on the level of doc strings and error messages.  I am attaching the diff between PyPy-2.7 and the 3.2 port as issue7989-2.7-3.2.diff here. You can find the full source in the sandbox/py3k-datetime at r82083.  I think having a working implementation will help in making a decision here.

Here are some random thoughts based on the experience with datetime.py.  

The datetime module has seen very little development in the last six years.  Tracker RFEs and bug reports languished for years while people ranted about how much better other languages handle date/time than Python.  Python-dev discussions would run into dozens of posts with the inevitable conclusion that the situation is a mess and cannot be fixed.

It is possible that one of the reasons for the current state of affairs is that people with the problem domain expertise did not have C skills, and people with the requisite C skills were conditioned by the C approach to time, which is an even bigger mess than what we have.  I cannot rule out that if datetime.py were easily available, we would see more patches proposed and more informed discussions about desired features.

Raymond argues that datetime documentation is good enough and python implementation will not add to it.  I disagree.  Consider this passage from tzinfo documentation: "When a datetime object is passed in response to a datetime method, dt.tzinfo is the same object as self. tzinfo methods can rely on this, unless user code calls tzinfo methods directly."  Is this as clear as the following code that makes use of this?

    def fromutc(self, dt):
        ...
        if dt.tzinfo is not self:
            raise ValueError("dt.tzinfo is not self")

Documentation for the datetime module is indeed extensive.  The reST file is over 1700 lines long.  This is comparable to about 1900 lines in datetime.py (not counting a long treatise on timezone calculations at the end of the file).  It may be easier to find an answer in the code than in the documentation.  After all, you cannot step through documentation in a debugger.
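To see the documented invariant at work, here is a sketch of a fixed-offset `tzinfo` whose `fromutc()` carries the same guard. `FixedOffset` is a hypothetical class (later Pythons ship `datetime.timezone` for this job); the example relies on the documented behaviour that `datetime.astimezone(tz)` calls `tz.fromutc()` with a datetime whose `tzinfo` is already `tz`:

```python
from datetime import datetime, timedelta, timezone, tzinfo


class FixedOffset(tzinfo):
    """Hypothetical zone with a constant UTC offset and no DST."""

    def __init__(self, minutes, name):
        self._offset = timedelta(minutes=minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name

    def fromutc(self, dt):
        # datetime methods always pass a datetime whose tzinfo is this very
        # object; anything else means user code called fromutc() directly.
        if not isinstance(dt, datetime):
            raise TypeError("fromutc() requires a datetime argument")
        if dt.tzinfo is not self:
            raise ValueError("dt.tzinfo is not self")
        return dt + self._offset  # keeps tzinfo=self


est = FixedOffset(-5 * 60, "EST")
noon_utc = datetime(2010, 7, 2, 12, 0, tzinfo=timezone.utc)
local = noon_utc.astimezone(est)  # astimezone() calls est.fromutc()
print(local)  # 2010-07-02 07:00:00-05:00
```

Calling `est.fromutc()` directly on a naive datetime trips the guard with ValueError, which is the "unless user code calls tzinfo methods directly" case from the documentation.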

I am still between -0 and +0 about including datetime.py in the main tree.  For my own development purposes, having the sandbox version is adequate, and maintaining it there is easier than in-tree.  It would be great, however, if this discussion led to clear guidelines about when parallel C and Python implementations are desired and how to maintain such arrangements.
msg108179 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-19 13:18
Brett's assertion comes from the decision made at the language summit at the last pycon. Which does not negate Raymond's assertion that there may be more important stuff to pythonize. However, Alexander is maintaining datetime, and if he wishes to do the Python version there is no reason I see for him not to do it.
msg108553 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-06-24 22:04
Why do we need a pure Python version of datetime?
 - it's easier to work on a new feature in Python first: more people are able to write Python code than C code (especially C code using the Python C API)
 - it helps datetime debugging
 - it helps other Python implementations (not only PyPy)
 - it improves the quality of the tests and so of the C version

I think that the first point is the most important, since datetime still lacks many features and is far from perfect.

I don't think that the pure Python implementation should be used by default: the current C implementation should stay because it's faster and many people use it. I don't know the best name for the Python version, maybe pydatetime.py (or _pydatetime.py).

--

Questions:

 - @Alexander: have you contacted people from IronPython and Jython?
 - Is the Python version compatible with the C version with respect to serialization (pickle)?

--

r.david.murray> there may be more important stuff to pythonize

You cannot force other developers to work on a specific topic since Python is only developed by hackers in their free time. If Alexander would like to work on this, he should do it :-)
msg108624 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-25 19:44
Victor: that was exactly the point of my post that you partially quoted :)
msg108643 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-25 23:02
Adding tim_one to the nosy list.

Tim,

It would be great if you could shed some light on the history behind pure python implementation.  Why was it developed in the first place?  What was the reason not to ship it with python?

Thanks.
msg108645 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-06-25 23:11
> It would be great if you could shed
> some light on the history behind pure
> python implementation.  Why was it
> developed in the first place?

It was rapid prototyping - design decisions were changing daily, and it goes a lot faster to change Python code than C code.

> What was the reason not to ship it
> with python?

Didn't want to create new ongoing maintenance burdens.  Multiple implementations eventually drift out of synch, and at the time we had had enough of that already wrt, e.g., pickle vs cPickle.

Sorry, nothing deep here ;-)
msg108648 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-25 23:29
Tim, thanks for your prompt reply.

What would be your opinion on adding datetime.py to the main python tree today?

There is momentum behind several features to be added to datetime module and having easily accessible prototype would give similar benefits to those you had during original design.

It is hard for me to judge the significance of the maintenance burden, but others reported that having parallel versions of the io module was helpful.  I believe that with proper support in the regression test suite, it should be quite manageable.  If contributors are encouraged to do the python version of new features first, get full test coverage and then do the C implementation, it may lead to higher quality contributions.
msg108649 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-25 23:32
> It is hard for me to judge the significance of maintenance burden, but
> others reported that having parallel versions of the io module was
> helpful.  I believe that with proper support in the regression test
> suite, it should be quite manageable.

For io, we find this quite manageable indeed, although it is quite more
complex and quirkier than datetime.
msg108666 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-26 00:50
> For io, we find this quite manageable indeed, although it is quite more
> complex and quirkier than datetime.

I don't understand how something being "more complex and quirkier," can make it more "manageable."

While admittedly simple, datetime has its share of quirks.  (See issue 5516, for example.)  Most algorithms are simple, but ord2ymd, for one, is quite instructive.  Another example is datetime.fromutc.  If there is a person other than Tim who understands how it works, I have not met him or her yet!
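For readers who have not seen it, the kind of ordinal-to-date conversion meant by ord2ymd can be sketched as follows: peel off 400-, 100-, 4- and 1-year blocks of the proleptic Gregorian calendar, handle the extra leap day at the end of a block, then locate the month from an estimate. This is written in the spirit of the datetime.py approach, using the same ordinal convention as `date.toordinal()` (0001-01-01 is day 1), but it is a sketch rather than a copy of that code:

```python
_DI400Y = 146097  # days in 400 years (97 leap days)
_DI100Y = 36524   # days in 100 years whose century year is not a leap year
_DI4Y = 1461      # days in 4 years (one leap day)

# Index 0 is a placeholder so month numbers index the lists directly.
_DAYS_IN_MONTH = [None, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
_DAYS_BEFORE_MONTH = [None, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def ord2ymd(n):
    """Convert a proleptic Gregorian ordinal to (year, month, day)."""
    n -= 1  # 0-based day count from 0001-01-01
    n400, n = divmod(n, _DI400Y)
    n100, n = divmod(n, _DI100Y)
    n4, n = divmod(n, _DI4Y)
    n1, n = divmod(n, 365)
    year = n400 * 400 + n100 * 100 + n4 * 4 + n1 + 1
    if n1 == 4 or n100 == 4:
        # The leap day that ends a 4-year or 400-year block:
        # it is December 31 of the previous year.
        return year - 1, 12, 31

    # n is now the 0-based day of the year.
    leapyear = n1 == 3 and (n4 != 24 or n100 == 3)
    month = (n + 50) >> 5  # estimate: exact or one too high
    preceding = _DAYS_BEFORE_MONTH[month] + (month > 2 and leapyear)
    if preceding > n:
        month -= 1
        preceding -= _DAYS_IN_MONTH[month] + (month == 2 and leapyear)
    return year, month, n - preceding + 1
```

For example, `ord2ymd(719163)` gives `(1970, 1, 1)` and `ord2ymd(730179)` gives `(2000, 2, 29)`; the results can be cross-checked against `date.fromordinal()`.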

Note that we are not talking about writing something from scratch or bringing a 3rd party module into the python library.  Tim's prototype needed almost no changes to pass the 2.7 test suite and very few to get ported to 3.x.  The sandbox version is currently at r82218, a dozen small changesets away from Tim's version of 6 years ago.

The only remaining task is to make the test suite run tests on both modules.  This is not entirely trivial, but it appears that test_datetime was originally designed to make the tested classes replaceable.  Some housecleaning is needed to make it work, but I think it is doable.
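One common way to make a single test suite exercise both implementations is a mixin that holds the tests, plus one `TestCase` subclass per implementation. A sketch using `pickle`, whose pure Python functions remain reachable in CPython under the private names `pickle._dumps` and `pickle._loads`; the class names are hypothetical, and for datetime the two subclasses would instead take their classes from a C-backed and a pure Python copy of the module:

```python
import pickle
import unittest


class RoundTripTests:
    # Mixin holding the shared tests. It is not a TestCase itself, so the
    # test loader only collects it through the concrete subclasses below.
    dumps = None
    loads = None

    def test_round_trip(self):
        data = {"when": (2010, 7, 2), "tags": ["datetime", "pypy"]}
        self.assertEqual(self.loads(self.dumps(data)), data)


class PurePythonPickleTests(RoundTripTests, unittest.TestCase):
    # Pure Python code path (CPython-private names).
    dumps = staticmethod(pickle._dumps)
    loads = staticmethod(pickle._loads)


class AcceleratedPickleTests(RoundTripTests, unittest.TestCase):
    # Uses the C accelerator when _pickle is available.
    dumps = staticmethod(pickle.dumps)
    loads = staticmethod(pickle.loads)
```

Running it with `python -m unittest` executes `test_round_trip` once per implementation, so a behavioural difference shows up as a failure in exactly one of the two classes.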

I have not checked the C code coverage yet, but python code has 100% coverage as reported by trace module.  (Well, almost - there is exactly one line not covered and it actually contains a bug introduced by pypy!)

In any case, I do intend to maintain datetime.py while I am maintaining datetimemodule.c.  I will use it to prototype new features before implementing them in C.  I think having it in the main tree will increase its visibility and lower the barriers for future contributors.

Let's get back to this discussion after 2.7 is out and more developers can focus on 3.2.
msg108682 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-06-26 02:35
> What would be your opinion on adding
> datetime.py to the main python tree
> today?

The funny thing is I can't remember why we bothered creating the C version - I would have been happiest leaving it all in Python.

Provided the test suite ensures the versions remain bug-compatible ;-), +0 from me for adding the Python version to the distro.
msg108687 - (view) Author: anatoly techtonik (techtonik) Date: 2010-06-26 07:23
About the importance of having maintainable pure libraries with speedups for them: believe it or not, the only reason Python 2.x did not get an RFC 3339 implementation in the standard library is that the datetime module is in C. I hope everybody understands the importance of RFC 3339 nowadays.

About maintenance of C vs Python modules: Mercurial and Dulwich have a notion of optional "speedups" for pure Python modules, and that is the way to go for the standard reference Python implementation. Such separation serves these projects very well, especially on Windows, where there is often no compiler installed with which to rebuild C code after inserting debug statements.

To ease maintenance, I think only critical sections should be delegated to speedups, there should be 100% test coverage for both execution routes, and performance benchmarks should be available out of the box (i.e. developers should not have to think about how to compare code coverage or measure performance).
msg108697 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-26 09:43
On Saturday, 26 June 2010 at 00:50 +0000, Alexander Belopolsky wrote:
> > For io, we find this quite manageable indeed, although it is quite more
> > complex and quirkier than datetime.
> 
> I don't understand how something being "more complex and quirkier,"
> can make it more "manageable."

I don't understand what you are arguing about. What is supposed to be
"more manageable"?
msg108743 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-26 19:35
Alexander and Antoine, you are talking past each other.

Alexander: Antoine was trying to point out that the fact that io is quirky has not impacted their ability to maintain parallel versions significantly.  So if datetime is less quirky (and it probably is despite the quirks you pointed out), then it should be even easier to maintain parallel versions of it.
msg108747 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-26 20:10
I see.  I misunderstood AP's "although" as "however", but he probably meant "even though" or "in spite of the fact that."

Antoine, can I count you as "+1"?

In any case, my threshold for moving this forward is for someone to review the code in sandbox.

Here is a convenient link to the code:

http://svn.python.org/view/*checkout*/sandbox/branches/py3k-datetime/datetime.py
msg109058 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-01 16:09
> In any case, my threshold for moving this forward is for someone to
> review the code in sandbox.

Ok some comments:

- I find the _cmp() and __cmp() indirection poor style in 3.x, especially when you simply end up comparing self._getstate() and other._getstate() (it is also suboptimal because it can do more comparisons than needed)

- Shouldn't __eq__ and friends return NotImplemented if the other type mismatches, to give the other class a chance to implement its own comparison method? That's what built-in types do, at least
(this would also make _cmperror useless)

- Using assert to check arguments is bad. Either there's a risk of bad input, and you should raise a proper error (for example ValueError), or there is none and the assert can be left out.

- Starting _DAYS_IN_MONTH with a None element and then iterating over _DAYS_IN_MONTH[1:] looks quirky

- Using double-underscored names such as __day is generally discouraged; single-underscored names (e.g. _day) should be preferred

- Some comments about "CPython compatibility" should be removed

- Some other comments should be reconsidered or removed, such as "# XXX The following should only be used as keyword args" or "XXX Buggy in 2.2.2"

- Some things are inconsistent: date uses bytes for pickle support, time uses str for the same purpose
msg109061 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-01 16:42
Thanks a lot for the review.  Please see my replies below.

On Thu, Jul 1, 2010 at 12:09 PM, Antoine Pitrou <report@bugs.python.org> wrote:
..
> - I find the _cmp() and __cmp() indirection poor style in 3.x,
> especially when you simply end up comparing self._getstate() and
> other._getstate() (it is also suboptimal because it can do more
> comparisons than needed)
>

I agree.  Do you think I should just define __lt__ and use the functools.total_ordering decorator?  Note that the current implementation mimics what is done in C, but I think python should drive what is done in C and not the other way around.

> - Shouldn't __eq__ and friends return NotImplemented if the other type
> mismatches, to give the other class a chance to implement its own
> comparison method? that's what built-in types do, as least
> (this would also make _cmperror useless)

This is a tricky part.  See issue #5516.  I would rather not touch it unless we want to revisit the whole comparison design.

>
> - Using assert to check arguments is bad. Either there's a risk of bad
> input, and you should raise a proper error (for example ValueError),
> or there is none and the assert can be left out.
>

I disagree.  Asserts as executable documentation are good.  I know -O is disfavored in python, but you can still use it to disable asserts.  Also, I believe most of the asserts are the same in the C version.

> - Starting _DAYS_IN_MONTH with a None element and then iterating over
> _DAYS_IN_MONTH[1:] looks quirky
>
Would you rather start with 0 and iterate over the whole list?  It may be better to just define it as a literal list display.  That's what C code does.

> - Using double-underscored names such as __day is generally
> discouraged, single-underscored names (e.g. _day) should be preferred
>

I think in this case double-underscored names are justified.  The pickle/cPickle experience shows that people tend to abuse the freedom that python implementations give to subclasses and then complain that the C version does not work for them.  I think __ name mangling will be a better deterrent than the _ "private" convention.

> - Some comments about "CPython compatibility" should be removed
>

Why?  The goal is to keep datetime.py in sync with datetimemodule.c, not to replace the C implementation.  C implementation will still be definitive.


> - Some other comments should be reconsidered or removed, such as
> "# XXX The following should only be used as keyword args"

This one I was actually thinking about making mandatory by changing the signature to use keyword only arguments.  I am not sure if that is well supported by C API, though.  

> or "XXX Buggy in 2.2.2"

Yes, a review of XXXs is in order.

>
> - Some things are inconsistent: date uses bytes for pickle support,
> time uses str for the same purpose

Already fixed.
msg109063 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-07-01 18:08
If they abuse the _ methods and complain that the C version doesn't work, we just say "we *told* you not to do that".  It is not the Python philosophy to try to protect users from mistakes that they wilfully make.
msg109067 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-01 19:06
> R. David Murray <rdmurray@bitdance.com> added the comment:
>
> If they abuse the _ methods and complain that the C version doesn't
> work, we just say "we *told* you not to do that".  It is not the Python
> philosophy to try to protect users from mistakes that they willfully
> make.

Let me think some more about this.  Given double underscores in special methods, changing this is not a simple s/__/_/ throughout the file.  I am not sure _ clearly signals "don't use in subclasses": that's what __ is for.
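For readers weighing `__day` against `_day`: a double-underscore name inside a class body is rewritten to `_ClassName__name`, so a subclass cannot reach it under the short name, whereas a single underscore is purely a convention. A small sketch with hypothetical `Date`/`DateTime` classes (not the datetime.py code) shows the effect, including why a subclass that wants the fields under its own private names has to store them again:

```python
class Date:
    """Hypothetical base class keeping its field in a name-mangled attribute."""

    def __init__(self, year):
        self.__year = year  # actually stored as self._Date__year

    @property
    def year(self):
        return self.__year


class DateTime(Date):
    def __init__(self, year, hour):
        super().__init__(year)
        self.__hour = hour  # stored as self._DateTime__hour

    def year_via_private_name(self):
        # Inside DateTime, `self.__year` is rewritten to `self._DateTime__year`,
        # which was never set, so this raises AttributeError.
        return self.__year


dt = DateTime(2010, 12)
print(sorted(vars(dt)))  # ['_DateTime__hour', '_Date__year']
```

Mangling only changes the attribute name; `dt._Date__year` is still reachable, so it deters accidental use rather than preventing deliberate access.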
msg109069 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-01 19:13
> I agree.  Do you think I should just define __lt__ and use
> functools.total_ordering decorator?

I had forgotten about functools.total_ordering. Yes, very good idea.
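A sketch of the suggested style: define `__eq__` and `__lt__` on a comparison key, return `NotImplemented` for foreign types so the other operand gets a chance, and let `functools.total_ordering` fill in `<=`, `>` and `>=`. `SimpleDate` is a hypothetical stand-in; it deliberately ignores the date/datetime inheritance problem from issue 5516:

```python
import functools


@functools.total_ordering
class SimpleDate:
    """Hypothetical date-like class; not datetime's real code."""

    __slots__ = ("_year", "_month", "_day")

    def __init__(self, year, month, day):
        self._year, self._month, self._day = year, month, day

    def _key(self):
        # Tuple comparison stops at the first differing field, so no
        # more comparisons are made than necessary.
        return (self._year, self._month, self._day)

    def __eq__(self, other):
        if not isinstance(other, SimpleDate):
            return NotImplemented  # let the other operand try
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, SimpleDate):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Defining __eq__ would otherwise set __hash__ to None.
        return hash(self._key())
```

With this, `SimpleDate(2010, 7, 1) < 5` raises TypeError through the normal `NotImplemented` protocol, with no `_cmperror` helper needed.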

>   Note that the current implementation mimics what is done in C, but I
> think python should drive what is done in C and not the other way
> around.

I think the Python version doesn't have to mimic every exact quirk of
the C version. I think it's good if the code is idiomatic Python.

> I disagree.  Asserts as executable documentation are good.

I am talking specifically about this kind of assert:

    assert 1 <= month <= 12, 'month must be in 1..12'

I think it should be replaced with:

    if month < 1 or month > 12:
        raise ValueError('month must be in 1..12')

I don't think it's reasonable to disable argument checking when -O is
given. Furthermore, AssertionError is the wrong exception type for this.

On the other hand, I do agree that most asserts in e.g.
timedelta.__new__ are good.

> > - Starting _DAYS_IN_MONTH with a None element and then iterating
> over
> > _DAYS_IN_MONTH[1:] looks quirky
> >
> Would you rather start with 0 and iterate over the whole list?  It may
> be better to just define it as a literal list display.  That's what C
> code does.

Hmm, I wrote that comment before discovering that it is useful for
actual data to start at index 1. You can forget this, sorry.

> I think in this case double-underscored names are justified.
> Pickle/cPickle experience shows that people tend to abuse the freedom
> that python implementations give to subclasses and then complain that
> C version does not work for them.

Ah, but the Python datetime implementation will be automatically
shadowed by the C one; you won't end up using it by mistake, so people
should not ever rely on any of its implementation details.

To give a point of reference, the threading module used the __attribute
naming style for private attributes in 2.x, but it was converted to use
the _attribute style in 3.x.

(one genuine use for it, by the way, is to make it easy to test
implementation-specific internal invariants in the test suite)

> > - Some comments about "CPython compatibility" should be removed
> 
> Why?  The goal is to keep datetime.py in sync with datetimemodule.c,
> not to replace the C implementation.

Yes, but talking about CPython compatibility in the CPython source tree
looks puzzling. You could reword these statements, e.g. "compatibility
with the C implementation".

> > - Some other comments should be reconsidered or removed, such as
> > "# XXX The following should only be used as keyword args"
> 
> This one I was actually thinking about making mandatory by changing
> the signature to use keyword only arguments.

That would be an API change and would break compatibility. Are you sure
you want to do it?

> I am not sure if that is well supported by C API, though.

Not at all. You would have to analyze contents of the keywords dict
manually.
msg109128 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 20:24
> - I find the _cmp() and __cmp() indirection poor style in 3.x,
> especially when you simply end up comparing self._getstate() and
> other._getstate() (it is also suboptimal because it can do more
> comparisons than needed)

The best I could come up with is issue7989-cmp.diff - basically replacing _cmp(self, other) with _normalize(self, other) that returns a pair of objects that compare the same as self and other.

I am not committing this in sandbox because I don't see this as a big improvement.

Datetime comparisons are tricky due to date/datetime inheritance.  I think it is best not to touch it.
msg109129 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 20:41
> I am talking specifically about this kind of assert:
> 
>    assert 1 <= month <= 12, 'month must be in 1..12'
>
> I think it should be replaced with:
>
>    if month < 1 or month > 12:
>        raise ValueError('month must be in 1..12')

I reviewed the asserts.  Value range checking asserts appear in non-public functions which are not called with out-of-range values by the module code.  Therefore they can only be triggered if there is a bug in the future version of datetime.py.  This is expressly what asserts are for.

There is another type of asserts that should be either removed or modified:

assert daysecondswhole == int(daysecondswhole)  # can't overflow   

Since int is long in 3.x, this assert does not check anything. We can replace this with

assert daysecondswhole.bit_length() <= 32
msg109131 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 20:43
> If they abuse the _ methods and complain that the C version doesn't
> work, we just say "we *told* you not to do that".  It is not the Python
> philosophy to try to protect users from mistakes that they willfully
> make.

OK. +0 from me.  Patches welcome. :-)
msg109134 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-02 21:23
> assert daysecondswhole == int(daysecondswhole)  # can't overflow   
> Since int is long in 3.x, this assert does not check anything

Even with 2.5 int(x) cannot overflow, and returns a long when needed!
This assert probably checks that the number has no fractional part.
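Since the values come from `math.modf()` on floats, the assert is really an integrality check. A sketch of the modf-based splitting this kind of constructor code performs; `split_float_days` is a hypothetical helper, not the timedelta constructor itself. It also shows why `bit_length()` cannot replace the check: the value is a float, and floats have no such method.

```python
import math


def split_float_days(days):
    """Hypothetical helper: split a float day count into
    (whole days, whole seconds, fraction of a second)."""
    dayfrac, wholedays = math.modf(days)  # both results are floats
    secfrac, wholeseconds = math.modf(dayfrac * 24.0 * 3600.0)
    # For finite floats int() cannot overflow in Python 3 (ints are
    # unbounded), so the meaningful check is that modf left no fractional
    # part. An assert built on bit_length() would fail here, because
    # float has no bit_length() method.
    assert wholedays.is_integer() and wholeseconds.is_integer()
    return int(wholedays), int(wholeseconds), secfrac
```

For instance, `split_float_days(1.75)` returns `(1, 64800, 0.0)`: one whole day plus three quarters of a day, i.e. 64800 seconds.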
msg109135 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 21:24
This comment in datetime.__new__ makes me +0.5 on s/__/_/:


        self = date.__new__(cls, year, month, day)
        # XXX This duplicates __year, __month, __day for convenience :-(
        self.__year = year
        self.__month = month
        self.__day = day

Tim,

Do you remember why it was a good idea to derive datetime from date?
msg109136 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 21:28
On Fri, Jul 2, 2010 at 5:23 PM, Amaury Forgeot d'Arc
<report@bugs.python.org> wrote:
..
> Even with 2.5 int(x) cannot overflow, and returns a long when needed!
> This assert probably checks that the number has no fractional part.

Yes I've realized that.  I thought x was coming from integer
arithmetics, but apparently datetime.py loves floats!
msg109140 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-07-02 21:57
> I thought x was coming from integer
> arithmetics, but apparently datetime.py loves floats!

The arguments to __new__ can be floats, so it's necessary to deal with floats there.
msg109142 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-07-02 22:00
> Do you remember why it was a good idea to
> derive datetime from date?

Why not?  A datetime is a date, but with additional behavior.  Makes inheritance conceptually natural.
msg109144 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 22:14
On Fri, Jul 2, 2010 at 6:00 PM, Tim Peters <report@bugs.python.org> wrote:
>
> Tim Peters <tim.peters@gmail.com> added the comment:
>
>> Do you remember why it was a good idea to
>> derive datetime from date?
>
> Why not?  A datetime is a date, but with additional behavior.  Makes inheritance conceptually natural.

It is also time with additional behavior.  In the face of ambiguity ...

Why not?  See issue #5516.  Most of datetime comparison code is
devoted to fighting inheritance from date.   There is hardly any
non-trivial method that benefits from this inheritance.

To me,  conceptually, datetime is a container of date, time and
optionally time zone, it is not a date.
msg109146 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-07-02 22:26
I'm not going to argue about whether datetime "should have been" subclassed from date - fact is that it was, and since it was Guido's idea from the start, he wouldn't change it now even if his time machine weren't out for repairs ;-)
msg109147 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 22:48
On Fri, Jul 2, 2010 at 6:26 PM, Tim Peters <report@bugs.python.org> wrote:
..
> I'm not going to argue about whether datetime "should have been" subclassed from date - fact is that it was, and since it was
> Guido's idea from the start, he wouldn't change it now even if his time machine weren't out for repairs ;-)

I know, he will probably accept the fact that 23:59:60 is a valid time
first. :-)  I still very much appreciate your insights.

I think I mentioned that in my other posts, but I find datetime design
very elegant and when I find things that I would have done differently
my first reaction is that I am probably missing something.

datetime(date) inheritance is one of those things.  Another is tzinfo
attribute of time.  With time t, t.utcoffset() is kind of useless given
that you cannot subtract it from t and unless tzinfo is a fixed offset
timezone, there is not enough information in t to compute the offset
to begin with.

Do you have any historical insight on this one?
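The point about `time.utcoffset()` follows from how it is specified: it returns `self.tzinfo.utcoffset(None)`, so a zone whose offset depends on the date has nothing to work with. A sketch with a hypothetical `ToyEastern` zone using a deliberately crude DST rule (April through October), not a real time zone:

```python
from datetime import datetime, time, timedelta, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical zone: UTC-4 from April through October, else UTC-5."""

    def utcoffset(self, dt):
        if dt is None:
            # time.utcoffset() passes None: there is no date, so the
            # offset cannot be determined.
            return None
        return timedelta(hours=-4) if 4 <= dt.month <= 10 else timedelta(hours=-5)

    def dst(self, dt):
        if dt is None:
            return None
        return timedelta(hours=1) if 4 <= dt.month <= 10 else timedelta(0)

    def tzname(self, dt):
        return "ToyEastern"


tz = ToyEastern()
print(time(12, 0, tzinfo=tz).utcoffset())              # None
print(datetime(2010, 7, 2, 12, tzinfo=tz).utcoffset())  # -1 day, 20:00:00
```

Even with a fixed-offset zone, `time` objects do not support arithmetic with `timedelta`, so the offset could not be subtracted from a `time` directly.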
msg109148 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-07-02 22:52
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> On Fri, Jul 2, 2010 at 6:00 PM, Tim Peters <report@bugs.python.org> wrote:
>>
>> Tim Peters <tim.peters@gmail.com> added the comment:
>>
>>> Do you remember why it was a good idea to
>>> derive datetime from date?
>>
>> Why not?  A datetime is a date, but with additional behavior.  Makes inheritance conceptually natural.
> 
> It is also time with additional behavior.  In the face of ambiguity ...
> 
> Why not?  See issue #5516.  Most of datetime comparison code is
> devoted to fighting inheritance from date.   There is hardly any
> non-trivial method that benefits from this inheritance.
> 
> To me,  conceptually, datetime is a container of date, time and
> optionally time zone, it is not a date.

Just an aside:

Conceptually, you don't need date and time, only an object to
reference a point in time and another one to describe the
difference between two points in time. In mxDateTime I
called them DateTime and DateTimeDelta.

What we commonly refer to as date is really the combination of
a DateTime value pointing to the start of the day together with
a DateTimeDelta value representing one full turn of the Earth.

That said, I don't think redesigning the datetime module is part
of this ticket, just adding a second implementation of what we
already have in CPython :-)
msg109149 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 23:09
On Fri, Jul 2, 2010 at 6:52 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
> That said, I don't think redesigning the datetime module is part
> of this ticket, just adding a second implementation of what we
> already have in CPython :-)

I agree.  I am just looking for an excuse not to change attributes
like __year to _year.  :-)
msg109152 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-02 23:38
I am attaching a patch from issue 5288 as an example of a change that I would favor over issue7989-cmp.diff.  This patch eliminates the _utcoffset and _dst methods, which duplicate utcoffset and dst but return integer minutes rather than a timedelta.

I am not checking these changes into the sandbox because they are examples of how I plan to improve the C implementation once datetime.py makes it into the main tree.  I envision such changes being discussed in the context of datetime.py and, if approved, implemented in C and committed simultaneously.

Improving the datetime.py implementation in ways that make it diverge from the C implementation would defeat the purpose I see in having datetime.py in the first place.

Antoine?
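The two offset representations can be made concrete. The public `utcoffset()` returns a `timedelta`, while the `_utcoffset`/`_dst` methods that the patch removes returned integer minutes. The sketch below assumes Python 3.2+ for `timezone` and for `timedelta // timedelta`. It uses a hypothetical `offset_minutes` helper to stand in for the minutes form, and shows that the `timedelta` form already compares and subtracts without conversion.

```python
from datetime import datetime, timedelta, timezone

def offset_minutes(dt):
    """Hypothetical helper: the integer-minutes view of dt.utcoffset()."""
    offset = dt.utcoffset()
    return None if offset is None else offset // timedelta(minutes=1)

a = datetime(2010, 7, 2, 23, 38, tzinfo=timezone(timedelta(hours=-4)))
b = datetime(2010, 7, 3, 3, 38, tzinfo=timezone.utc)

print(offset_minutes(a), offset_minutes(b))   # -240 0
# The timedelta form needs no conversion to be compared or subtracted.
print(a.utcoffset() - b.utcoffset())          # -1 day, 20:00:00
print(a == b)                                 # True: same UTC instant
```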
msg109286 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-07-05 02:14
> ...
> Another is tzinfo attribute of time. With time t,
> t.utcoffset() is kind of useless given that you
> cannot subtract it from t

Sure you can, but you have to write your own code to do the time arithmetic.  The time implementation does exactly that under the covers when two "aware" time objects are compared:

    If both [time] comparands are aware and have
    different tzinfo members, the comparands are
    first adjusted by subtracting their UTC
    offsets (obtained from self.utcoffset()).
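Both halves of this answer fit in a few lines, again assuming Python 3.2+ for `datetime.timezone`. Two aware times with different `tzinfo` members compare equal when they denote the same UTC clock time. The hypothetical `time_to_utc` helper shows one way to "write your own code": borrow an arbitrary date, do the arithmetic on a `datetime`, then drop the date. A wrap past midnight is deliberately discarded, since a bare time has no date to carry it.

```python
from datetime import date, datetime, time, timedelta, timezone

plus_one = timezone(timedelta(hours=1))
a = time(12, 0, tzinfo=plus_one)       # 12:00 at UTC+01:00
b = time(11, 0, tzinfo=timezone.utc)   # 11:00 at UTC

# Different tzinfo members: both sides are adjusted by their UTC offsets first.
print(a == b)   # True: both are 11:00 UTC

def time_to_utc(t):
    """Hypothetical helper: express an aware time as a naive UTC time.

    time objects have no arithmetic, so borrow an arbitrary date,
    subtract the offset on the datetime, and drop the date again.
    """
    anchor = datetime.combine(date(2000, 1, 1), t)
    return (anchor - t.utcoffset()).time()

print(time_to_utc(a))   # 11:00:00
```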


> and unless tzinfo is a fixed offset timezone, there
> is not enough information in t to compute the offset
> to begin with.

The docs _suggest_ that tzinfo.utcoffset(x, None) return the "standard" UTC offset, but in the end it's up to the tzinfo subclass designer to decide what they want an aware time object to return as an offset.
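What an aware `time` actually sees is easy to demonstrate. `time.utcoffset()` calls its tzinfo's `utcoffset()` with `None` instead of a datetime, so the tzinfo has no date on which to base a DST decision. `ToyEastern` below is a hypothetical tzinfo with a deliberately simplified DST rule (April through October), not a real US/Eastern implementation. Following the docs' suggestion, it reports the standard offset when given `None`.

```python
from datetime import datetime, time, timedelta, tzinfo

class ToyEastern(tzinfo):
    """Hypothetical tzinfo with a deliberately simplified DST rule.

    NOT a real US/Eastern implementation: DST is assumed to cover
    April through October, just to show what a bare time object sees.
    """
    STD = timedelta(hours=-5)
    DST_SHIFT = timedelta(hours=1)

    def dst(self, dt):
        # time.utcoffset() calls us with dt=None: no date, so no DST information.
        if dt is None or not 4 <= dt.month <= 10:
            return timedelta(0)
        return self.DST_SHIFT

    def utcoffset(self, dt):
        return self.STD + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

tz = ToyEastern()
print(time(12, tzinfo=tz).utcoffset())                  # -1 day, 19:00:00 (standard)
print(datetime(2010, 7, 5, 12, tzinfo=tz).utcoffset())  # -1 day, 20:00:00 (summer)
```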

> Do you have any historical insight on this one?

History has nothing to do with it ;-)  There are several uses spelled out in the docs.  In addition to "aware" time comparison (mentioned above):

- Maintaining the datetime invariant

    d == datetime.combine(d.date(), d.timetz())

requires that an "aware" datetime's tzinfo member be attached to at least one of the date and time projections, and since raw dates are always "naive", that leaves the time projection as the only choice.

- Various string format operations use an "aware" time object's tzinfo member to incorporate time zone information - at least time.isoformat() and time.strftime().
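The following short sketch illustrates the two documented uses in the list above, assuming Python 3.2+ for `datetime.timezone`. `timetz()` carries the tzinfo member, so the `combine()` round trip is lossless, while `time()` yields a naive projection. `isoformat()` and `strftime()` on the aware time render its offset.

```python
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(hours=-5))
d = datetime(2010, 7, 5, 2, 14, tzinfo=tz)

# The invariant quoted above: timetz() keeps tzinfo, so nothing is lost.
assert d == datetime.combine(d.date(), d.timetz())
assert d.timetz().tzinfo is tz
# time() returns a naive projection instead.
assert d.time().tzinfo is None

# Formatting an aware time picks up the offset from its tzinfo member.
t = d.timetz()
print(t.isoformat())           # 02:14:00-05:00
print(t.strftime("%H:%M %z"))  # 02:14 -0500
```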

That said, it would have been fine by me if the "naive" versus "aware" distinction had been limited to datetime objects (i.e., if plain date and plain time objects had no notion of time zone).  If you need to know why time objects "really" support being aware, you'll have to ask Guido ;-)
msg109662 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-09 01:38
issue9206a.diff is a partial success at implementing Nick Coghlan's testing idea.  Unfortunately, the datetime implementation, with its circular dependency on _strptime, is not very robust with respect to import trickery.

I am calling this a partial success because, although running Lib/test/test_datetime.py does not report any errors, it only works with the pure Python version of pickle.  (I had to add sys.modules['_pickle'] = None at the top of the module to make it work.)

Also, the resulting test_datetime is quite an abomination!
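The trick mentioned here is to put `None` into `sys.modules` so that importing an accelerator module fails with `ImportError`. It can be packaged as a small helper. `import_fresh` below is a hypothetical sketch, not the utility actually used in CPython's test suite, which has its own `import_fresh_module` located in different `test.support` submodules across versions. The sketch assumes, as in current CPython, that `datetime` falls back to pure Python code when `_datetime` cannot be imported, and it restores every `sys.modules` entry it touches.

```python
import importlib
import sys

import datetime as accelerated   # the normally imported module (C-backed on CPython)

def import_fresh(name, blocked=()):
    """Hypothetical helper: import a new copy of `name` with `blocked` hidden.

    Every sys.modules entry touched here is restored in place afterwards,
    so the rest of the process keeps seeing the original modules.
    """
    touched = (name, *blocked)
    saved = {key: sys.modules[key] for key in touched if key in sys.modules}
    try:
        for key in blocked:
            sys.modules[key] = None   # "import key" now raises ImportError
        sys.modules.pop(name, None)   # force the target module to execute again
        return importlib.import_module(name)
    finally:
        for key in touched:
            if key in saved:
                sys.modules[key] = saved[key]
            else:
                sys.modules.pop(key, None)

pure = import_fresh("datetime", blocked=("_datetime",))
print(pure is accelerated)                     # False: a separate module object
print(sys.modules["datetime"] is accelerated)  # True: original entry restored
print(pure.date(2010, 7, 23).isoformat())      # 2010-07-23
```

The same save-and-restore logic fits naturally into a test's setUp/tearDown pair, which is where the follow-up patch below puts it.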
msg110155 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-13 02:41
issue9206b.diff fixes test_datetime in issue9206a.diff by restoring sys.modules in place in the tearDown method.

Based on python-dev discussion, I am marking this as accepted and uploading the patch to Rietveld for commit review.

Please comment on the code at http://codereview.appspot.com/1824041 .  Unfortunately, the upload tool does not recognize copied or moved files.
msg110248 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-14 01:28
In issue7989c.diff, I reverted to lazy import of _strptime, added cleanup of _xyz helper functions, and made test_datetime more robust by restoring sys.modules more thoroughly.

Unfortunately I've encountered an issue with Rietveld that prevents me from uploading the latest patch for review.  See http://code.google.com/p/rietveld/issues/detail?id=208
msg111097 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-21 18:26
Is anyone interested in giving this a final review?  I would like to check this in unless someone objects or needs time to review.
msg111131 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-21 22:54
Technically, since the C module is now named _datetime, it needs to be renamed in Modules/Setup.dist and, most importantly, in PC/config.c (because on Windows datetime is built into the main interpreter).
msg111210 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-22 19:10
> [datetime.c] needs to be renamed in Modules/Setup.dist, and most
> importantly in PC/config.c

Fixed in issue7989d.diff, thanks.

In order to commit this patch I need some SVN advice.  I would like to copy datetime.py from sandbox to py3k in a way that will preserve the history.  (I know, this strictly necessary, but I don't want my name on every line in svn blame datetime.py given how little I had to change there.)

I tried both svn copy oldpath newpath and svn copy oldurl newpath, and neither worked (most likely because sandbox and py3k are independent checkouts).  I don't want to use svn copy oldurl newurl because that would require a separate commit for datetime.py.
msg111211 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-22 19:14
s/strictly necessary/not strictly necessary/
msg111338 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-07-23 14:38
One additional change was needed to compile on Windows:
Index: PC/config.c
===================================================================
--- PC/config.c (revision 83087)
+++ PC/config.c (working copy)
@@ -116,7 +116,7 @@
     {"parser", PyInit_parser},
     {"winreg", PyInit_winreg},
     {"_struct", PyInit__struct},
-    {"datetime", PyInit_datetime},
+    {"_datetime", PyInit__datetime},
     {"_functools", PyInit__functools},
     {"_json", PyInit__json},
msg111356 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-23 16:19
Brian, thanks for the fix and for testing.  I am attaching a commit-ready patch, issue7989e.diff, that includes Brian's fix and a few whitespace changes.

I hope I've resolved the SVN issue: I was working in a read-only checkout while the sandbox checkout was read/write.

Here is the svn status now:

M       PCbuild/pythoncore.vcproj
M       setup.py
M       Misc/NEWS
M       PC/config.c
A  +    Lib/datetime.py
A  +    Lib/test/datetimetester.py
M       Lib/test/test_datetime.py
A  +    Modules/_datetimemodule.c
D       Modules/datetimemodule.c
M       Modules/Setup.dist


Note that, unlike previous patches, issue7989e.diff contains only the datetime.py differences relative to the sandbox.  You should run

svn cp svn+ssh://pythondev@svn.python.org/sandbox/branches/py3k-datetime/datetime.py Lib

before it can be applied.  Depending on your patch utility, you may also need to do

svn cp Lib/test/test_datetime.py Lib/test/datetimetester.py

I am running final tests and will commit this patch shortly.
msg111375 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-23 19:32
Committed in r83112.
History
Date  User  Action  Args
2010-07-23 19:32:38  belopolsky  set  status: open -> closed; resolution: accepted -> fixed; messages: + msg111375; stage: commit review -> resolved
2010-07-23 16:20:10  belopolsky  set  files: + issue7989e.diff; messages: + msg111356
2010-07-23 14:38:46  brian.curtin  set  messages: + msg111338
2010-07-23 14:23:15  belopolsky  set  files: + datetimetester.py
2010-07-22 19:28:19  belopolsky  set  files: - issue9206a.diff
2010-07-22 19:14:30  belopolsky  set  messages: + msg111211
2010-07-22 19:11:06  belopolsky  set  files: + issue7989d.diff; messages: + msg111210
2010-07-21 22:54:06  amaury.forgeotdarc  set  messages: + msg111131
2010-07-21 18:26:37  belopolsky  set  messages: + msg111097
2010-07-21 18:09:35  davidfraser  set  nosy: + davidfraser
2010-07-19 18:19:13  belopolsky  unlink  issue1777412 superseder
2010-07-19 18:19:13  belopolsky  link  issue1777412 dependencies
2010-07-19 18:19:00  belopolsky  link  issue1777412 superseder
2010-07-14 01:29:18  belopolsky  set  files: + issue7989c.diff; messages: + msg110248
2010-07-13 22:52:21  belopolsky  link  issue7584 dependencies
2010-07-13 20:05:05  merwok  set  nosy: + merwok
2010-07-13 03:19:16  belopolsky  set  files: + issue7989b.diff
2010-07-13 03:18:48  belopolsky  set  files: - issue9206b.diff
2010-07-13 03:18:42  belopolsky  set  files: - issue9206b.diff
2010-07-13 03:18:33  belopolsky  set  files: + issue9206b.diff
2010-07-13 02:41:49  belopolsky  set  files: + issue9206b.diff; resolution: accepted; messages: + msg110155; stage: patch review -> commit review
2010-07-09 01:38:45  belopolsky  set  files: + issue9206a.diff; messages: + msg109662
2010-07-05 02:14:05  tim.peters  set  messages: + msg109286
2010-07-03 15:14:26  giampaolo.rodola  set  nosy: + giampaolo.rodola
2010-07-03 05:50:52  belopolsky  set  files: + issue7989.diff; stage: needs patch -> patch review
2010-07-02 23:38:30  belopolsky  set  files: + issue5288-proto.diff; messages: + msg109152
2010-07-02 23:09:22  belopolsky  set  messages: + msg109149; title: Add pure Python implementation of datetime module to CPython -> Add pure Python implementation of datetime module to CPython
2010-07-02 22:52:31  lemburg  set  messages: + msg109148; title: Add pure Python implementation of datetime module to CPython -> Add pure Python implementation of datetime module to CPython
2010-07-02 22:48:21  belopolsky  set  messages: + msg109147
2010-07-02 22:26:01  tim.peters  set  messages: + msg109146
2010-07-02 22:14:48  belopolsky  set  messages: + msg109144
2010-07-02 22:00:08  tim.peters  set  messages: + msg109142
2010-07-02 21:58:21  tim.peters  set  messages: - msg109141
2010-07-02 21:58:02  tim.peters  set  messages: + msg109141
2010-07-02 21:57:58  tim.peters  set  messages: + msg109140
2010-07-02 21:28:12  belopolsky  set  messages: + msg109136; title: Add pure Python implementation of datetime module to CPython -> Add pure Python implementation of datetime module to CPython
2010-07-02 21:24:20  belopolsky  set  messages: + msg109135
2010-07-02 21:23:16  amaury.forgeotdarc  set  messages: + msg109134
2010-07-02 20:43:40  belopolsky  set  messages: + msg109131
2010-07-02 20:41:40  belopolsky  set  messages: + msg109129
2010-07-02 20:24:09  belopolsky  set  files: + issue7989-cmp.diff; messages: + msg109128
2010-07-01 19:13:48  pitrou  set  messages: + msg109069
2010-07-01 19:06:06  belopolsky  set  messages: + msg109067
2010-07-01 18:08:31  r.david.murray  set  messages: + msg109063
2010-07-01 16:42:39  belopolsky  set  messages: + msg109061
2010-07-01 16:09:05  pitrou  set  messages: + msg109058
2010-06-26 20:10:15  belopolsky  set  messages: + msg108747
2010-06-26 19:35:01  r.david.murray  set  messages: + msg108743
2010-06-26 09:43:02  pitrou  set  messages: + msg108697
2010-06-26 07:23:12  techtonik  set  messages: + msg108687
2010-06-26 02:35:58  tim.peters  set  messages: + msg108682
2010-06-26 00:50:45  belopolsky  set  messages: + msg108666
2010-06-25 23:32:54  pitrou  set  messages: + msg108649
2010-06-25 23:29:34  belopolsky  set  messages: + msg108648
2010-06-25 23:11:20  tim.peters  set  messages: + msg108645
2010-06-25 23:02:23  belopolsky  set  nosy: + tim.peters; messages: + msg108643
2010-06-25 19:44:35  r.david.murray  set  messages: + msg108624
2010-06-24 22:04:55  haypo  set  messages: + msg108553
2010-06-19 13:18:21  r.david.murray  set  messages: + msg108179
2010-06-19 01:03:54  belopolsky  set  files: + issue7989-2.7-3.2.diff; messages: + msg108158
2010-06-18 19:34:06  rhettinger  set  messages: + msg108129
2010-06-18 18:49:20  r.david.murray  set  messages: + msg108126
2010-06-18 18:39:35  belopolsky  set  messages: + msg108124
2010-06-18 18:24:45  rhettinger  set  messages: + msg108122
2010-06-18 09:18:30  amaury.forgeotdarc  set  messages: + msg108090
2010-06-18 09:16:22  amaury.forgeotdarc  set  messages: + msg108089
2010-06-17 20:23:52  brett.cannon  set  messages: + msg108059
2010-06-17 20:14:04  belopolsky  set  files: + datetime-sandbox-pypy.diff; messages: + msg108056
2010-06-17 19:47:19  brett.cannon  set  messages: + msg108055
2010-06-17 19:37:56  belopolsky  set  messages: + msg108053
2010-06-17 19:31:02  belopolsky  set  priority: high -> normal; messages: + msg108051
2010-06-17 19:12:34  rhettinger  set  nosy: + rhettinger; messages: + msg108048
2010-06-17 18:54:06  brett.cannon  set  messages: + msg108047
2010-06-17 17:52:50  mark.dickinson  set  nosy: + mark.dickinson
2010-06-17 17:48:06  r.david.murray  set  messages: + msg108037
2010-06-17 16:46:29  belopolsky  set  priority: low -> high; files: + PyPy-2.7.diff; messages: + msg108035; keywords: + patch
2010-06-17 15:35:10  belopolsky  set  messages: + msg108030
2010-06-17 15:03:31  lemburg  set  messages: + msg108029
2010-06-17 15:02:33  pitrou  set  messages: + msg108028
2010-06-17 15:00:09  belopolsky  set  messages: + msg108027
2010-06-17 14:59:01  lemburg  set  messages: + msg108026
2010-06-17 14:55:30  r.david.murray  set  messages: + msg108025
2010-06-17 14:49:14  belopolsky  set  messages: + msg108024
2010-06-17 14:45:21  belopolsky  set  messages: + msg108023
2010-06-17 14:32:17  pitrou  set  nosy: + pitrou; messages: + msg108021
2010-06-17 14:31:33  lemburg  set  messages: + msg108020
2010-06-17 14:24:24  belopolsky  set  messages: + msg108019
2010-06-10 11:03:02  lemburg  set  messages: + msg107454
2010-06-10 10:48:50  haypo  set  nosy: + haypo; messages: + msg107453
2010-06-10 09:54:59  lemburg  set  messages: + msg107451; title: Add pure Python implementation of datetime module to CPython -> Add pure Python implementation of datetime module to CPython
2010-06-08 18:09:36  brett.cannon  set  messages: + msg107333
2010-06-08 12:27:12  techtonik  set  messages: + msg107317
2010-06-08 11:06:05  amaury.forgeotdarc  set  nosy: + amaury.forgeotdarc; messages: + msg107312
2010-06-08 08:20:28  lemburg  set  messages: + msg107309; title: Transition time/datetime C modules to Python -> Add pure Python implementation of datetime module to CPython
2010-06-08 08:02:37  lemburg  set  messages: + msg107308; title: Add pure Python implementation of datetime module to CPython -> Transition time/datetime C modules to Python
2010-06-08 04:42:07  belopolsky  set  messages: + msg107303; title: Transition time/datetime C modules to Python -> Add pure Python implementation of datetime module to CPython
2010-06-08 04:30:22  belopolsky  set  messages: + msg107302
2010-06-08 03:51:06  techtonik  set  nosy: + techtonik; messages: + msg107298
2010-06-08 00:58:48  brett.cannon  set  messages: + msg107295
2010-06-08 00:35:06  belopolsky  set  messages: + msg107293
2010-06-08 00:28:26  belopolsky  set  priority: normal -> low; nosy: + brett.cannon; messages: + msg107292; type: behavior -> enhancement
2010-06-07 19:14:30  brian.curtin  set  messages: + msg107275
2010-06-07 17:17:55  belopolsky  set  versions: - Python 2.7
2010-06-07 17:17:06  belopolsky  set  messages: + msg107272
2010-06-07 13:10:01  daniel.urban  set  nosy: + daniel.urban
2010-06-07 08:06:15  lemburg  set  nosy: + lemburg; messages: + msg107258
2010-06-06 01:35:26  belopolsky  set  assignee: belopolsky; messages: + msg107171; nosy: + belopolsky
2010-02-22 18:30:57  brian.curtin  set  messages: + msg99804
2010-02-22 18:27:07  r.david.murray  set  nosy: + r.david.murray; messages: + msg99801
2010-02-22 16:22:07  brian.curtin  create