Author skip.montanaro
Recipients skip.montanaro
Date 2013-11-01.17:05:20
Message-id <1383325521.57.0.024650274045.issue19475@psf.upfronthosting.co.za>
Content
I have a CSV file. Here are a few rows:

"2013-10-30 14:26:46.000528","1.36097023829"
"2013-10-30 14:26:46.999755","1.36097023829"
"2013-10-30 14:26:47.999308","1.36097023829"
"2013-10-30 14:26:49.002472","1.36097023829"
"2013-10-30 14:26:50","1.36097023829"
"2013-10-30 14:26:51.000549","1.36097023829"
"2013-10-30 14:26:51.999315","1.36097023829"
"2013-10-30 14:26:52.999703","1.36097023829"
"2013-10-30 14:26:53.999640","1.36097023829"
"2013-10-30 14:26:54.999139","1.36097023829"

I want to parse the strings in the first column as timestamps. I can, and often do, use dateutil.parser.parse(), but in situations like this, where all the timestamps share essentially the same format, it can be incredibly slow. OTOH, there is no single format I can pass to datetime.datetime.strptime() that will parse all the above timestamps. Using "%Y-%m-%d %H:%M:%S" I get "unconverted data remains" errors about the leftover microseconds. Using "%Y-%m-%d %H:%M:%S.%f" I get errors when I try to parse a timestamp which doesn't have microseconds.
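A common workaround under the current behaviour is to try the two strptime() formats in turn. The sketch below assumes naive timestamps shaped exactly like the rows above; `parse_timestamp`, `FORMATS` and `SAMPLE` are illustrative names, not part of any library.

```python
import csv
import io
from datetime import datetime

# Try the more specific format first, then fall back to the one without
# fractional seconds. These are the two formats discussed above.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def parse_timestamp(text):
    """Parse a str(datetime)-style timestamp with or without microseconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("unrecognized timestamp: %r" % (text,))

# A few rows copied from the CSV file above.
SAMPLE = '''"2013-10-30 14:26:49.002472","1.36097023829"
"2013-10-30 14:26:50","1.36097023829"
"2013-10-30 14:26:51.000549","1.36097023829"
'''

rows = [(parse_timestamp(ts), float(val))
        for ts, val in csv.reader(io.StringIO(SAMPLE))]
for dt, val in rows:
    print(dt.isoformat(), val)
```

Checking for a "." in the string before choosing a format would avoid paying for a failed strptime() call on whole-second rows; the try/except loop is shown because it extends easily to more formats.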

Alas, it is datetime itself which is to blame for this problem. The above timestamps were all printed from an earlier Python program which just dumps the str() of a datetime object to its output CSV file. Consider:

>>> dt = dateutil.parser.parse("2013-10-30 14:26:50")
>>> print dt
2013-10-30 14:26:50
>>> dt2 = dateutil.parser.parse("2013-10-30 14:26:51.000549")
>>> print dt2
2013-10-30 14:26:51.000549
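The mixed column in the CSV follows directly from this. A minimal reproduction, assuming the earlier program wrote str(datetime) values through csv.writer with csv.QUOTE_ALL (the variable names are illustrative):

```python
import csv
import io
from datetime import datetime

# Two timestamps one second apart; the second lands exactly on a whole
# second, so its microsecond field is 0.
stamps = [datetime(2013, 10, 30, 14, 26, 49, 2472),
          datetime(2013, 10, 30, 14, 26, 50)]

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
for dt in stamps:
    # str(dt) omits the fractional part entirely when microsecond == 0
    writer.writerow([str(dt), 1.36097023829])

output = buf.getvalue()
print(output)
```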

The same holds for isoformat():

>>> print dt.isoformat()
2013-10-30T14:26:50
>>> print dt2.isoformat()
2013-10-30T14:26:51.000549
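For comparison, later Python 3 releases added two relevant options: isoformat(timespec="microseconds") in 3.6, which always emits six fractional digits, and datetime.fromisoformat() in 3.7, which accepts timestamps with or without a fraction. Neither exists in 2.7 or 3.3, the versions this report concerns, so the sketch below needs 3.7 or newer.

```python
from datetime import datetime

dt = datetime(2013, 10, 30, 14, 26, 50)        # microsecond == 0
dt2 = datetime(2013, 10, 30, 14, 26, 51, 549)

# Default behaviour: same as str(), the fraction is dropped when it is zero.
default = [dt.isoformat(sep=" "), dt2.isoformat(sep=" ")]

# timespec="microseconds" (3.6+) always writes six fractional digits.
fixed = [d.isoformat(sep=" ", timespec="microseconds") for d in (dt, dt2)]

# fromisoformat (3.7+) reads both shapes without a format string.
parsed = [datetime.fromisoformat(s) for s in default]

print(default)
print(fixed)
```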

Whatever happened to "be strict in what you send, but generous in what you receive"? If strptime() is going to complain the way it does, then str() should always generate a full timestamp, including microseconds. The above is from a Python 2.7 session, but I also confirmed that Python 3.3 behaves the same.

I've checked 2.7 and 3.3 in the Versions list, but I don't think it can be fixed there. Can the __str__ and isoformat methods of datetime (and time) objects be modified for 3.4 to always include the microseconds? Alternatively, can the %S format character be modified to consume an optional decimal point and microseconds? I rate this as "easy", since the simplest fix, modifying __str__ and isoformat, seems unchallenging.
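Until the library changes, a writer can get the first proposed behaviour on 2.7 and 3.3 by formatting explicitly: in strftime(), %f always emits six zero-padded digits, so a single format string then serves for both writing and reading. A sketch assuming naive datetimes (any tzinfo would be lost); `FMT`, `to_text` and `from_text` are hypothetical names.

```python
from datetime import datetime

# One fixed format for both directions; %f always yields six digits,
# so whole-second values are written with ".000000".
FMT = "%Y-%m-%d %H:%M:%S.%f"

def to_text(dt):
    """Writer-side workaround: always include microseconds."""
    return dt.strftime(FMT)

def from_text(text):
    """Reader side: a single strptime() format now covers every row."""
    return datetime.strptime(text, FMT)

stamps = [datetime(2013, 10, 30, 14, 26, 50),
          datetime(2013, 10, 30, 14, 26, 51, 549)]
texts = [to_text(dt) for dt in stamps]
roundtrip = [from_text(t) for t in texts]
print(texts)
```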
History
Date                 User            Action  Args
2013-11-01 17:05:21  skip.montanaro  set     recipients: + skip.montanaro
2013-11-01 17:05:21  skip.montanaro  set     messageid: <1383325521.57.0.024650274045.issue19475@psf.upfronthosting.co.za>
2013-11-01 17:05:21  skip.montanaro  link    issue19475 messages
2013-11-01 17:05:20  skip.montanaro  create