classification
Title: datetime.strptime slow
Type: performance
Stage:
Components: Extension Modules
Versions: Python 3.4
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Lars.Nordin, belopolsky, r.david.murray, tshepang
Priority: normal
Keywords:

Created on 2012-07-11 14:00 by Lars.Nordin, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg165256 - Author: Lars Nordin (Lars.Nordin) Date: 2012-07-11 14:00
datetime.strptime works well enough for me; it is just slow.

I recently added a comparison to a log parsing script to skip log lines earlier than a set date. After doing so my script ran much slower.
I am processing 4,784,212 log lines in 1,746 files.

Using Linux "time", the measured run time is:
real    5m12.884s
user    4m54.330s
sys     0m2.344s

Altering the script to cache the datetime object and reuse it when the date string is unchanged reduces the run time to:
real    1m3.816s
user    0m49.635s
sys     0m1.696s

# --- code snippet ---
# start_dt calculated at script start
...
day_dt = datetime.datetime.strptime(day_str, "%Y-%m-%d")
if day_dt < start_dt:
...
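
The caching change itself is not shown in the report. A minimal sketch of one way it could look, assuming consecutive log lines usually share the same day string; the helper name `parse_day` and the two module-level variables are hypothetical and do not come from the original script:

```python
import datetime

# Hypothetical one-entry cache: the last day string seen and its parsed value.
_last_day_str = None
_last_day_dt = None

def parse_day(day_str):
    """Return the datetime for day_str, reusing the previous result if the string repeats."""
    global _last_day_str, _last_day_dt
    if day_str != _last_day_str:
        # Only pay for strptime when the date string actually changes.
        _last_day_dt = datetime.datetime.strptime(day_str, "%Y-%m-%d")
        _last_day_str = day_str
    return _last_day_dt
```

With this in place, the comparison in the snippet would call `parse_day(day_str)` instead of `datetime.datetime.strptime` directly. How much it helps depends on how often the day string repeats between adjacent lines.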


$ python
import platform
print 'Version      :', platform.python_version()
print 'Version tuple:', platform.python_version_tuple()
print 'Compiler     :', platform.python_compiler()
print 'Build        :', platform.python_build()

Version      : 2.7.2+
Version tuple: ('2', '7', '2+')
Compiler     : GCC 4.6.1
Build        : ('default', 'Oct  4 2011 20:03:08')

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 11.10
Release:        11.10
Codename:       oneiric
msg165257 - Author: Lars Nordin (Lars.Nordin) Date: 2012-07-11 14:09
Running the script without any timestamp comparison (and parsing more log lines) gives these performance numbers:

log lines: 7,173,101

time output:
real    1m9.892s
user    0m53.563s
sys     0m1.592s
msg165258 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-07-11 14:12
Thanks for the report.  However, do you have a patch to propose?  Otherwise I'm not sure there is a reason to keep this issue open...one can always say various things are slow; that by itself is not a bug.  Performance enhancement patches are welcome, though.

If you are proposing adding an LRU cache, I think that is probably best left to the application, as you did in your case.  I'm not convinced there would be enough general benefit to make it worth adding to the stdlib, since the characteristics of date parsing workloads probably vary widely.
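
For reference, an application-level LRU cache of the kind discussed here can be sketched with `functools.lru_cache` (available since Python 3.2). The wrapper name `parse_day_cached` and the `maxsize` value are assumptions for illustration, not something proposed in this issue:

```python
import datetime
import functools

@functools.lru_cache(maxsize=128)  # assumed size; tune to the number of distinct dates
def parse_day_cached(day_str):
    """Parse a YYYY-MM-DD string, caching results keyed on the string."""
    return datetime.datetime.strptime(day_str, "%Y-%m-%d")

# Repeated strings are served from the cache instead of being re-parsed.
parse_day_cached("2012-07-11")
parse_day_cached("2012-07-11")
print(parse_day_cached.cache_info())
```

Sharing the cached objects between callers is safe because datetime instances are immutable. Whether the cache pays off depends on how often date strings repeat in the workload, which is the reason given above for leaving it to the application.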
msg165418 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-07-13 20:40
If someone wants to propose a patch we can reopen the issue.
History
Date                 User            Action  Args
2022-04-11 14:57:32  admin           set     github: 59533
2012-07-14 12:50:28  eric.araujo     set     components: + Extension Modules, - None
                                             versions: + Python 3.4, - Python 2.7
2012-07-13 20:40:40  r.david.murray  set     status: open -> closed
                                             messages: + msg165418
2012-07-13 18:54:00  tshepang        set     status: pending -> open
                                             nosy: + tshepang
2012-07-12 12:56:20  brett.cannon    set     status: open -> pending
2012-07-11 14:12:26  r.david.murray  set     nosy: + r.david.murray, belopolsky
                                             messages: + msg165258
2012-07-11 14:09:10  Lars.Nordin     set     messages: + msg165257
2012-07-11 14:00:20  Lars.Nordin     create