Message 384782 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	2d4d
Recipients	2d4d
Date	2021-01-10.21:58:21
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1610315901.41.0.467379667553.issue42885@roundup.psfhosted.org>
In-reply-to

Content

The re module needs about 7 seconds to check whether a string of a billion As starts with an x. For example, this statement takes that long:

re.search(r'^x', 'A' * 1000000000)

The time grows with the length of the string. String handling itself is not the problem: checking whether the string starts with an A takes just 0.00014 seconds. See the output and code below:

3.10.0a4+ (heads/master:d16f617, Jan  9 2021, 13:24:45) 
[GCC 7.5.0]
testing string len:  100000
re_test_false:  0.0008246829966083169
testing string len:  1000000000
re_test_false:  7.317708015005337
testing string len:  1000000000
re_test_true:   0.00014710200048284605


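import functools
import re
import sys
import timeit

def re_test_true(string):
    # Fast case: the pattern matches at the start of the string.
    print("testing string len: ", len(string))
    re.search(r'^A', string)

def re_test_false(string):
    # Slow case: the pattern can never match.
    print("testing string len: ", len(string))
    re.search(r'^x', string)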

print(sys.version)

huge_string = 'A' * 100000
print('re_test_false: ', timeit.timeit(functools.partial(re_test_false, huge_string), number=1))

huge_string = 'A' * 1000000000
print('re_test_false: ', timeit.timeit(functools.partial(re_test_false, huge_string), number=1))

print('re_test_true:  ', timeit.timeit(functools.partial(re_test_true, huge_string), number=1))
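For comparison, here is a minimal sketch of two possible ways to express the same "starts with x" check without re.search. It assumes the pattern only needs to match at the very start of the string and that re.MULTILINE is not in use. The helper names (search_anchored, match_prefix, startswith_prefix) are made up for illustration and are not part of the report above. No timings are claimed here; they depend on the interpreter version and the machine.

```python
import re
import timeit

# Hypothetical helper names, used only for this comparison.

def search_anchored(text):
    # Same call as in the report: '^'-anchored pattern passed to re.search.
    return re.search(r'^x', text)

def match_prefix(text):
    # re.match only attempts a match at the beginning of the string.
    return re.match(r'x', text)

def startswith_prefix(text):
    # Plain string check, no regex engine involved.
    return text.startswith('x')

# Time each variant on strings of growing length (kept small on purpose).
for length in (10_000, 100_000, 1_000_000):
    text = 'A' * length
    for func in (search_anchored, match_prefix, startswith_prefix):
        elapsed = timeit.timeit(lambda: func(text), number=5)
        print(f'{length:>9}  {func.__name__:<18} {elapsed:.6f}')
```

If the re.search timing grows with the string length while the other two stay roughly flat, that would be consistent with the behaviour described above.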

History
Date	User	Action	Args
2021-01-10 21:58:21	2d4d	set	recipients: + 2d4d
2021-01-10 21:58:21	2d4d	set	messageid: <1610315901.41.0.467379667553.issue42885@roundup.psfhosted.org>
2021-01-10 21:58:21	2d4d	link	issue42885 messages
2021-01-10 21:58:21	2d4d	create