classification
Title: Query performance is very low and can even lead to denial of service
Type: security Stage: resolved
Components: Regular Expressions Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, ghi5107, mrabarnett, serhiy.storchaka, taleinat, tim.peters
Priority: normal Keywords:

Created on 2018-03-21 06:28 by ghi5107, last changed 2018-07-29 06:50 by taleinat. This issue is now closed.

Files
File name Uploaded Description Edit
test_performance.py ghi5107, 2018-03-21 06:28 repro file
Messages (7)
msg314187 - (view) Author: guohui (ghi5107) Date: 2018-03-21 06:28
I found a issue in regex (findall search)function, when seaching some content by some pattern, the function return for a long long time, match performance is very low.
I think this issue could lead to too low query performance, or a attacker may exploit the issue to cause a denail of service condition.


system:  python 2.7.14  regex(2018.2.21)
poc:

import re
pat = r'^(\(?[\w\d\-\.\\]{3,}\|?){1,}[\w\d\-\.\\]{3,}\)?$'
#plaintext content
content = r'(ftp\x3a\x2f\x2f|http\x3a\x2f\x2f|https\x3a\x2f\x2f|c\x3a\x2f\x2f|d\x3a\x2f\x2f|e\x3a\x2f\x2f)a'
result = re.findall(pat, content)
print result
msg322584 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-28 18:36
I suggest closing this as "wontfix".

This is a just an non-optimized regexp pattern leading to long run times.  That these are possible is a well-known trait of backtracking regular expression engines in general, and ours in particular.

IMO this isn't a security issue since the root of the issue is the pattern.  I don't see this as a bug or a significant performance issue either, and there is no concrete enhancement suggestion here.

For clarification, the given pattern is equivalent to:
pat = r'''^
(
\(?
[\w\d\-\.\\]{3,}
\|?
)+
[\w\d\-\.\\]{3,}
\)?
$'''
msg322585 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-28 18:37
Clarification: The given pattern is equivalent to that in my previous post, assuming the latter is used with the re.VERBOSE flag.
msg322587 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-28 19:53
I concur with Tal. This problem is called "catastrophic backtracking".
msg322593 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-28 22:25
Note:  if you found a regexp like this _in_ the Python distribution, then a bug report would be appropriate.  It's certainly possible to write regexps that can suffer catastrophic backtracking, and we've repaired a few of those, over the years, that shipped with Python.

But we can't stop you from writing such things yourself.  If you post your regexp to, e.g., comp.lang.python or StackOverflow, I'm sure someone will show you how to rewrite it in a safe way.  But be prepared to explain in English what you're trying to accomplish with it.

For example, while it appears you're trying to ensure there are at least 3 characters (of the right kind) between "|" separators, for some reason you made matching "|" optional.  That leaves open an exponential number of ways to try to match long strings of non-"|" characters between "|" separators.
msg322608 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-29 03:51
Perhaps it would be nice to add a section about catastrophic backtracking and ways of resolving it in the RE Howto.
msg322611 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-29 06:50
Serhiy, that would be a good idea.  A short mention of the issue with a link to an external reference would also suffice IMO.
History
Date User Action Args
2018-07-29 06:50:44taleinatsetmessages: + msg322611
2018-07-29 03:51:47serhiy.storchakasetmessages: + msg322608
2018-07-28 22:25:48tim.peterssetnosy: + tim.peters
messages: + msg322593
2018-07-28 19:53:24serhiy.storchakasetstatus: open -> closed
resolution: wont fix
messages: + msg322587

stage: resolved
2018-07-28 18:37:20taleinatsetmessages: + msg322585
2018-07-28 18:36:11taleinatsetnosy: + taleinat
messages: + msg322584
2018-03-21 08:52:06xiang.zhangsetnosy: + serhiy.storchaka
2018-03-21 06:28:15ghi5107create