classification
Title: re.findall() deadlocked when the expected ending char does not occur before end of string
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.6
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, mamamiaibm, mrabarnett, tim.peters
Priority: normal
Keywords:

Created on 2018-05-18 08:06 by mamamiaibm, last changed 2018-07-29 00:18 by tim.peters. This issue is now closed.

Messages (6)
msg317013 - Author: Min (mamamiaibm) Date: 2018-05-18 08:06
Firstly, I wrote something like this:

patn = r"\bROW\s*\((\d+|\*)\)(.|\s)*?\)"
newlines = re.sub(patn, "\nYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY\n", newlines)

But if the file (or string) ends without the expected ")", the code deadlocks there: no progress, no exception, and no exit.

Then I changed it to:

patn = r"\bROW\s*\((\d+|\*)\)(.|\s)*?(\)|$)"
newlines = re.sub(patn, "\nYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY\n", newlines)

This enforces the end-of-file rule, and then everything was OK.

I feel this is a bug, because the RE engine should not hang; it should exit if it can't match.

This is Python 3.5 on Ubuntu. Thanks!
msg317015 - Author: Min (mamamiaibm) Date: 2018-05-18 08:09
Sorry, I forgot that I have upgraded to 3.6.2, not 3.5.
msg317017 - Author: Min (mamamiaibm) Date: 2018-05-18 08:19
Sorry again, the sample code I posted shows the issue with re.sub(), not findall() :o)))
msg317042 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2018-05-18 17:47
You don't give the value of 'newlines', but the problem is probably catastrophic backtracking, not deadlock.
msg317043 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-05-18 17:56
Min, you need to give a complete example other people can actually run for themselves.

Offhand, this part of the regexp

(.|\s)*

all by itself _can_ cause exponential-time behavior. You can run this for yourself:

>>> import re
>>> p = r"(.|\s)*K"
>>> re.search(p, " " * 10) # fast
>>> re.search(p, " " * 15) # fast
>>> re.search(p, " " * 20) # obviously takes a bit of time
>>> re.search(p, " " * 21) # very obviously takes time
>>> re.search(p, " " * 22) # over a second
>>> re.search(p, " " * 25) # about 10 seconds

Etc.
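
The growth is easier to see when each failed search is timed rather than eyeballed. The following is a minimal sketch, not part of the original report: `time_failed_search` and the two pattern names are hypothetical helpers, and the `[\s\S]*` variant is only one possible way to give each character a single matching path. Absolute timings depend on the machine and the Python version.

```python
import re
import time

# '.' and '\s' both match a space, so every space can be consumed two ways.
OVERLAPPING = re.compile(r"(.|\s)*K")
# Hypothetical alternative: one character class, one way to consume each char.
SINGLE_PATH = re.compile(r"[\s\S]*K")


def time_failed_search(pattern, text):
    """Run pattern.search(text) and return (match_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    result = pattern.search(text)
    return result, time.perf_counter() - start


# Sizes are kept small on purpose; each extra space is expected to roughly
# double the work for the overlapping pattern.
for n in (10, 12, 14, 16, 18):
    text = " " * n  # no 'K' anywhere, so every search must fail
    _, slow = time_failed_search(OVERLAPPING, text)
    _, fast = time_failed_search(SINGLE_PATH, text)
    print(f"n={n:2d}  overlapping={slow:.4f}s  single-path={fast:.6f}s")
```

If the time for the overlapping pattern keeps multiplying as spaces are added while the other column stays flat, that points to backtracking rather than a deadlock.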
msg322599 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-07-29 00:18
Closing as not-a-bug - not enough info to reproduce, but the regexp looked prone to exponential-time backtracking to both MRAB and me, and there's been no response to requests for more info.
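
For a pattern shaped like the one in the original report, one possible rewrite replaces the lazy `(.|\s)*?` with the negated class `[^)]*`, assuming the intent is "everything up to the next closing parenthesis". This is only a sketch: the sample input is made up, since the reporter's actual data was never posted, and the rewrite drops the second capture group, which the replacement string does not use.

```python
import re

# Pattern from the report: '.' and '\s' overlap on spaces and tabs.
ORIGINAL = r"\bROW\s*\((\d+|\*)\)(.|\s)*?\)"

# Hypothetical rewrite: '[^)]*' stops at the first ')' just like the lazy
# group did, but there is only one way to consume each character.
REWRITTEN = re.compile(r"\bROW\s*\((\d+|\*)\)[^)]*\)")

REPLACEMENT = "\nYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY\n"


def replace_rows(text):
    """Replace each 'ROW(n) ... )' span using the rewritten pattern."""
    return REWRITTEN.sub(REPLACEMENT, text)


# Made-up input for illustration only.
sample = "ROW(1) a b\nc ) tail"
print(repr(replace_rows(sample)))

# Input with no closing ')': the rewritten pattern fails without exploring
# an exponential number of paths and leaves the text unchanged.
unterminated = "ROW(*)" + " \t" * 2000
print(replace_rows(unterminated) == unterminated)
```

The `(\)|$)` workaround from the first message avoids the hang as well, but it also changes what is matched: an unterminated `ROW(...)` is then replaced through to the end of the input.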
History
Date                 User        Action  Args
2018-07-29 00:18:40  tim.peters  set     status: open -> closed; components: + Regular Expressions; nosy: + ezio.melotti; messages: + msg322599; resolution: not a bug; stage: resolved
2018-05-18 17:56:58  tim.peters  set     nosy: + tim.peters; messages: + msg317043
2018-05-18 17:47:19  mrabarnett  set     nosy: + mrabarnett; messages: + msg317042
2018-05-18 08:19:22  mamamiaibm  set     messages: + msg317017
2018-05-18 08:09:57  mamamiaibm  set     messages: + msg317015; versions: + Python 3.6, - Python 3.5
2018-05-18 08:06:05  mamamiaibm  create