
classification
Title: python re bug
Type: crash
Stage: resolved
Components: Regular Expressions
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: aixian le, aldwinaldwin, ezio.melotti, mrabarnett
Priority: normal
Keywords:

Created on 2019-06-18 06:17 by aixian le, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg345953 - Author: aixian le (aixian le) Date: 2019-06-18 06:17
The code is:
import re

banner = "HTTP/1.0 404 Not Found\r\nDate: Mon, 17 Jun 2019 13:15:44 GMT\r\nServer:                \r\nConnection: close\r\nContent-Type: text/html\r\n\r\n<HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD>\r\n<BODY><H1>404 Not Found</H1>\r\nThe requested URL /PSIA/index was not found on this server.\r\n</BODY></HTML>\r\n"
regex = "^HTTP/1\\.0 404 Not Found\\r\\n(?:[^<]+|<(?!/head>))*?<style>"
print("start")
regex_re = re.compile(regex)
print("start1")
regex_re.search(banner)  # this call never returns
print("end")
When I execute this code, Python never finishes.
msg345957 - Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-06-18 07:36
When I run the regex on https://regex101.com/, after some small adjustments ("HTTP\/1\.0" and "\/head"), it reports: 'Catastrophic backtracking has been detected and the execution of your expression has been halted.' I don't know much about regex, but it looks like the engine gets stuck in an effectively endless search.

I'd suggest getting the regex to work in another regex engine first, before calling it a Python bug.
msg345958 - Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-06-18 07:43
Also, the banner doesn't contain "<style>".
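Since the reported pattern ends in the literal "<style>", it can only succeed when that literal occurs somewhere in the input. A plain substring test can therefore rule out a match before the regex engine is involved. A minimal sketch, in which `might_match` is a hypothetical helper and the banner is shortened for readability:

```python
REQUIRED_LITERAL = "<style>"  # literal text the reported pattern ends with


def might_match(text: str) -> bool:
    """Return False when the pattern cannot possibly match `text`."""
    # A substring test scans the text once and never backtracks.
    return REQUIRED_LITERAL in text


# Shortened version of the banner from the report (assumption: same shape).
banner = (
    "HTTP/1.0 404 Not Found\r\n"
    "Content-Type: text/html\r\n\r\n"
    "<HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD>\r\n"
)

print(might_match(banner))               # False
print(might_match(banner + "<style>"))   # True
```

This guard only skips the slow path for inputs that lack the literal. The pattern itself can still backtrack heavily on other inputs, so it is a workaround rather than a fix.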
msg345978 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2019-06-18 10:19
The problem is the "(?:[^<]+|<(?!/head>))*?".

If I simplify it a little I get "(?:[^<]+)*?", which is a repeat within a repeat.

There are many ways in which it could match, and if what follows fails to match (which is the case here, because there's no "<style>" in the target string, as Aldwin pointed out), it'll try them all, which can take a long time.
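To make "many ways" concrete: a run of n characters without "<" can be cut into consecutive non-empty chunks in 2**(n-1) ways, and each cut is one way for "(?:[^<]+)*?" to consume that run. The sketch below counts the cuts by brute force for small n, then shows one possible rewrite that drops the inner "+" so each character can be consumed in only one way. The rewrite is an assumption for illustration, not something proposed in this thread, and the names `count_splits` and `SAFER` are hypothetical.

```python
import re


def count_splits(n: int) -> int:
    """Count the ways to cut a run of n characters into non-empty chunks."""
    if n == 0:
        return 1  # one way: no chunks at all
    # The first chunk takes k characters; the remainder is cut recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


for n in (1, 2, 5, 10):
    print(n, count_splits(n))  # 1 1, 2 2, 5 16, 10 512

# Same pattern as in the report, but with "[^<]" instead of "[^<]+".
# The outer lazy repeat still consumes runs of non-"<" characters,
# one character per iteration, so there is only one way to do it.
SAFER = re.compile(r"^HTTP/1\.0 404 Not Found\r\n(?:[^<]|<(?!/head>))*?<style>")

# Shortened version of the banner from the report (assumption: same shape).
banner = (
    "HTTP/1.0 404 Not Found\r\n"
    "Server:                \r\n"
    "Connection: close\r\n\r\n"
    "<HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD>\r\n"
    "<BODY><H1>404 Not Found</H1></BODY></HTML>\r\n"
)

print(SAFER.search(banner))  # None, and it returns promptly
```

The header block in the original banner has over a hundred characters before the first "<", so the nested repeat has on the order of 2**100 cuts to try before giving up. The rewritten pattern accepts the same strings, since a run of "[^<]" characters repeated by the outer group covers everything "[^<]+" did.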
History
Date                 User          Action  Args
2022-04-11 14:59:16  admin         set     github: 81508
2019-06-18 10:19:49  mrabarnett    set     status: open -> closed; resolution: not a bug; messages: + msg345978; stage: resolved
2019-06-18 07:43:16  aldwinaldwin  set     messages: + msg345958
2019-06-18 07:36:51  aldwinaldwin  set     nosy: + aldwinaldwin; messages: + msg345957
2019-06-18 06:31:14  aixian le     set     messages: + msg345955; versions: + Python 3.7
2019-06-18 06:17:11  aixian le     create