
Author jblangston
Recipients ezio.melotti, jblangston, mrabarnett
Date 2022-02-03.18:40:17
Message-id <1643913617.99.0.318274176206.issue46627@roundup.psfhosted.org>
Content
The following code causes Python's regex engine to hang, apparently indefinitely:

import re
message = "Flushed to [BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] (1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
regex = re.compile(r"Flushed to \[(?P<sstables>[^]]+)+\] \((?P<sstable_count>[^ ]+) sstables, (?P<total_size>[^)]+)\), biggest (?P<biggest_size>[^,]+), smallest (?P<smallest_size>[^ ]+)( \((?P<duration>\d+)ms\))?")
regex.match(message)

This may be a case of exponential backtracking similar to #35915 or #30973. Both of those issues were closed as "won't fix", and I suspect mine is similar. The input string uses commas as decimal separators, which I had not anticipated; it happened because the logs the message came from were localized. The regex works properly when the decimal separator is a period.
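One possible rewrite, offered as a sketch rather than a confirmed fix: the nested quantifier in `(?P<sstables>[^]]+)+` is what lets the engine try every way of splitting the bracketed text once a later part of the pattern fails, so the version below drops the outer `+`. It also assumes that sizes never contain spaces and changes `biggest_size` from `[^,]+` to `[^ ]+` so that a comma inside the number is accepted. The name `fixed_regex` is only illustrative.

```python
import re

# Same message as in the report, with commas as decimal separators.
message = "Flushed to [BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] (1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"

# Change 1: ([^]]+)+ becomes [^]]+ so there is only one way to match the
#           bracketed text (no nested quantifier to backtrack through).
# Change 2: biggest_size uses [^ ]+ instead of [^,]+ (assumption: a size
#           never contains a space), so "8,650MiB" is accepted.
fixed_regex = re.compile(
    r"Flushed to \[(?P<sstables>[^]]+)\] "
    r"\((?P<sstable_count>[^ ]+) sstables, (?P<total_size>[^)]+)\), "
    r"biggest (?P<biggest_size>[^ ]+), "
    r"smallest (?P<smallest_size>[^ ]+)"
    r"( \((?P<duration>\d+)ms\))?"
)

match = fixed_regex.match(message)
print(match.groupdict() if match else None)
```

With only the first change applied, the comma-separated message would still fail to match, but it should fail quickly instead of hanging.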

I will try to rewrite my regex to address this specific issue, but it is hard to anticipate every possible input and craft a bulletproof regex, so this kind of thing can be used for a denial-of-service attack (intentional or not). In this case the regex was used in an automated import process and caused the process to back up for many hours before someone noticed. Maybe a solution could be to add a timeout option to the regex engine so that it gives up and raises an exception if a match runs for longer than the configured timeout.
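Until something like that exists in `re`, a timeout can be approximated from the outside. The sketch below is a workaround under stated assumptions, not the proposed engine feature: it runs the match in a child Python interpreter via `subprocess.run` and kills it when the time limit is exceeded. The helper name `match_with_timeout` is hypothetical, only the named groups are returned, and starting a process per match is expensive, so this fits batch imports better than hot loops.

```python
import json
import subprocess
import sys

# Script run in the child interpreter: match argv[1] against argv[2] and
# print the named groups as JSON (or null when there is no match).
_CHILD = (
    "import json, re, sys\n"
    "m = re.match(sys.argv[1], sys.argv[2])\n"
    "print(json.dumps(m.groupdict() if m else None))\n"
)


def match_with_timeout(pattern, text, timeout):
    """Return the named groups of re.match(pattern, text), or None.

    Raises TimeoutError if the child process has not finished within
    `timeout` seconds; subprocess.run kills the child in that case.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(f"regex match exceeded {timeout} seconds") from None
    return json.loads(result.stdout)
```

A caller in an import pipeline could catch `TimeoutError`, log the offending line, and move on instead of stalling.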
History
Date                 User        Action  Args
2022-02-03 18:40:18  jblangston  set     recipients: + jblangston, ezio.melotti, mrabarnett
2022-02-03 18:40:17  jblangston  set     messageid: <1643913617.99.0.318274176206.issue46627@roundup.psfhosted.org>
2022-02-03 18:40:17  jblangston  link    issue46627 messages
2022-02-03 18:40:17  jblangston  create