Message 412450
The following code causes Python's regex engine to hang, apparently indefinitely:
import re
message = "Flushed to [BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] (1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
regex = re.compile(r"Flushed to \[(?P<sstables>[^]]+)+\] \((?P<sstable_count>[^ ]+) sstables, (?P<total_size>[^)]+)\), biggest (?P<biggest_size>[^,]+), smallest (?P<smallest_size>[^ ]+)( \((?P<duration>\d+)ms\))?")
regex.match(message)
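This may be a case of exponential backtracking similar to #35915 or #30973. Both of those issues were closed as "won't fix", and I suspect mine is similar. The likely trigger is the nested quantifier in (?P<sstables>[^]]+)+. Because of the decimal comma, the [^,]+ in the biggest_size group stops after the first digit, so the rest of the pattern cannot match. The engine then retries every possible way of splitting the bracketed path between the inner and outer "+", and the number of such splits grows exponentially with the length of the path. The use of commas as decimal separators was not anticipated; it comes from localization of the logs the message was taken from. The regex works properly when the decimal separator is a period.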
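A minimal sketch of one possible rewrite, assuming the log format is otherwise exactly as shown above. It makes two changes. First, it drops the redundant outer "+" on the sstables group, since [^]]+ already matches one or more characters, which leaves only one way to consume the bracketed path. Second, a hypothetical SIZE sub-pattern accepts either "." or "," as the decimal separator. The names SIZE and FLUSH_RE are illustrative, not taken from the original code.

```python
import re

# Hypothetical size pattern: digits, an optional "." or "," decimal part, then a unit
# such as "MiB". This is an assumption about the log format, not a spec.
SIZE = r"\d+(?:[.,]\d+)?\w*"

# Same group names as the original pattern. The outer "+" after the sstables group
# is removed, so the bracketed path can be matched in only one way.
FLUSH_RE = re.compile(
    rf"Flushed to \[(?P<sstables>[^]]+)\] "
    rf"\((?P<sstable_count>[^ ]+) sstables, (?P<total_size>{SIZE})\), "
    rf"biggest (?P<biggest_size>{SIZE}), "
    rf"smallest (?P<smallest_size>{SIZE})"
    r"(?: \((?P<duration>\d+)ms\))?"  # optional "(NNNms)" suffix
)
```

Matching the sizes against an explicit number shape, rather than "anything except a comma", is what lets the comma-decimal line match. Removing the nested quantifier is what keeps a non-matching line from taking exponential time. On Python 3.11 and later, a possessive quantifier ([^]]++) is another way to stop the engine from re-splitting the path.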
I will try to rewrite my regex to handle this specific case, but it is hard to anticipate every possible input and craft a bulletproof regex, so input like this can be used for a denial-of-service attack (intentional or not). In this case the regex was used in an automated import process, which backed up for many hours before anyone noticed. One possible solution would be to add a timeout option to the regex engine, so that it gives up and raises an exception if matching runs longer than the configured timeout.
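The built-in re module has no timeout parameter, so a caller-side workaround is needed in the meantime. A minimal sketch, assuming a POSIX system because it relies on the "fork" start method, runs the match in a child process and kills that process if it has not answered in time. A watchdog thread would not help here, because a thread cannot interrupt a running match. The helper names match_with_timeout and _worker are hypothetical. The child sends back groupdict() because re.Match objects cannot be pickled.

```python
import multiprocessing
import re

# Assumption: POSIX only. The "fork" start method lets the child inherit the
# compiled pattern and the worker function without pickling them.
_CTX = multiprocessing.get_context("fork")


def _worker(pattern, text, conn):
    # Runs in the child process. re.Match objects cannot be pickled, so only
    # the named groups (or None) are sent back to the parent.
    m = pattern.match(text)
    conn.send(None if m is None else m.groupdict())
    conn.close()


def match_with_timeout(pattern, text, timeout):
    """Hypothetical helper: pattern.match(text), abandoned after `timeout` seconds.

    Returns m.groupdict() on a match, None if there is no match, and raises
    TimeoutError if the child has not answered in time.
    """
    recv_conn, send_conn = _CTX.Pipe(duplex=False)
    proc = _CTX.Process(target=_worker, args=(pattern, text, send_conn))
    proc.start()
    send_conn.close()  # the parent only reads from the pipe
    try:
        if recv_conn.poll(timeout):
            return recv_conn.recv()
        raise TimeoutError(f"regex match did not finish within {timeout} s")
    finally:
        proc.terminate()  # kills a runaway match; harmless if the child already exited
        proc.join()
        recv_conn.close()
```

Called with the regex and message from the report and a timeout of a few seconds, this should raise TimeoutError instead of blocking the import process. Starting a process per match is expensive, so this trade-off suits untrusted or unusual input more than hot loops.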
Date | User | Action | Args
2022-02-03 18:40:18 | jblangston | set | recipients: + jblangston, ezio.melotti, mrabarnett
2022-02-03 18:40:17 | jblangston | set | messageid: <1643913617.99.0.318274176206.issue46627@roundup.psfhosted.org>
2022-02-03 18:40:17 | jblangston | link | issue46627 messages
2022-02-03 18:40:17 | jblangston | create |