Title: REDoS in c_analyzer
Author: yeting li (yetingli) * Date: 2020-09-04 11:11

I found that the regex "^([a-zA-Z]|_\w*[a-zA-Z]\w*|[a-zA-Z]\w*)$" can get stuck (backtrack excessively) on certain inputs.
The vulnerable regex is located in

The ReDOS vulnerability of the regex is mainly due to the sub-pattern \w*[a-zA-Z]\w*
and can be exploited with the following string
"_" + "a" * 5000 + "!"
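The slowdown can be observed with a small timing script. This is only a sketch: the pattern is copied from the report and compiled without flags (an assumption, since the original call site is not shown here), and the names VULNERABLE_RE and time_attack are made up for illustration. The running time should grow much faster than the input length, because for every position where [a-zA-Z] can match, the trailing \w* is retried at every length before "$" finally fails on the "!".

```python
import re
import time

# Pattern quoted in the report; compiling it without flags is an assumption.
VULNERABLE_RE = re.compile(r'^([a-zA-Z]|_\w*[a-zA-Z]\w*|[a-zA-Z]\w*)$')


def time_attack(n):
    """Return (seconds, match) for the input "_" + "a" * n + "!"."""
    text = "_" + "a" * n + "!"
    start = time.perf_counter()
    result = VULNERABLE_RE.match(text)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    # Doubling n lets you see how the cost scales on your machine.
    for n in (500, 1000, 2000, 4000):
        elapsed, result = time_attack(n)
        print(f"n={n:5d}  match={result}  {elapsed:.4f}s")
```

The match is always None, since "!" is not a word character; only the time spent reaching that answer changes.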

I think you can limit the input length or fix this regex.

Looking forward to your response!

Yeting Li
Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-04 11:47
I would use

   NAME_RE = re.compile(r'(?![_\d]+\Z)(?!\d)\w+', re.ASCII)

or

   NAME_RE = re.compile(r'(?=.*[A-Za-z])(?!\d)\w+', re.ASCII)

and NAME_RE.fullmatch() instead of NAME_RE.match().
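A quick way to try both suggestions is sketched below. The names NAME_RE_A, NAME_RE_B and is_name are hypothetical helpers for this illustration, not part of c_analyzer. Each lookahead scans the input at most once, and the trailing \w+ only has to back off one character at a time, so the exploit string should be rejected quickly.

```python
import re

# The two patterns suggested above; the _A/_B suffixes are only for this sketch.
NAME_RE_A = re.compile(r'(?![_\d]+\Z)(?!\d)\w+', re.ASCII)
NAME_RE_B = re.compile(r'(?=.*[A-Za-z])(?!\d)\w+', re.ASCII)


def is_name(text, pattern=NAME_RE_A):
    """Return True if the whole string is accepted by the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    attack = "_" + "a" * 5000 + "!"
    for sample in ("a", "_a1", "abc_1", "_123", "_", "1abc", "", attack):
        shown = sample if len(sample) < 20 else sample[:10] + "..."
        print(f"{shown!r:16} A={is_name(sample, NAME_RE_A)} "
              f"B={is_name(sample, NAME_RE_B)}")
```

Note that, like the original pattern, both variants reject names made only of underscores and digits, such as _123.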

But why are identifiers that contain no letters disallowed in the first place? Is _123 an invalid identifier in C?
Author: yeting li (yetingli) * Date: 2020-09-04 13:24
I think we can replace \w*[a-zA-Z]\w* with (_\d*)+([a-zA-Z]([_\d])*)+

This is an equivalent fix and the fixed regex is safe.

Does that sound right to you?
Author: yeting li (yetingli) * Date: 2020-09-04 13:33
You can use the dk.brics.automaton library to verify whether two regexes are equivalent.
Author: yeting li (yetingli) * Date: 2020-09-04 14:41
I'm sorry, there was a typo in my earlier message.

Replace _\w*[a-zA-Z]\w* with (_\d*)+([a-zA-Z]([_\d])*)+
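As a lightweight sanity check of the corrected replacement, the two sub-patterns can be compared by brute force over all short strings from a small alphabet. This is a bounded test, not a proof of equivalence, and it assumes ASCII semantics for \w (re.ASCII); with Unicode \w the original accepts non-ASCII letters and digits that the replacement does not. The names ORIGINAL, PROPOSED and find_disagreement are hypothetical.

```python
import re
from itertools import product

# Sub-patterns from the message above, compared under ASCII semantics (an assumption).
ORIGINAL = re.compile(r'_\w*[a-zA-Z]\w*', re.ASCII)
PROPOSED = re.compile(r'(_\d*)+([a-zA-Z]([_\d])*)+', re.ASCII)


def find_disagreement(alphabet="_aZ1!", max_len=6):
    """Return the first string the two patterns disagree on, or None."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            accepted_before = ORIGINAL.fullmatch(text) is not None
            accepted_after = PROPOSED.fullmatch(text) is not None
            if accepted_before != accepted_after:
                return text
    return None


if __name__ == "__main__":
    print("first disagreement:", find_disagreement())
```

The replacement avoids the ambiguity because the first letter in the input is the only place where the second group can start, so the engine has a single way to split the string.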
Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-17 07:34
New changeset dcfaa520c4638a67052a4ff4a2a820be68750ad7 by Serhiy Storchaka in branch 'master':
bpo-41715: Fix potential catastrofic backtracking in c_analyzer. (GH-22091)
