This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: REDoS in c_analyzer
Type: security
Stage: resolved
Components: Library (Lib)
Versions: Python 3.10

process
Status: closed
Resolution: fixed
Dependencies: (none)
Superseder: (none)
Assigned To: (none)
Nosy List: eric.snow, serhiy.storchaka, yetingli
Priority: normal
Keywords: patch

Created on 2020-09-04 11:11 by yetingli, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
info.py, uploaded by yetingli on 2020-09-04 11:11
Pull Requests
PR 22091 (merged), linked by serhiy.storchaka on 2020-09-04 17:27
Messages (6)
msg376355 - Author: yeting li (yetingli) - Date: 2020-09-04 11:11
Hi,

I found that the regex "^([a-zA-Z]|_\w*[a-zA-Z]\w*|[a-zA-Z]\w*)$" can take a very long time (catastrophic backtracking) on crafted input.
The vulnerable regex is located at
https://github.com/python/cpython/blob/54a66ade2067c373d31003ad260e1b7d14c81564/Tools/c-analyzer/c_analyzer/common/info.py#L12

The ReDoS vulnerability is mainly due to the sub-pattern \w*[a-zA-Z]\w*, and it can be exploited with the following string:
"_" + "a" * 5000 + "!"

I suggest either limiting the input length or fixing the regex.

Looking forward to your response!

Best,
Yeting Li
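A minimal Python sketch for reproducing the report: it compiles the pattern quoted above and times a failing `match()` on `"_" + "a" * n + "!"` for growing `n`. The helper name `time_failed_match` and the chosen lengths are illustrative assumptions, not part of the report. Absolute timings depend on the machine and Python version, so only the trend matters: the run of letters can be divided between the first `\w*`, the `[a-zA-Z]` and the second `\w*` in roughly n^2/2 ways, and the trailing `!` forces the engine to try them all, so doubling `n` should roughly quadruple the time.

```python
import re
import time

# Pattern as quoted in the report (c_analyzer/common/info.py at the linked revision).
VULNERABLE_NAME_RE = re.compile(r'^([a-zA-Z]|_\w*[a-zA-Z]\w*|[a-zA-Z]\w*)$')


def time_failed_match(n):
    """Hypothetical helper: time one match attempt on "_" + "a" * n + "!"."""
    text = "_" + "a" * n + "!"
    start = time.perf_counter()
    result = VULNERABLE_NAME_RE.match(text)
    return result, time.perf_counter() - start


for n in (250, 500, 1000, 2000):
    result, elapsed = time_failed_match(n)
    # The trailing "!" makes every attempt fail, but only after the engine
    # has tried every split of the a's among \w*, [a-zA-Z] and \w*.
    print(f"n={n:5d}  matched={result is not None}  {elapsed:.4f}s")
```

Raising `n` to the report's 5000 makes the delay easier to see.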
msg376358 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2020-09-04 11:47
I would use

   NAME_RE = re.compile(r'(?![_\d]+\Z)(?!\d)\w+', re.ASCII)

or

   NAME_RE = re.compile(r'(?=.*[A-Za-z])(?!\d)\w+', re.ASCII)

and NAME_RE.fullmatch() instead of NAME_RE.match().

But why are identifiers that contain no letters rejected in the first place? Is _123 an invalid identifier in C?
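The two suggested patterns can be sanity-checked against the original in Python. This sketch assumes ASCII names with no trailing newline: the original uses `$` (which also matches just before a final `"\n"`) and Unicode-aware `\w`, while the suggestions use `fullmatch()` with `re.ASCII`, so they are not expected to agree outside that domain. The sample list is an illustrative assumption, not taken from the issue.

```python
import re

# Original pattern, used with match() as in the report.
ORIGINAL_NAME_RE = re.compile(r'^([a-zA-Z]|_\w*[a-zA-Z]\w*|[a-zA-Z]\w*)$')

# The two suggested replacements, meant to be used with fullmatch().
# (?!\d) forbids a leading digit; the first lookahead rejects names made only
# of underscores and digits, the second requires at least one ASCII letter.
SUGGESTED_NAME_RES = [
    re.compile(r'(?![_\d]+\Z)(?!\d)\w+', re.ASCII),
    re.compile(r'(?=.*[A-Za-z])(?!\d)\w+', re.ASCII),
]

# Illustrative ASCII samples (assumed, not from the issue).
SAMPLES = ["a", "Z", "abc", "a1_b", "_x", "__init__", "_1a",
           "_", "__", "_123", "1abc", "a-b", ""]

for name in SAMPLES:
    expected = ORIGINAL_NAME_RE.match(name) is not None
    for pattern in SUGGESTED_NAME_RES:
        assert (pattern.fullmatch(name) is not None) == expected, (pattern, name)

# The attack string from the report is rejected, and the work grows only
# linearly with its length: each lookahead and the final \w+ backtrack at
# most once over the input.
attack = "_" + "a" * 5000 + "!"
print([pattern.fullmatch(attack) is None for pattern in SUGGESTED_NAME_RES])  # prints [True, True]
```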
msg376366 - Author: yeting li (yetingli) - Date: 2020-09-04 13:24
I think we can replace \w*[a-zA-Z]\w* with (_\d*)+([a-zA-Z]([_\d])*)+

This is an equivalent fix, and the fixed regex is not vulnerable to catastrophic backtracking.

Does that sound right to you?
msg376367 - Author: yeting li (yetingli) - Date: 2020-09-04 13:33
You can use the dk.brics.automaton library to verify whether two regexes are equivalent.
msg376369 - Author: yeting li (yetingli) - Date: 2020-09-04 14:41
Sorry, there was a typo in my earlier suggestion. It should read:

replace _\w*[a-zA-Z]\w* with (_\d*)+([a-zA-Z]([_\d])*)+
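dk.brics.automaton (a Java library) can decide regex equivalence exactly. As a weaker, swapped-in check in Python, the sketch below compares the original pattern with the corrected proposal on every string up to length 6 over the alphabet `"_aZ1!"`; agreement there is evidence, not proof. The helper `first_disagreement` is a hypothetical name. Note that the proposal spells out ASCII classes while Python's `\w` is Unicode-aware by default, so the two patterns do differ on non-ASCII names such as `"_aé"`.

```python
import itertools
import re

# Original pattern from the report.
ORIGINAL = re.compile(r'^([a-zA-Z]|_\w*[a-zA-Z]\w*|[a-zA-Z]\w*)$')
# Same pattern with _\w*[a-zA-Z]\w* replaced as in the corrected proposal.
PROPOSED = re.compile(r'^([a-zA-Z]|(_\d*)+([a-zA-Z]([_\d])*)+|[a-zA-Z]\w*)$')


def first_disagreement(alphabet, max_len):
    """Return the first string (shortest first) accepted by exactly one pattern, or None."""
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if (ORIGINAL.match(candidate) is None) != (PROPOSED.match(candidate) is None):
                return candidate
    return None


print(first_disagreement("_aZ1!", 6))  # prints None: no disagreement found

# Each repeated group starts with a fixed kind of character (_ or a letter),
# so the split of the input among iterations is unambiguous and a failed
# match only backtracks through a linear number of states.
print(PROPOSED.match("_" + "a" * 5000 + "!") is None)  # prints True
```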
msg377037 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2020-09-17 07:34
New changeset dcfaa520c4638a67052a4ff4a2a820be68750ad7 by Serhiy Storchaka in branch 'master':
bpo-41715: Fix potential catastrofic backtracking in c_analyzer. (GH-22091)
https://github.com/python/cpython/commit/dcfaa520c4638a67052a4ff4a2a820be68750ad7
History
Date | User | Action | Args
2022-04-11 14:59:35 | admin | set | github: 85881
2020-09-17 07:36:35 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-09-17 07:34:42 | serhiy.storchaka | set | messages: + msg377037
2020-09-04 17:27:34 | serhiy.storchaka | set | keywords: + patch; stage: patch review; pull_requests: + pull_request21178
2020-09-04 14:41:11 | yetingli | set | messages: + msg376369
2020-09-04 13:33:00 | yetingli | set | messages: + msg376367
2020-09-04 13:24:06 | yetingli | set | messages: + msg376366
2020-09-04 11:47:11 | serhiy.storchaka | set | messages: + msg376358
2020-09-04 11:18:37 | serhiy.storchaka | set | nosy: + eric.snow, serhiy.storchaka
2020-09-04 11:13:19 | yetingli | set | type: security; components: + Library (Lib); versions: + Python 3.10
2020-09-04 11:12:33 | yetingli | set | title: REDoS inc_analyzer -> REDoS in c_analyzer
2020-09-04 11:11:24 | yetingli | create |