Title: re.findall() takes a long time (100% CPU usage) on Python 3.6.10
msg368026 - (view) Author: Sergio Rael (srael) Date: 2020-05-04 09:54
I have found a deadlock in Python 3.6.10 that seems to have been fixed in 3.7.x. It is probably related to capture groups. To reproduce the deadlock, run something like this:

    import re
    re.findall(
        r'\[et_pb_image(?:\w|=|"|\d|\.| |_|\/)*src="(https?:\/\/(?:www\.)?\w*\.\w*(?:\/|\w|\d|\.|-)*\.(?:png|jpg|jpeg|gif))"(?:\w|=|"|\d|\.| |_|\/|%|\|)*(?:\/?\])(?:\[\/et_pb_image\])?',
        '[et_pb_image _builder_version="3.27.2" src="" box_shadow_horizontal_tablet="0px" box_shadow_vertical_tablet="0px" box_shadow_blur_tablet="40px" box_shadow_spread_tablet="0px" z_index_tablet="500" url="" url_new_window="on" /]')

I noticed that the problem is related to having two image URLs in the content. The regex looks only for the one starting with "src=", so the one starting with "url=" should be ignored. If url="XXX" is removed from the tag, it works fine.
msg368028 - (view) Author: Sergio Rael (srael) Date: 2020-05-04 10:02
Sorry, this is not a deadlock. Python drives the CPU to 100% usage, but it takes so long that I couldn't tell whether it would ever finish the task.
msg368030 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2020-05-04 10:10
I don't think this is a deadlock; rather, it is certainly related to the number of '*' quantifiers in your pattern: the regex engine has to try an exponentially growing number of ways to match.

You could try a simpler pattern to match your attribute, and it should be faster.
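
As a rough sketch of what "simpler" could look like (not tested against your real content): find each tag with a single character class, then pull the src attribute out of the tag in a second step. The names TAG_RE, SRC_RE, IMAGE_URL_RE and image_sources are made up for this example, the example.com URLs are placeholders, and the URL check is looser than the one in your original pattern.

```python
import re

# Hypothetical helper patterns; none of these names come from the report.
TAG_RE = re.compile(r'\[et_pb_image\b[^\]]*\]')    # one tag, up to the first ']'
SRC_RE = re.compile(r'\bsrc="([^"]*)"')            # the src attribute only, not url=
IMAGE_URL_RE = re.compile(r'https?://\S+\.(?:png|jpg|jpeg|gif)')


def image_sources(content):
    """Return the src URLs of et_pb_image tags that look like image links."""
    urls = []
    for tag in TAG_RE.findall(content):
        match = SRC_RE.search(tag)
        # fullmatch() rejects empty values and anything that is not an image URL.
        if match and IMAGE_URL_RE.fullmatch(match.group(1)):
            urls.append(match.group(1))
    return urls


# Placeholder content with both a src= and a url= attribute.
sample = ('[et_pb_image _builder_version="3.27.2" '
          'src="https://example.com/img/photo.png" z_index_tablet="500" '
          'url="https://example.com/other.jpg" url_new_window="on" /]')
print(image_sources(sample))
```

Each step uses one quantifier over a single character class, so there are no overlapping alternatives (such as \w, \d and _) for the engine to retry in different combinations.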
msg368109 - (view) Author: Sergio Rael (srael) Date: 2020-05-05 07:39
Thank you for your reply, Rémi.

I agree with you that the reason may be that the pattern is too complex. I just noticed that in Python 3.7 the same pattern finishes the findall almost instantaneously, but in 3.6 the CPU goes to 100% and it takes ages to finish. In fact, I don't know whether it would finish at all, because it took so long that I had to stop it.
I thought it would be a good idea to let you know about this behaviour. Of course, after this, I don't use 3.6 anymore.

Thanks again!
msg368119 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-05 09:18
It is hard to say what the problem is, but it seems to have been solved in 3.7. Either it was an optimization, or a bug fix that had this side effect. If it was a bug fix, it was one of the backward-incompatible bug fixes that are not backported to older versions.
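
One way to narrow this down might be to compare how each version parses the pattern. The sketch below captures what the re.DEBUG flag prints during compilation; running it under 3.6 and 3.7 and diffing the output could show whether the pattern is compiled differently. The helper name dump_parse_tree and the reduced pattern fragment are assumptions made for illustration, not part of the report.

```python
import contextlib
import io
import re


def dump_parse_tree(pattern):
    """Return the text that re.DEBUG prints while compiling `pattern`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


# A reduced fragment of the reported pattern (the reduction is an assumption).
fragment = r'(?:\w|=|\d|_)*src='
print(dump_parse_tree(fragment))
```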
