Message280814
Thanks, Gareth. That does work.
Interesting that regex does still seem to work linearly with the original
version, but your version seems cleaner.
On Mon, Nov 14, 2016 at 3:55 PM, Gareth Rees <report@bugs.python.org> wrote:
>
> Gareth Rees added the comment:
>
> This is a well-known gotcha with backtracking regexp implementations. The
> problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there
> are some characters (space, apostrophe, double quotes) that match multiple
> alternatives (for example a space matches both " +" and "[^>]+"). This
> causes the regexp engine to have to backtrack for each ambiguous character
> to try out the other alternatives, leading to runtime that's exponential in
> the number of ambiguous characters.
>
> Linear behaviour can be restored if you make the alternation unambiguous,
> like this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+)
>
> ----------
> nosy: +Gareth.Rees
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue28690>
> _______________________________________
> |
|
Date |
User |
Action |
Args |
2016-11-14 21:09:42 | Walter Farrell | set | recipients:
+ Walter Farrell, ezio.melotti, mrabarnett, gdr@garethrees.org |
2016-11-14 21:09:42 | Walter Farrell | link | issue28690 messages |
2016-11-14 21:09:42 | Walter Farrell | create | |
|