This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Walter Farrell
Recipients Walter Farrell, ezio.melotti, gdr@garethrees.org, mrabarnett
Date 2016-11-14.21:09:42
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <CAOETLVFdEmD2+nJsjhNgOVQ6DOACn0vUnS2AfMEV+6WfwCe98Q@mail.gmail.com>
In-reply-to <1479156925.14.0.601715521823.issue28690@psf.upfronthosting.co.za>
Content
Thanks, Gareth. That does work.

Interesting that regex does still seem to work linearly with the original
version, but your version seems cleaner.

On Mon, Nov 14, 2016 at 3:55 PM, Gareth Rees <report@bugs.python.org> wrote:

>
> Gareth Rees added the comment:
>
> This is a well-known gotcha with backtracking regexp implementations. The
> problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there
> are some characters (space, apostrophe, double quotes) that match multiple
> alternatives (for example a space matches both " +" and "[^>]+"). This
> causes the regexp engine to have to backtrack for each ambiguous character
> to try out the other alternatives, leading to runtime that's exponential in
> the number of ambiguous characters.
>
> Linear behaviour can be restored if you make the alternation unambiguous,
> like this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+)
>
> ----------
> nosy: +Gareth.Rees
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue28690>
> _______________________________________
>
History
Date User Action Args
2016-11-14 21:09:42Walter Farrellsetrecipients: + Walter Farrell, ezio.melotti, mrabarnett, gdr@garethrees.org
2016-11-14 21:09:42Walter Farrelllinkissue28690 messages
2016-11-14 21:09:42Walter Farrellcreate