
Author ezio.melotti
Recipients Arfrever, barry, ezio.melotti, loewis, nadeem.vawda, orsenthil, r.david.murray, rosslagerwall, serhiy.storchaka
Date 2012-09-23.18:53:18
Message-id <1348426399.19.0.771620737701.issue11454@psf.upfronthosting.co.za>
Content
> They are precompiled because for a program processing lots of email,
> they are hot spots.

OK, I didn't know they were hot spots.  Note that the regexes are not recompiled every time: they are compiled on first use and then taken from the re module's cache (assuming they haven't been evicted from the cache).  The cache lookup still has a small overhead, though.
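
To illustrate the two styles, here is a minimal sketch.  The pattern and the function names are made up for the example and are not taken from the email package.  Both functions return the same result; the only difference is whether each call goes through the re module's cache lookup.

```python
import re

# Hypothetical pattern, for illustration only.
WORD = r"[A-Za-z]+"

# Compiled once at import time; later calls use the pattern object directly.
word_re = re.compile(WORD)

def first_word_precompiled(text):
    m = word_re.search(text)
    return m.group() if m else None

def first_word_cached(text):
    # re.search() compiles WORD on first use and finds it in the module's
    # internal cache on later calls, which costs a lookup per call.
    m = re.search(WORD, text)
    return m.group() if m else None
```

For code that runs on every header of every message, skipping that per-call lookup is the argument for keeping the precompiled form.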

> Can you explain your changes to the ecre regex (keeping in mind
> that I don't know much about regex syntax).

-  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
+  (?P<charset>[^?]*)    # up to the next ? is the charset
   \?                    # literal ?
   (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
   \?                    # literal ?
-  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
+  (?P<encoded>[^?]*)    # up to the next ?= is the encoded string
   \?=                   # literal ?=

In the first change (the charset group), the non-greedy *? is unnecessary because [^?]* already stops at the first ? found.
The second change might actually be wrong if <encoded> is allowed to contain lone '?'s.  The original regex used '.*?\?=', which means "match everything (including lone '?'s) until the first '?='"; mine means "match everything until the first '?'", which works fine only as long as lone '?'s are not allowed.
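
The difference can be checked with a small sketch.  The leading =\? is an assumption here, since the diff above starts at the charset line, and the helper name and sample strings are made up for the example.

```python
import re

FLAGS = re.VERBOSE | re.IGNORECASE

# Original form of the groups shown in the diff.
original = re.compile(r'''
  =\?                   # literal =? (assumed prefix, not shown in the diff)
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
''', FLAGS)

# Proposed form: both groups stop at the first ? they meet.
proposed = re.compile(r'''
  =\?                   # literal =? (assumed prefix, not shown in the diff)
  (?P<charset>[^?]*)    # up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>[^?]*)    # up to the next ? is the encoded string
  \?=                   # literal ?=
''', FLAGS)

def encoded_part(pattern, text):
    """Return the <encoded> group of the first match, or None."""
    m = pattern.search(text)
    return m.group('encoded') if m else None

plain = "=?utf-8?q?hello?="
lone_qmark = "=?utf-8?q?what?now?="   # lone '?' inside the encoded text
```

With these inputs, both patterns extract "hello" from the first string.  On the second, the original pattern extracts "what?now", while the proposed one finds no match at all, because [^?]* stops at the lone '?' and the following character is not '='.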

Serhiy's suggestion is semantically different, but it might still be suitable if it is OK for _has_surrogate to return True even for surrogates outside the range \udc80-\udcff.
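
To show what "semantically different" means here, this is a rough sketch.  Serhiy's suggestion is not quoted in this message, so both functions below are hypothetical stand-ins: one checks only the range \udc80-\udcff (the code points the surrogateescape error handler produces for undecodable bytes), and the other reports any lone surrogate by attempting a strict UTF-8 encode.

```python
import re

# Hypothetical narrow check: only the surrogateescape range U+DC80..U+DCFF.
_narrow_search = re.compile('[\udc80-\udcff]').search

def has_escaped_bytes(s):
    return _narrow_search(s) is not None

# Hypothetical broad check: any lone surrogate makes strict UTF-8 encoding fail.
def has_any_surrogate(s):
    try:
        s.encode('utf-8')
        return False
    except UnicodeEncodeError:
        return True
```

For 'abc\udcff' both checks return True, but for 'abc\ud800' only the broad one does, which is the case the question above is about.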