Author effbot
Recipients effbot, gvanrossum, ostkamp
Date 2007-09-23.19:09:59
Message-id <1190574600.55.0.276315240503.issue1160@psf.upfronthosting.co.za>
In-reply-to
Content
Well, I'm not sure 81k qualifies as "medium sized", really.  If you look
at the size distribution for typical RE:s (which are usually
handwritten, not machine generated), that's one or two orders of
magnitude larger than "medium".

(And even if this was guaranteed to work on all Python builds, my guess
is that performance would be pretty bad compared to using a minimal RE
and checking potential matches against a set.  The "|" operator tries
its alternatives one after another, so it is mostly O(N) in the number
of alternatives, not O(1).)
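
A minimal sketch of the "minimal RE plus set" approach mentioned above, for comparison with one large alternation. The word list, the `\w+` token pattern and the function names are hypothetical and not taken from the original report; the real pattern would depend on what the generated alternatives look like.

```python
import re

# Hypothetical word list standing in for the large machine-generated alternation.
KEYWORDS = {"alpha", "beta", "gamma", "delta"}

# Minimal pattern: match any candidate token; the set decides membership.
TOKEN_RE = re.compile(r"\w+")


def find_keywords(text, keywords=KEYWORDS):
    """Return the tokens in text that are in keywords, in order of appearance."""
    return [m.group() for m in TOKEN_RE.finditer(text) if m.group() in keywords]


def find_keywords_alternation(text, keywords=KEYWORDS):
    """Same result using one big alternation, shown only for comparison."""
    # Longest alternatives first, so a shorter word never shadows a longer one.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(w) for w in alternatives))
    return pattern.findall(text)
```

The set lookup costs roughly the same no matter how many keywords there are, while the alternation has to be compiled into a pattern that grows with the list.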

As for fixing this, the "byte code" used by the RE engine uses a word
size equal to the Unicode character size (sizeof(Py_UNICODE)) for the
given platform.  I don't think it would be that hard to set it to 32
bits also on platforms using 16-bit Unicode characters (if anyone would
like to experiment, just set SRE_CODE to "unsigned long" in sre.h and
see what happens when you run the test suite).
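
To illustrate why the word size matters, here is a small Python sketch. It assumes the failure comes from a value that does not fit in a 16-bit code word, which the message implies but does not spell out. It does not touch the RE engine itself; `fits_in_code_word` and the 81000 figure are made up for the example.

```python
import struct


def fits_in_code_word(value, fmt):
    """Return True if value can be stored in one unsigned word of the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# Hypothetical value roughly the size of the pattern under discussion.
OFFSET = 81000

print(fits_in_code_word(OFFSET, "=H"))  # 16-bit unsigned word
print(fits_in_code_word(OFFSET, "=I"))  # 32-bit unsigned word
```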