
Author timehorse
Recipients adi, akuchling, effbot, gpolo, greg@gregdetre.co.uk, gvanrossum, mathieu.clabaut, ostkamp, rsc, timehorse
Date 2008-09-25.12:19:00
Message-id <1222345142.41.0.615630232371.issue1160@psf.upfronthosting.co.za>
In-reply-to
Content
It seems that changing the size type of the Regular Expression Byte-code
is a nice quick-fix, even though it doubles the size of a pattern.  It
may have the added benefit that most machine architectures available
today are at least partially, if not fully, 32-bit oriented so that
retrieving op codes may in fact be faster if we make this change.  OTOH,
it has an interesting implication IMHO for the repeat count limits we
currently have.  Repeat counts can be explicitly set only up to 65534
because 65535, the largest number you can express in a 16-bit
unsigned integer, is currently reserved to mean Infinite.  It seems to
me this is a great opportunity to move that sentinel to (unsigned long)-1,
which would leave room for incredibly large explicit repeat counts.
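
To make the arithmetic concrete, here is a rough Python sketch of how the
Infinite sentinel and the largest explicit repeat count follow from the
code-word width.  The 16-bit and 32-bit widths and the helper name
`limits` are assumptions for illustration only, not names from the
actual engine.

```python
import struct

# Assumed widths: 16-bit code words today, 32-bit after the proposed change.
OLD_CODE_BYTES = struct.calcsize("=H")   # 2 bytes
NEW_CODE_BYTES = struct.calcsize("=I")   # 4 bytes, i.e. double the size


def limits(code_bytes):
    """Return (infinite_sentinel, largest_explicit_repeat) for a code width.

    Hypothetical helper: the sentinel is the all-bits-set value, which is
    what casting -1 to an unsigned type of that width yields in C.
    """
    infinite = (1 << (8 * code_bytes)) - 1
    return infinite, infinite - 1


if __name__ == "__main__":
    print(limits(OLD_CODE_BYTES))
    print(limits(NEW_CODE_BYTES))
```

With a wider code word the sentinel moves up with it, so the explicit
limit grows without needing a separate flag for Infinite.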

OTOH, if size is an issue, we could change the way sizes are expressed
in the Regexp Op Codes (typically in skip counts) to be 15-bit, with the
Most Significant Bit being reserved to flag an 'extended' value that
continues in the next code word.  In this way, a value of 0xFFFFFFFF
could be expressed as:

0xFFFF 0xFFFF 0x0003

Of course, parsing numbers in this form is a pain, to say the least, and
unlike in Python, the C-library would not play nicely if someone tried
to express a number that could not fit into what the architecture
defines an int to be.  Plus, there is the problem of how you express
Infinite with this scheme.  The advantage, though, would be that we
don't have to change the op-code size, and these 'extended' counts would
be very rare indeed.
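
The 'extended' scheme above can be sketched in Python roughly as
follows.  This is only an illustration under assumptions: the function
names are hypothetical, and the low-bits-first word order is inferred
from the 0xFFFF 0xFFFF 0x0003 example rather than stated anywhere.  It
does not attempt to solve the question of how to express Infinite.

```python
PAYLOAD_BITS = 15                        # value bits carried per 16-bit word
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1   # 0x7FFF
EXTENDED_FLAG = 1 << PAYLOAD_BITS        # 0x8000: "more words follow"


def encode_extended(value):
    """Split a non-negative int into 16-bit words, low 15 bits first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    words = []
    while True:
        chunk = value & PAYLOAD_MASK
        value >>= PAYLOAD_BITS
        if value:
            # More bits remain, so set the most significant bit.
            words.append(chunk | EXTENDED_FLAG)
        else:
            words.append(chunk)
            return words


def decode_extended(words):
    """Return (value, words_consumed) for a sequence of 16-bit words."""
    value = 0
    for position, word in enumerate(words):
        value |= (word & PAYLOAD_MASK) << (position * PAYLOAD_BITS)
        if not word & EXTENDED_FLAG:
            return value, position + 1
    raise ValueError("extended value is truncated")


if __name__ == "__main__":
    print([hex(w) for w in encode_extended(0xFFFFFFFF)])
```

Python integers are unbounded, so the decoder never overflows; a C
version would additionally have to reject sequences too long for the
target integer type, which is the pain point mentioned above.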

Overall, I'm more of an Occam's Razor fan in that the simplest solution
is probably the best: just change the op-code size to unsigned long
(which, on SOME architectures, would actually make it 64 bits!) and
define the 'Infinite' constant as (unsigned long)-1.  Mind you, I prefer
defining the constant in Python, not C, and it would be hard for Python
to determine that particular value given that Python is meant to be 'the
same' regardless of the underlying architecture, but that's another issue.
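
For what it's worth, one possible way for Python code to discover the
platform's (unsigned long)-1 is via ctypes, sketched below.  This is an
assumption about how it could be done, not how the re module does it,
and the name INFINITE is hypothetical.

```python
import ctypes

# ctypes integer types do no overflow checking, so -1 wraps around to the
# largest value the platform's unsigned long can hold.
# INFINITE is a hypothetical name, not an existing constant in re/sre.
INFINITE = ctypes.c_ulong(-1).value

# Width of unsigned long on this platform (commonly 32 or 64 bits).
ULONG_BITS = 8 * ctypes.sizeof(ctypes.c_ulong)

if __name__ == "__main__":
    print(ULONG_BITS, hex(INFINITE))
```

The value differs between platforms, which is exactly the portability
wrinkle described above.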

Anyway, as 2.6 is in Beta, this will have to wait for Python 2.7 / 3.1,
and so I will add an item to Issue 2636 with respect to it.
History
Date                 User       Action  Args
2008-09-25 12:19:02  timehorse  set     recipients: + timehorse, gvanrossum, effbot, akuchling, ostkamp, rsc, mathieu.clabaut, gpolo, greg@gregdetre.co.uk, adi
2008-09-25 12:19:02  timehorse  set     messageid: <1222345142.41.0.615630232371.issue1160@psf.upfronthosting.co.za>
2008-09-25 12:19:01  timehorse  link    issue1160 messages
2008-09-25 12:19:00  timehorse  create