Message83427
Okay, as I said, my Atomic Grouping, etc., patches against a recent 2.6 are already
available, and I can do any cleanups requested on the items already
mentioned; I just don't want to start any new items at the moment. As
it is, we are still over a year from any of this seeing the light of day,
since it's not going to be merged until we start the 2.7 / 3.1 alpha.
Fortunately, I think Matthew here DOES have a lot of potential to have
everything wrapped up by then, but, to summarize everyone's
concern, we really would like to be able to examine each change
incrementally rather than as a whole. So, for this purpose, I
would recommend that you, Matthew, make a version of your new engine
WITHOUT any Atomic Groups, variable-length lookbehind/lookahead
assertions, reverse string scanning, positional, negated or scoped
inline flags, group key indexing, or any other feature described in the
various issues, and that we then evaluate, purely on the merits of the
engine itself, whether it is worth moving to that engine and, having made
that decision, officially move all work to that design if warranted.
Personally, I'd like to see that 'pure' engine for myself, and maybe we
can all develop an appropriate benchmark suite to test it fairly against
the existing engine. I also think we should consider things like
presentation (are all lines wrapped at column 80?), number of
comments, and general readability. IMHO, the current code conforms
on line length but is VERY deficient WRT comments and readability, the
latter of which it sacrifices for speed (as well as being retrofitted for
iteration rather than recursion). I'm no fan of switch-case, but I
found that by turning the various case statements into bite-sized
functions and adding many, MANY comments, the code became MUCH more
readable at a minor cost in speed. As I think speed trumps
readability (though not blindly), I abandoned my work on the engines,
but I do feel that if we are going to keep the old engine, I should try
to adapt my comments to the old framework to make the current code a
bit easier to understand, since my framework is more or less the same
code as the existing engine, just rearranged.
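To illustrate the kind of refactoring described above, here is a toy sketch (it is NOT SRE's actual code or bytecode; the opcodes and function names are made up): each case of the dispatch becomes a small, documented handler function, and a dict maps each opcode to its handler. It has no backtracking, so it shows only the shape of the code, not a working regex engine.

```python
# Made-up opcodes for illustration only; not SRE's real bytecode format.
LITERAL, ANY, SUCCESS = "LITERAL", "ANY", "SUCCESS"


def op_literal(prog, pc, text, pos):
    """Match one literal character (stored after the opcode).

    Returns (new_pc, new_pos) on success or None on failure.
    """
    if pos < len(text) and text[pos] == prog[pc + 1]:
        return pc + 2, pos + 1
    return None


def op_any(prog, pc, text, pos):
    """Match any single character except a newline, like '.'."""
    if pos < len(text) and text[pos] != "\n":
        return pc + 1, pos + 1
    return None


# Each former 'case' is now a named function looked up by opcode.
HANDLERS = {LITERAL: op_literal, ANY: op_any}


def match(prog, text):
    """Run prog anchored at position 0; return the end position or None."""
    pc, pos = 0, 0
    while True:
        opcode = prog[pc]
        if opcode == SUCCESS:
            return pos
        step = HANDLERS[opcode](prog, pc, text, pos)
        if step is None:
            return None
        pc, pos = step


# The pattern "a.c", hand-compiled into the toy format.
PROG = [LITERAL, "a", ANY, LITERAL, "c", SUCCESS]
```

Here `match(PROG, "abc")` returns 3 and `match(PROG, "abd")` returns None. The extra function call per opcode is one plausible source of the small speed cost mentioned above.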
I think all of the things you've added to your engine, Matthew, can,
with varying levels of difficulty, be implemented in the existing Regexp
Engine, though I'm not suggesting that we start that effort. Simply put,
let's evaluate fairly whether your engine is worth the switchover.
Personally, I think the engine has some potential (though it is not much
better than the current one WRT readability), but we've only heard anecdotal
evidence of its superior speed. Even if the engine isn't faster,
developing speed benchmarks that fairly gauge any potential new engine
would be handy for the next person to have a great idea for a rewrite,
so perhaps while you pursue the stripped-down version of your engine,
the rest of us can work on modifying regex_tests.py, test_re.py and
re_tests.py in Lib/test specifically for the purpose of benchmarking.
If we can focus on just these two issues ('pure' engine and fair
benchmarks), I think I can devote some time to the latter, as I've dealt a
lot with benchmarking (WRT the compiler cache) and test cases, and I hope
to be a bit more active here.
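A minimal sketch of such a benchmark harness, assuming a candidate engine exposes a compile function with the same signature as `re.compile` (an assumption; the sample cases and the `bench_engine` and `same_result` helpers are hypothetical, not taken from Lib/test). To keep the comparison fair, it first checks that the candidate finds the same match span as the stock `re` module and only then times the search, with compilation kept outside the timed loop.

```python
import re
import timeit

# Hypothetical sample cases (pattern, subject); a real suite would be
# built from the cases in Lib/test rather than hand-picked like this.
CASES = [
    (r"(a+)+b", "a" * 20 + "b"),
    (r"\d{3}-\d{4}", "call 555-1234 now"),
    (r"(?i)hello\s+world", "Say HELLO   World!"),
]


def same_result(compile_func, pattern, text):
    """True if the candidate engine finds the same match span as stock re."""
    expected = re.compile(pattern).search(text)
    actual = compile_func(pattern).search(text)
    if expected is None or actual is None:
        return expected is None and actual is None
    return expected.span() == actual.span()


def bench_engine(compile_func, cases, number=1000, repeat=3):
    """Time compile_func(pattern).search(text) for each case.

    compile_func is any callable with re.compile's signature, so a
    candidate engine can be swapped in for the stock one. Compilation
    happens outside the timed loop, so only searching is measured.
    Returns a list of (pattern, best_seconds) pairs.
    """
    results = []
    for pattern, text in cases:
        if not same_result(compile_func, pattern, text):
            raise ValueError("engine disagrees with re on %r" % pattern)
        search = compile_func(pattern).search
        timer = timeit.Timer(lambda: search(text))
        # Taking the best of several repeats reduces noise from other processes.
        best = min(timer.repeat(repeat=repeat, number=number))
        results.append((pattern, best))
    return results


if __name__ == "__main__":
    for pattern, seconds in bench_engine(re.compile, CASES):
        print("%-25r %.6f s" % (pattern, seconds))
```

Passing a candidate engine's compile function in place of `re.compile` runs the same cases against it; the timings are only comparable when both runs happen on the same machine under similar load.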
Date | User | Action | Args
2009-03-10 12:00:49 | timehorse | set | recipients: + timehorse, loewis, akuchling, georg.brandl, collinwinter, jimjjewett, amaury.forgeotdarc, pitrou, nneonneo, rsc, mark, ezio.melotti, mrabarnett, jaylogan, moreati
2009-03-10 12:00:48 | timehorse | set | messageid: <1236686448.88.0.877135098691.issue2636@psf.upfronthosting.co.za>
2009-03-10 12:00:47 | timehorse | link | issue2636 messages
2009-03-10 12:00:43 | timehorse | create |