Message 83427 - Python tracker


Author	timehorse
Recipients	akuchling, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, jaylogan, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, rsc, timehorse
Date	2009-03-10.12:00:43
Message-id	<1236686448.88.0.877135098691.issue2636@psf.upfronthosting.co.za>
In-reply-to

Content

Okay, as I said, Atomic Grouping, etc., off a recent 2.6 is already 
available, and I can do any cleanups requested for the items already 
mentioned; I just don't want to start any new items at the moment.  As 
it is, we are still over a year from any of this seeing the light of day, 
as it's not going to be merged until we start 2.7 / 3.1 alpha.

Fortunately, I think Matthew here DOES have a lot of potential to have 
everything wrapped up by then, but I think, to summarize everyone's 
concern, we would really like to be able to examine each change 
incrementally, rather than as a whole.  So, for the purposes of this, I 
would recommend that you, Matthew, make a version of your new engine 
WITHOUT any Atomic Groups, variable-length lookbehind / lookahead 
assertions, reverse string scanning, positional, negated or scoped 
inline flags, group key indexing or any other feature described in the 
various issues.  We can then evaluate, purely on the merits of the 
engine itself, whether it is worth moving to that engine and, having made 
that decision, officially move all work to that design if warranted.  
Personally, I'd like to see that 'pure' engine for myself and maybe we 
can all develop an appropriate benchmark suite to test it fairly against 
the existing engine.  I also think we should consider things like 
presentation (are all lines terminated by column 80), number of 
comments, and general readability.  IMHO, the current code is conformant 
in the line length, but VERY deficient WRT comments and readability, the 
latter of which it sacrifices for speed (as well as being retrofitted for 
iteration rather than recursion).  I'm no fan of switch-case, but I 
found that by turning the various case statements into bite-sized 
functions and adding many, MANY comments, the code became MUCH more 
readable at the minor cost of speed.  As I think speed trumps 
readability (though not blindly), I abandoned my work on the engines, 
but do feel that if we are going to keep the old engine, I should try 
and adapt my comments to the old framework to make the current code a 
bit easier to understand since the framework is more or less the same 
code as in the existing engine, just re-arranged.

I think all of the things you've added to your engine, Matthew, can, 
with varying levels of difficulty, be implemented in the existing Regexp 
Engine, though I'm not suggesting that we start that effort.  Simply, 
let's evaluate fairly whether your engine is worth the switch over.  
Personally, I think the engine has some potential -- though not much 
better than current WRT readability -- but we've only heard anecdotal 
evidence of its superior speed.  Even if the engine isn't faster, 
developing speed benchmarks that fairly gauge any potential new engine 
would be handy for the next person to have a great idea for a rewrite, 
so perhaps while you pursue the stripped-down version of your engine, 
the rest of us can work on modifying regex_tests.py, test_re.py and 
re_tests.py in Lib/test specifically for the purpose of benchmarking.
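
To make the benchmarking idea concrete, here is a minimal sketch of what 
a fair timing harness might look like.  It assumes the candidate engine 
is importable as a module exposing the same compile() / search() 
interface as re; the function names and the sample patterns are 
hypothetical and are not taken from the Lib/test files named above.

```python
import re
import timeit

# Hypothetical benchmark cases: (pattern, subject string).
CASES = [
    (r"(a+)+b", "a" * 12 + "b"),
    (r"\b\w+@\w+\.\w+\b", "contact: someone@example.com " * 50),
    (r"(?i)python", "x" * 1000 + "PyThOn"),
]


def bench_engine(engine, cases=CASES, number=200, repeat=3):
    """Return {pattern: best seconds per search} for one engine module."""
    results = {}
    for pattern, subject in cases:
        # Compile outside the timed region so only matching is measured.
        compiled = engine.compile(pattern)
        timer = timeit.Timer(lambda: compiled.search(subject))
        # Take the minimum of several runs to reduce scheduling noise.
        best = min(timer.repeat(repeat=repeat, number=number))
        results[pattern] = best / number
    return results


def compare(baseline, candidate, cases=CASES, **kwargs):
    """Return {pattern: candidate time / baseline time}.

    A ratio below 1.0 means the candidate was faster on that case.
    """
    base = bench_engine(baseline, cases, **kwargs)
    cand = bench_engine(candidate, cases, **kwargs)
    return {pattern: cand[pattern] / base[pattern] for pattern, _ in cases}
```

With a candidate engine imported as, say, new_engine (a placeholder 
name), compare(re, new_engine) would give a per-pattern speed ratio.  
Timing only the search step is a design choice; compile time could be 
measured separately in the same way.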

If we can focus on just these two issues ('pure' engine and fair 
benchmarks), I think I can devote some time to the latter as I've dealt a 
lot with benchmarking (WRT the compiler-cache) and test cases and hope 
to be a bit more active here.

History
Date	User	Action	Args
2009-03-10 12:00:49	timehorse	set	recipients: + timehorse, loewis, akuchling, georg.brandl, collinwinter, jimjjewett, amaury.forgeotdarc, pitrou, nneonneo, rsc, mark, ezio.melotti, mrabarnett, jaylogan, moreati
2009-03-10 12:00:48	timehorse	set	messageid: <1236686448.88.0.877135098691.issue2636@psf.upfronthosting.co.za>
2009-03-10 12:00:47	timehorse	link	issue2636 messages
2009-03-10 12:00:43	timehorse	create