Author timehorse
Recipients akuchling, amaury.forgeotdarc, jimjjewett, mark, pitrou, rsc, timehorse
Date 2008-09-16.11:59:45
Message-id <1221566431.72.0.323986074747.issue2636@psf.upfronthosting.co.za>
In-reply-to
Content
Update 16 Sep 2008:

Based on the work for issue #3825, I would like to simply update the
item list as follows:

1) Atomic Grouping / Possessive Quantifiers (See also Issue #433030)
[Complete]

2) Match group names as attributes (e.g. match.foo) [Complete save
issues outlined above]

3) Match group indexing (e.g. match['foo'], match[3])

4) Perl-style back-references (e.g. compile(r'(a)\g{-1}')), and possibly
adding the r'\k' escape sequence for keywords.

5) Parenthesis-Aware Python Comment (e.g. r'(?P#...)') [Complete]

6) Expose support for Template expressions (expressions without repeat
operators), adding test cases and documentation for existing code.

7) Larger compiled Regexp cache (256 vs. 100) and reduced thrashing
risk. [Complete]

8) Character Classes (e.g. r'[:alphanum:]')

9) Proposed Engine redesigns and cleanups (core item only contains
cleanups and comments to the current design but does not modify the design).

9-1) Single-loop Engine redesign that runs 8% slower than current.
[Complete]

9-1-1) 3-loop Engine redesign that runs 10% slower than current. [Complete]

9-2) Matthew Barnett's Engine redesign as per issue #3825

10) Have all C-Python shared constants stored in 1 place
(sre_constants.py) and generated by that into C constants
(sre_constants.h). [Complete AFAICT]

11) Scan Perl 5.10.0 for other potential additions that could be
implemented for Python.

12) Documentation suggestions by Jim J. Jewett [Complete]

13) Add grouptuples method to the Match object (i.e. match.grouptuples()
returns (<index>, <name or None>, <value>) ) suitable for iteration.

14) Unicode match group names, as per PEP 3131.

15) Add __doc__ strings and other Python niceties to the Pattern_Type,
Match_Type and Scanner_Type (experimental).

16) Implement any remaining TODOs and FIXMEs in the Regexp modules.

16-1) Allow for the disassociation of a source string from a Match_Type,
assuming this will still leave the object in a "reasonable" state.

17) Variable-length [Positive and Negative] Look-behind assertions, as
described and implemented in Issue #3825.
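
To make Item 13 a little more concrete, here is a minimal sketch of what
grouptuples might return, written as a free function on top of the
existing re API rather than as a Match method.  The function name, the
choice to omit group 0, and the list return type are assumptions made for
illustration only; the proposal itself does not settle them.

```python
import re

def grouptuples(match):
    """Return [(index, name_or_None, value), ...] for groups 1..N.

    Hypothetical helper: approximates the proposed Match.grouptuples()
    using only attributes that Match objects already expose.
    """
    # groupindex maps name -> index; invert it to look names up by index.
    names = {index: name for name, index in match.re.groupindex.items()}
    return [
        (index, names.get(index), value)
        for index, value in enumerate(match.groups(), start=1)
    ]

m = re.match(r'(?P<year>\d{4})-(\d{2})', '2008-09')
for index, name, value in grouptuples(m):
    print(index, name, value)
```

Unnamed groups report None as their name, so a caller can iterate over
every group without knowing in advance which ones are named.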

---

Now, we have a combination of Items 1, 9-2 and 17 available in issue
#3825, so for now, refer to that issue for the 01+09-02+17 combined
solution.  Eventually, I hope to merge the work between this and that issue.

I sadly admit I have made no progress on this since June because
managing some 30 lines of development, some of which have complex
diamond branching, has been unmanageable, e.g.:

01 is the child of Issue2636
09 is the child of Issue2636
10 is the child of Issue2636
09-01 is the child of 09
09-01-01 is the child of 09-01
01+09 is the child of 01 and 09
01+10 is the child of 01 and 10
09+10 is the child of 09 and 10
01+09-01 is the child of 01 and 09-01
01+09-01-01 is the child of 01 and 09-01-01
09-01+10 is the child of 09-01 and 10
09-01-01+10 is the child of 09-01-01 and 10

Which all seems rather simple until you wrap your head around:

01+09+10 is the child of 01, 09, 10, 01+09, 01+10 AND 09+10!
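
The pattern in the list above can be sketched in a few lines: a combined
branch is treated as a child of every smaller combination of its items.
This is only an illustration of the numbering scheme; the function name
parents is hypothetical, and the sketch models only the "+" combinations,
not the "-" sub-item hierarchy (e.g. 09-01 being a child of 09).

```python
from itertools import combinations

def parents(items):
    """List every branch that a combined branch must merge from.

    items holds the item numbers making up the combined branch, e.g.
    ["01", "09", "10"] for branch 01+09+10.  Each proper, non-empty
    subset of those items names one parent branch.
    """
    items = sorted(items)
    result = []
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            result.append("+".join(combo))
    return result

print(parents(["01", "09", "10"]))
# ['01', '09', '10', '01+09', '01+10', '09+10']
```

Under this model a branch combining n items has 2**n - 2 parents, which
is why the merges become unwieldy so quickly.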

Keep in mind that the reason for all this complex numbering is that many
issues cannot be implemented in a vacuum: if you want Atomic Grouping,
that's one implementation; if you want Shared Constants, that's a
different implementation; but if you want BOTH Atomic Grouping and
Shared Constants, that is a wholly different implementation, because each
implementation affects the other.  Thus, I end up with a plethora of
branches and a nightmare when it comes to merging, which is why I've been
so slow in making progress.

Bazaar seems to get very confused when it comes to a six-way merge
between, for example, 01, 09, 10, 01+09, 01+10 and 09+10, as above.  When
it sees changes that were applied in a previous merge applied again, it
does not recognize that the change in one branch since the last merge is
EXACTLY the same as the change in the other since the last merge, so that
effectively there is nothing to do.  Instead, Bazaar starts treating code
that did NOT change since the last merge as if it had changed, tries to
roll back the 01+09+10-specific changes rather than doing nothing, and
generates a conflict.  Oh, that I could only have a version control
system that understood the kind of complex branching that I require!
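
For reference, the behaviour I would expect is the textbook three-way
merge rule, sketched below for a single value (say, one line of code).
This is not Bazaar's actual algorithm, and the function name merge_value
is hypothetical; it only illustrates the "identical change on both sides
means nothing to do" case.

```python
def merge_value(base, ours, theirs):
    """Three-way merge of one value; returns (result, is_conflict)."""
    if ours == theirs:
        # Both sides agree (unchanged, or the SAME change): nothing to do.
        return ours, False
    if ours == base:
        # Only the other side changed: take their change.
        return theirs, False
    if theirs == base:
        # Only our side changed: keep our change.
        return ours, False
    # Both sides changed the value in different ways: a real conflict.
    return None, True

# The same change arriving via two parents should merge cleanly.
print(merge_value("old", "new", "new"))
```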

Anyway, that's the state of things; this is me, signing out!
History
Date User Action Args
2008-09-16 12:00:32  timehorse  set  recipients: + timehorse, akuchling, jimjjewett, amaury.forgeotdarc, pitrou, rsc, mark
2008-09-16 12:00:31  timehorse  set  messageid: <1221566431.72.0.323986074747.issue2636@psf.upfronthosting.co.za>
2008-09-16 11:59:48  timehorse  link  issue2636 messages
2008-09-16 11:59:45  timehorse  create