Message68336
Author | timehorse |
---|---|
Recipients | akuchling, amaury.forgeotdarc, jimjjewett, mark, pitrou, rsc, timehorse |
Date | 2008-06-17.17:43:20 |
SpamBayes Score | 7.873811e-06 |
Marked as misclassified | No |
Message-id | <1213724620.33.0.615366054985.issue2636@psf.upfronthosting.co.za> |
In-reply-to |
Content | |
---|---|
Well, it's time for another update on my progress...

Some good news first: Atomic Grouping is now completed, tested and documented, and as stated above, is classified as issue2636-01 and related patches. Secondly, with the caveats listed below, Named Match Group Attributes on a match object (Item 2) is also more or less complete at issue2636-02 -- it only lacks documentation.

Now, I also want to update my list of items. We left off at 11: Other Perl-specific modifications. Since that time, I have spawned a number of other branches, the first of which (issue2636-12) I am happy to announce is also complete!

12) Implement the changes to the documentation of re as per Jim J. Jewett's suggestion from 2008-04-24 14:09. Again, this has been done.

13) Implement a grouptuples(...) method as per Mark Summerfield's suggestion on 2008-05-28 09:38. grouptuples would take the same filtering parameters as the other group* functions, and would return a list of 3-tuples (unless only 1 group was requested). It should default to all match groups (1..n, not group 0, the matching string).

14) As per PEP-3131 and the move to Python 3.0, Python will begin to allow full UNICODE-compliant identifier names. Correspondingly, it would be the responsibility of this item to allow UNICODE names for match groups. This would allow retrieval of UNICODE names via the group* functions or, when combined with Item 3, the getitem handler (m[u'...']) (03+14), and the attribute name itself (e.g. getattr(m, u'...')) when combined with Item 2 (02+14).

15) Change the Pattern_Type, Match_Type and Scanner_Type (experimental) to become richer Python Types. Specifically, add __doc__ strings to each of these types' methods and members.

16) Implement various FIXMEs.
16-1) Implement the FIXME such that if m is a MatchObject, del m.string will disassociate the original matched string from the match object; string would be the only member that allows deletion, and you would not be able to modify the m.string value, only delete it.

-----

Finally, I want to make a couple of notes about Item 2:

Firstly, as noted in Item 14, I wish to add support for UNICODE match group names, and the current version of the C code would not allow that; it would only make sense to add UNICODE support if 14 is implemented, so adding support for UNICODE match object attributes would depend on both Items 2 and 14. Thus, that would be implemented in issue2636-02+14.

Secondly, there is a FIXME which I discussed in Item 16; I gave that problem its own item and branch. Also, as stated in Item 15, I would like to add more robust help code to the Match object and bind __doc__ strings to the fixed attributes. Although this would not directly affect the Item 2 implementation, it would probably involve moving some code around in its vicinity.

Finally, I would like suggestions on how to handle name collisions when match group names are provided as attributes. For instance, an expression like '(?P<pos>.*)' would match more or less any string and assign it to the name "pos". But "pos" is already an attribute of the Match object, and therefore pos cannot be exposed as a named match group attribute, since match.pos will return the usual meaning of pos for a match object, not the value of the capture group named "pos".

I have 3 proposals as to how to handle this:

a) Simply disallow the exposure of match group name attributes if the names collide with an existing member of the basic Match Object interface.

b) Expose the reserved names through a special prefix notation, and for forward compatibility, expose all names via this prefix notation.
In other words, if the prefix was 'k', match.kpos could be used to access pos; if it was '_', match._pos would be used. If Item 3 is implemented, it may be sufficient to allow access via match['pos'] as the canonical way of handling match group names using reserved words.

c) Don't expose the names directly; only expose them through a prefixed name, e.g. match._pos or match.kpos.

Personally, I like a) because if Item 3 is implemented, it makes a fairly useful shorthand for retrieving keyword names when a keyword is used for a name. Also, we could put in a deprecation warning to inform users that match group names that are keywords in the Match Object will eventually be disallowed. However, I don't support restricting the match group names any more than they already are (they must be valid Python identifiers only), so again I would go with a) and nothing more, and that's what's implemented in issue2636-02.patch.

-----

Now, rather than posting umpteen patch files, I am posting one bz2-compressed tar of ALL patch files for all threads, where each file is of the form:

issue2636(-\d\d|\+\d\d)*(-only)?\.patch

For instance, issue2636-01.patch is the p1 patch that is the difference between the current Python trunk and all that would need to be implemented to support Atomic Grouping / Possessive Qualifiers. Combined branches are joined with a PLUS ('+') and sub-branches are concatenated with a DASH ('-'). Thus, "issue2636-01+09-01-01+10.patch" is a patch which combines the work from Item 1: Atomic Grouping / Possessive Qualifiers, the sub-sub-branch of Item 9: Engine Cleanups, and Item 10: Shared Constants. Item 9 has both a child and a grandchild. The child (09-01) is my proposed engine redesign with the single loop; the grandchild (09-01-01) is the redesign with the triple loop. Finally, the optional "-only" flag means that the diff is against the core SRE modifications branch and thus does not include the core branch changes.
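The naming scheme above can be decoded mechanically. The following is only a sketch: the function name `parse_patch_name` and its return shape are assumptions made for illustration and are not part of the posted tarball or of any patch.

```python
import re

# Escaped form of the naming scheme: '+' and '.' are matched literally.
PATCH_NAME = re.compile(r"issue2636((?:[-+]\d\d)*)(-only)?\.patch")


def parse_patch_name(filename):
    """Return (branches, only) for a patch file name, or None if it does not match.

    branches lists each combined branch as a dash-separated path,
    e.g. ["01", "09-01-01", "10"]; only is True when "-only" is present.
    (Hypothetical helper, for illustration.)
    """
    m = PATCH_NAME.fullmatch(filename)
    if m is None:
        return None
    body = m.group(1)
    # Drop the leading separator, then split the combined branches on '+'.
    branches = body[1:].split("+") if body else []
    return branches, m.group(2) is not None


print(parse_patch_name("issue2636-01+09-01-01+10.patch"))
print(parse_patch_name("issue2636-01-only.patch"))
```

Splitting on '+' first keeps a sub-branch path such as 09-01-01 together as a single entry.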

As noted above, Items 01, 02, 05, 07 and 12 should be considered more or less complete and ready for merging, assuming that while implementing the other items I don't discover that I neglected something in these. The rest, including the combined items, are all provided in the given tarball. |
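The name-collision question raised for Item 2 can be illustrated in pure Python. This is a minimal sketch of proposal a) only, written as a wrapper around the standard match object; the class name `AttrMatch` is hypothetical, and the actual patch works in the C code of the re module rather than through a wrapper like this.

```python
import re


class AttrMatch:
    """Hypothetical wrapper: named groups become attributes unless they collide."""

    def __init__(self, match):
        self._match = match

    def __getattr__(self, name):
        # Called only when normal lookup fails. Existing members of the
        # match object (pos, endpos, string, ...) always take precedence.
        match = self._match
        if hasattr(match, name):
            return getattr(match, name)
        # Otherwise fall back to a named group, if the pattern defines one.
        if name in match.re.groupindex:
            return match.group(name)
        raise AttributeError(name)

    def __getitem__(self, key):
        # Stand-in for the Item 3 getitem handler: always reaches the group.
        return self._match.group(key)


m = AttrMatch(re.match(r"(?P<pos>\d+)-(?P<word>\w+)", "42-spam"))
print(m.word)    # named group, no collision
print(m.pos)     # the match object's own pos, not the group
print(m["pos"])  # the group named "pos", reached by subscript
```

Under this approach a group named "pos" is never reachable as an attribute, which is why subscript access serves as the fallback for colliding names.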
History | |||
---|---|---|---|
Date | User | Action | Args |
2008-06-17 17:43:41 | timehorse | set | spambayes_score: 7.87381e-06 -> 7.873811e-06 recipients: + timehorse, akuchling, jimjjewett, amaury.forgeotdarc, pitrou, rsc, mark |
2008-06-17 17:43:40 | timehorse | set | spambayes_score: 7.87381e-06 -> 7.87381e-06 messageid: <1213724620.33.0.615366054985.issue2636@psf.upfronthosting.co.za> |
2008-06-17 17:43:39 | timehorse | link | issue2636 messages |
2008-06-17 17:43:33 | timehorse | create |