msg245361 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2015-06-15 03:47 |
The usability, learnability, and readability of match object code would be improved by giving it a more Pythonic API (inspired by ElementTree).
Given a search like:
data = 'Answer found on row 8 column 12.'
mo = re.search(r'row (?P<row>\d+) column (?P<col>\d+)', data)
We currently access results with:
print(mo.group('col'))
print(len(mo.groups()))
A better way would look like this:
print(mo['col'])
print(len(mo))
This would work nicely with string formatting:
print('Located coordinate at (%(row)s, %(col)s)' % mo)
print('Located coordinate at ({row}, {col})'.format_map(mo))
|
msg245362 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-06-15 04:29 |
You can use mo.groupdict().
print('Located coordinate at (%(row)s, %(col)s)' % mo.groupdict())
print('Located coordinate at ({row}, {col})'.format_map(mo.groupdict()))
As for len(mo), it is ambiguous, as is indexing with integer indices. You suggest that len(mo) equal len(mo.groups()), which looks reasonable to me, but in the regex module len(mo) equals len(mo.groups())+1 (because mo[1] equals mo.group(1), i.e. mo.groups()[0]). If indexing worked only with named groups, len(mo) would be expected to equal len(mo.groupdict()).
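To make the ambiguity concrete, here is a minimal sketch of the groupdict() workaround together with the three candidate "lengths" discussed above. The pattern is a variation on the original example with one unnamed group added (an assumption made purely for illustration) so that the three counts differ.

```python
import re

# Illustrative pattern: one unnamed group plus the two named groups
# from the original example (the unnamed group is added for this sketch).
data = 'Answer found on row 8 column 12.'
mo = re.search(r'(\w+) found on row (?P<row>\d+) column (?P<col>\d+)', data)

# The groupdict() workaround for string formatting.
named = mo.groupdict()
message = 'Located coordinate at ({row}, {col})'.format_map(named)
percent_message = 'Located coordinate at (%(row)s, %(col)s)' % named

# Three plausible meanings for a hypothetical len(mo):
n_groups = len(mo.groups())        # capturing groups only
n_with_whole = n_groups + 1        # also counting group 0, the whole match
n_named = len(mo.groupdict())      # named groups only

print(message)
print(n_groups, n_with_whole, n_named)
```

Each of the three counts has a reasonable claim to being "the" length, which is the core of the objection.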
|
msg245363 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2015-06-15 04:39 |
> print(mo['col'])
> print(len(mo))
This has already been discussed in another issue. As Serhiy mentioned, len(mo) and mo[num] would be ambiguous because of group 0, but mo[name] might be ok.
|
msg245375 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2015-06-15 13:50 |
I'd definitely be for mo['col']. I can't say I've ever used len(mo.groups()).
I do have lots of code like:
return mo.group('col'), mo.group('row'), mo.group('foo')
Using groupdict there is doable but not great. But:
return mo['col'], mo['row'], mo['foo']
would be a definite improvement.
|
msg245376 - (view) |
Author: Matthew Barnett (mrabarnett) * |
Date: 2015-06-15 14:14 |
I agree that it would be nice if len(mo) == len(mo.groups()), but Serhiy has explained why that's not the case in the regex module.
The regex module does support mo[name], so:
print('Located coordinate at (%(row)s, %(col)s)' % mo)
print('Located coordinate at ({row}, {col})'.format_map(mo))
already work.
|
msg245379 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-06-15 14:47 |
The disadvantage of supporting len() is its ambiguity. Supporting indexing by group name also has disadvantages (the benefits were already mentioned above).
1. Indexing with string keys is part of the mapping protocol, so it would be expected that other parts of the protocol, if not all of it, are supported too (at least len() and iteration), but they are not.
2. If indexing by group name were supported, support for integer indexes would be expected as well. But this is ambiguous too.
This feature would improve access to named groups (6 fewer characters to type in every case, and better readability), but maybe implementing access via attributes would be even better? mo.groupnamespace().col or mo.ns.col?
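A rough sketch of what the attribute-access idea might look like if built on top of groupdict(). The helper name group_namespace is hypothetical and is not part of the re module; it only illustrates the suggestion.

```python
import re
from types import SimpleNamespace


def group_namespace(mo):
    """Hypothetical helper (not in the re module): expose the named
    groups of a match object as attributes of a simple namespace."""
    return SimpleNamespace(**mo.groupdict())


data = 'Answer found on row 8 column 12.'
mo = re.search(r'row (?P<row>\d+) column (?P<col>\d+)', data)

ns = group_namespace(mo)
coords = (ns.row, ns.col)
print(coords)
```

Because the namespace is a separate object, group names cannot collide with match object methods, but the extra level of indirection remains.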
|
msg245381 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2015-06-15 15:36 |
> but may be implementing access via attributes would
> be even better? mo.groupnamespace().col or mo.ns.col?
The whole point is to eliminate the unnecessary extra level.
Contrast using DOM with using ElementTree. The difference
in usability and readability is huge. If you do much in the
way of regex work, this will be a win:
mo['name'], mo['rank'], mo['serialnumber']
There are several problems with trying to turn this into attribute access. One of the usual ones is the conflict between the user's fieldnames and the actual methods and attributes of the object (that is why named tuples have an irritating leading underscore on their own attributes and methods). The other problem is that it interferes with usability when the fieldname is stored in a variable. Contrast "fieldname='rank'; print(mo[fieldname])" with "fieldname='rank'; print(getattr(mo, fieldname))".
I'm happy to abandon the "len(mo)" suggestion, but mo[groupname] would be really nice.
|
msg256030 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2015-12-06 20:54 |
I've been playing around with this. My implementation is basically the naive:
def __getitem__(self, value):
    return self.group(value)
I have the following tests passing:
def test_match_getitem(self):
    pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
    m = pat.match('a')
    self.assertEqual(m['a1'], 'a')
    self.assertEqual(m['b2'], None)
    self.assertEqual(m['c3'], None)
    self.assertEqual(m[0], 'a')
    self.assertEqual(m[1], 'a')
    self.assertEqual(m[2], None)
    self.assertEqual(m[3], None)
    with self.assertRaises(IndexError):
        m['X']
    m = pat.match('ac')
    self.assertEqual(m['a1'], 'a')
    self.assertEqual(m['b2'], None)
    self.assertEqual(m['c3'], 'c')
    self.assertEqual(m[0], 'ac')
    self.assertEqual(m[1], 'a')
    self.assertEqual(m[2], None)
    self.assertEqual(m[3], 'c')
    with self.assertRaises(IndexError):
        m['X']
    # Cannot assign.
    with self.assertRaises(TypeError):
        m[0] = 1
    # No len().
    self.assertRaises(TypeError, len, m)
But because I'm just calling group(), you'll notice a few oddities. Namely:
- indexing with integers works, as would group(i). See the discussion of omitting len().
- for defined but missing group names, you get None instead of KeyError
- for invalid group names, you get IndexError instead of KeyError
I can't decide if these are good (because they're consistent with group()), or bad (because they're surprising). I'm interested in your opinions.
I'll attach the patch when I'm at a better computer.
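The oddities listed above can be reproduced with a short standalone script. This is a sketch that assumes Python 3.6 or later, where match objects support indexing; the variable names are only for illustration.

```python
import re

pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
m = pat.match('a')

# A group that is defined in the pattern but did not take part in the
# match yields None rather than raising KeyError.
unmatched = m['b2']

# A name that is not a group in the pattern raises IndexError,
# mirroring m.group('X'), not KeyError as a dict would.
try:
    m['X']
except IndexError as exc:
    bad_name_error = type(exc).__name__

# Integer indexes behave like m.group(i), with 0 being the whole match.
whole, first = m[0], m[1]

# len() is deliberately unsupported.
try:
    len(m)
except TypeError:
    has_len = False
else:
    has_len = True

print(unmatched, bad_name_error, whole, first, has_len)
```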
|
msg256061 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2015-12-07 13:31 |
Here's the patch.
I added some more tests, including tests for ''.format_map(). I think the format_map() tests convince me that keeping None for non-matched groups makes sense.
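As an illustration of the format_map() point, here is a small sketch (again assuming Python 3.6 or later) that passes the match object directly as the mapping. The format string is made up for this example and is not taken from the patch.

```python
import re

pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
m = pat.match('a')

# format_map() looks each field name up with m[name], so groups that
# did not participate in the match are rendered as the string 'None'.
text = 'a1={a1} b2={b2} c3={c3}'.format_map(m)
print(text)
```

If non-matched groups raised an exception instead, a format string that mentions an optional group would fail whenever that group was absent.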
|
msg275557 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2016-09-10 03:30 |
Updated patch, with docs. I'd like to get this into 3.6. Can anyone take a look?
|
msg275785 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-09-11 12:55 |
New changeset ac0643314d12 by Eric V. Smith in branch 'default':
Issue 24454: Improve the usability of the re match object named group API
https://hg.python.org/cpython/rev/ac0643314d12
|
msg275788 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-09-11 13:19 |
Please document this feature in What's New, Eric.
There is no need to add the __getitem__ method explicitly to the match_methods list of methods. It would be added automatically.
|
msg275789 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2016-09-11 13:27 |
I added a note in Misc/NEWS. Is that not what you mean?
I'll look at match_methods.
|
msg275790 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-09-11 13:40 |
I meant Doc/whatsnew/3.6.rst.
|
msg275792 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-09-11 13:50 |
New changeset 3265247e08f0 by Eric V. Smith in branch 'default':
Issue 24454: Added whatsnew entry, removed __getitem__ from match_methods. Thanks Serhiy Storchaka.
https://hg.python.org/cpython/rev/3265247e08f0
|
msg275793 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2016-09-11 13:51 |
Fixed. Thanks, Serhiy.
|
msg275794 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-09-11 14:05 |
./Modules/_sre.c:2425:14: warning: ‘match_getitem_doc’ defined but not used [-Wunused-variable]
|
msg275797 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2016-09-11 14:18 |
On 9/11/2016 10:05 AM, Serhiy Storchaka wrote:
>
> Serhiy Storchaka added the comment:
>
> ./Modules/_sre.c:2425:14: warning: ‘match_getitem_doc’ defined but not used [-Wunused-variable]
Not in Visual Studio!
Stand by.
|
msg275798 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-09-11 14:20 |
New changeset 9eb38e0f1cad by Eric V. Smith in branch 'default':
Issue 24454: Removed unused match_getitem_doc.
https://hg.python.org/cpython/rev/9eb38e0f1cad
|
Date | User | Action | Args
2022-04-11 14:58:18 | admin | set | github: 68642
2016-10-06 08:43:48 | THRlWiTi | set | nosy: + THRlWiTi
2016-09-11 14:20:39 | python-dev | set | messages: + msg275798
2016-09-11 14:18:16 | eric.smith | set | messages: + msg275797
2016-09-11 14:05:13 | serhiy.storchaka | set | messages: + msg275794
2016-09-11 13:51:17 | eric.smith | set | messages: + msg275793
2016-09-11 13:50:56 | python-dev | set | messages: + msg275792
2016-09-11 13:40:33 | serhiy.storchaka | set | messages: + msg275790
2016-09-11 13:27:10 | eric.smith | set | messages: + msg275789
2016-09-11 13:19:28 | serhiy.storchaka | set | messages: + msg275788
2016-09-11 12:59:03 | eric.smith | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2016-09-11 12:55:49 | python-dev | set | nosy: + python-dev; messages: + msg275785
2016-09-10 03:30:12 | eric.smith | set | files: + issue-24454-1.diff; messages: + msg275557
2016-06-27 06:41:43 | berker.peksag | link | issue19536 superseder
2015-12-07 13:31:21 | eric.smith | set | files: + issue-24454-0.diff; messages: + msg256061; assignee: eric.smith; keywords: + patch, easy, needs review; stage: patch review
2015-12-06 20:54:29 | eric.smith | set | messages: + msg256030
2015-06-15 15:36:54 | rhettinger | set | messages: + msg245381
2015-06-15 14:47:43 | serhiy.storchaka | set | messages: + msg245379
2015-06-15 14:14:00 | mrabarnett | set | messages: + msg245376
2015-06-15 13:50:13 | eric.smith | set | nosy: + eric.smith; messages: + msg245375
2015-06-15 11:20:35 | barry | set | nosy: + barry
2015-06-15 04:39:52 | ezio.melotti | set | messages: + msg245363
2015-06-15 04:29:28 | serhiy.storchaka | set | nosy: + ezio.melotti, serhiy.storchaka, mrabarnett; messages: + msg245362; components: + Regular Expressions
2015-06-15 03:47:52 | rhettinger | create |