Issue 24454: Improve the usability of the match object named group API

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68642

classification

Title:	Improve the usability of the match object named group API
Type:	enhancement	Stage:	resolved
Components:	Library (Lib), Regular Expressions	Versions:	Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	eric.smith	Nosy List:	THRlWiTi, barry, eric.smith, ezio.melotti, mrabarnett, python-dev, rhettinger, serhiy.storchaka
Priority:	low	Keywords:	easy, needs review, patch

Created on 2015-06-15 03:47 by rhettinger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue-24454-0.diff	eric.smith, 2015-12-07 13:31		review
issue-24454-1.diff	eric.smith, 2016-09-10 03:30		review

Messages (19)
msg245361 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-06-15 03:47
The usability, learnability, and readability of match object code would be improved by giving it a more Pythonic API (inspired by ElementTree). Given a search like: data = 'Answer found on row 8 column 12.' mo = re.search(r'row (?P<row>\d+) column (?P<col>\d+)', data) We currently access results with: print(mo.group('col')) print(len(mo.groups()) A better way would look like this: print(mo['col']) print(len(mo)) This would work nicely with string formatting: print('Located coordinate at (%(row)s, %(col)s)' % mo) print('Located coordinate at ({row}, {col})'.format_map(mo))
msg245362 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-06-15 04:29
You can use mo.groupdict(). print('Located coordinate at (%(row)s, %(col)s)' % mo.groupdict()) print('Located coordinate at ({row}, {col})'.format_map(mo.groupdict())) As for len(mo), this is ambiguous, as well as indexing with integer indices. You suggest len(mo) be equal len(mo.groups()) and this looks reasonable to me, but in regex len(mo) equals to len(mo.groups())+1 (because mo[1] equals to mo.group(1) or mo.groups()[0]). If indexing will work only with named groups, it would be expected that len(mo) will be equal to len(mo.groupdict()).
msg245363 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2015-06-15 04:39
> print(mo['col']) > print(len(mo)) This has already been discussed in another issue. As Serhiy mentioned, len(mo) and mo[num] would be ambiguous because of the group 0, but mo[name] might be ok.
msg245375 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-06-15 13:50
I'd definitely be for mo['col']. I can't say I've ever used len(mo.groups()). I do have lots of code like: return mo.group('col'), mo.group('row'), mo.group('foo') Using groupdict there is doable but not great. But: return mo['col'], mo['row'], mo['foo'] would be a definite improvement.
msg245376 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2015-06-15 14:14
I agree that it would be nice if len(mo) == len(mo.groups()), but Serhiy has explained why that's not the case in the regex module. The regex module does support mo[name], so: print('Located coordinate at (%(row)s, %(col)s)' % mo) print('Located coordinate at ({row}, {col})'.format_map(mo)) already work.
msg245379 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-06-15 14:47
The disadvantage of supporting len() is its ambiguousness. Supporting indexing with group name also has disadvantages (benefits already was mentioned above). 1. Indexing with string keys is a part of mapping protocol, and it would be expected that other parts of the protocol if not all are supported (at least len() and iteration), but they are not. 2. If indexing with group names would be supported, it would be expected the support of integer indexes. But this is ambiguous too. This feature would improve the access to named groups (6 characters less to type for every case and better readability), but may be implementing access via attributes would be even better? mo.groupnamespace().col or mo.ns.col?
msg245381 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-06-15 15:36
> but may be implementing access via attributes would > be even better? mo.groupnamespace().col or mo.ns.col? The whole point is to eliminate the unnecessary extra level. Contrast using DOM with using ElementTree. The difference in usability and readability is huge. If you do much in the way of regex work, this will be a win: mo['name'], mo['rank'], mo['serialnumber'] There are several problems with trying to turn this into attribute access. One of the usual ones are the conflict between the user fieldnames and the actual methods and attributes of the objects (that is why named tuples have the irritating leading underscore for its own attributes and methods). The other problem is that it interferes with usability when the fieldname is stored in a variable. Contrast, "fieldname='rank'; print(mo[fieldname])" with "fieldname='rank'; print(getattr(mo, fieldname))". I'm happy to abandon the "len(mo)" suggestion, but mo[groupname] would be really nice.
msg256030 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-12-06 20:54
I've been playing around with this. My implementation is basically the naive: def __getitem__(self, value): return self.group(value) I have the following tests passing: def test_match_getitem(self): pat = re.compile('(?:(?P<a1>a)\|(?P<b2>b))(?P<c3>c)?') m = pat.match('a') self.assertEqual(m['a1'], 'a') self.assertEqual(m['b2'], None) self.assertEqual(m['c3'], None) self.assertEqual(m[0], 'a') self.assertEqual(m[1], 'a') self.assertEqual(m[2], None) self.assertEqual(m[3], None) with self.assertRaises(IndexError): m['X'] m = pat.match('ac') self.assertEqual(m['a1'], 'a') self.assertEqual(m['b2'], None) self.assertEqual(m['c3'], 'c') self.assertEqual(m[0], 'ac') self.assertEqual(m[1], 'a') self.assertEqual(m[2], None) self.assertEqual(m[3], 'c') with self.assertRaises(IndexError): m['X'] # Cannot assign. with self.assertRaises(TypeError): m[0] = 1 # No len(). self.assertRaises(TypeError, len, m) But because I'm just calling group(), you'll notice a few oddities. Namely: - indexing with integers works, as would group(i). See the discussion of omitting len(). - for defined but missing group names, you get None instead of KeyError - for invalid group names, you get IndexError instead of KeyError I can't decide if these are good (because they're consistent with group()), or bad (because they're surprising. I'm interested in your opinions. I'll attach the patch when I'm at a better computer.
msg256061 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2015-12-07 13:31
Here's the patch. I added some more tests, including tests for ''.format_map(). I think the format_map() tests convince me that keeping None for non-matched groups makes sense.
msg275557 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2016-09-10 03:30
Updated patch, with docs. I'd like to get this in to 3.6. Can anyone take a look?
msg275785 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-09-11 12:55
New changeset ac0643314d12 by Eric V. Smith in branch 'default': Issue 24454: Improve the usability of the re match object named group API https://hg.python.org/cpython/rev/ac0643314d12
msg275788 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-09-11 13:19
Please document this feature in What's News Eric. There is no need to add the __getitem__ method explicitly the the list of methods match_methods. It would be added automatically.
msg275789 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2016-09-11 13:27
I added a note in Misc/NEWS. Is that not what you mean? I'll look at match_methods.
msg275790 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-09-11 13:40
I meant Doc/whatsnew/3.6.rst.
msg275792 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-09-11 13:50
New changeset 3265247e08f0 by Eric V. Smith in branch 'default': Issue 24454: Added whatsnew entry, removed __getitem__ from match_methods. Thanks Serhiy Storchaka. https://hg.python.org/cpython/rev/3265247e08f0
msg275793 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2016-09-11 13:51
Fixed. Thanks, Serhiy.
msg275794 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-09-11 14:05
./Modules/_sre.c:2425:14: warning: ‘match_getitem_doc’ defined but not used [- Wunused-variable]
msg275797 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2016-09-11 14:18
On 9/11/2016 10:05 AM, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > ./Modules/_sre.c:2425:14: warning: ‘match_getitem_doc’ defined but not used [- > Wunused-variable] Not in Visual Studio! Standby.
msg275798 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-09-11 14:20
New changeset 9eb38e0f1cad by Eric V. Smith in branch 'default': Issue 24454: Removed unused match_getitem_doc. https://hg.python.org/cpython/rev/9eb38e0f1cad

History
Date	User	Action	Args
2022-04-11 14:58:18	admin	set	github: 68642
2016-10-06 08:43:48	THRlWiTi	set	nosy: + THRlWiTi
2016-09-11 14:20:39	python-dev	set	messages: + msg275798
2016-09-11 14:18:16	eric.smith	set	messages: + msg275797
2016-09-11 14:05:13	serhiy.storchaka	set	messages: + msg275794
2016-09-11 13:51:17	eric.smith	set	messages: + msg275793
2016-09-11 13:50:56	python-dev	set	messages: + msg275792
2016-09-11 13:40:33	serhiy.storchaka	set	messages: + msg275790
2016-09-11 13:27:10	eric.smith	set	messages: + msg275789
2016-09-11 13:19:28	serhiy.storchaka	set	messages: + msg275788
2016-09-11 12:59:03	eric.smith	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2016-09-11 12:55:49	python-dev	set	nosy: + python-dev messages: + msg275785
2016-09-10 03:30:12	eric.smith	set	files: + issue-24454-1.diff messages: + msg275557
2016-06-27 06:41:43	berker.peksag	link	issue19536 superseder
2015-12-07 13:31:21	eric.smith	set	files: + issue-24454-0.diff messages: + msg256061 assignee: eric.smith keywords: + patch, easy, needs review stage: patch review
2015-12-06 20:54:29	eric.smith	set	messages: + msg256030
2015-06-15 15:36:54	rhettinger	set	messages: + msg245381
2015-06-15 14:47:43	serhiy.storchaka	set	messages: + msg245379
2015-06-15 14:14:00	mrabarnett	set	messages: + msg245376
2015-06-15 13:50:13	eric.smith	set	nosy: + eric.smith messages: + msg245375
2015-06-15 11:20:35	barry	set	nosy: + barry
2015-06-15 04:39:52	ezio.melotti	set	messages: + msg245363
2015-06-15 04:29:28	serhiy.storchaka	set	nosy: + ezio.melotti, serhiy.storchaka, mrabarnett messages: + msg245362 components: + Regular Expressions
2015-06-15 03:47:52	rhettinger	create