classification
Title: re.groups() is not checking the arguments
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, mrabarnett, narendrac, serhiy.storchaka
Priority: normal Keywords:

Created on 2017-11-07 13:14 by narendrac, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg305751 - (view) Author: Narendra (narendrac) Date: 2017-11-07 13:14
Hi Team,

I have observed a bug in the behavior of the re.groups() function in Python, as shown below:

Issue:
re.groups() is not validating the arguments

Example:
>>> m = re.match(r'(\w+)@(\w+)\.(\w+)','username@hackerrank.com')
>>> m.groups()
('username', 'hackerrank', 'com')
>>> m.groups(1)
('username', 'hackerrank', 'com')
>>> m.groups(100000000000)
('username', 'hackerrank', 'com')
>>>

From the above, it is clear that re.groups() and re.groups(<somevalue>) return the same result. I think re.groups() is not validating its arguments.

Please review this and let me know whether my understanding is correct.
msg305753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-07 13:19
Everything works as designed. In your example you don't see a difference because all groups participate in the match. Look at another example:

>>> import re
>>> m = re.match(r'(?:(\w+)@)?(\w+)\.(\w+)', 'hackerrank.com')
>>> m.groups()
(None, 'hackerrank', 'com')
>>> m.groups(1)
(1, 'hackerrank', 'com')
>>> m.groups(100000000000)
(100000000000, 'hackerrank', 'com')
msg305754 - (view) Author: Narendra (narendrac) Date: 2017-11-07 13:21
Please look into the following example:

>>> m.groups()
('narendra', 'happiestmidns', 'com')
>>> m.groups(1)
('narendra', 'happiestmidns', 'com')
>>> m.groups(34)
('narendra', 'happiestmidns', 'com')
>>> m.groups(10000000000000000000000000000000000000000000000000000000)
('narendra', 'happiestmidns', 'com')
>>>
msg305755 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-07 13:23
There is nothing wrong with this output if you use the pattern from the first example.
msg305809 - (view) Author: Narendra (narendrac) Date: 2017-11-08 06:32
Hi Storchaka,

As per the documentation for re.groups(), it should work as below:

groups([default])
Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None. (Incompatibility note: in the original Python 1.5 release, if the tuple was one element long, a string would be returned instead. In later versions (from 1.5.1 on), a singleton tuple is returned in such cases.)

For example:

>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')
If we make the decimal place and everything after it optional, not all groups might participate in the match. These groups will default to None unless the default argument is given:

>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups()      # Second group defaults to None.
('24', None)
>>> m.groups('0')   # Now, the second group defaults to '0'.
('24', '0')

I tested a scenario as below:
Scenario: Suppose I have a match like
m = re.match(r"(\d+)\.(\d+)", "24.1632")
Here, if I pass m.groups(10000), it should check whether the pattern contains an optional group (one specified using ?). If it does, it should use 10000 for that group; if not, it should raise an error saying that there is no optional group (no part of the pattern uses ?).

Expected Output:
>>> m.groups(10000)
There is no any optional argument to use 10000

Received Output:
>>> m.groups(10000)
('24', '1632')

Please review the above and provide your comments.
msg305810 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-08 08:46
No, it should not raise an error when there is no optional group.

It is not easy to check whether a pattern contains an optional group. For example, the pattern '(a)|(b)' contains groups that may not participate in the match, while the group in the pattern '()?' always matches.
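
The two patterns can be tried directly. The following is only a small sketch of that point; the variable names and the 'n/a' default are chosen here for illustration and are not part of the original report:

```python
import re

# '(a)|(b)': neither group is marked with '?', yet only one of them
# can participate in any given match.
alt = re.match(r'(a)|(b)', 'b')
alt_plain = alt.groups()          # group 1 did not participate
alt_default = alt.groups('n/a')   # the default fills in for group 1
print(alt_plain)
print(alt_default)

# '()?': the group is marked with '?', yet it always matches the
# empty string, so the default is never used.
opt = re.match(r'()?', 'anything')
opt_default = opt.groups('n/a')
print(opt_default)
```

So the presence of ? in the pattern says little about whether the default will actually be used.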
msg305884 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2017-11-08 17:39
@Narendra: The argument, if provided, is merely a default. Checking whether it _could_ be used would not be straightforward, and raising an exception if it would never be used would have little, if any, benefit.

It's not a bug, and it's not worth changing.
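
If a caller does want to know whether the default was used for one particular match, that can be checked after the fact. A minimal sketch, assuming a sentinel object as the default; MISSING and unmatched_groups are hypothetical names and not part of the re module:

```python
import re

# Hypothetical sentinel: a unique object that no real group value can equal.
MISSING = object()

def unmatched_groups(match):
    """Return the 1-based indices of groups that did not participate."""
    return [index
            for index, value in enumerate(match.groups(MISSING), start=1)
            if value is MISSING]

pattern = r'(?:(\w+)@)?(\w+)\.(\w+)'
without_user = unmatched_groups(re.match(pattern, 'hackerrank.com'))
with_user = unmatched_groups(re.match(pattern, 'username@hackerrank.com'))
print(without_user)
print(with_user)
```

This inspects a single match result only. It does not say whether the pattern could ever leave a group unmatched, which is the check described above as not straightforward.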
History
Date User Action Args
2022-04-11 14:58:54 admin set github: 76150
2017-11-08 17:54:02 serhiy.storchaka set resolution: not a bug
2017-11-08 17:39:30 mrabarnett set status: open -> closed; messages: + msg305884
2017-11-08 08:46:10 serhiy.storchaka set messages: + msg305810
2017-11-08 06:32:41 narendrac set status: closed -> open; resolution: not a bug -> (no value); messages: + msg305809
2017-11-07 13:24:28 serhiy.storchaka set status: open -> closed
2017-11-07 13:23:25 serhiy.storchaka set messages: + msg305755
2017-11-07 13:21:42 narendrac set status: closed -> open; messages: + msg305754
2017-11-07 13:19:20 serhiy.storchaka set status: open -> closed; nosy: + serhiy.storchaka; messages: + msg305753; resolution: not a bug; stage: resolved
2017-11-07 13:14:30 narendrac create