classification
Title: Wrong end index and subgroup for group match
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.5, Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Pavel Cisar, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal
Keywords:

Created on 2016-10-20 09:41 by Pavel Cisar, last changed 2016-10-20 10:03 by serhiy.storchaka. This issue is now closed.

Messages (2)
msg279024 - Author: Pavel Cisar (Pavel Cisar) Date: 2016-10-20 09:41
Hi,
Python's re module returns the wrong end index for the searched group, and also the wrong subgroup itself.

Example:

In: import re
In: price_string = u"1 307 000,00 Kč"
In: match = re.search(r"([,\.]00)\s?.*$", price_string)
In: print price_string, "|", match.groups(), "|", match.group(0), "|", match.start(0), "|", match.end(0), "|", match.span()
Out: 1 307 000,00 Kč | (u',00',) | ,00 Kč | 9 | 15 | (9, 15)

As I understand the documentation, the start() and end() functions should return the start and end index of the subgroup matched in the search. In the .groups() output I see that the subgroup is correct, u',00', but end() returns the end index for ',00 Kč'. Also, calling .group(0) returns the incorrect string ',00 Kč'. It seems the match returns the end index of the whole pattern, not only of the subgroup.
msg279025 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-20 10:03
group(0) returns the whole match. group(1) returns the first captured subgroup.
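To illustrate the distinction, here is a minimal sketch that reuses the pattern and string from the report. It assumes Python 3; the variable names `whole` and `sub` are introduced only for this example. Passing 1 instead of 0 to group(), start(), end() and span() selects the first captured subgroup rather than the whole match.

```python
import re

price_string = u"1 307 000,00 Kč"
match = re.search(r"([,\.]00)\s?.*$", price_string)

# Index 0 (also the default) refers to the whole match,
# which here includes the trailing "\s?.*$" part of the pattern.
whole = (match.group(0), match.start(0), match.end(0))

# Index 1 refers to the first parenthesised subgroup only.
sub = (match.group(1), match.start(1), match.end(1))

print(whole)          # (',00 Kč', 9, 15)
print(sub)            # (',00', 9, 12)
print(match.span(1))  # (9, 12)
```

With index 1 the end position is 12, which is presumably the value the reporter expected for the subgroup.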
History
Date | User | Action | Args
2016-10-20 10:03:39 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg279025; resolution: not a bug; stage: resolved
2016-10-20 09:41:56 | Pavel Cisar | set | versions: + Python 3.5
2016-10-20 09:41:15 | Pavel Cisar | create |