classification
Title: re.findall does not always return a list of strings
Type: behavior
Stage:
Components: Regular Expressions
Versions: Python 2.5
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Phillip.M.Feldman@gmail.com, ash, mrabarnett
Priority: normal
Keywords:

Created on 2009-08-07 06:00 by Phillip.M.Feldman@gmail.com, last changed 2009-08-09 11:40 by pitrou. This issue is now closed.

Messages (4)
msg91393 - Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2009-08-07 06:00
As per the Python documentation, the following regular expression should
produce a list containing the strings '6.7', '7.33', and '9':

re.findall('(-?\d+[.]\d+)|(-?\d+[.]?)|(-?[.]\d+)', 'asdf6.7jjjj7.33ff9')

Instead, it generates a list of tuples.  Either the documentation should
be changed to make it consistent with what re.findall is actually doing,
or, better yet, re.findall should be fixed.
msg91401 - Author: Alexey Shamrin (ash) Date: 2009-08-07 12:23
You've made three groups with parentheses. Just drop them:

>>> re.findall('-?\d+[.]\d+|-?\d+[.]?|-?[.]\d+', 'asdf6.7jjjj7.33ff9')
['6.7', '7.33', '9']

Everything is according to documentation: "If one or more groups are
present in the pattern, return a list of groups; this will be a list of
tuples if the pattern has more than one group."

http://docs.python.org/library/re.html#re.findall

I suggest closing this bug.
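
For illustration, here is a minimal sketch of the three cases the quoted documentation describes, using the input string from the report. The one-group pattern and the variable names are assumptions made up for this example, not part of the original report:

```python
import re

text = 'asdf6.7jjjj7.33ff9'

# No groups: findall returns a list of whole-match strings.
no_groups = re.findall(r'-?\d+[.]\d+|-?\d+[.]?|-?[.]\d+', text)

# Exactly one group (hypothetical pattern): findall returns a list
# containing only what that group captured, not the whole match.
one_group = re.findall(r'(\d+)[.]\d+', text)

# More than one group: findall returns a list of tuples, one slot per
# group; a group that took no part in a match shows up as ''.
three_groups = re.findall(r'(-?\d+[.]\d+)|(-?\d+[.]?)|(-?[.]\d+)', text)

print(no_groups)     # ['6.7', '7.33', '9']
print(one_group)     # ['6', '7']
print(three_groups)  # [('6.7', '', ''), ('7.33', '', ''), ('', '9', '')]
```

The last result is what the original report observed: each alternative sits in its own capturing group, so every match is reported as a three-element tuple.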
msg91411 - Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2009-08-07 19:46
You are right: the documentation does say this, although it took me a
while to understand what it means.  Thanks!

It seems as though there's a flaw in the design here, because there 
should be some mechanism for grouping elements of a regular expression 
without having findall treat these as groups for purposes of packaging 
the output.  If someone really wants to get lists of tuples out of 
findall, then it might make sense to input a tuple of strings instead of 
a single string.

Phillip
msg91415 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2009-08-07 23:54
In a regular expression, (...) will group and capture, whereas (?:...)
will only group and not capture.
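
As a hedged sketch of how this applies to the pattern from the original report: the version below keeps the parentheses but makes them non-capturing, so findall has no groups to package. The second variant, which keeps the capturing groups and reads the whole match through re.finditer, is an alternative not mentioned in this thread; the variable names are illustrative.

```python
import re

text = 'asdf6.7jjjj7.33ff9'

# (?:...) groups each alternative without capturing, so findall
# falls back to returning the whole match as a plain string.
non_capturing = re.findall(
    r'(?:-?\d+[.]\d+)|(?:-?\d+[.]?)|(?:-?[.]\d+)', text)

# Alternative: keep the capturing groups and ask each match object
# for group(0), which is always the entire matched text.
whole_matches = [
    m.group(0)
    for m in re.finditer(r'(-?\d+[.]\d+)|(-?\d+[.]?)|(-?[.]\d+)', text)
]

print(non_capturing)  # ['6.7', '7.33', '9']
print(whole_matches)  # ['6.7', '7.33', '9']
```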
History
Date                 User                         Action  Args
2009-08-09 11:40:01  pitrou                       set     status: open -> closed; resolution: not a bug
2009-08-07 23:54:46  mrabarnett                   set     nosy: + mrabarnett; messages: + msg91415
2009-08-07 19:46:18  Phillip.M.Feldman@gmail.com  set     messages: + msg91411
2009-08-07 12:23:34  ash                          set     nosy: + ash; messages: + msg91401
2009-08-07 06:00:12  Phillip.M.Feldman@gmail.com  create