This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Bad Regular Expression Broke re.findall()
Type:
Stage: resolved
Components: Regular Expressions
Versions: Python 3.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Callipygean, Windson Yang, ezio.melotti, mrabarnett, steven.daprano
Priority: normal
Keywords:

Created on 2018-11-02 13:43 by Callipygean, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg329130 - (view) Author: Dan Boxall (Callipygean) Date: 2018-11-02 13:43
Hi, I'm new to regular expressions and while playing around with them I tried this:

>>> rex = '*New Revision:.* ([0-9]+)'
>>> re.findall(rex, text)

and got this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python\Python37\lib\re.py", line 223, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Python\Python37\lib\re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python\Python37\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Python\Python37\lib\sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Python\Python37\lib\sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "C:\Python\Python37\lib\sre_parse.py", line 651, in _parse
    source.tell() - here + len(this))
re.error: nothing to repeat at position 0
msg329131 - (view) Author: Windson Yang (Windson Yang) * Date: 2018-11-02 13:56
The last line, "re.error: nothing to repeat at position 0", shows that a bare * cannot be the first element of a pattern, because it has nothing before it to repeat. To match a literal asterisk, use \* instead.
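
As a minimal sketch of the two workarounds mentioned in this thread (escaping the asterisk, or putting a dot in front of it), the snippet below uses made-up sample strings; the original report does not show the contents of `text`, so `text_with_star` and `text_plain` are assumptions for illustration only.

```python
import re

# Hypothetical sample inputs; the original `text` is not shown in the report.
text_with_star = "*New Revision: branch 12345"
text_plain = "New Revision: 42"

# Option 1: escape the asterisk so it matches a literal '*' character.
escaped = r'\*New Revision:.* ([0-9]+)'
escaped_result = re.findall(escaped, text_with_star)

# Option 2: give the quantifier something to repeat ('.' = any character).
dotted = r'.*New Revision:.* ([0-9]+)'
dotted_result = re.findall(dotted, text_plain)

print(escaped_result)
print(dotted_result)
```

The two patterns mean different things: `\*` requires a literal asterisk in the input, while `.*` accepts any (possibly empty) run of characters before "New Revision:".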
msg329132 - (view) Author: Dan Boxall (Callipygean) Date: 2018-11-02 13:59
Thank you.  I realised that, and when I put a dot in front it worked fine.
But an invalid pattern should not break the function, so surely they will
want to fix the bug?

Kind regards,
Dan Boxall

msg329133 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-11-02 14:21
This is not a bug in Python; it is an invalid (broken) regular expression. There is nothing that the interpreter or the regular expression engine can do, because you are telling it to do something that makes no sense. What do you expect findall to find if you ask it to find something nonsensical?

You say:

"repeat this pattern any number of times"

but there is no "this pattern" to be repeated. You are asking for something impossible. The only legitimate response is to report back that the regular expression is invalid and cannot be compiled, and fail immediately.
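
To illustrate the "fail immediately" behaviour, here is a small sketch showing that the pattern is rejected when it is compiled, before any text is searched. It relies on the `msg` and `pos` attributes of `re.error`; the variable names `bad_pattern` and `err` are chosen here for illustration.

```python
import re

bad_pattern = '*New Revision:.* ([0-9]+)'

err = None
try:
    # Compilation alone is enough to trigger the error; no input text is needed.
    re.compile(bad_pattern)
except re.error as exc:
    err = exc  # keep a reference, since `exc` is cleared after the except block

if err is not None:
    # msg is the short description, pos is the index in the pattern where parsing failed.
    print(err.msg, "at position", err.pos)
```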
msg329135 - (view) Author: Dan Boxall (Callipygean) Date: 2018-11-02 15:24
Yes, I realised that, as I said earlier.  But it could just say "Invalid
regular expression" instead of producing ten lines of error messages.
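
For anyone who wants a one-line message in their own code, one possible approach is to catch `re.error` around the call. The helper name `findall_or_report` below is hypothetical and not part of the `re` module; this is a sketch of the idea, not a change to how Python reports the error.

```python
import re

def findall_or_report(pattern, text):
    """Hypothetical helper: return matches, or print a short message and return None."""
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        # str(exc) already includes the reason and the position in the pattern.
        print(f"Invalid regular expression: {exc}")
        return None

# An invalid pattern prints a single line instead of a full traceback.
findall_or_report('*New Revision:.* ([0-9]+)', 'New Revision: 42')
```

Returning None rather than an empty list keeps "the pattern was invalid" distinguishable from "the pattern was valid but matched nothing".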

History
Date | User | Action | Args
2022-04-11 14:59:07 | admin | set | github: 79327
2018-11-02 15:24:59 | Callipygean | set | messages: + msg329135
2018-11-02 14:21:39 | steven.daprano | set | status: open -> closed; nosy: + steven.daprano; messages: + msg329133; resolution: not a bug; stage: resolved
2018-11-02 13:59:25 | Callipygean | set | messages: + msg329132
2018-11-02 13:56:20 | Windson Yang | set | nosy: + Windson Yang; messages: + msg329131
2018-11-02 13:43:06 | Callipygean | create |