Title: Is this a regular expression library bug?
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.5
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Bruce Eckel, ebarry, tim.peters
Priority: normal
Keywords:

Created on 2016-07-21 21:03 by Bruce Eckel, last changed 2016-07-22 20:22 by Bruce Eckel. This issue is now closed.

Messages (7)
msg270956 - Author: Bruce Eckel (Bruce Eckel) Date: 2016-07-21 21:03
This looks suspicious to me, like it could be a library bug, but before chasing it down I was hoping someone might be able to tell me whether I might be on to something:

Traceback (most recent call last):
  File "", line 22, in <module>
    new_javatext = find_output.sub(new_output, javatext)
  File "C:\Python35\lib\", line 325, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python35\lib\", line 312, in _compile_repl
    p = sre_parse.parse_template(repl, pattern)
  File "C:\Python35\lib\", line 872, in parse_template
    raise s.error("missing <")
sre_constants.error: missing < at position 100 (line 4, column 41)
msg270957 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-07-21 21:07
If you don't show us the regular expression, it's going to be darned hard to guess what it is ;-)
msg270958 - Author: Bruce Eckel (Bruce Eckel) Date: 2016-07-21 21:09
Sorry, I thought maybe the error message would be indicative of something. Here's the re:

find_output = re.compile(r"/\* (Output:.*)\*/", re.DOTALL)

Here's the program:

#! py -3
# Requires Python 3.5
# Updates generated output into extracted Java programs in "On Java 8"
from pathlib import Path
import re
import pprint
import sys

if __name__ == '__main__':
    find_output = re.compile(r"/\* (Output:.*)\*/", re.DOTALL)
    for outfile in Path(".").rglob("*.p1"):
        javafile = outfile.with_suffix(".java")
        if not javafile.exists():
            print(str(outfile) + " has no javafile")
            continue  # nothing to update without a matching .java file
        javatext = javafile.read_text()
        if "/* Output:" not in javatext:
            print(str(javafile) + " has no /* Output:")
            continue  # no output block to replace
        new_output = outfile.read_text()
        new_javatext = find_output.sub(new_output, javatext)
msg270960 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-07-21 21:30
Well, some backslash escapes are processed in the "replacement" argument to `.sub()`.  If your replacement text contains a substring of the form

    \g

not immediately followed by

    <

that will raise the exception you're seeing.  The parser is expecting to see a "matched group" reference after "\g", like

    \g<name>

So it depends on the value of your `new_output`.
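
A minimal sketch of the failure described above. The contents of `javatext` and `new_output` are hypothetical stand-ins, since the real files were not posted in this thread; the only thing that matters is that the replacement text contains a backslash followed by "g" with no "<" after it.

```python
import re

# Same pattern as in the program above.
find_output = re.compile(r"/\* (Output:.*)\*/", re.DOTALL)

# Hypothetical stand-ins for the Java source and the generated output.
javatext = "class A {}\n/* Output:\nold\n*/\n"
new_output = "/* Output:\n" + r"path\gen" + "\n*/"  # contains a literal \g

try:
    # The replacement string is parsed as a template, so \g must start
    # a group reference such as \g<name>.
    find_output.sub(new_output, javatext)
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)
```

The pattern itself is fine here; the exception comes from parsing the replacement argument.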
msg270961 - Author: Bruce Eckel (Bruce Eckel) Date: 2016-07-21 21:51
Urk. There was exactly a \g in the input. Sorry for the bother.
msg270967 - Author: Emanuel Barry (ebarry) * (Python triager) Date: 2016-07-22 02:36
For future reference, if your input can have arbitrary escapes, it might be a good idea to pass it through re.escape; it does proper escaping so that stuff like e.g. \g in your input will get treated as a literal backslash, followed by a literal 'g', and not an escape sequence.
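
A sketch of two ways to make the replacement text fully literal, again using hypothetical file contents. One caveat: `re.escape` is aimed at pattern syntax, so when its result is used as a replacement string the backslashes it adds in front of punctuation may show up in the output. Passing a callable, or doubling only the backslashes, avoids template parsing surprises without changing the text.

```python
import re

find_output = re.compile(r"/\* (Output:.*)\*/", re.DOTALL)

# Hypothetical stand-ins for the Java source and the generated output.
javatext = "class A {}\n/* Output:\nold\n*/\n"
new_output = "/* Output:\n" + r"path\gen" + "\n*/"  # contains a literal \g

# Option 1: a callable replacement. Its return value is inserted as-is,
# with no backslash processing.
via_callable = find_output.sub(lambda match: new_output, javatext)

# Option 2: double every backslash so the template parser turns each
# pair back into a single literal backslash.
via_doubling = find_output.sub(new_output.replace("\\", "\\\\"), javatext)

print(via_callable == via_doubling)
```

Both calls should leave the `\g` in the inserted text untouched.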
msg271028 - Author: Bruce Eckel (Bruce Eckel) Date: 2016-07-22 20:22
Thank you ebarry, very helpful. Tim, sorry I missed you at Pycon.
Date                 User         Action  Args
2016-07-22 20:22:33  Bruce Eckel  set     messages: + msg271028
2016-07-22 02:36:52  ebarry       set     type: compile error -> behavior; messages: + msg270967; nosy: + ebarry
2016-07-21 21:56:32  tim.peters   set     status: open -> closed; stage: resolved
2016-07-21 21:51:43  Bruce Eckel  set     resolution: not a bug; messages: + msg270961
2016-07-21 21:30:17  tim.peters   set     messages: + msg270960
2016-07-21 21:09:43  Bruce Eckel  set     messages: + msg270958
2016-07-21 21:07:32  tim.peters   set     nosy: + tim.peters; messages: + msg270957
2016-07-21 21:03:42  Bruce Eckel  create