Issue34304
Created on 2018-08-01 05:13 by sabakauser, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (3) | |||
---|---|---|---|
msg322842 - (view) | Author: Saba Kauser (sabakauser) | Date: 2018-08-01 05:13 | |
Hello,

I have a program that works well up to Python 3.6 but fails with Python 3.7.

    import re

    pattern="DBMS_NAME: string(%d) %s"
    sym = ['\[','\]','\(','\)']
    for chr in sym:
        pattern = re.sub(chr, '\\' + chr, pattern)
        print(pattern)

    pattern=re.sub('%s','.*?',pattern)
    print(pattern)
    pattern = re.sub('%d', '\\d+', pattern)
    print(pattern)
    result=re.match(pattern, "DBMS_NAME: string(8) \"DB2/NT64\" ")
    print(result)
    result=re.match("DBMS_NAME python4: string\(\d+\) .*?", "DBMS_NAME python4: string(8) \"DB2/NT64\" ")
    print(result)

Expected output:

    DBMS_NAME: string(%d) %s
    DBMS_NAME: string(%d) %s
    DBMS_NAME: string\(%d) %s
    DBMS_NAME: string\(%d\) %s
    DBMS_NAME: string\(%d\) .*?
    DBMS_NAME: string\(\d+\) .*?
    <re.Match object; span=(0, 21), match='DBMS_NAME: string(8) '>
    <re.Match object; span=(0, 29), match='DBMS_NAME python4: string(8) '>

However, the statement below fails with Python 3.7:

    pattern = re.sub('%d', '\\d+', pattern)

Output with Python 3.7:

    DBMS_NAME: string(%d) %s
    DBMS_NAME: string(%d) %s
    DBMS_NAME: string\(%d) %s
    DBMS_NAME: string\(%d\) %s
    DBMS_NAME: string\(%d\) .*?
    Traceback (most recent call last):
      File "c:\users\skauser\appdata\local\programs\python\python37\lib\sre_parse.py", line 1021, in parse_template
        this = chr(ESCAPES[this][1])
    KeyError: '\\d'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "pattern.txt", line 11, in <module>
        pattern = re.sub('%d', '\\d+', pattern)
      File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 192, in sub
        return _compile(pattern, flags).sub(repl, string, count)
      File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 309, in _subx
        template = _compile_repl(template, pattern)
      File "c:\users\skauser\appdata\local\programs\python\python37\lib\re.py", line 300, in _compile_repl
        return sre_parse.parse_template(repl, pattern)
      File "c:\users\skauser\appdata\local\programs\python\python37\lib\sre_parse.py", line 1024, in parse_template
        raise s.error('bad escape %s' % this, len(this))
    re.error: bad escape \d at position 0

If I change the statement to use three backslashes, as in

    pattern = re.sub('%d', '\\\d+', pattern)

I can generate the correct regular expression.

Can you please comment on whether this changed in Python 3.7, and whether we now need to escape the 'd' in '\d' as well?

Thank you! |
|||
msg322853 - (view) | Author: Karthikeyan Singaravelan (xtreak) * | Date: 2018-08-01 10:16 | |
The reported behavior is reproducible on master as well (as of ea68d83933) but not on 3.6.0. I couldn't bisect to the exact commit between 3.6.0 and 3.7.0 where this change was introduced, though. I also see some deprecation warnings when running the script:

    ➜ cpython git:(master) ./python.exe ../backups/bpo34034.py
    ../backups/bpo34034.py:4: DeprecationWarning: invalid escape sequence \[
      sym = ['\[','\]','\(','\)']
    ../backups/bpo34034.py:4: DeprecationWarning: invalid escape sequence \]
      sym = ['\[','\]','\(','\)']
    ../backups/bpo34034.py:4: DeprecationWarning: invalid escape sequence \(
      sym = ['\[','\]','\(','\)']
    ../backups/bpo34034.py:4: DeprecationWarning: invalid escape sequence \)
      sym = ['\[','\]','\(','\)']
    ../backups/bpo34034.py:15: DeprecationWarning: invalid escape sequence \(
      result=re.match("DBMS_NAME python4: string\(\d+\) .*?", "DBMS_NAME python4: string(8) \"DB2/NT64\" ")
    DBMS_NAME: string(%d) %s
    DBMS_NAME: string(%d) %s
    DBMS_NAME: string\(%d) %s
    DBMS_NAME: string\(%d\) %s
    DBMS_NAME: string\(%d\) .*?
    Traceback (most recent call last):
      File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/sre_parse.py", line 1045, in parse_template
        this = chr(ESCAPES[this][1])
    KeyError: '\\d'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "../backups/bpo34034.py", line 11, in <module>
        pattern = re.sub('%d', '\\d+', pattern)
      File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/re.py", line 192, in sub
        return _compile(pattern, flags).sub(repl, string, count)
      File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/re.py", line 309, in _subx
        template = _compile_repl(template, pattern)
      File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/re.py", line 300, in _compile_repl
        return sre_parse.parse_template(repl, pattern)
      File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/sre_parse.py", line 1048, in parse_template
        raise s.error('bad escape %s' % this, len(this))
    re.error: bad escape \d at position 0

Thanks |
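The "invalid escape sequence" warnings in the run above come from the Python parser and refer to non-raw string literals such as '\['. As a minimal sketch (an illustration only, not part of the reporter's script), the same literals can be written as raw strings, which keep the backslash without relying on an unrecognized escape:

```python
import re

# A raw literal and a doubled backslash denote the same two-character string.
assert r'\(' == '\\('

# The symbol list from the script, written with raw string literals.
sym = [r'\[', r'\]', r'\(', r'\)']

# The hard-coded pattern from the script, also as a raw string literal.
result = re.match(r"DBMS_NAME python4: string\(\d+\) .*?",
                  'DBMS_NAME python4: string(8) "DB2/NT64" ')
print(result.span())  # (0, 29)
```

This only addresses the parser warnings; the `re.error` raised by `re.sub` is a separate matter concerning the replacement string.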
|||
msg322854 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-08-01 10:19 | |
If you want to replace %d with literal \d, you need to repeat the backslash 4 times:

    pattern = re.sub('%d', '\\\\d+', pattern)

or use a raw string literal and repeat the backslash 2 times:

    pattern = re.sub('%d', r'\\d+', pattern)

Since the backslash has a special meaning in the replacement pattern, it needs to be escaped with a backslash, i.e. duplicated. But since it also has a special meaning in Python string literals, each of these backslashes needs to be escaped with a backslash in a non-raw string literal, i.e. repeated 4 times.

Python 3.6 is more lenient. It keeps a backslash if it is followed by a character that does not form a known escape sequence in a replacement string. But it emits a deprecation warning, which you can see when you run Python with the corresponding -W option.

    $ python3.6 -Wa -c 'import re; print(re.sub("%d", "\d+", "DBMS_NAME: string(%d) %s"))'
    <string>:1: DeprecationWarning: invalid escape sequence \d
    /usr/lib/python3.6/re.py:191: DeprecationWarning: bad escape \d
      return _compile(pattern, flags).sub(repl, string, count)
    DBMS_NAME: string(\d+) %s

    $ python3.6 -Wa -c 'import re; print(re.sub("%d", "\\d+", "DBMS_NAME: string(%d) %s"))'
    /usr/lib/python3.6/re.py:191: DeprecationWarning: bad escape \d
      return _compile(pattern, flags).sub(repl, string, count)
    DBMS_NAME: string(\d+) %s

    $ python3.6 -Wa -c 'import re; print(re.sub("%d", "\\\d+", "DBMS_NAME: string(%d) %s"))'
    <string>:1: DeprecationWarning: invalid escape sequence \d
    DBMS_NAME: string(\d+) %s

    $ python3.6 -Wa -c 'import re; print(re.sub("%d", "\\\\d+", "DBMS_NAME: string(%d) %s"))'
    DBMS_NAME: string(\d+) %s

Here "invalid escape sequence \d" is generated by the Python parser, and "bad escape \d" is generated by the RE engine. |
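The escaping rules described above can be checked with a short script. This is a sketch rather than an excerpt from the thread: the variable names are made up for illustration, and the function-replacement and `str.replace` variants are alternatives not mentioned in the message, included because neither applies replacement-template escape processing.

```python
import re

text = "DBMS_NAME: string(%d) %s"

# Raw literal: both backslashes reach re.sub, which reads the pair as one
# literal backslash in the replacement template.
with_raw = re.sub('%d', r'\\d+', text)

# Non-raw literal: four backslashes in the source become two in the string,
# so re.sub sees the same replacement as above.
with_plain = re.sub('%d', '\\\\d+', text)

# Alternative (assumption: not from the thread): a function replacement.
# Its return value is inserted as-is, without template escape processing.
with_func = re.sub('%d', lambda m: r'\d+', text)

# Alternative (assumption: not from the thread): plain str.replace, which
# involves no regular-expression or template processing at all.
with_str = text.replace('%d', r'\d+')

print(with_raw)  # DBMS_NAME: string(\d+) %s

# The form from the report: the replacement string is \d+, which the
# RE engine rejects on Python 3.7 and later.
try:
    re.sub('%d', '\\d+', text)
except re.error as exc:
    print("rejected:", exc)
```

All four accepted variants produce the same string, so the choice between them is mostly a matter of readability.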
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:04 | admin | set | github: 78485 |
2018-08-01 10:19:42 | serhiy.storchaka | set | status: open -> closed nosy: + serhiy.storchaka messages: + msg322854 resolution: not a bug stage: resolved |
2018-08-01 10:16:15 | xtreak | set | messages: + msg322853 |
2018-08-01 08:32:09 | xtreak | set | nosy: + xtreak |
2018-08-01 05:13:39 | sabakauser | create |