classification
Title: Regex not evaluated correctly
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, hongweipeng, mrabarnett, ngwood111, ram, serhiy.storchaka, xtreak
Priority: normal
Keywords:

Created on 2018-08-02 05:52 by ram, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name: win12r2py3.7.png
Uploaded: hongweipeng, 2018-09-11 06:04
Messages (5)
msg322916 - (view) Author: Raman (ram) Date: 2018-08-02 05:52
Sample code below

import re

regex = r'DELETE\s*(?P<table_alias>[a-zA-z_0-9]*)\s*FROM\s*(?P<table_name>[a-zA-z_0-9]+)\s*([a-zA-Z0-9_]*)\s*(?P<where_statement>WHERE){0,1}(\s.)*?'

test_str = 'DELETE FROM my_table1 t_ WHERE id in (1,2,3)'

matches = re.finditer(regex, test_str, re.MULTILINE)

print([m.groupdict() for m in matches])

Below is the expected output.

[{'table_alias': '', 'table_name': 'my_table1', 'where_statement': 'WHERE'}]

But on Windows Server 2012 R2, the output is:
[{'table_alias': '', 'table_name': 'mytable1', 'where_statement': None}]

With Python 3.7 on Windows Server 2012 R2, the output is also not as expected. On Windows 10 and on Linux variants, the expected output is obtained.
msg324993 - (view) Author: hongweipeng (hongweipeng) * Date: 2018-09-11 06:04
In my test on Windows Server 2012 R2, it works correctly. I have attached a screenshot.
msg335840 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-18 14:59
The regular expression engine is not platform-dependent.

Please check that you are running the same code on both systems.
msg336366 - (view) Author: noah (ngwood111) * Date: 2019-02-23 05:40
I was able to recreate the 'bad' output on Linux using 'bad' input.

The issue occurs when WHERE is misspelled in the input. The regex looks for the exact word "WHERE", so any lowercase (where), mixed-case (WHeRe), or misspelled (WERE) variant leaves where_statement as None, because the regex did not find a matching substring.

On a whim I also tested a number of encodings before realizing that the pattern doesn't run on bytes objects anyway, so the only way to get this output is to misspell the input. I think this issue should be closed, as it is not a bug in the Python core.
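
A minimal sketch of the behaviour described here, reusing the pattern from the original report. The lowercase input string is an assumption made for illustration; it is not known to be the reporter's actual input.

```python
import re

# Pattern copied unchanged from the original report.
regex = r'DELETE\s*(?P<table_alias>[a-zA-z_0-9]*)\s*FROM\s*(?P<table_name>[a-zA-z_0-9]+)\s*([a-zA-Z0-9_]*)\s*(?P<where_statement>WHERE){0,1}(\s.)*?'

# Input from the original report.
good = 'DELETE FROM my_table1 t_ WHERE id in (1,2,3)'
# Hypothetical variant of the input with a lowercase keyword.
bad = 'DELETE FROM my_table1 t_ where id in (1,2,3)'

good_groups = re.search(regex, good).groupdict()
bad_groups = re.search(regex, bad).groupdict()

# The optional WHERE group only participates when the exact
# uppercase word is present; otherwise it is reported as None.
print(good_groups['where_statement'])  # WHERE
print(bad_groups['where_statement'])   # None
```

The overall match still succeeds for the lowercase input because the WHERE group is optional, which is why the other named groups are filled in while where_statement is None.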
msg336367 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-23 05:50
For case-insensitive matching you can use the re.IGNORECASE flag.
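
As a sketch of this suggestion, the pattern from the original report can be run with re.IGNORECASE. The lowercase input string is a hypothetical example, not the reporter's confirmed input.

```python
import re

# Pattern copied unchanged from the original report.
regex = r'DELETE\s*(?P<table_alias>[a-zA-z_0-9]*)\s*FROM\s*(?P<table_name>[a-zA-z_0-9]+)\s*([a-zA-Z0-9_]*)\s*(?P<where_statement>WHERE){0,1}(\s.)*?'

# Hypothetical input with a lowercase keyword.
test_str = 'DELETE FROM my_table1 t_ where id in (1,2,3)'

# re.IGNORECASE lets the literal WHERE in the pattern match "where".
match = re.search(regex, test_str, re.IGNORECASE)
groups = match.groupdict()

print(groups)
# {'table_alias': '', 'table_name': 'my_table1', 'where_statement': 'where'}
```

The captured text keeps the case used in the input, so code that compares where_statement against 'WHERE' would need to normalize it first, for example with str.upper().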
History
Date | User | Action | Args
2022-04-11 14:59:04 | admin | set | github: 78496
2019-02-23 05:50:05 | serhiy.storchaka | set | status: open -> closed; resolution: not a bug; messages: + msg336367; stage: resolved
2019-02-23 05:40:36 | ngwood111 | set | status: pending -> open; nosy: + ngwood111; messages: + msg336366
2019-02-18 14:59:42 | serhiy.storchaka | set | status: open -> pending; messages: + msg335840
2018-09-11 06:04:23 | hongweipeng | set | files: + win12r2py3.7.png; nosy: + hongweipeng; messages: + msg324993
2018-08-02 06:42:34 | xtreak | set | nosy: + xtreak
2018-08-02 05:52:59 | ram | create |