
classification
Title: Regular expression split fails on 3.6 and not 2.7 or 3.7+
Type: crash Stage: resolved
Components: Regular Expressions Versions: Python 3.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, mrabarnett, probinso, serhiy.storchaka
Priority: normal Keywords:

Created on 2021-02-14 08:17 by probinso, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg386942 - (view) Author: Philip (probinso) Date: 2021-02-14 08:17
I am seeing unexpected behavior when using a regular expression to split a string. The error appears in `python 3.6` but not in `python 2.7` or `python 3.7+`. Below is a minimal example using `tox`.

`setup.py`
```
from setuptools import setup
setup(
    name='my-tox-tested-package',
    version='0.0.1',
    install_requires=["pytest"]
)
```

`tests/test_re.py`
```
import re
import pytest

_DIGIT_BOUNDARY_RE = re.compile(
    r'(?<=\D)(?=\d)|(?<=\d)(?=\D)'
)

def test():
    _DIGIT_BOUNDARY_RE.split("10.0.0")
```

`tox.ini`
```
[tox]
envlist = py27, py36, py37
requires=
  pytest

[testenv]
commands =
    pytest {posargs: tests}
```
```
============================================= FAILURES ================================
_____________________________________________ test ____________________________________

    def test():
>       _DIGIT_BOUNDARY_RE.split("10.0.0")
E       ValueError: split() requires a non-empty pattern match.

tests/test_god.py:9: ValueError
============================================ short test summary info ==================
...

============================================ test session starts ======================
platform linux -- Python 3.7.5, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/probinson/code
collected 1 item

tests/test_re.py .                                                                [100%]

============================================ 1 passed in 0.00s =========================
____________________________________________ summary ___________________________________
  py27: commands succeeded
ERROR:   py36: commands failed
  py37: commands succeeded

```
msg386943 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-02-14 09:19
There was a bug in the regular expression engine that caused re.split() to work incorrectly with zero-width patterns. Note that in your example, _DIGIT_BOUNDARY_RE.split("10.0.0") returns ['10.0.0'] on Python 2.7, a result you probably did not expect.

It was impossible to fix that bug without changing the behavior of other functions in corner cases and breaking existing code. So we first made re.split() raise an exception instead of returning a nonsensical result, and added warnings for some other cases to help users catch potential bugs in their code and avoid ambiguous patterns. This is what you see in 3.6. In 3.7 we fixed the underlying bug. That broke some user code, but it made regular expressions more consistent in the long run and made zero-width patterns more usable.
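
As a rough illustration of the version differences described here, the sketch below wraps the reporter's pattern in a helper that returns None when the interpreter rejects the split. The helper name `try_zero_width_split` is hypothetical and only used for this example; the pattern itself is taken from the report.

```python
import re

# Pattern from the report: matches the empty position between a digit
# and a non-digit (in either order). It never consumes a character.
_DIGIT_BOUNDARY_RE = re.compile(r'(?<=\D)(?=\d)|(?<=\d)(?=\D)')


def try_zero_width_split(text):
    """Hypothetical helper: return the split result, or None if this
    interpreter refuses to split on a pattern that only matches empty strings.
    """
    try:
        return _DIGIT_BOUNDARY_RE.split(text)
    except ValueError:
        # Per this issue, Python 3.6 raises
        # "split() requires a non-empty pattern match."
        return None


# On Python 3.7+ this splits at every digit/non-digit boundary.
# On Python 2.7 (per this issue) the string comes back unsplit.
print(try_zero_width_split("10.0.0"))
```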

In your particular case, if you still need to support Python 2.7 and 3.6, try using re.split() with the pattern r'(\D+)' or r'(\d+)' (the parentheses are significant here: they keep the matched separators in the result). It gives almost the same result, except for possible leading and trailing empty strings.
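
A minimal sketch of that workaround, assuming the goal is the list of alternating digit and non-digit runs. The function name `split_digit_runs` is made up for this example, and dropping the empty strings is an extra step not spelled out in the message above.

```python
import re

# Capturing group: the digit runs used as separators are kept in the result.
_DIGIT_RUN_RE = re.compile(r'(\d+)')


def split_digit_runs(text):
    """Split text into alternating digit and non-digit runs.

    re.split() may emit an empty string at the start or end when the
    text begins or ends with digits, so those are filtered out here.
    """
    return [part for part in _DIGIT_RUN_RE.split(text) if part]


print(split_digit_runs("10.0.0"))  # ['10', '.', '0', '.', '0']
```

Because the pattern always consumes at least one character, it does not rely on zero-width matching and should behave the same way on the versions discussed in this issue.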
History
Date User Action Args
2022-04-11 14:59:41 admin set github: 87388
2021-02-15 18:10:58 probinso set status: open -> closed; resolution: wont fix; stage: resolved
2021-02-14 09:19:37 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg386943
2021-02-14 08:17:10 probinso create