msg69134 |
Author: Matthew Barnett (mrabarnett) * |
Date: 2008-07-02 22:07 |
re.split doesn't split a string when the regex matches zero characters (a zero-width match).
For example:
re.split(r'\b', 'a b') returns ['a b'] instead of ['', 'a', ' ', 'b', ''].
re.split(r'(?<!\w)(?=\w)', 'a b') returns ['a b'] instead of ['', 'a ', 'b'].
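For comparison, Python 3.7 changed re.split() so that it does split on empty matches, and the two calls in this report now return the lists given above as the expected results. A quick check, assuming a Python 3.7 or later interpreter:

```python
import re

# On Python 3.7+ re.split() splits at zero-width matches.
# Earlier versions did not split on these patterns.
print(re.split(r'\b', 'a b'))             # ['', 'a', ' ', 'b', '']
print(re.split(r'(?<!\w)(?=\w)', 'a b'))  # ['', 'a ', 'b']
```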
|
msg69139 |
Author: Matthew Barnett (mrabarnett) * |
Date: 2008-07-02 22:51 |
The attached patch appears to work.
|
msg69146 |
Author: Guido van Rossum (gvanrossum) * |
Date: 2008-07-02 23:28 |
Probably by design. There's probably even a unittest for this behavior.
|
msg69150 |
Author: Matthew Barnett (mrabarnett) * |
Date: 2008-07-02 23:57 |
I've found that this issue has been discussed before: #988761.
|
msg69157 |
Author: Matthew Barnett (mrabarnett) * |
Date: 2008-07-03 00:59 |
New patch version after studying #988761 and doing more testing.
|
msg69408 |
Author: Mike Coleman (mkc) |
Date: 2008-07-08 02:36 |
I don't want to discourage you, but #852532, which is essentially the
same bug report, was closed--without explanation--as 'wont fix' in
April, after four-plus years. I wish you good luck--this is an
important and irritating bug, in my opinion...
|
msg69438 |
Author: Matthew Barnett (mrabarnett) * |
Date: 2008-07-08 16:39 |
There appear to be 2 opinions on this issue:
1. It's a bug, a corner case that got missed.
2. It's always been like this, so it's probably a design decision,
although no-one can point to where or when the decision was made...
Looking at the code, I think it's a bug.
Expected behaviour: if 'pattern' is a non-capturing regex, then
re.split(pattern, text) == re.sub(pattern, MARKER, text).split(MARKER).
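The proposed invariant can be checked directly. A minimal sketch: split_via_sub is a hypothetical helper (not part of the re module), and the NUL character is assumed to be a safe MARKER because it never occurs in the test inputs. On Python 3.7+ both sides agree for these non-capturing patterns, including the zero-width ones.

```python
import re

def split_via_sub(pattern, text, marker="\x00"):
    """Hypothetical helper: replace every match with a marker, then split on it."""
    # The marker must not already occur in the text, or the split is wrong.
    if marker in text:
        raise ValueError("marker occurs in text")
    return re.sub(pattern, marker, text).split(marker)

# Non-capturing patterns only: re.split() also returns captured groups,
# which the sub-then-split route cannot reproduce.
cases = [(r'\b', 'a b'), (r'(?<!\w)(?=\w)', 'a b'), (r',\s*', 'a, b,c')]
for pattern, text in cases:
    assert re.split(pattern, text) == split_via_sub(pattern, text), pattern
```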
|
msg69852 |
Author: Mike Coleman (mkc) |
Date: 2008-07-16 22:40 |
I think it's probably both. The original design was incorrect, though
this probably wasn't apparent to the designer. But as a significant
user of 're', it really stands out as a problem.
|
msg70749 |
Author: Guido van Rossum (gvanrossum) * |
Date: 2008-08-05 16:08 |
I think it's better to leave this alone. Such a subtle change is likely
to trip over more people in worse ways than the alleged "bug".
|
msg70752 |
Author: Mike Coleman (mkc) |
Date: 2008-08-05 16:18 |
Okay. For what it's worth, note that my original 2004 patch for this
(#988761) is completely backward compatible (a flag must be set in the
call to get the new behavior).
|
msg73523 |
Author: Matthew Barnett (mrabarnett) * |
Date: 2008-09-21 19:41 |
I wonder whether it could be put into Python 3 where certain breaks in
backwards compatibility are to be expected.
|
msg73567 |
Author: Jeffrey C. Jacobs (timehorse) |
Date: 2008-09-22 11:54 |
I think Mike Coleman's proposal of enabling this behaviour via a flag is
probably best, and IMHO we should consider it under these circumstances.
Intuitively, I think your interpretation of what re.split should do
under zero-width conditions is logical, and I almost think this should
be a transition spanning two minor version numbers, à la
from __future__ import zeroWidthRegexpSplit,
if we are to consider it the long-term 'right thing to do'. 3000 (3.0)
also seems a good place to consider it for a true overhaul /
re-examination, especially as we are writing 'upgrade' scripts for many
of the other Python features. However, I would say this: Guido has
spoken, and it may be too late for the pebbles to vote.
I would like to add this patch as a new item to the general Regexp
Enhancements thread of issue 2636, though, as I think it is an idea worth
considering when overhauling Regexp.
|
msg73592 |
Author: Guido van Rossum (gvanrossum) * |
Date: 2008-09-22 20:39 |
The problem with doing this per 3.0 is that it's impossible to write a
conversion script.
I'm okay with adding a flag to enable this behavior though. Please open
a new bug with a new patch, preferably one that applies cleanly to the
trunk, and a separate patch for the py3k branch unless the trunk patch
merges cleanly. There should also be unittests and documentation. The
patches should be marked for Python 2.7 and 3.1 -- it's way too late to
get this into 2.6 and 3.0.
|
msg104226 |
Author: Tim Pietzcker (pietzcker) |
Date: 2010-04-26 12:29 |
Sorry to revive this dormant (?) topic - has anybody brought this any further? This "feature" has tripped me up a few times, and I would be all for adding a flag to enable the "split on zero-size matches" behavior, but I myself am not competent enough to code a patch.
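Absent such a flag, the "split on zero-size matches" behaviour can be emulated with re.finditer, which does report empty matches. A sketch only: split_all is a hypothetical name, and unlike re.split it ignores capturing groups and has no maxsplit argument.

```python
import re

def split_all(pattern, text, flags=0):
    """Hypothetical helper: split text at every match, including empty ones."""
    pieces = []
    last = 0  # end of the previous match
    for m in re.finditer(pattern, text, flags):
        # Keep the text between the previous match and this one.
        pieces.append(text[last:m.start()])
        last = m.end()
    pieces.append(text[last:])  # trailing remainder after the final match
    return pieces

print(split_all(r'\b', 'a b'))  # ['', 'a', ' ', 'b', '']
```

Because it only slices the text between consecutive matches, the helper treats empty and non-empty matches alike, which is exactly the behaviour the flag would have switched on.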
|
msg104257 |
Author: Matthew Barnett (mrabarnett) * |
Date: 2010-04-26 17:31 |
You could try the regex module mentioned in issue 2636.
|
|
Date | User | Action | Args
2022-04-11 14:56:36 | admin | set | github: 47512
2021-11-04 14:19:04 | eryksun | set | nosy: - ahmedsayeed1982
2021-11-04 14:18:56 | eryksun | set | messages: - msg405692
2021-11-04 12:09:24 | ahmedsayeed1982 | set | versions: - Python 2.6, Python 2.5, Python 3.1; nosy: + ahmedsayeed1982, - gvanrossum, mkc, timehorse, filip, pietzcker, mrabarnett; messages: + msg405692; components: + Tests, - Regular Expressions
2017-12-02 17:32:37 | serhiy.storchaka | set | pull_requests: + pull_request4589
2017-11-19 23:36:58 | serhiy.storchaka | set | pull_requests: + pull_request4406
2010-08-04 05:05:56 | terry.reedy | set | status: open -> closed
2010-04-26 17:31:46 | mrabarnett | set | messages: + msg104257
2010-04-26 12:29:45 | pietzcker | set | nosy: + pietzcker; messages: + msg104226; versions: + Python 2.6, Python 3.1, Python 2.7
2008-09-22 20:40:00 | gvanrossum | set | messages: + msg73592
2008-09-22 11:54:30 | timehorse | set | messages: + msg73567
2008-09-21 19:41:19 | mrabarnett | set | messages: + msg73523
2008-09-21 11:58:49 | timehorse | set | nosy: + timehorse
2008-08-05 16:18:46 | mkc | set | messages: + msg70752
2008-08-05 16:08:32 | gvanrossum | set | resolution: rejected; messages: + msg70749
2008-07-16 22:40:59 | mkc | set | messages: + msg69852
2008-07-08 16:39:18 | mrabarnett | set | messages: + msg69438
2008-07-08 02:36:23 | mkc | set | messages: + msg69408
2008-07-08 02:20:49 | mkc | set | nosy: + mkc
2008-07-07 11:40:01 | filip | set | nosy: + filip
2008-07-03 00:59:38 | mrabarnett | set | files: - split_zero_width.diff
2008-07-03 00:59:01 | mrabarnett | set | files: + split_zero_width.diff; messages: + msg69157
2008-07-02 23:57:16 | mrabarnett | set | messages: + msg69150
2008-07-02 23:28:53 | gvanrossum | set | nosy: + gvanrossum; messages: + msg69146
2008-07-02 22:51:51 | mrabarnett | set | files: + split_zero_width.diff; keywords: + patch; messages: + msg69139
2008-07-02 22:07:48 | mrabarnett | create |