Message46359
Logged In: YES
user_id=308203
I agree completely that splitting on zero-width matches should
be supported - and that the default behavior should change
at some point - but I don't think this patch quite covers
it. Taking an example from the python-dev thread of
August 2004
(http://mail.python.org/pipermail/python-dev/2004-August/047272.html):
>>> re.split('x*', 'abxxxcdefxxx', emptyok=True)
['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
To me, this means there's an empty string beginning and
ending at position 0, followed by a zero-width divider that also
begins and ends at that position, followed by an
'a', and so on. That seems awkward. I think a more intuitive
result would be the following (I'm omitting the emptyok argument
in these examples):
>>> re.split('x*', 'abxxxcdefxxx')
['a', 'b', 'c', 'd', 'e', 'f', '']
That is, empty matches cause a split when they are not
adjacent to a non-empty match and not at the beginning or
the end of the string. Grouping parentheses would, of
course, reveal the empty-string boundaries:
>>> re.split('(x*)', 'abxxxcdefxxx')
['', 'a', '', 'b', 'xxx', '', 'c', '', 'd', '', 'e', '',
'f', 'xxx', '']
Using the same approach, these results would also seem
perfectly reasonable to me:
>>> re.split('(?m)$', 'foo\nbar\nbaz')
['foo', '\nbar', '\nbaz']
>>> re.split('(?m)^', 'foo\nbar\nbaz')
['foo\n', 'bar\n', 'baz']
Splitting a one-character string should be possible only if
the pattern matches that character:
>>> re.split(r'\w*', 'a')
['', '']
>>> re.split(r'\d*', 'a')
['a']
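
The rule described above (an empty match splits only when it is not at the start or end of the string and does not directly follow a non-empty match) can be sketched in a few lines of Python. This is only an illustration, not the patch under discussion: `proposed_split` is a hypothetical helper name, it ignores capturing groups and `maxsplit`, and it assumes `re.finditer` is an acceptable way to enumerate the matches.

```python
import re

def proposed_split(pattern, string):
    """Hypothetical helper: split on matches of pattern, ignoring empty
    matches at either end of the string or right after a non-empty match."""
    pieces = []
    last = 0                  # index where the next piece starts
    prev_nonempty_end = None  # end of the most recent non-empty match
    for m in re.finditer(pattern, string):
        start, end = m.span()
        if start == end:
            # Empty match: skip it at the string boundaries or when it
            # directly follows a non-empty match.
            if start in (0, len(string)) or start == prev_nonempty_end:
                continue
        else:
            prev_nonempty_end = end
        pieces.append(string[last:start])
        last = end
    pieces.append(string[last:])
    return pieces

print(proposed_split('x*', 'abxxxcdefxxx'))      # ['a', 'b', 'c', 'd', 'e', 'f', '']
print(proposed_split('(?m)$', 'foo\nbar\nbaz'))  # ['foo', '\nbar', '\nbaz']
print(proposed_split(r'\d*', 'a'))               # ['a']
```

The sketch reproduces the non-grouping examples above. It does not attempt the grouping case, and it only filters empty matches that come after a non-empty one, which is the only kind of adjacency these greedy patterns produce.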
Date | User | Action | Args
2007-08-23 15:38:36 | admin | link | issue988761 messages
2007-08-23 15:38:36 | admin | create |