Message 46359 - Python tracker

Author	filip
Date	2006-01-16.21:56:08

Content
I agree completely that splitting on zero-width matches should
be supported, and that the default behavior should change
at some point, but I don't think this patch quite covers
it. Take an example from the python-dev thread of
August 2004
(http://mail.python.org/pipermail/python-dev/2004-August/047272.html):

>>> re.split('x*', 'abxxxcdefxxx', emptyok=True)
['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
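
One way to see where each element of that list comes from is to print the match spans and cut the string at every one of them, empty or not. This is only a sketch: `split_on_every_match` is a made-up helper, not the patch's code, and it assumes Python 3.7 or later, where `re.finditer` also reports an empty match directly after a non-empty one.

```python
import re

def split_on_every_match(pattern, string):
    """Cut `string` at every match re.finditer reports, empty ones included.

    Hypothetical helper for illustration; it is not the emptyok patch.
    """
    pieces, last = [], 0
    for m in re.finditer(pattern, string):
        pieces.append(string[last:m.start()])  # text before this divider
        last = m.end()
    pieces.append(string[last:])               # text after the last divider
    return pieces

# Each span is (start, end); start == end marks a zero-width divider.
spans = [m.span() for m in re.finditer('x*', 'abxxxcdefxxx')]
print(spans)
print(split_on_every_match('x*', 'abxxxcdefxxx'))
```

The first span is (0, 0), so the first piece is the empty text in front of a zero-width divider at position 0.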

To me, this means there's an empty string beginning and
ending at position 0, followed by a zero-width divider that
also begins and ends at position 0, followed by an 'a', etc.
That seems awkward. I think a more intuitive
result would be (I'm omitting the emptyok argument in the
following examples):

>>> re.split('x*', 'abxxxcdefxxx')
['a', 'b', 'c', 'd', 'e', 'f', '']

That is, empty matches cause a split when they are not
adjacent to a non-empty match and not at the beginning or
the end of the string. Grouping parentheses would, of
course, reveal the empty-string boundaries:

>>> re.split('(x*)', 'abxxxcdefxxx')
['', 'a', '', 'b', 'xxx', '', 'c', '', 'd', '', 'e', '',
'f', 'xxx', '']

Using the same approach, these results would also seem
perfectly reasonable to me:

>>> re.split('(?m)$', 'foo\nbar\nbaz')
['foo', '\nbar', '\nbaz']
>>> re.split('(?m)^', 'foo\nbar\nbaz')
['foo\n', 'bar\n', 'baz']

Splitting a one-character string should be possible only if
the pattern matches that character:

>>> re.split(r'\w*', 'a')
['', '']
>>> re.split(r'\d*', 'a')
['a']
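
The rule stated earlier (an empty match splits only when it is not adjacent to a non-empty match and not at the beginning or end of the string) can be sketched in a few lines. This is an illustration under assumptions, not the patch and not what `re.split` does: `split_proposed` is a hypothetical name, and the sketch ignores capture groups and `maxsplit`.

```python
import re

def split_proposed(pattern, string, flags=0):
    """Sketch of the proposed rule; hypothetical, no capture-group support."""
    matches = list(re.finditer(pattern, string, flags))

    # Positions that touch either edge of a non-empty match.
    touching = set()
    for m in matches:
        if m.start() != m.end():
            touching.update((m.start(), m.end()))

    pieces, last = [], 0
    for m in matches:
        if m.start() == m.end():
            pos = m.start()
            # Skip empty matches at either end of the string or
            # adjacent to a non-empty match.
            if pos == 0 or pos == len(string) or pos in touching:
                continue
        pieces.append(string[last:m.start()])
        last = m.end()
    pieces.append(string[last:])
    return pieces

print(split_proposed('x*', 'abxxxcdefxxx'))
print(split_proposed('(?m)$', 'foo\nbar\nbaz'))
print(split_proposed(r'\w*', 'a'))
```

Because empty matches are filtered explicitly, the sketch should not depend on whether `re.finditer` reports an empty match right after a non-empty one.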

History
Date	User	Action	Args
2007-08-23 15:38:36	admin	link	issue988761 messages
2007-08-23 15:38:36	admin	create