Message 287584 - Python tracker



Author	terry.reedy
Recipients	george-shuklin, steven.daprano, terry.reedy
Date	2017-02-11.01:35:34
Message-id	<1486776938.37.0.817033618231.issue29511@psf.upfronthosting.co.za>
In-reply-to

Content

Without any test code (other than my examples) to illustrate the desired new functionality, I may have misunderstood.  But I read George's prose (but not the SO link), and everything I wrote is relevant to what I thought it said.  The request appears to be for either what now exists (other than the name and failure signal) or what Guido has specifically rejected for non-strings.

Reasons for rejecting subsequence matching:
1. Except for strings, practical use cases seem to be rare.
2. Enhancement could mask bugs.
3. General sequences with nesting (tuples and lists, but not range) have an ambiguity problem that strings do not.

[1, 2, [1,2]].index([1,2]) currently returns 2, not 0, and this cannot change.  Similarly, [1,2] in [1,2,3] should not change from False to True.
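A small check of the current behavior described above may make the ambiguity concrete. The variable names are only illustrative; the calls are the ordinary list and str operations.

```python
nested = [1, 2, [1, 2]]

# list.index() compares its argument with each element as a whole,
# so it finds the nested list at position 2, not the run 1, 2 at position 0.
first_match = nested.index([1, 2])

# 'in' on a list is element membership, not subsequence containment.
contained_flat = [1, 2] in [1, 2, 3]
contained_nested = [1, 2] in nested

# For str, by contrast, 'in' already means substring containment.
substring = "12" in "123"

print(first_match, contained_flat, contained_nested, substring)
# prints: 2 False True True
```

If subsequence matching were folded into 'in' and 'index', the first two results would have to change, which is the back compatibility problem.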

Steven, without specific code examples, I do not understand what the 'this' is that you think is different from what you say was properly rejected.  The request appears to be for extending the meaning of 'in' and 'find/index' for non-strings. (See last sentence of opening post.) As you note, there are several related but different problems.

http://code.activestate.com/recipes/117214/ gives Python code for Knuth-Morris-Pratt string matching.  Python uses a C-coded version of either this or an alternative in (str/bytes/bytearray).(index/find).  Both methods stop at the first match, but they have a 'start' parameter for finding repeated matches: pass position + 1 as the next start to allow overlapping matches, or position + len(pattern) to exclude them.
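The repeated-match idiom with 'start' can be sketched as follows. The helper name find_all and its 'overlap' flag are hypothetical, introduced only for illustration; the only library call used is the existing find method of str, bytes, and bytearray.

```python
def find_all(text, pattern, overlap=True):
    """Yield every index at which pattern occurs in text.

    text and pattern may be str, bytes, or bytearray.
    find_all and overlap are illustrative names, not an existing API.
    """
    if len(pattern) == 0:
        raise ValueError("pattern must not be empty")
    # Restart at position + 1 to allow overlapping matches,
    # or at position + len(pattern) to skip past each match.
    step = 1 if overlap else len(pattern)
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + step)


print(list(find_all("aaaa", "aa")))                 # [0, 1, 2]
print(list(find_all("aaaa", "aa", overlap=False)))  # [0, 2]
print(list(find_all(b"abcabc", b"bc")))             # [1, 4]
```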

Every presentation of KMP I have seen is as a string algorithm.  In spite of the recipe title and argument name ('text'), the author claims that the Python code is generic.  Since the recipe discussion only tested strings, I tried

for i in KnuthMorrisPratt([1,2,3,4,5,1,2], [1,2]):
    print(i)

and it prints 0 and 5, as claimed.  Nice! Generic subsequence matching is easily possible.  I believe the Python code could be rewritten in C with the Python C-API and remain generic.
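For readers without the recipe at hand, here is a minimal sketch of generic Knuth-Morris-Pratt matching. It is my own illustrative version, not the recipe's code; the name kmp_find_all is hypothetical. It relies only on == between items, so it works on lists and tuples as well as strings.

```python
def kmp_find_all(sequence, pattern):
    """Yield the start index of every (possibly overlapping) occurrence
    of pattern in sequence, comparing items with == only."""
    pattern = list(pattern)
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    # failure[i] is the length of the longest proper prefix of
    # pattern[:i + 1] that is also a suffix of it.
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    # Scan the sequence once; k is the number of pattern items matched so far.
    k = 0
    for pos, item in enumerate(sequence):
        while k and item != pattern[k]:
            k = failure[k - 1]
        if item == pattern[k]:
            k += 1
        if k == m:
            yield pos - m + 1
            k = failure[k - 1]


print(list(kmp_find_all([1, 2, 3, 4, 5, 1, 2], [1, 2])))  # [0, 5]
print(list(kmp_find_all([1, 2, [1, 2]], [1, 2])))         # [0]
print(list(kmp_find_all([1, 2, [1, 2]], [[1, 2]])))       # [2]
```

Because this is a separate function rather than a new meaning for 'index', the nested-list case is unambiguous: the caller states whether [1, 2] is a run of two items or a single item by how the pattern is written.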

If this idea is not to be dropped, I think the next step should be a python-ideas post with a clear function definition and a possible API (which will elicit alternative proposals) that avoids the backward-compatibility problem, specific positive and negative test examples, and real-life use cases (which I hope might be included in the SO questions).

History
Date	User	Action	Args
2017-02-11 01:35:38	terry.reedy	set	recipients: + terry.reedy, steven.daprano, george-shuklin
2017-02-11 01:35:38	terry.reedy	set	messageid: <1486776938.37.0.817033618231.issue29511@psf.upfronthosting.co.za>
2017-02-11 01:35:38	terry.reedy	link	issue29511 messages
2017-02-11 01:35:34	terry.reedy	create