Message117612
SequenceMatcher caches the result of get_matching_blocks and get_opcodes. There are some problems with this:
What get_matching_blocks caches is a list of plain tuples. The first call does not return that list: it returns map(Match._make, self.matching_blocks), which converts the tuples to namedtuples. Subsequent calls return self.matching_blocks directly, so they yield plain tuples instead. This is especially odd in Python 3, where the first call returns a map object (a one-shot iterator) while later calls return a list.
This caching behavior is not documented, so calling code may mutate the returned list. One example of calling code is difflib itself: get_grouped_opcodes mutates the result of get_opcodes (a cached list). I am not sure if the right fix is to have get_grouped_opcodes copy before it mutates or to have get_opcodes return a copy.
Snippet demonstrating both bugs:
import difflib

matcher = difflib.SequenceMatcher(a='aaaaaaaabc', b='aaaaaaaadc')
print(list(matcher.get_matching_blocks()))
# This should print the same thing, but it does not:
print(list(matcher.get_matching_blocks()))
print(matcher.get_opcodes())
print(list(matcher.get_grouped_opcodes()))
# This should print the same thing as the previous get_opcodes()
# list, but it does not:
print(matcher.get_opcodes())
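For illustration, the second option mentioned above (returning a copy instead of the cached list) could be sketched as a subclass rather than a patch to difflib. CopyingSequenceMatcher is a hypothetical name, not part of difflib, and this is only a rough sketch of the idea, assuming Python 3; it is not a proposed patch.

```python
import difflib


class CopyingSequenceMatcher(difflib.SequenceMatcher):
    """Hypothetical wrapper: never hands the cached lists to callers."""

    def get_matching_blocks(self):
        # Build a fresh list of Match namedtuples on every call, whether
        # the base class returned a map object or its cached list.
        return [difflib.Match._make(block)
                for block in super().get_matching_blocks()]

    def get_opcodes(self):
        # Shallow copy: get_grouped_opcodes() calls self.get_opcodes(),
        # so any in-place edits it makes land on this copy, not the cache.
        return list(super().get_opcodes())


matcher = CopyingSequenceMatcher(a='aaaaaaaabc', b='aaaaaaaadc')
blocks_first = matcher.get_matching_blocks()
blocks_second = matcher.get_matching_blocks()
opcodes_before = matcher.get_opcodes()
grouped = list(matcher.get_grouped_opcodes())
opcodes_after = matcher.get_opcodes()

print(blocks_first == blocks_second)
print(opcodes_before == opcodes_after)
```

The cost is one extra list copy per call, which is probably negligible next to the matching itself, but I have not measured it.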
Date | User | Action | Args
2010-09-29 14:32:19 | marienz | set | recipients: + marienz
2010-09-29 14:32:19 | marienz | set | messageid: <1285770739.15.0.805040100323.issue9985@psf.upfronthosting.co.za>
2010-09-29 14:32:17 | marienz | link | issue9985 messages
2010-09-29 14:32:17 | marienz | create |