Message 29269 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	sjmachin
Recipients
Date	2006-07-24.23:59:52
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to

Content
A short example script is attached. Two strings, each 500 bytes long. Longest match is the first 32 bytes of each string. Result produced is a 10-byte match later in the string. If one byte is chopped off the end of each string (len now 499 each), the correct result is produced. Other observations, none of which may be relevant: 1. Problem may be in the heuristic for "popular" elements in the __chain_b method. In this particular example, the character '0' (which has frequency of 6 in the "b" string) is not "popular" with a len of 500 but is "popular" with a len of 499. 2. '0' is the last byte of the correct longest match. 3. The correct longest match is at the start of the each of the strings. 4. Disabling the "popular" heuristic (like below) appears to make the problem go away: if 0: # if n >= 200 and len(indices) * 100 > n: populardict[elt] = 1 del indices[:] else: indices.append(i) 5. The callable self.isbpopular is created but appears to be unused. 6. The determination of "popular" doesn't accord with the comments, which say 1%. However with len==500, takes 6 to be popular.

A short example script is attached.
Two strings, each 500 bytes long. Longest match is the
first 32 bytes of each string. Result produced is a
10-byte match later in the string.

If one byte is chopped off the end of each string (len
now 499 each), the correct result is produced.

Other observations, none of which may be relevant:
1. Problem may be in the heuristic for "popular"
elements in the __chain_b method. In this particular
example, the character '0' (which has frequency of 6 in
the "b" string) is not "popular" with a len of 500 but
is "popular" with a len of 499.
2. '0' is the last byte of the correct longest match.
3. The correct longest match is at the start of the
each of the strings.
4. Disabling the "popular" heuristic (like below)
appears to make the problem go away:

                if 0:
                # if n >= 200 and len(indices) * 100 > n:
                    populardict[elt] = 1
                    del indices[:]
                else:
                    indices.append(i)
5. The callable self.isbpopular is created but appears
to be unused.
6. The determination of "popular" doesn't accord with
the comments, which say 1%. However with len==500,
takes 6 to be popular.

History
Date	User	Action	Args
2007-08-23 14:41:33	admin	link	issue1528074 messages
2007-08-23 14:41:33	admin	create