Message 111372 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	terry.reedy
Recipients	LambertDW, barry, eli.bendersky, georg.brandl, ggenellina, gjb1002, hagna, janpf, jimjjewett, mrotondo, pitrou, r.david.murray, rtvd, sjmachin, terry.reedy, tim.peters, vbr
Date	2010-07-23.18:31:32
SpamBayes Score	0.0009937617
Marked as misclassified	No
Message-id	<1279909894.2.0.755636795057.issue2986@psf.upfronthosting.co.za>
In-reply-to

Content
For 2.6 and 3.1, this is a documentation only issue. For 2.7, this is a doc + behavior issue. For 3.2, this is a doc + behavior + new feature issue. For 2.6.6 (release candidate due Aug 2, 10 days), I propose to add the following paragraph after the current 'Timing:' paragraph in the SequenceMatcher entry ('Heuristic:' should be bold-faced, like 'Timing:') Heuristic: To speed matching, items that appear more than 1% of the time in sequences of at least 200 items are treated as junk. This has the unfortunate side-effect of giving bad results for sequences constructed from a small set of items. An option to turn off the heuristic will be added to a future version. I would have said 'to 2.7.1' but that has not happened yet. I thought about putting the heuristic paragraph first, but I think it fits better after the discussion of quadratic run time. I think it should be a separate paragraph and not tacked on the end of the previous paragraph so people will be more likely to take notice. I have marked this a release blocker because at least 6 issues have been filed for this bug and so I think it important that the explanation be added to the next released doc. I plan to temporarily reassign this to docs@python in a few days.

For 2.6 and 3.1, this is a documentation only issue.
For 2.7, this is a doc + behavior issue.
For 3.2, this is a doc + behavior + new feature issue.

For 2.6.6 (release candidate due Aug 2, 10 days), I propose to add the following paragraph after the current 'Timing:' paragraph in the SequenceMatcher entry ('Heuristic:' should be bold-faced, like 'Timing:')

Heuristic: To speed matching, items that appear more than 1% of the time in sequences of at least 200 items are treated as junk. This has the unfortunate side-effect of giving bad results for sequences constructed from a small set of items. An option to turn off the heuristic will be added to a future version.

I would have said 'to 2.7.1' but that has not happened yet. I thought about putting the heuristic paragraph first, but I think it fits better after the discussion of quadratic run time. I think it should be a separate paragraph and not tacked on the end of the previous paragraph so people will be more likely to take notice.

I have marked this a release blocker because at least 6 issues have been filed for this bug and so I think it important that the explanation be added to the next released doc. I plan to temporarily reassign this to docs@python in a few days.

History
Date	User	Action	Args
2010-07-23 18:31:34	terry.reedy	set	recipients: + terry.reedy, tim.peters, barry, georg.brandl, jimjjewett, sjmachin, gjb1002, ggenellina, pitrou, rtvd, vbr, LambertDW, hagna, r.david.murray, eli.bendersky, janpf, mrotondo
2010-07-23 18:31:34	terry.reedy	set	messageid: <1279909894.2.0.755636795057.issue2986@psf.upfronthosting.co.za>
2010-07-23 18:31:32	terry.reedy	link	issue2986 messages
2010-07-23 18:31:32	terry.reedy	create