Message 103143 - Python tracker

Author	bls
Recipients	bls, mark.dickinson, milko.krachounov, rhettinger, tebeka, terry.reedy
Date	2010-04-14.20:09:24
Message-id	<1271275767.58.0.627161340132.issue4356@psf.upfronthosting.co.za>

Content
This was closed over a year ago, but since mark.dickinson was asking for convincing use-cases: I'm breaking up a file into line-delimited chunks. These chunks are non-overlapping, contiguous, and tend to be fairly large, so I'm just recording the start line of each chunk in a 2-tuple:

mapping = [
  (10, 'first chunk'),
  (50, 'second chunk'),
  (60, 'third chunk')
]

Lines 10-49 are in the first chunk, lines 50-59 are in the second, lines 60+ are in the third.  So:

import bisect

def CategorizeLine(line, mapping):
    # Build a temporary list of start lines and find the insertion point.
    loc = bisect.bisect([m[0] for m in mapping], line)
    if loc == 0:
        return None  # before first chunk
    return mapping[loc - 1][1]
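
A quick check of the boundary behaviour, as a sketch rather than anything definitive: bisect.bisect is the same function as bisect.bisect_right, so a line number equal to a chunk's start line lands in that chunk. The lowercase name categorize_line and the sample line numbers are only for illustration.

```python
import bisect

mapping = [
    (10, 'first chunk'),
    (50, 'second chunk'),
    (60, 'third chunk'),
]

def categorize_line(line, mapping):
    # Builds a temporary list of start lines on every call.
    loc = bisect.bisect([m[0] for m in mapping], line)
    if loc == 0:
        return None  # before first chunk
    return mapping[loc - 1][1]

# Probe just below, on, and above each chunk boundary.
for line in (9, 10, 49, 50, 59, 60, 1000):
    print(line, categorize_line(line, mapping))
```

An empty mapping needs no special case here: bisect returns 0 for an empty list, so the function returns None.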

It Would Be Nice if I could write the second line as:

   loc = bisect.bisect(mapping, line, key=lambda m: m[0])

The bisect documentation suggests pre-computing the key list, but it seems messy and error-prone to keep a redundant data structure in sync with its source. I could also rewrite my "mapping" data structure as two parallel lists instead of one list of 2-tuples, but the current structure is more readable and extensible, and handles empty lists more naturally.
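
One possible workaround that avoids both the pre-computed key list and the parallel lists is a thin read-only wrapper that computes keys on demand. This is only a sketch: KeyView and categorize_line are hypothetical names, not part of the bisect module, and it assumes bisect only needs len() and integer indexing from the object it searches.

```python
import bisect

class KeyView:
    """Read-only view that presents key(item) for each item of a sequence."""

    def __init__(self, seq, key):
        self._seq = seq
        self._key = key

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, index):
        # Keys are computed lazily, so there is no second list to keep in sync.
        return self._key(self._seq[index])

def categorize_line(line, mapping):
    loc = bisect.bisect(KeyView(mapping, lambda m: m[0]), line)
    if loc == 0:
        return None  # before first chunk
    return mapping[loc - 1][1]

mapping = [(10, 'first chunk'), (50, 'second chunk'), (60, 'third chunk')]
print(categorize_line(55, mapping))

# The view always reflects the current contents of mapping.
mapping.append((70, 'fourth chunk'))
print(categorize_line(75, mapping))
```

The trade-off is a Python-level function call per comparison, so this is likely slower than searching a plain list of keys, but the search itself stays O(log n) and no list is rebuilt per call.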
