Message 142039 - Python tracker


Author	pitrou
Recipients	Arfrever, mrabarnett, pitrou, r.david.murray, tchrist, terry.reedy
Date	2011-08-13.21:09:51
SpamBayes Score	3.919014e-06
Marked as misclassified	No
Message-id	<1313269670.3553.18.camel@localhost.localdomain>
In-reply-to	<1313269060.0.0.568845256971.issue12729@psf.upfronthosting.co.za>

Content
> There are occasions when you want to do string slicing, often of the form:
> 
> pos = my_str.index(x)
> endpos = my_str.index(y)
> substring = my_str[pos : endpos]
> 
> To me that suggests that if UTF-8 is used then it may be worth
> profiling to see whether caching the last 2 positions would be
> beneficial.

And/or a lookup table giving the byte offset of, say, every 16th
character. It gives you an O(1) lookup with a relatively reasonable
constant cost (you have to scan fewer than 16 characters after the
lookup).
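
A minimal sketch of such a table, assuming valid UTF-8 input and a
stride of 16. The names `build_offset_table` and `char_to_byte_offset`
are made up for illustration and are not part of any CPython API:

```python
STRIDE = 16  # one table entry per 16 characters (assumed value from the text)


def build_offset_table(data: bytes, stride: int = STRIDE) -> list:
    """Return the byte offset of every stride-th character in UTF-8 data."""
    table = []
    char_index = 0
    for byte_pos, b in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a character.
        if b & 0xC0 != 0x80:
            if char_index % stride == 0:
                table.append(byte_pos)
            char_index += 1
    return table


def char_to_byte_offset(data: bytes, table: list, index: int,
                        stride: int = STRIDE) -> int:
    """Map a character index to its byte offset using the table."""
    slot, remaining = divmod(index, stride)
    if index < 0 or slot >= len(table):
        raise IndexError("character index out of range")
    pos = table[slot]  # O(1) jump to the nearest preceding table entry
    while remaining:
        # Step over one character: its lead byte plus any continuation bytes.
        pos += 1
        while pos < len(data) and data[pos] & 0xC0 == 0x80:
            pos += 1
        if pos >= len(data):
            raise IndexError("character index out of range")
        remaining -= 1
    return pos
```

After the table jump, the scan steps over at most 15 characters, so the
cost per lookup is bounded regardless of the string length.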

On small strings (< 256 UTF-8 bytes) each table entry fits in a single
byte, so the space overhead for the lookup table would be about 1/16.
It could also be constructed lazily whenever more than 2 positions are
cached.
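
To make the overhead figure concrete, here is a rough sketch that
stores one single-byte entry per 16 characters. The function name
`compact_offset_table` is hypothetical, and the sketch assumes valid
UTF-8 input shorter than 256 bytes:

```python
def compact_offset_table(data: bytes, stride: int = 16) -> bytes:
    """Build a one-byte-per-entry offset table for short UTF-8 data."""
    if len(data) >= 256:
        raise ValueError("offsets would not fit in a single byte")
    # Byte positions where a character starts (i.e. not a continuation byte).
    starts = [pos for pos, b in enumerate(data) if b & 0xC0 != 0x80]
    # Keep every stride-th start; each value is < 256, so bytes() can hold it.
    return bytes(starts[::stride])


ascii_data = b"a" * 240
table = compact_offset_table(ascii_data)
overhead = len(table) / len(ascii_data)  # 15 entries for 240 bytes
```

For pure ASCII the ratio comes out at 1/16 (rounding up adds at most one
entry); for text with multi-byte characters it is smaller, because the
table grows with the character count, not the byte count.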

History
Date	User	Action	Args
2011-08-13 21:09:52	pitrou	set	recipients: + pitrou, terry.reedy, mrabarnett, Arfrever, r.david.murray, tchrist
2011-08-13 21:09:51	pitrou	link	issue12729 messages
2011-08-13 21:09:51	pitrou	create