Message142039
> There are occasions when you want to do string slicing, often of the form:
>
> pos = my_str.index(x)
> endpos = my_str.index(y)
> substring = my_str[pos : endpos]
>
> To me that suggests that if UTF-8 is used then it may be worth
> profiling to see whether caching the last 2 positions would be
> beneficial.
And/or a lookup table giving the byte offset of, say, every 16th
character. It gives you an O(1) lookup with a relatively reasonable
constant cost (you have to scan fewer than 16 characters after the
lookup).
On small strings (< 256 UTF-8 bytes) each offset fits in a single byte,
so the space overhead for the lookup table would be at most 1/16. It
could also be constructed lazily whenever more than 2 positions are
cached.
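
A minimal sketch of the lookup-table idea, in pure Python rather than the C a real implementation would need. The names `STRIDE`, `build_offset_table` and `byte_offset` are made up for illustration, and the code assumes the input is valid UTF-8 and that the requested index is in range:

```python
STRIDE = 16  # characters between table entries (the "every 16th" above)


def build_offset_table(data: bytes, stride: int = STRIDE) -> list[int]:
    """Return the byte offsets of characters 0, stride, 2*stride, ...

    Assumes `data` is valid UTF-8.
    """
    table = []
    char_index = 0
    for byte_pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new character.
        if byte & 0xC0 != 0x80:
            if char_index % stride == 0:
                table.append(byte_pos)
            char_index += 1
    return table


def byte_offset(data: bytes, table: list[int], index: int,
                stride: int = STRIDE) -> int:
    """Return the byte offset of character `index`.

    Assumes 0 <= index < number of characters in `data`.
    """
    # O(1) jump to the nearest indexed character at or before `index`.
    pos = table[index // stride]
    # Then scan forward over fewer than `stride` characters.
    for _ in range(index % stride):
        pos += 1
        while pos < len(data) and data[pos] & 0xC0 == 0x80:
            pos += 1
    return pos
```

For strings under 256 bytes the table could be stored as a `bytes` object, one byte per entry; the list of ints here only keeps the sketch short.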
Date | User | Action | Args
2011-08-13 21:09:52 | pitrou | set | recipients: + pitrou, terry.reedy, mrabarnett, Arfrever, r.david.murray, tchrist
2011-08-13 21:09:51 | pitrou | link | issue12729 messages
2011-08-13 21:09:51 | pitrou | create |