Message51680
Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed.
"grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself.
Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same).
I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off.
This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices).
I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there).
If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches. |
|
Date |
User |
Action |
Args |
2007-08-23 15:56:05 | admin | link | issue1629305 messages |
2007-08-23 15:56:05 | admin | create | |
|