Author gvanrossum
Recipients
Date 2007-01-14.16:32:32
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to
Content
Sorry, the test_array failure was due to not rebuilding after patching.  Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed.

"grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself.

Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same).

I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off.

This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space.  (And I'm not convinced there can't be a pathological case where it is slower, too.)  And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices).

I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there).

If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible.  But you're most welcome to withdraw it, too; it would save me a lot of headaches.
History
Date User Action Args
2007-08-23 15:56:05adminlinkissue1629305 messages
2007-08-23 15:56:05admincreate