Issue 1943: improved allocation of PyUnicode objects

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46235

classification

Title:	improved allocation of PyUnicode objects
Type:	enhancement	Stage:	patch review
Components:	Interpreter Core, Unicode	Versions:	Python 3.2

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	BreamoreBoy, Rhamphoryncus, ajaksu2, amaury.forgeotdarc, benjamin.peterson, collinwinter, eric.smith, ezio.melotti, ferringb, gvanrossum, jafo, jimjjewett, lemburg, mark.dickinson, orivej, pitrou, rhettinger, terry.reedy, vstinner
Priority:	normal	Keywords:	patch

Created on 2008-01-26 22:44 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
stringbench3k.py	pitrou, 2008-01-26 22:45	a quick port to py3k of stringbench.py
unialloc4.patch	pitrou, 2008-03-20 19:37
freelists2.patch	pitrou, 2008-03-20 19:37
unialloc5.patch	pitrou, 2009-05-23 23:09
unialloc6.patch	pitrou, 2010-10-06 23:21

Messages (96)
msg61728 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-26 22:44
This is an attempt at improving allocation of str (PyUnicode) objects. For that it does two things: 1. make str objects PyVarObjects, so that the object is allocated in one pass rather than two 2. maintain an array of freelists, each for a given str length, so that many str allocations can bypass actual memory allocation calls There is a ~10% speedup in stringbench, and a slight improvement in pybench (as invoked with "./python Tools/pybench/pybench.py -t String"). Also, memory consumption is a bit lower when running those benchmarks.
msg61730 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-01-26 23:27
The Unicode object was designed not to be a PyVarObject (in contrast to the string object), so I'm not sure what you would want to break that design in return for a mere 10% speedup. Note that turning the objects into PyVarObjects removes the ability to extend the type at C level. It also introduces lots of problems if you want to resize a Unicode object, due to the fact that the memory address of the whole PyUnicodeObject can change. Resizing is done quite often in codecs. Regarding your second point: Unicode objects already use a free list and even keep the allocated Py_UNICODE buffer allocated (if it's not too large). Furthermore, the allocation is handled by pymalloc, so you only rarely need to go to the system malloc() to allocate more space. Tuning the KEEPALIVE_SIZE_LIMIT will likely have similar effects w/r to speedup.
msg61732 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-27 00:24
I just tried bumping KEEPALIVE_SIZE_LIMIT to 200. It makes up for a bit of the speedup, but the PyVarObject version is still faster. Moreover, and perhaps more importantly, it eats less memory (800KB less in pybench, 3MB less when running the whole test suite). (then there are of course microbenchmarks. For example: # With KEEPALIVE_SIZE_LIMIT = 200 ./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()" 1000 loops, best of 3: 556 usec per loop # With this patch ./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()" 1000 loops, best of 3: 391 usec per loop ) I don't understand the argument for codecs having to resize the unicode object. Since codecs also have to resize the bytes object when encoding, the argument should apply to bytes objects as well, yet bytes (and str in 2.6) is a PyVarObject. I admit I don't know the exact reasons for PyUnicode's design. I just know that the patch doesn't break the API, and runs the test suite fine. Are there any specific things to look for?
msg61735 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-01-27 13:52
Your microbenchmark is biased towards your patched version. The KEEPALIVE_SIZE_LIMIT will only cut in when you deallocate and then reallocate Unicode objects. The free list used for Unicode objects is also limited to 1024 objects - which isn't all that much. You could tune MAX_UNICODE_FREELIST_SIZE as well. Regarding memory usage: this is difficult to measure in Python, since pymalloc will keep memory chunks allocated even if they are not in use by Python. However, this is a feature of pymalloc and not specific to the Unicode implementation. It can be tuned in pymalloc. To get more realistic memory measurements, you'd have to switch off pymalloc altogether and then create a separate process that consumes lots of memory to force the OS to have it allocate only memory that's really needed to the process you're running for memory measurements. Of course, keeping objects alive in a free list will always use more memory than freeing them altogether and returning the memory to the OS. It's a speed/space tradeoff. The RAM/CPU costs ratio has shifted a lot towards RAM nowadays, so using more RAM is usually more efficient than using more CPU time. Regarding resize: you're right - the string object is a PyVarObject as well and couldn't be changed at the time due to backwards compatibility reasons. You should also note that when I added Unicode to Python 1.6, it was a new and not commonly used type. Codecs were not used much either, so there was no incentive to make resizing strings work better. Later on, other optimizations were added to the Unicode implementation that caused the PyUnicode_Resize() API to also require being able to change the object address. Still, in the common case, it doesn't change the object address. The reason for using an external buffer for the Unicode object was to be able to do further optimizations, such as share buffers between Unicode objects. We never ended up using this, though, but there's still a lot of room for speedups and more memory efficiency because of this design. Like I already mentioned, PyObjects are also easier to extend at C level - adding new variables to the object at the end is easy with PyObjects. It's difficult for PyVarObjects, since you always have to take the current size of the object into account and you always have to use indirection to get at the extra variables due to the undefined offset of the variables. How much speedup do you get when you compare the pybench test with KEEPALIVE_SIZE_LIMIT = 200 compared to your patched version ?
msg61738 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-27 15:35
With KEEPALIVE_SIZE_LIMIT = 200, the pybench runtime is basically the same as with my patched version. stringbench remains a bit faster though (~8%). You say that RAM size is cheaper than CPU power today, which is true but misses one part of the picture: the importance of CPU caches, and thus of working set size. There are also probably people wanting to use Python in memory-constrained environments (embedded). I understand the argument about possible optimizations with an external buffer, but are such optimizations likely to be implemented? (see #1590352 and #1629305). If they really are, then I'm happy with the unicode type remaining a plain PyObject!
msg61740 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-01-27 16:03
I don't really see the connection with #1629305. An optimization that would be worth checking is hooking up the Py_UNICODE pointer to interned Unicode objects if the contents match (e.g. do a quick check based on the hash value and then a memcmp). That would save memory and the call to the pymalloc allocator. Another strategy could involve a priority queue style cache with the aim of identifying often used Unicode strings and then reusing them. This could also be enhanced using an offline approach: you first run an application with an instrumented Python interpreter to find the most often used strings and then pre-fill the cache or interned dictionary on the production Python interpreter at startup time. Coming from a completely different angle, you could also use the Py_UNICODE pointer to share slices of a larger data buffer. A Unicode sub-type could handle this case, keeping a PyObject* reference to the larger buffer, so that it doesn't get garbage collected before the Unicode slice. Regarding memory constrained environments: these should simply switch off all free lists and pymalloc. OTOH, even mobile phones come with gigabytes of RAM nowadays, so it's not really worth the trouble, IMHO.
msg61741 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-27 16:46
All of those proposals are much heavier to implement; they also make the unicode type more complicated and difficult to maintain probably (while turning it into a PyVarObject actually shortens the implementation a bit). In any case, it's not something I want to tackle. The reason that I mentioned #1629305 was that it was such an optimization which complicated the unicode type while bringing very significant speedups.
msg61743 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-01-27 18:05
Agreed, those optimizations do make the implementation more complicated. It's also not clear whether they would really be worth it. #1629305 only provided speedups for the case where you write s += 'abc'. The usual idiom for this is to use a list and then concatenate in one go. If you want a really fast approach, you'd use cStringIO or perhaps the bufferarray. I don't think that optimizing for just one particular use case warrants making the code more complicated or changing the C interface radically. In your case, I think that closing the door for being able to easily extend the type implement at the C is the main argument against it. The speedups are only marginal and can also be achieved (to some extent) by tuning the existing implementation's parameters.
msg61747 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-27 19:03
I know it's not the place to discuss #1629305, but the join() solution is not always faster. Why? Because 1) there's the list contruction and method call overhead 2) ceval.c has some bytecode hackery to try and make plain concatenations somewhat less slow. As for cStringIO, I actually observed in some experiments that it was slower than the other alternatives, at least for short strings. All in all, choosing between the three idioms is far from obvious and needs case-by-case analysis.
msg61832 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-29 21:54
FWIW, I tried using the freelist scheme introduced in my patch without making PyUnicode a PyVarObject and, although it's better than the standard version, it's still not as good as the PyVarObject version. Would you be interested in that patch?
msg61862 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-01-30 11:45
Yes, definitely. Some comments on style in your first patch: * please use unicode->length instead of the macro LENGTH you added * indents in unicodeobject.c are 4 spaces * line length should stay below 80
msg61867 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-30 15:56
After some more tests I must qualify what I said. The freelist patch is an improvement in some situations. In others it does not really have any impact. On the other hand, the PyVarObject version handles memory-bound cases dramatically better, see below. With a small string: ./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()" -> Unpatched py3k: 26.4 usec per loop -> Freelist patch: 21.5 usec per loop -> PyVarObject patch: 20.2 usec per loop With a medium-sized string: ./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()" -> Unpatched py3k: 458 usec per loop -> Freelist patch: 408 usec per loop -> PyVarObject patch: 316 usec per loop With a long string: ./python -m timeit -s "s=open('Misc/HISTORY', 'r').read()" "s.split()" -> Unpatched py3k: 31.3 msec per loop -> Freelist patch: 32.7 msec per loop -> PyVarObject patch: 17.8 msec per loop (the numbers are better than in my previous posts because the split-accelerating patch has been integrated) Also, given those results, it is also clear that neither pybench nor stringbench really stress memory efficiency of strings, only processing algorithms. That said, the freelist patch is attached.
msg62332 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-02-12 19:58
Here is an updated patch against the current py3k branch, and with spaces instead of tabs for indentation.
msg62467 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-02-16 20:32
Here is an updated patch, to comply with the introduction of the PyUnicode_ClearFreeList() function.
msg64108 - (view)	Author: Sean Reifschneider (jafo) *	Date: 2008-03-19 21:40
Marc-Andre: Wit the udpated patches, is this a set of patches we can accept?
msg64112 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-03-19 22:00
Thanks for your interest Sean :) By the way, on python-3000 both GvR and Martin von Löwis were ok on the proposed design change, although they did not review the patch itself. http://mail.python.org/pipermail/python-3000/2008-February/012076.html
msg64161 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-03-20 10:03
Antoine, as I've already mentioned in my other comments, I'm -1 on changing the Unicode object to a variable size object. I also don't think that the micro-benchmarks you are applying really do test the implementation in a real-life situations. The only case where your patch appears significantly faster is the "long string" case. If you look at the distribution of the Unicode strings generated by this case, you'll find that most strings have less than 10-20 characters. Optimizing pymalloc for these cases and tuning the parameters in the Unicode implementation will likely give you the same performance increase without having to sacrifice the advantages of using a pointer in the object rather than a inlining the data. I'm +1 on the free list changes, though, in the long run, I think that putting more efforts into improving pymalloc and removing the free lists altogether would be better. BTW: Unicode slices would be a possible and fairly attractive target for a C level subclass of Unicode objects. The pointer in the Unicode object implementation could then point to the original Unicode object's buffer and the subclass would add an extra pointer to the original object itself (in order to keep it alive). The Unicode type (written by Fredrik Lundh) I used as basis for the Unicode implementation worked with this idea.
msg64163 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-03-20 10:30
Marc-Andre: don't all your objections also apply to the 8bit string type, which is already a variable-size structure? Is extending unicode more common than extending str? With python 3.0, all strings are unicode. Shouldn't this type be optimized for the same use cases of the previous str type?
msg64165 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-03-20 11:00
Yes, all those objections apply to the string type as well. The fact that strings are variable length objects makes it impossible to do apply any of the possible optimizations I mentioned. If strings were a fixed length object, it would have been possible to write a true basestring object from which both 8-bit strings and Unicode subclass, making a lot of things easier. BTW: Please also see ticket #2321 to see how the change affects your benchmarks.
msg64166 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-03-20 11:32
Hi, Marc-André, I'm all for "real-life" benchmarks if someone proposes some. Until that we have to live with micro-benchmarks, which seems to be the method used for other CPython optimizations by the way. You are talking about slicing optimizations but you forget that the "lazy slices" idea has been shot down by Guido and others when proposed by Larry Hastings (with a patch) some time ago. Also the "lazy slices" patch was for 8-bit strings, which are variable-size objects, which seems to counter your argument that variable-size Unicode objects would prevent making such optimizations. As I said the freelist changes actually have mixed consequences, and in general don't improve much (and that improvement is just backed by micro-benchmarks after all... why would it be more convincing than the far greater improvement brought by making Unicode objects variable-size objects?). Why wouldn't you express your arguments in the python-3000 thread I launched one month ago, so that at least there is a clear picture of the different arguments for and against this approach?
msg64167 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-03-20 12:53
Regarding benchmarks: It's difficult to come up with decent benchmarks for things like this. A possible strategy is to use an instrumented interpreter that records which Unicode objects are created and when they are deleted. If you then run this instrumented interpreter against a few larger applications such as Zope for a while, you'll get a data set that can be used as input for a more real-life like benchmark. I've done this in the past for Python byte codes to see which are used how often - based on such data you can create optimizations that have a real-life effect. Regarding the lazy slice patches: those were not using subclassing, they were patches to the existing type implementations. With subclassing you don't affect all uses of an object, but instead let the user control which uses should take advantage of the slice operations. Since the user also controls the objects which are kept alive this way, the main argument against Unicode slice objects goes away. This is different than patching the type itself and doesn't change the main object implementation in any way. Furthermore, it's possible to implement and provide such subclasses outside the Python core, which gives developers a lot more freedom. Regarding discussions on the py3k list: I'm not on that list, since I already get more email than I can manage. I am on the python-dev list, if you want to take up the discussions there.
msg64168 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-03-20 13:57
Well I'm not subscribed to the python-3k list either - too much traffic indeed. You can read and post into it with gmane for example: http://thread.gmane.org/gmane.comp.python.python-3000.devel/11768 (there is probably an NNTP gateway too) As for instrumenting the interpreter, this would tell us when and which strings are allocated, but not the precise effect it has on a modern CPU's memory subsystem (which is quite a complicated thing). Also, this is Py3k - we can't test any real-life applications until they are ported to it... (someone really motivated would have to learn Intel VTune or AMD CodeAnalyst and run such a "py3k real-life application" with it :-)) As for the explicit slicing approach, "explicit string views" have been discussed in 2006 on the python-3k list, including a proof-of-concept patch by Josiah Carlson - without any definitive pronouncement it seems. See the subthread here: http://mail.python.org/pipermail/python-3000/2006-August/003280.html The reason I'm bringing in those previous discussions is that, in theory, I'm all for much faster concatenation and slicing thanks to buffer sharing etc., but realistically the previous attempts have failed for various reasons (either technical or political). And since those optimizations are far from simple to implement and maintain, chances are they will not be attempted again, let alone accepted. I'd be happy to be proven wrong :) regards Antoine.
msg64169 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-03-20 14:34
I've read the comments from Guido and Martin, but they don't convince me in changing my -1. As you say: it's difficult to get support for optimizations such a slicing and concatenation into the core. And that's exactly why I want to keep the door open for developers to add such support via extension modules. With the 8-bit string implementation this was never possible. With the Unicode implementation and the subclassing support for builtin types, it is possible to add support without having to go through all the hoops and discussions on python-dev or py3k lists. I'm also for making Python faster, but not if it limits future optimizations and produces dead-ends. The fact that such optimizations haven't been implemented yet doesn't really mean anything either - people are not yet using Unicode intensively enough to let the need arise.
msg64170 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-03-20 14:50
Well, I'm not gonna try to defend my patch eternally :) I understand your opinion even if I find a bit disturbing that we refuse a concrete, actual optimization on the basis of future hypothetical ones. Since all the arguments have been laid down, I'll let other developers decide whether further discussion of this proposal is useful or not.
msg64194 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-03-20 19:10
Regarding the benchmark: You can instrument a 2.x version of the interpreter to build the data set and then have the test load this data set in Py3k and have it replay the allocation/deallocation in the same way it was done on the 2.x system. I also expect that patch #2321 will have an effect on the performance since that patch changed the way memory allocation of the buffer was done (from using the system malloc() to using pymalloc's allocator for the buffer as well).
msg64197 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-03-20 19:36
You are right, #2321 made the numbers a bit tighter: With a small string: ./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()" -> Unpatched py3k: 23.1 usec per loop -> Freelist patch: 21.3 usec per loop -> PyVarObject patch: 20.5 usec per loop With a medium-sized string: ./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()" -> Unpatched py3k: 406 usec per loop -> Freelist patch: 353 usec per loop -> PyVarObject patch: 314 usec per loop With a long string: ./python -m timeit -s "s=open('Misc/HISTORY', 'r').read()" "s.split()" -> Unpatched py3k: 22.7 msec per loop -> Freelist patch: 24 msec per loop -> PyVarObject patch: 20.6 msec per loop stringbench3k: -> Unpatched py3k: 266 seconds -> Freelist patch: 264 seconds -> PyVarObject patch: 249 seconds Regarding your benchmarking suggestion, this would certainly be an interesting thing to do, but I fear it is also much more than I'm willing to do... I'm going to post the updated patches.
msg64215 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-03-20 22:13
Thanks for running the tests again. The use of pymalloc for the buffer made a significant difference indeed. I expect that more can be had by additionally tweaking KEEPALIVE_SIZE_LIMIT. It is interesting to see that the free list patch only appears to provide better timings for the "smaller" strings tests. I couldn't find the INTBENCH file anywhere in the Python source tree, but here's the distribution of the words of the latter two files: Dev-Python/LICENSE: ------------------------------------------------------------------------ Length: [Count] 0----1----2----3----4----5----6----7----8----9----10 * 10% 1: [ 28] === 2: [ 350] ================================================= 3: [ 354] ================================================== 4: [ 190] ========================== 5: [ 191] ========================== 6: [ 158] ====================== 7: [ 164] ======================= 8: [ 132] ================== 9: [ 127] ================= 10: [ 102] ============== 11: [ 39] ===== 12: [ 34] ==== 13: [ 21] == 14: [ 10] = 15: [ 10] = 16: [ 0] 17: [ 0] 18: [ 1] 19: [ 0] 20: [ 0] 21: [ 1] 22: [ 0] 23: [ 0] 24: [ 0] 25: [ 1] 26: [ 1] 27: [ 1] 28: [ 0] 29: [ 1] 30: [ 0] 31: [ 0] 32: [ 0] 33: [ 0] 34: [ 0] 35: [ 0] 36: [ 2] 37: [ 0] 38: [ 0] 39: [ 1] 40: [ 0] 41: [ 0] 42: [ 0] 43: [ 1] 44: [ 1] 45: [ 0] 46: [ 0] 47: [ 0] 48: [ 0] 49: [ 0] 50: [ 1] 51: [ 0] 52: [ 0] 53: [ 0] 54: [ 0] 55: [ 0] 56: [ 0] 57: [ 0] 58: [ 0] 59: [ 0] 60: [ 0] 61: [ 0] 62: [ 0] 63: [ 1] Dev-Python/Misc/HISTORY: ------------------------------------------------------------------------ Length: [Count] 0----1----2----3----4----5----6----7----8----9----10 * 10% 1: [ 6853] ================== 2: [13920] ===================================== 3: [18401] ================================================== 4: [12626] ================================== 5: [ 9545] ========================= 6: [ 9348] ========================= 7: [ 9625] ========================== 8: [ 7351] =================== 9: [ 5353] ============== 10: [ 3266] ======== 11: [ 1947] ===== 12: [ 1336] === 13: [ 983] == 14: [ 638] = 15: [ 408] = 16: [ 288] 17: [ 286] 18: [ 216] 19: [ 176] 20: [ 147] 21: [ 120] 22: [ 116] 23: [ 85] 24: [ 89] 25: [ 70] 26: [ 44] 27: [ 59] 28: [ 39] 29: [ 32] 30: [ 65] 31: [ 23] 32: [ 26] 33: [ 28] 34: [ 19] 35: [ 18] 36: [ 9] 37: [ 18] 38: [ 5] 39: [ 10] 40: [ 9] 41: [ 9] 42: [ 1] 43: [ 1] 44: [ 8] 45: [ 4] 46: [ 5] 47: [ 5] 48: [ 3] 49: [ 3] 50: [ 4] 51: [ 0] 52: [ 0] 53: [ 2] 54: [ 0] 55: [ 0] 56: [ 4] 57: [ 2] 58: [ 4] 59: [ 1] 60: [ 0] 61: [ 1] 62: [ 0] 63: [ 1] 64: [ 1] 65: [ 9] 66: [ 1] 67: [ 1] 68: [ 0] 69: [ 0] 70: [ 16] 71: [ 1] 72: [ 0] 73: [ 1] Compare that to a typical Python module source file... Dev-Python/Lib/urllib.py: ------------------------------------------------------------------------ Length: [Count] 0----1----2----3----4----5----6----7----8----9----10 * 10% 1: [ 806] ================================================== 2: [ 672] ========================================= 3: [ 675] ========================================= 4: [ 570] =================================== 5: [ 531] ================================ 6: [ 501] =============================== 7: [ 296] ================== 8: [ 393] ======================== 9: [ 246] =============== 10: [ 147] ========= 11: [ 150] ========= 12: [ 102] ====== 13: [ 90] ===== 14: [ 116] ======= 15: [ 61] === 16: [ 51] === 17: [ 45] == 18: [ 38] == 19: [ 31] = 20: [ 39] == 21: [ 24] = 22: [ 18] = 23: [ 18] = 24: [ 23] = 25: [ 17] = 26: [ 15] 27: [ 13] 28: [ 14] 29: [ 11] 30: [ 9] 31: [ 7] 32: [ 1] 33: [ 6] 34: [ 10] 35: [ 2] 36: [ 4] 37: [ 3] 38: [ 6] 39: [ 1] 40: [ 0] 41: [ 1] 42: [ 5] 43: [ 0] 44: [ 0] 45: [ 0] 46: [ 0] 47: [ 0] 48: [ 2] 49: [ 0] 50: [ 1] 51: [ 1] 52: [ 2] The distributions differ a lot, but they both show that typical strings (both in English text and Python program code) have a length of < 25 characters. Setting KEEPALIVE_SIZE_LIMIT to 32 should cover most of those cases while still keeping the memory load within bounds. Raising the free list limit to e.g. 2048 would cause at most 64kB to be used by the free list - which is well within bounds of any modern CPU L2 cache.
msg64216 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-03-20 22:19
Well, of course most words in most languages are below 20 characters. Hence most strings containing words are also below 20 chars. But strings can also contain whole lines (e.g. decoding of various Internet protocols), which are statistically below 80 chars. I took that into account in the freelists patch (it keeps lots of <15 char strings in memory, and a few <90 char strings), which as you can see still yields rather little benefits :)
msg64320 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-03-22 12:56
I wasn't clear enough: my point was that your free list patch would probably benefit from some tuning of the cut-off parameters. 15 characters appears to be too small (see the HISTORY file histogram). You'll likely get better results for 32 and 256 as cut-off values and by increasing the max list sizes to higher values, e.g. 1024 for the 32 character slots and 146 for the 256 character slots (see the counts in the histograms).
msg87912 - (view)	Author: Daniel Diniz (ajaksu2) *	Date: 2009-05-16 19:37
Collin, Can you test this patch with Unladen Swallow's benchmarks?
msg87917 - (view)	Author: Collin Winter (collinwinter) *	Date: 2009-05-16 19:47
Daniel, which patch? freelists2.patch or unialloc4.patch? If these are targeted py3k (judging by the "Versions" selector above), none of Unladen Swallow's benchmarks work under 3k (we're focusing on 2.x).
msg87918 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-16 19:53
> Daniel, which patch? freelists2.patch or unialloc4.patch? If these are > targeted py3k (judging by the "Versions" selector above), none of > Unladen Swallow's benchmarks work under 3k (we're focusing on 2.x). They target py3k indeed. Also, they need updating (at least the uniallocX patch which is the interesting part here).
msg88254 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-23 23:09
Updated patch against py3k. On a 64-bit system, each unicode object takes 14 bytes less than without the patch (using sys.getsizeof()). Two to four more bytes could be gained by folding the `state` member in the two lower bits of `defenc`, but I'm not sure it's worth the trouble.
msg88287 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-05-24 21:30
Antoine, I think we have to make a decision here: I'm still -1 on changing PyUnicodeObject to be a PyVarObject, but do like your experiments with the free lists. I also still believe that tuning the existing parameters in the Unicode implementation and pymalloc would give a better performance gain than what your patch achieves. Setting KEEPALIVE_SIZE_LIMIT to 32 would be a first start in that direction. However, if you insist on changing the PyUnicodeObject structure, I'll have to reject the patch.
msg88288 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-24 21:36
As I already showed, the freelist experiments bring very little improvement.
msg88290 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-05-24 21:48
Ok, then closing the patch as rejected.
msg88291 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-24 21:51
Marc-André, please don't close the issue while you're the only one opposing it, thanks.
msg88307 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-05-25 08:17
Antoine, I have explained the reasons for rejecting the patch. In short, it violates a design principle behind the Unicode implementation. If you want to change such a basic aspect of the Unicode implementation, then write a PEP which demonstrates the usefulness on a larger set of more general tests and comes up with significant results (10% speedup in some micro benchmarks is not significant; memory tests need to be run without pymalloc and require extra care to work around OS malloc optimization strategies). Like I said: The current design of the Unicode object implementation would benefit more from advances in pymalloc tuning, not from making it next to impossible to extend the Unicode objects to e.g. * reuse existing memory blocks for allocation, * pointing straight into memory mapped files, * providing highly efficient ways to tokenize Unicode data, * sharing of data between Unicode objects, etc. The reason I chose this design was to make the above easily implementable and it was a conscious decision to use a PyObject rather than a PyVarObject, like the string object, since I knew that the Unicode object was eventually going to replace the string object.
msg88309 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-05-25 08:45
Looking at the comments, it seems that the performance gain comes from the removal of the double allocation which is needed by the current design. Was the following implementation considered: - keep the current PyUnicodeObject structure - for small strings, allocate one chunk of memory: sizeof(PyUnicodeObject)+2length. Then set self->str=(Py_UNICODE)(self+1); - for large strings, self->str may be allocated separately. - unicode_dealloc() must be careful and not free self->str if it is contiguous to the object (it's probably a good idea to reuse the self->state field for this purpose).
msg88310 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-05-25 09:21
Amaury Forgeot d'Arc wrote: > Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment: > > Looking at the comments, it seems that the performance gain comes from > the removal of the double allocation which is needed by the current design. > > Was the following implementation considered: > - keep the current PyUnicodeObject structure > - for small strings, allocate one chunk of memory: > sizeof(PyUnicodeObject)+2length. Then set self->str=(Py_UNICODE)(self+1); > - for large strings, self->str may be allocated separately. > - unicode_dealloc() must be careful and not free self->str if it is > contiguous to the object (it's probably a good idea to reuse the > self->state field for this purpose). AFAIK, this was not yet been investigated. Note that in real life applications, you hardly ever have to call malloc on small strings - these are managed by pymalloc as pieces of larger chunks and allocation/deallocation is generally fast. You have the same situation for PyUnicodeObject itself (which, as noted earlier, could be optimized in pymalloc even further, since the size of PyUnicodeObject is fixed). The OS malloc() is only called for longer strings and then only for the string buffer itself - the PyUnicodeObject is again completly managed by pymalloc, even in this case.
msg88312 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-25 09:24
Marc-André, the problem is that all your arguments are fallacious at best. Let me see: > Like I said: The current design of the Unicode object implementation > would benefit more from advances in pymalloc tuning, not from making it > next to impossible to extend the Unicode objects to e.g. [...] Saying that is like saying "we shouldn't try to improve ceval.c because it makes it harder to write a JIT". You are dismissing concrete actual improvements in favour of pie-in-the-sky improvements that nobody has seemed to try (you're welcome to prove me wrong) in 10 years of existence of the unicode type. Besides, if someone wants to experiment with such improvements, it is not difficult to switch back to the old representation (my patch is very short if you discard the mechanic replacement of "self->length" with "PyUnicode_GET_SIZE(self)", which doesn't have to be undone to switch representations). So, I fail to see the relevance of that argument. > Antoine, I have explained the reasons for rejecting the patch. In short, > it violates a design principle behind the Unicode implementation. You seem to be the only one thinking this while, AFAIK, you haven't been the only one to work on that datatype. > (10% speedup in > some micro benchmarks is not significant; memory tests need to be run > without pymalloc and require extra care to work around OS malloc > optimization strategies). Actually, running performance or resource consumption tests without pymalloc is pointless since it makes the test completely artificial and unrelated to real-world conditions (who runs Python without pymalloc in real-world conditions?). > * reuse existing memory blocks for allocation, > * pointing straight into memory mapped files, > * providing highly efficient ways to tokenize Unicode data, > * sharing of data between Unicode objects, > etc. By the way, I haven't seen your patches or experiments for those. Giving guidance is nice, but proofs of concept, at the minimum, are more convincing. None of the suggestions above strike me as very /easy/ (actually, they are at least an order of magnitude harder than the present patch), or even guaranteed to give any tangible benefits. To be clear, I don't think this proposal is more important than any other one giving similar results (provided these exist). But your arguments are never factual and, what's more, while I already did the same replies as I did here in other messages, you never bothered to be more factual. I would accept your refusal if your arguments had some semblance of concrete support for them.
msg88313 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-05-25 09:32
> The OS malloc() is only called... I know this. But pymalloc has its own overhead, and cache locality will certainly be better if string data is close to the string length. The goal is to improve the current usage of strings, and not rely on hypothetical enhancements to the generic pymalloc.
msg88584 - (view)	Author: Jim Jewett (jimjjewett)	Date: 2009-05-30 22:41
There were a number of patches to support sharing of data between unicode objects. (By Larry Hastings?) They were rejected because (a) they were complicated, and (b) it was possible to provoke pathological memory retention.
msg88585 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-30 22:47
> There were a number of patches to support sharing of data between > unicode objects. (By Larry Hastings?) They were rejected because (a) > they were complicated, and (b) it was possible to provoke pathological > memory retention. Yes, it's the "lazy strings" patches by Larry Hastings (it was for str, not unicode, though). Issues are #1590352 and #1569040 (and perhaps others). In any case, as I said, it is easy to switch back to the old representation, so I don't think it is an argument to block this patch.
msg88744 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-02 13:46
Jim Jewett wrote: > Jim Jewett <jimjjewett@users.sourceforge.net> added the comment: > > There were a number of patches to support sharing of data between > unicode objects. (By Larry Hastings?) They were rejected because (a) > they were complicated, and (b) it was possible to provoke pathological > memory retention. Right, but the patches were targeting the main Unicode type implementation. It would certainly be possible to implement these features on a Unicode sub-type. Note that the Unicode type implementation on which the Python type is based did in fact use references to other objects in order to implement sharing. This part was removed from the base type due to the issues with unwillingly keeping alive large reference objects. However, the implementation can be used as basis for writing a Unicode sub-type which does implement data sharing. If you're looking for application space where such data sharing types are useful, have a look at parsing engines or routines that split larger chunks of data in multiple smaller pieces. Shared memory is another use case where such types would enable sharing of Unicode data between processes... but I'm repeating myself.
msg88745 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-02 13:53
Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > >> There were a number of patches to support sharing of data between >> unicode objects. (By Larry Hastings?) They were rejected because (a) >> they were complicated, and (b) it was possible to provoke pathological >> memory retention. > > Yes, it's the "lazy strings" patches by Larry Hastings (it was for str, > not unicode, though). Issues are #1590352 and #1569040 (and perhaps > others). > > In any case, as I said, it is easy to switch back to the old > representation, so I don't think it is an argument to block this patch. That's not the case. The patch breaks C API + binary compatibility for an essential Python type - that's not something you can easily undo.
msg88746 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-02 14:12
> The patch breaks C API + binary compatibility for an essential Python > type - that's not something you can easily undo. I don't see how it breaks C API compatibility. No officially documented function has changed, and the accessor macros still work. Am I missing something? As for binary compatibility, yes, it does break it, which isn't an exceptional situation in the development process. We have changed other "essential types" too -- for example, recently, the PyLong object got 30-bit digits on some systems. Why you think it is hard to undo, I don't understand. As for the future ABI PEP, which has not yet been accepted, it does not mention PyUnicodeObject as part of the structures which are guaranteed to remain binary-compatible : http://www.python.org/dev/peps/pep-0384/#structures
msg88751 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-02 16:38
Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > >> The patch breaks C API + binary compatibility for an essential Python >> type - that's not something you can easily undo. > > I don't see how it breaks C API compatibility. No officially documented > function has changed, and the accessor macros still work. Am I missing > something? Yes: The layout and object type of the PyUnicodeObject object. You cannot simply recompile your code and have it working. Instead, you have to provide different sub-typing implementations depending on whether PyUnicodeObject is a PyVarObject or PyObject, since these are inherently different in their structure. Please note that all type objects documented in the header files not explicitly declared for private use only, are in fact public APIs. You need access to those type objects in order to be able to subclass them. > As for binary compatibility, yes, it does break it, which isn't an > exceptional situation in the development process. We have changed other > "essential types" too -- for example, recently, the PyLong object got > 30-bit digits on some systems. Why you think it is hard to undo, I don't > understand. That's a different kind of change. Even though it's very hard to sub-type longs due to their PyVarObject nature and the fact that longs even dig into the PyObject_VAR_HEAD, you can still recompile your code and it will continue to work. The change was to a typedef - the name of the typedef itself has not changed. This is similar to compiling Python as UCS2 or UCS4 version - Py_UNICODE will stay the same typedef, but on a UCS2 system it maps to 16 bits, whereas on a UCS4 system it is set to 32 bits. Note that the Unicode implementation takes great care not to hide this binary incompatibility - by remapping all APIs to include the UCS2/UCS4 hint in the exported name. As an side: The long implementation does not. > As for the future ABI PEP, which has not yet been accepted, it does not > mention PyUnicodeObject as part of the structures which are guaranteed > to remain binary-compatible : > http://www.python.org/dev/peps/pep-0384/#structures That's fine, but please note that the ABI PEP only addresses applications that wish to benefit from the binary compatibility across Python versions. It has no implications on applications that don't want to use the ABI or cannot, since they are too low-level, such as extensions wishing to sub-class built-in types.
msg88753 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-02 17:17
> You cannot simply recompile your code and have it working. Who is "you"? People doing mundane things with PyUnicodeObjects certainly can, assuming they use the macros for any member access. > Please note that all type objects documented in the header files > not explicitly declared for private use only, are in fact > public APIs. If the datatype layout is not publicly documented in the API reference, it certainly seems to be a non-public part of the API. That's why we have macros for member access, instead of letting people access members directly. The fact that my patch doesn't touch any part of the C sources except for the unicode implementation itself seems to support this view as well: people have been using the macros because they understand the actual layout shouldn't be relied upon. > You need access to those type objects in order to > be able to subclass them. As is needed for every other core object whose layout is nevertheless changed now and then... I think it should be expected that any code relying on low-level implementation specifics can break now and then. Changing low-level implementation specifics is often a prerequisite for improving things and it would be foolish to make a promise that we guarantee 100% compatibility at that level. (we could of course strengthen the rules for unicode if it was demonstrated that there are several popular instances of subclassing unicode in a C extension. However, I haven't seen any such examples) > Note that the Unicode implementation takes great care not to hide > this binary incompatibility - by remapping all APIs to include the > UCS2/UCS4 hint in the exported name. That's because there are UCS2 and UCS4 builds of the same interpreter version, and people are not necessarily aware of there being a difference. Such variability is not what we are talking about here.
msg88767 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-02 22:06
Here's an example implementation of a Unicode sub-type that allows referencing other Unicode objects: http://downloads.egenix.com/python/unicoderef-0.0.1.tar.gz As you can see, it's pretty straight-forward to write and I want to keep it that way.
msg88768 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-02 22:25
Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > >> You cannot simply recompile your code and have it working. > > Who is "you"? > People doing mundane things with PyUnicodeObjects certainly can, > assuming they use the macros for any member access. As soon as they want to do C-level sub-typing, a change from PyObject to PyVarObject will break their code in non-trivial ways. >> Please note that all type objects documented in the header files >> not explicitly declared for private use only, are in fact >> public APIs. > > If the datatype layout is not publicly documented in the API reference, > it certainly seems to be a non-public part of the API. That's why we > have macros for member access, instead of letting people access members > directly. Header files are the API reference. There are many instances where they include things that are only meant to be used internally by the interpreter, but these are carefully documented in the header files. >> You need access to those type objects in order to >> be able to subclass them. > > As is needed for every other core object whose layout is nevertheless > changed now and then... I think it should be expected that any code > relying on low-level implementation specifics can break now and then. > Changing low-level implementation specifics is often a prerequisite for > improving things and it would be foolish to make a promise that we > guarantee 100% compatibility at that level. It would be foolish to break such compatibility for the sake of some really minor performance win. Python's main focus is flexibility, not speed. Your proposed change makes it a lot harder to tap into the available flexibility, since sub-typing of PyVarObjects is non-trivial. > (we could of course strengthen the rules for unicode if it was > demonstrated that there are several popular instances of subclassing > unicode in a C extension. However, I haven't seen any such examples) Well, since you don't appear to count the many attempts to get slicing-by-reference into the base type as proof that such ideas do have use-cases, I've posted an example implementation which provides such a sub-type. It's easy to extend to all the use cases I've mentioned so far.
msg88774 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2009-06-02 23:09
Mark, I'm inclined to agree that this would be a destabilizing change. Guido, do you care to pronounce on whether it is okay to change the struct?
msg88775 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-06-02 23:25
If this is not yet in 3.1, it's clearly too late to add it (now that RC1 was already released). If was in already (hard to tell from the long bug), I think it should be kept in (removing it would destabilize more than keeping it).
msg88777 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2009-06-02 23:32
Correction: It is a proposal for 3.2 that changes the struct used in 3.0 and 3.1.
msg88779 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-06-02 23:51
That's unfortunate; it would clearly have been easier to change this in 3.1. That said, I'm not sure anyone should be subclassing PyUnicode. Maybe Marc-Andre can explain why he is doing this (or point to the message in this thread where he explained this before)? If it's a viable use case, it should be possible to have some symbol or a few symbols whose presence can be tested in the preprocessor by code that needs to subclass; we should design the patch with that in mind and Marc-Andre could help testing it. All this is assuming the speed-up is important enough to bother. Has anyone run a comparison benchmark using the Unladen Swallow benchmarks? I trust those much more than micro-benchmarks (including, I assume, stringbench.py). I do expect that reducing the number of allocations for short-to-medium-size strings from 2 to 1 would be a significant speed-up, but I can't guess how much.
msg88801 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-03 09:36
> That's unfortunate; it would clearly have been easier to change this in 3.1. > > That said, I'm not sure anyone should be subclassing PyUnicode. Maybe > Marc-Andre can explain why he is doing this (or point to the message in > this thread where he explained this before)? The Unicode type was designed to be a basic useful type with as much functionality as needed, but no more. Since it was clear that we would get sub-typing at the time, the aim was to move special use cases into such sub-types. One such example was the referencing logic used in Fredrik's implementation and often requested by the Python community. I removed that logic from the implementation due to the issues this would have caused by accidentally keeping alive large referenced Unicode objects due to references pointing to them. However, it's easy to add it back by sub-typing the Unicode type: http://downloads.egenix.com/python/unicoderef-0.0.1.tar.gz Other special use cases: * sub-types which hold a reference to the original bytes string, e.g. to implement a round-trip safe storage even of broken text data or text that uses multiple encodings * sub-types that get their data from a memory mapped file or a shared memory area without copying * sub-types that implement indexing based on glyphs (ie. the human recognizable notion of a character) rather than code points * sub-types that implement special extra methods or provide case insensitive operations * sub-types that implement special text data forms, such as URLs, OS paths, UID strings, etc. and custom operations on those Sub-typing is also encouraged by the documentation: http://docs.python.org/library/userdict.html#module-UserString ... and after all: one of the main points in making all built-in types sub-typeable was to actually do it :-) > If it's a viable use case, > it should be possible to have some symbol or a few symbols whose > presence can be tested in the preprocessor by code that needs to > subclass; we should design the patch with that in mind and Marc-Andre > could help testing it. > > All this is assuming the speed-up is important enough to bother. Has > anyone run a comparison benchmark using the Unladen Swallow benchmarks? > > I trust those much more than micro-benchmarks (including, I assume, > stringbench.py). I do expect that reducing the number of allocations > for short-to-medium-size strings from 2 to 1 would be a significant > speed-up, but I can't guess how much. While the Unladen Swallow aims at providing high-level benchmarks, it's current state doesn't really implement that promise (yet). If you look at the list of benchmarks they use, most appear to be dealing with pickling. That doesn't strike me as particularly useful for testing real life Python usage. If a high level benchmark is indeed what's wanted, then they should setup pre-configured Django and Zope instances and run those through a series of real-life usage scenarios to cover the web application use space. For scientific use cases, it would be good to have similar setups using BioPython, NumPy and matplotlib. And so on. Much like the high level benchmarks you have in the Windows world. Depending on the use case, the results of benchmarks for this particular change are difficult to predict or interpret. Here's a summary message with my reasoning for rejecting the patch: http://bugs.python.org/issue1943#msg88307 Instead of changing PyUnicodeObject from a PyObject to a PyVarObject, making sub-typing a lot harder, I'd much rather apply a single change for 3.1: raising the KEEPALIVE_SIZE_LIMIT to 32 as explained and motivated here: http://bugs.python.org/issue1943#msg64215 That's a simple non-disruptive change which makes a lot of sense due to the advances in CPU designs in the last 9 years. I determined the original value of 9 using benchmarks and similar statistics in 1999/2000. It's probably also a good time to remove the warning, now that the implementation has proven itself for so many years... /* Limit for the Unicode object free list stay alive optimization. The implementation will keep allocated Unicode memory intact for all objects on the free list having a size less than this limit. This reduces malloc() overhead for small Unicode objects. At worst this will result in PyUnicode_MAXFREELIST * (sizeof(PyUnicodeObject) + KEEPALIVE_SIZE_LIMIT + malloc()-overhead) bytes of unused garbage. Setting the limit to 0 effectively turns the feature off. Note: This is an experimental feature ! If you get core dumps when using Unicode objects, turn this feature off. */ #define KEEPALIVE_SIZE_LIMIT 9
msg88802 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-03 10:15
> Instead of changing PyUnicodeObject from a PyObject to a PyVarObject, > making sub-typing a lot harder, I'd much rather apply a single change > for 3.1: raising the KEEPALIVE_SIZE_LIMIT to 32 as explained and > motivated here: You make it sound like an alternative to this patch, while it is quite independent. You could open a separate entry about it, together with benchmarks results etc.
msg88804 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-03 11:16
Antoine Pitrou wrote: >> Instead of changing PyUnicodeObject from a PyObject to a PyVarObject, >> making sub-typing a lot harder, I'd much rather apply a single change >> for 3.1: raising the KEEPALIVE_SIZE_LIMIT to 32 as explained and >> motivated here: > > You make it sound like an alternative to this patch, while it is quite > independent. You could open a separate entry about it, together with > benchmarks results etc. You're right, I could open a separate ticket for it, but it does fit the title of the ticket and was also discussed at length on this ticket, since part of your original patch included changes to the free list management in the Unicode implementation.
msg88806 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-06-03 11:32
Let's apply simple and noncontroversial patches first, and then see if the bigger changes are still worth it. Please open a new ticket.
msg88816 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-06-03 17:06
Hm, so the extra pointer is a feature. I guess a compromise would be to keep the extra indirection but make it point into the same object in the base class. Thinking about how memory caching in modern CPUs work, this would probably be quite fast but it would still cost 8 bytes on most future (i.e., 64-bit) architectures. Still, I expect that a vanishingly small number of users will actually use that feature. Is it worth to make everyone pay for that flexibility, for what must be the first- or second-most commonly used type in Python 3.x (the other being int), which is still significantly slower than the common (8-bit) string type in 2.x?
msg88819 - (view)	Author: Collin Winter (collinwinter) *	Date: 2009-06-03 17:35
On Wed, Jun 3, 2009 at 2:36 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote: > Marc-Andre Lemburg <mal@egenix.com> added the comment: >> All this is assuming the speed-up is important enough to bother. Has >> anyone run a comparison benchmark using the Unladen Swallow benchmarks? >> >> I trust those much more than micro-benchmarks (including, I assume, >> stringbench.py). I do expect that reducing the number of allocations >> for short-to-medium-size strings from 2 to 1 would be a significant >> speed-up, but I can't guess how much. > > While the Unladen Swallow aims at providing high-level benchmarks, > it's current state doesn't really implement that promise (yet). > > If you look at the list of benchmarks they use, most appear to be > dealing with pickling. That doesn't strike me as particularly useful > for testing real life Python usage. I would take issue with your characterization of those benchmarks. There are several benchmarks for cPickle, true, both macro and micro benchmarks, but I would not describe their number as "most of [our] benchmarks". For example, "slowpickle" and "slowunpickle" both use the pure-Python pickle.py, and are testing how close we can get that implementation to the tuned cPickle version. Regardless, I don't know that any of our benchmarks really stress unicode performance. We so far haven't cared about improving the performance of unicode objects. 2to3 uses unicode internally, so that might be a good benchmark to run. > If a high level benchmark is indeed what's wanted, then they should > setup pre-configured Django and Zope instances and run those through > a series of real-life usage scenarios to cover the web application > use space. We have a benchmark for Django and Spitfire templates, both of which are heavily used in the web application use space. We focused on template languages because in talking to Google's web app teams, they found their primary CPU bottlenecks to be templating systems, not ORMs or other components. > For scientific use cases, it would be good to have similar > setups using BioPython, NumPy and matplotlib. And so on. Much like > the high level benchmarks you have in the Windows world. We have NumPy in our correctness test suite, but no benchmarks based on it. Looking at all the packages you just named, they make heavy use of C/C++ extensions (with BioPython and matplotpib both depending on NumPy) or large C libraries (matplotlib can depend on Cairo, I see). We've been focusing on pure-Python performance, so I'm skeptical that benchmarks with such large C components would be a useful guide for our work. I'm happy to talk about this further outside of this thread, though. Collin
msg88824 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-03 20:41
> Still, I expect that a vanishingly small number of users will actually > use that feature. Apart from the example Marc-André just posted (and which is a 0.0.1 proof of concept he apparently just wrote), the number of users is, AFAICT, zero. Unless there's some closed source extension which happens to extend unicode as a C subtype. Now, as for easing the subclassing of unicode in C, there are probably several possibilities which range from devising a clever set of macros to abusing the ob_size field for a tagged pointer. People who really care should do a concrete proposal (and I don't know who these people are, apart from Marc-André).
msg88830 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-06-03 21:31
On Wed, Jun 3, 2009 at 1:41 PM, Antoine Pitrou <report@bugs.python.org> wrote: > Apart from the example Marc-André just posted (and which is a 0.0.1 > proof of concept he apparently just wrote), the number of users is, > AFAICT, zero. IIUC Marc-Andre extracted that from a larger code base (MX) which he owns and has been maintaining for a decade or so. > Unless there's some closed source extension which happens to extend > unicode as a C subtype. I believe part of MX is closed source. > Now, as for easing the subclassing of unicode in C, there are probably > several possibilities which range from devising a clever set of macros > to abusing the ob_size field for a tagged pointer. People who really > care should do a concrete proposal (and I don't know who these people > are, apart from Marc-André). Not really if the core code uses a macro that depends on the layout of the object (i.e. the data immediately following the header, like old 8-bit strings), unless you change the core (or the macro) to only use this if the type matches exactly, and for subtypes use a more expensive API. But that would slow down unnecessarily for subclasses written in Python (of which there are plenty). But I would like to point out that few people if any have ever complained about the contiguous allocation for 8-bit strings in Python [0-2].x. And we certainly wouldn't have given in. Now that Unicode is no longer some fancy-schmancy advanced concept but the basis for all Python string processing I think we should apply the same policy.
msg88879 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-04 10:43
Here's a new version of the unicode reference type, extended to run in both Python 2.6 and 3.1: http://downloads.egenix.com/python/unicoderef-0.0.2.tar.gz I've also included a benchmark implemented in C which measures Unicode/Bytes allocation performance at high resolution and without any VM overhead. These are the results on a 64-bit system using a regular UCS2 build of Python 2.6.1 and 3.1 on a single core AMD system: * Unicode allocation is always much slower than Bytes This depends a lot on how much data you have to handle. If the data fits into the pymalloc managed sizes, there's little difference. * CPU cache lines / memory boundaries obviously have some impact on the results If things are properly aligned, performance is better than in the unaligned case. Unfortunately, there's nothing much the CPython implementation can do to benefit from this, since the hardware layouts are too diverse. This explains why you see some unexpected figures in the results, e.g. Python 3.1, 4096 words, 32chars - Unicode being faster than Bytes. * Something must have improved in pymalloc The 4096 word tests perform better on 3.1 than on 2.6. Since such improvement have a great impact, more emphasis should be placed on these, e.g. by having pymalloc provide more space to fixed size objects such PyUnicodeObject, reducing the need to create new arenas. Dictionaries and other PyObjects such as instances objects would also benefit from such improvements. Note that the benchmark scales the bytes word sizes to match the internal allocation size of the Unicode objects, ie. 2chars maps to a length 4 bytes string on a UCS2 build. Testing Unicode/Bytes Allocation Speed with Python 3.1rc1+ ------------------------------------------------------------------------ === 1024 words ============================================================ Variant 2chars, 1024 words, Unicode: 0.005503 seconds = 0.054 us per object (best of 50 rounds) Variant 2chars, 1024 words, Bytes : 0.006838 seconds = 0.067 us per object (best of 50 rounds) Variant 9chars, 1024 words, Unicode: 0.008956 seconds = 0.087 us per object (best of 50 rounds) Variant 9chars, 1024 words, Bytes : 0.007380 seconds = 0.072 us per object (best of 50 rounds) Variant 16chars, 1024 words, Unicode: 0.009257 seconds = 0.090 us per object (best of 50 rounds) Variant 16chars, 1024 words, Bytes : 0.007291 seconds = 0.071 us per object (best of 50 rounds) Variant 32chars, 1024 words, Unicode: 0.009589 seconds = 0.094 us per object (best of 50 rounds) Variant 32chars, 1024 words, Bytes : 0.007803 seconds = 0.076 us per object (best of 50 rounds) Variant pep100, 1024 words, Unicode: 0.010569 seconds = 0.103 us per object (best of 50 rounds) Variant pep100, 1024 words, Bytes : 0.008198 seconds = 0.080 us per object (best of 50 rounds) === 2048 words ============================================================ Variant 2chars, 2048 words, Unicode: 0.011624 seconds = 0.057 us per object (best of 50 rounds) Variant 2chars, 2048 words, Bytes : 0.013941 seconds = 0.068 us per object (best of 50 rounds) Variant 9chars, 2048 words, Unicode: 0.018608 seconds = 0.091 us per object (best of 50 rounds) Variant 9chars, 2048 words, Bytes : 0.014773 seconds = 0.072 us per object (best of 50 rounds) Variant 16chars, 2048 words, Unicode: 0.018556 seconds = 0.091 us per object (best of 50 rounds) Variant 16chars, 2048 words, Bytes : 0.014550 seconds = 0.071 us per object (best of 50 rounds) Variant 32chars, 2048 words, Unicode: 0.018972 seconds = 0.093 us per object (best of 50 rounds) Variant 32chars, 2048 words, Bytes : 0.016377 seconds = 0.080 us per object (best of 50 rounds) Variant pep100, 2048 words, Unicode: 0.021005 seconds = 0.103 us per object (best of 50 rounds) Variant pep100, 2048 words, Bytes : 0.016636 seconds = 0.081 us per object (best of 50 rounds) === 4096 words ============================================================ Variant 2chars, 4096 words, Unicode: 0.022950 seconds = 0.056 us per object (best of 50 rounds) Variant 2chars, 4096 words, Bytes : 0.027813 seconds = 0.068 us per object (best of 50 rounds) Variant 9chars, 4096 words, Unicode: 0.037229 seconds = 0.091 us per object (best of 50 rounds) Variant 9chars, 4096 words, Bytes : 0.031123 seconds = 0.076 us per object (best of 50 rounds) Variant 16chars, 4096 words, Unicode: 0.037118 seconds = 0.091 us per object (best of 50 rounds) Variant 16chars, 4096 words, Bytes : 0.036433 seconds = 0.089 us per object (best of 50 rounds) Variant 32chars, 4096 words, Unicode: 0.040970 seconds = 0.100 us per object (best of 50 rounds) Variant 32chars, 4096 words, Bytes : 0.051422 seconds = 0.126 us per object (best of 50 rounds) Variant pep100, 4096 words, Unicode: 0.049630 seconds = 0.121 us per object (best of 50 rounds) Variant pep100, 4096 words, Bytes : 0.034551 seconds = 0.084 us per object (best of 50 rounds) Testing Unicode/Bytes Allocation Speed with Python 2.6.1 ------------------------------------------------------------------------ === 1024 words ============================================================ Variant 2chars, 1024 words, Unicode: 0.005810 seconds = 0.057 us per object (best of 50 rounds) Variant 2chars, 1024 words, Bytes : 0.006918 seconds = 0.068 us per object (best of 50 rounds) Variant 9chars, 1024 words, Unicode: 0.008539 seconds = 0.083 us per object (best of 50 rounds) Variant 9chars, 1024 words, Bytes : 0.007618 seconds = 0.074 us per object (best of 50 rounds) Variant 16chars, 1024 words, Unicode: 0.008478 seconds = 0.083 us per object (best of 50 rounds) Variant 16chars, 1024 words, Bytes : 0.008097 seconds = 0.079 us per object (best of 50 rounds) Variant 32chars, 1024 words, Unicode: 0.008969 seconds = 0.088 us per object (best of 50 rounds) Variant 32chars, 1024 words, Bytes : 0.008649 seconds = 0.084 us per object (best of 50 rounds) Variant pep100, 1024 words, Unicode: 0.009779 seconds = 0.096 us per object (best of 50 rounds) Variant pep100, 1024 words, Bytes : 0.008318 seconds = 0.081 us per object (best of 50 rounds) === 2048 words ============================================================ Variant 2chars, 2048 words, Unicode: 0.011745 seconds = 0.057 us per object (best of 50 rounds) Variant 2chars, 2048 words, Bytes : 0.013978 seconds = 0.068 us per object (best of 50 rounds) Variant 9chars, 2048 words, Unicode: 0.017418 seconds = 0.085 us per object (best of 50 rounds) Variant 9chars, 2048 words, Bytes : 0.014443 seconds = 0.071 us per object (best of 50 rounds) Variant 16chars, 2048 words, Unicode: 0.018138 seconds = 0.089 us per object (best of 50 rounds) Variant 16chars, 2048 words, Bytes : 0.016200 seconds = 0.079 us per object (best of 50 rounds) Variant 32chars, 2048 words, Unicode: 0.021348 seconds = 0.104 us per object (best of 50 rounds) Variant 32chars, 2048 words, Bytes : 0.018144 seconds = 0.089 us per object (best of 50 rounds) Variant pep100, 2048 words, Unicode: 0.019593 seconds = 0.096 us per object (best of 50 rounds) Variant pep100, 2048 words, Bytes : 0.016692 seconds = 0.082 us per object (best of 50 rounds) === 4096 words ============================================================ Variant 2chars, 4096 words, Unicode: 0.024347 seconds = 0.059 us per object (best of 50 rounds) Variant 2chars, 4096 words, Bytes : 0.028411 seconds = 0.069 us per object (best of 50 rounds) Variant 9chars, 4096 words, Unicode: 0.041480 seconds = 0.101 us per object (best of 50 rounds) Variant 9chars, 4096 words, Bytes : 0.031532 seconds = 0.077 us per object (best of 50 rounds) Variant 16chars, 4096 words, Unicode: 0.047986 seconds = 0.117 us per object (best of 50 rounds) Variant 16chars, 4096 words, Bytes : 0.038006 seconds = 0.093 us per object (best of 50 rounds) Variant 32chars, 4096 words, Unicode: 0.062115 seconds = 0.152 us per object (best of 50 rounds) Variant 32chars, 4096 words, Bytes : 0.049687 seconds = 0.121 us per object (best of 50 rounds) Variant pep100, 4096 words, Unicode: 0.044331 seconds = 0.108 us per object (best of 50 rounds) Variant pep100, 4096 words, Bytes : 0.035939 seconds = 0.088 us per object (best of 50 rounds)
msg88880 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-04 10:54
Guido van Rossum wrote: > Guido van Rossum <guido@python.org> added the comment: > > On Wed, Jun 3, 2009 at 1:41 PM, Antoine Pitrou <report@bugs.python.org> wrote: >> Apart from the example Marc-André just posted (and which is a 0.0.1 >> proof of concept he apparently just wrote), the number of users is, >> AFAICT, zero. > > IIUC Marc-Andre extracted that from a larger code base (MX) which he > owns and has been maintaining for a decade or so. Only part of it. I wrote the sub-type Unicode Reference sub-type implementation just a few days ago, in order to demonstrate how easy it is provided you have a PyObject (rather than a PyVarObject) to build on. We should really publicize how easy it is to write such type extensions. I'm sure that a lot of things which often generated heated discussions (such as the slicing patches for Unicode) could easily be solved by just adding a few such sub-types to the core. >> Unless there's some closed source extension which happens to extend >> unicode as a C subtype. > > I believe part of MX is closed source. True. A large part of the code base is not available to the wider public. >> Now, as for easing the subclassing of unicode in C, there are probably >> several possibilities which range from devising a clever set of macros >> to abusing the ob_size field for a tagged pointer. People who really >> care should do a concrete proposal (and I don't know who these people >> are, apart from Marc-André). > > Not really if the core code uses a macro that depends on the layout of > the object (i.e. the data immediately following the header, like old > 8-bit strings), unless you change the core (or the macro) to only use > this if the type matches exactly, and for subtypes use a more > expensive API. But that would slow down unnecessarily for subclasses > written in Python (of which there are plenty). > > But I would like to point out that few people if any have ever > complained about the contiguous allocation for 8-bit strings in Python > [0-2].x. And we certainly wouldn't have given in. Now that Unicode is > no longer some fancy-schmancy advanced concept but the basis for all > Python string processing I think we should apply the same policy. I've spent enough time with this discussion. If you think it's better to make sub-typing harder and thereby closing the door for improvements which could really speed up e.g. template processing (by not requiring copying the same data over and over again), go for it. I still think that it's better to keep things the way they are and benefit from the fact that PyUnicodeObjects have a fixed size with the variable part being dealt with separately. Since pymalloc is being used to manage such objects, there's a lot of room for improvements, since the allocation scheme is under out control. E.g. we could have pymalloc allocate larger pools for PyUnicodeObjects. Doing the same for variable sized objects is a lot harder, consumes more memory and likely less efficient.
msg88881 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-04 11:26
> Since pymalloc is being used to manage such objects, there's > a lot of room for improvements, since the allocation scheme > is under out control. E.g. we could have pymalloc allocate > larger pools for PyUnicodeObjects. I'm not sure what "larger pools for PyUnicodeObjects" means. pymalloc doesn't have separate pools per object type, only per object size. OTOH, we could grow the size limit under which pymalloc is used, especially on 64-bit systems.
msg88883 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-04 12:13
Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > >> Since pymalloc is being used to manage such objects, there's >> a lot of room for improvements, since the allocation scheme >> is under out control. E.g. we could have pymalloc allocate >> larger pools for PyUnicodeObjects. > > I'm not sure what "larger pools for PyUnicodeObjects" means. pymalloc > doesn't have separate pools per object type, only per object size. I meant larger pools for objects of sizeof(PyUnicodeObject) bytes. The same could be done for other often used PyObjects (and only for those). pymalloc is a lot faster than the OS malloc() and was designed for Python object memory management, ie. for small blocks... """ /* A fast, special-purpose memory allocator for small blocks, to be used on top of a general-purpose malloc -- heavily based on previous art. / / Vladimir Marangozov -- August 2000 */ """ > OTOH, we could grow the size limit under which pymalloc is used, > especially on 64-bit systems. The limit is 256 bytes. Increasing it doesn't make much sense, since the pools are 4k each and managed in arenas of 256kb. Anything larger than 256 bytes goes straight to the OS malloc().
msg88885 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-04 12:18
> Anything larger than 256 bytes goes straight to the OS malloc(). Under a 64-bit system, a plain dict is more than 256 bytes.
msg88928 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-05 09:50
Raymond suggested the patch be committed in 3.1, so as to minimize disruption between 3.1 and 3.2. Benjamin, what do you think?
msg88936 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-05 11:06
Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > > Raymond suggested the patch be committed in 3.1, so as to minimize > disruption between 3.1 and 3.2. Benjamin, what do you think? Has Guido pronounced on this already ?
msg88946 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-06-05 14:08
On Fri, Jun 5, 2009 at 4:06 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote: > > Marc-Andre Lemburg <mal@egenix.com> added the comment: > > Antoine Pitrou wrote: >> Antoine Pitrou <pitrou@free.fr> added the comment: >> >> Raymond suggested the patch be committed in 3.1, so as to minimize >> disruption between 3.1 and 3.2. Benjamin, what do you think? > > Has Guido pronounced on this already ? I don't want it added to 3.1 unless we start the beta cycle afresh. It's too subtle for submitting after rc1. Talk to Benjamin if you disagree. I think it's fine to wait for 3.2. Maybe add something to the docs about not subclassing unicode in C.
msg88951 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-05 15:55
Guido van Rossum wrote: > I think it's fine to wait for 3.2. Maybe add something to the docs > about not subclassing unicode in C. We should have a wider discussion about this on python-dev. I'll publish the unicoderef extension and then we can see whether users want this or not. Antoine's patch makes such extensions impossible (provided you don't want to copy over the complete unicodeobject.c implementation in order to change the memory allocation scheme). Note that in Python 2.x you don't have such issues because there, most tools for text processing will happily work on any sort of buffer, so you don't need a string sub-type in order to implement e.g. references into another string (the buffer type will allow you to do this easily).
msg88953 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-05 16:17
> Note that in Python 2.x you don't have such issues because > there, most tools for text processing will happily work on > any sort of buffer, so you don't need a string sub-type > in order to implement e.g. references into another string > (the buffer type will allow you to do this easily). The new buffer API has a provision for type flags, although none of them had a chance to be implemented by the original author before he ran away... There could be a type flag for unicode characters and then its support could be implemented in the memoryview object.
msg88983 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2009-06-06 00:16
In the interest of possibly improving the imminent 3.1 release, I opened #6216 Raise Unicode KEEPALIVE_SIZE_LIMIT from 9 to 32? I wonder if it is possible to make it generically easier to subclass PyVarObjects (but my C knowledge to getting too faded to have any ideas).
msg89069 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-06-08 07:58
Terry J. Reedy wrote: > In the interest of possibly improving the imminent 3.1 release, > I opened #6216 > Raise Unicode KEEPALIVE_SIZE_LIMIT from 9 to 32? Thanks for opening that ticket. > I wonder if it is possible to make it generically easier to subclass > PyVarObjects (but my C knowledge to getting too faded to have any ideas). Even if we were to add some pointer arithmetic tricks to at least hide away the complexities, you'd no longer be able to change the way the data memory allocation works. The reason is simple: subclassing is about reusing existing method implementations and only adding/changing a few of them. If you want to change the way the allocation works, you'd have to replace all of them. Furthermore, using your subclasses objects with the existing APIs would no longer be safe, since these would still assume the original base class memory allocation scheme. In summary: Implementations like the unicoderef type I posted and most of the other use cases I mentioned are no longer possible, ie. you will always have to copy the data in order to work with the existing Unicode APIs on it. The current implementation has no problem with working on referenced data, since support for this was built in right from the start. That's what I meant with closing the door on future enhancements that would make a huge difference if used right, for a mere 10% performance increase.
msg97545 - (view)	Author: Adam Olsen (Rhamphoryncus)	Date: 2010-01-10 19:52
Points against the subclassing argument: * We have a null-termination invariant. For byte strings this was part of the public API, and I'm not sure that's changed for unicode strings; aren't you arguing that we should maximize how much of our implementation is a public API? This prevents lazy slicing. * UTF-16 and UTF-32 are rarely used encodings, especially for longer strings (ie files). For shorter strings (APIs) the unicode object overhead is more significant and we'd need a way to slave to the buffer's lifetime to that of the unicode object (hard to do). For longer strings UTF-8 would be much more useful, but that's been shot down before. * subclassing unicode so you can change the meaning of the fields (ie allocating your own buffer) is a gross hack. It relies far too much on fine details of the implementation and is fragile (what if you miss the dummy byte needed by fastsearch?) Most of the possible options could be, if they function correctly, applied directly to the basetype as a patch, so it's moot. * If you dislike PyVarObject in general (I think the API is ugly too) you should argue for a general policy discouraging future use of it, not just get in the way of the one place where it's most appropriate Terry: PyVarObjects would be much easier to subclass if the type object stored an offset to the beginning of the variable section, so it could be automatically recalculated for subclasses based on the size of the struct. This'd mean the PyBytesObject struct would no longer end with a char ob_sval[1]. The down side is a tiny bit more math when accessing the variable section (as the offset is no longer constant).
msg97553 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-01-10 21:59
Adam Olsen wrote: > > Adam Olsen <rhamph@gmail.com> added the comment: > > Points against the subclassing argument: > > * We have a null-termination invariant. For byte strings this was part of the public API, and I'm not sure that's changed for unicode strings; aren't you arguing that we should maximize how much of our implementation is a public API? This prevents lazy slicing. Base type Unicode buffers end with a null-Py_UNICODE termination, but this is not used anywhere, AFAIK. We could probably remove that overallocation at some point. There's no such thing as a null-termination invariant for Unicode. > * subclassing unicode so you can change the meaning of the fields (ie allocating your own buffer) is a gross hack. It relies far too much on fine details of the implementation and is fragile (what if you miss the dummy byte needed by fastsearch?) Most of the possible options could be, if they function correctly, applied directly to the basetype as a patch, so it's moot. Actually, Unicode objects were designed to be subclassable right from the start and adjusting the buffer to point e.g. into some other already allocated string was too. I removed this feature from Fredrik's type implementation with the intent to readd it later on as subclass. See the prototype implementation of such a subclass uniref that I've written to show how easy it is to add a subclass which can be used to slice large Unicode objects without having to reallocate new buffers all the time. BTW, I'm not aware of any changes to the PyUnicodeObject by some fastsearch implementation. Could you point me to this ?
msg97554 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-01-10 22:27
> Base type Unicode buffers end with a null-Py_UNICODE termination, > but this is not used anywhere, AFAIK On Windows, code like CreateDirectoryW(PyUnicode_AS_UNICODE(po), NULL) is very common, at least in posixmodule.c.
msg97555 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-10 22:36
I find that the null termination for 8-bit strings makes low-level parsing operations (e.g., parsing a numeric string) safer and easier: for example, it makes skipping a series of digits with something like: while (isdigit(*s)) ++s; safe. I'd imagine that null terminated PyUNICODE arrays would have similar benefits.
msg97559 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-01-10 22:52
> I find that the null termination for 8-bit strings makes low-level > parsing operations (e.g., parsing a numeric string) safer and easier: Not to mention faster. The new IO library makes use of it (for newline detection), on both bytestrings and unicode strings.
msg97581 - (view)	Author: Adam Olsen (Rhamphoryncus)	Date: 2010-01-11 08:08
On Sun, Jan 10, 2010 at 14:59, Marc-Andre Lemburg <report@bugs.python.org> wrote: > BTW, I'm not aware of any changes to the PyUnicodeObject by some > fastsearch implementation. Could you point me to this ? /* We allocate one more byte to make sure the string is Ux0000 terminated. The overallocation is also used by fastsearch, which assumes that it's safe to look at str[length] (without making any assumptions about what it contains). */
msg98656 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-02-01 10:39
Antoine Pitrou wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > >> I find that the null termination for 8-bit strings makes low-level >> parsing operations (e.g., parsing a numeric string) safer and easier: > > Not to mention faster. The new IO library makes use of it (for newline > detection), on both bytestrings and unicode strings. I'd consider that a bug. Esp. the IO lib should be 8-bit clean in the sense that it doesn't add any special meaning to NUL characters or code points. Besides, using a for-loop with a counter is both safer and faster than checking each an every character for NUL. Just think of what can happen if you have buggy code that overwrites the NUL byte in some corner case situation and then use the assumption of having the NUL byte as terminator - a classical buffer overrun. If you're lucky, you get a segfault. If not, you end up with data corruption or manipulation of data which could lead to unwanted code execution. The Python Unicode API deliberately tries to always use the combination of a Py_UNICODE* pointer and a length integer to avoid such issues.
msg98657 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-02-01 11:39
> I'd consider that a bug. Esp. the IO lib should be 8-bit clean > in the sense that it doesn't add any special meaning to NUL > characters or code points. It doesn't add any special meaning to them. It just relies on a NUL being present after the end of the string. It doesn't care about other NULs. > Besides, using a for-loop with a counter is both safer and faster > than checking each an every character for NUL. It's slower, since it has one more condition to check. Newline detection as it is written has a fast path in the form of: while (*c++ >= 0x20); > Just think of what can happen if you have buggy code that overwrites > the NUL byte in some corner case situation and then use the assumption > of having the NUL byte as terminator - a classical buffer overrun. Well, buggy code leads to bugs :)
msg98662 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-02-01 13:43
Again, on Windows there are many many usages of PyUnicode_AS_UNICODE() that pass the result to various Windows API functions, expecting a nul-terminated array of WCHARs. Please don't change this!
msg98668 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-02-01 16:12
Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment: > >> Base type Unicode buffers end with a null-Py_UNICODE termination, >> but this is not used anywhere, AFAIK > On Windows, code like > CreateDirectoryW(PyUnicode_AS_UNICODE(po), NULL) > is very common, at least in posixmodule.c. The above usage is clearly wrong. PyUnicode_AS_UNICODE() should only be used to access Py_UNICODE data directly when working on Python Unicode objects. The macro is not meant to be used directly in external APIs. For such uses, the Unicode conversion APIs need to be used, e.g. the PyUnicode_AsWideChar() API. These will then also apply any 0-termination as necessary. Note that Python is free to change the meaning of Py_UNICODE (e.g. to use UCS4 on all platforms) or Unicode implementation details (such as e.g. the 0-termination) and this would then break any use such as the one you quoted.
msg98669 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-02-01 17:27
Then there are many places to change, in core python as well as in third-party code. And PyArg_ParseTuple("u") would not work any more.
msg98672 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-02-01 17:54
> Note that Python is free to change the meaning of Py_UNICODE > (e.g. to use UCS4 on all platforms) Python-UCS4 has never worked on Windows. Most developers on Windows, taking example on core python source code, implicitly assumed that HAVE_USABLE_WCHAR_T is true, and use the Py_Unicode* api the same way they use the PyString* functions. PyString_AsString is documented to return a nul-terminated array. If PyUnicode_AsUnicode starts to behave differently, people will have more trouble to port their modules to py3k. This is not an implementation detail.
msg98676 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-02-01 19:21
modules to py3k. > This is not an implementation detail. It is, otherwise I would have documented it. The fact that some developers are not using those APIs correctly doesn't change that. Note that PyUnicode_AsUnicode() only returns a pointer to the Py_UNICODE buffer. It makes no guarantees on the 0-termination. Developers need to use PyUnicode_GetSize() to access the size of the Unicode string. But no worries: We're not going to change it. It's too late after 10 years in the wild. Still, developers will have to be aware of the fact that 0-termination is not a guaranteed Unicode feature and should stop making that assumption and it will not necessarily hold or be guaranteed for Unicode subclasses.
msg98677 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-02-01 19:28
Le lundi 01 février 2010 à 19:21 +0000, Marc-Andre Lemburg a écrit : > > This is not an implementation detail. > > It is, otherwise I would have documented it. Ok, so the current allocation scheme of unicode objects is an implementation detail as well, right?
msg98686 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-02-01 20:30
>It is, otherwise I would have documented it. The fact that some developers are not using those APIs correctly doesn't change that. If, as Antoine claimed, 'it' is a documented feature of str strings, and Py3 says str = Unicode, it is a plausible inference.
msg110599 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-07-17 19:28
@Antoine: do you wish to try and take this forward?
msg116937 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-09-20 13:58
No reply to msg110599, I'll close this in a couple of weeks unless anyone objects.
msg116950 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-09-20 15:05
2010/9/20 Mark Lawrence <report@bugs.python.org>: > > Mark Lawrence <breamoreboy@yahoo.co.uk> added the comment: > > No reply to msg110599, I'll close this in a couple of weeks unless anyone objects. Please don't. This is still a valid issue.
msg118087 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-10-06 23:21
Updated patch against current py3k.
msg134406 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2011-04-25 19:34
I just found that the extension zope.i18nmessageid: http://pypi.python.org/pypi/zope.i18nmessageid subclasses unicode at the C level: http://svn.zope.org/zope.i18nmessageid/trunk/src/zope/i18nmessageid/_zope_i18nmessageid_message.c?rev=120914&view=markup Notably, the Message structure is defined this way: typedef struct { PyUnicodeObject base; PyObject domain; PyObject default_; PyObject *mapping; } Message; How would such an extension type behave after the patch? Is there a workaround we can propose?
msg144625 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-09-29 20:03
The PEP 393 is based on the idea proposed in this issue (use only one memory block, not two), but also enhanced it to reduce more the memory using other technics: - use a different character type depending on the maximum character, - use a shorter structure for ASCII only strings The PEP 393 has been accepted and merged into Python 3.3. So I consider this issue as done.

History
Date	User	Action	Args
2022-04-11 14:56:30	admin	set	github: 46235
2011-09-29 20:03:47	vstinner	set	status: open -> closed nosy: + vstinner messages: + msg144625 resolution: fixed
2011-04-25 19:34:56	amaury.forgeotdarc	set	messages: + msg134406
2010-10-06 23:21:53	pitrou	set	files: + unialloc6.patch messages: + msg118087
2010-09-20 15:05:32	benjamin.peterson	set	status: pending -> open messages: + msg116950
2010-09-20 13:58:08	BreamoreBoy	set	status: open -> pending messages: + msg116937
2010-08-21 18:25:42	gvanrossum	set	assignee: gvanrossum ->
2010-07-17 19:28:53	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg110599
2010-02-01 20:30:07	terry.reedy	set	messages: + msg98686
2010-02-01 19:28:56	pitrou	set	messages: + msg98677
2010-02-01 19:21:41	lemburg	set	messages: + msg98676
2010-02-01 17:54:59	amaury.forgeotdarc	set	messages: + msg98672
2010-02-01 17:27:21	amaury.forgeotdarc	set	messages: + msg98669
2010-02-01 16:12:12	lemburg	set	messages: + msg98668
2010-02-01 13:43:13	amaury.forgeotdarc	set	messages: + msg98662
2010-02-01 11:39:53	pitrou	set	messages: + msg98657
2010-02-01 10:39:17	lemburg	set	messages: + msg98656
2010-02-01 02:00:14	ferringb	set	nosy: + ferringb
2010-01-11 08:08:29	Rhamphoryncus	set	messages: + msg97581
2010-01-10 22:52:25	pitrou	set	messages: + msg97559
2010-01-10 22:36:26	mark.dickinson	set	nosy: + mark.dickinson messages: + msg97555
2010-01-10 22:27:34	amaury.forgeotdarc	set	messages: + msg97554
2010-01-10 21:59:26	lemburg	set	messages: + msg97553
2010-01-10 19:52:13	Rhamphoryncus	set	nosy: + Rhamphoryncus messages: + msg97545
2009-06-08 07:58:45	lemburg	set	messages: + msg89069
2009-06-06 00:16:16	terry.reedy	set	nosy: + terry.reedy messages: + msg88983
2009-06-05 16:17:26	pitrou	set	messages: + msg88953
2009-06-05 15:55:18	lemburg	set	messages: + msg88951
2009-06-05 14:08:51	gvanrossum	set	messages: + msg88946
2009-06-05 11:06:15	lemburg	set	messages: + msg88936
2009-06-05 09:50:22	pitrou	set	nosy: + benjamin.peterson messages: + msg88928
2009-06-04 12:18:12	pitrou	set	messages: + msg88885
2009-06-04 12:13:33	lemburg	set	messages: + msg88883
2009-06-04 11:26:14	pitrou	set	messages: + msg88881
2009-06-04 10:54:42	lemburg	set	messages: + msg88880
2009-06-04 10:43:58	lemburg	set	messages: + msg88879
2009-06-03 21:31:25	gvanrossum	set	messages: + msg88830
2009-06-03 20:41:35	pitrou	set	messages: + msg88824
2009-06-03 17:35:14	collinwinter	set	messages: + msg88819
2009-06-03 17:06:43	gvanrossum	set	messages: + msg88816
2009-06-03 12:00:10	vstinner	set	nosy: - vstinner
2009-06-03 11:32:31	amaury.forgeotdarc	set	messages: + msg88806
2009-06-03 11:16:57	lemburg	set	messages: + msg88804
2009-06-03 10:15:50	pitrou	set	messages: + msg88802
2009-06-03 09:36:19	lemburg	set	messages: + msg88801
2009-06-02 23:51:46	gvanrossum	set	messages: + msg88779
2009-06-02 23:33:08	rhettinger	set	messages: - msg88776
2009-06-02 23:32:53	rhettinger	set	messages: + msg88777
2009-06-02 23:30:30	rhettinger	set	messages: + msg88776
2009-06-02 23:25:43	gvanrossum	set	messages: + msg88775
2009-06-02 23:09:39	rhettinger	set	assignee: gvanrossum messages: + msg88774 nosy: + rhettinger, gvanrossum
2009-06-02 22:25:57	lemburg	set	messages: + msg88768
2009-06-02 22:06:59	lemburg	set	messages: + msg88767
2009-06-02 17:17:41	pitrou	set	messages: + msg88753
2009-06-02 16:38:54	lemburg	set	messages: + msg88751
2009-06-02 14:17:14	eric.smith	set	nosy: + eric.smith
2009-06-02 14:12:49	pitrou	set	messages: + msg88746
2009-06-02 13:53:36	lemburg	set	messages: + msg88745
2009-06-02 13:46:45	lemburg	set	messages: + msg88744
2009-05-30 22:47:57	pitrou	set	messages: + msg88585
2009-05-30 22:41:10	jimjjewett	set	nosy: + jimjjewett messages: + msg88584
2009-05-25 09:32:04	amaury.forgeotdarc	set	messages: + msg88313
2009-05-25 09:24:24	pitrou	set	messages: + msg88312
2009-05-25 09:21:03	lemburg	set	messages: + msg88310
2009-05-25 08:45:44	amaury.forgeotdarc	set	messages: + msg88309
2009-05-25 08:17:55	lemburg	set	messages: + msg88307
2009-05-24 21:51:43	pitrou	set	status: closed -> open resolution: rejected -> (no value) messages: + msg88291
2009-05-24 21:48:38	lemburg	set	status: open -> closed resolution: rejected messages: + msg88290
2009-05-24 21:36:45	pitrou	set	messages: + msg88288
2009-05-24 21:31:00	lemburg	set	messages: + msg88287
2009-05-23 23:09:36	pitrou	set	files: + unialloc5.patch assignee: lemburg -> (no value) messages: + msg88254 stage: test needed -> patch review
2009-05-16 19:53:53	pitrou	set	messages: + msg87918
2009-05-16 19:47:02	collinwinter	set	messages: + msg87917
2009-05-16 19:37:38	ajaksu2	set	versions: + Python 3.2, - Python 3.0 nosy: + ajaksu2, ezio.melotti, vstinner, collinwinter messages: + msg87912 components: + Unicode stage: test needed
2008-03-22 12:56:47	lemburg	set	messages: + msg64320
2008-03-20 22:19:51	pitrou	set	messages: + msg64216
2008-03-20 22:13:04	lemburg	set	messages: + msg64215
2008-03-20 19:38:14	pitrou	set	files: - unialloc3.patch
2008-03-20 19:38:10	pitrou	set	files: - unialloc2.patch
2008-03-20 19:38:06	pitrou	set	files: - freelists.patch
2008-03-20 19:38:00	pitrou	set	files: - unialloc.patch
2008-03-20 19:37:55	pitrou	set	files: + freelists2.patch
2008-03-20 19:37:36	pitrou	set	files: + unialloc4.patch
2008-03-20 19:36:49	pitrou	set	messages: + msg64197
2008-03-20 19:10:33	lemburg	set	messages: + msg64194
2008-03-20 14:50:41	pitrou	set	messages: + msg64170
2008-03-20 14:34:33	lemburg	set	messages: + msg64169
2008-03-20 13:57:13	pitrou	set	messages: + msg64168
2008-03-20 12:53:13	lemburg	set	messages: + msg64167
2008-03-20 11:32:38	pitrou	set	messages: + msg64166
2008-03-20 11:00:14	lemburg	set	messages: + msg64165
2008-03-20 10:31:00	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg64163
2008-03-20 10:04:00	lemburg	set	messages: + msg64161
2008-03-19 22:00:58	pitrou	set	messages: + msg64112
2008-03-19 21:40:56	jafo	set	priority: normal assignee: lemburg messages: + msg64108 keywords: + patch nosy: + jafo
2008-02-16 20:32:32	pitrou	set	files: + unialloc3.patch messages: + msg62467
2008-02-12 19:58:45	pitrou	set	files: + unialloc2.patch messages: + msg62332
2008-01-30 15:56:24	pitrou	set	files: + freelists.patch messages: + msg61867
2008-01-30 11:45:14	lemburg	set	messages: + msg61862
2008-01-29 21:54:43	pitrou	set	messages: + msg61832
2008-01-27 19:03:09	pitrou	set	messages: + msg61747
2008-01-27 18:05:39	lemburg	set	messages: + msg61743
2008-01-27 16:46:46	pitrou	set	messages: + msg61741
2008-01-27 16:03:54	lemburg	set	messages: + msg61740
2008-01-27 15:35:36	pitrou	set	messages: + msg61738
2008-01-27 13:52:04	lemburg	set	messages: + msg61735
2008-01-27 00:24:00	pitrou	set	messages: + msg61732
2008-01-26 23:27:34	lemburg	set	nosy: + lemburg messages: + msg61730
2008-01-26 23:09:24	orivej	set	nosy: + orivej
2008-01-26 22:45:09	pitrou	set	files: + stringbench3k.py
2008-01-26 22:44:34	pitrou	create