Issue502415
Created on 2002-01-11 18:07 by zooko, last changed 2022-04-10 16:04 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
python.patch | zooko, 2002-01-11 18:07 |
Messages (8) | |||
---|---|---|---|
msg38690 - (view) | Author: Zooko O'Whielacronx (zooko) | Date: 2002-01-11 18:07 | |
This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results:

* in the case that the argument does *not* begin with "__", the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.)
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), the new version is 1.12 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", the new version is 1.16 times as fast as the old.
* in the case that the argument is (randomly) one of the six special names, the new version is 2.7 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), the new version is 3.7 times as fast as the old. |
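The five input shapes measured above can be sketched as a small classifier. This is an illustration in Python, not the C code from python.patch (which is not reproduced in this issue). The function names and category labels are hypothetical, and the contents of `SPECIAL_NAMES` are an assumption, since the message refers to "six special names" without listing them.

```python
# Assumption: the issue mentions "six special names" but does not list them.
# This set is an illustrative stand-in, not taken from the patch.
SPECIAL_NAMES = frozenset({
    "__dict__", "__bases__", "__name__",
    "__getattr__", "__setattr__", "__delattr__",
})


def is_special(name):
    """Check the cheap conditions first, then the full comparison."""
    # Most attribute names fail here after looking at two characters.
    if len(name) < 4 or name[0] != "_" or name[1] != "_":
        return False
    # Names such as "__spam" or "__spam_" fail here.
    if name[-1] != "_" or name[-2] != "_":
        return False
    # Only "__x__"-shaped names reach the full comparison.
    return name in SPECIAL_NAMES


def classify(name):
    """Map a name onto one of the five benchmark cases (hypothetical labels)."""
    if not name.startswith("__"):
        return "plain"
    if not name.endswith("_"):
        return "prefix-only"
    if not name.endswith("__"):
        return "prefix-single-trailing"
    return "special" if is_special(name) else "dunder-other"
```

Ordering the checks from cheapest to most expensive is the general idea behind skipping full string comparisons for names that cannot match; how the patch realizes this in C may differ.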
|||
msg38691 - (view) | Author: Jeremy Hylton (jhylton) | Date: 2002-01-17 18:29 | |
Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-). |
|||
msg38692 - (view) | Author: Zooko O'Whielacronx (zooko) | Date: 2002-01-17 20:33 | |
Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original. I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement. |
|||
msg38693 - (view) | Author: Zooko O'Whielacronx (zooko) | Date: 2002-01-18 00:22 | |
Logged In: YES user_id=52562

Okay I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

class A: def __init__(self): self.a = 0 a = A() for i in xrange(2**20): a.a = i print a.a

and then using different attribute names in place of `a'.

The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations. The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster.

This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily. |
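A rough modern sketch of the same measurement, assuming Python 3 and the standard `timeit` module. The names `TEMPLATE` and `time_attribute` are hypothetical. Python 3 has no classic classes, so this does not exercise the functions the patch touches; it only reproduces the shape of the experiment (same loop, different attribute names). The trailing `print` from the script above is omitted because the flattened text does not show whether it sat inside the loop.

```python
import timeit

# Script template; {name} is replaced with the attribute name under test.
TEMPLATE = """
class A:
    def __init__(self):
        self.{name} = 0

a = A()
for i in range({iterations}):
    a.{name} = i
"""


def time_attribute(name, iterations=2**20, repeat=3):
    """Return the best wall-clock time in seconds for one run of the script."""
    source = TEMPLATE.format(name=name, iterations=iterations)
    code = compile(source, "<mini-benchmark>", "exec")
    # A fresh namespace per run keeps the runs independent of each other.
    runs = timeit.repeat(
        lambda: exec(code, {"__name__": "mini_benchmark"}),
        number=1,
        repeat=repeat,
    )
    return min(runs)


if __name__ == "__main__":
    for attr in ("a", "__a", "__a__"):
        print(attr, time_attribute(attr, iterations=2**16))
```

Taking the minimum over several repeats is a common way to reduce noise from other processes, which matters for differences as small as the 2% reported here.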
|||
msg38694 - (view) | Author: Zooko O'Whielacronx (zooko) | Date: 2002-03-14 16:24 | |
Logged In: YES user_id=52562

update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request.

I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization.

Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'.

What should happen next:

1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to you. I mean: what specific program I can run with and without my patch for a useful comparison. (If you require more than a 5% app-level speed-up, then let's give up on this patch now!)
2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking. Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with a variable-speed CPU and is really crummy for benchmarking.

By the way, PEP 266 is a better solution to the problem but until it's implemented, this patch is the better patch. ;-)

Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is.

Regards

Zooko

[1] http://www.google.com/press/zeitgeist/jan02-pie.gif
[2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c

--- zooko.com Security and Distributed Systems Engineering --- |
|||
msg38695 - (view) | Author: Neil Schemenauer (nascheme) | Date: 2002-03-24 01:57 | |
Logged In: YES user_id=35752 Based on the complexity added by the patch I would say at least a 5% speedup would be needed to offset the maintenance cost. -1 on the current patch. |
|||
msg38696 - (view) | Author: Zooko O'Whielacronx (zooko) | Date: 2002-03-24 15:12 | |
Logged In: YES user_id=52562

Okay, I just want to double-check these two points:

1. You did look at the actual resulting source code and not just the patch, right? Here's a side-by-side: http://zooko.com/temp.html
2. You realize that my promise that the actual speedup is < 5% applies to a realistic application-level benchmark. For microbenchmarks, the speed-up varies but is generally much higher than 5%, as described in this patch tracker entry.

Given these two facts, please reject this patch and spend your time on the new cached attribute lookups architecture instead. ;-)

Regards,

Zooko |
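The gap between microbenchmark and application-level numbers follows from Amdahl's law: a local speedup only helps in proportion to the share of runtime it affects. The sketch below is a worked illustration with hypothetical function names. The 2% runtime share used in the example is an assumption chosen for illustration; the issue does not report what fraction of application time the string comparisons take.

```python
def overall_speedup(fraction, local_speedup):
    """Whole-program speedup when `fraction` of runtime becomes
    `local_speedup` times faster (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def percent_gain(fraction, local_speedup):
    """The same result expressed as a percentage improvement."""
    return (overall_speedup(fraction, local_speedup) - 1.0) * 100.0


if __name__ == "__main__":
    # Assumed: comparisons take 2% of runtime; 3.7x is the best
    # microbenchmark figure reported earlier in this issue.
    print(round(percent_gain(0.02, 3.7), 2))
```

Under that assumed 2% share, even the best-case 3.7x microbenchmark result yields roughly a 1.5% whole-program gain, which is in line with the 0.5 to 3 percent range quoted in the thread.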
|||
msg38697 - (view) | Author: Neil Schemenauer (nascheme) | Date: 2002-03-24 18:25 | |
Logged In: YES user_id=35752 I've played with your patch for about 2 hours today. I benchmarked it and tried to clean it up using macros or inlined functions. I also tried a variation that exploited the fact that most names were interned strings. It's not worth it. Spend time on rattlesnake, psyco, or the namespace optimizations.
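The interned-string idea mentioned above can be illustrated in Python. This is only a sketch of the general technique, not the variation that was actually tried (the issue does not show that code). `sys.intern` is the Python 3 standard-library call; `DICT_NAME` and `names_match` are hypothetical names.

```python
import sys

# Interning guarantees one shared object per distinct string value.
DICT_NAME = sys.intern("__dict__")


def names_match(candidate, interned_target):
    """Compare by identity first, then fall back to a full comparison."""
    # Fast path: if both strings are interned, equal values are the same object.
    if candidate is interned_target:
        return True
    # Slow path: the candidate may not be interned, so compare contents.
    return candidate == interned_target


if __name__ == "__main__":
    built = "".join(["__di", "ct__"])        # built at runtime
    print(names_match(built, DICT_NAME))     # equal by value
    print(sys.intern(built) is DICT_NAME)    # interning yields the shared object
```

The identity test is a single pointer comparison, so it is cheap when it succeeds. The fallback is still required for correctness because not every string is interned.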
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:04:52 | admin | set | github: 35907 |
2002-01-11 18:07:41 | zooko | create |