Message172602
> The code is now using the heavily optimized findchar() function.
I compared the performance of the two methods (dummy loop vs. find) on a string of 100,000 characters. Results:
* Replace 100% (rewrite all characters): find is 12.5x slower than a loop
* Replace 50%: find is 3.3x slower
* Replace only 2 characters (about 0.002%): find is 10.4x faster
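To make the trade-off concrete, here is a pure-Python model of the two strategies. It is only an illustration, not the C code in the patch. The function names are hypothetical, and Python-level timings of these functions will not reproduce the C-level numbers above. The loop version compares every character with `old`. The find version lets `str.find()` jump straight to the next match, so its cost is roughly one call per match plus the scanning done inside `find()`.

```python
def replace_char_loop(text, old, new):
    """'Dummy loop' model: compare every character with old."""
    out = list(text)
    for i, c in enumerate(out):
        if c == old:
            out[i] = new
    return "".join(out)


def replace_char_find(text, old, new):
    """Find model: let str.find() skip ahead to the next match."""
    out = list(text)
    pos = text.find(old)
    while pos != -1:
        out[pos] = new
        # One find() call per match: cheap when matches are rare,
        # pure overhead when nearly every character matches.
        pos = text.find(old, pos + 1)
    return "".join(out)


if __name__ == "__main__":
    sample = "a" + " " * 10 + "a"
    assert replace_char_loop(sample, "a", "b") == sample.replace("a", "b")
    assert replace_char_find(sample, "a", "b") == sample.replace("a", "b")
```

One plausible reading of the results: when nearly every character matches, the per-call overhead of the find version dominates. With only two matches, a single fast scan skips almost the whole string.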
In practice, I bet that the most common case is replacing only a few characters. Replacing all characters is a rare use case.
Apply the attached "unicode.patch" to Python 3.4 and run the following commands to reproduce my benchmark:
python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=(a+' ')*(100000//2)" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=a+' '*100000+a" "text.replace(a, b)"
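The same three cases can also be timed from a single script with the `timeit` module. This is a sketch that assumes the interpreter running it is the patched one. It also prints the fraction of matching characters, which is 100%, 50% and about 0.002% (2 out of 100,002) for the three inputs.

```python
import timeit

N = 100000
A, B = "a", "b"

# The three inputs used by the timeit command lines above.
CASES = [
    ("replace 100%", A * N),
    ("replace 50%", (A + " ") * (N // 2)),
    ("replace 2 chars", A + " " * N + A),
]


def run(number=100):
    """Return {case name: (fraction of matching chars, seconds per replace)}."""
    results = {}
    for name, text in CASES:
        ratio = text.count(A) / len(text)
        # timeit calls the lambda right away, so it sees the current text.
        total = timeit.timeit(lambda: text.replace(A, B), number=number)
        results[name] = (ratio, total / number)
    return results


if __name__ == "__main__":
    results = run()
    for name, _ in CASES:
        ratio, per_call = results[name]
        print("{:16} {:9.4%} {:10.1f} us".format(name, ratio, per_call * 1e6))
```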
--
One option is to use the find method and then switch to the dummy loop method if there are too many characters to replace. I don't know if it's necessary to develop such a complex algorithm. It would be better to have a benchmark extracted from a real-world application, such as a template engine.
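A minimal sketch of that hybrid idea, again as a pure-Python model rather than C code. The function name, the `max_finds` threshold and its default of 64 are hypothetical; a real cutoff would have to come from benchmarks like the ones above.

```python
def replace_char_hybrid(text, old, new, max_finds=64):
    """Replace the single character old with new.

    Start with str.find(); after max_finds matches (hypothetical cutoff),
    assume matches are dense and finish with a plain per-character loop.
    """
    out = list(text)
    finds = 0
    pos = text.find(old)
    while pos != -1:
        out[pos] = new
        finds += 1
        if finds >= max_finds:
            # Many matches so far: switch to the "dummy loop" for the rest.
            for i in range(pos + 1, len(out)):
                if out[i] == old:
                    out[i] = new
            break
        pos = text.find(old, pos + 1)
    return "".join(out)


if __name__ == "__main__":
    dense = "a" * 200
    assert replace_char_hybrid(dense, "a", "b") == dense.replace("a", "b")
```

A count-based cutoff is the simplest rule. A density-based rule (matches per scanned character) would likely avoid switching on long texts whose many matches are still sparse.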

Date | User | Action | Args
2012-10-10 20:38:40 | vstinner | set | recipients: + vstinner, thomaslee, ezio.melotti, BreamoreBoy, serhiy.storchaka, kushal.das
2012-10-10 20:38:40 | vstinner | set | messageid: <1349901520.73.0.991489146519.issue16061@psf.upfronthosting.co.za>
2012-10-10 20:38:40 | vstinner | link | issue16061 messages
2012-10-10 20:38:40 | vstinner | create |