
Author vstinner
Recipients BreamoreBoy, ezio.melotti, kushal.das, serhiy.storchaka, thomaslee, vstinner
Date 2012-10-10.20:38:39
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1349901520.73.0.991489146519.issue16061@psf.upfronthosting.co.za>
In-reply-to
Content
> The code is now using the heavily optimized findchar() function.

I compared the performance of the two methods, a dummy loop vs. find() (both strategies are sketched below). Results with a string of 100,000 characters:

 * Replace 100% (rewrite all characters): find is 12.5x slower than a loop
 * Replace 50%: find is 3.3x slower
 * Replace only 2 characters (0.001%): find is 10.4x faster
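
For context, here is a rough pure-Python sketch of the two strategies. The real code is C in Objects/unicodeobject.c; the function names below are invented for illustration and the sketch assumes a one-character target, matching the findchar() case:

    # "Dummy loop": visit every character unconditionally.
    def replace_with_loop(text, old, new):
        buf = list(text)
        for i, ch in enumerate(buf):
            if ch == old:
                buf[i] = new
        return ''.join(buf)

    # find-based: jump from match to match, copying untouched spans as-is.
    # Fast when matches are rare; slower when almost everything matches,
    # because each match costs a find() call plus copy bookkeeping.
    def replace_with_find(text, old, new):
        parts = []
        start = 0
        while True:
            i = text.find(old, start)
            if i < 0:
                parts.append(text[start:])
                return ''.join(parts)
            parts.append(text[start:i])
            parts.append(new)
            start = i + 1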

In practice, I bet that the most common case is replacing only a few characters; replacing all characters is a rare use case.

Apply the attached "unicode.patch" to Python 3.4 and use the following commands to reproduce my benchmark:

python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=(a+' ')*(100000//2)" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=a+' '*100000+a" "text.replace(a, b)"

--

One option is to use the find method, and then switch to the dummy loop method if there are too many characters to replace. I don't know if it's necessary to develop such a complex algorithm; it would be better to have a benchmark extracted from a real-world application like a template engine.
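
A hypothetical sketch of that hybrid idea, again in pure Python: start with the find-based strategy, but fall back to a single rewrite pass once the number of matches crosses a threshold. The threshold value and the function name are made up for illustration; nothing like this exists in the attached patch.

    FALLBACK_THRESHOLD = 64  # arbitrary cut-off, would need tuning with benchmarks

    def replace_hybrid(text, old, new):
        parts = []
        start = 0
        matches = 0
        while True:
            i = text.find(old, start)
            if i < 0:
                parts.append(text[start:])
                return ''.join(parts)
            matches += 1
            if matches > FALLBACK_THRESHOLD:
                # Too many replacements ahead: rewriting the rest of the
                # string in one pass is cheaper than repeated find() calls.
                parts.append(''.join(new if ch == old else ch
                                     for ch in text[start:]))
                return ''.join(parts)
            parts.append(text[start:i])
            parts.append(new)
            start = i + 1
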
History
Date User Action Args
2012-10-10 20:38:40  vstinner  set  recipients: + vstinner, thomaslee, ezio.melotti, BreamoreBoy, serhiy.storchaka, kushal.das
2012-10-10 20:38:40  vstinner  set  messageid: <1349901520.73.0.991489146519.issue16061@psf.upfronthosting.co.za>
2012-10-10 20:38:40  vstinner  link  issue16061 messages
2012-10-10 20:38:40  vstinner  create