
Author vstinner
Recipients pitrou, serhiy.storchaka, vstinner
Date 2012-03-27.12:03:34
Message-id <1332849815.13.0.651392018601.issue14419@psf.upfronthosting.co.za>
In-reply-to
Content
New tests. I'm not convinced by the patch: it slows down the decoder for "short" strings. I don't understand which kind of ASCII-encoded strings (specific length or content?) the patch optimizes.
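
One way to probe the length question would be to time the ASCII decoder over a range of input sizes on both builds. The following is only a minimal sketch using the standard timeit module: the sizes, the repeat counts, and the helper name time_ascii_decode are assumptions made for illustration, not what was used for the numbers below.

```python
import codecs
import timeit


def time_ascii_decode(size, number=1000, repeat=3):
    """Return the best per-call time, in seconds, for decoding `size` ASCII bytes.

    `time_ascii_decode` is a hypothetical helper, not part of the patch.
    """
    decode = codecs.getdecoder("ascii")
    data = b"a" * size  # pure-ASCII payload of the requested length
    # timeit.repeat returns one total time per repetition; keep the fastest.
    best = min(timeit.repeat(lambda: decode(data), number=number, repeat=repeat))
    return best / number


if __name__ == "__main__":
    # Arbitrary sizes, chosen only to span "short" and "long" inputs.
    for size in (10, 100, 1000, 10000, 100000):
        usec = time_ascii_decode(size) * 1e6
        print(f"{size:>7} bytes: {usec:.3f} usec per call")
```

Running the same script with the unpatched and the patched ./python and comparing the rows should show roughly where the crossover between "slower" and "faster" lies.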

Unpatched:

$ ./python -m timeit -n 50000 -r 100 -s 'data=open("README", "r").read().encode("ascii")' 'data.decode("ASCII")'
50000 loops, best of 100: 1.41 usec per loop

$ ./python -m timeit -n 1000 -s 'import codecs; d = codecs.getdecoder("ascii"); x = bytes(range(128))*10' 'd(x)'
1000 loops, best of 3: 0.564 usec per loop

$ ./python -m timeit -n 1000 -s 'import codecs; d = codecs.getdecoder("ascii"); x = bytes(range(128))*1000' 'd(x)'
1000 loops, best of 3: 24.4 usec per loop

$ ./python -m timeit -n 10 -s 'import codecs; d = codecs.getdecoder("ascii"); x = bytes(range(128))*100000' 'd(x)'
10 loops, best of 3: 10.9 msec per loop

$ ./python -m timeit -n 1000 -s 'enc = "ascii"; import codecs; d = codecs.getdecoder(enc); x = ("\u0020" * 1000000).encode(enc)' 'd(x)'
1000 loops, best of 3: 722 usec per loop

Patched:

$ ./python -m timeit -n 50000 -r 100 -s 'data=open("README", "r").read().encode("ascii")' 'data.decode("ASCII")'
50000 loops, best of 100: 1.74 usec per loop

$ ./python -m timeit -n 1000 -s 'import codecs; d = codecs.getdecoder("ascii"); x = bytes(range(128))*10' 'd(x)'
1000 loops, best of 3: 0.597 usec per loop

$ ./python -m timeit -n 1000 -s 'import codecs; d = codecs.getdecoder("ascii"); x = bytes(range(128))*1000' 'd(x)'
1000 loops, best of 3: 27.3 usec per loop

$ ./python -m timeit -n 10 -s 'import codecs; d = codecs.getdecoder("ascii"); x = bytes(range(128))*100000' 'd(x)'
10 loops, best of 3: 8.32 msec per loop

$ ./python -m timeit -n 1000 -s 'enc = "ascii"; import codecs; d = codecs.getdecoder(enc); x = ("\u0020" * 1000000).encode(enc)' 'd(x)'
1000 loops, best of 3: 479 usec per loop
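
To make the comparison easier to read, the relative change for each benchmark can be computed from the timings above. This is a small sketch, not part of the original measurements: the dictionary keys are shorthand labels chosen here, and the values are copied from the two runs.

```python
# Timings copied from the runs above as (unpatched, patched) pairs.
# Units differ per row (usec or msec) but cancel out in the ratio.
timings = {
    "README": (1.41, 1.74),
    "bytes(range(128))*10": (0.564, 0.597),
    "bytes(range(128))*1000": (24.4, 27.3),
    "bytes(range(128))*100000": (10.9, 8.32),
    "1000000 spaces": (722.0, 479.0),
}


def relative_change(unpatched, patched):
    """Percentage change in run time; positive means the patched build is slower."""
    return (patched - unpatched) / unpatched * 100.0


changes = {name: relative_change(u, p) for name, (u, p) in timings.items()}

for name, pct in changes.items():
    print(f"{name:26s} {pct:+6.1f}%")
```

With these numbers the patched build is about 23% slower on the README test and about 6% and 12% slower on the two smaller bytes(range(128)) inputs, but about 24% and 34% faster on the two largest inputs.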
History
Date                 User      Action  Args
2012-03-27 12:03:35  vstinner  set     recipients: + vstinner, pitrou, serhiy.storchaka
2012-03-27 12:03:35  vstinner  set     messageid: <1332849815.13.0.651392018601.issue14419@psf.upfronthosting.co.za>
2012-03-27 12:03:34  vstinner  link    issue14419 messages
2012-03-27 12:03:34  vstinner  create