Message 129306 - Python tracker


Author	ezio.melotti
Recipients	belopolsky, eric.araujo, ezio.melotti, jcea, lemburg, sdaoden, vstinner
Date	2011-02-24.21:06:38
Message-id	<1298581599.76.0.248163906504.issue11303@psf.upfronthosting.co.za>
In-reply-to

Content
The attached patch is a proof of concept to see if Steffen's proposal might be viable.

I wrote another normalize_encoding function that implements the algorithm described in msg129259, adjusted the shortcuts, and ran some timings. (Note: the function has not been tested extensively and might break. It could also be optimized further.)
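For context, here is a minimal sketch of why name normalization matters. It does not reproduce the patch or the algorithm from msg129259; it only uses the standard `codecs.lookup` call to show that the spellings timed below all resolve to the same codec. The helper name `canonical_name` and the `SPELLINGS` list are illustrative assumptions, not part of the patch.

```python
import codecs

# Spellings taken from the timings in this message (hypothetical list name).
SPELLINGS = ["latin1", "latin-1", "iso-8859-1", "iso8859_1", "iso_8859_1"]


def canonical_name(spelling):
    """Return the canonical name of the codec that `spelling` resolves to."""
    # codecs.lookup() goes through the codec registry, which normalizes
    # the name and consults the alias table before returning a CodecInfo.
    return codecs.lookup(spelling).name


resolved = {s: canonical_name(s) for s in SPELLINGS}
for spelling, name in resolved.items():
    print(f"{spelling!r:14} -> {name!r}")
```

Every spelling ends up at the same codec, so the only difference between them is how much work is needed to get there, which is what the timings measure.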

These are the results (timings in microseconds per loop):
# $ command
# result with my patch
# result without my patch
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('latin1')"
1000000 loops, best of 3: 0.626 usec per loop
100000 loops, best of 3: 2.03 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('latin-1')"
1000000 loops, best of 3: 0.614 usec per loop
1000000 loops, best of 3: 0.616 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('iso-8859-1')"
1000000 loops, best of 3: 0.993 usec per loop
1000000 loops, best of 3: 0.649 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('iso8859_1')"
1000000 loops, best of 3: 1.01 usec per loop
100000 loops, best of 3: 2.08 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('iso_8859_1')"
1000000 loops, best of 3: 0.734 usec per loop
1000000 loops, best of 3: 0.694 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('utf8')"
1000000 loops, best of 3: 0.728 usec per loop
100000 loops, best of 3: 6.37 usec per loop
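
The measurements above can be repeated from a script instead of the shell. This is a rough sketch under stated assumptions: the helper `time_decode` and its default `number` and `repeat` values are my own choices, and the absolute numbers will differ by machine and Python version, so only the relative differences between spellings are meaningful.

```python
import timeit

# Same spellings as in the shell session above.
SPELLINGS = ["latin1", "latin-1", "iso-8859-1", "iso8859_1", "iso_8859_1", "utf8"]


def time_decode(spelling, number=100_000, repeat=3):
    """Best-of-`repeat` time per call, in microseconds, for b'x'.decode(spelling)."""
    stmt = f"b'x'.decode({spelling!r})"
    # timeit.repeat() returns one total time (in seconds) per repetition;
    # taking the minimum mirrors the "best of 3" reported by `python -m timeit`.
    best = min(timeit.repeat(stmt, number=number, repeat=repeat))
    return best / number * 1e6


timings = {s: time_decode(s) for s in SPELLINGS}
for spelling, usec in timings.items():
    print(f"{spelling:12} {usec:.3f} usec per loop")
```

Running it once on a patched build and once on an unpatched build would give the two columns shown in the results.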

History
Date	User	Action	Args
2011-02-24 21:06:39	ezio.melotti	set	recipients: + ezio.melotti, lemburg, jcea, belopolsky, vstinner, eric.araujo, sdaoden
2011-02-24 21:06:39	ezio.melotti	set	messageid: <1298581599.76.0.248163906504.issue11303@psf.upfronthosting.co.za>
2011-02-24 21:06:39	ezio.melotti	link	issue11303 messages
2011-02-24 21:06:39	ezio.melotti	create