Message 79416 - Python tracker


Author	pitrou
Recipients	amaury.forgeotdarc, lemburg, loewis, pitrou
Date	2009-01-08.15:22:31
Message-id	<1231428165.11860.28.camel@localhost>
In-reply-to	<1231420282.49.0.531733594931.issue4868@psf.upfronthosting.co.za>

Content

> Attached patch
> (utf8decode4.patch) changes this and may enter the fast loop on the
> first character.

Thanks!

> Does this idea apply to the encode function as well?

Probably, although with less efficiency (a long can hold 1, 2 or 4
unicode characters depending on the build).
The unrolling part also applies to simple codecs such as latin1.
Unrolling PyUnicode_DecodeLatin1 a bit (4 copies per iteration) makes it
twice as fast on non-tiny strings. I'll experiment with utf16.
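
For readers who want to see the shape of the unrolling idea, here is a
minimal sketch. It is not the patch and not the actual body of
PyUnicode_DecodeLatin1: the function names (latin1_widen_unrolled,
units_per_long) and the use of uint32_t as the output unit are assumptions
made for illustration only. It shows a Latin-1 style widening copy with
4 copies per iteration plus a tail loop, and a helper that spells out the
"1, 2 or 4 characters per long" arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the inner loop of a Latin-1 decoder:
   every input byte maps to the code point with the same value,
   so decoding is a widening copy. */
void latin1_widen_unrolled(const unsigned char *src, size_t n, uint32_t *dst)
{
    size_t i = 0;

    /* Main loop: 4 copies per iteration, so the loop test and
       increment are paid once per 4 characters. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Tail loop: the remaining 0 to 3 characters. */
    for (; i < n; i++)
        dst[i] = src[i];
}

/* Number of code units of unit_size bytes that fit in one long.
   With 2- or 4-byte units and a 4- or 8-byte long this yields
   1, 2 or 4, depending on the build. */
size_t units_per_long(size_t unit_size)
{
    return sizeof(long) / unit_size;
}
```

Whether this kind of manual unrolling helps depends on the compiler and
optimization level, so any speedup should be measured rather than assumed.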

History
Date	User	Action	Args
2009-01-08 15:22:32	pitrou	set	recipients: + pitrou, lemburg, loewis, amaury.forgeotdarc
2009-01-08 15:22:31	pitrou	link	issue4868 messages
2009-01-08 15:22:31	pitrou	create